Python implementation notes
The Python implementation of JSON-delta consists of a package, json_delta, whose top-level namespace is documented in The JSON-delta API. The implementation is divided into five sub-modules of the package, whose names all begin with an underscore to highlight the fact that they are not part of the API: the way the functions documented in The JSON-delta API are implemented is subject to refactoring at any time. Nevertheless, the sub-modules are documented here.
json_delta._diff
Functions for computing JSON-format diffs.
-
json_delta._diff.diff(left_struc, right_struc, array_align=True, compare_lengths=True, common_key_threshold=0.0, verbose=True, key=None)
Compose a sequence of diff stanzas sufficient to convert the structure left_struc into the structure right_struc. (Whether you can add ‘necessary and’ to ‘sufficient to’ depends on the setting of the other parameters, and how many cycles you want to burn; see below.)
Optional parameters:
    array_align: Use needle_diff() to compute deltas between arrays. Computationally expensive, but likely to produce shorter diffs. If this parameter is set to the string 'udiff', needle_diff() will optimize for the shortest udiff, instead of the shortest JSON-format diff. Otherwise, set to any value that is true in a Boolean context to enable.
    compare_lengths: If at any level [[key, right_struc]] can be encoded as a shorter JSON-string, return it instead of examining the internal structure of left_struc and right_struc. May result in smaller diffs.
    common_key_threshold: Skip recursion into left_struc and right_struc if the fraction of keys they have in common (as computed by commonality(), which see) is less than this parameter (which should be a float between 0.0 and 1.0).
    verbose: Print compression statistics to stderr.
The parameter key is present because this function is mutually recursive with needle_diff() and keyset_diff(). If set to a list, it will be prefixed to every keypath in the output.
-
json_delta._diff.append_key(stanzas, left_struc, keypath=())
Get the appropriate key for appending to the sequence left_struc.
stanzas should be a diff, some of whose stanzas may modify a sequence left_struc that appears at path keypath. If any of the stanzas append to left_struc, the return value is the largest index in left_struc they address, plus one. Otherwise, the return value is len(left_struc) (i.e. the index that a value would have if it were appended to left_struc).
    >>> append_key([], [])
    0
    >>> append_key([[[2], 'Baz']], ['Foo', 'Bar'])
    3
    >>> append_key([[[2], 'Baz'], [['Quux', 0], 'Foo']], [], ['Quux'])
    1
-
json_delta._diff.commonality(left_struc, right_struc)
Return a float between 0.0 and 1.0 representing the amount that the structures left_struc and right_struc have in common.
Return value is computed as the fraction (elements in common) / (total elements).
-
json_delta._diff.compute_diff_stats(target, diff, percent=True)
Calculate the size of a minimal JSON dump of target and diff, and the ratio of the two sizes.
The ratio is expressed as a percentage if percent is true in a Boolean context, or as a float otherwise.
Return value is a tuple of the form ({ratio}, {size of target}, {size of diff}).
    >>> compute_diff_stats([{}, 'foo', 'bar'], [], False)
    (0.125, 16, 2)
    >>> compute_diff_stats([{}, 'foo', 'bar'], [[0], {}])
    (50.0, 16, 8)
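The figures in the doctest above can be reproduced with the standard library alone. The following is a minimal sketch, not the package's implementation: the names compact_size and diff_stats_sketch are hypothetical, and it assumes that a "minimal JSON dump" means json.dumps with no optional whitespace.

```python
import json


def compact_size(obj):
    """Length of a JSON dump of obj with no optional whitespace (assumed meaning of 'minimal')."""
    return len(json.dumps(obj, separators=(",", ":")))


def diff_stats_sketch(target, diff, percent=True):
    """Hypothetical stand-in: return (ratio, size of target, size of diff)."""
    target_size = compact_size(target)
    diff_size = compact_size(diff)
    ratio = diff_size / target_size
    if percent:
        ratio *= 100
    return (ratio, target_size, diff_size)


print(diff_stats_sketch([{}, "foo", "bar"], [], False))
print(diff_stats_sketch([{}, "foo", "bar"], [[0], {}]))
```

Under these assumptions the target serializes to the 16-character string [{},"foo","bar"], which is where the 16 in both results comes from.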
-
json_delta._diff.compute_keysets(left_seq, right_seq)
Compare the keys of left_seq vs. right_seq.
Determines which keys left_seq and right_seq have in common, and which are unique to each of the structures. Arguments should be instances of the same basic type, which must be a non-terminal: i.e. list or dict. If they are lists, the keys compared will be integer indices.
Returns:
    Return value is a 3-tuple of sets ({overlap}, {left_only}, {right_only}). As their names suggest, overlap is a set of keys left_seq and right_seq have in common, left_only represents keys only found in left_seq, and right_only holds keys only found in right_seq.
Raises:
    AssertionError if left_seq is not an instance of type(right_seq), or if they are not of a non-terminal type.
    >>> (compute_keysets({'foo': None}, {'bar': None})
    ...  == (set([]), {'foo'}, {'bar'}))
    True
    >>> (compute_keysets({'foo': None, 'baz': None},
    ...                  {'bar': None, 'baz': None})
    ...  == ({'baz'}, {'foo'}, {'bar'}))
    True
    >>> (compute_keysets(['foo', 'baz'], ['bar', 'baz'])
    ...  == ({0, 1}, set([]), set([])))
    True
    >>> compute_keysets(['foo'], ['bar', 'baz']) == ({0}, set([]), {1})
    True
    >>> compute_keysets([], ['bar', 'baz']) == (set([]), set([]), {0, 1})
    True
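The behaviour described above can be approximated with ordinary set arithmetic. This is a sketch under the assumption that list keys are simply the valid integer indices; keysets_sketch is a hypothetical name, not the function documented here.

```python
def keysets_sketch(left_seq, right_seq):
    """Hypothetical stand-in: return (overlap, left_only, right_only) as sets of keys."""
    assert isinstance(left_seq, type(right_seq)), "arguments must share a type"
    assert isinstance(left_seq, (list, dict)), "arguments must be lists or dicts"
    if isinstance(left_seq, dict):
        left_keys, right_keys = set(left_seq), set(right_seq)
    else:
        # For lists, the keys are the integer indices.
        left_keys = set(range(len(left_seq)))
        right_keys = set(range(len(right_seq)))
    overlap = left_keys & right_keys
    return (overlap, left_keys - overlap, right_keys - overlap)


print(keysets_sketch({"foo": None, "baz": None}, {"bar": None, "baz": None}))
print(keysets_sketch(["foo"], ["bar", "baz"]))
```

Note that only keys are compared, never values: two lists of equal length always have every index in the overlap set.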
-
json_delta._diff.sort_stanzas(stanzas)
Sort the stanzas in a diff.
Object changes can occur in any order, but deletions from arrays have to happen last node first: ['foo', 'bar', 'baz'] → ['foo', 'bar'] → ['foo'] → []; additions to arrays have to happen leftmost-node-first: [] → ['foo'] → ['foo', 'bar'] → ['foo', 'bar', 'baz'], and insert-and-shift alterations to arrays must happen last: ['foo', 'quux'] → ['foo', 'bar', 'quux'] → ['foo', 'bar', 'baz', 'quux'].
Finally, stanzas are sorted in descending order of length of keypath, so that the most deeply-nested structures are altered before alterations which might change their keypaths take place.
Note that this will also sort changes to objects (dicts) so that they occur first of all.
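Why the order of array deletions matters can be seen with plain Python lists. The sketch below does not use json_delta at all; delete_indices is a hypothetical helper that deletes indices in exactly the order given.

```python
def delete_indices(array, indices):
    """Delete the given indices from a copy of array, in the order given."""
    result = list(array)
    for index in indices:
        del result[index]
    return result


original = ["foo", "bar", "baz", "quux"]

# Intent: remove the elements that start out at indices 1 and 2.
last_first = delete_indices(original, [2, 1])   # higher index first
first_first = delete_indices(original, [1, 2])  # lower index first

print(last_first)
print(first_first)
```

Deleting the higher index first leaves the lower index pointing at the element it originally addressed; deleting the lower index first shifts everything after it, so the second deletion hits the wrong element.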
-
json_delta._diff.split_diff(stanzas)
Split a diff into modifications, deletions and insertions.
Return value is a 4-tuple of lists: the first is a list of stanzas from stanzas that modify JSON objects, the second is a list of stanzas that add or change elements in JSON arrays, the third is a list of stanzas which delete elements from arrays, and the fourth is a list of stanzas which insert elements into arrays (stanzas ending in "i").
-
json_delta._diff.structure_comparable(left_struc, right_struc)
Test if left_struc and right_struc can be efficiently diffed.
-
json_delta._diff.this_level_diff(left_struc, right_struc, key=None, common=None)
Return a sequence of diff stanzas between the structures left_struc and right_struc, assuming that they are each at the key-path key within the overall structure.
    >>> (this_level_diff({'foo': 'bar', 'baz': 'quux'},
    ...                  {'foo': 'bar'})
    ...  == [[['baz']]])
    True
    >>> (this_level_diff({'foo': 'bar', 'baz': 'quux'},
    ...                  {'foo': 'bar'}, ['quordle'])
    ...  == [[['quordle', 'baz']]])
    True
json_delta._patch
Functions for applying JSON-format patches.
-
json_delta._patch.patch(struc, diff, in_place=True)
Apply the sequence of diff stanzas diff to the structure struc.
By default, this function modifies struc in place; set in_place to False to return a patched copy of struc instead:
    >>> will_change = [16]
    >>> wont_change = [16]
    >>> patch(will_change, [[[0]]])
    []
    >>> will_change
    []
    >>> patch(wont_change, [[[0]]], False)
    []
    >>> wont_change
    [16]
-
json_delta._patch.patch_stanza(struc, stanza)
Applies the stanza stanza to the structure struc as a patch.
Note that this function modifies struc in-place into the target of stanza. If struc is a tuple(), you get a new tuple with the appropriate modification made:
    >>> patch_stanza((17, 3.141593, None), [[1], 3.14159265])
    (17, 3.14159265, None)
json_delta._udiff
Functions for computing udiffs. Main entry point: udiff().
The data structure representing a udiff that these functions all manipulate is a pair of lists of iterators (left_lines, right_lines). These lists are expected (principally by generate_udiff_lines(), which processes them) to be of the same length. A pair of iterators (left_lines[i], right_lines[i]) may yield exactly the same sequence of output lines, each with ' ' as the first character (representing parts of the structure the input and output have in common). Alternatively, they may each yield zero or more lines (referring to parts of the structure that are unique to the inputs they represent). In this case, all lines yielded by left_lines[i] should begin with '-', and all lines yielded by right_lines[i] should begin with '+'.
-
json_delta._udiff.udiff(left, right, patch=None, indent=0, use_ellipses=True, entry=True)
Render the difference between the structures left and right as a string in a fashion inspired by diff -u.
Generating a udiff is strictly slower than generating a normal diff with the same option parameters, since the udiff is computed on the basis of a normal diff between left and right. If such a diff has already been computed (e.g. by calling diff()), pass it as the patch parameter:
    >>> (next(udiff({"foo": None}, {"foo": None}, patch=[])) == ... ' {...}') True
As you can see above, structures that are identical in left and right are abbreviated using '...' by default. To disable this behavior, set use_ellipses to False.
    >>> ('\n'.join(udiff({"foo": None}, {"foo": None}, ... patch=[], use_ellipses=False)) == ... """ { ... "foo": ... null ... }""") True
    >>> ('\n'.join(udiff([None, None, None], [None, None, None], ... patch=[], use_ellipses=False)) == ... """ [ ... null, ... null, ... null ... ]""") True
-
class json_delta._udiff.Gap
Class to represent gaps introduced by sequence alignment.
-
json_delta._udiff.add_matter(seq, matter, indent)
Add material to seq, treating it appropriately for its type.
matter may be an iterator, in which case it is appended to seq. If it is a sequence, it is assumed to be a sequence of iterators, and the sequence is concatenated onto seq. If matter is a string, it is turned into a patch band using single_patch_band(), which is appended. Finally, if matter is None, an empty iterable is appended to seq.
This function is a udiff-forming primitive, called by more specific functions defined within udiff_dict() and udiff_list().
-
json_delta._udiff.commafy(gen, comma=True)
Yield from gen, ensuring that the final result ends with a comma iff comma is True.
    >>> gen = ['Example line']
    >>> next(commafy(iter(gen))) == 'Example line,'
    True
    >>> next(commafy(iter(gen), False)) == 'Example line'
    True
    >>> gen = ['Line with a comma at the end,']
    >>> (next(commafy(iter(gen), comma=True))
    ...  == next(commafy(iter(gen), comma=False))
    ...  == 'Line with a comma at the end,')
    True
-
json_delta._udiff.curry_functions(local_ns)
Create partials of _add_common_matter(), _add_differing_matter() and _commafy_last(), with values for left_lines, right_lines and (where appropriate) indent taken from the dictionary local_ns.
Appropriate defaults are also included in the partials, namely left=None and right=None for _add_differing_matter() and left_comma=True and right_comma=None for _commafy_last().
-
json_delta._udiff.generate_udiff_lines(left, right)
Combine the diff lines from left and right, and generate the lines of the resulting udiff.
-
json_delta._udiff.patch_bands(indent, material, sigil=u' ')
Generate appropriately indented patch bands, with sigil as the first character.
-
json_delta._udiff.reconstruct_alignment(left, right, stanzas)
Reconstruct the sequence alignment between the lists left and right implied by stanzas.
-
json_delta._udiff.single_patch_band(indent, line, sigil=u' ')
Convenience function returning an iterable that generates a single patch band.
-
json_delta._udiff.udiff_dict(left, right, stanzas, indent=0, use_ellipses=True)
Construct a human-readable delta between left and right.
This function probably shouldn’t be called directly. Instead, use udiff() with the same arguments. udiff() and udiff_dict() are mutually recursive, anyway.
-
json_delta._udiff.udiff_list(left, right, stanzas, indent=0, use_ellipses=True)
Construct a human-readable delta between left and right.
This function probably shouldn’t be called directly. Instead, use udiff() with the same arguments. udiff() and udiff_list() are mutually recursive, anyway.
json_delta._upatch
-
json_delta._upatch.upatch(struc, udiff, reverse=False, in_place=True)
Apply a patch as output by json_delta.udiff() to struc.
As with json_delta.patch(), struc is modified in place by default. Set the parameter in_place to False if this is not the desired behaviour.
The udiff format has enough information in it that this transformation can be applied in reverse: i.e. if udiff is the output of udiff(left, right), you can reconstruct right given left and udiff (by running upatch(left, udiff)), or you can also reconstruct left given right and udiff (by running upatch(right, udiff, reverse=True)). This is not possible for JSON-format diffs, since a [keypath] stanza (meaning “delete the structure at keypath”) does not record what the deleted structure was.
-
json_delta._upatch.ellipsis_handler(jstring, point, key)
Extends key_tracker() to handle the ... construction.
-
json_delta._upatch.is_none_key(key)
Is the last element of key None?
-
json_delta._upatch.reconstruct_diff(udiff, reverse=False)
Turn a udiff back into a JSON-format diff.
Set reverse to True to generate a reverse diff (i.e. swap the significance of line-initial + and -).
Header lines (if present) are ignored:
    >>> udiff = """--- <stdin>
    ... +++ <stdin>
    ... -false
    ... +true"""
    >>> reconstruct_diff(udiff)
    [[[], True]]
    >>> reconstruct_diff(udiff, reverse=True)
    [[[], False]]
-
json_delta._upatch.skip_key(point, key, origin, keys, predicate)
Find the next result in keys for which predicate(key) is False.
If none is found, or if key is already such a result, the return value is (point, key).
-
json_delta._upatch.sort_stanzas(stanzas)
Sorts the stanzas in a diff.
reconstruct_diff() works on different assumptions from json_delta._diff.needle_diff() when it comes to stanzas altering arrays: keys in such stanzas relate to the element’s position within the array’s longest intermediate representation during the transformation (that is after all insert-and-shifts, after all appends, but before any deletions). This function sorts stanzas to reflect that order of operations.
As with json_delta._diff.sort_stanzas() (which see), stanzas are sorted for length so the most deeply-nested structures get their modifications first.
-
json_delta._upatch.udiff_key_tracker(udiff, point=0, start_key=None)
Find points within the udiff where the active keypath changes.
json_delta._util
Utility functions and constants used by more than one submodule.
The majority of Python 2/3 compatibility shims also appear in this module.
-
json_delta._util.predicate_count(iterable, predicate=lambda x: True)
Count items x in iterable such that predicate(x).
The default predicate is lambda x: True, so predicate_count(iterable) will count the values generated by iterable. Note that if the iterable is a generator, this function will exhaust it, and if it is an infinite generator, this function will never return!
    >>> predicate_count([True] * 16)
    16
    >>> predicate_count([True, True, False, True, True], lambda x: x)
    4
-
json_delta._util.uniquify(bytestring, key=lambda x: x)
Remove duplicate elements from a list while preserving order.
key works as for min(), max(), etc. in the standard library.
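Order-preserving de-duplication with a key function is commonly written with a set of already-seen keys. This is a sketch of that general technique, not the package's code; uniquify_sketch is a hypothetical name, and it assumes the values returned by key are hashable.

```python
def uniquify_sketch(items, key=lambda x: x):
    """Return items without duplicates, keeping the first occurrence of each key."""
    seen = set()
    result = []
    for item in items:
        marker = key(item)  # assumed hashable
        if marker not in seen:
            seen.add(marker)
            result.append(item)
    return result


print(uniquify_sketch([3, 1, 3, 2, 1]))
print(uniquify_sketch(["Foo", "foo", "Bar"], key=str.lower))
```

With a key function, two items count as duplicates when their keys are equal, and the first of them is the one that survives.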
-
json_delta._util.sniff_encoding(bytestring, starts=JSON_STARTS, complete=True)
Determine the encoding of a UTF-x encoded string.
The argument starts must be a mapping of bytestrings the input can begin with onto the encoding that such a beginning would represent (see licit_starts() for a function that can build such a mapping).
The complete flag signifies whether the input represents the entire string: if it is set False, the function will attempt to determine the encoding, but will raise a UnicodeError if it is ambiguous. For example, an input of b'\xff\xfe' could be the UTF-16 little-endian byte-order mark, or, if the input is incomplete, it could be the first two bytes of the UTF-32-LE BOM:
    >>> sniff_encoding(b'\xff\xfe') == 'utf_16'
    True
    >>> sniff_encoding(b'\xff\xfe', complete=False)
    Traceback (most recent call last):
        ...
    UnicodeError: String encoding is ambiguous.
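The ambiguity described above follows directly from the byte-order marks themselves, which the standard codecs module exposes as constants. The short check below uses only the standard library and does not call sniff_encoding().

```python
import codecs

utf16_le_bom = codecs.BOM_UTF16_LE  # b'\xff\xfe'
utf32_le_bom = codecs.BOM_UTF32_LE  # b'\xff\xfe\x00\x00'

# The UTF-16-LE BOM is a prefix of the UTF-32-LE BOM, so two bytes of
# incomplete input cannot distinguish the two encodings.
prefix_is_ambiguous = utf32_le_bom.startswith(utf16_le_bom)
print(prefix_is_ambiguous)

# With the complete input available, each BOM decodes unambiguously.
as_utf16 = (utf16_le_bom + "true".encode("utf_16_le")).decode("utf_16")
as_utf32 = (utf32_le_bom + "true".encode("utf_32_le")).decode("utf_32")
print(as_utf16, as_utf32)
```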
-
json_delta._util._load_and_func(func, parm1=None, parm2=None, both=None, **flags)
Decode JSON-serialized parameters and apply func to them.
-
json_delta._util.all_paths(struc)
Generate key-paths to every node in struc.
Both terminal and non-terminal nodes are visited, like so:
    >>> paths = [x for x in all_paths({'foo': None, 'bar': ['baz', 'quux']})]
    >>> [] in paths # ([] is the path to ``struc`` itself.)
    True
    >>> ['foo'] in paths
    True
    >>> ['bar'] in paths
    True
    >>> ['bar', 0] in paths
    True
    >>> ['bar', 1] in paths
    True
    >>> len(paths)
    5
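A traversal of this kind can be written as a short recursive generator. The following is a sketch that reproduces the doctest above; all_paths_sketch and its prefix argument are hypothetical, and the order in which paths are produced is an assumption (parents before children).

```python
def all_paths_sketch(struc, prefix=None):
    """Yield a key-path (as a list) to every node of struc, parents first."""
    prefix = [] if prefix is None else prefix
    yield prefix
    if isinstance(struc, dict):
        children = struc.items()
    elif isinstance(struc, list):
        children = enumerate(struc)
    else:
        return  # terminal node: nothing below it
    for key, value in children:
        for path in all_paths_sketch(value, prefix + [key]):
            yield path


paths = list(all_paths_sketch({"foo": None, "bar": ["baz", "quux"]}))
print(paths)
```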
-
json_delta._util.check_diff_structure(diff)
Return diff (or True) if it is structured as a sequence of diff stanzas. Otherwise return False.
[] is a valid diff, so if it is passed to this function, the return value is True, so that the return value is always true in a Boolean context if diff is valid.
    >>> check_diff_structure('This is certainly not a diff!')
    False
    >>> check_diff_structure([])
    True
    >>> check_diff_structure([None])
    False
    >>> example_valid_diff = [[["foo", 6, 12815316313, "bar"], None]]
    >>> check_diff_structure(example_valid_diff) == example_valid_diff
    True
    >>> check_diff_structure([[["foo", 6, 12815316313, "bar"], None],
    ...                       [["foo", False], True]])
    False
-
json_delta._util.compact_json_dumps(obj)
Compute the most compact possible JSON representation of obj.
    >>> test = {
    ...     'foo': 'bar',
    ...     'baz':
    ...         ['quux', 'spam',
    ...          'eggs']
    ... }
    >>> compact_json_dumps(test) in (
    ...     '{"foo":"bar","baz":["quux","spam","eggs"]}',
    ...     '{"baz":["quux","spam","eggs"],"foo":"bar"}'
    ... )
    True
-
json_delta._util.decode_json(file_or_str)
Decode a JSON file-like object or string.
The following doctest is probably pointless as documentation. It is here so json-delta can claim 100% code coverage for its test suite!
    >>> try:
    ...     from StringIO import StringIO
    ... except ImportError:
    ...     from io import StringIO
    >>> foo = '[]'
    >>> decode_json(foo)
    []
    >>> decode_json(StringIO(foo))
    []
-
json_delta._util.decode_udiff(file_or_str)
Decode a file-like object or bytestring udiff into a unicode string.
The udiff may be encoded in UTF-8, -16 or -32 (with or without BOM):
    >>> udiff = u'- true\n+ false'
    >>> decode_udiff(udiff.encode('utf_32_be')) == udiff
    True
    >>> try:
    ...     from StringIO import StringIO
    ... except ImportError:
    ...     from io import BytesIO as StringIO
    >>> decode_udiff(StringIO(udiff.encode('utf-8-sig'))) == udiff
    True
An empty string is a valid udiff; this function will convert it to a unicode string:
    >>> decode_udiff(b'') == u''
    True
The function is idempotent: if you pass it a unicode string, it will be returned unmodified:
    >>> decode_udiff(udiff) is udiff
    True
If you pass it a non-empty bytestring that cannot be interpreted as beginning with ' ', '+', '-' or a BOM in any encoding, a ValueError is raised:
    >>> decode_udiff(b':-)')
    Traceback (most recent call last):
        ...
    ValueError: String does not begin with any of the specified start chars.
-
json_delta._util.follow_path(struc, path)
Retrieve the value found at the key-path path within struc.
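Following a key-path amounts to repeated subscripting, one key at a time. This is a minimal sketch of that idea under the assumption that a key-path is a sequence of dict keys and list indices; follow_path_sketch is a hypothetical name.

```python
def follow_path_sketch(struc, path):
    """Return the node reached by subscripting struc with each key of path in turn."""
    node = struc
    for key in path:
        node = node[key]
    return node


document = {"foo": None, "bar": ["baz", "quux"]}
print(follow_path_sketch(document, ["bar", 1]))
print(follow_path_sketch(document, []) is document)
```

An empty path addresses the structure itself, which matches the convention that [] is the path to the root.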
-
json_delta._util.in_array(key, accept_None=False)
Should the keypath key point at a JSON array ([])?
Works by testing whether key[-1] is an int or (where appropriate) long:
    >>> in_array([u'bar', 16])
    True
    >>> import sys
    >>> sys.version >= '3' or eval("in_array([u'foo', 94L])")
    True
Returns False if key addresses a non-array object…
    >>> in_array(["foo"])
    False
    >>> in_array([u'bar'])
    False
…or if key == [] (as in that case there’s no way of knowing whether key addresses an object or an array).
    >>> in_array([])
    False
If the accept_None flag is set, this function will not raise a ValueError if key[-1] is None (keypaths of this form are used by key_tracker(), to signal points within a JSON string where a new object key is expected, but not yet found).
    >>> in_array([None])
    Traceback (most recent call last):
        ...
    ValueError: keypath elements must be instances of str, unicode, int or long, not NoneType (key[0] == None)
    >>> in_array([None], True)
    False
    >>> in_array([None], accept_None=True)
    False
Otherwise, a ValueError is raised if key is not a valid keypath:
    >>> keypath = [{str("spam"): str("spam")}, "pickled eggs and spam", 7]
    >>> in_array(keypath)
    Traceback (most recent call last):
        ...
    ValueError: keypath elements must be instances of str, unicode, int or long, not dict (key[0] == {'spam': 'spam'})
-
json_delta._util.in_object(key, accept_None=False)
Should the keypath key point at a JSON object ({})?
Works by testing whether key[-1] is a string or (where appropriate) unicode():
    >>> in_object(["foo"])
    True
    >>> in_object([u'bar'])
    True
Returns False if key addresses an array…
    >>> in_object([u'bar', 16])
    False
    >>> import sys
    >>> False if sys.version >= '3' else eval("in_object([u'bar', 16L])")
    False
…if key == []…
    >>> in_object([])
    False
If the accept_None flag is set, this function will also return True if key[-1] is None (this functionality is used by key_tracker(), to signal points within a JSON string where a new object key is expected, but not yet found).
    >>> in_object([None])
    Traceback (most recent call last):
        ...
    ValueError: keypath elements must be instances of str, unicode, int or long, not NoneType (key[0] == None)
    >>> in_object([None], True)
    True
    >>> in_object([None], accept_None=True)
    True
Raises a ValueError if key is not a valid keypath:
    >>> in_object(['foo', {}])
    Traceback (most recent call last):
        ...
    ValueError: keypath elements must be instances of str, unicode, int or long, not dict (key[1] == {})
    >>> in_object([False, u'foo'])
    Traceback (most recent call last):
        ...
    ValueError: keypath elements must be instances of str, unicode, int or long, not bool (key[0] == False)
-
json_delta._util.in_x_error(key, offender)
Build the instance of ValueError that in_object() and in_array() raise if the keypath is invalid.
-
json_delta._util.json_bytestring_length(string)
Find the length of the JSON for a string without actually encoding it.
Attempts to give the shortest possible version: encoding as UTF-8 and using escape sequences only where necessary.
-
json_delta._util.json_length(obj)
Find the length of the JSON for obj without actually encoding it.
-
json_delta._util.key_tracker(jstring, point=0, start_key=None, special_handler=None)
Generate points within jstring where the keypath changes.
This function also identifies points within objects where a new key: value pair is expected, by yielding a pseudo-keypath with None as the final element.
Parameters:
    jstring: The JSON string to search.
    point: The point to start at.
    start_key: The starting keypath.
    special_handler: A function for handling extensions to JSON syntax (e.g. _upatch.ellipsis_handler(), used to handle the ... construction in udiffs).
    >>> next(key_tracker('{}'))
    (1, (None,))
-
json_delta._util.keypath_lengths(keypaths)
Build a dict of lengths of (hashable!) keypaths from a structure.
keypaths must be a list of all keypaths within a single structure, e.g. as returned by all_paths().
-
json_delta._util.licit_starts(start_chars=u'{}[]"-0123456789tfn \t\n\r')
Compute the bytestrings a UTF-x encoded string can begin with.
This function is intended for encoding detection when the beginning of the encoded string must be one of a limited set of characters, as for JSON or the udiff format. The argument start_chars must be an iterable of valid beginnings.
-
json_delta._util.nearest_of(string, *subs)
Find the index of the substring in subs that occurs earliest in string, or len(string) if none of them do.
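One reading of the description above is that the return value is a position within string: the smallest index at which any of the substrings is found. The sketch below implements that reading only; nearest_of_sketch is a hypothetical name, and the interpretation is an assumption rather than something the documentation confirms.

```python
def nearest_of_sketch(string, *subs):
    """Smallest index in string at which any of subs occurs, or len(string) if none do."""
    positions = [string.find(sub) for sub in subs]
    found = [pos for pos in positions if pos != -1]
    return min(found) if found else len(string)


text = '{"a": [1]}'
print(nearest_of_sketch(text, "[", ":"))
print(nearest_of_sketch("abc", "x", "y"))
```

Returning len(string) rather than -1 when nothing matches means the result can be used directly as a slice end.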
-
json_delta._util.read_bytestring(file)
Read the contents of file as a bytes object.
-
json_delta._util.skip_string(jstring, point)
Assuming jstring is a string, and jstring[point] is a " that starts a JSON string, return x such that jstring[x-1] is the " that terminates the string.
When a " is found, it is necessary to check that it is not escaped by a preceding backslash. As a backslash may itself be escaped, this amounts to checking that the number of backslashes immediately preceding the " is even (counting 0 as an even number):
    >>> test_string = r'"Fred \"Foonly\" McQuux"'
    >>> skip_string(test_string, 0) == len(test_string)
    True
    >>> backslash = chr(0x5c)
    >>> dbl_quote = chr(0x22)
    >>> even_slashes = ((r'"\\\\\\"', json.dumps(backslash * 3)),
    ...                 (r'"\\\\"', json.dumps(backslash * 2)),
    ...                 (r'"\\"', json.dumps(backslash)))
    >>> all((json.loads(L) == json.loads(R) for (L, R) in even_slashes))
    True
    >>> all((skip_string(L, 0) == len(L) for (L, R) in even_slashes))
    True
    >>> def cat_dump(*args): return json.dumps(''.join(args))
    >>> odd_slashes = (
    ...     (r'"\\\\\\\"  "', cat_dump(backslash * 3, dbl_quote, ' ' * 2)),
    ...     (r'"\\\\\"    "', cat_dump(backslash * 2, dbl_quote, ' ' * 4)),
    ...     (r'"\\\"      "', cat_dump(backslash * 1, dbl_quote, ' ' * 6)),
    ...     (r'"\"        "', cat_dump(dbl_quote, ' ' * 8)),
    ... )
    >>> all((json.loads(L) == json.loads(R) for (L, R) in odd_slashes))
    True
    >>> all((skip_string(L, 0) == 12 for (L, R) in odd_slashes))
    True
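The backslash-parity rule described above can be sketched as a simple scan. This is an illustration of the rule rather than the package's code: skip_string_sketch is a hypothetical name, and raising ValueError for an unterminated string is an assumption about error handling.

```python
def skip_string_sketch(jstring, point):
    """Return x such that jstring[x-1] is the quote closing the string opened at point."""
    assert jstring[point] == '"', "point must address an opening double quote"
    index = point + 1
    while index < len(jstring):
        if jstring[index] == '"':
            # Count the backslashes immediately preceding this quote.
            backslashes = 0
            back = index - 1
            while back > point and jstring[back] == "\\":
                backslashes += 1
                back -= 1
            if backslashes % 2 == 0:  # an even count means the quote is not escaped
                return index + 1
        index += 1
    raise ValueError("unterminated JSON string")  # assumed behaviour


test_string = r'"Fred \"Foonly\" McQuux"'
print(skip_string_sketch(test_string, 0) == len(test_string))
```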
-
json_delta._util.stanzas_addressing(stanzas, keypath)
Find diff stanzas modifying the structure at keypath.
The purpose of this function is to keep track of changes made to the overall structure by stanzas earlier in the sequence, e.g.:
    >>> struc = [
    ...     'foo',
    ...     'bar', [
    ...         'baz'
    ...     ]
    ... ]
    >>> stanzas = [
    ...     [ [2, 1], 'quux'],
    ...     [ [0] ],
    ...     [ [1, 2], 'quordle']
    ... ]
    >>> (stanzas_addressing(stanzas, [2])
    ...  == [
    ...      [ [1], 'quux' ],
    ...      [ [2], 'quordle' ]
    ...  ])
    True
stanzas[0] and stanzas[2] both address the same element of struc (the list that starts off as ['baz']), even though their keypaths are completely different, because the diff stanza [[0]] moves the list ['baz'] from index 2 of struc to index 1.
The return value is a sub-diff: a list of stanzas fit to modify the element at keypath within the overall structure.
-
json_delta._util.struc_lengths(struc)
Build dicts for lengths of nodes in a JSON-serializable structure.
Return value is a 2-tuple (terminals, nonterminals). The terminals dict is keyed by the values of the terminal nodes themselves, as these are all hashable types.
WARNING: The nonterminals dict is keyed by the id() value of the list or dict, so if the object is modified after this function is called, the lengths recorded may no longer be valid.