Python – Three-way dictionary deep merging in python

Three-way dictionary deep merging in python… here is a solution to the problem.

Three-way dictionary deep merging in python

I want to merge two dictionaries A and B, knowing that both dictionaries have a common previous state C. I also need to merge in the subdictionary. In case of a real conflict, I need to raise an exception.

1 – In the example below, the merge method should understand that A and B edit different items, so the merge should not cause conflicts

C = {"x": 0, "y": 0}
A = {"x": 1, "y": 0} # Edit x, but not y
B = {"x": 0, "y": 1} # Edit y, but not x
# merge(A, B, C) => {"x": 1, "y": 1}

2 – The function needs to be able to handle new and deleted items

C = {"x": 0}
A = {"x": 0, "y": 0} # Add y, keep x untouched
B = {}               # Delete x
# merge(A, B, C) => {"y": 0}

3 – The function should throw an exception when a real conflict occurs

C = {"x": 0}
A = {"x": 1}         # Edit x 
B = {"x": 2}         # Also edit x
# merge(A, B, C) => raise Exception

C = {"x": 0}
A = {"x": 1}         # Edit x 
B = {}               # Delete x
# merge(A, B, C) => raise Exception

4 – Functions should work recursively

C = {"deeper": {"x": 0, "y": 0}}
A = {"deeper": {"x": 1, "y": 0}} # Edit deeper["x"], but not deeper["y"]
B = {"deeper": {"x": 0, "y": 1}} # Edit deeper["y"], but not deeper["x"]
# merge(A, B, C) => {"deeper": {"x": 1, "y": 1}}

What is the best way to achieve this merging feature?

Solution

It is possible to convert all terms in dict to sets, find conflicts by the intersection of keys with symmetric difference to C, and merge with the intersection of 3 sets (public terms) and the union of differences to C. Recursively merge the sub-dicts common to A, B, and C, convert the sub-dicts to tuples of item pairs to allow them to be hashable and converted to collections, and then convert them back to dicts after merging.

EDIT: If the dict values are non-hashable objects, such as collections, you must serialize those values first (I recommend using pickle as a serializer because it has Python’s native support) to convert items in dict to a set, merge and deserialize:

import pickle

def merge(a, b, c):
    # recursively merge sub-dicts that are common to a, b and c
    for k in a.keys() & b.keys() & c.keys():
        if all(isinstance(d.get(k), dict) for d in (a, b, c)):
            a[k] = b[k] = c[k] = merge(a[k], b[k], c[k])
    # convert sub-dicts into tuples of item pairs to allow them to be hashable
    for d in a, b, c:
        for k, v in d.items():
            if isinstance(v, dict):
                d[k] = tuple(v.items())
    # convert all the dict items into sets
    set_a, set_b, set_c = (set((k, pickle.dumps(v)) for k, v in d.items()) for d in (a, b, c))
    # intersect keys from the symmetric set differences to c to find conflicts
    for k in set(k for k, _ in set_a ^ set_c) & set(k for k, _ in set_b ^ set_c):
        # it isn't really a conflict if the new values of a and b are the same
        if a.get(k) != b.get(k) or (k in a) ^ (k in b):
            raise ValueError("Conflict found in key %s" % k)
    # merge the dicts by union'ing the differences to c with the common items
    d = dict(set_a & set_b & set_c | set_a - set_c | set_b - set_c)
    # convert the tuple of items back to dicts for output
    for k, v in d.items():
        v = pickle.loads(v)
        if isinstance(v, tuple):
            d[k] = dict(v)
        else:
            d[k] = v
    return d

This way:

C = {"x": 0, "y": 0}
A = {"x": 1, "y": 0} # Edit x, but not y
B = {"x": 0, "y": 1} # Edit y, but not x
print(merge(A, B, C))
C = {"x": 0}
A = {"x": 0, "y": 0} # Add y, keep x untouched
B = {}               # Delete x
print(merge(A, B, C))
C = {"x": 0}
A = {"x": 1}  # Edit x
B = {"x": 1}  # Edit x with the same value
print(merge(A, B, C))
C = {"deeper": {"x": 0, "y": {3, 4}}}
A = {"deeper": {"x": {1, 2}, "y": {4, 3}}} # Edit deeper["x"], but not deeper["y"]
B = {"deeper": {"x": 0, "y": 1}} # Edit deeper["y"], but not deeper["x"]
print(merge(A, B, C))
C = {"deeper": 1}
A = {"deeper": {"x": 0, "y": 1}} # Edit deeper and turn it into a dict
B = {"deeper": 1, "x": 2} # Add x, keep deeper untouched
print(merge(A, B, C))
C = {"deeper": {"x": 0, "y": 1}}
A = {"deeper": {"x": 0, "y": 1}} # Keep deeper untouched
B = {"deeper": 1} # Turn deeper into a scalar
print(merge(A, B, C))

Will output:

{'x': 1, 'y': 1}
{'y': 0}
{'x': 1}
{'deeper': {'x': {1, 2}, 'y': 1}}
{'deeper': {'x': 0, 'y': 1}, 'x': 2}
{'deeper': 1}

At the same time:

C = {"x": 0}
A = {"x": 1}         # Edit x
B = {"x": 2}         # Edit x with a different value
print(merge(A, B, C))

Will improve:

ValueError: Conflict found in key x

And:

C = {"deeper": {"x": 0, "y": 1}}
A = {"deeper": {"x": 0, "y": 2}} # Edit deeper["y"], but not deeper["x"]
B = {"deeper": 1} # Turn deeper into a scalar
print(merge(A, B, C))

Will improve:

ValueError: Conflict found in key deeper

Related Problems and Solutions