Python – Arbitrarily nested dictionaries from tuples

Arbitrarily nested dictionaries from tuples… here is a solution to the problem.

Arbitrarily nested dictionaries from tuples

Given a dictionary

with tuples as keys (and numbers/scalars as values), what is the Pythonic method to convert to a nested dictionary? The problem is that the length of the tuple is arbitrary from input to input.

Below, d1, d2, and d3 demonstrate incremental nesting:

from itertools import product

d1 = dict(zip(product('AB', [0, 1]), range(2*2)))
d2 = dict(zip(product('AB', [0, 1], [True, False]), range(2*2*2)))
d3 = dict(zip(product('CD', [0, 1], [True, False], 'AB'), range(2*2*2*2)))

The nested version they produce will be:

# For d1
{'A': {0: 0, 1: 1}, 'B': {0: 2, 1: 3}}

# For d2
{'A': {0: {True: 0, False: 1}, 1: {True: 2, False: 3}},
 'B': {0: {True: 4, False: 5}, 1: {True: 6, False: 7}}}

# Beginning of result for d3
{
'C': {
    0: {
        True: {
            'A': 0
            'B': 1
        },
        False: {
            'A': 2,
            'B': 3
        },
    1: # ...

My attempt: I like this trick for initializing empty data structures, which is given in many other SO answers:

from collections import defaultdict

def nested_dict():
    return defaultdict(nested_dict)

But I’m having a hard time implementing this because the number of levels is uncertain. I can use something like :

def nest(d: dict) -> dict:
    res = nested_dict()
    for (i, j, k), v in d.items():
        res[i][j][k] = v
    return res

But this only works for d2 because its keys have more than 3 levels (i, j, k).

Here’s how I’m trying to fix this, but I’m guessing there’s an easier way.

def set_arbitrary_nest(keys, value):
    """
    >>> keys = 1, 2, 3
    >>> value = 5
    result --> {1: {2: {3: 5}}}
    """

it = iter(keys)
    last = next(it)
    res = {last: {}}
    lvl = res
    while True:
        try:
            k = next(it)
            lvl = lvl[last]
            lvl[k] = {}
            last = k
        except StopIteration:
            lvl[k] = value
            return res

>>> set_arbitrary_nest([1, 2, 3], 5)
{1: {2: {3: 5}}}

Solution

Just iterate through each key and add a dictionary using everything but the last element of the key. Keep a reference to the last dictionary so set, and then use the last element in the key tuple to actually set a key-value pair in the output dictionary:

def nest(d: dict) -> dict:
    result = {}
    for key, value in d.items():
        target = result
        for k in key[:-1]:  # traverse all keys but the last
            target = target.setdefault(k, {})
        target[key[-1]] = value
    return result

You can use >functools.reduce() to handle dictionary traversal work:

from functools import reduce

def nest(d: dict) -> dict:
    result = {}
    traverse = lambda r, k: r.setdefault(k, {})
    for key, value in d.items():
        reduce(traverse, key[:-1], result)[key[-1]] = value
    return result

I used dict.setdefault() instead of automatically activating the defaultdict(nested_dict) option, because this would generate a regular dictionary that doesn’t implicitly add keys further when they don’t exist yet.

Demo:

>>> from pprint import pprint
>>> pprint(nest(d1))
{'A': {0: 0, 1: 1}, 'B': {0: 2, 1: 3}}
>>> pprint(nest(d2))
{'A': {0: {False: 1, True: 0}, 1: {False: 3, True: 2}},
 'B': {0: {False: 5, True: 4}, 1: {False: 7, True: 6}}}
>>> pprint(nest(d3))
{'C': {0: {False: {'A': 2, 'B': 3}, True: {'A': 0, 'B': 1}},
       1: {False: {'A': 6, 'B': 7}, True: {'A': 4, 'B': 5}}},
 'D': {0: {False: {'A': 10, 'B': 11}, True: {'A': 8, 'B': 9}},
       1: {False: {'A': 14, 'B': 15}, True: {'A': 12, 'B': 13}}}}

Note that the above solution is a clean O(N)

loop (N is the length of the input dictionary), while the groupby solution proposed by Ajax1234 must sort the inputs to work, making it an O(NlogN) solution. This means that for a dictionary with 1000 elements, groupby() will take 10.000 steps to generate the output, while the O(N) linear loop takes only 1000 steps. For a million keys, this increases to 20 million steps, and so on.

Also, recursion in Python… Slow because Python cannot optimize such solutions as iterative approaches. Function calls are relatively expensive, so using recursion comes with a significant performance cost because you significantly increase the number of function calls and scale the frame stack operations.

The time trial shows how important this is; Using your example d3 and 100k run, we can easily see a 5x speed difference:

>>> from timeit import timeit
>>> timeit('n(d)', 'from __main__ import create_nested_dict as n, d3; d=d3.items()', number=100_000)
8.210276518017054
>>> timeit('n(d)', 'from __main__ import nest as n, d3 as d', number=100_000)
1.6089267160277814

Related Problems and Solutions