Python – How to extract duplicate keys and values from a list of python dictionaries?

I have a list of dictionaries, extracted from products and their variants, defined as follows:

attribute_list = [
    {'Finish': 'Chrome'},
    {'Size': 'Large'},
    {'Weight': '1.6kg'},
    {'Finish': 'Chrome'},
    {'Weight': '1.9kg'}
]

I want to create two lists. The first should contain the dictionaries whose key appears with only one value, keeping a single copy of exact duplicates such as {'Finish': 'Chrome'}, i.e.:

compiled_list = [
    {'Finish': 'Chrome'},
    {'Size': 'Large'}
]

The second should contain the dictionaries whose key appears with more than one value, i.e.:

duplicates_list = [
    {'Weight': '1.6kg'},
    {'Weight': '1.9kg'}
]

Below is my code so far. It produces two lists, but 1) I think it is very inefficient, and 2) I don't know how to remove the first instance of a duplicated key (with this data, {'Weight': '1.6kg'} stays in compiled_list and only {'Weight': '1.9kg'} ends up in duplicates_list):

compiled_list = list()
compiled_list_keys = list()
duplicates_list = list()
for attribute in attribute_list:
    for k, v in attribute.items():
        if k not in compiled_list_keys:
            compiled_list_keys.append(k)
            compiled_list.append(attribute)
        else:
            if attribute not in compiled_list:
                duplicates_list.append(attribute)
                compiled_list_keys.remove(k)

Solution

This solution uses pandas, a Python package that is better suited to this kind of tabular data manipulation. You'll see why:

  1. First, convert the list of dictionaries into a DataFrame with one row per (key, value) pair. Calling drop_duplicates() here removes exact copies, such as the second {'Finish': 'Chrome'}:

    import pandas as pd

    df = pd.DataFrame([list(attr.items())[0] for attr in attribute_list],
                      columns=['key', 'value']).drop_duplicates()
    #>       key   value
    #> 0  Finish  Chrome
    #> 1    Size   Large
    #> 2  Weight   1.6kg
    #> 4  Weight   1.9kg
    
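  2. Next, separate the rows by key. With keep=False, drop_duplicates(subset='key') drops every row whose key occurs more than once (not just the later copies), and duplicated(keep=False) marks all of those rows:

    compiled_df = df.drop_duplicates(subset='key', keep=False)
    #>       key   value
    #> 0  Finish  Chrome
    #> 1    Size   Large
    duplicated_df = df[df.key.duplicated(keep=False)]
    #>       key   value
    #> 2  Weight   1.6kg
    #> 4  Weight   1.9kg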
    
  3. Finally, convert each DataFrame back into a list of one-item dictionaries:

    compiled_list = [{item.key: item.value} for item in compiled_df.itertuples()]
    #> [{'Finish': 'Chrome'}, {'Size': 'Large'}]

    duplicated_list = [{item.key: item.value} for item in duplicated_df.itertuples()]
    #> [{'Weight': '1.6kg'}, {'Weight': '1.9kg'}]
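The keep parameter is what handles the question's second problem (the first duplicated entry surviving). A small sketch, assuming pandas is installed, that rebuilds the DataFrame from step 1 directly so it runs on its own, and contrasts the default keep='first' with keep=False:

```python
import pandas as pd

# Same rows as the DataFrame after step 1 (exact duplicate at index 3 already removed).
df = pd.DataFrame(
    {'key': ['Finish', 'Size', 'Weight', 'Weight'],
     'value': ['Chrome', 'Large', '1.6kg', '1.9kg']},
    index=[0, 1, 2, 4],
)

# Default keep='first': only the second 'Weight' row is flagged,
# so the first one would survive, just like in the question's loop.
first_only = df['key'].duplicated()
print(first_only.tolist())  # [False, False, False, True]

# keep=False: every row whose key occurs more than once is flagged.
all_dupes = df['key'].duplicated(keep=False)
print(all_dupes.tolist())  # [False, False, True, True]

# The mask and its negation split the rows into the two groups;
# df[~all_dupes] selects the same rows as df.drop_duplicates(subset='key', keep=False).
compiled_df = df[~all_dupes]
duplicated_df = df[all_dupes]
print(compiled_df.index.tolist())    # [0, 1]
print(duplicated_df.index.tolist())  # [2, 4]
```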
    

This may not be the most efficient method, but it is much more versatile, since the same DataFrame can easily be filtered or grouped in other ways. In short, five lines of code (plus the import):

import pandas as pd

df = pd.DataFrame([list(attr.items())[0] for attr in attribute_list],
                  columns=['key', 'value']).drop_duplicates()
compiled_df = df.drop_duplicates(subset='key', keep=False)
duplicated_df = df[df.key.duplicated(keep=False)]
compiled_list = [{item.key: item.value} for item in compiled_df.itertuples()]
duplicated_list = [{item.key: item.value} for item in duplicated_df.itertuples()]
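If pandas is not available, or efficiency on plain lists matters more than versatility, the same split can be done in a single pass with the standard library. This is a swapped-in technique, not part of the answer above: it groups the distinct values per key using collections.defaultdict, then sends keys with one distinct value to compiled_list and the rest to duplicates_list. The function name split_attributes is hypothetical, and it assumes the values are hashable (strings here).

```python
from collections import defaultdict


def split_attributes(attribute_list):
    """Hypothetical helper: split one-item dicts into (compiled_list, duplicates_list).

    A key counts as duplicated if it appears with more than one distinct value;
    exact repeats of the same {key: value} pair collapse to a single copy.
    """
    # First pass: collect each key's distinct values in first-seen order.
    # A dict serves as an ordered set (insertion order is guaranteed since Python 3.7).
    values_by_key = defaultdict(dict)
    for attribute in attribute_list:
        for key, value in attribute.items():
            values_by_key[key][value] = None

    # Second pass over the keys: one distinct value -> compiled_list, more -> duplicates_list.
    compiled_list, duplicates_list = [], []
    for key, values in values_by_key.items():
        target = compiled_list if len(values) == 1 else duplicates_list
        target.extend({key: value} for value in values)
    return compiled_list, duplicates_list


attribute_list = [
    {'Finish': 'Chrome'},
    {'Size': 'Large'},
    {'Weight': '1.6kg'},
    {'Finish': 'Chrome'},
    {'Weight': '1.9kg'},
]
compiled_list, duplicates_list = split_attributes(attribute_list)
print(compiled_list)    # [{'Finish': 'Chrome'}, {'Size': 'Large'}]
print(duplicates_list)  # [{'Weight': '1.6kg'}, {'Weight': '1.9kg'}]
```

One difference from the pandas version: the output is grouped by key (in order of each key's first appearance) rather than kept in the original row order. The two agree for this data but can differ when entries for different duplicated keys are interleaved.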
