How to extract duplicate keys and values from a list of Python dictionaries?
I have a list of dictionaries, extracted from products and their variants, defined as follows:
attribute_list = [
    {'Finish': 'Chrome'},
    {'Size': 'Large'},
    {'Weight': '1.6kg'},
    {'Finish': 'Chrome'},
    {'Weight': '1.9kg'}
]
I want to create two lists. The first should contain the dictionaries whose key, after collapsing exact copies, appears only once, i.e.:
compiled_list = [
    {'Finish': 'Chrome'},
    {'Size': 'Large'}
]
The second should contain the dictionaries that share a key but have different values, i.e.:
duplicates_list = [
    {'Weight': '1.6kg'},
    {'Weight': '1.9kg'}
]
Below is my code so far, which gets me two lists, but 1) I think it is very inefficient, and 2) I don't know how to move the first instance of a duplicated key out of compiled_list:
compiled_list = list()
compiled_list_keys = list()
duplicates_list = list()
for attribute in attribute_list:
    for k, v in attribute.items():
        if k not in compiled_list_keys:
            compiled_list_keys.append(k)
            compiled_list.append(attribute)
        else:
            if attribute not in compiled_list:
                duplicates_list.append(attribute)
                compiled_list_keys.remove(k)
Solution
This solution uses pandas, a Python package well suited to this kind of data manipulation. You'll see why:
First, we convert the list of dictionaries to a pandas DataFrame and drop the rows that are exact copies:
import pandas as pd

df = pd.DataFrame([list(attr.items())[0] for attr in attribute_list],
                  columns=['key', 'value']).drop_duplicates()
#>       key   value
#> 0  Finish  Chrome
#> 1    Size   Large
#> 2  Weight   1.6kg
#> 4  Weight   1.9kg
Now we split the frame: keys that appear only once stay in compiled_df, and keys that appear more than once go to duplicated_df. With pandas this is very simple:
compiled_df = df.drop_duplicates(subset='key', keep=False)
#>       key   value
#> 0  Finish  Chrome
#> 1    Size   Large

duplicated_df = df[df.key.duplicated(keep=False)]
#>       key  value
#> 2  Weight  1.6kg
#> 4  Weight  1.9kg
Now we convert back to the original dictionary list:
compiled_list = [{item.key: item.value} for item in compiled_df.itertuples()]
#> [{'Finish': 'Chrome'}, {'Size': 'Large'}]

duplicated_list = [{item.key: item.value} for item in duplicated_df.itertuples()]
#> [{'Weight': '1.6kg'}, {'Weight': '1.9kg'}]
This may not be the most efficient method, but it is much more versatile. In short, five lines of code:
df = pd.DataFrame([list(attr.items())[0] for attr in attribute_list],
                  columns=['key', 'value']).drop_duplicates()
compiled_df = df.drop_duplicates(subset='key', keep=False)
duplicated_df = df[df.key.duplicated(keep=False)]
compiled_list = [{item.key: item.value} for item in compiled_df.itertuples()]
duplicated_list = [{item.key: item.value} for item in duplicated_df.itertuples()]
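If you would rather avoid the pandas dependency, the same three steps (collapse exact copies, count keys, split by count) can be sketched in plain Python with collections.Counter. This is a minimal sketch using the sample attribute_list from the question:

```python
from collections import Counter

attribute_list = [
    {'Finish': 'Chrome'},
    {'Size': 'Large'},
    {'Weight': '1.6kg'},
    {'Finish': 'Chrome'},
    {'Weight': '1.9kg'}
]

# Collapse exact (key, value) copies while preserving order;
# dict.fromkeys keeps insertion order on Python 3.7+.
unique_pairs = list(dict.fromkeys(
    (k, v) for attr in attribute_list for k, v in attr.items()
))

# Count how often each key appears among the unique pairs.
key_counts = Counter(k for k, _ in unique_pairs)

# Keys seen once are "compiled"; keys seen more than once are duplicates.
compiled_list = [{k: v} for k, v in unique_pairs if key_counts[k] == 1]
duplicates_list = [{k: v} for k, v in unique_pairs if key_counts[k] > 1]
```

This yields the same two lists as the pandas version and runs in a single pass over the data plus one pass over the unique pairs.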