Create a frequency table of unique values from a list of lists
I have a list as follows:
test = [['abc', 'bcd', 'dce'], ['abc', 'ab', 'cd'], ['cd', 'be']]
I want to get the frequency of each unique value for each sublist. For example, the first sublist has:
abc 1
bcd 1
dce 1
ab 0
cd 0
be 0
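The per-sublist counts above, including the zeros for values that never occur in a given sublist, can be sketched with the standard library alone. This is a minimal sketch assuming the column set is the sorted union of all values; the names `vocab` and `counts` are illustrative only.

```python
from collections import Counter

test = [['abc', 'bcd', 'dce'], ['abc', 'ab', 'cd'], ['cd', 'be']]

# Every unique value across all sublists, sorted for a stable column order
vocab = sorted({v for sub in test for v in sub})

# One Counter per sublist; looking up a missing key in a Counter returns 0
counts = [Counter(sub) for sub in test]

for value in vocab:
    print(value, counts[0][value])
```

Because a `Counter` returns 0 for absent keys, no explicit zero-filling is needed until the counts are laid out as a table.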
I’m trying something like this:
from collections import Counter
from functools import reduce

import numpy as np
import pandas as pd

def freq(list_):
    df = []
    for c in list_:
        df_ = pd.DataFrame.from_dict(Counter(c), orient="index")
        df_.index.name = 'motif'
        df_.reset_index(inplace=True)
        df.append(df_)
        print(df_)
    print(df)
    df = reduce(lambda left, right: pd.merge(left, right, on=[0],
                                             how='outer'), df).fillna('void')
    df = df.T
    df.columns = df.iloc[0]
    df = df.iloc[1:]
    df[df == "void"] = 0
    col_names = sorted(df.columns)
    df = df[col_names]
    vals = df.values
    sums = np.sum(vals, axis=1)
    freqs = vals / sums[:, None]
    return pd.DataFrame(freqs).T
But it doesn’t work.
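One concrete problem in the code above: after `reset_index`, column `0` holds the counts and `motif` holds the values, so `on=[0]` joins the frames on count values rather than on the values themselves. A minimal sketch of the same merge-based idea, assuming the intent was to join on `motif` (this is an illustration of that approach, not the accepted answer):

```python
from collections import Counter
from functools import reduce

import pandas as pd

test = [['abc', 'bcd', 'dce'], ['abc', 'ab', 'cd'], ['cd', 'be']]

def freq(list_):
    frames = []
    for i, c in enumerate(list_):
        # One frame per sublist: a 'motif' column plus that sublist's counts in column i
        df_ = pd.DataFrame.from_dict(Counter(c), orient='index', columns=[i])
        df_.index.name = 'motif'
        frames.append(df_.reset_index())
    # Join on the shared 'motif' key; outer join keeps values missing from some sublists
    df = reduce(lambda left, right: pd.merge(left, right, on='motif', how='outer'), frames)
    # Missing counts become 0; transpose so each sublist is a row
    df = df.set_index('motif').fillna(0).sort_index().T
    # Divide each row by its total to turn counts into relative frequencies
    return df.div(df.sum(axis=1), axis=0)

print(freq(test))
```

Giving each count column a distinct label (`i`) also avoids the `_x`/`_y` suffixes pandas adds when merged frames share non-key column names.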
The output I want is a data frame with each unique value as a column feature and each sublist as a row.
How can I do this?
Edit:
Expected output:
    ab  abc  bcd   be   cd  dce
0    0  .33  .33    0    0  .33
1  .33  .33    0    0  .33    0
2    0    0    0   .5   .5    0
Solution
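Use get_dummies, then add up the columns that share a label (a value such as cd can appear in more than one position):
import pandas as pd
df = pd.get_dummies(pd.DataFrame(test), prefix_sep='', prefix='').T.groupby(level=0, sort=False).sum().T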
print (df)
   abc  cd  ab  bcd  be  dce
0    1   0   0    1   0    1
1    1   1   1    0   0    0
2    0   1   0    0   1    0
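To see why the summing step is needed, it helps to look at the `get_dummies` result before duplicates are combined: each column of `pd.DataFrame(test)` is encoded separately, so a value such as `cd` that appears in two positions yields two columns with the same label. A small sketch, assuming a recent pandas (the `raw` and `combined` names are illustrative):

```python
import pandas as pd

test = [['abc', 'bcd', 'dce'], ['abc', 'ab', 'cd'], ['cd', 'be']]

# Ragged rows are padded with None; get_dummies skips missing values by default
raw = pd.get_dummies(pd.DataFrame(test), prefix_sep='', prefix='')
print(list(raw.columns))

# Collapse duplicate labels by summing columns that share a name,
# keeping the first-seen column order (sort=False)
combined = raw.T.groupby(level=0, sort=False).sum().T
print(list(combined.columns))
```

Passing `sort=False` preserves the column order shown in the printed output above; dropping it would sort the columns alphabetically.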
Or build one Counter per sublist and pass them to the DataFrame constructor, replacing NaN with 0 and converting to integers:
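from collections import Counter
import pandas as pd

# missing values become NaN, filled with 0; columns sorted (recent pandas keeps first-seen key order)
df = pd.DataFrame([Counter(x) for x in test]).fillna(0).astype(int).sort_index(axis=1)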
print (df)
   ab  abc  bcd  be  cd  dce
0   0    1    1   0   0    1
1   1    1    0   0   1    0
2   0    0    0   1   1    0
Then divide each row by its sum to get relative frequencies:
df = df.div(df.sum(axis=1), axis=0)
print (df)
         ab       abc       bcd   be        cd       dce
0  0.000000  0.333333  0.333333  0.0  0.000000  0.333333
1  0.333333  0.333333  0.000000  0.0  0.333333  0.000000
2  0.000000  0.000000  0.000000  0.5  0.500000  0.000000
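A different route, not part of the original answer, is to flatten the list with `Series.explode` and let `pd.crosstab` normalize each row. This sketch assumes pandas 0.25 or later (when `explode` was added); `normalize='index'` divides each row by its total, which matches the `div` step above. The `long` and `freqs` names are illustrative.

```python
import pandas as pd

test = [['abc', 'bcd', 'dce'], ['abc', 'ab', 'cd'], ['cd', 'be']]

# One row per (sublist position, value) pair; the index repeats the sublist position
long = pd.Series(test).explode()

# Rows: sublist position; columns: unique values; cells: share of that sublist
freqs = pd.crosstab(long.index, long.values, normalize='index')
print(freqs)
```

The crosstab result carries default axis names (`row_0`, `col_0`) because the inputs are unnamed; they can be cleared with `rename_axis` if needed.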