Python – Create a frequency table of unique values from a list of lists

I have a list as follows:

test = [['abc', 'bcd', 'dce'], ['abc', 'ab', 'cd'], ['cd', 'be']]

I want to get the frequency of each unique value within each sublist. For example, the first sublist has:

abc 1
bcd 1
dce 1
ab 0
cd 0
be 0

I’m trying something like this:

from collections import Counter
from functools import reduce

import numpy as np
import pandas as pd

def freq(list_):
    df = []
    for c in list_:
        df_ = pd.DataFrame.from_dict(Counter(c), orient="index")
        df_.index.name = 'motif'
        df_.reset_index(inplace=True)
        df.append(df_)
        print(df_)
    print(df)
    df = reduce(lambda left, right: pd.merge(left, right, on=[0],
                                    how='outer'), df).fillna('void')
    df = df.T
    df.columns = df.iloc[0]
    df = df.iloc[1:]
    df[df == "void"] = 0
    col_names = sorted(df.columns)
    df = df[col_names]
    vals = df.values
    sums = np.sum(vals, axis=1)
    freqs = vals / sums[:, None]
    return pd.DataFrame(freqs).T

But it doesn’t work.

The output I want is a data frame with each unique value as a column feature and each sublist as a row.

How can I do this?

Edit:

Expected output:

    ab  abc  bcd   be   cd  dce
0    0  .33  .33    0    0  .33
1  .33  .33    0    0  .33    0
2    0    0    0   .5   .5    0

Solution

Use get_dummies with sum:

df = pd.get_dummies(pd.DataFrame(test), prefix_sep='', prefix='').sum(level=0, axis=1)
print (df)
   abc  cd  ab  bcd  be  dce
0    1   0   0    1   0    1
1    1   1   1    0   0    0
2    0   1   0    0   1    0
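
Note that the `level` argument of `DataFrame.sum` was deprecated in pandas 1.3 and removed in pandas 2.0, so the one-liner above may raise an error on recent versions. The following is a minimal sketch of one possible equivalent, assuming a recent pandas: it transposes the dummy columns, groups the duplicate labels, sums them, and transposes back. The names `dummies` and `counts` are only illustrative, and the columns come out in alphabetical order rather than in order of first appearance.

```python
import pandas as pd

test = [['abc', 'bcd', 'dce'], ['abc', 'ab', 'cd'], ['cd', 'be']]

# One indicator column per (position, value) pair; labels repeat
# when the same value occurs at different positions (e.g. 'cd').
dummies = pd.get_dummies(pd.DataFrame(test), prefix='', prefix_sep='')

# Collapse the duplicate column labels by grouping on the transposed frame.
counts = dummies.T.groupby(level=0).sum().T.astype(int)
print(counts)
```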

Or use Counter with the DataFrame constructor, replace NaN with 0, and convert the result to integers:

from collections import Counter

df = pd.DataFrame([Counter(x) for x in test]).fillna(0).astype(int)
print (df)
   ab  abc  bcd  be  cd  dce
0   0    1    1   0   0    1
1   1    1    0   0   1    0
2   0    0    0   1   1    0

Then divide each row by its sum to get the relative frequencies:

df = df.div(df.sum(axis=1), axis=0)
print (df)
         ab       abc       bcd   be        cd       dce
0  0.000000  0.333333  0.333333  0.0  0.000000  0.333333
1  0.333333  0.333333  0.000000  0.0  0.333333  0.000000
2  0.000000  0.000000  0.000000  0.5  0.500000  0.000000
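
The Counter approach and the row-wise division can be combined into one helper. This is a minimal sketch, not a definitive implementation: the function name `relative_frequencies` is hypothetical, and the explicit `sort_index(axis=1)` is an assumption added because recent pandas versions may keep the columns in order of first appearance instead of sorting them as in the output above.

```python
from collections import Counter

import pandas as pd


def relative_frequencies(list_of_lists):
    """Return one row per sublist and one column per unique value."""
    # Count the values in each sublist; values missing from a sublist become 0.
    counts = pd.DataFrame([Counter(sub) for sub in list_of_lists]).fillna(0).astype(int)
    # Sort the columns so the layout does not depend on the pandas version.
    counts = counts.sort_index(axis=1)
    # Divide every row by its total so that each row sums to 1.
    return counts.div(counts.sum(axis=1), axis=0)


test = [['abc', 'bcd', 'dce'], ['abc', 'ab', 'cd'], ['cd', 'be']]
freqs = relative_frequencies(test)
print(freqs.round(2))
```

An empty sublist would have a row sum of 0, so its row would contain NaN after the division; handle that case separately if it can occur in your data.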
