Python – Get the weighted value of a Pandas string Series

The series is as follows:

value
aa aa bb cc
dd ee aa
ff aa cc

I want to count how many times each dictionary word occurs in a row and multiply that count by the word's weight from the dictionary:

weights = {
   'aa':1,
   'bb':1,
   'cc':0.5
}

The result should be:

value_score
3.5
1
1.5

Each score is the sum, over the words in the dictionary, of (number of occurrences in the row * weight in the dictionary). For the first row, that is 2*1 + 1*1 + 1*0.5 = 3.5.

So far I have implemented this with str.count, but it does not scale well as more words are added to the dictionary:

df['value_score'] = (df['value'].str.count('aa') * weights['aa'] +
                     df['value'].str.count('bb') * weights['bb'] +
                     df['value'].str.count('cc') * weights['cc'])
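
The str.count approach can be generalized by looping over the dictionary instead of writing one term per word. The sketch below assumes the sample data from the question. Because str.count treats its pattern as a regular expression and matches substrings, the pattern is wrapped in \b word boundaries and passed through re.escape, which is an addition not present in the original attempt.

```python
import re
import pandas as pd

df = pd.DataFrame({'value': ['aa aa bb cc', 'dd ee aa', 'ff aa cc']})
weights = {'aa': 1, 'bb': 1, 'cc': 0.5}

# Sum count * weight over every dictionary entry.
# \b restricts matches to whole words; re.escape protects against
# regex metacharacters in the dictionary keys.
df['value_score'] = sum(
    df['value'].str.count(rf'\b{re.escape(word)}\b') * weight
    for word, weight in weights.items()
)
print(df)
```

This still scans the column once per dictionary entry, so it gets slower as the dictionary grows.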

Solution

Split each string into words and sum their weights with a generator expression, using weights.get(y, 0) so that words not in the dictionary contribute 0:

df['value_score'] = df['value'].apply(lambda x: sum(weights.get(y, 0) for y in x.split()))
print(df)
         value  value_score
0  aa aa bb cc          3.5
1     dd ee aa          1.0
2     ff aa cc          1.5
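
For reference, here is a self-contained sketch of the same idea with the sample data included. The helper name score_text is hypothetical, and the guard for non-string values (such as NaN) is an assumption about how missing rows should be treated, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'value': ['aa aa bb cc', 'dd ee aa', 'ff aa cc']})
weights = {'aa': 1, 'bb': 1, 'cc': 0.5}

def score_text(text, weights):
    """Sum the weights of all whitespace-separated words; unknown words count as 0."""
    if not isinstance(text, str):  # assumption: missing values score 0
        return 0.0
    return float(sum(weights.get(word, 0) for word in text.split()))

# args passes the dictionary through to the helper for every row
df['value_score'] = df['value'].apply(score_text, args=(weights,))
print(df)
```

Each row is split once and every word is a single dictionary lookup, so the cost does not depend on the size of the dictionary.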

Another solution:

df['value_score'] = df['value'].str.split(expand=True).stack().map(weights).groupby(level=0).sum()
print(df)
         value  value_score
0  aa aa bb cc          3.5
1     dd ee aa          1.0
2     ff aa cc          1.5
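
A variation on this second approach, sketched with Series.explode (available from pandas 0.25) instead of split(expand=True).stack(). It yields one row per word without building the wide intermediate frame; the sample data is taken from the question.

```python
import pandas as pd

df = pd.DataFrame({'value': ['aa aa bb cc', 'dd ee aa', 'ff aa cc']})
weights = {'aa': 1, 'bb': 1, 'cc': 0.5}

words = df['value'].str.split().explode()   # one row per word, original index repeated
word_weights = words.map(weights)           # words missing from the dictionary become NaN
# sum() skips NaN, so unknown words add nothing to the row total
df['value_score'] = word_weights.groupby(level=0).sum()
print(df)
```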
