Get the weighted value of a pandas string Series
The series is as follows:
value
aa aa bb cc
dd ee aa
ff aa cc
For each row, I want to count how many times each word occurs and multiply that count by the word's weight from a dictionary:
weights = {
    'aa': 1,
    'bb': 1,
    'cc': 0.5
}
The result should be
value_score
3.5
1
1.5
In other words, the score of a row is the sum, over the words in the dictionary, of (number of occurrences in the row * weight). For the first row that is 2*1 + 1*1 + 1*0.5 = 3.5.
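The arithmetic for a single row can be sketched in plain Python. This is only an illustration of the formula, not part of the original question; the helper name weighted_score is hypothetical, and it assumes words are separated by whitespace.

```python
from collections import Counter

weights = {'aa': 1, 'bb': 1, 'cc': 0.5}

def weighted_score(text, weights):
    # Hypothetical helper: count whole, whitespace-separated words in one row
    counts = Counter(text.split())
    # Sum (occurrences * weight) over the words listed in the dictionary;
    # a Counter returns 0 for words that never occur
    return sum(counts[word] * w for word, w in weights.items())

first_row_score = weighted_score('aa aa bb cc', weights)
print(first_row_score)  # 3.5
```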
So far I’ve implemented it with str.count, but this does not scale as more words are added to the dictionary:
df['value_score'] = (df['value'].str.count('aa') * weights['aa'] +
                     df['value'].str.count('bb') * weights['bb'] +
                     df['value'].str.count('cc') * weights['cc'])
Solution
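Split each row into words and sum their weights with a generator expression, using weights.get(y, 0) so that words missing from the dictionary contribute 0: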
df['value_score'] = df['value'].apply(lambda x: sum(weights.get(y, 0) for y in x.split()))
print(df)
         value  value_score
0  aa aa bb cc          3.5
1     dd ee aa          1.0
2     ff aa cc          1.5
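For anyone who wants to reproduce this, here is a self-contained sketch of the same apply-based approach. The DataFrame construction and the helper name score are assumptions added for illustration; the sample rows and weights are the ones from the question.

```python
import pandas as pd

# Sample data and weights taken from the question
df = pd.DataFrame({'value': ['aa aa bb cc', 'dd ee aa', 'ff aa cc']})
weights = {'aa': 1, 'bb': 1, 'cc': 0.5}

def score(text):
    # Hypothetical helper: split on whitespace and add up the weight of
    # every word; words missing from the dictionary count as 0
    return sum(weights.get(word, 0) for word in text.split())

df['value_score'] = df['value'].apply(score)
scores = df['value_score'].tolist()
print(scores)  # [3.5, 1.0, 1.5]
```

Note that splitting matches whole words only, whereas str.count matches substrings, so the two approaches can give different results when one dictionary key is contained in a longer word.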
Another solution:
# One entry per word, map each word to its weight, then sum per original row
df['value_score'] = df['value'].str.split(expand=True).stack().map(weights).groupby(level=0).sum()
print(df)
         value  value_score
0  aa aa bb cc          3.5
1     dd ee aa          1.0
2     ff aa cc          1.5
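A possible variant of this second approach, not from the original answer, uses Series.explode instead of expand=True plus stack. It is sketched here under the assumption that words are whitespace-separated; the intermediate name words is only illustrative.

```python
import pandas as pd

# Sample data and weights taken from the question
df = pd.DataFrame({'value': ['aa aa bb cc', 'dd ee aa', 'ff aa cc']})
weights = {'aa': 1, 'bb': 1, 'cc': 0.5}

# One row per word; explode repeats the original index for each word
words = df['value'].str.split().explode()

# Words missing from the dictionary map to NaN, which sum() skips
df['value_score'] = words.map(weights).groupby(level=0).sum()
scores = df['value_score'].tolist()
print(scores)  # [3.5, 1.0, 1.5]
```

This avoids building the wide intermediate DataFrame that expand=True creates, which may matter when some rows contain many more words than others.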