Iterating over a DataFrame groupby
       A   B  C
0    Bob  10  2
1    Bob  11  8
2  Sarah  23 -2
3  Sarah  24  4
4   Jack  19 -4
5   Jack  21 -1
I want to add a new column df["Point"], computed as follows:
- For the "Bob" group: df["Point"] is the first B value multiplied by each C value. 10*2=20; 10*8=80.
- For the "Sarah" group: df["Point"] is the first B value multiplied by each C value. 23*(-2)=(-46); 23*4=92.
- For the "Jack" group: df["Point"] is the first B value multiplied by each C value. 19*(-4)=(-76); 19*(-1)=(-19).
I mean, I want to get:
       A   B  C  Point
0    Bob  10  2     20
1    Bob  11  8     80
2  Sarah  23 -2    -46
3  Sarah  24  4     92
4   Jack  19 -4    -76
5   Jack  21 -1    -19
After that, I want to do the following iteration:
results = {}
grouped = df.groupby("A")
for idx, group in grouped:
    if (group["Point"] > 50).any():
        results[idx] = group[group["Point"] > 50].head(1)
        print("")
    else:
        results[idx] = group.tail(1)
        print("")
    print(results[idx])
And get this result:
     A   B  C  Point
1  Bob  11  8     80
       A   B  C  Point
3  Sarah  24  4     92
      A   B  C  Point
5  Jack  21 -1    -19
I think I have to do two iterations, but I don't know how to do it or if I can do it differently.
Solution
Start with GroupBy.transform and 'first' to repeat each group's first B value on every row of that group, then multiply by column C to create the new column:
df['point'] = df.groupby('A')['B'].transform('first').mul(df['C'])
print (df)
       A   B  C  point
0    Bob  10  2     20
1    Bob  11  8     80
2  Sarah  23 -2    -46
3  Sarah  24  4     92
4   Jack  19 -4    -76
5   Jack  21 -1    -19
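For anyone who wants to run this step in isolation, here is a minimal sketch that rebuilds the sample frame from the question and shows the intermediate result of transform. The variable name first_b is only illustrative and does not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["Bob", "Bob", "Sarah", "Sarah", "Jack", "Jack"],
    "B": [10, 11, 23, 24, 19, 21],
    "C": [2, 8, -2, 4, -4, -1],
})

# transform('first') returns a Series aligned with df's index in which
# every row holds the first B value of its own group.
first_b = df.groupby("A")["B"].transform("first")
print(first_b.tolist())  # [10, 10, 23, 23, 19, 19]

# Element-wise product with column C gives the new column.
df["point"] = first_b.mul(df["C"])
print(df["point"].tolist())  # [20, 80, -46, 92, -76, -19]
```

Because the transformed Series keeps the original index, no merge or explicit loop over the groups is needed for this step.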
Then filter the rows that meet the condition and call drop_duplicates to keep only the first matching row per group (keep='first' is the default):
df1 = df[df['point'] > 50].drop_duplicates('A')
print (df1)
       A   B  C  point
1    Bob  11  8     80
3  Sarah  24  4     92
Then select the rows whose A value does not appear in df1, using isin with the condition inverted by ~, and call drop_duplicates again, this time keeping only the last row per group:
df2 = df[~df['A'].isin(df1['A'])].drop_duplicates('A', keep='last')
print (df2)
      A   B  C  point
5  Jack  21 -1    -19
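The two selections can be tried together in a short, self-contained sketch. It assumes the point column has already been computed, so the values are typed in directly from the table above.

```python
import pandas as pd

# Sample data from the question, with the 'point' column already filled in.
df = pd.DataFrame({
    "A": ["Bob", "Bob", "Sarah", "Sarah", "Jack", "Jack"],
    "B": [10, 11, 23, 24, 19, 21],
    "C": [2, 8, -2, 4, -4, -1],
    "point": [20, 80, -46, 92, -76, -19],
})

# Groups with at least one point > 50: keep the first such row per group.
df1 = df[df["point"] > 50].drop_duplicates("A")

# All other groups: keep the last row per group.
df2 = df[~df["A"].isin(df1["A"])].drop_duplicates("A", keep="last")

print(df1.index.tolist())  # [1, 3]
print(df2.index.tolist())  # [5]
```

The isin mask is what keeps the two results disjoint: a group that already contributed a row to df1 can never appear in df2.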
Finally, use concat and a dict comprehension to build the final dictionary:
d = {k: v for k, v in pd.concat([df1, df2]).groupby('A')}
print (d)
{'Bob':      A   B  C  point
1  Bob  11  8     80, 'Jack':       A   B  C  point
5  Jack  21 -1    -19, 'Sarah':        A   B  C  point
3  Sarah  24  4     92}
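As a sanity check, the vectorized result can be compared with the loop from the question. The sketch below is one way to do that; the loop is adapted to the lowercase point column name used in this answer, and the names above and results are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["Bob", "Bob", "Sarah", "Sarah", "Jack", "Jack"],
    "B": [10, 11, 23, 24, 19, 21],
    "C": [2, 8, -2, 4, -4, -1],
})
df["point"] = df.groupby("A")["B"].transform("first").mul(df["C"])

# Vectorized selection, following the steps of the answer.
df1 = df[df["point"] > 50].drop_duplicates("A")
df2 = df[~df["A"].isin(df1["A"])].drop_duplicates("A", keep="last")
d = {k: v for k, v in pd.concat([df1, df2]).groupby("A")}

# Loop from the question, rewritten slightly for comparison.
results = {}
for name, group in df.groupby("A"):
    above = group[group["point"] > 50]
    results[name] = above.head(1) if not above.empty else group.tail(1)

# Both approaches should select the same row for every group.
for name in results:
    assert d[name].equals(results[name])

print(list(d))  # ['Bob', 'Jack', 'Sarah']
```

Note that groupby sorts the group keys by default, which is why the dictionary is ordered Bob, Jack, Sarah rather than in order of first appearance; passing sort=False to groupby keeps the order in which the groups appear in the concatenated frame.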