Pandas data frame: get column item when the corresponding item in another column is greater than a value
I have the following Pandas data frame. It is a large data frame with more than 500,000 rows.
Event_Number Well p_and_s
0 1 7 4.0
1 1 9 0.0
2 1 15 0.0
3 2 7 2.0
4 2 9 7.0
5 2 15 0.0
6 3 5 0.0
7 3 7 8.0
8 3 16 3.0
9 4 7 8.0
10 4 15 0.0
11 5 7 8.0
12 5 9 3.0
13 5 15 6.0
14 6 5 0.0
15 6 7 8.0
16 7 7 8.0
17 7 9 0.0
18 7 15 0.0
19 8 7 8.0
20 8 15 4.0
For each group of Event_Number, I want to find which Well values have a p_and_s value greater than 2.
The final DataFrame should look like this, with a new column listing all wells whose p_and_s is greater than 2:
Event_Number Well p_and_s well_array
0 1 7 4.0 [7]
1 1 9 0.0 [7]
2 1 15 0.0 [7]
3 2 7 2.0 [9]
4 2 9 7.0 [9]
5 2 15 0.0 [9]
6 3 5 0.0 [7, 16]
7 3 7 8.0 [7, 16]
8 3 16 3.0 [7, 16]
9 4 7 8.0 [7]
10 4 15 0.0 [7]
11 5 7 8.0 [7, 9, 15]
12 5 9 3.0 [7, 9, 15]
13 5 15 6.0 [7, 9, 15]
14 6 5 0.0 [7]
15 6 7 8.0 [7]
16 7 7 8.0 [7]
17 7 9 0.0 [7]
18 7 15 0.0 [7]
19 8 7 8.0 [7, 15]
20 8 15 4.0 [7, 15]
Solution
Here’s one way.
s = df[df['p_and_s'] > 2].groupby('Event_Number')['Well'].apply(list)
df['well_array'] = df['Event_Number'].map(s)
Explanation
- After applying the filter on p_and_s, create a series that maps each Event_Number to the list of its Well values.
- Use pd.Series.map to map that series back onto the original data frame.
- To improve performance, avoid lambda functions whenever possible, because they represent expensive implicit loops.
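For anyone who wants to run the two steps end to end, here is a minimal self-contained sketch. It rebuilds the 21 sample rows from the question by hand (the real frame has over 500,000 rows, so treat this only as an illustration) and then applies the same filter, groupby, and map calls as the solution.

```python
import pandas as pd

# Sample data copied from the question (21 rows).
df = pd.DataFrame({
    "Event_Number": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 8, 8],
    "Well": [7, 9, 15, 7, 9, 15, 5, 7, 16, 7, 15, 7, 9, 15, 5, 7, 7, 9, 15, 7, 15],
    "p_and_s": [4.0, 0.0, 0.0, 2.0, 7.0, 0.0, 0.0, 8.0, 3.0, 8.0, 0.0,
                8.0, 3.0, 6.0, 0.0, 8.0, 8.0, 0.0, 0.0, 8.0, 4.0],
})

# Step 1: keep only rows above the threshold, then collect the wells per event.
# The result is a Series indexed by Event_Number whose values are lists.
s = df[df["p_and_s"] > 2].groupby("Event_Number")["Well"].apply(list)

# Step 2: look up each row's Event_Number in that Series.
df["well_array"] = df["Event_Number"].map(s)

print(df)
```

The intermediate series s has one entry per event, so the map step is a lookup against a small index rather than a per-row Python function call.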
Result
Event_Number Well p_and_s well_array
0 1 7 4.0 [7]
1 1 9 0.0 [7]
2 1 15 0.0 [7]
3 2 7 2.0 [9]
4 2 9 7.0 [9]
5 2 15 0.0 [9]
6 3 5 0.0 [7, 16]
7 3 7 8.0 [7, 16]
8 3 16 3.0 [7, 16]
9 4 7 8.0 [7]
10 4 15 0.0 [7]
11 5 7 8.0 [7, 9, 15]
12 5 9 3.0 [7, 9, 15]
13 5 15 6.0 [7, 9, 15]
14 6 5 0.0 [7]
15 6 7 8.0 [7]
16 7 7 8.0 [7]
17 7 9 0.0 [7]
18 7 15 0.0 [7]
19 8 7 8.0 [7, 15]
20 8 15 4.0 [7, 15]
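One edge case does not show up in the sample data: if an event has no well with p_and_s greater than 2, that event is missing from the filtered series, and Series.map leaves NaN for its rows. The sketch below uses a small hypothetical frame (not from the question) to show this, and one possible way to substitute empty lists. The list comprehension is an assumption about what you might want; it is an explicit Python loop, so it may be worth measuring on a large frame.

```python
import pandas as pd

# Hypothetical data: event 2 has no well above the threshold.
df = pd.DataFrame({
    "Event_Number": [1, 1, 2, 2],
    "Well": [7, 9, 7, 15],
    "p_and_s": [4.0, 0.0, 1.0, 0.0],
})

s = df[df["p_and_s"] > 2].groupby("Event_Number")["Well"].apply(list)

# Event 2 is not in s, so its rows come back as NaN.
mapped = df["Event_Number"].map(s)

# Replace the missing entries with empty lists.
df["well_array"] = [w if isinstance(w, list) else [] for w in mapped]

print(df)
```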