Pandas – Count the rows that satisfy a function – for each row
I have a DataFrame to which I need to add a column. For each row, the column should hold the count of all other rows in the table that meet a criterion, where the criterion takes values from both the current row and the other row as input.
For example, if the DataFrame describes people, I want a column that counts how many people are taller and lighter than the person in the current row.
I want to pass the height and weight of the current row, and the height and weight of each other row, into a function, so I can do something like this:
def example_function(height1, weight1, height2, weight2):
    if height1 > height2 and weight1 < weight2:
        return True
    else:
        return False
The True results would then be summed, and that sum stored in the new column.
Is such a thing possible?
Thanks in advance for any ideas!
Edit: Example input:
id  name   height  weight  country
0   Adam   70      180     USA
1   Bill   65      190     CANADA
2   Chris  71      150     GERMANY
3   Eric   72      210     USA
4   Fred   74      160     FRANCE
5   Gary   75      220     MEXICO
6   Henry  61      230     SPAIN
The result needs to be:
id  name   height  weight  country  new_column
0   Adam   70      180     USA      1
1   Bill   65      190     CANADA   1
2   Chris  71      150     GERMANY  3
3   Eric   72      210     USA      1
4   Fred   74      160     FRANCE   4
5   Gary   75      220     MEXICO   1
6   Henry  61      230     SPAIN    0
I believe it needs some kind of function, because the actual logic I need to use is more complicated.
Edit 2: Fix typos
Solution
You can sum the boolean values (True counts as 1) like this:
count = ((df.height1 > df.height2) & (df.weight1 < df.weight2)).sum()
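To see why `.sum()` produces a count, here is a minimal sketch. It assumes a hypothetical DataFrame named `pairs` that already holds both sides of each comparison in the columns `height1`, `weight1`, `height2` and `weight2`; those column names come from the line above and do not exist in the question's example data.

```python
import pandas as pd

# Hypothetical data: each row already holds both people being compared.
pairs = pd.DataFrame({
    "height1": [70, 70, 70],
    "weight1": [180, 180, 180],
    "height2": [65, 71, 61],
    "weight2": [190, 150, 230],
})

# Each comparison yields a boolean Series; & combines them element-wise.
mask = (pairs.height1 > pairs.height2) & (pairs.weight1 < pairs.weight2)

# True counts as 1 and False as 0, so the sum is the number of matching rows.
count = int(mask.sum())
print(count)
```

This gives a single number for the whole table, which is why a per-row version is still needed for the question as asked.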
Edit:
After some testing, I changed the condition and wrapped it in a custom function:
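def f(x):
    # Uncomment to inspect the boolean mask for the current row x:
    # print((df.height < x.height) & (df.weight > x.weight))
    # Count the rows that are shorter and heavier than row x.
    return ((df.height < x.height) & (df.weight > x.weight)).sum()

df['new_column'] = df.apply(f, axis=1)
print(df)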
   id   name  height  weight  country  new_column
0   0   Adam      70     180      USA           2
1   1   Bill      65     190   CANADA           1
2   2  Chris      71     150  GERMANY           3
3   3   Eric      72     210      USA           1
4   4   Fred      74     160   FRANCE           4
5   5   Gary      75     220   MEXICO           1
6   6  Henry      61     230    SPAIN           0
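Since the question mentions that the real logic is more complicated, the same idea can also be written around an arbitrary pairwise function. The following is only a sketch: it rebuilds the example data from the question, reuses the asker's `example_function`, and introduces a hypothetical helper `count_matches` that is not part of the original answer. It calls the function once per pair of rows in plain Python, so it is likely to be slow on large tables.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5, 6],
    "name": ["Adam", "Bill", "Chris", "Eric", "Fred", "Gary", "Henry"],
    "height": [70, 65, 71, 72, 74, 75, 61],
    "weight": [180, 190, 150, 210, 160, 220, 230],
    "country": ["USA", "CANADA", "GERMANY", "USA", "FRANCE", "MEXICO", "SPAIN"],
})

def example_function(height1, weight1, height2, weight2):
    # Arguments ending in 1 belong to the current row, those ending in 2
    # to the row it is compared against.
    return height1 > height2 and weight1 < weight2

def count_matches(row):
    # Hypothetical helper: call the pairwise function once for every row
    # of df and count how often it returns True.
    return sum(
        example_function(row.height, row.weight, h, w)
        for h, w in zip(df["height"], df["weight"])
    )

df["new_column"] = df.apply(count_matches, axis=1)
print(df)
```

With this predicate a row never matches itself, because its height is not greater than its own height. A different predicate might need an explicit check to skip the current row.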
Explanation:
For each row, compare its values against every row of the DataFrame, then sum the resulting boolean mask to count the True values.
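As an alternative that avoids calling a Python function per row, the same comparison can be sketched with NumPy broadcasting. This is not part of the original answer; it assumes the condition can be expressed with element-wise array operations and that an n-by-n boolean matrix fits in memory.

```python
import pandas as pd

# Example data from the question (only the columns needed here).
df = pd.DataFrame({
    "name": ["Adam", "Bill", "Chris", "Eric", "Fred", "Gary", "Henry"],
    "height": [70, 65, 71, 72, 74, 75, 61],
    "weight": [180, 190, 150, 210, 160, 220, 230],
})

h = df["height"].to_numpy()
w = df["weight"].to_numpy()

# h[:, None] has shape (n, 1) and h[None, :] has shape (1, n), so each
# comparison broadcasts to an (n, n) matrix. Entry [i, j] compares row j
# (the other row) against row i (the current row).
shorter = h[None, :] < h[:, None]
heavier = w[None, :] > w[:, None]

# Summing along axis 1 counts, for each current row i, the matching rows j.
df["new_column"] = (shorter & heavier).sum(axis=1)
print(df)
```

The trade-off is memory: the intermediate matrices grow quadratically with the number of rows, so the `apply` version may be the safer choice for very large tables.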