How to fill np.nan values with data from another row based on a match
I need to do the following:
import numpy as np
import pandas as pd
a = [1, 2, 3, 4, 5]
c = [100, 100, 200, 200, 100]
b = ['2013-06-10', np.nan, '2013-02-15', np.nan, '2013-05-15']
df = pd.DataFrame({'a': a, 'b': b, 'c': c})
This will give:
   a           b    c
0  1  2013-06-10  100
1  2         NaN  100
2  3  2013-02-15  200
3  4         NaN  200
4  5  2013-05-15  100
Wherever column b is empty, I want to look up the previous row that has the same value in column c and copy its date into b.
The result should look like this:
   a           b    c
0  1  2013-06-10  100
1  2  2013-06-10  100
2  3  2013-02-15  200
3  4  2013-02-15  200
4  5  2013-05-15  100
I currently fill in the dates with apply and a row-wise lambda, but my raw data has millions of rows, so this is very slow. Does anyone know a faster way to fill missing values with data from other rows that share the same value in column c?
Solution
You can use groupby with ffill:
df['b'] = df.groupby('c')['b'].ffill()
print(df)
   a           b    c
0  1  2013-06-10  100
1  2  2013-06-10  100
2  3  2013-02-15  200
3  4  2013-02-15  200
4  5  2013-05-15  100
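To see that the fill happens per group rather than from the row directly above, here is a minimal sketch on a made-up frame (not the asker's data) in which rows with the same c are not adjacent. The names demo and filled are for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the two c == 100 rows are separated by a c == 200 row.
demo = pd.DataFrame({
    'b': ['2013-06-10', '2013-02-15', np.nan, np.nan],
    'c': [100, 200, 100, 200],
})

# Forward fill within each group of c; the original row order is kept.
filled = demo.groupby('c')['b'].ffill()
print(filled.tolist())
# ['2013-06-10', '2013-02-15', '2013-06-10', '2013-02-15']
```

A plain demo['b'].ffill() without the groupby would instead copy '2013-02-15' into row 2, because it only looks at the row immediately above.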
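Also, if the first value of b in some group is NaN, a forward fill alone leaves it empty because there is no earlier row to copy from. In that case use apply, because two functions (ffill, then bfill) need to be applied per group: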
print(df)
   a           b    c
0  1         NaN  100 <- NaN
1  1  2013-06-10  100
2  2         NaN  100
3  3  2013-02-15  200
4  4         NaN  200
5  5  2013-05-15  100
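# group_keys=False keeps the original index on the result, so it aligns with df
# (pandas 2.0+ otherwise prepends the group key to the index)
df['b'] = df.groupby('c', group_keys=False)['b'].apply(lambda x: x.ffill().bfill())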
print(df)
   a           b    c
0  1  2013-06-10  100
1  1  2013-06-10  100
2  2  2013-06-10  100
3  3  2013-02-15  200
4  4  2013-02-15  200
5  5  2013-05-15  100
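As an alternative to apply, the same two-step fill can be written with transform. It always returns a result aligned to the original index, so the assignment does not depend on how a given pandas version handles group keys. This is a sketch on the sample data above, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame from above: the first row of group c == 100 has no date.
df = pd.DataFrame({
    'a': [1, 1, 2, 3, 4, 5],
    'b': [np.nan, '2013-06-10', np.nan, '2013-02-15', np.nan, '2013-05-15'],
    'c': [100, 100, 100, 200, 200, 100],
})

# Within each group of c, forward fill first, then backward fill
# whatever is still missing at the start of the group.
df['b'] = df.groupby('c')['b'].transform(lambda x: x.ffill().bfill())
print(df)
```

On millions of rows the lambda is called once per group, not once per row, so the cost grows with the number of distinct values in c.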