Concatenation and grouping populate NaN values
I have this data frame:
df:
companycode name address A B C ...
1234 asd qwe,56 Tyh 123 923
1234 asd qwe,56 Zfhs 4828 01992
6472 yui iop,56 Retgh 8484 8484
...
I also have a second one that looks like this:
df2:
companycode A B C ...
1234 Jid 4123 141
6472 Low 1312 3234
...
The name and address are always the same for a given company code.
I'd like to concatenate, join, or merge them so that the result looks like this:
companycode name address A B C ...
1234 asd qwe,56 Tyh 123 923
1234 asd qwe,56 Zfhs 4828 01992
6472 yui iop,56 Retgh 8484 8484
1234 asd qwe,56 Jid 4123 141
6472 yui iop,56 Low 1312 3234
...
Since the name and address of a given company code are always the same, I basically want to concatenate df2 onto df along axis=0 and pull the name and address for each new row from the rows in the original df that have the same company code. It's confusing to describe in words, but I think the example above shows it more clearly.
What should I do?
Solution
pd.concat followed by a groupby operation should do the trick.
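import pandas as pd
# df1 is the first frame (df in the question)
df = pd.concat([df1, df2], axis=0, ignore_index=True)
cols = ['name', 'address']
df[cols] = df.groupby('companycode')[cols].ffill()  # fill within each company
df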
A B C address companycode name
0 Tyh 123 923 qwe,56 1234 asd
1 Zfhs 4828 1992 qwe,56 1234 asd
2 Retgh 8484 8484 iop,56 6472 yui
3 Jid 4123 141 qwe,56 1234 asd
4 Low 1312 3234 iop,56 6472 yui
- ignore_index=True is set to create a new index on concatenation.
- The concatenation leaves NaN values in the columns that do not exist in df2 (name and address) for the rows that come from df2.
- Perform a groupby operation on companycode, and then perform ffill to populate these NaNs with the correct values from the same group.
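The steps above can be tried end to end with a small, self-contained sketch. The frames below are rebuilt by hand from the sample rows in the question, so the exact values and dtypes are assumptions, and the name out is only used here for illustration. The fill is restricted to name and address, because in recent pandas versions groupby(...).ffill() does not return the grouping column.

```python
import pandas as pd

# Sample frames rebuilt from the question (values are illustrative).
df1 = pd.DataFrame({
    "companycode": [1234, 1234, 6472],
    "name": ["asd", "asd", "yui"],
    "address": ["qwe,56", "qwe,56", "iop,56"],
    "A": ["Tyh", "Zfhs", "Retgh"],
    "B": [123, 4828, 8484],
    "C": [923, 1992, 8484],
})
df2 = pd.DataFrame({
    "companycode": [1234, 6472],
    "A": ["Jid", "Low"],
    "B": [4123, 1312],
    "C": [141, 3234],
})

# Stack the rows; name/address are NaN for the rows that came from df2.
out = pd.concat([df1, df2], axis=0, ignore_index=True)

# Forward-fill only the missing columns, one company at a time.
cols = ["name", "address"]
out[cols] = out.groupby("companycode")[cols].ffill()
print(out)
```

Note that ffill only works here because the df1 rows come before the df2 rows within each company. If that order is not guaranteed, something like out.groupby("companycode")[cols].transform("first") might be a safer choice, since it takes the first non-null value in each group regardless of row position.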