What is the best way to insert a value into the “proper” position in a Pandas dataframe with some (index) parameter?
I have a data frame, df, as shown below:
Word  Row ID  Remark
abc   1       xyz
def   2       xyz
ghi   4       uvw
jkl   5       qrs
mno   7       wxy
The missing rows are in another data frame, df1:
Word  Row ID  Remark
pqr   3       uuu
stu   6       vvv
I want to insert the rows of df1 into their proper places in df, so this is the desired output:
Word  Row ID  Remark
abc   1       xyz
def   2       xyz
pqr   3       uuu
ghi   4       uvw
jkl   5       qrs
stu   6       vvv
mno   7       wxy
My code is as follows:
for i in range(len(df1)):  # run through each of the missing rows
    if df1['Row ID'][i] not in df['Row ID'].values:  # only add IDs that are not already in df
        df.loc[-1] = df1.loc[i]  # add the row under the temporary index label -1
        df.index += 1  # shift the index so the next added row does not overwrite this one
df = df.sort_values('Row ID')
But I don’t think it’s the most efficient way because:
- There is a for loop. I think there has to be a vectorized way to do this.
- There is a sort operation after the for loop. I think a vectorized approach would fold the ordering into the insertion step itself instead of sorting separately.
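A minimal sketch of a loop-free version, assuming the sample data from the question and that both frames have the same three columns: stack the frames with pd.concat and order them with one sort_values call (the concat + sort_values route the solution mentions in passing). This removes the Python loop, although the sort is still a separate step. kind="mergesort" is used because it is stable, so rows with equal Row ID values (if any) keep their concat order.

```python
import pandas as pd

# Sample frames from the question, typed in by hand.
df = pd.DataFrame({
    "Word": ["abc", "def", "ghi", "jkl", "mno"],
    "Row ID": [1, 2, 4, 5, 7],
    "Remark": ["xyz", "xyz", "uvw", "qrs", "wxy"],
})
df1 = pd.DataFrame({
    "Word": ["pqr", "stu"],
    "Row ID": [3, 6],
    "Remark": ["uuu", "vvv"],
})

# Stack both frames, then order all rows by 'Row ID' in a single stable sort.
# ignore_index=True gives the result a fresh 0..n-1 index.
result = pd.concat([df, df1], ignore_index=True).sort_values(
    "Row ID", kind="mergesort", ignore_index=True
)
print(result)
```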
Solution
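With searchsorted: compute the position in df where each row of df1 belongs, use those positions as df1's index, then concatenate and sort by index. This assumes df is already sorted by Row ID and still has its default 0..n-1 index. Personally, I think a plain concat + sort_values('Row ID') would also solve the problem.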
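import numpy as np
import pandas as pd
df1.index = np.searchsorted(df['Row ID'].values, df1['Row ID'].values)  # position in df (sorted by 'Row ID') where each new row belongs
pd.concat([df1, df]).sort_index(kind='mergesort')  # stable sort keeps each new row just ahead of the existing row that shares its index label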
Out[187]:
  Word  Row ID Remark
0  abc       1    xyz
1  def       2    xyz
2  pqr       3    uuu
2  ghi       4    uvw
3  jkl       5    qrs
4  stu       6    vvv
4  mno       7    wxy
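The searchsorted approach still finishes with a sort. If both frames are already sorted by Row ID, the searchsorted positions can be turned directly into final row positions, which folds the ordering into the insertion step as the question asks. The sketch below does that with a hypothetical helper, insert_sorted (not a pandas function); it assumes both frames share the same columns and are each sorted by the key.

```python
import numpy as np
import pandas as pd


def insert_sorted(base, new, key):
    """Hypothetical helper: merge `new` into `base` without a comparison sort.

    Assumes `base` and `new` share the same columns and are each sorted by `key`.
    """
    # Where each new row would land in `base`; side="left" puts a new row
    # ahead of an existing row with the same key.
    pos = np.searchsorted(base[key].to_numpy(), new[key].to_numpy(), side="left")
    # Each earlier new row pushes the later ones down by one slot.
    new_slots = pos + np.arange(len(new))
    total = len(base) + len(new)
    is_new = np.zeros(total, dtype=bool)
    is_new[new_slots] = True
    # For every output slot, the row number in the stacked frame [base; new].
    order = np.empty(total, dtype=np.intp)
    order[~is_new] = np.arange(len(base))
    order[is_new] = len(base) + np.arange(len(new))
    stacked = pd.concat([base, new], ignore_index=True)
    return stacked.iloc[order].reset_index(drop=True)


df = pd.DataFrame({
    "Word": ["abc", "def", "ghi", "jkl", "mno"],
    "Row ID": [1, 2, 4, 5, 7],
    "Remark": ["xyz", "xyz", "uvw", "qrs", "wxy"],
})
df1 = pd.DataFrame({
    "Word": ["pqr", "stu"],
    "Row ID": [3, 6],
    "Remark": ["uuu", "vvv"],
})

result = insert_sorted(df, df1, "Row ID")
print(result)
```

Here the binary searches cost O(m log n) for m new rows and n existing rows, and the final gather is linear, instead of sorting all n + m rows again. For frames as small as these the difference is negligible, and the concat + sort_values version is simpler to read.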