Add a blank row to a MultiIndex DataFrame
As the title indicates, I want to add a blank row to my MultiIndex DataFrame. The primary index needs to have a defined value, and the secondary index should be np.nan.
The values in the columns must be np.nan.
Consider the following:
import pandas as pd
import numpy as np
iterables = [['foo'], ['r_1', 'r_2', 'r_3']]
idx = pd.MultiIndex.from_product(iterables, names=['idx_1', 'idx_2'])
data = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
df = pd.DataFrame(data, idx, columns=['col_1', 'col_2', 'col_3'])
df
Out[93]:
             col_1  col_2  col_3
idx_1 idx_2
foo   r_1        1      2      3
      r_2        4      5      6
      r_3        7      8      9
If this were not a MultiIndex, I would normally append a Series:
s = pd.Series(
    [np.nan, np.nan, np.nan],
    index=['col_1', 'col_2', 'col_3'],
    name='bar'
)
df.append(s)
Out[95]:
            col_1  col_2  col_3
(foo, r_1)    1.0    2.0    3.0
(foo, r_2)    4.0    5.0    6.0
(foo, r_3)    7.0    8.0    9.0
bar           NaN    NaN    NaN
In this case, my MultiIndex is flattened into a plain index of tuples. I can't use ignore_index=True in the append method because that removes the MultiIndex. I feel like I'm close, but I'm not quite there.
My output should look like this:
# some magic
Out[96]:
             col_1  col_2  col_3
idx_1 idx_2
foo   r_1      1.0    2.0    3.0
      r_2      4.0    5.0    6.0
      r_3      7.0    8.0    9.0
bar   NaN      NaN    NaN    NaN
(None would also be acceptable as the secondary index value.)
What should I do?
I am using Python 3.6 and pandas 0.20.3.
Solution
Use setting with enlargement:
df.loc[('bar', ''), ['col_1', 'col_2', 'col_3']] = np.nan
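For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same assignment. It rebuilds the frame from the question and uses an empty string for the second level. The printed checks are only there for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
idx = pd.MultiIndex.from_product(
    [['foo'], ['r_1', 'r_2', 'r_3']], names=['idx_1', 'idx_2']
)
df = pd.DataFrame(
    [(1, 2, 3), (4, 5, 6), (7, 8, 9)],
    index=idx,
    columns=['col_1', 'col_2', 'col_3'],
)

# Assigning to a key that is not yet in the index enlarges the frame by one row.
df.loc[('bar', ''), ['col_1', 'col_2', 'col_3']] = np.nan

# The index is still a two-level MultiIndex with its original names.
print(type(df.index).__name__, list(df.index.names))
print(df)
```

Note that this modifies df in place, and the integer columns are upcast to float so that they can hold NaN.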
Or pass a tuple as the name of the Series:
s = pd.Series(
    [np.nan, np.nan, np.nan],
    index=['col_1', 'col_2', 'col_3'],
    name=('bar', np.nan)
)
print(df.append(s))
             col_1  col_2  col_3
idx_1 idx_2
foo   r_1      1.0    2.0    3.0
      r_2      4.0    5.0    6.0
      r_3      7.0    8.0    9.0
bar   NaN      NaN    NaN    NaN
s = pd.Series(
    [np.nan, np.nan, np.nan],
    index=['col_1', 'col_2', 'col_3'],
    name=('bar', '')
)
print(df.append(s))
             col_1  col_2  col_3
idx_1 idx_2
foo   r_1      1.0    2.0    3.0
      r_2      4.0    5.0    6.0
      r_3      7.0    8.0    9.0
bar            NaN    NaN    NaN
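The df.append calls above match the pandas 0.20.3 used in the question. DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on a recent version a similar result can probably be reached with pd.concat. The sketch below is one possible approach, not part of the original answer. The names blank_idx, blank, and result are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
idx = pd.MultiIndex.from_product(
    [['foo'], ['r_1', 'r_2', 'r_3']], names=['idx_1', 'idx_2']
)
df = pd.DataFrame(
    [(1, 2, 3), (4, 5, 6), (7, 8, 9)],
    index=idx,
    columns=['col_1', 'col_2', 'col_3'],
)

# Hypothetical helper objects: a one-row, all-NaN frame whose index is a
# MultiIndex with the same level names as df.
blank_idx = pd.MultiIndex.from_tuples([('bar', '')], names=df.index.names)
blank = pd.DataFrame(np.nan, index=blank_idx, columns=df.columns)

# Concatenating row-wise keeps the MultiIndex because both frames share
# the same index structure and level names.
result = pd.concat([df, blank])
print(result)
```

Unlike setting with enlargement, this leaves df untouched and returns a new frame. Some pandas 2.x releases may emit a FutureWarning about concatenating all-NA entries here.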