Python – Pandas set_levels: how to avoid label sorting?

I'm having trouble using set_levels on a MultiIndex:

import pandas as pd
from io import StringIO

txt = '''Name,Height,Age
"",Metres,""
A,-1,25
B,95,-1'''

df = pd.read_csv(StringIO(txt),header=[0,1],na_values=['-1',''])

df.columns = df.columns.set_levels(df.columns.get_level_values(level=1).str.replace('Un.*', '', regex=True), level=1)

    Name Height   Age
  Metres
0      A    NaN  25.0
1      B   95.0   NaN

If I run the same command again:

df.columns = df.columns.set_levels(df.columns.get_level_values(level=1).str.replace('Un.*', '', regex=True), level=1)

  Name Height   Age
       Metres      
0    A    NaN  25.0
1    B   95.0   NaN

Only now do I get the desired result. Why does this happen? Is it possible to keep the labels from being reordered on the first try?

Solution

I don't fully understand why this happens, but I found the cause and a solution to the problem:

If we look at the column labels, we find something strange:

>>> df = pd.read_csv(StringIO(txt),header=[0,1],na_values=['-1',''])
>>> df.columns
MultiIndex(levels=[['Age', 'Height', 'Name'], ['Metres', 'Unnamed: 0_level_1', 'Unnamed: 2_level_1']],
           labels=[[2, 1, 0], [1, 0, 2]])

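The second level is not stored in the same order as the columns. Each level holds its unique values in sorted order, and the labels array (called codes in newer pandas versions) maps every column to a position in that list. When you replace the strings, you do it on an array that is in column order: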

>>> df.columns.get_level_values(level=1)
Index(['Unnamed: 0_level_1', 'Metres', 'Unnamed: 2_level_1'], dtype='object')

But set_levels assigns the new values to the levels array, which is stored in a different order:

>>> df.columns.levels[1]
Index(['Metres', 'Unnamed: 0_level_1', 'Unnamed: 2_level_1'], dtype='object')
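
The relationship between the two arrays can be checked directly. The following is a minimal sketch, assuming pandas 0.24 or later (where the labels attribute is named codes); the MultiIndex is built by hand as a stand-in for the columns that read_csv produced, so the variable names are illustrative only.

```python
import pandas as pd

# Hand-built stand-in (assumption) for the columns read_csv produced above.
cols = pd.MultiIndex.from_tuples([
    ("Name", "Unnamed: 0_level_1"),
    ("Height", "Metres"),
    ("Age", "Unnamed: 2_level_1"),
])

in_column_order = cols.get_level_values(1)  # one entry per column, in column order
stored_levels = cols.levels[1]              # unique values, stored sorted
codes = cols.codes[1]                       # for each column, a position in stored_levels

# Looking each code up in the stored levels reproduces the column-order labels.
rebuilt = stored_levels.take(codes)

print(list(in_column_order))
print(list(stored_levels))
print(list(codes))
print(rebuilt.equals(in_column_order))
```

Because set_levels replaces stored_levels position by position and leaves the codes untouched, passing it an array in column order attaches the new values to the wrong columns.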

So apply the replacement to the levels array itself to remove the unnamed labels:

>>> df.columns = df.columns.set_levels(df.columns.levels[1].str.replace('Un.*', '', regex=True), level=1)
>>> df

  Name Height   Age
       Metres
0    A    NaN  25.0
1    B   95.0   NaN
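
An alternative that avoids the ordering question altogether is to rebuild the column index from arrays that are already in column order. This is only a sketch, not part of the original answer: the DataFrame is constructed by hand to mirror the CSV above, and the names top and units are hypothetical. It may also be useful on newer pandas releases, which check by default that level values are unique and so may reject a set_levels call that yields two empty strings.

```python
import pandas as pd

# Hand-built frame (assumption) mirroring what read_csv returned above.
df = pd.DataFrame(
    [["A", None, 25.0], ["B", 95.0, None]],
    columns=pd.MultiIndex.from_tuples([
        ("Name", "Unnamed: 0_level_1"),
        ("Height", "Metres"),
        ("Age", "Unnamed: 2_level_1"),
    ]),
)

# Both arrays have one entry per column, in column order.
top = df.columns.get_level_values(0)
units = df.columns.get_level_values(1).str.replace(r"^Unnamed.*", "", regex=True)

# from_arrays recomputes levels and codes, so nothing can end up misaligned.
df.columns = pd.MultiIndex.from_arrays([top, units])

print(df.columns.tolist())
```

Since from_arrays derives both the levels and the codes from the values it is given, there is no separate sorted array to keep in sync.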

However, I would appreciate it if someone could explain why get_level_values and set_levels behave this way.
