Python – Combine MultiIndex columns into a single index in a Pandas DataFrame



In my code, I combine two Excel files into a single DataFrame. The problem is that when I add a column, the result is not what I expect. I am using Python 2.7.

Code:

import pandas as pd
import pandas.io.formats.excel
import numpy as np

# Read both files and load them into DataFrames
df1 = pd.read_excel("archivo1.xlsx")
df2 = pd.read_excel("archivo2.xlsx")

df = (pd.concat([df1,df2])
        .set_index(["Cliente",'Fecha'])
        .stack()
        .unstack(1)
        .sort_index(ascending=(True, False)))

m = df.index.get_level_values(1) == 'Impresiones'
df.index = np.where(m, 'Impresiones', df.index.get_level_values(0))

# Create the output xlsx file
# Turn off pandas' default header cell style
pandas.io.formats.excel.header_style = None

with pd.ExcelWriter("Data.xlsx",
                    engine='xlsxwriter',
                    date_format='dd/mm/yyyy',
                    datetime_format='dd/mm/yyyy') as writer:
    df.to_excel(writer, sheet_name='Sheet1')

File 1 (archivo1.xlsx):

Fecha       Cliente     Impresiones Impresiones 2   Revenue
20/12/17    Jose        1312        35              $12
20/12/17    Martin      12          56              $146
20/12/17    Pedro       5443        124             $1,256
20/12/17    Esteban     667         1235            $1

File 2 (archivo2.xlsx):

Fecha       Cliente     Impresiones Impresiones 2   Revenue
21/12/17    Jose        25          5               $2
21/12/17    Martin      6347        523             $123
21/12/17    Pedro       2368        898             $22
21/12/17    Esteban     235         99              $7,890

Desired outcome:

I tried using

m1 = df.index.get_level_values(1) == 'Impresiones 2'
df.index = np.where(m1, 'Impresiones 2', df.index.get_level_values(0))

but I get this error:

IndexError: Too many levels: Index has only 1 level, not 2
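The error occurs because the first np.where assignment replaces the two-level MultiIndex with a flat Index, so by the time the second mask is computed there is no level 1 left to read. A minimal sketch, assuming the goal is to show the metric name instead of the client name for both the Impresiones and Impresiones 2 rows: read both levels before overwriting the index, and do the replacement in a single np.where. The small frame below is hypothetical and only mimics the (client, metric) shape of the reshaped data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the same (client, metric) index shape as the reshaped data
idx = pd.MultiIndex.from_product(
    [['Esteban', 'Jose'], ['Revenue', 'Impresiones 2', 'Impresiones']])
df = pd.DataFrame({'20/12/17': range(6)}, index=idx)

# Read both levels while the index is still a MultiIndex
clients = df.index.get_level_values(0)
metrics = df.index.get_level_values(1)

# One replacement: metric rows keep the metric name, every other row gets the client name
m = metrics.isin(['Impresiones', 'Impresiones 2'])
df.index = np.where(m, metrics, clients)

print(df.index.nlevels)  # 1: the index is flat after this assignment
print(list(df.index))
# ['Esteban', 'Impresiones 2', 'Impresiones', 'Jose', 'Impresiones 2', 'Impresiones']
```

Any further get_level_values(1) call on this flat index would raise the same IndexError, which is why both masks have to be built first.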

Solution

The first part of the solution is similar to jezrael's answer to your previous question: use concat + set_index + stack + unstack + sort_index.

df = pd.concat([df1, df2])\
       .set_index(['Cliente', 'Fecha'])\
       .stack()\
       .unstack(-2)\
       .sort_index(ascending=[True, False])
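To make the intermediate shape visible, here is a minimal sketch on a hypothetical two-client subset of the sample tables, typed in inline instead of read from the Excel files. It shows that stack() produces three index levels, that unstack(-2) moves Fecha (the second-to-last level) into the columns, and that ascending=[True, False] puts 'Revenue' at the top of each client's block.

```python
import pandas as pd

# Hypothetical subset of the two sample tables (two clients only), typed in inline
cols = ['Fecha', 'Cliente', 'Impresiones', 'Impresiones 2', 'Revenue']
df1 = pd.DataFrame([['20/12/17', 'Jose', 1312, 35, '$12'],
                    ['20/12/17', 'Esteban', 667, 1235, '$1']], columns=cols)
df2 = pd.DataFrame([['21/12/17', 'Jose', 25, 5, '$2'],
                    ['21/12/17', 'Esteban', 235, 99, '$7,890']], columns=cols)

stacked = pd.concat([df1, df2]).set_index(['Cliente', 'Fecha']).stack()
# Index levels are now: Cliente, Fecha, and the former column names
print(stacked.index.nlevels)  # 3

# Move Fecha into the columns; sort clients A-Z and metric names Z-A
df = stacked.unstack(-2).sort_index(ascending=[True, False])
print(df)
```

Sorting the metric level in descending order is what guarantees that each client's block starts with 'Revenue', which the next step relies on.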

Now for the tricky part: we have to merge the names in level 0 into level 1, and then collapse the index to a single level.

I use np.insert to insert each client name just above that client's Revenue entry in level 1 of the index.

i, j = df.index.get_level_values(0), df.index.get_level_values(1)
k = np.insert(j.values, np.flatnonzero(j == 'Revenue'), i.unique())
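np.insert accepts several insertion points at once, and each value is inserted before the given position in the original array. A minimal sketch on hand-built arrays (the labels are hypothetical and mirror two clients of the sample data) shows how the names land directly above each 'Revenue' entry:

```python
import numpy as np

# Level-1 labels after sorting: each client's block starts with 'Revenue'
j = np.array(['Revenue', 'Impresiones 2', 'Impresiones',
              'Revenue', 'Impresiones 2', 'Impresiones'], dtype=object)
names = ['Esteban', 'Jose']  # level-0 labels in order of appearance

# Positions of every 'Revenue' entry
pos = np.flatnonzero(j == 'Revenue')
print(list(pos))  # [0, 3]

# names[n] is inserted before original position pos[n]
k = np.insert(j, pos, names)
print(list(k))
# ['Esteban', 'Revenue', 'Impresiones 2', 'Impresiones',
#  'Jose', 'Revenue', 'Impresiones 2', 'Impresiones']
```

This only lines up because i.unique() returns the clients in the same order as their 'Revenue' rows appear, which the earlier sort ensures.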

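Next, I create a new MultiIndex with one extra slot per client (for the inserted name) and use it to reindex df. The inserted rows do not exist in df, so reindex fills them with NaN, which fillna('') then blanks out.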

idx = pd.MultiIndex.from_arrays([i.unique().repeat(len(df.index.levels[1]) + 1), k])
df = df.reindex(idx).fillna('')
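A minimal sketch of the reindex step on a hypothetical two-client, two-metric frame. The new MultiIndex contains labels such as ('Esteban', 'Esteban') that are absent from the original frame; reindex returns those rows as NaN, and fillna('') turns them into blank header rows. Note that len(df.index.levels[1]) + 1 assumes every client has every metric; a client with a missing metric would shift the repeat counts out of line with k.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the same (client, metric) layout, 'Revenue' first per client
idx = pd.MultiIndex.from_product([['Esteban', 'Jose'], ['Revenue', 'Impresiones']])
df = pd.DataFrame({'20/12/17': ['$1', 667, '$12', 1312]}, index=idx)

i = df.index.get_level_values(0)
j = df.index.get_level_values(1)
k = np.insert(j.values, np.flatnonzero(j == 'Revenue'), i.unique())

# Each client now owns len(levels[1]) + 1 rows: its name row plus one per metric
new_idx = pd.MultiIndex.from_arrays(
    [i.unique().repeat(len(df.index.levels[1]) + 1), k])

# Labels missing from df, such as ('Esteban', 'Esteban'), come back as NaN; blank them
out = df.reindex(new_idx).fillna('')
print(out)
```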

Finally, drop the outer level, which is no longer needed:

df.index = df.index.droplevel()

df

Fecha         20/12/17 21/12/17
Esteban
Revenue             $1   $7,890
Impresiones 2     1235       99
Impresiones        667      235
Jose
Revenue            $12       $2
Impresiones 2       35        5
Impresiones       1312       25
Martin
Revenue           $146     $123
Impresiones 2       56      523
Impresiones         12     6347
Pedro
Revenue         $1,256      $22
Impresiones 2      124      898
Impresiones       5443     2368
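For a self-contained check, the whole pipeline can be run end to end. This is a sketch, assuming the two sample tables are typed in inline rather than read from archivo1.xlsx and archivo2.xlsx, so no Excel file or engine is needed. It prints one name row per client followed by that client's Revenue, Impresiones 2 and Impresiones rows.

```python
import numpy as np
import pandas as pd

# Hypothetical inline copies of the two sample tables (instead of reading the .xlsx files)
cols = ['Fecha', 'Cliente', 'Impresiones', 'Impresiones 2', 'Revenue']
df1 = pd.DataFrame([
    ['20/12/17', 'Jose', 1312, 35, '$12'],
    ['20/12/17', 'Martin', 12, 56, '$146'],
    ['20/12/17', 'Pedro', 5443, 124, '$1,256'],
    ['20/12/17', 'Esteban', 667, 1235, '$1'],
], columns=cols)
df2 = pd.DataFrame([
    ['21/12/17', 'Jose', 25, 5, '$2'],
    ['21/12/17', 'Martin', 6347, 523, '$123'],
    ['21/12/17', 'Pedro', 2368, 898, '$22'],
    ['21/12/17', 'Esteban', 235, 99, '$7,890'],
], columns=cols)

# Step 1: one row per (client, metric), one column per date
df = (pd.concat([df1, df2])
        .set_index(['Cliente', 'Fecha'])
        .stack()
        .unstack(-2)
        .sort_index(ascending=[True, False]))

# Step 2: insert each client name above its 'Revenue' entry
i, j = df.index.get_level_values(0), df.index.get_level_values(1)
k = np.insert(j.values, np.flatnonzero(j == 'Revenue'), i.unique())

# Step 3: reindex so the name rows exist (blanked), then keep only level 1
idx = pd.MultiIndex.from_arrays([i.unique().repeat(len(df.index.levels[1]) + 1), k])
df = df.reindex(idx).fillna('')
df.index = df.index.droplevel()
print(df)
```

The resulting df can then be written with the same pd.ExcelWriter block used in the question.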
