Python – How to select specific values in a pandas DataFrame and replace them with NaN, and how to remove columns from level 1 of a MultiIndex

Here is a solution to the problem.

I have a csv file that I read into a pandas DataFrame:

import pandas as pd

# the two header rows become a two-level column MultiIndex: (bodyparts, coords)
csv_file = pd.read_csv('hello.csv', engine='c', delimiter=',', index_col=0,
                       skiprows=1, header=[0, 1])

This is the view of the csv file (print(csv_file)):

bodyparts        nose                  ...        right_ear              
coords              x           y      ...                y    likelihood
0          197.486369    4.545954      ...       206.351233  1.280000e-06
1          319.946460  191.035224      ...       206.321893  9.680000e-07
2          319.880388  191.012984      ...       206.322207  9.520000e-07
3          320.286005  190.843329      ...       206.227396  1.020000e-06
4          320.210989  190.863304      ...         3.106570  8.350000e-07
5          320.212529  190.867178      ...         3.116692  8.460000e-07
6           -0.794705    2.462400      ...         3.112797  8.500000e-07
7           -0.785404    2.485562      ...         3.117945  8.430000e-07
8          319.786777  191.003882      ...         3.125062  8.820000e-07
9          319.947064  191.030201      ...       206.202980  9.210000e-07
10         319.845807  191.002510      ...       206.177779  8.660000e-07
11         320.135816  190.967408      ...       206.190732  8.910000e-07
12          -0.935765    2.568168      ...       206.260773  8.860000e-07
13          -0.932833    2.525062      ...       206.273504  8.780000e-07
14          -0.960939    2.500079      ...       206.272811  8.680000e-07
15          -0.832561    2.442907      ...       206.266416  8.720000e-07
16          -0.838884    2.421689      ...       206.242941  9.440000e-07
17          -0.857173    2.421467      ...       206.243972  9.950000e-07
18          -0.841627    2.414854      ...       206.225004  9.820000e-07
...               ...         ...      ...              ...           ...
10459      349.556703  301.995042      ...       307.018688  9.999745e-01
10460      348.608277  301.098244      ...       309.648986  9.999962e-01
10461      349.995217  303.397438      ...       311.149967  9.999974e-01
10462      349.109666  305.710711      ...       311.893106  9.999955e-01
10463      352.142571  310.081763      ...       317.420410  9.907742e-01
10464      351.916488  317.078128      ...       319.407211  2.706501e-01
10465      353.809847  320.086683      ...       323.478481  9.911720e-01
10466      349.233529  321.859424      ...       323.383276  8.724346e-01

The resulting DataFrame has a two-level column MultiIndex:

(('body_part1', 'body_part2', ..., 'body_partn'), ('x', 'y', 'likelihood'))

print(df.columns):

MultiIndex(levels=[['left_ear', 'nose', 'right_ear', 'tail'], ['likelihood', 'x', 'y']],
           labels=[[1, 1, 1, 3, 3, 3, 0, 0, 0, 2, 2, 2], [1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0]],
           names=['bodyparts', 'coords'])
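
For orientation, a short sketch (not part of the original question) of how columns can be selected from this two-level index, using the level name 'coords' shown in the output above:

nose = df['nose']                       # all coords of one body part: x, y, likelihood
nose_x = df[('nose', 'x')]              # a single column as a Series
likelihood = df.xs('likelihood', axis=1, level='coords')  # one likelihood column per body part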

If the likelihood of a coordinate is low, I want to replace the coordinates with NaN. The new DataFrame should not have a likelihood column. Here is the first row of the "nose" columns as an example:

coords           x           y    likelihood
0       197.486369    4.545954  3.890000e-07

After the operation, it should look like this:

coords           x           y
0              NaN         NaN

Note that the index remains the same during this process!

Solution

Let’s say you have a threshold that defines what counts as a “low” likelihood:

import numpy as np

# for each body part, blank the x and y values wherever the likelihood is below the threshold
for col in df.columns.levels[0]:
    df.loc[df[(col, 'likelihood')] < threshold, [(col, 'x'), (col, 'y')]] = np.nan
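
The title also asks how to remove the likelihood columns from level 1 of the MultiIndex afterwards. A minimal sketch, assuming the level name 'coords' shown in the output above:

# drop the 'likelihood' column under every body part (level 1 of the column MultiIndex)
df = df.drop('likelihood', axis=1, level='coords')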

There may be a more optimized way to do this (without looping over the columns), but this should work as well.
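
As a hedged sketch of such a variant (not from the original answer), one can build a boolean mask from all likelihood columns at once and only loop over the two coordinate names:

# one boolean column per body part: True where the likelihood is below the threshold
low = df.xs('likelihood', axis=1, level='coords') < threshold

for coord in ('x', 'y'):
    cols = [(bp, coord) for bp in low.columns]  # same body-part order as the mask
    df[cols] = df[cols].mask(low.to_numpy())    # set masked positions to NaN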
