Python – Intersection of 2 Pandas data frames

I have 2 dataframes, mydataframe1 and mydataframe2, as shown below.

mydataframe1
Out[13]:
  Start   End       Remove     
  50      60        1  
  61      105       0  
  106     150       1  
  151     160       0  
  161     180       1  
  181     200       0  
  201     400       1  

mydataframe2
Out[14]: 
    Start   End  
    55      100
    105     140
    151     154
    155     185
    220     240    

From mydataframe2, I want to remove every row whose Start-End interval is contained in, or partially overlaps, any interval of mydataframe1 that has "Remove" = 1. In other words, there should be no intersection between the remaining intervals of mydataframe2 and the "Remove" = 1 intervals of mydataframe1.

In this scenario, mydataframe2 becomes

mydataframe2
Out[15]: 
    Start   End  
    151     154

Solution

You can use pd.IntervalIndex to test for intersections.

Get the rows to be removed

In [313]: dfr = df1.query('Remove == 1')

Construct an IntervalIndex from the ranges to be removed

In [314]: s1 = pd.IntervalIndex.from_arrays(dfr.Start, dfr.End, 'both')

Construct an IntervalIndex from the ranges to be tested

In [315]: s2 = pd.IntervalIndex.from_arrays(df2.Start, df2.End, 'both')

Select the rows of s2 that do not fall within any range in s1

In [316]: df2.loc[[x not in s1 for x in s2]]
Out[316]:
   Start  End
2    151  154
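
The `x not in s1` test above reflects the pandas version used in this session. As far as I know, more recent pandas releases treat `Interval in IntervalIndex` as an exact-match check rather than an overlap check, so the same list comprehension may keep every row there. Below is a minimal, self-contained sketch that uses `IntervalIndex.overlaps` instead; the `keep` and `result` names are only illustrative, and the data is copied from the question.

```python
import pandas as pd

# Data from the question
df1 = pd.DataFrame({
    "Start": [50, 61, 106, 151, 161, 181, 201],
    "End": [60, 105, 150, 160, 180, 200, 400],
    "Remove": [1, 0, 1, 0, 1, 0, 1],
})
df2 = pd.DataFrame({
    "Start": [55, 105, 151, 155, 220],
    "End": [100, 140, 154, 185, 240],
})

# Ranges flagged for removal, as closed intervals [Start, End]
dfr = df1.query("Remove == 1")
s1 = pd.IntervalIndex.from_arrays(dfr.Start, dfr.End, closed="both")
s2 = pd.IntervalIndex.from_arrays(df2.Start, df2.End, closed="both")

# Keep a row of df2 only if its interval overlaps none of the s1 intervals
keep = [not s1.overlaps(x).any() for x in s2]
result = df2.loc[keep]
print(result)
```

With the sample data this should select only the row with Start 151 and End 154, matching the output above.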

Details

In [320]: df1
Out[320]:
   Start  End  Remove
0     50   60       1
1     61  105       0
2    106  150       1
3    151  160       0
4    161  180       1
5    181  200       0
6    201  400       1

In [321]: df2
Out[321]:
   Start  End
0     55  100
1    105  140
2    151  154
3    155  185
4    220  240

In [322]: dfr
Out[322]:
   Start  End  Remove
0     50   60       1
2    106  150       1
4    161  180       1
6    201  400       1

IntervalIndex details

In [323]: s1
Out[323]:
IntervalIndex([[50, 60], [106, 150], [161, 180], [201, 400]]
              closed='both',
              dtype='interval[int64]')

In [324]: s2
Out[324]:
IntervalIndex([[55, 100], [105, 140], [151, 154], [155, 185], [220, 240]]
              closed='both',
              dtype='interval[int64]')

In [326]: [x not in s1 for x in s2]
Out[326]: [False, False, True, False, False]
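
To see what the mask above encodes: two closed intervals [a, b] and [c, d] intersect exactly when a <= d and c <= b. The sketch below applies that condition directly with NumPy broadcasting, as an alternative to IntervalIndex rather than the method used in the answer. The `remove` frame and the variable names are assumptions made for illustration; the numbers come from `dfr` and `df2` above.

```python
import numpy as np
import pandas as pd

# Ranges with Remove == 1 (same rows as dfr) and the ranges to test (df2)
remove = pd.DataFrame({"Start": [50, 106, 161, 201],
                       "End": [60, 150, 180, 400]})
df2 = pd.DataFrame({"Start": [55, 105, 151, 155, 220],
                    "End": [100, 140, 154, 185, 240]})

# Column vectors for df2, row vectors for remove, so comparisons broadcast
# to a (len(df2), len(remove)) matrix
a = df2["Start"].to_numpy()[:, None]
b = df2["End"].to_numpy()[:, None]
c = remove["Start"].to_numpy()[None, :]
d = remove["End"].to_numpy()[None, :]

# Closed intervals [a, b] and [c, d] overlap iff a <= d and c <= b
overlap = (a <= d) & (c <= b)

# Keep rows of df2 that overlap no removal range
keep_mask = ~overlap.any(axis=1)
print(df2[keep_mask])
```

This compares every pair of ranges, so memory grows with len(df2) * len(remove); for small frames like these that is not a concern.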
