Intersection of two pandas DataFrames
I have two DataFrames, mydataframe1 and mydataframe2, shown below.
mydataframe1
Out[13]:
Start End Remove
50 60 1
61 105 0
106 150 1
151 160 0
161 180 1
181 200 0
201 400 1
mydataframe2
Out[14]:
Start End
55 100
105 140
151 154
155 185
220 240
From mydataframe2, I want to remove every row whose Start-End interval is contained in, or partially overlaps, any interval of mydataframe1 that has Remove = 1. In other words, no interval left in mydataframe2 may intersect a Remove = 1 interval of mydataframe1.
In this scenario, mydataframe2 becomes:
mydataframe2
Out[15]:
Start End
151 154
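The condition can be spelled out without pandas. The sketch below assumes both endpoints are inclusive (the question does not say so explicitly), in which case two intervals intersect exactly when each one starts no later than the other ends. The function name `intersects` and the two lists are illustrative only; the values are copied from the tables above.

```python
def intersects(a_start, a_end, b_start, b_end):
    """True if the closed intervals [a_start, a_end] and [b_start, b_end] share a point."""
    return a_start <= b_end and b_start <= a_end

# Intervals of mydataframe1 with Remove == 1
remove_intervals = [(50, 60), (106, 150), (161, 180), (201, 400)]
# Intervals of mydataframe2
candidates = [(55, 100), (105, 140), (151, 154), (155, 185), (220, 240)]

# Keep a candidate only if it intersects none of the intervals to remove
kept = [
    c for c in candidates
    if not any(intersects(c[0], c[1], r[0], r[1]) for r in remove_intervals)
]
print(kept)  # [(151, 154)]
```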
Solution
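You can use pd.IntervalIndex to test for intersections (df1 and df2 below stand for mydataframe1 and mydataframe2).
Get the rows of df1 that are marked for removal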
In [313]: dfr = df1.query('Remove == 1')
Construct an IntervalIndex from the intervals to be removed
In [314]: s1 = pd.IntervalIndex.from_arrays(dfr.Start, dfr.End, 'both')
Construct an IntervalIndex from the intervals to be tested
In [315]: s2 = pd.IntervalIndex.from_arrays(df2.Start, df2.End, 'both')
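Select the rows of df2 whose interval in s2 does not intersect any interval in s1 (on the pandas version used here, the in operator matches overlapping intervals, not only identical ones)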
In [316]: df2.loc[[x not in s1 for x in s2]]
Out[316]:
Start End
2 151 154
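On newer pandas releases (to my knowledge from 0.25 onward), the in operator on an IntervalIndex only matches identical intervals, so the list comprehension above would keep every row. A minimal sketch of the same idea using IntervalIndex.overlaps follows; the DataFrames are rebuilt from the tables in this post so the snippet runs on its own, and the names keep and result are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Start":  [50, 61, 106, 151, 161, 181, 201],
    "End":    [60, 105, 150, 160, 180, 200, 400],
    "Remove": [1, 0, 1, 0, 1, 0, 1],
})
df2 = pd.DataFrame({
    "Start": [55, 105, 151, 155, 220],
    "End":   [100, 140, 154, 185, 240],
})

# Intervals marked for removal, and the intervals under test (both ends inclusive)
dfr = df1.query("Remove == 1")
s1 = pd.IntervalIndex.from_arrays(dfr["Start"], dfr["End"], closed="both")
s2 = pd.IntervalIndex.from_arrays(df2["Start"], df2["End"], closed="both")

# s1.overlaps(x) returns one boolean per interval in s1;
# keep a row of df2 only if its interval overlaps none of them
keep = [not s1.overlaps(x).any() for x in s2]
result = df2.loc[keep]
print(result)
```

This loops over s2 in Python, which is fine for small frames like the ones here.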
Details
In [320]: df1
Out[320]:
Start End Remove
0 50 60 1
1 61 105 0
2 106 150 1
3 151 160 0
4 161 180 1
5 181 200 0
6 201 400 1
In [321]: df2
Out[321]:
Start End
0 55 100
1 105 140
2 151 154
3 155 185
4 220 240
In [322]: dfr
Out[322]:
Start End Remove
0 50 60 1
2 106 150 1
4 161 180 1
6 201 400 1
IntervalIndex details
In [323]: s1
Out[323]:
IntervalIndex([[50, 60], [106, 150], [161, 180], [201, 400]]
closed='both',
dtype='interval[int64]')
In [324]: s2
Out[324]:
IntervalIndex([[55, 100], [105, 140], [151, 154], [155, 185], [220, 240]]
closed='both',
dtype='interval[int64]')
In [326]: [x not in s1 for x in s2]
Out[326]: [False, False, True, False, False]
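The same mask can also be computed without IntervalIndex. The sketch below is a different technique from the answer above: it compares every df2 interval against every Remove == 1 interval at once with NumPy broadcasting, assuming inclusive endpoints as with closed='both'. The names hits and keep are illustrative, and the DataFrames are rebuilt from the Details section.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Start":  [50, 61, 106, 151, 161, 181, 201],
    "End":    [60, 105, 150, 160, 180, 200, 400],
    "Remove": [1, 0, 1, 0, 1, 0, 1],
})
df2 = pd.DataFrame({
    "Start": [55, 105, 151, 155, 220],
    "End":   [100, 140, 154, 185, 240],
})
dfr = df1[df1["Remove"] == 1]

# Column vectors for df2, row vectors for dfr, so each comparison
# broadcasts to a matrix of shape (len(df2), len(dfr))
test_start = df2["Start"].to_numpy()[:, None]
test_end = df2["End"].to_numpy()[:, None]
rem_start = dfr["Start"].to_numpy()
rem_end = dfr["End"].to_numpy()

# Closed intervals intersect when each starts no later than the other ends
hits = (test_start <= rem_end) & (rem_start <= test_end)

# True where a df2 row intersects none of the intervals to remove
keep = ~hits.any(axis=1)
print(keep.tolist())  # [False, False, True, False, False]
print(df2.loc[keep])
```

The intermediate matrix has len(df2) * len(dfr) entries, so this trades memory for speed on larger inputs.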