Python: compare a list of dates with the start and end date columns in a DataFrame
Question: I have a data frame with two columns: start date and end date. I also have a list of dates. So suppose the data looks like this:
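import pandas as pd
data = [['1/1/2018', '3/1/2018'], ['2/1/2018', '3/1/2018'], ['4/1/2018', '6/1/2018']]
# parse the strings as dates so that df matches the "Before" table below
df = pd.DataFrame(data, columns=['startdate', 'enddate']).apply(pd.to_datetime)
dates = ['1/1/2018', '2/1/2018']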
What I need to do is:
1) Create a new column for each date in the date list
2) For each row in df, assign 1 if the date of the new column is between the start date and end date, and assign a 0 if not.
I tried using zip, but then I realized that df will have thousands of rows while the date list will contain only about 24 items (spanning 2 years), so zip stops when the date list runs out, i.e., after 24 items.
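To illustrate why zip falls short here, a minimal sketch follows. The names `rows`, `pairs` and `combos` are made up for illustration, and the short lists only stand in for the real data. zip pairs items by position and stops at the shorter input, whereas the task needs every row checked against every date.

```python
import itertools

rows = ["row0", "row1", "row2", "row3"]  # stands in for the thousands of df rows
dates = ["1/1/2018", "2/1/2018"]         # stands in for the ~24 dates

# zip pairs items by position and stops when the shorter input is exhausted
pairs = list(zip(rows, dates))
print(pairs)

# what the task actually needs: every (row, date) combination
combos = list(itertools.product(rows, dates))
print(len(combos))
```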
Here is what the original df looks like and what it should look like afterwards:
Before:
    startdate     enddate
0  2018-01-01  2018-03-01
1  2018-02-01  2018-03-01
2  2018-04-01  2018-06-01
After:
  startdate   enddate  1/1/2018  2/1/2018
0  1/1/2018  3/1/2018         1         1
1  2/1/2018  3/1/2018         0         1
2  4/1/2018  6/1/2018         0         0
Any help would be appreciated, thank you!
Solution
Use numpy broadcasting:
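# assumes startdate and enddate already hold datetime values
s1 = df.startdate.values
s2 = df.enddate.values
# column vector of shape (len(dates), 1), so it broadcasts against s1 and s2
v = pd.to_datetime(pd.Series(dates)).values[:, None]
# the mask has shape (len(dates), len(df)); transpose it to get one column per date
newdf = pd.DataFrame(((s1 <= v) & (s2 >= v)).T.astype(int), columns=dates, index=df.index)
pd.concat([df, newdf], axis=1)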
   startdate    enddate  1/1/2018  2/1/2018
0 2018-01-01 2018-03-01         1         1
1 2018-02-01 2018-03-01         0         1
2 2018-04-01 2018-06-01         0         0
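If broadcasting feels opaque, the same result can probably be reached with a plain loop over the dates. This is a different technique from the one above, shown only as a sketch. The helper name `add_date_flags` is hypothetical, and the sketch assumes the date strings are month-first, as in the question. With only about 24 dates the loop runs a handful of times, and each iteration is still a vectorized comparison over all rows.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "startdate": pd.to_datetime(["1/1/2018", "2/1/2018", "4/1/2018"]),
        "enddate": pd.to_datetime(["3/1/2018", "3/1/2018", "6/1/2018"]),
    }
)
dates = ["1/1/2018", "2/1/2018"]


def add_date_flags(frame, date_labels):
    """Hypothetical helper: add one 0/1 column per label.

    A cell is 1 when startdate <= date <= enddate for that row, else 0.
    """
    out = frame.copy()
    for label in date_labels:
        ts = pd.Timestamp(label)  # assumed month-first, e.g. 2/1/2018 is 1 Feb 2018
        out[label] = ((out["startdate"] <= ts) & (out["enddate"] >= ts)).astype(int)
    return out


result = add_date_flags(df, dates)
print(result)
```

The broadcasting version builds the whole mask in one step, which tends to matter more as the number of dates grows. For a couple of dozen dates either form should be fine.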