Python – Based on the sum of date ranges in two separate columns

Based on the sum of date ranges in two separate columns… here is a solution to the problem.

Based on the sum of date ranges in two separate columns

I want to sum all values in one column based on a date range in two columns:

Start_Date  Value_to_sum  End_date
2017-12-13    2          2017-12-13
2017-12-13    3          2017-12-16 
2017-12-14    4          2017-12-15
2017-12-15    2          2017-12-15

Simple groupby doesn’t do this because it only adds values for a specific date.

We could do an embedded for loop, but it will run forever:

unique_date = carry. Start_Date.unique()
carry = pd. DataFrame({'Date':unique_date})
carry['total'] = 0
for n in tqdm(range(len(carry))):
    tr = data.loc[data['Start_Date'] >= carry['Date'][n]]
    for i in tr.index:
        if carry['Date'][n] <= tr['End_date'][i]:
                carry['total'][n] += tr['Value_to_sum'][i]

Something like that will work, but like I said it will take a long time.

The expected output is a unique date and total number per day.

It should be

2017-12-13 = 5, 2017-12-14 = 7, 2017-12-15 = 9.

How do I calculate sum based on date range?

Solution

First, group by [“Start_Date”, “End_date”] to save some operations.

from collections import Counter
c = Counter()
df_g = df.groupby(["Start_Date", "End_date"]).sum().reset_index()

def my_counter(row):
    s, v, e = row. Start_Date, row. Value_to_sum, row. End_date
    if s == e:
        c[pd. Timestamp(s, freq="D")] += row. Value_to_sum
    else:
         c.update({date: v for date in pd.date_range(s, e)})

df_g.apply(my_counter, axis=1) 
print(c)
"""
Counter({Timestamp('2017-12-15 00:00:00', freq='D'): 9,
     Timestamp('2017-12-14 00:00:00', freq='D'): 7,
     Timestamp('2017-12-13 00:00:00', freq='D'): 5,
     Timestamp('2017-12-16 00:00:00', freq='D'): 3})
"""

Tools used:

Counter.update([iterable-or-mapping]):
Elements are counted from an iterable or added-in from another mapping (or counter). Like dict.update() but adds counts instead of replacing them. Also, the iterable is expected to be a sequence of elements, not a sequence of (key, value) pairs. — Cited from Python 3 Documentation

pandas.date_range

Related Problems and Solutions