Performing arithmetic on two fields in the Pandas DataFrame constructor
I need to simulate some trading data using numpy and pandas, similar to the code below:
import random
import numpy as np
import pandas as pd

n = 1000
sample_df = pd.DataFrame({
    'arrival_date': np.random.choice(pd.date_range('1/1/2015', periods=n, freq='D'), n),
    'days_stay': [random.randint(1, 14) for x in range(n)]
})
The DataFrame needs three fields: two generated as above, plus a third date field that is the sum of the other two:
'departure_date': 'arrival_date' + 'days_stay'
Note that I would prefer to define all three fields in the DataFrame constructor itself, rather than having to define a function for the last field and then apply it to the DataFrame in a second step.
sample_df = pd.DataFrame({
    'arrival_date': np.random.choice(pd.date_range('1/1/2015', periods=n, freq='D'), n),
    'days_stay': [random.randint(1, 14) for x in range(n)],
    'departure_date': 'arrival_date' + 'days_stay'
})
Is this possible?
Thanks in advance.
Solution
Try the following. By chaining assign onto pd.DataFrame(), we can use the newly created DataFrame and its data to compute a new column.
sample_df = pd.DataFrame({
    'arrival_date': np.random.choice(pd.date_range('1/1/2015', periods=n, freq='D'), n),
    'days_stay': [random.randint(1, 14) for x in range(n)],
}).assign(
    # assign receives the DataFrame built above, so both columns already exist here
    departure_date=lambda df: df.arrival_date
    + df.days_stay.apply(lambda d: pd.Timedelta(str(d) + 'D'))
)
Sample output:
  arrival_date  days_stay departure_date
0   2015-02-17          3     2015-02-20
1   2015-01-18         13     2015-01-31
2   2015-02-12          6     2015-02-18
3   2015-01-15         14     2015-01-29
4   2015-03-11          5     2015-03-16
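
As a possible variation on the answer above, the per-row apply could probably be replaced by a vectorized conversion with pd.to_timedelta, which turns the whole days_stay column into timedeltas in one call. This is only a sketch and not part of the original answer: the fixed seed and the use of np.random.randint in place of random.randint are assumptions made here so that the example is reproducible.

```python
import numpy as np
import pandas as pd

n = 1000
np.random.seed(0)  # assumption: fixed seed, only to make the sketch reproducible

sample_df = pd.DataFrame({
    'arrival_date': np.random.choice(pd.date_range('1/1/2015', periods=n, freq='D'), n),
    # upper bound is exclusive, so this draws whole days from 1 to 14
    'days_stay': np.random.randint(1, 15, size=n),
}).assign(
    # convert the whole integer column to timedeltas at once, then add to the dates
    departure_date=lambda df: df.arrival_date + pd.to_timedelta(df.days_stay, unit='D')
)

print(sample_df.head())
```

The DataFrame is still built in a single chained expression, as the question asks; the only difference is how the day counts are converted before being added to the arrival dates.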