Use Python to filter csv files based on user input
I have a large CSV file like this:
id, Start Time, End Time
0, 2017-01-01 00:00:21, 2017-01-01 00:11:41
1, 2017-01-01 00:00:45, 2017-01-01 00:11:46
2, 2017-02-01 00:00:57, 2017-02-01 00:22:08
3, 2017-03-01 00:01:10, 2017-03-01 00:11:42
4, 2017-01-01 00:01:51, 2017-01-01 00:12:57
It might be easier to do this with pandas, but I don't have much experience with it. I have looked at modules like arrow and datetime, and I want to filter the data based on user input: the user enters a month and the function returns the matching rows. For example:
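import csv
import arrow
def get_month(city_data):
    month = input('\nWhich month? January, February, March, April, May, or June?\n')
    date = '1 ' + month + ', 2017'
    # e.g. '1 January, 2017' -> '2017-01', the prefix of matching Start Time values
    prefix = arrow.get(date, 'D MMMM, YYYY').format('YYYY-MM')
    with open(city_data, 'r', newline='') as fin, open('userdata.csv', 'w', newline='') as fout:
        reader = csv.reader(fin, skipinitialspace=True)  # the file uses ', ' as separator
        writer = csv.writer(fout)
        writer.writerow(next(reader))  # copy the header row
        for row in reader:
            if row[1].startswith(prefix):  # row[1] is the Start Time column
                writer.writerow(row)  # keep going instead of returning after the first match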
Am I doing this right? I think I may have gone in the wrong direction with the date = '1 ' + month + ', 2017' part. Is there a way to filter the data using only an input like January?
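A month name on its own is enough here; there is no need to build a placeholder date such as '1 January, 2017'. As a minimal sketch using only the standard library (no arrow or pandas), and assuming the column layout of the sample above, the hypothetical helper filter_by_month below turns the name into a month number with datetime.strptime and the %B directive, then compares it with the month of each Start Time value:

```python
import csv
import io
from datetime import datetime

def filter_by_month(lines, month_name):
    """Return the header plus every row whose 'Start Time' falls in month_name.

    lines: any iterable of CSV text lines (for example an open file object).
    month_name: a full month name such as 'January' (hypothetical helper).
    """
    # '%B' parses a full month name on its own, so 'January' -> 1
    month = datetime.strptime(month_name.strip(), '%B').month
    # skipinitialspace=True drops the space that follows each comma in the file
    reader = csv.reader(lines, skipinitialspace=True)
    header = next(reader)
    start_col = header.index('Start Time')
    rows = [header]
    for row in reader:
        start = datetime.strptime(row[start_col], '%Y-%m-%d %H:%M:%S')
        if start.month == month:
            rows.append(row)
    return rows

# The sample rows from the question, held in memory instead of a file
sample = io.StringIO(
    "id, Start Time, End Time\n"
    "0, 2017-01-01 00:00:21, 2017-01-01 00:11:41\n"
    "1, 2017-01-01 00:00:45, 2017-01-01 00:11:46\n"
    "2, 2017-02-01 00:00:57, 2017-02-01 00:22:08\n"
    "3, 2017-03-01 00:01:10, 2017-03-01 00:11:42\n"
    "4, 2017-01-01 00:01:51, 2017-01-01 00:12:57\n"
)
january = filter_by_month(sample, 'January')
for row in january:
    print(row)
```

To save the result, the returned rows can be passed to csv.writer(fout).writerows(...) on a file opened with newline=''. Note that %B follows the current locale (English month names in the default C locale), and datetime.strptime raises ValueError for a name it does not recognize, which is a natural place to re-prompt the user.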
Solution
For structured data like this, pandas offers an efficient solution:
from datetime import datetime
import pandas as pd
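# read data from file; skipinitialspace=True drops the space after each comma,
# so the columns are named 'Start Time' and 'End Time' rather than ' Start Time'
df = pd.read_csv('data.csv', skipinitialspace=True)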
# this creates a dataframe as below:
# id Start Time End Time
# 0 0 2017-01-01 00:00:21 2017-01-01 00:11:41
# 1 1 2017-01-01 00:00:45 2017-01-01 00:11:46
# 2 2 2017-02-01 00:00:57 2017-02-01 00:22:08
# 3 3 2017-03-01 00:01:10 2017-03-01 00:11:42
# 4 4 2017-01-01 00:01:51 2017-01-01 00:12:57
# cast string columns to datetime
df['Start Time'] = pd.to_datetime(df['Start Time'])
df['End Time'] = pd.to_datetime(df['End Time'])
def get_month(df):
    month = input('\nWhich month? January, February, March, April, May, or June?\n')
    # '%B' parses a full month name such as 'January' into its number (1)
    return df[df['Start Time'].dt.month == datetime.strptime(month.strip(), '%B').month]
print(get_month(df))
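Because get_month calls input(), it is awkward to test or reuse. A minimal variant that takes the month name as a parameter is sketched below, assuming the sample layout from the question; load and filter_month are hypothetical names, and the sketch swaps the two pd.to_datetime calls for read_csv's parse_dates option, which converts the listed columns while reading:

```python
import io
from datetime import datetime

import pandas as pd

# The sample rows from the question, held in memory instead of data.csv
CSV_TEXT = """id, Start Time, End Time
0, 2017-01-01 00:00:21, 2017-01-01 00:11:41
1, 2017-01-01 00:00:45, 2017-01-01 00:11:46
2, 2017-02-01 00:00:57, 2017-02-01 00:22:08
3, 2017-03-01 00:01:10, 2017-03-01 00:11:42
4, 2017-01-01 00:01:51, 2017-01-01 00:12:57
"""

def load(source):
    # skipinitialspace strips the blank after each comma;
    # parse_dates turns both time columns into datetime64 while reading
    return pd.read_csv(source, skipinitialspace=True,
                       parse_dates=['Start Time', 'End Time'])

def filter_month(df, month_name):
    """Return the rows whose Start Time falls in month_name (e.g. 'January')."""
    month = datetime.strptime(month_name.strip(), '%B').month
    return df[df['Start Time'].dt.month == month]

df = load(io.StringIO(CSV_TEXT))
january = filter_month(df, 'January')
print(january)
```

The interactive version then reduces to filter_month(df, input(...)), and the filtered frame can be written out with january.to_csv('userdata.csv', index=False).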