Python – Use Python to filter csv files based on user input

I have a large CSV file like this:

id, Start Time, End Time
0, 2017-01-01 00:00:21, 2017-01-01 00:11:41
1, 2017-01-01 00:00:45, 2017-01-01 00:11:46
2, 2017-02-01 00:00:57, 2017-02-01 00:22:08
3, 2017-03-01 00:01:10, 2017-03-01 00:11:42
4, 2017-01-01 00:01:51, 2017-01-01 00:12:57

It might be easier to do this with pandas, but I don’t have much experience with it. I have looked at modules like arrow and datetime, and I want to filter the data based on user input: the user enters a month, and the function returns only the rows from that month. For example:

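import csv
import arrow

def get_month(city_data):
    month = input('\nWhich month? January, February, March, April, May, or June?\n')
    date = '1 ' + month + ', 2017'
    # e.g. '1 January, 2017' -> '2017-01'
    target = arrow.get(date, 'D MMMM, YYYY').format('YYYY-MM')
    with open(city_data, 'r', newline='') as fin, open('userdata.csv', 'w', newline='') as fout:
        reader = csv.reader(fin, skipinitialspace=True)  # the file is comma-separated
        writer = csv.writer(fout)
        writer.writerow(next(reader))  # copy the header row
        for row in reader:
            # row[1] is Start Time, e.g. '2017-01-01 00:00:21'; keep every match
            if row[1].startswith(target):
                writer.writerow(row)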

Am I doing this right? I suspect I went in the wrong direction with the line date = '1 ' + month + ', 2017'. Is there a way to filter the data using only an input like January?
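Filtering on a bare month name such as January does not actually require building a full date string. A minimal standard-library sketch, assuming the comma-separated layout shown above with a space after each comma (`filter_by_month` is a hypothetical helper name, not from the original post): parse the input once with the `%B` directive to get a month number, then parse each Start Time with `datetime.strptime` and compare months.

```python
import csv
from datetime import datetime


def filter_by_month(src_path, dst_path, month):
    """Hypothetical helper: copy rows whose 'Start Time' falls in `month`.

    Assumes the layout shown above: a header row, a space after each comma,
    and timestamps like 2017-01-01 00:00:21. Returns the number of rows written.
    """
    # Convert the month name to its number (January -> 1). strptime raises
    # ValueError for anything that is not a full month name.
    target = datetime.strptime(month.strip(), '%B').month
    count = 0
    with open(src_path, newline='') as fin, open(dst_path, 'w', newline='') as fout:
        # skipinitialspace drops the space after each comma, so the keys are
        # 'id', 'Start Time', 'End Time' and the values carry no leading space
        reader = csv.DictReader(fin, skipinitialspace=True)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            start = datetime.strptime(row['Start Time'], '%Y-%m-%d %H:%M:%S')
            if start.month == target:
                writer.writerow(row)
                count += 1
    return count
```

To keep the prompt from the question, it could be called as `filter_by_month('data.csv', 'userdata.csv', input('Which month? '))`; catching the `ValueError` lets the caller ask again when the input is not a month name.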

Solution

For tabular data like this, pandas offers a concise solution:

from datetime import datetime
import pandas as pd

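# read data from file; skipinitialspace=True strips the space after each comma,
# so the columns are named 'Start Time' and 'End Time' rather than ' Start Time'
df = pd.read_csv('data.csv', skipinitialspace=True)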

# this creates a dataframe as below:
#    id           Start Time             End Time
# 0   0  2017-01-01 00:00:21  2017-01-01 00:11:41
# 1   1  2017-01-01 00:00:45  2017-01-01 00:11:46
# 2   2  2017-02-01 00:00:57  2017-02-01 00:22:08
# 3   3  2017-03-01 00:01:10  2017-03-01 00:11:42
# 4   4  2017-01-01 00:01:51  2017-01-01 00:12:57

# cast string columns to datetime
df['Start Time'] = pd.to_datetime(df['Start Time'])
df['End Time'] = pd.to_datetime(df['End Time'])

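def get_month(df):
    month = input('\nWhich month? January, February, March, April, May, or June?\n')
    # convert the month name to its number (January -> 1) and keep matching rows
    month_number = datetime.strptime(month.strip(), '%B').month
    return df[df['Start Time'].dt.month == month_number]

filtered = get_month(df)
print(filtered)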
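Two small extensions may help, sketched under the assumption of a reasonably recent pandas (`Series.dt.month_name()` was added in 0.23): comparing month names directly instead of converting through `strptime`, and writing the filtered rows to `userdata.csv` as the question's code intended. `get_month_df` and `save_month` are hypothetical names, not part of the original answer.

```python
import pandas as pd


def get_month_df(df, month):
    """Return the rows of df whose 'Start Time' falls in `month`, e.g. 'January'.

    Assumes 'Start Time' has already been converted to datetime.
    """
    # dt.month_name() yields full English month names ('January', ...), so the
    # input only needs surrounding whitespace and case normalized.
    wanted = month.strip().capitalize()
    return df[df['Start Time'].dt.month_name() == wanted]


def save_month(src_path, dst_path, month):
    """Hypothetical helper: filter src_path by month and write the result to dst_path."""
    df = pd.read_csv(src_path, skipinitialspace=True,
                     parse_dates=['Start Time', 'End Time'])
    filtered = get_month_df(df, month)
    filtered.to_csv(dst_path, index=False)  # index=False: no extra row-number column
    return filtered
```

An unrecognized month name simply yields an empty DataFrame here rather than raising, which may or may not be what you want for user input.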
