Read the SQL file and use CountVectorizer to get the number of word occurrences
I want to read a SQL file and get the number of word occurrences using CountVectorizer.
So far I have the following code:
import re
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
df = pd.read_sql(q, dlconn)
print(df)
count_vect = CountVectorizer()
X_train_counts= count_vect.fit_transform(df)
print(X_train_counts.shape)
print(count_vect.vocabulary_)
This gives the output 'cat': 1, 'dog': 0
It seems to just take the name of the animal and start counting from there.
How do I get it to access the full column and get a graph showing each word in the column and its frequency?
Solution
According to the CountVectorizer docs, the fit_transform() method requires an iterable of strings. It cannot handle DataFrames directly.
Iterating over a DataFrame yields the column labels, not the values, so the vectorizer only ever sees the column names. I suggest you try …