Python – Read a SQL file and use CountVectorizer to get the number of word occurrences

I want to read a SQL file and get the number of word occurrences using CountVectorizer.

So far I have the following code:

import re
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

df = pd.read_sql(q, dlconn)
print(df)

count_vect = CountVectorizer()
X_train_counts= count_vect.fit_transform(df)

print(X_train_counts.shape)
print(count_vect.vocabulary_)

This gives the output 'cat': 1, 'dog': 0

It seems to just take the name of the animal and start counting from there.

How do I get it to access the full column and get a graph showing each word in the column and its frequency?

Solution

According to the CountVectorizer docs, fit_transform() expects an iterable of strings (one string per document).
It cannot handle a DataFrame directly.

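But iterating over a DataFrame yields its column labels, not its values, which is why the vocabulary contains only the column names. I suggest you try passing the text column itself (a Series of strings) to fit_transform() instead.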
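
A minimal sketch of that idea follows. It is not the original answer's code. The small DataFrame stands in for the result of pd.read_sql(q, dlconn), and the column name "description" is an assumption made for illustration. It first shows what iterating a DataFrame yields, then fits the vectorizer on a single column and sums the counts per word.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for pd.read_sql(q, dlconn); the column name is assumed.
df = pd.DataFrame(
    {"description": ["the cat sat", "the dog chased the cat", "a dog barked"]}
)

# Iterating a DataFrame yields column labels, not cell values.
print(list(df))  # ['description']

count_vect = CountVectorizer()
# Pass the column (an iterable of strings), not the whole DataFrame.
X_counts = count_vect.fit_transform(df["description"])

# Sum each column of the document-term matrix: total occurrences per word.
totals = np.asarray(X_counts.sum(axis=0)).ravel()

# vocabulary_ maps each word to its column index in the matrix.
word_freq = pd.Series(
    {word: int(totals[idx]) for word, idx in count_vect.vocabulary_.items()}
).sort_values(ascending=False)

print(word_freq)
```

Note that the default tokenizer ignores single-character tokens, so "a" is not counted here. If matplotlib is installed, word_freq.plot.bar() should draw the frequency chart the question asks for.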
