Python - Read a SQL file and use CountVectorizer to get the number of word occurrences

I want to read a SQL file and get the number of word occurrences using CountVectorizer.

So far I have the following code:

import re
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

df = pd.read_sql(q, dlconn)
print(df)

count_vect = CountVectorizer()
X_train_counts= count_vect.fit_transform(df)

print(X_train_counts.shape)
print(count_vect.vocabulary_)

This gives the output 'cat': 1, 'dog': 0

It seems to just take the name of the animal and start counting from there.

How do I get it to access the full column and get a graph showing each word in the column and its frequency?

Solution

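According to the CountVectorizer docs, fit_transform() expects an iterable of strings.
It cannot handle a DataFrame directly.

Iterating over a DataFrame yields the column labels, not the values, which is why only the column names end up in the vocabulary. I suggest you pass the values of the text column to fit_transform() instead of the whole DataFrame.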
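The original snippet for this answer is not available, so the following is only a minimal sketch of the idea, not the answerer's code. It assumes the query result has a text column; the column name "description" and the sample rows are made up for illustration and stand in for the result of pd.read_sql(q, dlconn).

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for df = pd.read_sql(q, dlconn);
# the column names and rows are invented for this example.
df = pd.DataFrame({
    "animal": ["cat", "dog", "cat"],
    "description": ["the cat sat", "the dog ran", "the cat ran"],
})

# Iterating over a DataFrame yields its column labels, not its rows.
print(list(df))  # ['animal', 'description']

count_vect = CountVectorizer()
# Pass the column's values (an iterable of strings), not the DataFrame.
X_counts = count_vect.fit_transform(df["description"].astype(str))

# Sum each column of the sparse document-term matrix to get the
# total number of occurrences of each word across all rows.
totals = np.asarray(X_counts.sum(axis=0)).ravel()

# vocabulary_ maps each word to its column index, not to a count,
# so use the index to look up the total for that word.
word_counts = pd.Series(
    {word: int(totals[idx]) for word, idx in count_vect.vocabulary_.items()}
).sort_values(ascending=False)

print(word_counts)
```

Note that the numbers in vocabulary_ are column indices in the matrix, which is why the output in the question looks like counts but is not. For the graph, something like word_counts.plot(kind="bar") should work if matplotlib is installed.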
