pickle.PicklingError: args[0] from __newobj__ args has the wrong class with hadoop python
I’m trying to remove stop words with Spark; the code is as follows:
from nltk.corpus import stopwords
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession

sc = SparkContext('local')
spark = SparkSession(sc)

word_list = ["ourselves", "out", "over", "own", "same", "shan't", "she", "she'd", "what", "the", "fuck", "is", "this", "world", "too", "who", "who's", "whom", "yours", "yourself", "yourselves"]
wordlist = spark.createDataFrame([word_list]).rdd

def stopwords_delete(word_list):
    filtered_words = []
    print(word_list)
    for word in word_list:
        print(word)
        if word not in stopwords.words('english'):
            filtered_words.append(word)
    return filtered_words

filtered_words = wordlist.map(stopwords_delete)
print(filtered_words)
I get an error like this:
pickle.PicklingError: args[0] from __newobj__ args has the wrong class
I don’t know why this happens; can anyone help?
Thanks in advance.
Solution
This is related to how the stopwords module is shipped to the workers: the top-level NLTK import gets pickled along with the function and fails. As a workaround, import the stopword list inside the function itself. Similar issues have been reported.
I had the same issue, and this workaround solves the problem:
def stopwords_delete(word_list):
    # Import inside the function so the stopword corpus is loaded on the
    # worker instead of being pickled with the closure.
    from nltk.corpus import stopwords
    filtered_words = []
    for word in word_list:
        if word not in stopwords.words('english'):
            filtered_words.append(word)
    return filtered_words
As a permanent fix, I would recommend Spark's built-in transformer instead: from pyspark.ml.feature import StopWordsRemover. It ships its own stop-word list, so nothing from NLTK has to be serialized to the executors.