Python – pickle.PicklingError: args[0] from __newobj__ args has the wrong class with Hadoop Python

pickle.PicklingError: args[0] from __newobj__ args has the wrong class with Hadoop Python… here is a solution to the problem.


I’m trying to remove stop words via Spark; the code is as follows:

from nltk.corpus import stopwords
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession

sc = SparkContext('local')
spark = SparkSession(sc)
word_list=["ourselves","out","over", "own", "same" ,"shan't" ,"she", "she'd", "what", "the", "fuck", "is", "this","world","too","who","who's","whom","yours","yourself"," yourselves"]

wordlist=spark.createDataFrame([word_list]).rdd

def stopwords_delete(word_list):
    filtered_words = []
    print(word_list)
    for word in word_list:
        print(word)
        if word not in stopwords.words('english'):
            filtered_words.append(word)
    return filtered_words

filtered_words = wordlist.map(stopwords_delete)
print(filtered_words)

I get an error like this:

pickle.PicklingError: args[0] from __newobj__ args has the wrong class

I don’t know why; can anyone help me?
Thanks in advance.

Solution

This is related to shipping the stopword module to the workers: Spark pickles the mapped function together with every global it references, and NLTK’s lazily loaded stopwords corpus does not pickle cleanly. As a workaround, import the stopword list inside the function itself, so each worker performs the import locally. See the similar issue linked below.
I had the same issue, and this workaround solved the problem.

    def stopwords_delete(word_list):
        # Import inside the function so each worker loads NLTK locally
        from nltk.corpus import stopwords
        filtered_words = []
        for word in word_list:
            if word not in stopwords.words('english'):
                filtered_words.append(word)
        return filtered_words
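
For completeness, here is a minimal end-to-end sketch of the workaround, assuming the same local SparkContext and DataFrame-to-RDD setup as in the question (the word list is shortened here for brevity):

    from pyspark.context import SparkContext
    from pyspark.sql.session import SparkSession

    sc = SparkContext('local')
    spark = SparkSession(sc)

    def stopwords_delete(word_list):
        # Imported per worker, so the NLTK loader is never pickled
        from nltk.corpus import stopwords
        return [w for w in word_list if w not in stopwords.words('english')]

    words = ["ourselves", "out", "what", "the", "world", "yours"]
    wordlist = spark.createDataFrame([words]).rdd
    filtered = wordlist.map(stopwords_delete)
    print(filtered.collect())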

Similar Issue

As a permanent fix, I would recommend Spark’s built-in StopWordsRemover (from pyspark.ml.feature import StopWordsRemover).
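
As a rough sketch of that approach, assuming the tokens already live in an array column (the column names words and filtered are just placeholders):

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import StopWordsRemover

    spark = SparkSession.builder.master('local').getOrCreate()

    df = spark.createDataFrame([(["what", "is", "this", "world"],)], ["words"])

    # Uses Spark's built-in English stop word list; no NLTK pickling involved
    remover = StopWordsRemover(inputCol="words", outputCol="filtered")
    remover.transform(df).show(truncate=False)

Because the stop word list lives inside Spark itself, nothing from NLTK has to be serialized to the workers, which avoids this class of PicklingError entirely.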
