Python - Pandas Series.apply cannot consist of strings

This seems to be related to the Japanese language, so I also asked on Japanese Stack Overflow:
https://ja.stackoverflow.com/questions/41031/pandas%E3%81%AEapply%E3%83%A1%E3%82%BD%E3%83%83%E3%83%89%E3%82%92%E4%BD%BF%E3%81%A3%E3%81%A6%E5%88%97%E3%81%AE%E6%96%87%E5%AD%97%E5%88%97%E3%81%AB%E5%AF%BE%E3%81%97%E3%81%A6mecab%E3%81%A7%E5%BD%A2%E6%85%8B%E7%B4%A0%E8%A7%A3%E6%9E%90%E3%82%92%E3%81%97%E3%81%9F%E3%81%84

It works fine when I pass a plain string object.

I tried debugging the code, but I can't find the cause of this error.
Can you give me some advice?

MeCab is an open-source text segmentation library for Japanese text, originally developed by the Nara Institute of Science and Technology and currently maintained by Taku Kudou as part of his work on the Google Japanese Input project.
https://en.wikipedia.org/wiki/MeCab

sample.csv

0,今日も夜まで働きました。
1,オフィスには誰もいませんが、エラーと格闘中
2,デバッグばかりしていますが、どうにもなりません。

This is my pandas code (Python 3):

import pandas as pd
import MeCab  
# https://en.wikipedia.org/wiki/MeCab
from tqdm import tqdm_notebook as tqdm
# This is working...
df = pd.read_csv('sample.csv', encoding='utf-8')

m = MeCab.Tagger ("-Ochasen")

text = "りんごを食べました、そして、みかんも食べました"
a = m.parse(text)

print(a)# working! 

# But I want to use Pandas's Series

def extractKeyword(text):
    """Morphological analysis of text and returning a list of only nouns"""
    tagger = MeCab.Tagger('-Ochasen')
    node = tagger.parseToNode(text)
    keywords = []
    while node:
        if node.feature.split(",")[0] == u"名詞": # this means noun
            keywords.append(node.surface)
        node = node.next
    return keywords

aa = extractKeyword(text) #working!!

me = df.apply(lambda x: extractKeyword(x))

#TypeError: ("in method 'Tagger_parseToNode', argument 2 of type 'char const *'", 'occurred at index 0')

This is the output and the full traceback:

りんご リンゴ りんご 名詞-一般
を ヲ を 助詞-格助詞-一般
食べ タベ 食べる 動詞-自立 一段 連用形
まし マシ ます 助動詞 特殊・マス 連用形
た タ た 助動詞 特殊・タ 基本形
、 、 、 記号-読点
そして ソシテ そして 接続詞
、 、 、 記号-読点
みかん ミカン みかん 名詞-一般
も モ も 助詞-係助詞
食べ タベ 食べる 動詞-自立 一段 連用形
まし マシ ます 助動詞 特殊・マス 連用形
た タ た 助動詞 特殊・タ 基本形
EOS

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-174-81a0d5d62dc4> in <module>()
    32 aa = extractKeyword(text) #working!!
    33 
---> 34 me = df.apply(lambda x: extractKeyword(x))

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
4260                         f, axis,
4261                         reduce=reduce,
-> 4262                         ignore_failures=ignore_failures)
4263             else:
4264                 return self._apply_broadcast(f, axis)

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
4356             try:
4357                 for i, v in enumerate(series_gen):
-> 4358                     results[i] = func(v)
4359                     keys.append(v.name)
4360             except Exception as e:

<ipython-input-174-81a0d5d62dc4> in <lambda>(x)
    32 aa = extractKeyword(text) #working!!
    33 
---> 34 me = df.apply(lambda x: extractKeyword(x))

<ipython-input-174-81a0d5d62dc4> in extractKeyword(text)
    20     """Morphological analysis of text and returning a list of only nouns"""
    21     tagger = MeCab.Tagger('-Ochasen')
---> 22     node = tagger.parseToNode(text)
    23     keywords = []
    24     while node:

~/anaconda3/lib/python3.6/site-packages/MeCab.py in parseToNode(self, *args)
    280     __repr__ = _swig_repr
    281     def parse(self, *args): return _MeCab.Tagger_parse(self, *args)
--> 282     def parseToNode(self, *args): return _MeCab.Tagger_parseToNode(self, *args)
    283     def parseNBest(self, *args): return _MeCab.Tagger_parseNBest(self, *args)
    284     def parseNBestInit(self, *args): return _MeCab.Tagger_parseNBestInit(self, *args)

TypeError: ("in method 'Tagger_parseToNode', argument 2 of type 'char const *'", 'occurred at index 0')

Solution

I see you got some help on the Japanese Stack Overflow, but here's an answer in English:

The first thing to fix is that read_csv treats the first line of sample.csv as a header, so the first sentence is lost. To resolve this, pass the column names through the names parameter of read_csv.
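To make the header issue concrete, here is a minimal sketch that reads two illustrative rows from an in-memory string instead of the real file. The io.StringIO stand-in, the two rows, and the column names Number and String are assumptions for the sake of the example.

```python
import io
import pandas as pd

# In-memory stand-in for a header-less CSV file (illustrative rows only).
csv_text = "0,りんごを食べました\n1,みかんも食べました\n"

# Default behaviour: the first line is consumed as the header row.
df_default = pd.read_csv(io.StringIO(csv_text))

# Passing names labels the columns and keeps every line as data.
df_named = pd.read_csv(io.StringIO(csv_text), names=["Number", "String"])

print(len(df_default), list(df_default.columns))
print(len(df_named), list(df_named.columns))
```

With the default call one row disappears into the column labels; with names, both rows remain and the text column can be addressed as df_named["String"].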

Next, df.apply applies the function to each column of the DataFrame by default, so extractKeyword receives a whole Series rather than a single string. You could try something like df.apply(lambda x: extractKeyword(x['String']), axis=1), but this won't work because each sentence has a different number of nouns, and pandas will complain that it can't stack a 1×2 array on top of a 1×5 array. The easiest way is to call apply on the String column (a Series) instead.
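The difference between the two apply calls can be seen without MeCab. In this sketch, describe_arg is a hypothetical stand-in for extractKeyword that only reports the type of whatever it is given, and the small DataFrame is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Number": [0, 1],
    "String": ["りんごを食べました", "みかんも食べました"],
})

def describe_arg(value):
    """Hypothetical stand-in for extractKeyword: report the argument's type."""
    return type(value).__name__

# DataFrame.apply (default axis=0) calls the function once per column
# and passes the whole column as a pandas Series.
per_column = df.apply(describe_arg)

# Series.apply calls the function once per element and passes a plain str.
per_element = df["String"].apply(describe_arg)

print(per_column.tolist())
print(per_element.tolist())
```

The first call hands the function Series objects, which MeCab cannot accept as text; the second hands it one str at a time, which is what parseToNode expects.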

The last problem is a bug in the MeCab Python 3 binding; see https://github.com/SamuraiT/mecab-python3/issues/3. You found a workaround by running parseToNode twice; calling parse before parseToNode also works.

Putting these three things together:

import pandas as pd
import MeCab  
df = pd.read_csv('sample.csv', encoding='utf-8', names=['Number', 'String'])

def extractKeyword(text):
    """Morphological analysis of text and returning a list of only nouns"""
    tagger = MeCab.Tagger('-Ochasen')
    tagger.parse(text)
    node = tagger.parseToNode(text)
    keywords = []
    while node:
        if node.feature.split(",")[0] == u"名詞": # this means noun
            keywords.append(node.surface)
        node = node.next
    return keywords

me = df['String'].apply(extractKeyword)
print(me)

Running this script against sample.csv gives:

➜  python3 demo.py
0                    [今日, 夜]
1 [オフィス, 誰, エラー, 格闘, 中]
2                   [デバッグ]
Name: String, dtype: object
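One possible refinement, offered as a sketch rather than as part of the original answer: the function above builds a new Tagger for every row. A variant that takes the tagger as an argument lets a single instance be reused. The name extract_nouns and its noun_tag parameter are hypothetical; the tagger is only assumed to expose parse() and parseToNode() the way MeCab.Tagger does.

```python
def extract_nouns(tagger, text, noun_tag="名詞"):
    """Return the surface forms of nodes whose first feature field is noun_tag.

    `tagger` is assumed to behave like MeCab.Tagger (parse / parseToNode).
    """
    # Workaround from the answer above: call parse() before parseToNode().
    tagger.parse(text)
    node = tagger.parseToNode(text)
    keywords = []
    while node:
        # The feature string is comma-separated; field 0 is the part of speech.
        if node.feature.split(",")[0] == noun_tag:
            keywords.append(node.surface)
        node = node.next
    return keywords
```

With MeCab installed, this could be used as tagger = MeCab.Tagger('-Ochasen') followed by df['String'].apply(lambda s: extract_nouns(tagger, s)), so the dictionary is loaded once rather than once per row.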
