How to properly tokenize a column in pandas?
I am trying to tokenize a dataset of social media comments. I want to tokenize, lemmatize, and remove punctuation and stop words from a pandas column, but I am struggling to do this for each comment. I receive the following error when trying to get tokens:
import pandas as pd
import nltk
...
merged['message_tokens'] = merged.apply(lambda x: nltk.tokenize.word_tokenize(x['Clean_message']), axis=1)
TypeError: expected string or bytes-like object
When I try to tell pandas that I am passing it a string object, I get the following error message:
merged['message_tokens'] = merged.apply(lambda x: nltk.tokenize.word_tokenize(x['Clean_message'].str), axis=1)
AttributeError: 'str' object has no attribute 'str'
What am I doing wrong?
Comments (1)
You can use astype to force the column type to string.
If you want to look at what's wrong in the original column, you can build an out dataframe that contains the rows where the type of the Clean_message column is not string.
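
The code from this answer did not survive on this page, so the following is only a sketch of what the two suggestions might look like, not the answerer's exact code. The sample dataframe is hypothetical and stands in for the asker's merged. The TypeError usually means at least one value in Clean_message is not a string (for example a missing value or a number). The fillna("") step is an added assumption, used so that missing values become empty strings rather than text such as "nan".

```python
import pandas as pd

# Hypothetical stand-in for the asker's `merged` dataframe.
merged = pd.DataFrame(
    {"Clean_message": ["great post", None, 42, "thanks a lot"]}
)

# Inspect the problem first: keep the rows whose Clean_message
# value is not a Python str.
is_str = merged["Clean_message"].apply(lambda v: isinstance(v, str))
out = merged[~is_str]

# Then force the column to string. Missing values are replaced with
# an empty string first (an assumption, not part of the answer).
merged["Clean_message"] = merged["Clean_message"].fillna("").astype(str)

# With every value now a string, tokenizing per comment should work,
# assuming nltk and its tokenizer data are installed:
# import nltk
# merged["message_tokens"] = merged["Clean_message"].apply(
#     nltk.tokenize.word_tokenize
# )
```

The second attempt in the question fails for a different reason: with axis=1, x['Clean_message'] is a single value rather than a Series, so it has no .str accessor.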