Removing stop words using the pandas library

Posted on 2025-01-16 03:19:29 | 76 words | 2 views | 0 comments

I want to remove stop words from a column and put the remaining words from that column into a new column. This should be done using only the pandas library. The stop words are stored in a dictionary. The operation should be applied to every row of the column.


Comments (1)

桜花祭 2025-01-23 03:19:29

I would store the stop words in a list rather than a dict:

import pandas as pd

# Sample DataFrame
df = pd.DataFrame(['some sentence with a few stopwords the', 'another sentence with other stopwords the a or'], columns=['col1'])

# Say you have the following stop words
stopword_list = ['the', 'a', 'or']

# Copy the original column, splitting each row into a list of words
df['col2'] = df['col1'].str.split()

# Remove the stop words from each list
df['col2'] = df['col2'].apply(lambda x: [i for i in x if i not in stopword_list])

# Join each list back into a sentence
df['col2'] = df['col2'].apply(lambda x: ' '.join(x))
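
If the stop words have to stay in a dictionary, as the question describes, the same idea should still work by testing membership against the dictionary's keys. The following is only a minimal sketch: `stopword_dict`, its placeholder values, and the helper `remove_stopwords` are hypothetical names, since the question does not show what the dictionary actually contains.

```python
import pandas as pd

# Hypothetical stop-word dictionary: the keys are the stop words, the values
# are placeholders (the question does not show the real contents).
stopword_dict = {'the': 1, 'a': 1, 'or': 1}

# Iterating a dict yields its keys; a set makes the membership test explicit.
stopwords = set(stopword_dict)

df = pd.DataFrame(
    ['some sentence with a few stopwords the',
     'another sentence with other stopwords the a or'],
    columns=['col1'],
)

def remove_stopwords(text):
    """Return text with every whitespace-separated stop word dropped."""
    return ' '.join(word for word in text.split() if word not in stopwords)

# Apply the helper to every row and store the result in a new column.
df['col2'] = df['col1'].apply(remove_stopwords)
print(df['col2'].tolist())
```

This version splits, filters, and rejoins in a single pass per row. Matching is case-sensitive and the sketch assumes every row of col1 is a string; missing values would need to be handled first.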
