Counting word frequency in sentences
I have two columns - one with sentences and the other with single words.
Sentence | word |
---|---|
"Such a day! It's a beautiful day out there" | "beautiful" |
"Such a day! It's a beautiful day out there" | "day" |
"I am sad by the sad weather" | "weather" |
"I am sad by the sad weather" | "sad" |
For each row, I want to count how many times the value in the "word" column occurs in the "Sentence" column,
and achieve this output:
Sentence | word | n |
---|---|---|
"Such a day! It's a beautiful day out there" | "beautiful" | 1 |
"Such a day! It's a beautiful day out there" | "day" | 2 |
"I am sad by the sad weather" | "weather" | 1 |
"I am sad by the sad weather" | "sad" | 2 |
I tried:
    ok = []
    for l in [x.split() for x in df['Sentence']]:
        for y in df['word']:
            ok.append(l.count(y))
However, it takes a very long time and never seems to finish, so it is not feasible for my actual dataset, which has 50k rows.
Can anyone help me achieve this?
Comments (3)
You can do it with zip.
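A minimal sketch of the zip approach, assuming the column names Sentence and word from the question. The helper count_word and its tokenization rule (lowercased runs of word characters or apostrophes) are assumptions made for illustration; a plain split() would leave "day!" as one token and miss it. Pairing each sentence with its own word keeps the work linear in the number of rows, instead of comparing every sentence against every word.

```python
import re
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Sentence": [
        "Such a day! It's a beautiful day out there",
        "Such a day! It's a beautiful day out there",
        "I am sad by the sad weather",
        "I am sad by the sad weather",
    ],
    "word": ["beautiful", "day", "weather", "sad"],
})

def count_word(sentence, word):
    # Hypothetical helper: split the sentence into lowercase tokens,
    # ignoring punctuation such as "!", then count exact token matches.
    tokens = re.findall(r"[\w']+", sentence.lower())
    return tokens.count(word.lower())

# zip pairs each sentence with the word on the same row
df["n"] = [count_word(s, w) for s, w in zip(df["Sentence"], df["word"])]
print(df)
```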
Try using pandas.apply.
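One way this could look, as a sketch rather than the commenter's exact code: DataFrame.apply with axis=1 calls a function once per row. The function count_in_row and the choice of a case-insensitive whole-word regex are assumptions.

```python
import re
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Sentence": [
        "Such a day! It's a beautiful day out there",
        "Such a day! It's a beautiful day out there",
        "I am sad by the sad weather",
        "I am sad by the sad weather",
    ],
    "word": ["beautiful", "day", "weather", "sad"],
})

def count_in_row(row):
    # Hypothetical helper: count whole-word, case-insensitive matches
    # of this row's word inside this row's sentence.
    pattern = r"\b" + re.escape(row["word"]) + r"\b"
    return len(re.findall(pattern, row["Sentence"], flags=re.IGNORECASE))

# axis=1 passes each row to the function as a Series
df["n"] = df.apply(count_in_row, axis=1)
print(df)
```

Row-wise apply is usually slower than a list comprehension over zip, but it still does one count per row, so 50k rows should be manageable.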
You can count the occurrences of a string within a string using the code below.
Output:
The count of the word string is: 2
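A minimal sketch that reproduces the output quoted above, assuming the built-in str.count method is what was meant; the example text is made up for illustration.

```python
# Hypothetical example text; only the output line comes from the answer
text = "A string inside a longer string"
word = "string"

# str.count returns the number of non-overlapping occurrences of the substring
n = text.count(word)
print(f"The count of the word {word} is: {n}")
```

Note that str.count matches substrings, so counting "day" would also match inside "today"; for whole-word counts, tokenize the sentence first.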