Count word frequency in sentences
I have two columns - one with sentences and the other with single words.
Sentence | word |
---|---|
"Such a day! It's a beautiful day out there" | "beautiful" |
"Such a day! It's a beautiful day out there" | "day" |
"I am sad by the sad weather" | "weather" |
"I am sad by the sad weather" | "sad" |
I want to count how many times each value in the "word" column occurs in the "Sentence" column of the same row,
and achieve this output:
Sentence | word | n |
---|---|---|
"Such a day! It's a beautiful day out there" | "beautiful" | 1 |
"Such a day! It's a beautiful day out there" | "day" | 2 |
"I am sad by the sad weather" | "weather" | 1 |
"I am sad by the sad weather" | "sad" | 2 |
I tried:
    ok = []
    for l in [x.split() for x in df['Sentence']]:
        for y in df['word']:
            ok.append(l.count(y))
However, it does not finish running and takes a very long time, so it is not feasible for my actual dataset, which has 50k rows.
Can anyone help me achieve this?
Comments (3)
You can do it with `zip`:
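A minimal sketch of the `zip` idea, assuming the column names `Sentence` and `word` from the question. The helper `count_word` and the regex tokenization are illustrative assumptions, not code from the answer. Pairing each sentence with the word on the same row keeps the work at one pass over the rows, whereas the nested loops in the question compare every sentence with every word (50k × 50k comparisons).

```python
import re
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Sentence": [
        "Such a day! It's a beautiful day out there",
        "Such a day! It's a beautiful day out there",
        "I am sad by the sad weather",
        "I am sad by the sad weather",
    ],
    "word": ["beautiful", "day", "weather", "sad"],
})

def count_word(sentence, word):
    """Hypothetical helper: count whole-word, case-insensitive matches."""
    # Tokenize on word characters so "day!" is counted as "day";
    # a plain str.split() would keep the punctuation attached.
    tokens = re.findall(r"\w+", sentence.lower())
    return tokens.count(word.lower())

# zip pairs each sentence with the word from the same row
df["n"] = [count_word(s, w) for s, w in zip(df["Sentence"], df["word"])]
print(df[["word", "n"]])
```

For the sample rows this gives `n` values of 1, 2, 1 and 2, matching the desired output in the question.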
Try using `pandas.apply`:
Result:
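A sketch of what an `apply`-based version might look like, assuming the `Sentence` and `word` columns from the question. The function name `count_in_row` and the word-boundary regex are assumptions made for illustration. `DataFrame.apply` with `axis=1` calls the function once per row, so each word is only checked against its own sentence.

```python
import re
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Sentence": [
        "Such a day! It's a beautiful day out there",
        "Such a day! It's a beautiful day out there",
        "I am sad by the sad weather",
        "I am sad by the sad weather",
    ],
    "word": ["beautiful", "day", "weather", "sad"],
})

def count_in_row(row):
    """Hypothetical helper: whole-word matches of row['word'] in row['Sentence']."""
    # \b anchors keep "day" from matching inside a longer word such as "today";
    # re.escape guards against regex metacharacters in the word.
    pattern = r"\b" + re.escape(row["word"]) + r"\b"
    return len(re.findall(pattern, row["Sentence"], flags=re.IGNORECASE))

# axis=1 passes each row (as a Series) to the function
df["n"] = df.apply(count_in_row, axis=1)
print(df)
```

On the sample data the new `n` column holds 1, 2, 1 and 2. Row-wise `apply` is usually slower than a list comprehension over `zip`, but it should still be manageable for 50k rows.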
You can count the occurrences of a string within another string using the code below.
Output:
The count of the word string is: 2
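Code along the following lines would print that output. This is a minimal sketch using Python's built-in `str.count`, and the sample text is a made-up assumption.

```python
# Hypothetical sample text containing the word "string" twice
text = "Counting a string inside another string"
word = "string"

# str.count returns the number of non-overlapping substring occurrences
count = text.count(word)
print("The count of the word", word, "is:", count)
```

Note that `str.count` matches substrings rather than whole words, so counting `"day"` would also match inside `"today"`. For whole-word counts, tokenizing the sentence or using a word-boundary regex is the safer choice.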