R - How to count all words in a row of a df and add the output to a new column? Ideally using tidy or tidytext
I'm trying to find the location of words in a text, and also the total word count of the same text.
library(tidyverse)
library(tidytext)

txt <- tibble(text = c(
  "we're meeting here today to talk about our earnings. we will also discuss global_warming.",
  "hi all, global_warming and the on-going strike is at the top of our agenda, because unionizing threatens our revenue goals.",
  "we will discuss global_warming tomorrow, today the focus is our Q3 earnings"
))
dict <- tibble(words = c("global_warming"))

# Tokenize each text, number the tokens within it, then keep only the dictionary words
x <- txt %>%
  unnest_tokens(output = "words", input = "text", drop = FALSE) %>%
  group_by(text) %>%
  mutate(word_loc = row_number()) %>%
  ungroup() %>%
  inner_join(dict, by = "words")
This gives me the following output:
# A tibble: 3 x 3
  text                                                                                        words        word_loc
  <chr>                                                                                       <chr>           <int>
1 we're meeting here today to talk about our earnings. we will also discuss global_warming.   global_warm…       14
2 hi all, global_warming and the on-going strike is at the top of our agenda, because unioni… global_warm…        3
3 we will discuss global_warming tomorrow, today the focus is our Q3 earnings                 global_warm…        4
How can I add a column that gives the total word count for each row?
We can use str_count (from stringr) to get the total number of words in each string, where the pattern \\S+ matches every sequence of non-space characters. Base R offers another way to get the same count.
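A minimal sketch of both approaches, assuming the txt tibble from the question. The column name total_words and the base R call lengths(gregexpr(...)) are illustrative choices, not necessarily the exact code meant here.

```r
library(dplyr)
library(stringr)
library(tibble)

txt <- tibble(text = c(
  "we're meeting here today to talk about our earnings. we will also discuss global_warming.",
  "hi all, global_warming and the on-going strike is at the top of our agenda, because unionizing threatens our revenue goals.",
  "we will discuss global_warming tomorrow, today the focus is our Q3 earnings"
))

# stringr: count the runs of non-whitespace characters in each string
txt_counts <- txt %>%
  mutate(total_words = str_count(text, "\\S+"))

# base R: gregexpr() returns the match positions for each string and
# lengths() counts them (note: a string with no match would still count as 1)
base_counts <- lengths(gregexpr("\\S+", txt$text))

txt_counts$total_words  # 14 20 12
base_counts             # 14 20 12
```

Both versions treat anything separated by whitespace as one word, so "we're" and "on-going" each count once and trailing punctuation does not matter.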
Or, if the parts of contractions and hyphenated words should each be counted (so that "we're" counts as two words), you can use \\w+ instead.
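To see how the two patterns differ, and how the count could be carried through to the word_loc output from the question, here is a hedged sketch. It assumes the txt and dict objects from the question; the names w_counts, s_counts and total_words are made up for illustration.

```r
library(dplyr)
library(stringr)
library(tibble)
library(tidytext)

txt <- tibble(text = c(
  "we're meeting here today to talk about our earnings. we will also discuss global_warming.",
  "hi all, global_warming and the on-going strike is at the top of our agenda, because unionizing threatens our revenue goals.",
  "we will discuss global_warming tomorrow, today the focus is our Q3 earnings"
))
dict <- tibble(words = "global_warming")

# \\w+ matches runs of letters, digits and underscores, so "we're" and
# "on-going" each contribute two matches; \\S+ counts them once
w_counts <- str_count(txt$text, "\\w+")  # 15 21 12
s_counts <- str_count(txt$text, "\\S+")  # 14 20 12

# Compute the count before tokenizing: unnest_tokens() carries the extra
# column along to every token row (drop = FALSE keeps the text column too)
x <- txt %>%
  mutate(total_words = str_count(text, "\\S+")) %>%
  unnest_tokens(output = words, input = text, drop = FALSE) %>%
  group_by(text) %>%
  mutate(word_loc = row_number()) %>%
  ungroup() %>%
  inner_join(dict, by = "words")
```

Here x keeps one row per dictionary hit, with word_loc and total_words side by side. Swap in "\\w+" if the split parts should be counted separately.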