Generating word counts for all words in a *.txt file in R

Posted 2025-01-28 06:45:15


I generated 10,000 random words with a Lorem Ipsum generator, saved them as a .txt file, and then wrote the following code:

R Code:

art <- read.delim(file.choose())   # choose the .txt file on the local machine (header = TRUE by default)

art_u <- unlist(art)   # flatten the data frame column into a character vector of lines

art_split <- strsplit(art_u, split = " ", fixed = TRUE)   # split each line into words

art_sep <- c()   # empty vector to collect the split words
for (i in art_split) {art_sep <- c(art_sep, i)}   # append each line's words to the vector

art_fac <- factor(art_sep)   # convert the words to a factor
art_sum <- summary(art_fac)   # summarise the factor as counts per word
art_wc_df <- as.data.frame(art_sum)   # turn the result into a data frame
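For comparison, here is a minimal sketch of a more compact way to build the same word vector. It assumes the file is plain text with words separated by single spaces and line breaks, and `read_words()` is a hypothetical helper name. `readLines()` returns every line of the file, whereas `read.delim()` defaults to `header = TRUE`, so the first line of the file is used as the column name instead of being counted. `unlist(strsplit(...))` replaces the loop that grows `art_sep` one element at a time.

```r
# Hypothetical helper: read a plain-text file and return one element per word.
# Assumes words are separated by single spaces and line breaks.
read_words <- function(path) {
  lines <- readLines(path, warn = FALSE)     # every line; no header row is consumed
  words <- unlist(strsplit(lines, split = " ", fixed = TRUE), use.names = FALSE)
  words[nzchar(words)]                       # drop "" entries left by repeated spaces
}

# Usage with a small temporary file standing in for the Lorem Ipsum text
tmp <- tempfile(fileext = ".txt")
writeLines(c("Lorem ipsum dolor sit amet", "ipsum  dolor"), tmp)
art_sep <- read_words(tmp)
art_sep
```

In the original workflow the call would be `read_words(file.choose())`. Lorem Ipsum text also contains punctuation and capitalised words, so "amet" and "amet," are counted separately unless the words are cleaned first, for example with `tolower(gsub("[[:punct:]]", "", words))`.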

In the resulting data frame, the first 99 rows are individual words, but the 100th row is labelled "(Other)" and has a very large count. I tried this in both RStudio and RGui and got the same result. I can't figure out what went wrong. Is there a way to fix this, or is my code wrong?
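The most likely cause is the `maxsum` argument of `summary.factor()`, which is the method `summary()` dispatches to for a factor. It defaults to 100, so when a factor has more than 100 levels only the 99 most frequent are listed, and the counts of all remaining levels are summed into a final `(Other)` entry. The sketch below reproduces this with a made-up vector of 151 distinct words and shows two alternatives: raising `maxsum`, or using `table()`, which does not cap the number of entries.

```r
# Made-up data for illustration: 150 distinct words plus "lorem" five times
words <- c(paste0("w", 1:150), rep("lorem", 5))
art_fac <- factor(words)          # 151 levels

# summary.factor() defaults to maxsum = 100: 99 levels plus "(Other)"
length(summary(art_fac))          # 100
tail(summary(art_fac), 1)         # the "(Other)" entry holding the remaining counts

# Option 1: allow as many entries as there are levels
art_sum <- summary(art_fac, maxsum = nlevels(art_fac))

# Option 2: table() counts every distinct value with no cap
art_wc_df <- as.data.frame(table(words), responseName = "count",
                           stringsAsFactors = FALSE)
art_wc_df <- art_wc_df[order(art_wc_df$count, decreasing = TRUE), ]  # most frequent first
head(art_wc_df)
```

`table()` is usually the simpler choice for word counts, because it does not need a factor and its data frame form has a column for the word and one for its count.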

NB: Tested with RStudio 2021.09.1 Build 372 and RGui x64 (R 4.1.2).
