R: How to extract information from a txt file using R

Posted 2024-11-10 18:53:56


R experts,

I have a large text file with a specific pattern and format.

My `text.txt` contains:

```
x1 `xx`nkkna`yy`taktnaknvcaklrhkahnktn, altlkhakthakd`xx`nmm  cataitha`yy`knkcnaktnhakt
x2 `xx`ngkna`yy`taktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknkcnaktnhakt
x3 `xx`nkg,kna`yy`taktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknk`xx`cna`yy`ktnhakt
x4  nkkndataktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknkcnaktnhakt
```

Then I want R to find a list of keywords (in this case x1, x2, x3 and x4) and, for each of them, return a list of every piece of text that lies between `xx` and `yy`.

The result should therefore be four lists:

```r
x1 = c("nkkna", "nmm  cataitha")
x2 = c("ngkna")
x3 = c("nkg,kna", "cna")
x4 = c("NA")
```

However, I have two problems I would like your help with.

How do I read a large text file into R? I learned on Stack Overflow that the command `x <- read.csv(textConnection("xxx"))` may help, but my file is too large to copy and paste, and `read.csv()` would read it in as CSV. Is there a better way to load my text file into R as an object that I can then search with `grep`?

How do I write code to extract all of this information?

I have read that `strsplit` might be useful, and it seems to work on material scraped with RCurl. Does it work here too? If so, would you mind showing me how?

Thank you so much.


Answer by 凡间太子, 2024-11-17 18:53:56


To answer your first question: to read a text file, use the function `scan()`. The references to `textConnection` you have seen on SO are only there to read in example data that was pasted into the console. That is what I do next to read your data:

```r
txt <- "
x1 `xx`nkkna`yy`taktnaknvcaklrhkahnktn, altlkhakthakd`xx`nmm  cataitha`yy`knkcnaktnhakt
x2 `xx`ngkna`yy`taktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknkcnaktnhakt 
x3 `xx`nkg,kna`yy`taktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknk`xx`cna`yy`ktnhakt 
x4  nkkndataktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknkcnaktnhakt"

dtxt <- textConnection(txt)
```

Then I use `scan()` in the same way to read the `textConnection` data. In your own code, modify the following line so that `dtxt` is replaced by your file's location. I keep it in this form so that other people can replicate my results without having to create a file on their own file system:

```r
# Read each line as one string; replace dtxt with your file's path
dat <- scan(dtxt, what = "character", sep = "\n")
```
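A minimal sketch of reading from a file on disk rather than a `textConnection`. The file name `text.txt` is only a placeholder: to keep the example reproducible, the sample lines are first written to a temporary file, and that path is passed directly to `scan()`. `readLines()` is shown as an equivalent base-R option for line-oriented text; it is an addition, not part of the original answer.

```r
# Write the sample lines to a temporary file so the example is self-contained.
# In practice, set `path` to your own file, e.g. path <- "text.txt" (placeholder).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "x1 `xx`nkkna`yy`taktnaknvcaklrhkahnktn, altlkhakthakd`xx`nmm  cataitha`yy`knkcnaktnhakt",
  "x2 `xx`ngkna`yy`taktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknkcnaktnhakt",
  "x3 `xx`nkg,kna`yy`taktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknk`xx`cna`yy`ktnhakt",
  "x4  nkkndataktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknkcnaktnhakt"
), path)

# scan() accepts a file name directly; sep = "\n" keeps each line as one string.
dat <- scan(path, what = character(), sep = "\n", quiet = TRUE)

# readLines() returns the same one-string-per-line character vector.
dat2 <- readLines(path)
```

Either result is a plain character vector, so `grep()`, `strsplit()` and friends work on it directly.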

Now that the data is read in, a (somewhat complicated) combination of `sapply`, `strsplit` and `gsub` extracts the pieces:

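```r
# For each line: split on `xx`, keep each piece up to its first backtick
# (the start of `yy`), and drop the first piece (the text before any `xx`).
sapply(seq_along(dat),
    function(i) unlist(c(sapply(strsplit(dat[i], "`xx`"),
              function(x) gsub("^(.*?)`.*", "\\1", x)[-1]))))
```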
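To make the nested call easier to follow, here is a sketch that unrolls it for the first line only. It applies the same split-then-trim idea step by step; `perl = TRUE` is an addition here so that the non-greedy `.*?` is handled by PCRE.

```r
line <- "x1 `xx`nkkna`yy`taktnaknvcaklrhkahnktn, altlkhakthakd`xx`nmm  cataitha`yy`knkcnaktnhakt"

# Step 1: split on the opening marker `xx`. The first piece is whatever
# precedes the first `xx` (here the keyword "x1 ").
pieces <- strsplit(line, "`xx`")[[1]]
# pieces (abbreviated): "x1 ", "nkkna`yy`taktn...", "nmm  cataitha`yy`knk..."

# Step 2: in each piece keep only the text before the first backtick,
# i.e. up to the closing `yy`. A piece with no backtick is left unchanged.
kept <- gsub("^(.*?)`.*", "\\1", pieces, perl = TRUE)

# Step 3: drop the leading piece that came before the first `xx`.
result <- kept[-1]
result
#> [1] "nkkna"         "nmm  cataitha"
```

A line with no `xx` (like x4) yields a single piece, so dropping it leaves `character(0)`.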

The results match what you specified, except that x4 comes back as `character(0)` rather than `"NA"`:

```
[[1]]
[1] "nkkna"         "nmm  cataitha"

[[2]]
[1] "ngkna"

[[3]]
[1] "nkg,kna" "cna"    

[[4]]
character(0)
```
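The list above is unnamed, and lines without a match give `character(0)` instead of the `NA` the question asked for. A minimal sketch of one way to get both, assuming each line starts with its keyword followed by a space. It swaps in a different technique from the answer, `regmatches()` with `gregexpr()` and a Perl lookbehind/lookahead, so only text that is actually closed by `yy` is kept. It uses a real `NA_character_` rather than the string `"NA"`.

```r
dat <- c(
  "x1 `xx`nkkna`yy`taktnaknvcaklrhkahnktn, altlkhakthakd`xx`nmm  cataitha`yy`knkcnaktnhakt",
  "x2 `xx`ngkna`yy`taktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknkcnaktnhakt",
  "x3 `xx`nkg,kna`yy`taktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknk`xx`cna`yy`ktnhakt",
  "x4  nkkndataktnaknvcaklrhkahnktn, altlkhakthakdnmm  cataithaknkcnaktnhakt"
)

# Every shortest run of text preceded by `xx` and followed by `yy`.
found <- regmatches(dat, gregexpr("(?<=`xx`).*?(?=`yy`)", dat, perl = TRUE))

# Use NA instead of character(0) for lines with no match.
found <- lapply(found, function(x) if (length(x) == 0) NA_character_ else x)

# Name each element by the keyword before the first space (x1, x2, ...).
names(found) <- sub(" .*", "", dat)

found$x3
#> [1] "nkg,kna" "cna"
```

Access by keyword (`found$x1`, `found[["x4"]]`) then replaces the separate `x1`, `x2`, ... variables from the question.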


About the author: 掀纱窥君容
