R: How to use R to get information from a txt file
R experts,
I have a large text file that follows a specific pattern and format.
My text.txt contains:
x1 `xx`nkkna`yy`taktnaknvcaklrhkahnktn, altlkhakthakd`xx`nmm cataitha`yy`knkcnaktnhakt
x2 `xx`ngkna`yy`taktnaknvcaklrhkahnktn, altlkhakthakdnmm cataithaknkcnaktnhakt
x3 `xx`nkg,kna`yy`taktnaknvcaklrhkahnktn, altlkhakthakdnmm cataithaknk`xx`cna`yy`ktnhakt
x4 nkkndataktnaknvcaklrhkahnktn, altlkhakthakdnmm cataithaknkcnaktnhakt
I want R to find a list of keywords, in this case x1, x2, x3 and x4.
For each of them, I want a list of the strings that appear between "xx" and "yy".
The result would be four lists:
x1 = c("nkkna", "nmm cataitha")
x2 = c("ngkna")
x3 = c("nkg,kna", "cna")
x4 = c("NA")
However, I am facing two problems and would like to ask for your help.
- How do I read a large text file into R? I learned from Stack Overflow that the command
x <- read.csv(textConnection("xxx")) may help, but my file is too large to copy and paste, and the file would have to be read in as CSV. Is there a better way to load my text file into R as an object that I can search and grep afterwards?
- How do I write code to extract all of this information?
I learned that strsplit might be useful; it seems to work on material scraped with RCurl. Does it work here too? If so, could you show me how?
Thank you so much.
To answer your first question: to read a text file, you should use the function scan(). The references to textConnection that you see on SO are purely for reading in example data that has been pasted into the console, and that is how I read your sample data here.

I then use scan in the same way to read the textConnection data. In your own code, you should change dtxt so that it is your file location. I keep it in this format so that other people can replicate my results without having to create a file on their own file system.

Now that you have read the data, it takes a (somewhat complicated) call to sapply, strsplit and gsub to manipulate the data. The results are exactly as you specified.
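As a minimal sketch of the two steps described above (this is not the answer's original code): the sample lines are copied from the question, including the backticks that surround the xx and yy markers there, and the pattern tolerates those backticks being present or absent. The helper name extract_between and the variable names are hypothetical. The extraction uses gregexpr and regmatches rather than the sapply/strsplit/gsub combination the answer mentions, and a line with no match yields a real NA rather than the string "NA".

```r
# Stand-in for the file so the example is reproducible.
# For a real file, use dtxt <- "text.txt" (a path) and skip the close() call.
sample_text <- c(
  "x1 `xx`nkkna`yy`taktnaknvcaklrhkahnktn, altlkhakthakd`xx`nmm cataitha`yy`knkcnaktnhakt",
  "x2 `xx`ngkna`yy`taktnaknvcaklrhkahnktn, altlkhakthakdnmm cataithaknkcnaktnhakt",
  "x3 `xx`nkg,kna`yy`taktnaknvcaklrhkahnktn, altlkhakthakdnmm cataithaknk`xx`cna`yy`ktnhakt",
  "x4 nkkndataktnaknvcaklrhkahnktn, altlkhakthakdnmm cataithaknkcnaktnhakt"
)
dtxt <- textConnection(sample_text)

# sep = "\n" returns one string per line; quote = "" stops quote characters
# in the data from being interpreted.
txt <- scan(dtxt, what = "character", sep = "\n", quote = "", quiet = TRUE)
close(dtxt)  # only needed because dtxt is a connection here

# Return every string found between the start and end markers in one line,
# or NA if the line contains no such pair.
extract_between <- function(line, start = "xx", end = "yy") {
  # Optional backticks around the markers; (.*?) is a lazy capture.
  pattern <- paste0("`?", start, "`?(.*?)`?", end, "`?")
  hits <- regmatches(line, gregexpr(pattern, line, perl = TRUE))[[1]]
  if (length(hits) == 0) return(NA_character_)
  # Keep only the captured text from each full match.
  sub(pattern, "\\1", hits, perl = TRUE)
}

# The keyword is everything before the first space on each line.
keys <- sub(" .*$", "", txt)
result <- setNames(lapply(txt, extract_between), keys)
print(result)
```

The result is a named list, so result$x1 or result[["x3"]] retrieves the strings for a single keyword.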