grep or pmatch?
I am trying to import a series of files from a directory and convert each of them into a data frame. I would also like to use the file name to create two new columns with name-dependent values. Input files have the format xx_yy.out, where xx can currently be one of three values and yy one of two. In the future these numbers will go up.
Edit: solution based on the comments (see below for the original question), edited again to reflect the suggestions of @JoshO'Brien:
filelist <- dir(pattern = "\\.out$")  # escape the dot so only names ending in ".out" match
for (i in filelist) {
    tempdata <- read.table(i)                   # read the table
    filelistshort <- sub("\\.out$", "", i)      # strip the ".out" extension
    tempsplit <- strsplit(filelistshort, "_")   # split on the underscore
    xx <- sapply(tempsplit, "[", 1)             # get xx
    yy <- sapply(tempsplit, "[", 2)             # get yy
    tempdata$XX <- xx                           # add XX column
    tempdata$YY <- yy                           # add YY column
    assign(filelistshort, tempdata)             # name the data frame after the shortened file name
}
Below is the original code, showing that I wanted some way to get the XX and YY values but wasn't sure of the best one.
My outline (after @romanlustrik's post) is as follows:
filelist <- dir(pattern = "\\.out$")
lapply(filelist, FUN = function(x) {
    xx <- NA                # placeholder: look up XX with grep() or pmatch()
    yy <- NA                # placeholder: look up YY with grep() or pmatch()
    dat <- read.table(x)    # read.table() already returns a data frame
    dat$colx <- xx
    dat$coly <- yy
    dat
})
where the xx <- and yy <- lines would be a lookup based on either pmatch or grep. I am playing around to make either one work but would welcome any suggestions.
Comments (2)
If we can assume that your file names will contain only a single "_", I wouldn't use grep() or pmatch() at all. strsplit() seems to provide a cleaner and simpler solution.
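The code that accompanied this answer is not preserved here. What follows is only a minimal sketch of the strsplit() idea, assuming every file name has the form xx_yy.out with exactly one underscore. The helper read_tagged, the directory demo_dir and the two demo files are hypothetical names made up for illustration, and a named list stands in for the assign() calls used in the question.

```r
# Hypothetical helper (not from the original answer): read one "xx_yy.out"
# file and tag every row with the two parts of its file name.
read_tagged <- function(path) {
  stem  <- sub("\\.out$", "", basename(path))       # "xx_yy.out" -> "xx_yy"
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]   # "xx_yy" -> c("xx", "yy")
  dat <- read.table(path)
  dat$XX <- parts[1]   # recycled over all rows
  dat$YY <- parts[2]
  dat
}

# Demo on two throwaway files so the sketch runs anywhere.
demo_dir <- file.path(tempdir(), "out_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.table(data.frame(a = 1:2), file.path(demo_dir, "aa_11.out"),
            row.names = FALSE, col.names = FALSE)
write.table(data.frame(a = 3:5), file.path(demo_dir, "bb_22.out"),
            row.names = FALSE, col.names = FALSE)

# Read every .out file and keep the results in a list named after the files.
files  <- list.files(demo_dir, pattern = "\\.out$", full.names = TRUE)
frames <- lapply(files, read_tagged)
names(frames) <- sub("\\.out$", "", basename(files))
```

Each data frame is then available as frames$aa_11, frames[["bb_22"]] and so on. Note that XX and YY are stored as character strings, so a numeric-looking part such as "11" would need as.numeric() if numbers are wanted.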
This is an ugly hack, but gets the job done.