Incorrect number of dimensions when applying a function over a list of data frames?
I have a large list of several hundred data frames, and I am trying to keep only the rows that lie between two values in column Z containing the patterns VALUE1 and VALUE2 (inclusive). Like this:
weight | height | Z
---------------------------
62 100 NA
65 89 NA
59 88 randomnumbersVALUE1randomtext
66 92 NA
64 90 NA
64 87 randomnumbersVALUE2randomtext
57 84 NA
68 99 NA
59 82 NA
60 87 srebmunmodnarVALUE1txetmodnar
61 86 NA
63 84 srebmunmodnarVALUE2txetmodnar
And after filtering I would get:
59 88 randomnumbersVALUE1randomtext
66 92 NA
64 90 NA
64 87 randomnumbersVALUE2randomtext
60 87 srebmunmodnarVALUE1txetmodnar
61 86 NA
63 84 srebmunmodnarVALUE2txetmodnar
The code I'm using is:
    lapply(df, function(x){
      start <- which(grepl("VALUE1", x$Z))
      end <- which(grepl("VALUE2", x$Z))
      rows <- unlist(lapply(seq_along(start), function(y){start[y]:end[y]}))
      return(df[rows,])})
But whenever I try to run the script, I get an error message saying:
Error in df[rows, ] : incorrect number of dimensions
Why does this happen, and how can I get around it?
EDIT: Added a minimal sample of the actual data (the first data frame in the list; VALUE2 will always follow VALUE1 at some point)
> head(tbl[[1]])
# A tibble: 6 × 4
t speed off Z
<dbl> <dbl> <dbl> <chr>
1 27.3 27.8 0.485 "{\"type\":\"M\",\"msg\":\"VALUE1\",\"time\":27.2498,\"dist\":0.410454}"
2 27.4 27.8 0.457 NA
3 27.5 27.8 0.430 NA
4 27.6 27.8 0.402 NA
5 27.7 27.8 0.374 NA
6 27.8 27.8 0.347 NA
Comments (1)
Assuming there are equal numbers of 'VALUE1' and 'VALUE2' matches, get the position indexes of 'VALUE1' and 'VALUE2' separately with grep, create a sequence (:) for each pair of corresponding positions by looping over them in Map, unlist the result, and use the combined sequence to subset the data.
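
A minimal sketch of these steps, assuming a single data frame df rebuilt from the question's sample table (the names start, end, rows and result are illustrative, not taken from the original post):

```r
# Sample data modelled on the question's example table (an assumption)
df <- data.frame(
  weight = c(62, 65, 59, 66, 64, 64, 57, 68, 59, 60, 61, 63),
  height = c(100, 89, 88, 92, 90, 87, 84, 99, 82, 87, 86, 84),
  Z = c(NA, NA, "randomnumbersVALUE1randomtext", NA, NA,
        "randomnumbersVALUE2randomtext", NA, NA, NA,
        "srebmunmodnarVALUE1txetmodnar", NA,
        "srebmunmodnarVALUE2txetmodnar")
)

# Row positions of each pattern; NA entries never match
start <- grep("VALUE1", df$Z)
end   <- grep("VALUE2", df$Z)

# Pair each start with its end, expand each pair with `:`, then flatten
rows <- unlist(Map(`:`, start, end))
result <- df[rows, ]
print(result)
```

With this sample, result should contain the seven rows listed in the question's expected output.
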
If df is a single data.frame, then looping over it with lapply loops over its columns, so each list element is a vector. Therefore there is no x$Z; each x is just the corresponding column.
If it is a list, then the error can occur when some elements have no 'VALUE1' or 'VALUE2', or when the number of 'VALUE1' matches is not equal to the number of 'VALUE2' matches. It may be better to check those elements before applying : to the positions.
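
As a hedged sketch of such a check for the list case: filter_between below is a hypothetical helper (not part of the original answer), and tbl is a small made-up list standing in for the real data. Note also that the question's function returns df[rows,], which indexes the whole list rather than the current element x; indexing a plain list with two subscripts is one way to trigger "incorrect number of dimensions", so the sketch subsets x instead.

```r
# Hypothetical helper: keep rows between each VALUE1/VALUE2 pair in column Z
filter_between <- function(x, from = "VALUE1", to = "VALUE2") {
  start <- grep(from, x$Z)
  end   <- grep(to, x$Z)
  # Skip `:` when there is nothing to pair or the counts differ
  if (length(start) == 0 || length(start) != length(end)) {
    return(x[0, ])  # zero-row result with the same columns
  }
  # Subset x (the current data frame), not the enclosing list
  x[unlist(Map(`:`, start, end)), ]
}

# Made-up list of data frames (an assumption for illustration)
tbl <- list(
  data.frame(t = 1:5, Z = c(NA, "aVALUE1b", NA, "aVALUE2b", NA)),
  data.frame(t = 1:3, Z = c(NA, "aVALUE1b", NA))  # VALUE1 without VALUE2
)

filtered <- lapply(tbl, filter_between)
```

Returning a zero-row data frame for unmatched elements is only one possible design; dropping those elements or raising a warning may suit the real data better.
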