split function returns no observations with a large dataset
I have a dataframe like this:
seqnames pos strand nucleotide count
id1 12 + A 13
id1 13 + C 25
id2 24 + G 10
id2 25 + T 25
id2 26 + A 10
id3 10 + C 5
But it has more than 100,000 rows in total, and seqnames has 3138 levels. I would like to split it into a list of data frames according to seqnames, so I used the split function:
data_list <- split(data,data$seqnames)
But it only returns something like this:
List of 3138
$ id1:'data.frame': 0 obs. of 6 variables:
..$ seqnames : Factor w/ 3138 levels "id1","id2",..:
..$ pos : int(0)
..$ strand : Factor w/ 3 levels "+","-","*":
..$ nucleotide: Factor w/ 8 levels "A","C","G","T",..:
..$ count : int(0)
..$ sample_id : chr(0)
$ id2:'data.frame': 0 obs. of 6 variables:
..$ seqnames : Factor w/ 3138 levels "id1","id2",..:
..$ pos : int(0)
..$ strand : Factor w/ 3 levels "+","-","*":
..$ nucleotide: Factor w/ 8 levels "A","C","G","T",..:
..$ count : int(0)
..$ sample_id : chr(0)
I can't figure out why this happens: I tried split on a made-up data frame containing only numbers (with far fewer rows than this one), and it worked.
How can I solve this problem?
Comments (1)
The 'seqnames' column is a factor with many unused levels. split creates one list element per factor level, so each unused level comes back as a data.frame with 0 rows. split has a drop argument (FALSE by default); setting drop = TRUE removes those list elements. If we instead want those elements to be replaced by NULL, find the elements where the number of rows (nrow) is 0 and assign NULL to them. After doing the assignment to NULL, check the result again.
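The code from the answer did not survive on this page, so the following is only a minimal sketch of the two options it describes. It assumes a small made-up data frame in which the levels "id4" and "id5" are deliberately unused; the names data_list_dropped and is_empty are illustrative and not from the original post.

```r
# Toy data mirroring the question; levels "id4" and "id5" are unused on purpose.
data <- data.frame(
  seqnames   = factor(c("id1", "id1", "id2", "id2", "id2", "id3"),
                      levels = c("id1", "id2", "id3", "id4", "id5")),
  pos        = c(12L, 13L, 24L, 25L, 26L, 10L),
  strand     = factor(rep("+", 6), levels = c("+", "-", "*")),
  nucleotide = c("A", "C", "G", "T", "A", "C"),
  count      = c(13L, 25L, 10L, 25L, 10L, 5L)
)

# Default (drop = FALSE): one list element per level, including unused ones.
data_list <- split(data, data$seqnames)
sapply(data_list, nrow)   # row count per element; unused levels give 0

# Option 1: drop = TRUE, so unused levels produce no list element at all.
data_list_dropped <- split(data, data$seqnames, drop = TRUE)
names(data_list_dropped)

# Option 2: keep every name but replace the empty data frames with NULL.
# list(NULL) is required here; assigning a bare NULL would delete the elements.
is_empty <- sapply(data_list, nrow) == 0
data_list[is_empty] <- list(NULL)
```

With drop = TRUE the list only contains the groups that actually have rows, which is usually what is wanted before looping over it. The NULL variant keeps the list the same length as the number of levels, which can help when positions need to line up with levels(data$seqnames).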