Remove an entire column from a data.frame in R
Does anyone know how to remove an entire column from a data.frame in R? For example if I am given this data.frame:
> head(data)
   chr       genome region
1 chr1 hg19_refGene    CDS
2 chr1 hg19_refGene   exon
3 chr1 hg19_refGene    CDS
4 chr1 hg19_refGene   exon
5 chr1 hg19_refGene    CDS
6 chr1 hg19_refGene   exon
and I want to remove the 2nd column.
You can set it to NULL. As pointed out in the comments, there are some other possibilities as well. You can also remove multiple columns at once. Be careful with matrix-subsetting though, as you can end up with a vector.
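The code that originally accompanied this answer is not preserved here, so the following is only a minimal sketch of the options it describes. The data.frame is a hypothetical stand-in for the one in the question, and the names d1 to d5 and v are made up for illustration.

```r
# Hypothetical stand-in for the question's data.frame
data <- data.frame(
  chr    = rep("chr1", 6),
  genome = rep("hg19_refGene", 6),
  region = rep(c("CDS", "exon"), 3),
  stringsAsFactors = FALSE
)

# Set the column to NULL by name
d1 <- data
d1$genome <- NULL

# The same thing by position
d2 <- data
d2[[2]] <- NULL

# Negative list-style index: always returns a data.frame
d3 <- data[-2]

# Remove several columns at once
d4 <- data
d4[1:2] <- list(NULL)

# Matrix-style subsetting drops to a vector when one column is left ...
v <- data[, -(2:3)]
# ... unless drop = FALSE is given
d5 <- data[, -(2:3), drop = FALSE]
```

Here v is a plain character vector, while d5 is still a one-column data.frame.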
To remove one or more columns by name, when the column names are known (as opposed to being determined at run-time), I like the subset() syntax. For a data frame that contains columns a, b and d, you can remove just the a column, or both the b and d columns, by passing a negated selection to the select argument. You can remove all columns between d and b the same way, using a negated range. As I said above, this syntax works only when the column names are known. It won't work when the column names are determined programmatically (i.e. assigned to a variable). The Warning in the ?subset documentation makes a related point: subset() is a convenience function intended for interactive use, and for programming it is better to use the standard subsetting functions like [.
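As a rough illustration of the subset() calls described above: the data frame below is an assumption (only the column names a, b and d come from the answer; column c and the column order are made up so that the range d:b is meaningful).

```r
# Hypothetical data frame; the column order a, d, c, b is an assumption
df <- data.frame(a = 1:3, d = 2:4, c = 3:5, b = 4:6)

# Remove just the a column
no_a <- subset(df, select = -a)

# Remove the b and d columns
no_bd <- subset(df, select = -c(b, d))

# Remove every column from d through b (here: d, c and b)
no_range <- subset(df, select = -(d:b))

# When the names live in a variable, fall back to standard subsetting
drop_me <- c("b", "d")
no_bd_prog <- df[, setdiff(names(df), drop_me), drop = FALSE]
```

The last line is one possible replacement for the case the answer warns about, where the names to drop are only known at run-time.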
(For completeness) You can also remove columns by name, by indexing the data.frame on its column names.
Including drop = FALSE ensures that the result will still be a data.frame even if only one column remains.
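The snippet that belonged to this answer is missing, so this is one plausible sketch of name-based removal with drop = FALSE, not necessarily the author's exact code. The data.frame and the to_drop vector are hypothetical.

```r
# Hypothetical stand-in for the question's data.frame
data <- data.frame(
  chr    = rep("chr1", 6),
  genome = rep("hg19_refGene", 6),
  region = rep(c("CDS", "exon"), 3),
  stringsAsFactors = FALSE
)

# Names of the columns to remove (made up for this example)
to_drop <- c("genome", "region")

# Keep every column whose name is not in to_drop;
# drop = FALSE keeps the result a data.frame even with one column left
kept <- data[, !(names(data) %in% to_drop), drop = FALSE]

# Without drop = FALSE the same call collapses to a vector
collapsed <- data[, !(names(data) %in% to_drop)]
```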
The posted answers are very good when working with data.frames. However, these tasks can be pretty inefficient from a memory perspective. With large data, removing a column can take an unusually long time and/or fail due to out-of-memory errors. The data.table package helps address this problem with the := operator, which removes the column by reference. I should put together a bigger example to show the differences. I'll update this answer at some point with that.
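A small sketch of the := approach, assuming the data.table package is installed. The table contents are a hypothetical copy of the question's data, and this toy example does not demonstrate the memory difference the answer refers to.

```r
library(data.table)

# Hypothetical stand-in for the question's data, as a data.table
dt <- data.table(
  chr    = rep("chr1", 6),
  genome = rep("hg19_refGene", 6),
  region = rep(c("CDS", "exon"), 3)
)

# Remove one column by reference (no copy of the table is made)
dt[, genome := NULL]

# Remove several columns by reference from a second table
dt2 <- data.table(a = 1:3, b = 4:6, c = 7:9)
dt2[, c("a", "b") := NULL]
```

Because := modifies the table in place, there is no need to assign the result back to dt.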
With this you can remove the column and store the result in another variable.
There are several options for removing one or more columns with dplyr::select() and some helper functions. The helper functions can be useful because some do not require naming all the specific columns to be dropped. Note that to drop columns using select() you need to use a leading - to negate the column names. The dplyr::starwars sample data offers some variety in column names to try this on. You can also drop by column number.
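The original pipeline for this answer is not preserved, so the following is only a sketch of the kinds of select() calls it describes, assuming a reasonably recent dplyr (any_of() needs dplyr 1.0 or later). The result names are made up for illustration.

```r
library(dplyr)

sw <- dplyr::starwars

# Drop one specific column by name
no_height <- select(sw, -height)

# Drop any columns named in a character vector
no_mass_films <- select(sw, -any_of(c("mass", "films")))

# Drop the range of columns from name through hair_color
no_range <- select(sw, -(name:hair_color))

# Drop by pattern in the column name
no_color <- select(sw, -contains("color"))
no_bi    <- select(sw, -starts_with("bi"))
no_er    <- select(sw, -ends_with("er"))

# Drop by column number: column 2 and columns 4 through 10
by_number <- select(sw, -2, -(4:10))
```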
Using dplyr, the following works:
data <- select(data, -genome)
as per documentation found here https://www.marsja.se/how-to-remove-a-column-in-r-using-dplyr-by-name-and-index/#:~:text=select(starwars%2C%20%2Dheight)
I just thought I'd add one in that wasn't mentioned yet. It's simple but also interesting because in all my perusing of the internet I did not see it, even though the highly related %in% appears in many places.
Also, I didn't see anyone post grep alternatives. These can be very handy for removing multiple columns that match a pattern.
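The answer's own snippets are missing, so what follows is an assumption about what a pattern-based removal might look like, not a reconstruction. The data frame and the "^tmp_" pattern are hypothetical. It uses grepl() rather than a negated grep() index, because a negated empty index would select no columns at all when nothing matches.

```r
# Hypothetical data frame with two columns sharing a prefix
df <- data.frame(id = 1:3, tmp_a = 4:6, tmp_b = 7:9, value = 10:12)

# Logical vector: TRUE for every column name matching the pattern
is_tmp <- grepl("^tmp_", names(df))

# Keep only the columns that do not match
kept <- df[, !is_tmp, drop = FALSE]

# A pattern with no matches leaves the data frame unchanged
unchanged <- df[, !grepl("^zzz", names(df)), drop = FALSE]
```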
data <- data[, -2]
This is the easy way: drop the second column from the data frame and assign the result back to the same variable.