Remove an entire column from a data.frame in R

Posted on 2024-11-14 17:16:37, 309 words, 4 views, 0 comments

Does anyone know how to remove an entire column from a data.frame in R? For example if I am given this data.frame:

> head(data)
   chr       genome region
1 chr1 hg19_refGene    CDS
2 chr1 hg19_refGene   exon
3 chr1 hg19_refGene    CDS
4 chr1 hg19_refGene   exon
5 chr1 hg19_refGene    CDS
6 chr1 hg19_refGene   exon

and I want to remove the 2nd column.

Comments (9)

绝影如岚 2024-11-21 17:16:37

You can set it to NULL.

> Data$genome <- NULL
> head(Data)
   chr region
1 chr1    CDS
2 chr1   exon
3 chr1    CDS
4 chr1   exon
5 chr1    CDS
6 chr1   exon

As pointed out in the comments, here are some other possibilities:

Data[2] <- NULL    # Wojciech Sobala
Data[[2]] <- NULL  # same as above
Data <- Data[,-2]  # Ian Fellows
Data <- Data[-2]   # same as above

You can remove multiple columns via:

Data[1:2] <- list(NULL)  # Marek
Data[1:2] <- NULL        # does not work!

Be careful with matrix-subsetting though, as you can end up with a vector:

Data <- Data[,-(2:3)]             # vector
Data <- Data[,-(2:3),drop=FALSE]  # still a data.frame
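
A minimal sketch of the NULL-assignment variants above, assuming a toy data.frame that mirrors the one in the question (the values are made up for illustration). It only checks that the different spellings leave the same columns behind:

```r
# Toy data mirroring the question; the values are invented for illustration.
Data <- data.frame(
  chr    = rep("chr1", 6),
  genome = rep("hg19_refGene", 6),
  region = rep(c("CDS", "exon"), 3)
)

by_name  <- Data; by_name$genome <- NULL        # assign NULL by column name
by_index <- Data; by_index[[2]]  <- NULL        # assign NULL by position
negative <- Data[-2]                            # negative list-style index
by_list  <- Data; by_list[1:2]   <- list(NULL)  # several columns at once

names(by_name)   # columns left after dropping genome
names(by_index)
names(negative)
names(by_list)   # only the third column is left
```

The copies (by_name, by_index, ...) are there only so that each variant starts from the same untouched Data.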

指尖上得阳光 2024-11-21 17:16:37

To remove one or more columns by name, when the column names are known in advance (as opposed to being determined at run time), I like the subset() syntax. E.g. for the data frame

df <- data.frame(a=1:3, d=2:4, c=3:5, b=4:6)

to remove just the a column you could do

Data <- subset(df, select = -a)

and to remove the b and d columns you could do

Data <- subset(df, select = -c(d, b))

You can remove all columns from d through b (in column order, endpoints included) with:

Data <- subset(df, select = -c(d:b))

As I said above, this syntax works only when the column names are known in advance. It won't work when the column names are determined programmatically (e.g. held in a variable). I'll reproduce this warning from the ?subset documentation:

Warning:

This is a convenience function intended for use interactively.
For programming it is better to use the standard subsetting
functions like '[', and in particular the non-standard evaluation
of argument 'subset' can have unanticipated consequences.
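
To illustrate the warning, here is a small sketch of what happens when the names live in a variable, and one base-R alternative using `[`. The variable name drop_cols is hypothetical, and setdiff() is just one of several ways to build the list of columns to keep:

```r
df <- data.frame(a = 1:3, d = 2:4, c = 3:5, b = 4:6)

# Hypothetical: names of the columns to drop, only known at run time.
drop_cols <- c("d", "b")

# Negating a character vector inside select= raises an error, so the call is
# wrapped in tryCatch() to show the failure without stopping the script.
attempt <- tryCatch(
  subset(df, select = -drop_cols),
  error = function(e) "error"
)

# Standard subsetting with `[` handles names held in a variable.
kept <- df[, setdiff(names(df), drop_cols), drop = FALSE]
names(kept)
```

Here attempt ends up holding the string "error" rather than a data.frame, while kept contains the remaining columns.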

傲娇萝莉攻 2024-11-21 17:16:37

(For completeness) If you want to remove columns by name, you can do this:

cols.dont.want <- "genome"
cols.dont.want <- c("genome", "region") # if you want to remove multiple columns

data <- data[, !names(data) %in% cols.dont.want, drop = FALSE]

Including drop = FALSE ensures that the result is still a data.frame even if only one column remains. (Spelling out FALSE is safer than F, which is an ordinary variable that can be reassigned.)
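
A quick sketch of the difference, assuming a made-up two-column data.frame so that exactly one column is left after the removal:

```r
# Made-up two-column example; only one column survives the removal.
data <- data.frame(
  chr    = c("chr1", "chr2"),
  genome = c("hg19_refGene", "hg19_refGene")
)
cols.dont.want <- "genome"
keep <- !names(data) %in% cols.dont.want

as_vector <- data[, keep]                 # collapses to a plain vector
as_frame  <- data[, keep, drop = FALSE]   # stays a one-column data.frame

is.data.frame(as_vector)
is.data.frame(as_frame)
```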

红玫瑰 2024-11-21 17:16:37

The posted answers are very good when working with data.frames. However, these tasks can be pretty inefficient from a memory perspective. With large data, removing a column can take an unusually long amount of time and/or fail due to out of memory errors. Package data.table helps address this problem with the := operator:

library(data.table)
> dt <- data.table(a = 1, b = 1, c = 1)
> dt[,a:=NULL]
     b c
[1,] 1 1

I should put together a bigger example to show the differences. I'll update this answer at some point with that.
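
In the meantime, a small sketch (not a benchmark) of removing several columns by reference. It assumes the data.table package is installed; the table and the name drop_cols are made up for illustration:

```r
library(data.table)

# Made-up table; the column names are placeholders.
dt <- data.table(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

before <- address(dt)        # memory address of the table before any change

dt[, a := NULL]              # drop one column by reference

drop_cols <- c("b", "c")     # hypothetical names held in a variable
dt[, (drop_cols) := NULL]    # parentheses make data.table evaluate the variable

names(dt)
identical(before, address(dt))   # same object, the table was not copied
```

Because := works in place, there is no `dt <- ...` reassignment, which is where the memory saving on large tables comes from.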

檐上三寸雪 2024-11-21 17:16:37

With this you can remove the column and store the result in another variable.

df = subset(data, select = -c(genome) )

听风吹 2024-11-21 17:16:37

There are several options for removing one or more columns with dplyr::select() and some helper functions. The helper functions can be useful because some do not require naming all the specific columns to be dropped. Note that to drop columns using select() you need to use a leading - to negate the column names.

Using the dplyr::starwars sample data for some variety in column names:

library(dplyr)

starwars %>% 
  select(-height) %>%                  # a specific column name
  select(-one_of('mass', 'films')) %>% # any columns named in one_of()
  select(-(name:hair_color)) %>%       # the range of columns from 'name' to 'hair_color'
  select(-contains('color')) %>%       # any column name that contains 'color'
  select(-starts_with('bi')) %>%       # any column name that starts with 'bi'
  select(-ends_with('er')) %>%         # any column name that ends with 'er'
  select(-matches('^v.+s$')) %>%       # any column name matching the regex pattern
  select_if(~!is.list(.)) %>%          # not by column name but by data type
  head(2)

# A tibble: 2 x 2
  homeworld species
  <chr>     <chr>  
1 Tatooine  Human  
2 Tatooine  Droid  

You can also drop by column number:

starwars %>% 
  select(-2, -(4:10)) # column 2 and columns 4 through 10

娇纵 2024-11-21 17:16:37

Using dplyr, the following works:

data <- select(data, -genome)

as per documentation found here https://www.marsja.se/how-to-remove-a-column-in-r-using-dplyr-by-name-and-index/#:~:text=select(starwars%2C%20%2Dheight)
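
If the column names are held in a character vector rather than typed out, a sketch along these lines may help. It assumes dplyr 1.0.0 or later (for any_of()); the data and the name cols_to_drop are made up for illustration:

```r
library(dplyr)

# Made-up data mirroring the question.
data <- data.frame(
  chr    = c("chr1", "chr1"),
  genome = c("hg19_refGene", "hg19_refGene"),
  region = c("CDS", "exon")
)

# Hypothetical vector of names; one of them is deliberately not a column.
cols_to_drop <- c("genome", "not_a_column")

# any_of() silently ignores names that are absent;
# all_of() would raise an error for "not_a_column" instead.
result <- select(data, -any_of(cols_to_drop))
names(result)
```

Which of the two helpers fits depends on whether a missing column should be treated as a mistake or quietly skipped.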

灼痛 2024-11-21 17:16:37

I just thought I'd add one in that wasn't mentioned yet. It's simple but also interesting because in all my perusing of the internet I did not see it, even though the highly related %in% appears in many places.

df <- df[ , -which(names(df) == 'removeCol')]

Also, I didn't see anyone post grep alternatives. These can be very handy for removing multiple columns that match a pattern.
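
A sketch of the pattern-based idea, using grepl() on a made-up data.frame (the tmp_ prefix is just an example pattern). It also shows a pitfall shared by -which() and -grep(): when nothing matches, the negative index is empty and every column is dropped.

```r
# Made-up data; the "tmp_" prefix is only an example pattern.
df <- data.frame(id = 1:3, tmp_a = 4:6, tmp_b = 7:9, value = 10:12)

# A logical index from grepl() drops every column whose name matches.
kept <- df[, !grepl("^tmp_", names(df)), drop = FALSE]
names(kept)

# Pitfall: which() returns integer(0) when nothing matches, and a negated
# empty index selects no columns at all.
oops <- df[, -which(names(df) == "no_such_column")]
ncol(oops)

# The logical form leaves the data.frame unchanged in that case.
safe <- df[, names(df) != "no_such_column", drop = FALSE]
ncol(safe)
```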

夜巴黎 2024-11-21 17:16:37

chr = chr[,-2]
It's easier this way: drop the second column and assign the result back to the same variable (here the data frame is called chr).
