Drop data frame columns by name

Posted 2024-10-10 07:49:19

I have a number of columns that I would like to remove from a data frame. I know that we can delete them individually using something like:

df$x <- NULL

But I was hoping to do this with fewer commands.

Also, I know that I could drop columns using integer indexing like this:

df <- df[ -c(1, 3:6, 12) ]

But I am concerned that the relative position of my variables may change.

Given how powerful R is, I figured there might be a better way than dropping each column one by one.

怪我入戏太深 2024-10-17 07:49:19

You can use a simple character vector of names:

DF <- data.frame(
  x=1:10,
  y=10:1,
  z=rep(5,10),
  a=11:20
)
drops <- c("x","z")
DF[ , !(names(DF) %in% drops)]

Alternatively, you can make a vector of the columns to keep and refer to them by name:

keeps <- c("y", "a")
DF[keeps]

EDIT:
For those not yet acquainted with the drop argument of the indexing function: if you want to keep a single column as a data frame, do this:

keeps <- "y"
DF[ , keeps, drop = FALSE]

drop=TRUE (or not mentioning it) will drop unnecessary dimensions, and hence return a vector with the values of column y.
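
As a quick illustration of the point above, here is a minimal sketch using the same DF. The is.data.frame() checks and the variable names as_vector, as_frame and as_frame2 are added here for illustration only and are not part of the original answer.

```r
DF <- data.frame(x = 1:10, y = 10:1, z = rep(5, 10), a = 11:20)
keeps <- "y"

as_vector <- DF[ , keeps]                # drop = TRUE by default: plain integer vector
as_frame  <- DF[ , keeps, drop = FALSE]  # stays a one-column data frame
as_frame2 <- DF[keeps]                   # list-style indexing never drops

is.data.frame(as_vector)  # FALSE
is.data.frame(as_frame)   # TRUE
is.data.frame(as_frame2)  # TRUE
```

The list-style form DF[keeps] is often the simpler choice when the number of surviving columns is not known in advance.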

大姐，你呐 2024-10-17 07:49:19

There's also the subset command, useful if you know which columns you want:

df <- data.frame(a = 1:10, b = 2:11, c = 3:12)
df <- subset(df, select = c(a, c))

UPDATED after comment by @hadley: To drop columns a,c you could do:

df <- subset(df, select = -c(a, c))

执笏见 2024-10-17 07:49:19

within(df, rm(x))

is probably easiest, or for multiple variables:

within(df, rm(x, y))

Or if you're dealing with data.tables (per How do you delete a column by name in data.table?):

dt[, x := NULL]   # Deletes column x by reference instantly.

dt[, !"x"]   # Selects all but x into a new data.table.

or for multiple variables

dt[, c("x","y") := NULL]

dt[, !c("x", "y")]

眼眸里的快感 2024-10-17 07:49:19

You could use %in% like this:

df[, !(colnames(df) %in% c("x","bar","foo"))]

顾挽 2024-10-17 07:49:19

list(NULL) also works:

dat <- mtcars
colnames(dat)
# [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
# [11] "carb"
dat[,c("mpg","cyl","wt")] <- list(NULL)
colnames(dat)
# [1] "disp" "hp"   "drat" "qsec" "vs"   "am"   "gear" "carb"

过度放纵 2024-10-17 07:49:19

If you want to remove columns by reference and avoid the internal copying associated with data.frames, you can use the data.table package and its := operator.

You can pass a character vector of names to the left-hand side of the := operator, and NULL as the RHS.

library(data.table)

df <- data.frame(a=1:10, b=1:10, c=1:10, d=1:10)
DT <- data.table(df)
# or more simply  DT <- data.table(a=1:10, b=1:10, c=1:10, d=1:10) #

DT[, c('a','b') := NULL]

If you want to predefine the names as a character vector outside the call to [, wrap the name of the object in () or {} to force the LHS to be evaluated in the calling scope rather than as a name within the scope of DT.

del <- c('a','b')
DT <- data.table(a=1:10, b=1:10, c=1:10, d=1:10)
DT[, (del) := NULL]
DT <- data.table(a=1:10, b=1:10, c=1:10, d=1:10)
DT[, {del} := NULL]
# force or `c` would also work.

You can also use set, which avoids the overhead of [.data.table, and also works for data.frames!

df <- data.frame(a=1:10, b=1:10, c=1:10, d=1:10)
DT <- data.table(df)

# drop `a` from df (no copying involved)

set(df, j = 'a', value = NULL)
# drop `b` from DT (no copying involved)
set(DT, j = 'b', value = NULL)

蓝海 2024-10-17 07:49:19

There is a potentially more powerful strategy based on the fact that grep() returns a numeric vector. If you have a long list of variables, as I do in one of my datasets, with some variables that end in ".A" and others that end in ".B", and you only want the ones that end in ".A" (along with all the variables that don't match either pattern), do this:

dfrm2 <- dfrm[ , -grep("\\.B$", names(dfrm)) ]

For the case at hand, using Joris Meys example, it might not be as compact, but it would be:

DF <- DF[, -grep( paste("^",drops,"$", sep="", collapse="|"), names(DF) )]
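
One caveat with negative grep() indices, shown here as a hedged sketch on a made-up data frame (dfrm's contents and column names are assumptions for illustration): when the pattern matches nothing, grep() returns integer(0), and indexing with -integer(0) selects zero columns rather than all of them. A logical index built with grepl() does not have this problem.

```r
# Hypothetical data: no column name ends in ".B"
dfrm <- data.frame(u.A = 1:3, v.A = 4:6, id = 7:9)

# grep() finds nothing, so the negated index is empty and every column is lost
lost <- dfrm[ , -grep("\\.B$", names(dfrm)), drop = FALSE]
ncol(lost)  # 0

# grepl() returns a logical vector, so a non-match keeps every column
kept <- dfrm[ , !grepl("\\.B$", names(dfrm)), drop = FALSE]
ncol(kept)  # 3
```

The same caveat applies to any -which(...) index, since which() also returns integer(0) when nothing matches.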

慵挽 2024-10-17 07:49:19

Another dplyr answer.
Use select(-column).

If your variables have some common naming structure, you might try starts_with(). For example:

library(dplyr)
df <- data.frame(var1 = rnorm(5), var2 = rnorm(5), var3 = rnorm(5), 
                 var4 = rnorm(5), char1 = rnorm(5), char2 = rnorm(5))
df
#        var2      char1        var4       var3       char2       var1
#1 -0.4629512 -0.3595079 -0.04763169  0.6398194  0.70996579 0.75879754
#2  0.5489027  0.1572841 -1.65313658 -1.3228020 -1.42785427 0.31168919
#3 -0.1707694 -0.9036500  0.47583030 -0.6636173  0.02116066 0.03983268

df1 <- df %>% select(-starts_with("char"))

df1
#        var2        var4       var3       var1
#1 -0.4629512 -0.04763169  0.6398194 0.75879754
#2  0.5489027 -1.65313658 -1.3228020 0.31168919
#3 -0.1707694  0.47583030 -0.6636173 0.03983268

If you want to drop a sequence of variables in the data frame, you can use the : operator. For example, if you drop var2, var3, and all variables in between (in the column order shown in the df1 output above), you are left with just var1:

df2 <- df1 %>% select(-c(var2:var3) )  
df2
#        var1
#1 0.75879754
#2 0.31168919
#3 0.03983268

空心空情空意 2024-10-17 07:49:19

Dplyr Solution

I doubt this will get much attention down here, but if you have a list of columns that you want to remove, and you want to do it in a dplyr chain, I use one_of() in the select clause.

Here is a simple, reproducible example:

undesired <- c('mpg', 'cyl', 'hp')

mtcars <- mtcars %>%
  select(-one_of(undesired))

Documentation can be found by running ?one_of or here:

http://genomicsclass.github.io/book/pages/dplyr_tutorial.html

鹿童谣 2024-10-17 07:49:19

Another possibility:

df <- df[, setdiff(names(df), c("a", "c"))]

or

df <- df[, grep('^(a|c)$', names(df), invert=TRUE)]

夏至、离别 2024-10-17 07:49:19

Out of interest, this flags up one of R's weird multiple syntax inconsistencies. For example given a two-column data frame:

df <- data.frame(x=1, y=2)

This gives a data frame

subset(df, select=-y)

but this gives a vector

df[,-2]

This is all explained in ?[ but it's not exactly expected behaviour. Well at least not to me...

风铃鹿 2024-10-17 07:49:19

DF <- data.frame(
  x=1:10,
  y=10:1,
  z=rep(5,10),
  a=11:20
)
DF

Output:

    x  y z  a
1   1 10 5 11
2   2  9 5 12
3   3  8 5 13
4   4  7 5 14
5   5  6 5 15
6   6  5 5 16
7   7  4 5 17
8   8  3 5 18
9   9  2 5 19
10 10  1 5 20

DF[c("a","x")] <- list(NULL)
DF

Output:

    y z
1  10 5
2   9 5
3   8 5
4   7 5
5   6 5
6   5 5
7   4 5
8   3 5
9   2 5
10  1 5

漆黑的白昼 2024-10-17 07:49:19

Here is a dplyr way to go about it:

#df[ -c(1,3:6, 12) ]  # original
df.cut <- df %>% select(-col.to.drop.1, -col.to.drop.2, ..., -col.to.drop.6)  # with dplyr::select()

I like this because it's intuitive to read & understand without annotation and robust to columns changing position within the data frame. It also follows the vectorized idiom using - to remove elements.

抱猫软卧 2024-10-17 07:49:19

I keep thinking there must be a better idiom, but for subtraction of columns by name, I tend to do the following:

df <- data.frame(a=1:10, b=1:10, c=1:10, d=1:10)

# return everything except a and c
df <- df[,-match(c("a","c"),names(df))]
df
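
A hedged note on the match() idiom: if any requested name is absent, match() returns NA for it and the negated index fails. The sketch below (the name "zz" is a made-up, non-existent column, and res and out are illustrative variable names) shows that failure next to a setdiff() variant that simply ignores unknown names.

```r
df <- data.frame(a = 1:10, b = 1:10, c = 1:10, d = 1:10)
drops <- c("a", "zz")   # "zz" is not a column of df

# match() yields c(1, NA); an NA cannot be combined with a negative subscript,
# so the subsetting call raises an error, caught here and replaced by "error"
res <- tryCatch(df[ , -match(drops, names(df))], error = function(e) "error")

# setdiff() works on the names themselves, so unknown names are skipped
out <- df[ , setdiff(names(df), drops), drop = FALSE]
names(out)  # "b" "c" "d"
```

Whether a missing name should be an error or be ignored depends on the use case; the setdiff() form trades strictness for robustness.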

没︽人懂的悲伤 2024-10-17 07:49:19

There's a function called dropNamed() in Bernd Bischl's BBmisc package that does exactly this.

BBmisc::dropNamed(df, "x")

The advantage is that it avoids repeating the data frame argument and thus is suitable for piping in magrittr (just like the dplyr approaches):

df %>% BBmisc::dropNamed("x")

流心雨 2024-10-17 07:49:19

Beyond select(-one_of(drop_col_names)) demonstrated in earlier answers, there are a couple other dplyr options for dropping columns using select() that do not involve defining all the specific column names (using the dplyr starwars sample data for some variety in column names):

library(dplyr)
starwars %>% 
  select(-(name:mass)) %>%        # the range of columns from 'name' to 'mass'
  select(-contains('color')) %>%  # any column name that contains 'color'
  select(-starts_with('bi')) %>%  # any column name that starts with 'bi'
  select(-ends_with('er')) %>%    # any column name that ends with 'er'
  select(-matches('^f.+s$')) %>%  # any column name matching the regex pattern
  select_if(~!is.list(.)) %>%     # not by column name but by data type
  head(2)

# A tibble: 2 x 2
  homeworld species
  <chr>     <chr>  
1 Tatooine  Human  
2 Tatooine  Droid

If you need to drop a column that may or may not exist in the data frame, here's a slight twist using select_if() that, unlike one_of(), will not throw an Unknown columns: warning if the column name does not exist. In this example 'bad_column' is not a column in the data frame:

starwars %>% 
  select_if(!names(.) %in% c('height', 'mass', 'bad_column'))

送君千里 2024-10-17 07:49:19

Another solution if you don't want to use @hadley's above: If "COLUMN_NAME" is the name of the column you want to drop:

df[,-which(names(df) == "COLUMN_NAME")]

删除→记忆 2024-10-17 07:49:19

Provide the data frame and a string of comma separated names to remove:

remove_features <- function(df, features) {
  rem_vec <- unlist(strsplit(features, ', '))
  res <- df[,!(names(df) %in% rem_vec)]
  return(res)
}

Usage:

remove_features(iris, "Sepal.Length, Petal.Width")
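
The helper above returns a bare vector when only one column survives, and it expects exactly ", " between names. A possible variant, sketched here under the hypothetical name remove_features_safe (not part of the original answer), tolerates arbitrary spacing around the commas and always returns a data frame.

```r
# Hypothetical variant of remove_features(); the name is made up for illustration
remove_features_safe <- function(df, features) {
  # split on commas, then trim surrounding whitespace from each name
  rem_vec <- trimws(unlist(strsplit(features, ",", fixed = TRUE)))
  # drop = FALSE keeps the result a data frame even if one column remains
  df[ , !(names(df) %in% rem_vec), drop = FALSE]
}

res <- remove_features_safe(iris, "Sepal.Length,Petal.Width ,  Species")
names(res)  # "Sepal.Width" "Petal.Length"
```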

烟花肆意 2024-10-17 07:49:19

There are several ways to do this:

Option-1:

df[ , -which(names(df) %in% c("name1","name2"))]

Option-2:

df[!names(df) %in% c("name1", "name2")]

Option-3:

subset(df, select=-c(name1,name2))

浮光之海 2024-10-17 07:49:19

Drop columns by selecting, by name, only the columns you want to keep; every other column is left out of the result.

A <- df[ , c("Name","Name1","Name2","Name3")]

聽兲甴掵 2024-10-17 07:49:19

Find the index of the columns you want to drop using which. Give these indexes a negative sign (*-1). Then subset on those values, which will remove them from the dataframe. This is an example.

DF <- data.frame(one=c('a','b'), two=c('c', 'd'), three=c('e', 'f'), four=c('g', 'h'))
DF
#  one two three four
#1   a   c     e    g
#2   b   d     f    h

DF[which(names(DF) %in% c('two','three')) *-1]
#  one four
#1   a    g
#2   b    h

も星光 2024-10-17 07:49:19

If you have a large data.frame and are low on memory, use [ or rm with within to remove columns of a data.frame, as subset currently (R 3.6.2) uses more memory. This is in addition to the manual's hint to use subset interactively.

getData <- function() {
  n <- 1e7
  set.seed(7)
  data.frame(a = runif(n), b = runif(n), c = runif(n), d = runif(n))
}

DF <- getData()
tt <- sum(.Internal(gc(FALSE, TRUE, TRUE))[13:14])
DF <- DF[setdiff(names(DF), c("a", "c"))] ##
#DF <- DF[!(names(DF) %in% c("a", "c"))] #Alternative
#DF <- DF[-match(c("a","c"),names(DF))]  #Alternative
sum(.Internal(gc(FALSE, FALSE, TRUE))[13:14]) - tt
#0.1 MB are used

DF <- getData()
tt <- sum(.Internal(gc(FALSE, TRUE, TRUE))[13:14])
DF <- subset(DF, select = -c(a, c)) ##
sum(.Internal(gc(FALSE, FALSE, TRUE))[13:14]) - tt
#357 MB are used

DF <- getData()
tt <- sum(.Internal(gc(FALSE, TRUE, TRUE))[13:14])
DF <- within(DF, rm(a, c)) ##
sum(.Internal(gc(FALSE, FALSE, TRUE))[13:14]) - tt
#0.1 MB are used

DF <- getData()
tt <- sum(.Internal(gc(FALSE, TRUE, TRUE))[13:14])
DF[c("a", "c")]  <- NULL ##
sum(.Internal(gc(FALSE, FALSE, TRUE))[13:14]) - tt
#0.1 MB are used

紫竹語嫣☆ 2024-10-17 07:49:19

Another option is the function fselect from the collapse package. Here is a reproducible example:

DF <- data.frame(
  x=1:10,
  y=10:1,
  z=rep(5,10),
  a=11:20
)

library(collapse)
fselect(DF, -z)
#>     x  y  a
#> 1   1 10 11
#> 2   2  9 12
#> 3   3  8 13
#> 4   4  7 14
#> 5   5  6 15
#> 6   6  5 16
#> 7   7  4 17
#> 8   8  3 18
#> 9   9  2 19
#> 10 10  1 20

Created on 2022-08-26 with reprex v2.0.2

愁以何悠 2024-10-17 07:49:19

Another data.table option which hasn't been posted yet is using the special verb .SD, which stands for subset of data. Together with the .SDcols argument you can select/drop columns by name or index.

require(data.table)
# data
dt = data.table(
  A = LETTERS[1:5],
  B = 1:5,
  C = rep(TRUE, 5)
)
# delete B
dt[ , .SD, .SDcols = !'B' ]
# delete all matches (i.e. all columns)
cols = grep('[A-Z]+', names(dt), value = TRUE)
dt[ , .SD, .SDcols = !cols ]

A summary of all the options for such a task in data.table can be found here

我爱人 2024-10-17 07:49:19

> df <- data.frame(
+   a=1:5,
+   b=6:10,
+   c=rep(22,5),
+   d=round(runif(5)*100, 2),
+   e=round(runif(5)*100, 2),
+   f=round(runif(5)*100, 2),
+   g=round(runif(5)*100, 2),
+   h=round(runif(5)*100, 2)
+ )
> df
  a  b  c     d     e     f     g     h
1 1  6 22 76.31 39.96 66.62 72.75 73.14
2 2  7 22 53.41 94.85 96.02 97.31 85.32
3 3  8 22 98.29 38.95 12.61 29.67 88.45
4 4  9 22 20.04 53.53 83.07 77.50 94.99
5 5 10 22  5.67  0.42 15.07 59.75 31.21

> # keep cols a, b, c, e (i.e. remove d, f, g, h)
> newDf <- df[, c(1:3, 5), drop=TRUE]
> newDf
  a  b  c     e
1 1  6 22 39.96
2 2  7 22 94.85
3 3  8 22 38.95
4 4  9 22 53.53
5 5 10 22  0.42
