Disambiguating a function's variable name when a column has the same name as the variable (data.table)

Posted on 2025-02-10 23:06:36

I have a function with a variable named source. The function works properly, but if the data frame on which the function is applied has a column also named source, it doesn't work.

A simple example with dplyr and filtering:
The following two lines run, but they filter on the column (I want to filter on the column whose name is stored in the variable defined in the function):

corpus %>% dplyr::filter(!!source=="027021335")
corpus %>% dplyr::filter(source=="027021335")

The following line works and properly uses the variable defined in the function:

corpus %>% dplyr::filter(!!rlang::sym(source)=="027021335")

How can I achieve the same thing with data.table? I have tried numerous combinations of c(), get() and .. without managing to make it work. I thought that corpus[get(source)=="027021335"] would work, but it does not: it returns a "first argument has length > 1" error.

Edit:
I think one possible reason I get this error is that, in addition to source being both a variable and a column name, there is also a source() function in base R.

Corpus using dput(corpus):

structure(list(idref = c("027021335", "182132870", "221468579", 
"034574654", "069546592", "159340950", "169800458", "028529413", 
"076605442", "026762889"), iddoc = c(97466L, 101100L, 103772L, 
110039L, 134077L, 55693L, 38787L, 39304L, 73483L, 74350L), nom = c("Méhaut", 
"Favre", "Guerdjikova", "Diebolt", "Giraud-Héraud", "Charlier", 
"Moumni", "Henni", "Bonnel", "Callens"), prenom = c("Philippe", 
"Karine", "Ani", "Claude", "Eric", "Christophe", "Nicolas", "Ahmed", 
"Patrick", "Stéphane"), order = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
0), role = c("supervisor", "supervisor", "supervisor", "supervisor", 
"supervisor", "supervisor", "supervisor", "supervisor", "supervisor", 
"supervisor"), Annee_soutenance = c("2011", "2014", "2018", "2009", 
"2006", "2015", "2012", "2008", "2009", "2010"), source = c("as.character(idref)", 
"as.character(idref)", "as.character(idref)", "as.character(idref)", 
"as.character(idref)", "as.character(idref)", "as.character(idref)", 
"as.character(idref)", "as.character(idref)", "as.character(idref)"
), time_variable = c("as.character(idref)", "as.character(idref)", 
"as.character(idref)", "as.character(idref)", "as.character(idref)", 
"as.character(idref)", "as.character(idref)", "as.character(idref)", 
"as.character(idref)", "as.character(idref)")), row.names = c(NA, 
-10L), class = c("data.table", "data.frame"))


泛泛之交 2025-02-17 23:06:36

With the data.table development version (1.14.3), this can be done with the new env argument; see the "Programming on data.table" vignette:

data.table::update.dev.pkg()
source = "idref"
corpus[source=="027021335",env=list(source=source)]

       idref iddoc    nom   prenom order       role Annee_soutenance              source       time_variable
1: 027021335 97466 Méhaut Philippe     0 supervisor             2011 as.character(idref) as.character(idref)
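
For the case in the question, where source is a variable inside a function, the same idea can be sketched as below. This is only an illustration under some assumptions: the helper names (filter_by_lookup, filter_by_bquote, filter_by_env) and the three-row stand-in for corpus are hypothetical, and the env variant is only defined when the installed data.table actually has an env argument. The first two helpers rely on base R alone, so they may be an option on versions without env.

```r
library(data.table)

# Small stand-in for `corpus`, keeping only the columns that matter here.
corpus <- data.table(
  idref  = c("027021335", "182132870", "221468579"),
  iddoc  = c(97466L, 101100L, 103772L),
  source = "as.character(idref)"  # column that clashes with the variable name
)

# Hypothetical helper 1: resolve the column before entering `[.data.table`.
# `dt[[source]]` is evaluated in the function, so `source` is the argument
# (a column name), not the column called "source".
filter_by_lookup <- function(dt, source, value) {
  keep <- dt[[source]] == value
  dt[keep]
}

# Hypothetical helper 2: build the call dt[<column> == <value>] with bquote(),
# turning the column name into a symbol, then evaluate it.
filter_by_bquote <- function(dt, source, value) {
  eval(bquote(dt[.(as.name(source)) == .(value)]))
}

res_lookup <- filter_by_lookup(corpus, "idref", "027021335")
res_bquote <- filter_by_bquote(corpus, "idref", "027021335")

# Hypothetical helper 3: the `env` approach from the answer, wrapped in a
# function. Only defined if this data.table installation supports `env`.
has_env <- "env" %in% names(formals(data.table:::`[.data.table`))
if (has_env) {
  filter_by_env <- function(dt, source, value) {
    # Character elements of `env` are substituted as column names;
    # I() keeps `value` as a literal string instead.
    dt[source == value, env = list(source = source, value = I(value))]
  }
  res_env <- filter_by_env(corpus, "idref", "027021335")
}
```

In all three helpers the column name is resolved from the function argument before data.table looks up names among the columns, which is what avoids the clash with the source column.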
