Convert column classes in data.table
I have a problem using data.table: how do I convert column classes? Here is a simple example. With a data.frame I have no problem converting them, but with a data.table I don't know how:
df <- data.frame(ID=c(rep("A", 5), rep("B",5)), Quarter=c(1:5, 1:5), value=rnorm(10))
#One way: http://stackoverflow.com/questions/2851015/r-convert-data-frame-columns-from-factors-to-characters
df <- data.frame(lapply(df, as.character), stringsAsFactors=FALSE)
#Another way
df[, "value"] <- as.numeric(df[, "value"])
library(data.table)
dt <- data.table(ID=c(rep("A", 5), rep("B",5)), Quarter=c(1:5, 1:5), value=rnorm(10))
dt <- data.table(lapply(dt, as.character), stringsAsFactors=FALSE)
#Error in rep("", ncol(xi)) : invalid 'times' argument
#Produces error, does data.table not have the option stringsAsFactors?
dt[, "ID", with=FALSE] <- as.character(dt[, "ID", with=FALSE])
#Produces error: Error in `[<-.data.table`(`*tmp*`, , "ID", with = FALSE, value = "c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)") :
#unused argument(s) (with = FALSE)
Am I missing something obvious here?
Update due to Matthew's post: I used an older version before, but even after updating to 1.6.6 (the version I use now) I still get an error.
Update 2: Let's say I want to convert every column of class "factor" to a "character" column, but don't know in advance which column is of which class. With a data.frame, I can do the following:
classes <- as.character(sapply(df, class))
colClasses <- which(classes=="factor")
df[, colClasses] <- sapply(df[, colClasses], as.character)
Can I do something similar with data.table?
Update 3:
sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.6.6
loaded via a namespace (and not attached):
[1] tools_2.13.1
Comments (10)
For a single column:
Using lapply and as.character:
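The code for this answer did not survive on this page. A minimal sketch of the two cases it names, assuming the dt from the question: `:=` replaces one column by reference, and `lapply` over `.SD` converts every column and returns a new table. The name dt_chr is made up for illustration.

```r
library(data.table)

# Same table as in the question
dt <- data.table(ID = c(rep("A", 5), rep("B", 5)),
                 Quarter = c(1:5, 1:5),
                 value = rnorm(10))

# Single column: := replaces the column in place, dt itself is modified
dt[, Quarter := as.character(Quarter)]

# Every column: .SD is the table's columns as a list, so lapply visits each one;
# the result is a new data.table (dt_chr is a hypothetical name)
dt_chr <- dt[, lapply(.SD, as.character)]

# Inspect the resulting classes
sapply(dt_chr, class)
```

Note that the second form returns a copy, while the first changes dt without assigning the result.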
Try this:
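What this answer originally suggested is not preserved here. One possibility that fits Update 2 of the question (convert every factor column without knowing in advance which ones they are) is to find the factor columns first and pass them through `.SDcols`. This is only a sketch; ID is built as a factor on purpose so there is something to convert, and factor_cols is a made-up name.

```r
library(data.table)

# ID is created as a factor here (an assumption for illustration)
dt <- data.table(ID = factor(c(rep("A", 5), rep("B", 5))),
                 Quarter = c(1:5, 1:5),
                 value = rnorm(10))

# Names of the columns that are currently factors
factor_cols <- names(dt)[sapply(dt, is.factor)]

# The parentheses make := read factor_cols as a vector of column names
# rather than as the name of a new column; .SDcols limits .SD to those columns
dt[, (factor_cols) := lapply(.SD, as.character), .SDcols = factor_cols]

sapply(dt, class)
```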
Raising Matt Dowle's comment on Geneorama's answer (https://stackoverflow.com/a/20808945/4241780) to make it more obvious (as encouraged): you can use for(...) set(...).
Created on 2020-02-12 by the reprex package (v0.3.0)
See another of Matt's comments at https://stackoverflow.com/a/33000778/4241780 for more info.
Edit.
As noted by Espen and in help(set), j may be "Column name(s) (character) or number(s) (integer) to be assigned value when column(s) already exist". So names_factors <- c(1L, 3L) will also work.
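A minimal sketch of the for(...) set(...) pattern, since the reprex itself is missing from this page. The table DT and its columns are assumptions for illustration; names_factors is the variable the answer mentions.

```r
library(data.table)

# Example table (made up): two character columns and one integer column
DT <- data.table(a = LETTERS[c(3L, 1:3)], b = 4:7, c = letters[1:4])

# Columns to convert; c(1L, 3L) would select the same columns by position
names_factors <- c("a", "c")

# set() assigns by reference and skips the overhead of [.data.table,
# which is why it is the usual choice inside a loop
for (col in names_factors) {
  set(DT, j = col, value = as.factor(DT[[col]]))
}

sapply(DT, class)
```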
If you have a list of column names in a data.table whose class you want to change, do:
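The code that followed is not preserved, so this is a sketch of one way to do it rather than the answer's exact lines. It assumes the dt from the question and a hypothetical vector convert_to_character holding the column names.

```r
library(data.table)

dt <- data.table(ID = c(rep("A", 5), rep("B", 5)),
                 Quarter = c(1:5, 1:5),
                 value = rnorm(10))

# Columns whose class should change (hypothetical name, for illustration)
convert_to_character <- c("Quarter", "value")

# Convert only the listed columns, in place; the other columns are untouched
dt[, (convert_to_character) := lapply(.SD, as.character),
   .SDcols = convert_to_character]

sapply(dt, class)
```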
This is a BAD way to do it! I'm only leaving this answer in case it solves other weird problems. The better methods are probably partly the result of newer data.table versions, so it's worthwhile to document this hard way. Plus, this is a nice example of eval / substitute syntax.
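The original code is missing here. As a rough sketch of what an eval / substitute approach can look like (not necessarily what the answer contained): build the `:=` call with the column symbol spliced in, then evaluate it in j. The column names are taken from the question's dt.

```r
library(data.table)

dt <- data.table(ID = c(rep("A", 5), rep("B", 5)),
                 Quarter = c(1:5, 1:5),
                 value = rnorm(10))

for (col in c("Quarter", "value")) {
  # Produces the unevaluated call `Quarter := as.character(Quarter)`
  # (and likewise for value) by replacing X with the column's symbol
  e <- substitute(X := as.character(X), list(X = as.symbol(col)))
  # data.table evaluates e first and then treats the result as j
  dt[, eval(e)]
}

sapply(dt, class)
```

The set() and `.SDcols` forms are shorter and easier to read; this version is mainly useful as a syntax example.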
I tried several approaches.
Or otherwise:
I provide a more general and safer way to do this.
The function .. makes sure we get a variable from outside the scope of the data.table; set_colclass sets the classes of your columns.
You can use it like this:
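Neither function's definition survives on this page. The sketch below is a hypothetical reimplementation of set_colclass only, written with set() so that no scoping helper like .. is needed; it is not the answer's code. It assumes classes is a named character vector mapping column names to target classes such as "character" or "numeric".

```r
library(data.table)

# Hypothetical version of set_colclass: convert the named columns of x
# to the requested classes, in place
set_colclass <- function(x, classes) {
  # Ignore names that are not columns of x
  for (col in intersect(names(classes), names(x))) {
    # Look up the base converter, e.g. as.character for "character"
    converter <- match.fun(paste0("as.", classes[[col]]))
    set(x, j = col, value = converter(x[[col]]))
  }
  invisible(x)
}

dt <- data.table(i = 1:3, f = 3:1)
set_colclass(dt, c(i = "character"))
sapply(dt, class)
```

Because set() works by reference, dt is changed even though the function's return value is not assigned.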
Here is the same approach @Nera suggested, checking the class first, but instead of using .SD it uses data.table's fast loop with set, as in @Matt Dowle's solution, with an added class check.
The for loop changes each matching column vector to character class. It effectively treats the data.table as a list.
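The loop itself is missing from this page; a minimal sketch of the idea, assuming a dt whose ID column is a factor (built that way here for illustration):

```r
library(data.table)

dt <- data.table(ID = factor(c(rep("A", 5), rep("B", 5))),
                 Quarter = c(1:5, 1:5),
                 value = rnorm(10))

# A data.table is a list of columns, so seq_along(dt) walks the column positions
for (j in seq_along(dt)) {
  # Class check first: only factor columns are converted
  if (is.factor(dt[[j]])) {
    set(dt, j = j, value = as.character(dt[[j]]))
  }
}

sapply(dt, class)
```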
Try: