apply treats numbers as characters
I couldn't find a solution for this problem online, simple as it seems.
Here it is:
#Construct test dataframe
tf <- data.frame(1:3,4:6,c("A","A","A"))
#Try the apply function I'm trying to use
test <- apply(tf,2,function(x) if(is.numeric(x)) mean(x) else unique(x)[1])
#Look at the output--all columns treated as character columns...
test
#Look at the format of the original data--the first two columns are integers.
str(tf)
In general terms, I want to choose which function I apply over a row/column based on the type of data that row/column contains. Here, I want a simple mean if the column is numeric and the first unique value if the column is a character column. As you can see, the way I've written this function, apply treats all columns as characters.
Answers (3)
Just write a specialised function and put it within sapply... don't use apply(dtf, 2, fun). Besides, your character column ain't as characterish as you may think: run getOption("stringsAsFactors") and see for yourself.
EDIT
Or, even better, use lapply.
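A minimal sketch of the sapply/lapply approach, offered as an illustration rather than the answerer's original code (the helper name summarise_col is hypothetical). It also suggests why lapply is the better choice here: sapply simplifies its results into one vector, so combining two means with the string "A" turns everything back into character, while lapply returns a list in which each element keeps its own type. Since R 4.0.0, data.frame() no longer converts strings to factors by default; the sketch passes stringsAsFactors = FALSE explicitly and also treats factors as categorical, to cover older setups.

```r
# The question's data frame; stringsAsFactors = FALSE keeps column 3 character
# on any R version (before R 4.0.0 it would have become a factor by default).
tf <- data.frame(1:3, 4:6, c("A", "A", "A"), stringsAsFactors = FALSE)

# Hypothetical helper: mean for numeric columns, first unique value otherwise.
# as.character() makes a factor column behave like a character column.
summarise_col <- function(x) {
  if (is.numeric(x)) {
    mean(x)
  } else {
    as.character(unique(x)[1])
  }
}

# sapply() passes each column with its type intact, but then simplifies the
# three results into a single vector, which must be character to hold "A".
res_sapply <- sapply(tf, summarise_col)

# lapply() returns a list, so the numeric means stay numeric.
res_lapply <- lapply(tf, summarise_col)
```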
You want to use lapply() or sapply(), not apply(). A data.frame is a list under the hood, and apply() converts it to a matrix (via as.matrix()) before doing anything else. Since at least one column in your data frame is character, every other column is also coerced to character when that matrix is formed.
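A quick check of this explanation, assuming the question's tf: each column keeps its own class, but the matrix that as.matrix() produces has a single type (character), so inside apply() every column arrives as a character vector and is.numeric() is FALSE for all of them.

```r
tf <- data.frame(1:3, 4:6, c("A", "A", "A"), stringsAsFactors = FALSE)

# Column by column, the data frame keeps distinct types.
col_classes <- sapply(tf, class)

# A matrix holds one type only, so the integers are formatted as strings.
m <- as.matrix(tf)
matrix_type <- typeof(m)

# This is what the function passed to apply() actually sees.
numeric_in_apply <- apply(tf, 2, is.numeric)
```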
I find the numcolwise and catcolwise functions from the plyr package useful here, for a syntactically simple solution.
First let's name the columns, to avoid ugly column names when doing the aggregation:
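For example, with the placeholder names a, b and c (hypothetical; any names work):

```r
tf <- data.frame(1:3, 4:6, c("A", "A", "A"), stringsAsFactors = FALSE)

# Left alone, data.frame() derives names from the argument expressions
# (for instance "X1.3" for 1:3), and those names carry over into any
# aggregated result. "a", "b" and "c" are placeholder names.
names(tf) <- c("a", "b", "c")
```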
Then you get your desired result with this one-liner:
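A sketch of what such a one-liner could look like (not necessarily the answerer's exact code), assuming the plyr package is installed and the columns are named a, b and c. cbind() joins the one-row data frame returned by numcolwise(mean) with the one-row data frame returned by catcolwise(), which here takes the first unique value of each categorical column:

```r
library(plyr)  # assumes plyr is installed

tf <- data.frame(a = 1:3, b = 4:6, c = c("A", "A", "A"),
                 stringsAsFactors = FALSE)

# numcolwise(mean) -> function applying mean() to the numeric columns only;
# catcolwise(...)  -> function applying its argument to categorical columns only.
result <- cbind(numcolwise(mean)(tf),
                catcolwise(function(x) unique(x)[1])(tf))
```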
Explanation:
numcolwise(f) converts its argument (in this case f is the mean function) into a function that takes a data frame and applies f only to the numeric columns of the data frame. Similarly, catcolwise converts its function argument into a function that operates only on the categorical columns.
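To make that explanation concrete, here is a rough base-R analogue built from Filter() and lapply(). The helpers num_colwise and cat_colwise are hypothetical and simplified; they only mimic the idea described above and are not plyr's implementation.

```r
# Hypothetical helper: returns a function that applies f to numeric columns only.
num_colwise <- function(f) {
  function(df) as.data.frame(lapply(Filter(is.numeric, df), f))
}

# Hypothetical helper: returns a function that applies f to character/factor
# columns only.
cat_colwise <- function(f) {
  is_cat <- function(x) is.character(x) || is.factor(x)
  function(df) {
    as.data.frame(lapply(Filter(is_cat, df), f), stringsAsFactors = FALSE)
  }
}

tf <- data.frame(a = 1:3, b = 4:6, c = c("A", "A", "A"),
                 stringsAsFactors = FALSE)

# Same shape as the plyr one-liner: numeric summaries, then categorical ones.
result <- cbind(num_colwise(mean)(tf),
                cat_colwise(function(x) unique(x)[1])(tf))
```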