apply treats numbers as characters

Posted on 2024-10-21 08:18:56

I couldn't find a solution to this problem online, simple as it seems.
Here it is:

#Construct test dataframe 
tf <- data.frame(1:3,4:6,c("A","A","A")) 

#Try the apply function I'm trying to use
test <- apply(tf,2,function(x) if(is.numeric(x)) mean(x) else unique(x)[1]) 

#Look at the output--all columns treated as character columns...
test

#Look at the format of the original data--the first two columns are integers. 
str(tf) 

In general terms, I want to differentiate what function I apply over a row/column based on what type of data that row/column contains.

Here, I want a simple mean if the column is numeric and the first unique value if the column is a character column. As you can see, apply treats all columns as characters the way I've written this function.

Comments (3)

千笙结 2024-10-28 08:18:56

Just write a specialised function and pass it to sapply; don't use apply(dtf, 2, fun). Besides, your character column isn't as character-like as you might think: run getOption("stringsAsFactors") and see for yourself.

sapply(tf, class)
            X1.3             X4.6 c..A....A....A.. 
       "integer"        "integer"         "factor" 
sapply(tf, storage.mode)
            X1.3             X4.6 c..A....A....A.. 
       "integer"        "integer"        "integer"

EDIT

Or even better - use lapply:

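fn <- function(x) {
  # Use && (scalar and) inside if(); & is the vectorised operator
  if (is.numeric(x) && !is.factor(x)) {
    mean(x)
  } else if (is.character(x)) {
    unique(x)[1]
  } else if (is.factor(x)) {
    # Convert first so the label is returned rather than a factor
    as.character(x)[1]
  }
}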

dtf <- data.frame(a = 1:3, b = 4:6, c = rep("A", 3), stringsAsFactors = FALSE)
dtf2 <- data.frame(a = 1:3, b = 4:6, c = rep("A", 3), stringsAsFactors = TRUE)

as.data.frame(lapply(dtf, fn))
  a b c
1 2 5 A
as.data.frame(lapply(dtf2, fn))
  a b c
1 2 5 A

你的他你的她 2024-10-28 08:18:56

You want to use lapply() or sapply(), not apply(). A data.frame is a list under the hood, and apply() converts it to a matrix before doing anything else. A matrix can hold only one type, so because at least one column in your data frame is not numeric, every other column is coerced to character when that matrix is formed.
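
To make the coercion visible, here is a minimal sketch. The column names a, b and d are made up for illustration, and stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version's default.

```r
# Same shape as the question's data; column names are illustrative only.
tf <- data.frame(a = 1:3, b = 4:6, d = c("A", "A", "A"),
                 stringsAsFactors = FALSE)

# apply() operates on as.matrix(tf). A matrix has a single type, so the
# integer columns are converted to character along with column d.
m <- as.matrix(tf)
typeof(m)

# Inside apply(), every column therefore fails the is.numeric() test.
via_apply <- apply(tf, 2, is.numeric)

# sapply() walks the data frame as a list, so each column keeps its own type.
via_sapply <- sapply(tf, is.numeric)

print(via_apply)
print(via_sapply)
```

Comparing the two printed vectors shows why the type test in the question never sees a numeric column when it is called through apply().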

鼻尖触碰 2024-10-28 08:18:56

I find the numcolwise and catcolwise functions from the plyr package useful here, for a syntactically simple solution:

First let's name the columns, to avoid ugly column names when doing the aggregation:

library(plyr)
tf <- data.frame(a = 1:3, b = 4:6, d = c("A", "A", "A"))

Then you get your desired result with this one-liner:

> cbind(numcolwise(mean)(tf), catcolwise( function(z) unique(z)[1] )(tf))
  a b d
1 2 5 A

Explanation: numcolwise(f) converts its argument (in this case f is the mean function) into a function that takes a data frame and applies f only to its numeric columns. Similarly, catcolwise converts its function argument into a function that operates only on the categorical columns.
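
If you would rather not depend on plyr, the same idea can be sketched in base R. This is only an approximation under stated assumptions, not plyr's implementation: the helper names apply_numeric and apply_categorical are hypothetical, and every non-numeric column is treated as categorical, which is a simplification.

```r
tf <- data.frame(a = 1:3, b = 4:6, d = c("A", "A", "A"),
                 stringsAsFactors = TRUE)

# Hypothetical helpers, named for illustration only (not part of plyr).
# Filter() keeps the columns for which the predicate is TRUE.
apply_numeric <- function(df, f) {
  as.data.frame(lapply(Filter(is.numeric, df), f))
}

# Assumption: "categorical" here simply means "not numeric".
apply_categorical <- function(df, f) {
  as.data.frame(lapply(Filter(Negate(is.numeric), df), f))
}

# Mean of each numeric column, first unique value of every other column.
result <- cbind(apply_numeric(tf, mean),
                apply_categorical(tf, function(z) unique(z)[1]))
print(result)
```

The split into two helpers mirrors the numcolwise/catcolwise pairing: each one selects columns by type first and only then applies the function.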
