apply treats numbers as characters

Posted 2024-10-21 08:18:56

I couldn't find a solution for this problem online, simple as it seems. Here it is:

#Construct test dataframe 
tf <- data.frame(1:3,4:6,c("A","A","A")) 

#Try the apply function I'm trying to use
test <- apply(tf,2,function(x) if(is.numeric(x)) mean(x) else unique(x)[1]) 

#Look at the output--all columns treated as character columns...
test

#Look at the format of the original data--the first two columns are integers. 
str(tf)

In general terms, I want to differentiate what function I apply over a row/column based on what type of data that row/column contains.

Here, I want a simple mean if the column is numeric and the first unique value if the column is a character column. As you can see, apply treats all columns as characters the way I've written this function.

千笙结 2024-10-28 08:18:56

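Just write a specialised function and put it within sapply... don't use apply(dtf, 2, fun). Besides, your character column isn't as characterish as you may think: run getOption("stringsAsFactors") and see for yourself. (This applies to R versions before 4.0.0, where data.frame() converted strings to factors by default. Since R 4.0.0 the default is stringsAsFactors = FALSE, so on current R the third column below would show "character" rather than "factor".)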

sapply(tf, class)
            X1.3             X4.6 c..A....A....A.. 
       "integer"        "integer"         "factor" 
sapply(tf, storage.mode)
            X1.3             X4.6 c..A....A....A.. 
       "integer"        "integer"        "integer"

EDIT

Or even better - use lapply:

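fn <- function(x) {
  # numeric (but not factor) columns: return the mean
  if (is.numeric(x) && !is.factor(x)) {
    mean(x)
  } else if (is.character(x)) {
    # character columns: return the first unique value
    unique(x)[1]
  } else if (is.factor(x)) {
    # factor columns: return the first value as a plain string
    as.character(x)[1]
  }
}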

dtf <- data.frame(a = 1:3, b = 4:6, c = rep("A", 3), stringsAsFactors = FALSE)
dtf2 <- data.frame(a = 1:3, b = 4:6, c = rep("A", 3), stringsAsFactors = TRUE)

as.data.frame(lapply(dtf, fn))
  a b c
1 2 5 A
as.data.frame(lapply(dtf2, fn))
  a b c
1 2 5 A
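To see why lapply is preferable to sapply here, the sketch below (redefining fn and dtf so it runs on its own) compares the two. sapply tries to simplify the list of per-column results into one atomic vector, so the numeric means get coerced to character alongside "A"; lapply keeps a list, so each result keeps its own type.

```r
# Same helper as above: mean for numeric columns, first value otherwise
fn <- function(x) {
  if (is.numeric(x) && !is.factor(x)) {
    mean(x)
  } else if (is.character(x)) {
    unique(x)[1]
  } else if (is.factor(x)) {
    as.character(x)[1]
  }
}

dtf <- data.frame(a = 1:3, b = 4:6, c = rep("A", 3), stringsAsFactors = FALSE)

# sapply simplifies the three length-1 results into one atomic vector,
# so the numeric means are coerced to character: "2" "5" "A"
s <- sapply(dtf, fn)
print(s)
print(class(s))

# lapply returns a list, so each element keeps its own type
l <- lapply(dtf, fn)
str(l)
```

Wrapping the lapply result in as.data.frame(), as in the answer, then gives a one-row data frame with numeric a and b and a character c.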

你的他你的她 2024-10-28 08:18:56

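You want to use lapply() or sapply(), not apply(). A data.frame is a list under the hood, and apply tries to convert it to a matrix before doing anything else. Since at least one column of your data frame is character, all the other columns are coerced to character as well when that matrix is formed.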
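The coercion described above can be checked directly. This is a minimal sketch using the question's data with explicit column names (a, b and d are chosen here only for readability): as.matrix(), which apply() relies on, produces a single-type matrix, while lapply()/sapply() see each column of the underlying list with its original type.

```r
tf <- data.frame(a = 1:3, b = 4:6, d = c("A", "A", "A"))

# apply() first turns the data frame into a matrix; because column d is
# non-numeric (character, or factor on R < 4.0.0), the whole matrix is character
m <- as.matrix(tf)
print(typeof(m))

# A data frame is a list of columns, so sapply() sees each column's real type
print(sapply(tf, is.numeric))

# Type-dependent summary per column, keeping the results in a list
res <- lapply(tf, function(x) {
  if (is.numeric(x)) mean(x) else as.character(unique(x)[1])
})
str(res)
```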

鼻尖触碰 2024-10-28 08:18:56

I find the numcolwise and catcolwise functions from the plyr package useful here, for a syntactically simple solution:

First let's name the columns, to avoid ugly column names when doing the aggregation:

tf <- data.frame(a = 1:3,b=4:6, d = c("A","A","A"))

Then you get your desired result with this one-liner:

> library(plyr)
> cbind(numcolwise(mean)(tf), catcolwise(function(z) unique(z)[1])(tf))
  a b d
1 2 5 A

Explanation: numcolwise(f) converts its argument (in this case f is the mean function) into a function that takes a data frame and applies f only to its numeric columns. Similarly, catcolwise converts its function argument into a function that operates only on the categorical columns.
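If you would rather not depend on plyr, a roughly equivalent base-R sketch splits the columns by type with Filter(). This is a substitute technique, not how numcolwise/catcolwise are implemented internally; is_cat is a hypothetical helper that treats character, factor and logical columns as categorical, an assumption meant to approximate plyr's notion of discrete columns.

```r
tf <- data.frame(a = 1:3, b = 4:6, d = c("A", "A", "A"))

# Hypothetical helper: which columns count as categorical
is_cat <- function(x) is.character(x) || is.factor(x) || is.logical(x)

# Filter() on a data frame keeps only the columns for which the predicate is TRUE
num_cols <- Filter(is.numeric, tf)
cat_cols <- Filter(is_cat, tf)

# Summarise each group into a one-row data frame, then bind them side by side
num_summary <- as.data.frame(lapply(num_cols, mean))
cat_summary <- as.data.frame(lapply(cat_cols, function(z) unique(z)[1]))
result <- cbind(num_summary, cat_summary)
print(result)
```

The result should match the plyr one-liner: one row with a = 2, b = 5 and d = A.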
