How do you convert a data frame column to a numeric type?

Posted 2024-08-22 09:26:28

How do you convert a data frame column to a numeric type?

猫瑾少女 2024-08-29 09:26:28

Since nobody has (still) got the check mark, I assume that you have some practical issue in mind, mostly because you haven't specified what type of vector you want to convert to numeric. I suggest using the transform function to complete your task.

Now I'm about to demonstrate a certain "conversion anomaly":

# create dummy data.frame
d <- data.frame(char = letters[1:5], 
                fake_char = as.character(1:5), 
                fac = factor(1:5), 
                char_fac = factor(letters[1:5]), 
                num = 1:5, stringsAsFactors = FALSE)

Let us have a glance at data.frame

> d
  char fake_char fac char_fac num
1    a         1   1        a   1
2    b         2   2        b   2
3    c         3   3        c   3
4    d         4   4        d   4
5    e         5   5        e   5

and let us run:

> sapply(d, mode)
       char   fake_char         fac    char_fac         num 
"character" "character"   "numeric"   "numeric"   "numeric" 
> sapply(d, class)
       char   fake_char         fac    char_fac         num 
"character" "character"    "factor"    "factor"   "integer" 

Now you probably ask yourself, "Where is the anomaly?" Well, I've bumped into quite peculiar things in R, and this is not the most confounding one, but it can confuse you, especially if you read this before rolling into bed.

Here goes: the first two columns are character. I've deliberately called the second one fake_char. Spot the similarity of this character variable with the one that Dirk created in his reply. It's actually a numeric vector converted to character. The third and fourth columns are factors, and the last one is "purely" numeric.

If you use the transform function, you can convert fake_char into numeric, but not the char variable itself.

> transform(d, char = as.numeric(char))
  char fake_char fac char_fac num
1   NA         1   1        a   1
2   NA         2   2        b   2
3   NA         3   3        c   3
4   NA         4   4        d   4
5   NA         5   5        e   5
Warning message:
In eval(expr, envir, enclos) : NAs introduced by coercion

but if you do the same thing on fake_char and char_fac, you'll be lucky and get away with no NAs (note that char_fac becomes the integer codes of its factor levels, not parsed values):

> transform(d, fake_char = as.numeric(fake_char), 
               char_fac = as.numeric(char_fac))

  char fake_char fac char_fac num
1    a         1   1        1   1
2    b         2   2        2   2
3    c         3   3        3   3
4    d         4   4        4   4
5    e         5   5        5   5

If you save the transformed data.frame and check its mode and class, you'll get:

> D <- transform(d, fake_char = as.numeric(fake_char), 
                    char_fac = as.numeric(char_fac))

> sapply(D, mode)
       char   fake_char         fac    char_fac         num 
"character"   "numeric"   "numeric"   "numeric"   "numeric" 
> sapply(D, class)
       char   fake_char         fac    char_fac         num 
"character"   "numeric"    "factor"   "numeric"   "integer"

So, the conclusion is: yes, you can convert a character vector into a numeric one, but only if its elements are "convertible" to numeric. If the vector contains even one element that cannot be parsed as a number, you'll get a warning when converting it, and that element becomes NA.

And just to prove my point:

> err <- c(1, "b", 3, 4, "e")
> mode(err)
[1] "character"
> class(err)
[1] "character"
> char <- as.numeric(err)
Warning message:
NAs introduced by coercion 
> char
[1]  1 NA  3  4 NA
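
To see which elements triggered the coercion warning, one option is to compare missingness before and after the conversion. This is only a minimal sketch reusing the err vector from above; the names converted and bad are illustrative.

```r
err <- c(1, "b", 3, 4, "e")

# Elements that are not NA to begin with but become NA after coercion
converted <- suppressWarnings(as.numeric(err))
bad <- is.na(converted) & !is.na(err)

which(bad)   # positions of the non-convertible elements
err[bad]     # the offending values themselves
```

Checking this before overwriting a column makes it easier to decide whether the NAs are acceptable or whether the input needs cleaning first.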

And now, just for fun (or practice), try to guess the output of these commands:

> fac <- as.factor(err)
> fac
???
> num <- as.numeric(fac)
> num
???

Kind regards to Patrick Burns! =)

喜爱纠缠 2024-08-29 09:26:28

Something that has helped me: if you have ranges of variables to convert (or just more than one), you can use sapply.

A bit nonsensical but just for example:

data(cars)
cars[, 1:2] <- sapply(cars[, 1:2], as.factor)

Say columns 3, 6-15 and 37 of your data frame need to be converted to numeric; one could:

dat[, c(3,6:15,37)] <- sapply(dat[, c(3,6:15,37)], as.numeric)
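
One caveat worth checking: sapply() simplifies its result to a matrix, so all converted columns end up with one common type, and a factor conversion like the cars example above may come back as character rather than factor. Below is a sketch of the same idea with lapply(), which returns a list and keeps each column's class. The data frame dat and its column names are made up for illustration.

```r
dat <- data.frame(id    = c("1", "2", "3"),
                  score = c("4.5", "3.2", "5"),
                  label = c("a", "b", "c"),
                  stringsAsFactors = FALSE)

# Convert only the selected columns; the others are left untouched
cols <- c("id", "score")
dat[cols] <- lapply(dat[cols], as.numeric)

sapply(dat, class)
```

Indexing with dat[cols] rather than dat[, cols] also keeps working when cols names a single column.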
叫思念不要吵 2024-08-29 09:26:28

If x is the column name of data frame dat, and x is of type factor, use:

as.numeric(as.character(dat$x))
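
The detour through as.character() matters because as.numeric() applied to a factor returns the internal level codes, not the numbers the labels spell out. Here is a small sketch with a made-up factor f; the as.numeric(levels(f))[f] form is an equivalent variant described in the R FAQ.

```r
f <- factor(c("10", "20", "10", "35"))

as.numeric(f)                  # level codes, not the values
as.numeric(as.character(f))    # the values the labels spell out
as.numeric(levels(f))[f]       # same result, converting each level only once
```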
我纯我任性 2024-08-29 09:26:28

I would have added a comment (can't, my rating is too low).

Just to add to user276042 and pangratz:

dat$x = as.numeric(as.character(dat$x))

This will overwrite the values of the existing column x.

慕烟庭风 2024-08-29 09:26:28

With the following code you can convert all data frame columns to numeric (X is the data frame whose columns we want to convert):

as.data.frame(lapply(X, as.numeric))

and for converting whole matrix into numeric you have two ways:
Either:

mode(X) <- "numeric"

or:

X <- apply(X, 2, as.numeric)
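
For a character matrix the two approaches differ slightly in what they keep. A sketch with a made-up 2 x 2 matrix m (the names m, m1 and m2 are illustrative):

```r
m <- matrix(c("1", "2.5", "3", "4"), nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b")))

m1 <- m
mode(m1) <- "numeric"          # converts in place, keeps dim and dimnames

m2 <- apply(m, 2, as.numeric)  # rebuilds the matrix column by column

is.numeric(m1)
is.numeric(m2)
rownames(m1)
rownames(m2)                   # row names are not carried over by as.numeric
```

If row names matter, the mode<- form is the simpler choice.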

Alternatively, you can use the data.matrix function to convert everything to numeric. Be aware that factors are converted to their internal integer codes rather than to the numbers their labels show, so it is safer to convert each column through character first:

X[] <- lapply(X, function(col) as.numeric(as.character(col)))
X <- data.matrix(X)

I usually use this last one if I want to convert to a matrix and to numeric simultaneously.

情深已缘浅 2024-08-29 09:26:28

While your question is strictly on numeric, there are many conversions that are difficult to understand when beginning R. I'll aim to address methods to help. This question is similar to This Question.

Type conversion can be a pain in R because (1) factors can't be converted directly to numeric, they need to be converted to character class first, (2) dates are a special case that you typically need to deal with separately, and (3) looping across data frame columns can be tricky. Fortunately, the "tidyverse" has solved most of the issues.

This solution uses mutate_all() to apply a function to all columns in a data frame. In this case, we want to apply the type.convert() function, which converts strings to numeric where it can. Because R loves factors (not sure why), character columns that should stay character get changed to factor. To fix this, the mutate_if() function is used to detect columns that are factors and change them to character. Last, I wanted to show how lubridate can be used to change a timestamp in character class to date-time, because this is also often a sticking point for beginners.

library(tidyverse) 
library(lubridate)

# Recreate data that needs converted to numeric, date-time, etc
data_df
#> # A tibble: 5 × 9
#>             TIMESTAMP SYMBOL    EX  PRICE  SIZE  COND   BID BIDSIZ   OFR
#>                 <chr>  <chr> <chr>  <chr> <chr> <chr> <chr>  <chr> <chr>
#> 1 2012-05-04 09:30:00    BAC     T 7.8900 38538     F  7.89    523  7.90
#> 2 2012-05-04 09:30:01    BAC     Z 7.8850   288     @  7.88  61033  7.90
#> 3 2012-05-04 09:30:03    BAC     X 7.8900  1000     @  7.88   1974  7.89
#> 4 2012-05-04 09:30:07    BAC     T 7.8900 19052     F  7.88   1058  7.89
#> 5 2012-05-04 09:30:08    BAC     Y 7.8900 85053     F  7.88 108101  7.90

# Converting columns to numeric using "tidyverse"
data_df %>%
    mutate_all(type.convert) %>%
    mutate_if(is.factor, as.character) %>%
    mutate(TIMESTAMP = as_datetime(TIMESTAMP, tz = Sys.timezone()))
#> # A tibble: 5 × 9
#>             TIMESTAMP SYMBOL    EX PRICE  SIZE  COND   BID BIDSIZ   OFR
#>                <dttm>  <chr> <chr> <dbl> <int> <chr> <dbl>  <int> <dbl>
#> 1 2012-05-04 09:30:00    BAC     T 7.890 38538     F  7.89    523  7.90
#> 2 2012-05-04 09:30:01    BAC     Z 7.885   288     @  7.88  61033  7.90
#> 3 2012-05-04 09:30:03    BAC     X 7.890  1000     @  7.88   1974  7.89
#> 4 2012-05-04 09:30:07    BAC     T 7.890 19052     F  7.88   1058  7.89
#> 5 2012-05-04 09:30:08    BAC     Y 7.890 85053     F  7.88 108101  7.90
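
If pulling in the tidyverse is not an option, base R's type.convert() can also be applied to a whole data frame (it has a data frame method in reasonably recent R versions). This is a minimal sketch on a made-up two-row extract of the data above; the name quotes is illustrative. Passing as.is = TRUE keeps text columns as character, so no factor round trip is needed.

```r
quotes <- data.frame(SYMBOL = c("BAC", "BAC"),
                     PRICE  = c("7.8900", "7.8850"),
                     SIZE   = c("38538", "288"),
                     stringsAsFactors = FALSE)

# Each column is converted to the narrowest type that fits all its values
converted <- type.convert(quotes, as.is = TRUE)
sapply(converted, class)
```

Timestamps are not parsed by type.convert(), so a date-time column would still need a separate step such as the lubridate call shown above.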
魔法唧唧 2024-08-29 09:26:28

If you run into problems with:

as.numeric(as.character(dat$x))

Take a look at your decimal marks. If they are "," instead of "." (e.g. "5,3"), the above won't work.

A potential solution is:

as.numeric(gsub(",", ".", dat$x))

I believe this is quite common in some non English speaking countries.
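
A short worked instance of the replacement, with a made-up vector x. It assumes "," is only ever used as the decimal mark (no thousands separators), and uses fixed = TRUE so the comma is matched literally.

```r
x <- c("5,3", "12,75", "7")

# Swap the decimal comma for a point, then parse
num <- as.numeric(sub(",", ".", x, fixed = TRUE))
num
```

If the values come from a text file, read.table() and read.csv() accept dec = "," so the conversion can happen at import time instead.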

紫罗兰の梦幻 2024-08-29 09:26:28

Tim is correct, and Shane has an omission. Here are additional examples:

R> df <- data.frame(a = as.character(10:15), stringsAsFactors = TRUE)
R> df <- data.frame(df, num = as.numeric(df$a), 
                        numchr = as.numeric(as.character(df$a)))
R> df
   a num numchr
1 10   1     10
2 11   2     11
3 12   3     12
4 13   4     13
5 14   5     14
6 15   6     15
R> summary(df)
  a          num           numchr    
 10:1   Min.   :1.00   Min.   :10.0  
 11:1   1st Qu.:2.25   1st Qu.:11.2  
 12:1   Median :3.50   Median :12.5  
 13:1   Mean   :3.50   Mean   :12.5  
 14:1   3rd Qu.:4.75   3rd Qu.:13.8  
 15:1   Max.   :6.00   Max.   :15.0  
R> 

Our data.frame now has a summary of the factor column (counts), a numeric summary of as.numeric(), which is wrong because it picked up the integer level codes, and the (correct) summary of as.numeric(as.character()).

妥活 2024-08-29 09:26:28

Universal way using type.convert() and rapply():

convert_types <- function(x) {
    stopifnot(is.list(x))
    x[] <- rapply(x, utils::type.convert, classes = "character",
                  how = "replace", as.is = TRUE)
    return(x)
}
d <- data.frame(char = letters[1:5], 
                fake_char = as.character(1:5), 
                fac = factor(1:5), 
                char_fac = factor(letters[1:5]), 
                num = 1:5, stringsAsFactors = FALSE)
sapply(d, class)
#>        char   fake_char         fac    char_fac         num 
#> "character" "character"    "factor"    "factor"   "integer"
sapply(convert_types(d), class)
#>        char   fake_char         fac    char_fac         num 
#> "character"   "integer"    "factor"    "factor"   "integer"
故笙诉离歌 2024-08-29 09:26:28

To convert a data frame column to numeric you just have to do:-

factor to numeric:-

data_frame$column <- as.numeric(as.character(data_frame$column))
最佳男配角 2024-08-29 09:26:28

Though others have covered the topic pretty well, I'd like to add this additional quick thought/hint. You could use regexp to check in advance whether characters potentially consist only of numerics.

potential_numcol <- logical(ncol(d))
for (i in seq_along(d)) {
     potential_numcol[i] <- all(!grepl("[a-zA-Z]", d[, i]))
}
# and now convert only the potentially numeric columns
d[potential_numcol] <- lapply(d[potential_numcol], as.numeric)

For more sophisticated regular expressions and a neat way to learn/experience their power, see this really nice website: http://regexr.com/

森林很绿却致人迷途 2024-08-29 09:26:28

If the data frame has multiple types of columns, some character and some numeric, try the following to convert just the columns that contain numeric values to numeric:

for (i in 1:length(data[1,])){
  if(length(as.numeric(data[,i][!is.na(data[,i])])[!is.na(as.numeric(data[,i][!is.na(data[,i])]))])==0){}
  else {
    data[,i]<-as.numeric(data[,i])
  }
}
苍暮颜 2024-08-29 09:26:28

with hablar::convert

To easily convert multiple columns to different data types you can use hablar::convert. Simple syntax: df %>% convert(num(a)) converts the column a from df to numeric.

Detailed example

Let's convert all columns of mtcars to character.

library(dplyr) # for %>%, mutate_all and as_tibble

df <- mtcars %>% mutate_all(as.character) %>% as_tibble()

> df
# A tibble: 32 x 11
   mpg   cyl   disp  hp    drat  wt    qsec  vs    am    gear  carb 
   <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1 21    6     160   110   3.9   2.62  16.46 0     1     4     4    
 2 21    6     160   110   3.9   2.875 17.02 0     1     4     4    
 3 22.8  4     108   93    3.85  2.32  18.61 1     1     4     1    

With hablar::convert:

library(hablar)

# Convert columns to integer, numeric and factor
df %>% 
  convert(int(cyl, vs),
          num(disp:wt),
          fct(gear))

results in:

# A tibble: 32 x 11
   mpg     cyl  disp    hp  drat    wt qsec     vs am    gear  carb 
   <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr> <int> <chr> <fct> <chr>
 1 21        6  160    110  3.9   2.62 16.46     0 1     4     4    
 2 21        6  160    110  3.9   2.88 17.02     0 1     4     4    
 3 22.8      4  108     93  3.85  2.32 18.61     1 1     4     1    
 4 21.4      6  258    110  3.08  3.22 19.44     1 0     3     1   
长伴 2024-08-29 09:26:28

Considering there might be character columns, this is based on @Abdou's answer to "Get column types of excel sheet automatically":

makenumcols<-function(df){
  df<-as.data.frame(df)
  df[] <- lapply(df, as.character)
  cond <- apply(df, 2, function(x) {
    x <- x[!is.na(x)]
    all(suppressWarnings(!is.na(as.numeric(x))))
  })
  numeric_cols <- names(df)[cond]
  df[,numeric_cols] <- sapply(df[,numeric_cols], as.numeric)
  return(df)
}
df<-makenumcols(df)
疧_╮線 2024-08-29 09:26:28

If you don't care about preserving the factors and want to apply it to any column that can be converted to numeric, you can use the script below, where df is your original data frame.

df[] <- lapply(df, as.character)
df <- data.frame(lapply(df, function(x) ifelse(!is.na(as.numeric(x)), as.numeric(x),  x)))

I referenced Shane's and Joran's solutions, by the way.

半山落雨半山空 2024-08-29 09:26:28

On my PC (R v.3.2.3), apply and sapply give an error; lapply works well.

dt[,2:4] <- lapply(dt[,2:4], function (x) as.factor(as.numeric(x)))
呆° 2024-08-29 09:26:28

To convert a character column to numeric, you can first convert it into a factor by applying

BankFinal1 <- transform(BankLoan,   LoanApproval=as.factor(LoanApproval))
BankFinal1 <- transform(BankFinal1, LoanApp=as.factor(LoanApproval))

The example makes two columns from the same data. Converting the character column directly with as.numeric does not work; it gives the warning below:

transform(BankData, LoanApp=as.numeric(LoanApproval))
Warning message:
  In eval(substitute(list(...)), `_data`, parent.frame()) :
  NAs introduced by coercion

So, once both columns are factors, apply

BankFinal1 <- transform(BankFinal1, LoanApp      = as.numeric(LoanApp), 
                                    LoanApproval = as.numeric(LoanApproval))

This converts the values to numeric level codes without the warning.

泪之魂 2024-08-29 09:26:28

df is your data frame. x is a column of df you want to convert.

as.numeric(factor(df$x))
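
as.numeric(factor(...)) yields the integer codes of the factor levels, which is useful for text labels but is not the same as parsing numbers. A sketch with made-up Yes/No data showing how to inspect and control that coding (the names x and f are illustrative):

```r
x <- c("Yes", "No", "Yes")
f <- factor(x)

levels(f)        # the lookup table: position in this vector = code
as.numeric(f)    # codes follow the sorted levels, so "No" is 1 and "Yes" is 2

# Choosing the level order explicitly gives a predictable 0/1 coding
as.numeric(factor(x, levels = c("No", "Yes"))) - 1
```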
萤火眠眠 2024-08-29 09:26:28

Convert to numeric only those columns that contain digits, with or without a decimal separator

# detect which columns have numeric characters (digits) with or without decimal separator "."
columns_with_digits <- sapply(df, function(x) 
  all(grepl("^\\d+\\.?\\d*$", x))  
)

# run as.numeric only in the detected columns 
df[, columns_with_digits] <- data.frame(lapply(df[, columns_with_digits], as.numeric))

See an example with iris below

library(dplyr) # for glimpse

# get example data
df <- iris

# convert the numeric columns to character
df$Sepal.Length <- as.character(df$Sepal.Length)
df$Sepal.Width <- as.character(df$Sepal.Width)
df$Petal.Length <- as.character(df$Petal.Length)
df$Petal.Width <- as.character(df$Petal.Width)

glimpse(df)

Check the data with glimpse()

>glimpse(df)
Rows: 150
Columns: 5
$ Sepal.Length <chr> "5.1", "4.9", "4.7", "4.6", "5",…
$ Sepal.Width  <chr> "3.5", "3", "3.2", "3.1", "3.6",…
$ Petal.Length <chr> "1.4", "1.4", "1.3", "1.5", "1.4…
$ Petal.Width  <chr> "0.2", "0.2", "0.2", "0.2", "0.2…
$ Species      <fct> setosa, setosa, setosa, setosa, …

Detect which columns have numeric characters (digits), with or without a decimal point (.), using regular expressions (regex)

# detect which columns have numeric characters (digits) with or without decimal separator (.)
columns_with_digits <- sapply(df, function(x) 
  all(grepl("^\\d+\\.?\\d*$", x))
)
# where: 
# ^ indicates the beginning of the string
# \\d+ corresponds to a sequence of one or more digits 
# \\.? indicates that the point is optional (it can appear zero or one time due to the ?)
# \\d* corresponds to zero or more digits after the 'optional' point 
# $ indicates the end of the string

Proceed to convert with lapply

# run as.numeric only in the detected columns 
df[, columns_with_digits] <- data.frame(lapply(df[, columns_with_digits], as.numeric))

Check final output

# check again 
glimpse(df)
Columns: 5
$ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.…
$ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.…
$ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.…
$ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.…
$ Species      <fct> setosa, setosa, setosa, setosa, …
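
It can help to try the pattern on a few sample strings before relying on it. The sketch below uses made-up inputs and suggests that values with a sign, a leading point, scientific notation, or NA are not matched, so a column containing any of them would be left unconverted.

```r
pattern <- "^\\d+\\.?\\d*$"
samples <- c("5", "5.1", "5.", ".5", "-1.5", "1e3", "abc", NA)

# TRUE only for plain non-negative numbers written with digits and an optional point
matches <- grepl(pattern, samples)
data.frame(samples, matches)
```

If such values occur in the data, the pattern would need to be extended, or the check replaced by testing whether as.numeric() introduces new NAs.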