Replace dots, commas and percent signs in a CSV file read with fread()

Posted on 2025-02-11 21:06:16

We would like to convert the columns of a CSV file into numbers. When we read the file with fread(), using the arguments shown in the "R code used" section (separator and decimal mark both specified), we get three character columns. We use R 4.2.0 and data.table 1.14.2.

Input data from the CSV file


col_1,col_2, col_3
/100.432,"30,84 %","4,14"
/3.200,"62,89 %","1,89"
/10.100,"50,00 %","1,62"
/15.570, "40,10 %","3,41"
/900.310, "8,00 %","0,10"

Input data in R session

> dat
# A tibble: 5 × 3
 
  col_1   col_2    col_3
  <chr>  <chr>   <chr>
1 100.432 30,84 % 4,14 
2   3.200 62,89 % 1,89 
3  10.100 50,00 % 1,62 
4  15.570 40,10 % 3,41 
5 900.310  8,00 % 0,10

R code used


library(data.table)
library(dplyr)  # provides %>% and as_tibble()

dat <- data.table::fread(
  x,  # path to the CSV file
  sep = ',',
  dec = '.',
  na.strings = c('', 'NA')) %>%
  as_tibble()
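
A minimal base-R sketch of why the call above leaves every column as character. fread() takes a single dec value for the whole file, but the values combine a leading /, a dot thousands separator, a comma decimal mark and a trailing %. The vectors below are copied from the input; type.convert() is used only as a stand-in for type guessing, not as fread()'s actual internals.

```r
# Sample values copied from the input CSV, one vector per column
col_1 <- c("/100.432", "/3.200", "/10.100", "/15.570", "/900.310")
col_2 <- c("30,84 %", "62,89 %", "50,00 %", "40,10 %", "8,00 %")
col_3 <- c("4,14", "1,89", "1,62", "3,41", "0,10")

# Base R type guessing with a given decimal mark (stand-in for fread's guessing)
guess <- function(x, dec) type.convert(x, dec = dec, as.is = TRUE)

class(guess(col_1, dec = "."))  # the leading "/" blocks numeric parsing
class(guess(col_2, dec = ","))  # the trailing " %" blocks numeric parsing
class(guess(col_3, dec = ","))  # only this column fits a single dec setting

# Even without the "/", "." is read as a decimal point, not a thousands separator
as.numeric("100.432")
```

Only col_3 is parsed as a number here, so cleaning the columns after reading, as the answers below do, is one way around it.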

Desired output data


> dat
# A tibble: 5 × 3
 
 col_1   col_2  col_3
  <dbl>  <dbl>  <dbl>
1 100432 30.84  4.14 
2   3200 62.89  1.89 
3  10100 50.00  1.62 
4  15570 40.10  3.41 
5 900310  8.00  0.10

Question

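How can we obtain the desired output, with all three columns numeric (dots treated as thousands separators, commas as decimal marks, and the leading / and trailing % removed)?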

Thanks in advance

Comments (2)

偏爱你一生 2025-02-18 21:06:16

You could just do some postprocessing in R:

library(data.table)

RAW = fread('col_1,col_2, col_3
/100.432,"30,84 %","4,14"
/3.200,"62,89 %","1,89"
/10.100,"50,00 %","1,62"
/15.570, "40,10 %","3,41"
/900.310, "8,00 %","0,10"')

#       col_1    col_2  col_3
#      <char>   <char> <char>
# 1: /100.432  30,84 %   4,14
# 2:   /3.200  62,89 %   1,89
# 3:  /10.100  50,00 %   1,62
# 4:  /15.570  40,10 %   3,41
# 5: /900.310   8,00 %   0,10

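# readr::parse_number() drops the leading "/" and the trailing "%";
# with decimal_mark = ",", locale() sets the grouping mark to "."
library(readr)
RAW[, lapply(.SD, \(x) parse_number(x, locale = locale(decimal_mark = ",")))]
# Or in base R (R >= 4.1 for \(x) and |>):
# map "." to "_" and "," to ".", keep only digits and ".", then convert
RAW[, lapply(.SD, \(x) gsub("[^0-9.]", "", chartr(".,", "_.", x)) |> as.numeric())]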

#     col_1 col_2 col_3
#     <num> <num> <num>
# 1: 100432 30.84  4.14
# 2:   3200 62.89  1.89
# 3:  10100 50.00  1.62
# 4:  15570 40.10  3.41
# 5: 900310  8.00  0.10
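
The base-R line above can be unpacked step by step. chartr() first maps "." to "_" (so the thousands separator can be dropped) and "," to "." (the decimal mark as.numeric() expects); the regex then keeps only digits and ".". The Â in the pasted data is most likely a UTF-8 non-breaking space (U+00A0) shown in the wrong encoding; this sketch assumes so and writes it as \u00a0, which the same regex also removes.

```r
# Sample values; "\u00a0" assumes a non-breaking space before the percent sign
x <- c("/100.432", "30,84\u00a0%", "4,14")

# Step 1: "." becomes "_" (thousands separator), "," becomes "." (decimal mark)
step1 <- chartr(".,", "_.", x)

# Step 2: drop everything except digits and "." ("/", "_", the space and "%")
step2 <- gsub("[^0-9.]", "", step1)

# Step 3: convert the cleaned strings to doubles
step3 <- as.numeric(step2)
print(step3)
```

Because every "." is treated as a thousands separator, this trick only fits data where no column uses "." as its decimal mark; readr::parse_number() with an explicit locale is the more general option.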

一片旧的回忆 2025-02-18 21:06:16

I've adapted the post-processing proposed by @sindri_baldur to dplyr. With dplyr version 1.0.9, the code is:

Input data from the CSV file


library(data.table)

dat = data.table::fread(
 'col_1,col_2, col_3
 /100.432,"30,84 %","4,14"
 /3.200,"62,89 %","1,89"
 /10.100,"50,00 %","1,62"
 /15.570, "40,10 %","3,41"
 /900.310, "8,00 %","0,10"')

Input data in R session

> dat
# A tibble: 5 × 3
 
  col_1   col_2    col_3
  <chr>  <chr>   <chr>
1 100.432 30,84 % 4,14 
2   3.200 62,89 % 1,89 
3  10.100 50,00 % 1,62 
4  15.570 40,10 % 3,41 
5 900.310  8,00 % 0,10

R code used


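library(dplyr)

dat <- dat %>%
    as_tibble() %>%
    # map "." to "_" and "," to ".", then keep only digits and "."
    mutate_at(
      vars(everything()),
      ~ gsub('[^0-9.]', '', chartr('.,', '_.', .))) %>%
    # convert the cleaned strings to numbers
    mutate_at(
      vars(everything()),
      ~ as.numeric(as.character(.)))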
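
mutate_at() and vars() still work in dplyr 1.0.9, but they have been superseded by across() since dplyr 1.0.0. A sketch of the same post-processing with across(), assuming dplyr >= 1.0.0; the small tibble stands in for the fread() result. Because gsub() already returns character, the as.character() step can be dropped and the conversion done in a single mutate().

```r
library(dplyr)

# Stand-in for the fread() result: three character columns
dat <- tibble(
  col_1 = c("/100.432", "/3.200", "/10.100", "/15.570", "/900.310"),
  col_2 = c("30,84 %", "62,89 %", "50,00 %", "40,10 %", "8,00 %"),
  col_3 = c("4,14", "1,89", "1,62", "3,41", "0,10")
)

dat <- dat %>%
  mutate(across(
    everything(),
    # map "." to "_" and "," to ".", keep digits and ".", convert to double
    ~ as.numeric(gsub("[^0-9.]", "", chartr(".,", "_.", .x)))
  ))

print(dat)
```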

Final output data


> dat
# A tibble: 5 × 3

   col_1 col_2 col_3
   <dbl> <dbl> <dbl>
1 100432  30.8  4.14
2   3200  62.9  1.89
3  10100  50    1.62
4  15570  40.1  3.41
5 900310   8    0.1
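
The final output shows 30.8 and 62.9 where the desired output has 30.84 and 62.89. This appears to be a printing difference only: tibbles print a limited number of significant digits (3 by default, set by the pillar option pillar.sigfig), while the stored doubles keep both decimals. A base-R check on the converted col_2 values:

```r
# Values of col_2 after conversion, as stored
col_2 <- c(30.84, 62.89, 50, 40.1, 8)

# Rounded to 3 significant digits, similar to what the tibble printout shows
print(signif(col_2, 3))

# The full values are still there; format with two decimals for display
print(sprintf("%.2f", col_2))
```

To see more digits in the tibble printout itself, setting options(pillar.sigfig = 4) before printing is one option; the stored values do not change.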

Thanks to everybody for giving a solution and improving the code.
