Skipping multiple header rows with read_excel or read.excel in R

Posted on 2025-01-17 15:26:24, 970 words, 6 views, 0 comments

For an Excel file with multiple header rows, as shown below (test data can be downloaded from here):

[Image: the Excel file with its header rows]

How can I skip the Unit and Frequency rows and use the indicator_name row as the header when reading the Excel file with R?

With the code below, it seems I can only skip one row by setting the skip parameter to an integer.

library(readxl)
myCols <- as.character(read_excel("./test123.xlsx", n_max = 1, col_names = FALSE))
myDF <- read_excel("./test123.xlsx", skip = 2, col_names = myCols)

Reference:

Skip rows while use read_excel or read.excel in R

Comments (1)

慵挽 2025-01-24 15:26:24

You just need skip = 3 instead of 2, because the header row itself also has to be skipped when you read in the data. The column names are already stored in myCols, so there is no need to keep the column-name row when reading the data.

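library(readxl)

# Read only the first row (the indicator_name row) and keep it as column names
myCols <- as.character(read_excel("./test123.xlsx", n_max = 1, col_names = FALSE))
# Skip all three header rows and apply the saved names to the data
myDF <- read_excel("./test123.xlsx", skip = 3, col_names = myCols)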

Output

  indicator_name         M2   GDP
  <dttm>              <dbl> <dbl>
1 2018-01-01 00:00:00  6.71  8.17
2 2018-01-02 00:00:00  6.79  8.19
3 2018-01-03 00:00:00  6.77  8.21
4 2018-01-04 00:00:00  6.73  8.20
5 2018-01-05 00:00:00  6.67  8.20
6 2018-01-06 00:00:00  6.62  8.21
7 2018-01-07 00:00:00  6.62  8.21
8 2018-01-08 00:00:00  6.64  8.22
9 2018-01-09 00:00:00  6.64  8.22
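
The same two-step pattern (read the header row, then read the data with those names) can be wrapped in a small function. This is only a sketch: read_excel_with_header_gap(), its header_row and first_data_row arguments, the use of the writexl package to build a throwaway workbook, and the Unit/Frequency values in the toy data are all assumptions for illustration, not part of the question or of readxl.

```r
library(readxl)

# Hypothetical helper (not part of readxl): take column names from one row
# and start reading data at a later row, skipping everything in between.
read_excel_with_header_gap <- function(path, header_row = 1, first_data_row = 4, ...) {
  # Step 1: read only the header row, as text, without treating it as names
  header <- read_excel(path, skip = header_row - 1, n_max = 1,
                       col_names = FALSE, col_types = "text", ...)
  col_names <- unlist(header[1, ], use.names = FALSE)

  # Step 2: skip all rows above the first data row and apply the saved names
  read_excel(path, skip = first_data_row - 1, col_names = col_names, ...)
}

# Toy workbook with the same layout as in the question (values are made up).
# Assumes the writexl package is installed.
toy <- data.frame(
  c1 = c("indicator_name", "Unit", "Frequency", "2018-01-01", "2018-01-02"),
  c2 = c("M2", "%", "Daily", "6.71", "6.79"),
  c3 = c("GDP", "%", "Daily", "8.17", "8.19"),
  stringsAsFactors = FALSE
)
toy_path <- tempfile(fileext = ".xlsx")
writexl::write_xlsx(toy, toy_path, col_names = FALSE)

toy_df <- read_excel_with_header_gap(toy_path)
names(toy_df)
```

Because the toy workbook stores every cell as text, its columns come back as character. With a real workbook such as test123.xlsx, the column types are guessed from the data rows only, since the header rows are skipped.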

If the first column name is blank, you can replace the NA in the column names before reading in the data.

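library(readxl)     # read_excel(); not attached by library(tidyverse)
library(tidyverse)

# Read the first two rows, keep only the header row, and fill the blank (NA)
# name before converting the row to a character vector
myCols <- read_excel("./test123.xlsx", n_max = 2, col_names = FALSE) %>% 
  slice(1) %>% 
  mutate(across(everything(), ~replace_na(., "indicator_name"))) %>% 
  as.character()
myDF <- read_excel("./test123.xlsx", skip = 3, col_names = myCols)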
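
A base R variant of the same idea, in case more than one header cell might be blank: the sketch below fills blank names with a default and then makes repeated names distinct. fill_blank_names() is a hypothetical helper, and the sample vector is only a stand-in for a header row read from the file.

```r
# Hypothetical helper: replace blank or NA header cells with a default name,
# then make repeated names distinct.
fill_blank_names <- function(x, default = "indicator_name") {
  x <- as.character(x)
  blank <- is.na(x) | !nzchar(trimws(x))
  x[blank] <- default
  make.unique(x)
}

# Stand-in for a header row whose first cell is empty
header_cells <- c(NA, "M2", "GDP")
myCols <- fill_blank_names(header_cells)
myCols
```

The resulting vector can be passed as col_names in the read_excel(..., skip = 3) call above.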

Benchmark

In this instance, it still looks faster to read everything in and filter out the extra rows afterwards.

[Image: microbenchmark plot comparing filter_before and filter_after]

library(ggplot2)  # provides autoplot(); readxl is loaded above
bm <- microbenchmark::microbenchmark(
  filter_before = {
    myCols <- as.character(read_excel("./test123.xlsx", n_max = 1, col_names = FALSE))
    myDF <- read_excel("./test123.xlsx", skip = 3, col_names = myCols)
  },
  filter_after = {
    myDF2 <- read_excel("./test123.xlsx")
    myDF2 <- myDF2[-c(1:2), ]
  },
  times = 1000)
autoplot(bm)
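
One caveat when comparing the two approaches: in filter_after the Unit and Frequency rows are read as data, so readxl will most likely guess the affected columns as text, and they stay character after the two rows are dropped. A minimal sketch of converting them afterwards with base R's type.convert(), using a made-up data frame in place of the real file (the values and the Unit/Frequency entries are assumptions):

```r
# Made-up stand-in for the result of reading the file without skipping rows:
# the Unit and Frequency rows force every column to character.
myDF2 <- data.frame(
  indicator_name = c("Unit", "Frequency", "2018-01-01", "2018-01-02"),
  M2  = c("%", "Daily", "6.71", "6.79"),
  GDP = c("%", "Daily", "8.17", "8.19"),
  stringsAsFactors = FALSE
)

# Drop the two extra header rows, as in filter_after
myDF2 <- myDF2[-c(1:2), ]

# Re-guess the column types; as.is = TRUE keeps strings as character
myDF2 <- type.convert(myDF2, as.is = TRUE)
str(myDF2)
```

type.convert() does not parse dates, so a date column would need separate handling. This extra step is not included in the benchmark timings.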