Nested row labels to columns

Posted on 2025-02-03 16:22:41

I have a CSV that appears to be the output of an Excel Pivot Table with names nested as row labels for repeating groups. I would like to clean the data so that the row labels are repeated in a separate column, ideally using dplyr.

The data looks like this:

dd <- data.frame(variables = c("Abington", "Number of Sales","YTD Number of Sales","Median Sale Price","YTD Median Sale Price", "Acton", "Number of Sales","YTD Number of Sales","Median Sale Price","YTD Median Sale Price"), Year1 = c(" ", 16, 50,415000,413500," ",23,60,799900,704000), Year2 = c(" ",8,13,583000,575000," ",9,39,995000,800000))

dd

variables              Year1   Year2
Abington              
Number of Sales        16      8
YTD Number of Sales    50      13
Median Sale Price      415000  583000
YTD Median Sale Price  413500  575000
Acton              
Number of Sales        23      9
YTD Number of Sales    60      39
Median Sale Price      799900  995000
YTD Median Sale Price  704000  800000

And I would like it to look like this:

Town          variables               Year1   Year2
Abington      Number of Sales         16      8
Abington      YTD Number of Sales     50      13
Abington      Median Sale Price       415000  583000
Abington      YTD Median Sale Price   413500  575000
Acton         Number of Sales         23      9
Acton         YTD Number of Sales     60      39
Acton         Median Sale Price       799900  995000
Acton         YTD Median Sale Price   704000  800000

Comments (2)

£冰雨忧蓝° 2025-02-10 16:22:41

We can use tidyverse (or dplyr & tidyr) for this:

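library(tidyverse)

dd %>%
  # header rows are the ones where both year columns hold a single space
  mutate(Town = ifelse(Year1 == " " & Year2 == " ", variables, NA)) %>%
  # carry each town name down to the rows beneath it
  fill(Town, .direction = "down") %>%
  # drop the header rows themselves
  filter(Town != variables) %>%
  # move Town to the first column
  relocate(Town)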

Resulting in:

      Town             variables  Year1  Year2
1 Abington       Number of Sales     16      8
2 Abington   YTD Number of Sales     50     13
3 Abington     Median Sale Price 415000 583000
4 Abington YTD Median Sale Price 413500 575000
5    Acton       Number of Sales     23      9
6    Acton   YTD Number of Sales     60     39
7    Acton     Median Sale Price 799900 995000
8    Acton YTD Median Sale Price 704000  8e+05

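Note that the empty values in Year1 and Year2 are actually single spaces (" ") rather than empty strings or NA, which is why the condition compares against " ". Because of those spaces, both year columns are character vectors, so 800000 is stored and printed as "8e+05".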
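
If the blank cells in the real CSV turn out to be empty strings, NA, or several spaces, the exact comparison with " " will miss them. The following is a minimal sketch of a more tolerant variant, not a definitive implementation. It assumes R 4.0 or later (so `variables` is a character column rather than a factor), and `is_blank` is a hypothetical helper introduced here for illustration. It also converts the year columns to numeric at the end, so 800000 no longer shows up as 8e+05.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
dd <- data.frame(
  variables = c("Abington", "Number of Sales", "YTD Number of Sales",
                "Median Sale Price", "YTD Median Sale Price",
                "Acton", "Number of Sales", "YTD Number of Sales",
                "Median Sale Price", "YTD Median Sale Price"),
  Year1 = c(" ", 16, 50, 415000, 413500, " ", 23, 60, 799900, 704000),
  Year2 = c(" ", 8, 13, 583000, 575000, " ", 9, 39, 995000, 800000)
)

# Hypothetical helper: TRUE for NA, "" and whitespace-only strings
is_blank <- function(x) is.na(x) | trimws(x) == ""

cleaned <- dd %>%
  # a row is a town header when both year cells are blank
  mutate(is_header = is_blank(Year1) & is_blank(Year2),
         Town = if_else(is_header, variables, NA_character_)) %>%
  # carry each town name down to the rows beneath it
  fill(Town, .direction = "down") %>%
  # drop the header rows, then turn the year columns into numbers
  filter(!is_header) %>%
  mutate(across(c(Year1, Year2), as.numeric)) %>%
  select(Town, variables, Year1, Year2)

cleaned
```

Flagging header rows in their own column keeps the filter step independent of the town names, so a data row whose label happened to match a town name would not be dropped by mistake.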

听风念你 2025-02-10 16:22:41

Here is another approach:

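library(tidyverse)

bind_cols(
  # header rows have a non-numeric Year1 (as.numeric() warns and returns NA);
  # each = 4 assumes every town has exactly four data rows
  tibble(Town = rep(filter(dd, is.na(as.numeric(Year1)))$variables, each = 4)),
  # keep only the data rows
  filter(dd, !is.na(as.numeric(Year1)))
)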

Output:

  Town     variables             Year1  Year2 
  <chr>    <chr>                 <chr>  <chr> 
1 Abington Number of Sales       16     8     
2 Abington YTD Number of Sales   50     13    
3 Abington Median Sale Price     415000 583000
4 Abington YTD Median Sale Price 413500 575000
5 Acton    Number of Sales       23     9     
6 Acton    YTD Number of Sales   60     39    
7 Acton    Median Sale Price     799900 995000
8 Acton    YTD Median Sale Price 704000 8e+05 
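
The `each = 4` above only works when every town has exactly four rows. As a hedged sketch of how the same idea could cope with groups of different sizes, the header rows can be counted with `cumsum()` and that running count used to look up the town name. This assumes the first row of the data is a town header and that header rows have a blank Year1; the column name `header` is introduced here only for illustration.

```r
library(dplyr)

# Same sample data as in the question
dd <- data.frame(
  variables = c("Abington", "Number of Sales", "YTD Number of Sales",
                "Median Sale Price", "YTD Median Sale Price",
                "Acton", "Number of Sales", "YTD Number of Sales",
                "Median Sale Price", "YTD Median Sale Price"),
  Year1 = c(" ", 16, 50, 415000, 413500, " ", 23, 60, 799900, 704000),
  Year2 = c(" ", 8, 13, 583000, 575000, " ", 9, 39, 995000, 800000)
)

result <- dd %>%
  mutate(
    # TRUE on the town header rows (blank Year1)
    header = trimws(Year1) == "",
    # cumsum(header) is 1 for the first town's rows, 2 for the second, ...
    Town = variables[header][cumsum(header)]
  ) %>%
  # drop the header rows and put Town first
  filter(!header) %>%
  select(Town, variables, Year1, Year2)

result
```

Unlike `as.numeric()` on the raw column, the `trimws()` comparison does not raise a coercion warning.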