Extract uppercase rows, then fill down until the next uppercase row

Posted 2025-01-19 08:50:24

I have some data which looks like:

   RegionName
   <chr>     
 1 ANDALUCÍA 
 2 Almería   
 3 Abla      
 4 Abrucena  
 5 Adra      
 6 ALBÁNCHEZ 
 7 Alboloduy 
 8 Albox     
 9 ALCOLEA   
10 Alcóntar

Some of the rows are in uppercase. I want to extract the uppercase values into a new column and fill down until the next uppercase row.

Expected output:

   RegionName REGIONNAME
   <chr>      <chr>
 1 ANDALUCÍA  ANDALUCÍA   <- first result
 2 Almería    ANDALUCÍA
 3 Abla       ANDALUCÍA
 4 Abrucena   ANDALUCÍA
 5 Adra       ANDALUCÍA
 6 ALBÁNCHEZ  ALBÁNCHEZ   <- change here
 7 Alboloduy  ALBÁNCHEZ
 8 Albox      ALBÁNCHEZ
 9 ALCOLEA    ALCOLEA     <- change here
10 Alcóntar   ALCOLEA

Data:

data = structure(list(RegionName = c("ANDALUCÍA", "Almería", "Abla", 
"Abrucena", "Adra", "ALBÁNCHEZ", "Alboloduy", "Albox", "ALCOLEA", 
"Alcóntar")), row.names = c(NA, -10L), class = c("tbl_df", "tbl", 
"data.frame"))

花之痕靓丽 2025-01-26 08:50:24

One idea is to use grepl() with [[:upper:]] to identify the all-uppercase values, convert the others to NA, and then fill() down, i.e.

library(dplyr)
library(tidyr)

data %>% 
 mutate(new = replace(RegionName, !grepl("^[[:upper:]]+$", RegionName), NA)) %>% 
 fill(new)

# A tibble: 10 x 2
   RegionName new      
   <chr>      <chr>    
 1 ANDALUCÍA  ANDALUCÍA
 2 Almería    ANDALUCÍA
 3 Abla       ANDALUCÍA
 4 Abrucena   ANDALUCÍA
 5 Adra       ANDALUCÍA
 6 ALBÁNCHEZ  ALBÁNCHEZ
 7 Alboloduy  ALBÁNCHEZ
 8 Albox      ALBÁNCHEZ
 9 ALCOLEA    ALCOLEA  
10 Alcóntar   ALCOLEA 
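
One thing worth checking before relying on this pattern: `^[[:upper:]]+$` only matches values made up entirely of uppercase letters, so an uppercase name containing a space or hyphen would be treated as a non-header. The sketch below uses hypothetical values (they are not in the question's data) and an assumed relaxed pattern to show the difference; whether accented letters match `[[:upper:]]` can also depend on the locale.

```r
# Hypothetical values, not taken from the question's data
x <- c("EL EJIDO", "El Ejido", "ADRA", "Adra")

# Pattern from the answer: letters only, so the space in "EL EJIDO" fails
strict <- grepl("^[[:upper:]]+$", x)

# Assumed variant that also allows whitespace and hyphens
relaxed <- grepl("^[[:upper:][:space:]-]+$", x)

# Comparison against toupper(); note that a string with no letters at all
# (for example "123") would also count as uppercase under this test
same_as_upper <- x == toupper(x)

print(data.frame(x, strict, relaxed, same_as_upper))
```

If the real data contains multi-word region names, either the relaxed pattern or the `toupper()` comparison is probably the safer choice.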

乙白 2025-01-26 08:50:24

You can group the regions together based on whether each name is equal to its own all-uppercase form: cumsum() of that comparison starts a new group at every uppercase row. Then set every value within a group to the group's first RegionName, which is the one in all caps.

library(tidyverse) 

df %>%
  group_by(grp = cumsum(RegionName == toupper(RegionName))) %>%
  mutate(REGIONNAME = first(RegionName))

Output

   RegionName   grp REGIONNAME
   <chr>      <int> <chr>     
 1 ANDALUCÍA      1 ANDALUCÍA 
 2 Almería        1 ANDALUCÍA 
 3 Abla           1 ANDALUCÍA 
 4 Abrucena       1 ANDALUCÍA 
 5 Adra           1 ANDALUCÍA 
 6 ALBÁNCHEZ      2 ALBÁNCHEZ 
 7 Alboloduy      2 ALBÁNCHEZ 
 8 Albox          2 ALBÁNCHEZ 
 9 ALCOLEA        3 ALCOLEA   
10 Alcóntar       3 ALCOLEA 

Data

df <- structure(list(RegionName = c("ANDALUCÍA", "Almería", "Abla", 
"Abrucena", "Adra", "ALBÁNCHEZ", "Alboloduy", "Albox", "ALCOLEA", 
"Alcóntar")), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9", "10"))
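
For readers who prefer to avoid the tidyverse dependency, the same cumsum() idea can be sketched in base R. This is only an illustration under the assumption that uppercase rows act as headers; the helper names `is_upper`, `grp`, and `headers` are made up for this example.

```r
df <- data.frame(RegionName = c("ANDALUCÍA", "Almería", "Abla",
  "Abrucena", "Adra", "ALBÁNCHEZ", "Alboloduy", "Albox", "ALCOLEA",
  "Alcóntar"))

# TRUE for rows whose name equals its own uppercase form
is_upper <- df$RegionName == toupper(df$RegionName)

# Running count of header rows: 1 1 1 1 1 2 2 2 3 3
grp <- cumsum(is_upper)

# The header names, in order of appearance
headers <- df$RegionName[is_upper]

# Look up each row's header by group number; rows before the first
# header (group 0) are mapped to NA instead of being dropped
df$REGIONNAME <- headers[replace(grp, grp == 0, NA)]

print(df)
```

Indexing `headers` by the group number replaces the group_by()/first() step, so no temporary `grp` column ends up in the result.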

愿与i 2025-01-26 08:50:24

An alternative using ifelse() and fill():

library(tidyverse)
df %>% 
  mutate(REGIONNAME = ifelse(RegionName == toupper(RegionName), RegionName, NA)) %>% 
  fill(REGIONNAME)

   RegionName REGIONNAME
1   ANDALUCÍA  ANDALUCÍA
2     Almería  ANDALUCÍA
3        Abla  ANDALUCÍA
4    Abrucena  ANDALUCÍA
5        Adra  ANDALUCÍA
6   ALBÁNCHEZ  ALBÁNCHEZ
7   Alboloduy  ALBÁNCHEZ
8       Albox  ALBÁNCHEZ
9     ALCOLEA    ALCOLEA
10   Alcóntar    ALCOLEA
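
The fill-based and group-based approaches agree on the question's data, but they may differ if the data does not start with an uppercase row. The sketch below uses a small hypothetical tibble (`df2`, not from the question) to show that difference; it uses `if_else()` with `NA_character_` instead of `ifelse()` only to keep the column type explicit.

```r
library(dplyr)
library(tidyr)

# Hypothetical data whose first row is NOT an uppercase header
df2 <- tibble(RegionName = c("Abla", "ANDALUCÍA", "Almería"))

# fill(): rows before the first header have nothing to inherit and stay NA
filled <- df2 %>%
  mutate(REGIONNAME = if_else(RegionName == toupper(RegionName),
                              RegionName, NA_character_)) %>%
  fill(REGIONNAME)

# cumsum() grouping: the leading rows form group 0 and take the first
# name of that group, which is not an uppercase header
grouped <- df2 %>%
  group_by(grp = cumsum(RegionName == toupper(RegionName))) %>%
  mutate(REGIONNAME = first(RegionName)) %>%
  ungroup()

print(filled)
print(grouped)
```

Which behaviour is preferable depends on the data; leaving NA is arguably easier to spot afterwards than a silently assigned lowercase name.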