根据 R 中的模式重新格式化数据

发布于 2025-01-16 21:31:50 字数 679 浏览 3 评论 0原文

我希望你能帮助我解决这个问题,我有以下这样的数据:

ID,colour
1,base_yellow
1,blue
1,base_red
1,blue
1,pink
1,blue
1,base_yellow
2,base_yellow
2,blue
2,base_red
2,blue
2,pink
2,blue
2,base_yellow
3,base_yellow
3,blue
3,pink
3,blue
3,base_yellow
4,base_yellow
4,blue
4,green
4,blue
4,green
4,blue
4,pink
4,blue
4,base_yellow

每次与base(base_yellow,base_red)见面时,它都会创建新的组,预期的输出如下所示,它给出了一个新变量:

ID,colour
1,base_yellow; blue; base_red
1,base_red; blue; pink;blue;base_yellow
2,base_yellow; blue; base_red
2,base_red; blue; pink;blue; base_yellow
3,base_yellow;blue;pinkblue;base_yellow
4,base_yellow; blue;green;blue;green;blue;pink;blue;base_yellow

I hope you can help me with this problem, I have the following data like this:

ID,colour
1,base_yellow
1,blue
1,base_red
1,blue
1,pink
1,blue
1,base_yellow
2,base_yellow
2,blue
2,base_red
2,blue
2,pink
2,blue
2,base_yellow
3,base_yellow
3,blue
3,pink
3,blue
3,base_yellow
4,base_yellow
4,blue
4,green
4,blue
4,green
4,blue
4,pink
4,blue
4,base_yellow

Every time meet with base (base_yellow, base_red), it creates new group, the output that is expected as shown below, which gives a new variable:

ID,colour
1,base_yellow; blue; base_red
1,base_red; blue; pink;blue;base_yellow
2,base_yellow; blue; base_red
2,base_red; blue; pink;blue; base_yellow
3,base_yellow;blue;pinkblue;base_yellow
4,base_yellow; blue;green;blue;green;blue;pink;blue;base_yellow

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

浅笑轻吟梦一曲 2025-01-23 21:31:50

您可以根据自己的需要进行调整。

首先,创建一个向量 vec,其中包含 colour 以“base”开头的行位置。

然后,您可以使用 purrr 中的 map2_dfr 来提供基于 vec 从开始位置到结束位置的颜色 。这将有助于解决最终在多行中使用相同颜色的情况。在此步骤中还创建了分组变量group

group 分组后,您可以仅保留具有多个 colourcolour 组,并用 str_c 折叠它们一起为同一个

library(tidyverse)

vec <- which(grepl("^base", df$colour))

map2_dfr(
  vec[-length(vec)],
  vec[-1],
  ~df[.x:.y, ],
  .id = "group"
) %>%
  group_by(group) %>%
  filter(n_distinct(colour) > 1) %>%
  summarise(ID = first(ID), colour = str_c(colour, collapse = "; ")) %>%
  select(-group)

输出

     ID colour                                                              
  <int> <chr>                                                               
1     1 base_yellow; blue; base_red                                         
2     1 base_red; blue; pink; blue; base_yellow                             
3     2 base_yellow; blue; base_red                                         
4     2 base_red; blue; pink; blue; base_yellow                             
5     3 base_yellow; blue; pink; blue; base_yellow                          
6     4 base_yellow; blue; green; blue; green; blue; pink; blue; base_yellow

This is something you might be able to adapt for your needs.

First, create a vector vec that includes row positions where colour starts with "base".

Then, you can use map2_dfr from purrr that will provide colour that ranges from start to end positions based on vec. This will help with situations where the same colour is used in more than one row in the end. A grouping variable group is also created in this step.

After grouping by group, you can keep only colour groups that have more than one colour and str_c to collapse them together for the same group.

library(tidyverse)

vec <- which(grepl("^base", df$colour))

map2_dfr(
  vec[-length(vec)],
  vec[-1],
  ~df[.x:.y, ],
  .id = "group"
) %>%
  group_by(group) %>%
  filter(n_distinct(colour) > 1) %>%
  summarise(ID = first(ID), colour = str_c(colour, collapse = "; ")) %>%
  select(-group)

Output

     ID colour                                                              
  <int> <chr>                                                               
1     1 base_yellow; blue; base_red                                         
2     1 base_red; blue; pink; blue; base_yellow                             
3     2 base_yellow; blue; base_red                                         
4     2 base_red; blue; pink; blue; base_yellow                             
5     3 base_yellow; blue; pink; blue; base_yellow                          
6     4 base_yellow; blue; green; blue; green; blue; pink; blue; base_yellow
月依秋水 2025-01-23 21:31:50

试试这个:

library(tidyverse)

# Read data
mydata <- tibble::tribble(~ID,~colour,
                          1,"base_yellow",
                          1,"blue",
                          1,"base_red",
                          1,"blue",
                          1,"pink",
                          1,"blue",
                          1,"base_yellow",
                          2,"base_yellow",
                          2,"blue",
                          2,"base_red",
                          2,"blue",
                          2,"pink",
                          2,"blue",
                          2,"base_yellow",
                          3,"base_yellow",
                          3,"blue",
                          3,"pink",
                          3,"blue",
                          3,"base_yellow",
                          4,"base_yellow",
                          4,"blue",
                          4,"green",
                          4,"blue",
                          4,"green",
                          4,"blue",
                          4,"pink",
                          4,"blue",
                          4,"base_yellow")

# Add column to group by words starting with "base_"
mydata <- mydata %>% 
  mutate(base = str_starts(colour, "base_")) %>% 
  mutate(base = ifelse(base, colour, NA)) %>% 
  fill(base, .direction = "down")

# Group by ID and words starting with "base_" and paste words
mydata <- mydata %>% 
  group_by(ID, base) %>% 
  summarise(colour = paste(colour, collapse = ";")) %>% 
  select(-base)

结果:

> mydata
# A tibble: 6 × 2
# Groups:   ID [4]
     ID colour                                                      
  <dbl> <chr>                                                       
1     1 base_red;blue;pink;blue                                     
2     1 base_yellow;blue;base_yellow                                
3     2 base_red;blue;pink;blue                                     
4     2 base_yellow;blue;base_yellow                                
5     3 base_yellow;blue;pink;blue;base_yellow                      
6     4 base_yellow;blue;green;blue;green;blue;pink;blue;base_yellow

Try this:

library(tidyverse)

# Read data
mydata <- tibble::tribble(~ID,~colour,
                          1,"base_yellow",
                          1,"blue",
                          1,"base_red",
                          1,"blue",
                          1,"pink",
                          1,"blue",
                          1,"base_yellow",
                          2,"base_yellow",
                          2,"blue",
                          2,"base_red",
                          2,"blue",
                          2,"pink",
                          2,"blue",
                          2,"base_yellow",
                          3,"base_yellow",
                          3,"blue",
                          3,"pink",
                          3,"blue",
                          3,"base_yellow",
                          4,"base_yellow",
                          4,"blue",
                          4,"green",
                          4,"blue",
                          4,"green",
                          4,"blue",
                          4,"pink",
                          4,"blue",
                          4,"base_yellow")

# Add column to group by words starting with "base_"
mydata <- mydata %>% 
  mutate(base = str_starts(colour, "base_")) %>% 
  mutate(base = ifelse(base, colour, NA)) %>% 
  fill(base, .direction = "down")

# Group by ID and words starting with "base_" and paste words
mydata <- mydata %>% 
  group_by(ID, base) %>% 
  summarise(colour = paste(colour, collapse = ";")) %>% 
  select(-base)

Results:

> mydata
# A tibble: 6 × 2
# Groups:   ID [4]
     ID colour                                                      
  <dbl> <chr>                                                       
1     1 base_red;blue;pink;blue                                     
2     1 base_yellow;blue;base_yellow                                
3     2 base_red;blue;pink;blue                                     
4     2 base_yellow;blue;base_yellow                                
5     3 base_yellow;blue;pink;blue;base_yellow                      
6     4 base_yellow;blue;green;blue;green;blue;pink;blue;base_yellow
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文