How do I get the rows that meet specific conditions in R?

Posted on 2025-02-13 07:09:34

Say I have a data frame df.

I want to get the ids that satisfy two conditions at the same time:

  1. the id's codes should include one that starts with a capital I, regardless of the number that follows it, for example I11, I31, ...

  2. the id's codes should include the specific code E12.

In the example below, the filtered ids should be id = 1 and id = 2, because both contain an I code and E12.

Rows with the same id belong to the same group.

structure(list(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 
3, 3, 3, 4, 4, 4, 4, 4, 4), diag = c("main", "other", "main", 
"other", "main", "other", "main", "other", "main", "other", "main", 
"other", "main", "other", "main", "other", "main", "other", "main", 
"other", "main", "other"), code = c("I11", "E12", "I11", "Q34", 
"I31", "C33", "E12", "I34", "E12", "I45", "E12", "Z11", "E13", 
"Z12", "E14", "Z13", "I25", "E1", "I25", "E2", "I25", "E3")), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -22L), groups = structure(list(
    id = c(1, 2, 3, 4), .rows = structure(list(1:6, 7:10, 11:16, 
        17:22), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L), .drop = TRUE))

> df
# A tibble: 22 × 3
# Groups:   id [4]
      id diag  code 
   <dbl> <chr> <chr>
 1     1 main  I11  
 2     1 other E12  
 3     1 main  I11  
 4     1 other Q34  
 5     1 main  I31  
 6     1 other C33  
 7     2 main  E12  
 8     2 other I34  
 9     2 main  E12  
10     2 other I45  
# … with 12 more rows

Comments (3)

口干舌燥 2025-02-20 07:09:35

Your question suggests[1] that you're mostly interested in the id values that fulfil both of your conditions; in which case, you don't really need to work on a data frame, but only on vectors:

intersect(df$id[grepl("^I", df$code)], df$id[df$code=="E12"])

You create two vectors of df$id values (numeric here, since id is a <dbl> column), one for each condition, then extract the values they share (intersect() also removes duplicates, so the output is 1 2).

This requires nothing more than base R and I suspect is much more efficient than any table-based approach (especially if grouping or pivots are involved).

[1] If you're not, you can still use the line above to index the table's rows, e.g.

df[df$id %in% intersect(df$id[grepl("^I", df$code)], df$id[df$code=="E12"]), ]

This will return all the rows of df with an id of 1 or 2 (including, however, those rows that have a matching id but a different code, e.g. "Q34").
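
As a self-contained check of the intersect() approach, here is a minimal sketch that rebuilds the question's id and code columns as a plain data frame (dropping the diag column and the grouping attributes is an assumption made for brevity; the intermediate names ids_with_I and ids_with_E12 are only illustrative):

```r
# Rebuild the id and code columns from the question as a plain data frame
df <- data.frame(
  id   = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4),
  code = c("I11", "E12", "I11", "Q34", "I31", "C33", "E12", "I34", "E12", "I45",
           "E12", "Z11", "E13", "Z12", "E14", "Z13", "I25", "E1", "I25", "E2",
           "I25", "E3")
)

ids_with_I   <- df$id[grepl("^I", df$code)]  # ids with a code starting with "I"
ids_with_E12 <- df$id[df$code == "E12"]      # ids with the exact code "E12"

# ids present in both vectors, duplicates removed
both <- intersect(ids_with_I, ids_with_E12)
both
# [1] 1 2
```

Naming the two intermediate vectors makes it easier to inspect each condition on its own before combining them.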

浸婚纱 2025-02-20 07:09:35

To clarify, do you want all records that are Ixx OR "E12"? Your "at the same time" threw me off a little.
If that is what you mean, this should get your results.
Using library(tidyverse):

df %>% filter(grepl("^I",code) | code == "E12")

This keeps the records where the code column starts with I OR where code equals E12.
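
To see how this row-wise OR reading differs from the result the question expects (ids 1 and 2 only), here is a minimal base R sketch of the same condition. The data frame is rebuilt from the question's id and code columns, and the name or_rows is only illustrative:

```r
# Rebuild the id and code columns from the question as a plain data frame
df <- data.frame(
  id   = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4),
  code = c("I11", "E12", "I11", "Q34", "I31", "C33", "E12", "I34", "E12", "I45",
           "E12", "Z11", "E13", "Z12", "E14", "Z13", "I25", "E1", "I25", "E2",
           "I25", "E3")
)

# Row-wise OR: keep any row whose code starts with "I" or equals "E12"
or_rows <- df[grepl("^I", df$code) | df$code == "E12", ]

nrow(or_rows)
# [1] 12
unique(or_rows$id)
# [1] 1 2 3 4
```

Because id 3 has an E12 row and id 4 has I25 rows, the OR filter keeps rows from all four ids. If the goal is groups that satisfy both conditions, a group-wise check is needed instead.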

段念尘 2025-02-20 07:09:34

You could do:

library(dplyr)
library(stringr)

df |>
  group_by(id) |>
  # keep groups with at least one code starting with "I" and at least one "E12"
  filter(any(str_detect(code, "^I")) & any(code == "E12")) |>
  ungroup()

Output:

# A tibble: 10 × 3
      id diag  code 
   <dbl> <chr> <chr>
 1     1 main  I11  
 2     1 other E12  
 3     1 main  I11  
 4     1 other Q34  
 5     1 main  I31  
 6     1 other C33  
 7     2 main  E12  
 8     2 other I34  
 9     2 main  E12  
10     2 other I45  

Or, if you just want the groups, add distinct(id) after the filter(...):

# A tibble: 2 × 1
     id
  <dbl>
1     1
2     2
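
The same per-group "any" logic can also be sketched in base R with tapply(), which may help when dplyr is not available. This is an illustration rather than part of the answer above; the data frame is rebuilt from the question's id and code columns, and has_I and has_E12 are illustrative names:

```r
# Rebuild the id and code columns from the question as a plain data frame
df <- data.frame(
  id   = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4),
  code = c("I11", "E12", "I11", "Q34", "I31", "C33", "E12", "I34", "E12", "I45",
           "E12", "Z11", "E13", "Z12", "E14", "Z13", "I25", "E1", "I25", "E2",
           "I25", "E3")
)

# One logical per id: does the group contain any matching code?
has_I   <- tapply(grepl("^I", df$code), df$id, any)
has_E12 <- tapply(df$code == "E12", df$id, any)

# ids (as character names) where both conditions hold
keep_ids <- names(which(has_I & has_E12))
keep_ids
# [1] "1" "2"

# All rows belonging to those ids
df[df$id %in% as.numeric(keep_ids), ]
```

Note that tapply() returns the ids as names, so they come back as character strings and are converted with as.numeric() before indexing.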