Subsetting a data frame based on a minimum number of responses per set of measures

Posted on 2025-02-05 05:42:51

I hope someone can help me with this query. I have a large data set and am going to run analyses on a set of participants, provided they meet certain criteria. In this case, the criterion is that each participant provided at least 1 answer to Measure 1 items AND at least 1 answer to Measure 2 items (there are three items for Measure 1 and three items for Measure 2). So if they answer all three Measure 1 items but none of the Measure 2 items, they are removed from the data set. The same applies if they provide two answers to one of the measures but no answer to any item belonging to the other measure. Consider the example below:

df <- data.frame(tester_ID = c("A1", "A2", "A3", "A4", "A5", "A6",
                               "A7", "A1", "A2", "A3", "A4", "A5", "A6", "A7"),
                 Phase = c("Phase1", "Phase1", "Phase1", "Phase1", "Phase1",
                           "Phase1", "Phase1", "Phase2", 
                           "Phase2", "Phase2", "Phase2", "Phase2", "Phase2", 
                           "Phase2"),
                 Item1Measure1 = c(5, NA, 3, 4, 4, 1, 4, 4, 5, NA, NA, NA, NA, NA),
                 Item2Measure1 = c(5, 3, NA, NA, 4, 1, NA, 4, 5, NA, NA, 3, NA, 1),
                 Item3Measure1 = c(NA, NA, NA, NA, 4, 1, NA, 4, 5, 1, 3, 5, NA, NA),
                 Item1Measure2 = c(NA, NA, NA, NA, NA, 1, NA, 4, 5, NA,NA, NA,NA,NA),
                 Item2Measure2 = c(5, NA, NA, 4, 4, 1, 4, NA, 5, 2, 4, 1, 2, 4),
                 Item3Measure2 = c(5, NA, 3, 4, 4, 1, 4, NA, 5, NA, NA, NA, NA, NA))

Created on 2022-06-05 by the reprex package (v2.0.1)

I am hoping to create a condition whereby only participants who provided AT LEAST one answer to a Measure1 item AND AT LEAST one answer to a Measure2 item are considered. For instance, the tester_ID named A2, in Phase 1, did not reply to any items for Measure 2, so that tester would be excluded from the new data set. The same applies to tester_ID A6 in Phase 2, as that tester only provided answers to Measure 2 items but none to Measure 1 items. The remaining 12 rows would meet the criterion of at least one answer per measure.

Any help would be greatly appreciated.

Comments (1)

夜血缘 2025-02-12 05:42:51

We may use if_any: loop over the 'Measure1' columns and check for non-NA elements (complete.cases), then, joined with &, loop separately over the 'Measure2' columns and do the same. Each if_any call returns a single TRUE/FALSE per row, so the combined condition is TRUE only if both are TRUE, i.e. if there is at least one non-NA value in both sets of columns.

library(dplyr)

# Keep rows with at least one non-NA answer in each set of measure columns
df %>% 
  filter(if_any(ends_with('Measure1'), complete.cases) & 
         if_any(ends_with('Measure2'), complete.cases))

-output

 tester_ID  Phase Item1Measure1 Item2Measure1 Item3Measure1 Item1Measure2 Item2Measure2 Item3Measure2
1         A1 Phase1             5             5            NA            NA             5             5
2         A3 Phase1             3            NA            NA            NA            NA             3
3         A4 Phase1             4            NA            NA            NA             4             4
4         A5 Phase1             4             4             4            NA             4             4
5         A6 Phase1             1             1             1             1             1             1
6         A7 Phase1             4            NA            NA            NA             4             4
7         A1 Phase2             4             4             4             4            NA            NA
8         A2 Phase2             5             5             5             5             5             5
9         A3 Phase2            NA            NA             1            NA             2            NA
10        A4 Phase2            NA            NA             3            NA             4            NA
11        A5 Phase2            NA             3             5            NA             1            NA
12        A7 Phase2            NA             1            NA            NA             4            NA
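
The same row-wise rule can also be sketched in base R by counting the non-NA answers per measure, which makes the minimum number of responses an explicit parameter. This is only an illustrative alternative, not part of the answer above: the helper n_answered and the variable min_answers are hypothetical names introduced here, and the data frame is rebuilt from the question so the block runs on its own.

```r
# Rebuild the example data from the question so the sketch is self-contained.
df <- data.frame(
  tester_ID = rep(paste0("A", 1:7), times = 2),
  Phase = rep(c("Phase1", "Phase2"), each = 7),
  Item1Measure1 = c(5, NA, 3, 4, 4, 1, 4, 4, 5, NA, NA, NA, NA, NA),
  Item2Measure1 = c(5, 3, NA, NA, 4, 1, NA, 4, 5, NA, NA, 3, NA, 1),
  Item3Measure1 = c(NA, NA, NA, NA, 4, 1, NA, 4, 5, 1, 3, 5, NA, NA),
  Item1Measure2 = c(NA, NA, NA, NA, NA, 1, NA, 4, 5, NA, NA, NA, NA, NA),
  Item2Measure2 = c(5, NA, NA, 4, 4, 1, 4, NA, 5, 2, 4, 1, 2, 4),
  Item3Measure2 = c(5, NA, 3, 4, 4, 1, 4, NA, 5, NA, NA, NA, NA, NA)
)

# Hypothetical helper: for each row, count the non-NA answers across
# all columns whose names end with `suffix`.
n_answered <- function(data, suffix) {
  cols <- endsWith(names(data), suffix)
  rowSums(!is.na(data[, cols, drop = FALSE]))
}

# Assumed threshold: at least this many answers are required per measure.
min_answers <- 1

keep <- n_answered(df, "Measure1") >= min_answers &
  n_answered(df, "Measure2") >= min_answers

df_kept <- df[keep, ]
```

With min_answers set to 1, df_kept holds the same 12 rows as the dplyr output above; raising min_answers tightens the criterion without changing the rest of the code.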