Identify matching strings in grouped data and create a new column specifying the presence or absence of a change

Posted on 2025-01-26 16:03:18

Let's say I have the following dataset:

dat<- data.frame(ID= c("A","A","A","A","A","A","B","B", "B", "B"), 
             test= rep(c("pre","post"),5),
             item= c(rep("item1",2), rep("item2",2), rep("item3", 2), rep("item1",2), rep("item2",2)),
             answer= c("science","science","science","","", "science", "some multi word string that is not science", "history", "", "social science"))

For each combination of ID and item, I want to check whether answer is exactly the string science. Entries such as social science must be excluded: although social science contains the word science, I am only interested in answers where science appears by itself.
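To make the distinction concrete, here is a minimal base R sketch comparing an exact comparison with a substring search. The vector name answers_demo is hypothetical and only reuses a few values from the dataset above; it is an illustration, not part of the original question.

```r
# Hypothetical sample of answers taken from the dataset above
answers_demo <- c("science", "social science",
                  "some multi word string that is not science", "")

# Exact comparison: TRUE only when the whole string is "science"
exact <- answers_demo == "science"

# Substring search: also TRUE for "social science" and the long sentence
partial <- grepl("science", answers_demo, fixed = TRUE)

print(data.frame(answer = answers_demo, exact = exact, partial = partial))
```

Only the exact comparison matches the requirement described here; the substring search would wrongly flag social science.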

A new column will be created called change_type.

  • both indicates that science was present at both levels of test,
  • pre indicates that science was present only where test equals pre,
  • post indicates that science was present only where test equals post.
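The three levels above can be read as a small decision rule over two yes/no facts: whether science was found in the pre row and whether it was found in the post row. The helper below is a hypothetical sketch of that rule (classify_change is not a function from any package); it assumes that a group where science appears in neither row should get a missing value, as the expected output suggests.

```r
# Hypothetical helper: map two logical flags to a change_type label
classify_change <- function(pre_hit, post_hit) {
  if (pre_hit && post_hit) {
    "both"
  } else if (pre_hit) {
    "pre"
  } else if (post_hit) {
    "post"
  } else {
    NA_character_  # assumption: no match at either level means missing
  }
}

print(classify_change(TRUE, FALSE))
print(classify_change(FALSE, FALSE))
```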

The output will look like this (groups where science appears at neither level get NA):

res<- data.frame(ID= c("A","A","A","B","B"), 
             item= c("item1","item2","item3","item1","item2"),
             change_type=c("both","pre", "post", "NA", "NA"))

久隐师 2025-02-02 16:03:18

We could do it with case_when:

library(dplyr)

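dat %>% 
  # one group per ID/item pair; rows in each group are assumed to be ordered pre, then post
  group_by(ID, item) %>% 
  # groups matching none of the conditions get NA, since case_when has no default here
  mutate(change_type = case_when(first(answer)=="science" & 
                                   last(answer)=="science"    ~ "both",
                                 first(answer)=="science" & first(test) == "pre" ~ "pre",
                                 last(answer) == "science" & last(test) == "post" ~ "post"
                                 )) %>% 
  # collapse to one row per ID/item/change_type combination
  group_by(ID, item,change_type) %>% 
  summarise()

Output:

  ID    item  change_type
  <chr> <chr> <chr>      
1 A     item1 both       
2 A     item2 pre        
3 A     item3 post       
4 B     item1 NA         
5 B     item2 NA         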
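The approach above relies on the pre row coming before the post row inside every group. As a hedged alternative, the sketch below looks rows up by the value of test instead of by position, so it should not depend on row order. The column names pre_hit and post_hit are hypothetical intermediates, and the .groups argument assumes dplyr 1.0 or later.

```r
library(dplyr)

# Same data as in the question
dat <- data.frame(
  ID = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B"),
  test = rep(c("pre", "post"), 5),
  item = c(rep("item1", 2), rep("item2", 2), rep("item3", 2),
           rep("item1", 2), rep("item2", 2)),
  answer = c("science", "science", "science", "", "", "science",
             "some multi word string that is not science", "history",
             "", "social science")
)

res <- dat %>%
  group_by(ID, item) %>%
  summarise(
    # exact match on the pre rows and on the post rows, regardless of order
    pre_hit  = any(answer[test == "pre"]  == "science"),
    post_hit = any(answer[test == "post"] == "science"),
    .groups = "drop"
  ) %>%
  mutate(change_type = case_when(
    pre_hit & post_hit ~ "both",
    pre_hit            ~ "pre",
    post_hit           ~ "post"
    # no default: groups without any exact match stay NA
  )) %>%
  select(ID, item, change_type)

print(res)
```

On this dataset the sketch gives the same five rows as the answer above; the difference only matters if the rows of a group could arrive in a different order.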