R，在其他数据表中查找值来填充值

发布于 2025-01-16 20:28:41 字数 1974 浏览 1 评论 0原文

背景

为了清晰起见，对问题进行了大量编辑

我有这样的数据：

    df<-structure(list(fname = c("Linda", "Bob"), employee_number = c("00000123456", 
"654321"), Calendar = c(0, 0), Protocol = c(0, 0), Subject = c(0, 
0), CRA = c(0, 0), Regulatory = c(1, 1), Finance = c(0, 1), ResearchNurse = c(0, 
0)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
))

在我在这里问的上一个问题，我提到我需要将这些数据从宽转为长，以便将其导出到其他地方。答案非常有效！

问题是，我发现我的数据集中的一些人没有正确填写他们的调查，并且在某些有问题的列中全为零。即，当它们被旋转并过滤为“1”值时，它们就会被丢弃。

幸运的是（取决于你如何看待）我可以纠正他们的错误。如果他们将这些列留空，我可以根据其他列填充他们应该拥有的内容。也就是说，他们在“CRA”、“监管”、“财务”或“研究护士”下填写的内容将决定他们在“日历”、“协议”或“主题”中是否得到 1 或 0

要弄清楚这些列中的内容，我们创建了这个工作职责矩阵：

    jobs<-structure(list(`Roles (existing)` = c("Calendar Build", "Protocol Management", 
"Subject Management"), `CRA/ Manager/ Senior` = c(1, 1, 0), Regulatory = c(0, 
1, 1), Finance = c(0, 0, 0), `Research Nurse` = c(1, 0, 1)), row.names = c(NA, 
-3L), class = c("tbl_df", "tbl", "data.frame"))

因此，如果您到目前为止一直在关注，无论“Bob”在他的专栏中添加了什么内容“日历”，“协议”或“主题”（他目前有零），它将根据他在其他列中输入的内容被覆盖。因此，如果 Bob 在他的“监管”列中输入“1”，根据我截屏的矩阵，他应该在协议列和主题列中都得到 1。

具体问题

那么我如何告诉 R，“查看 bob 的“CRA、监管、财务和研究护士”列，然后交叉引用“工作”数据框，并覆盖他的“日历、协议、和主题”列？

在这种特殊情况下我的预期输出是：

最后一个小细节：我可以看到数字会覆盖的实例（取决于顺序）也就是说，如果鲍勃应该在协议中得到 1，因为他在监管方面得到 1...但他在金融方面得到 1，这意味着他应该在协议中得到 0...... 如有疑问，如果某列被 1 覆盖，则绝不应将其转回零。我希望这是有道理的。

原文

The background

Question edited heavily for clarity

I have data like this:

    df<-structure(list(fname = c("Linda", "Bob"), employee_number = c("00000123456", 
"654321"), Calendar = c(0, 0), Protocol = c(0, 0), Subject = c(0, 
0), CRA = c(0, 0), Regulatory = c(1, 1), Finance = c(0, 1), ResearchNurse = c(0, 
0)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
))

In a previous question I asked on here, I mentioned that I needed to pivot this data from wide to long in order to export it elsewhere. Answers worked great!

Problem is, I discovered that some of the people in my dataset didn't fill out their surveys correctly and have all zero's in certain problematic columns. I.e. when they get pivoted and filtered to "1" values, they get dropped.

Luckily (depending on how you think about it) I can fix their mistakes. If they left those columns blank, I can populate what they should have based on their other columns. I.e. what they filled out under "CRA","Regulatory", "Finance" or "ResearchNurse" will determine whether they get 1's or 0's in "Calendar","Protocol" or "Subject"

To figure out what goes in those columns, we created this matrix of job responsibilities:

    jobs<-structure(list(`Roles (existing)` = c("Calendar Build", "Protocol Management", 
"Subject Management"), `CRA/ Manager/ Senior` = c(1, 1, 0), Regulatory = c(0, 
1, 1), Finance = c(0, 0, 0), `Research Nurse` = c(1, 0, 1)), row.names = c(NA, 
-3L), class = c("tbl_df", "tbl", "data.frame"))

So if you're following so far, no matter what "Bob" put in his columns for "Calendar", "Protocol" or "subject" (he currently has zeros), it will be overwritten based on what he put in other columns. So if Bob put a "1" in his 'Regulatory' column, based on that matrix I screenshotted, he should get a 1 in both the protocol and subject columns.

The specific question

So how do I tell R, "look at bob's "CRA,Regulatory, Finance, and researchNurse" columns, and then crossreference the "jobs" dataframe, and overwrite his "calendar, protocol, and subjects" columns?

My expected output in this particular case would be:

One last little detail: I could see instances where (depending on the order), numbers would overwrite each other. I.e. if Bob should get a 1 in protocol because he's got a 1 in regulatory... but he's got a 1 in finance which would mean he should get a 0 in protocol.....
When in doubt, if a column is overwritten with a 1, it should never be turned back into a zero. I hope that makes sense.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

星光不落少年眉 2025-01-23 20:28:42

library(tidyverse)

首先，通过旋转 df 和 jobs ，任务应该变得容易得多

(df_long <- df %>% 
  pivot_longer(
    cols = -c(fname, employee_number), names_to = "term"
  ) %>% 
  filter(value == 1) %>% 
  select(-value))

#> # A tibble: 3 x 3
#>   fname employee_number term      
#>   <chr> <chr>           <chr>     
#> 1 Linda 00000123456     Regulatory
#> 2 Bob   654321          Regulatory
#> 3 Bob   654321          Finance

现在，如果我正确理解你的问题，鲍勃应该添加“协议”
和“主题”在他的调查中，因为他从事“金融”工作。幸运的是，我们可以添加
自动为他提供该信息。我们调整工作岗位并清理
名称/术语与 df 中的名称/术语相匹配。可以这样完成：

(jobs_long <- jobs %>% 
  rename(
    CRA = `CRA/ Manager/ Senior`, ResearchNurse = `Research Nurse`
  ) %>% 
  mutate(
    roles = `Roles (existing)` %>% str_extract("^\\w+"),
    .keep = "unused"
  ) %>% 
  pivot_longer(-roles, names_to = "term") %>% 
  filter(value == 1) %>% 
  select(-value))

#> # A tibble: 6 x 2
#>   roles    term         
#>   <chr>    <chr>        
#> 1 Calendar CRA          
#> 2 Calendar ResearchNurse
#> 3 Protocol CRA          
#> 4 Protocol Regulatory   
#> 5 Subject  Regulatory   
#> 6 Subject  ResearchNurse

一旦形成这个形状，我们可以连接两个表，做一些整理，然后我们
最终得到正确的信息。我们可以从这里继续争论
将数据恢复为宽形状，但这样可能更有用
所以这就是我要停下来的地方。

df_long %>% 
  left_join(jobs_long, by = c("term" = "term")) %>% 
  pivot_longer(cols = c(term, roles), values_drop_na = TRUE) %>% 
  distinct(fname, employee_number, term = value)

#> # A tibble: 7 x 3
#>   fname employee_number term      
#>   <chr> <chr>           <chr>     
#> 1 Linda 00000123456     Regulatory
#> 2 Linda 00000123456     Protocol  
#> 3 Linda 00000123456     Subject   
#> 4 Bob   654321          Regulatory
#> 5 Bob   654321          Protocol  
#> 6 Bob   654321          Subject   
#> 7 Bob   654321          Finance

^{由 reprex 软件包 (v1.0.0) 创建于 2022 年 3 月 31 日}

library(tidyverse)

First, by pivoting both df and jobs, the task should become much easier

(df_long <- df %>% 
  pivot_longer(
    cols = -c(fname, employee_number), names_to = "term"
  ) %>% 
  filter(value == 1) %>% 
  select(-value))

#> # A tibble: 3 x 3
#>   fname employee_number term      
#>   <chr> <chr>           <chr>     
#> 1 Linda 00000123456     Regulatory
#> 2 Bob   654321          Regulatory
#> 3 Bob   654321          Finance

Now, if I understand your question correctly, Bob should have added “Protocol”
and “Subject”in his survey because he works in “Finance”. Luckily, we can add
that information for him automatically. We pivot jobs and clean up the
names/terms to match those in df. This can be done like this:

(jobs_long <- jobs %>% 
  rename(
    CRA = `CRA/ Manager/ Senior`, ResearchNurse = `Research Nurse`
  ) %>% 
  mutate(
    roles = `Roles (existing)` %>% str_extract("^\\w+"),
    .keep = "unused"
  ) %>% 
  pivot_longer(-roles, names_to = "term") %>% 
  filter(value == 1) %>% 
  select(-value))

#> # A tibble: 6 x 2
#>   roles    term         
#>   <chr>    <chr>        
#> 1 Calendar CRA          
#> 2 Calendar ResearchNurse
#> 3 Protocol CRA          
#> 4 Protocol Regulatory   
#> 5 Subject  Regulatory   
#> 6 Subject  ResearchNurse

Once in this shape, we can join the two tables, do some tidying, and then we
end up with the correct information. We could continue from here and wrangle
the data back into the wide shape, but it’s probably more useful like this
so that’s where I would stop.

df_long %>% 
  left_join(jobs_long, by = c("term" = "term")) %>% 
  pivot_longer(cols = c(term, roles), values_drop_na = TRUE) %>% 
  distinct(fname, employee_number, term = value)

#> # A tibble: 7 x 3
#>   fname employee_number term      
#>   <chr> <chr>           <chr>     
#> 1 Linda 00000123456     Regulatory
#> 2 Linda 00000123456     Protocol  
#> 3 Linda 00000123456     Subject   
#> 4 Bob   654321          Regulatory
#> 5 Bob   654321          Protocol  
#> 6 Bob   654321          Subject   
#> 7 Bob   654321          Finance

^{Created on 2022-03-31 by the reprex package (v1.0.0)}

回复收藏 0 原文

半城柳色半声笛 2025-01-23 20:28:41

我建议将您的逻辑转换为 ifelse 语句：

df$Calendar <- ifelse(df$CRA == 1 | df$ResearchNurse == 1, 1, df$Calendar)
df$Protocol <- ifelse(df$CRA == 1 | df$Regulatory == 1, 1, df$Protocol)
df$Subject <- ifelse(df$Regulatory == 1 | df$ResearchNurse == 1, 1, df$Subject)

df
#>   fname employee_number Calendar Protocol Subject CRA Regulatory Finance
#> 1 Linda     00000123456        0        1       1   0          1       0
#> 2   Bob          654321        0        1       1   0          1       1
#>   ResearchNurse
#> 1             0
#> 2             0

数据：

df <- structure(list(
  fname = c("Linda", "Bob"),
  employee_number = c("00000123456", "654321"),
  Calendar = c(0, 0), Protocol = c(0, 0), Subject = c(0, 0),
  CRA = c(0, 0), Regulatory = c(1, 1), Finance = c(0, 1),
  ResearchNurse = c(0, 0)), row.names = c(NA, -2L), class = c("data.frame"))

Created on 2022-03-28 by the reprex package (v2.0.1)

I'd suggest converting your logic to ifelse statement(s):

df$Calendar <- ifelse(df$CRA == 1 | df$ResearchNurse == 1, 1, df$Calendar)
df$Protocol <- ifelse(df$CRA == 1 | df$Regulatory == 1, 1, df$Protocol)
df$Subject <- ifelse(df$Regulatory == 1 | df$ResearchNurse == 1, 1, df$Subject)

df
#>   fname employee_number Calendar Protocol Subject CRA Regulatory Finance
#> 1 Linda     00000123456        0        1       1   0          1       0
#> 2   Bob          654321        0        1       1   0          1       1
#>   ResearchNurse
#> 1             0
#> 2             0

data:

df <- structure(list(
  fname = c("Linda", "Bob"),
  employee_number = c("00000123456", "654321"),
  Calendar = c(0, 0), Protocol = c(0, 0), Subject = c(0, 0),
  CRA = c(0, 0), Regulatory = c(1, 1), Finance = c(0, 1),
  ResearchNurse = c(0, 0)), row.names = c(NA, -2L), class = c("data.frame"))

Created on 2022-03-28 by the reprex package (v2.0.1)

回复收藏 0 原文

很酷又爱笑 2025-01-23 20:28:41

两个表都需要一个共同的查找值。

例如，在您的 df 表中，有一个 employee_number 列。 jobs 表中是否有相同的字段？如果是这样，可以使用 left_join() 和 case_when() 轻松完成，

您将需要简化当前的 jobs 表以获得一些摘要您在帖子中放入的逻辑值，例如（如果鲍勃在监管中得到 1，那么他应该在协议和主题栏中得到 1）。这可以通过一些表操作函数来完成。我无法告诉你具体是哪些，因为我不完全理解其中的逻辑。

假设您对此很清楚，并且知道如何总结每行的 jobs 表（并且您有唯一的 employee_number），那么下面的代码应该可以工作。

left_join(x=df,y=jobs,by="employee_number") %>% 
  muate(new_col1=case_when(logic_1 ~ value1,
                             logic_2 ~ value2,
                             logic_3 ~ value3,
                             TRUE ~ default_value))

您可以根据需要对其他列重复 newcol 逻辑。

Both tables need a common look up value.

So for example in your df table there is a employee_number column. Do you have the same field in the jobs table? If so this is easy to do with left_join() and then a case_when()

You will need simplify your current jobs table to have some summary value of the logic you put in your post eg(if Bob has a 1 in regulatory then he should get a 1 in protocol and subject columns). This can be done with some table manipulation functions. I can't tell you exactly which ones because I don't fully understand the logic.

Assuming that is clear to you and you know how to summarize that jobs table (and you have the unique employee_number) for each row then the below should work.

left_join(x=df,y=jobs,by="employee_number") %>% 
  muate(new_col1=case_when(logic_1 ~ value1,
                             logic_2 ~ value2,
                             logic_3 ~ value3,
                             TRUE ~ default_value))

You can repeat the newcol logic for additional columns as required.

回复收藏 0 原文

~没有更多了~