R: Combine rows with the same ID

Posted on 2025-02-12 16:21:30

Edit: I changed Var4 to a string value because my question was not precise enough about my data, and answers were therefore failing because of invalid types. Sorry about that.

This is my first question here and I hope someone can help me.

I have the following data set:

ID  Date    N_Date  Var1  Var2  Var3  Var4  type
1   4.7.22  50000   12    NA    NA    NA    normal
1   4.7.22  50000   NA    23    NA    NA    normal
1   4.7.22  50000   NA    NA    5     NA    normal
1   4.7.22  50000   NA    NA    NA    asd   normal
2   4.7.22  50000   NA    2     NA    NA    normal
3   5.7.22  20000   7     NA    NA    NA    normal

My goal is to have just one row for each ID. So what I want R to do is shift the Var column values for each ID up, or somehow combine them. As you can see, at the moment there is never more than one non-NA value across the Var columns in any row. So it should be easy to overwrite the NAs with the corresponding "real value". I also found similar questions, but the answers did not help in my case:

How to combine rows with the same identifier R?

I think the problem in my case is that I have columns like "Date", "N_Date" (which is the number of observations on that date) and "type". For these columns my code should recognize that the value is exactly the same for the corresponding ID, and just take the first value, for example.

In the end I should have just 3 rows with the same number of columns, containing all the information.

Thank you very much to anyone who has an idea how to solve this.

Comments (1)

套路撩心 2025-02-19 16:21:30

Something like this:
Here we first group by all columns except the Var variables, then we use summarise(across(...)), as suggested by @Limey in the comments section.
The key point is to use na.rm = TRUE:

library(dplyr)

df %>% 
  group_by(ID, Date, N_Date, type) %>% 
  summarise(across(starts_with("Var"), ~sum(., na.rm = TRUE)))
     ID Date   N_Date type    Var1  Var2  Var3  Var4
  <int> <chr>   <int> <chr>  <int> <int> <int> <int>
1     1 4.7.22  50000 normal    12    23     5    54
2     2 4.7.22   4000 normal     0     2     0     0
3     3 5.7.22  20000 normal     7     0     0     0
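
Since the edit to the question makes Var4 a character column, sum() would fail on it with an invalid type error. A minimal sketch of a type-agnostic variant is to keep the first non-NA value per group instead of summing. The data frame below is rebuilt from the question's table with assumed column types, and first_non_na is a hypothetical helper name, not part of dplyr:

```r
library(dplyr)

# Sample data rebuilt from the question's table; column types are assumed.
df <- data.frame(
  ID     = c(1, 1, 1, 1, 2, 3),
  Date   = c("4.7.22", "4.7.22", "4.7.22", "4.7.22", "4.7.22", "5.7.22"),
  N_Date = c(50000, 50000, 50000, 50000, 50000, 20000),
  Var1   = c(12, NA, NA, NA, NA, 7),
  Var2   = c(NA, 23, NA, NA, 2, NA),
  Var3   = c(NA, NA, 5, NA, NA, NA),
  Var4   = c(NA, NA, NA, "asd", NA, NA),
  type   = "normal"
)

# Hypothetical helper: first non-NA value of a vector,
# or an NA of the same type if every value is missing.
first_non_na <- function(x) x[!is.na(x)][1]

# Group by the columns that are constant within an ID,
# then collapse each Var column to its single non-NA value.
result <- df %>%
  group_by(ID, Date, N_Date, type) %>%
  summarise(across(starts_with("Var"), first_non_na), .groups = "drop")

print(result)
```

Unlike sum(., na.rm = TRUE), which returns 0 for a group with no values, this keeps NA where an ID has no value in a column, and it works for numeric and character columns alike.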