R: Combine rows with the same ID

Posted 2025-02-12 16:21:30

Edit: I changed Var4 to a string value because my question was not precise enough about my data, and answers were failing because of invalid types. Sorry about that.

This is my first question here and I hope someone can help me.

I have the following data set:

ID	Date	N_Date	Var1	Var2	Var3	Var4	type
1	4.7.22	50000	12	NA	NA	NA	normal
1	4.7.22	50000	NA	23	NA	NA	normal
1	4.7.22	50000	NA	NA	5	NA	normal
1	4.7.22	50000	NA	NA	NA	asd	normal
2	4.7.22	50000	NA	2	NA	NA	normal
3	5.7.22	20000	7	NA	NA	NA	normal

My goal is to have just one row for each ID. So what I want R to do is shift the Var column values for each ID up, or somehow combine them. As you can see, at the moment there is never more than one value in a Var column for each row. So it should be easy to overwrite the NAs with the corresponding "real value". I also found similar questions, but the answers did not help in my case:

How to combine rows with the same identifier R?

I think the problem in my case is that I have columns like "Date", "N_Date" (the number of observations on that date) and "type". For these columns my code should see that the value is exactly the same within each ID and just take, for example, the first value.

In the end I should have just 3 rows with the same number of columns, containing all the information.

Thank you very much for anyone who has an idea how to solve this.

套路撩心 2025-02-19 16:21:30

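Something like this:
Here we first group by all columns except the Var variables, then use summarise(across(...)) as suggested by @Limey in the comments section.
The key is na.rm = TRUE, so that sum() ignores the NAs. Note that sum() only works on numeric columns, and the output below shows Var4 as an integer column.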

library(dplyr)

df %>% 
  group_by(ID, Date, N_Date, type) %>% 
  summarise(across(starts_with("Var"), ~sum(., na.rm = TRUE)))

     ID Date   N_Date type    Var1  Var2  Var3  Var4
  <int> <chr>   <int> <chr>  <int> <int> <int> <int>
1     1 4.7.22  50000 normal    12    23     5    54
2     2 4.7.22   4000 normal     0     2     0     0
3     3 5.7.22  20000 normal     7     0     0     0
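
Since the question was later edited so that Var4 holds strings, sum() would no longer apply to that column. A possible variant, sketched here under assumptions rather than taken from the original answer, is to keep the first non-NA value per group instead of summing. The data frame below is re-typed from the question's table, and first_non_na is a hypothetical helper name introduced only for this illustration.

```r
library(dplyr)

# Example data re-typed from the question; Var4 is a character column.
df <- data.frame(
  ID     = c(1L, 1L, 1L, 1L, 2L, 3L),
  Date   = c("4.7.22", "4.7.22", "4.7.22", "4.7.22", "4.7.22", "5.7.22"),
  N_Date = c(50000L, 50000L, 50000L, 50000L, 50000L, 20000L),
  Var1   = c(12L, NA, NA, NA, NA, 7L),
  Var2   = c(NA, 23L, NA, NA, 2L, NA),
  Var3   = c(NA, NA, 5L, NA, NA, NA),
  Var4   = c(NA, NA, NA, "asd", NA, NA),
  type   = "normal"
)

# Hypothetical helper: first non-NA element of x.
# Indexing an empty vector with [1] yields NA of the same type,
# so groups without any value keep NA.
first_non_na <- function(x) x[!is.na(x)][1]

result <- df %>%
  group_by(ID, Date, N_Date, type) %>%
  summarise(across(starts_with("Var"), first_non_na), .groups = "drop")

print(result)
```

Unlike the sum() approach, this keeps NA (rather than 0) where an ID has no value in a Var column, and it works for numeric and character columns alike. It assumes each ID has at most one non-NA value per Var column, as described in the question; if there were several, only the first would be kept.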
