Making duplicate levels unique in R

Posted 2024-12-06 14:17:58


I have to extract data for each individual from one large file, based on date/time windows defined in another file. The window file has ID, Start and End Date columns. The large file (FULLDATA) holds all sorts of other data, but to keep things simple let's say it has only ID, Date, X, Y and Z.

For example, the date-window file looks like this:

ID    Start             End Date
c1    03/03/03 01:02    03/15/03 12:00
a2    03/03/03 04:35    03/10/03 13:00
f3    02/03/03 11:02    05/15/03 15:00
c1    01/03/04 13:02    02/15/05 17:00
a2    04/23/04 04:02    05/14/05 10:00

The IDs are duplicated, but each row is a distinct time window that I need to pull data for.

In my code I set the levels like this:

IDNames = levels(dates$ID)

But levels() returns each ID only once, so my loop uses just the window from the first time that ID appears. In my example, c1 and a2 are each processed once, and data is extracted only for their first window.

I have tried:

levels(make.unique(IDNames))

But this turns the second "c1" into "c1.1", which no longer matches the ID in the FULLDATA file used in the rest of my commands:
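To make the two behaviours described above concrete, here is a minimal sketch using a hypothetical ID vector (`ids` is an illustrative name, not something from my real data). It also suggests that `levels()` applied to the character vector returned by `make.unique()` gives `NULL`, since a plain character vector has no levels.

```r
# Hypothetical ID column with the same duplicates as the example table.
ids <- factor(c("c1", "a2", "f3", "c1", "a2"))

# levels() collapses duplicates and sorts them, so only three IDs remain.
id_levels <- levels(ids)

# make.unique() keeps all five entries but renames the repeats.
id_unique <- make.unique(as.character(ids))

# A character vector carries no levels attribute.
no_levels <- levels(id_unique)

print(id_levels)   # "a2" "c1" "f3"
print(id_unique)   # "c1" "a2" "f3" "c1.1" "a2.1"
print(no_levels)   # NULL
```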

for (n in 1:(length(IDNames))) {
  IndivData = FULLDATA[FULLDATA$ID==IDNames[n],]
  Dateswindow = dates[dates$ID==IDNames[n],]
}


季末如歌 2024-12-13 14:17:58

If I'm understanding you correctly, it sounds like you want a single column that identifies each unique combination of ID, Start and End Date, while preserving each piece so that you can still refer to it from another data frame.

Why not simply coerce everything to character and paste the pieces together with a useful separating character? Something like:

IDNames <- paste(as.character(dates$ID),as.character(...),as.character(...),sep = "-")

Then you'll have a unique value for each ID-Start-End combination, and you can easily pull each piece out again to use to merge back with your other data.
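One way this could look in practice is sketched below. It is only an illustration under several assumptions: the sample `dates` and `FULLDATA` rows are made up, the `Key`, `key_id`, `full_time` and `extracted` names are hypothetical, the timestamps are assumed to follow `%m/%d/%y %H:%M` in UTC, and `"|"` is used as the separator on the assumption that it never occurs in an ID or a date. The loop runs over the rows of the window table rather than over `levels()`, so every window is visited even when an ID repeats.

```r
# Hypothetical window table mirroring the question's layout.
dates <- data.frame(
  ID    = c("c1", "a2", "f3", "c1", "a2"),
  Start = c("03/03/03 01:02", "03/03/03 04:35", "02/03/03 11:02",
            "01/03/04 13:02", "04/23/04 04:02"),
  End   = c("03/15/03 12:00", "03/10/03 13:00", "05/15/03 15:00",
            "02/15/05 17:00", "05/14/05 10:00"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the large data file.
FULLDATA <- data.frame(
  ID   = c("c1", "c1", "a2", "c1", "f3"),
  Date = c("03/05/03 10:00", "06/01/03 10:00", "05/01/04 09:00",
           "01/10/04 08:00", "01/01/03 00:00"),
  X    = 1:5,
  stringsAsFactors = FALSE
)

# One key per window: ID, Start and End pasted with a separator.
dates$Key <- paste(dates$ID, dates$Start, dates$End, sep = "|")

# The original ID can be recovered as the first piece of each key.
parts  <- strsplit(dates$Key, "|", fixed = TRUE)
key_id <- vapply(parts, `[`, character(1), 1)

# Parse the timestamps once (assumed format and time zone).
fmt       <- "%m/%d/%y %H:%M"
full_time <- as.POSIXct(FULLDATA$Date, format = fmt, tz = "UTC")
start     <- as.POSIXct(dates$Start, format = fmt, tz = "UTC")
end       <- as.POSIXct(dates$End, format = fmt, tz = "UTC")

# Visit every window row, not every ID level, and keep matching records.
extracted <- lapply(seq_len(nrow(dates)), function(i) {
  keep <- FULLDATA$ID == dates$ID[i] &
    full_time >= start[i] & full_time <= end[i]
  FULLDATA[keep, ]
})
names(extracted) <- dates$Key
```

Each element of `extracted` is then the subset of `FULLDATA` for one window, named by its key, so the two c1 windows stay separate while the ID itself is never altered.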
