合并一段时间内的记录
首先我要说这个问题与 R(stat 编程语言)有关,但我对其他环境提出简单的建议。
目标是将数据帧 (df) A 的结果合并到 df B 中的子元素。这是一对多关系,但是,这是变化,一旦记录与键匹配,它们也具有匹配由开始时间和持续时间给出的特定时间帧。
例如,df A 中的一些记录:
OBS ID StartTime Duration Outcome
1 01 10:12:06 00:00:10 Normal
2 02 10:12:30 00:00:30 Weird
3 01 10:15:12 00:01:15 Normal
4 02 10:45:00 00:00:02 Normal
以及来自 df B:
OBS ID Time
1 01 10:12:10
2 01 10:12:17
3 02 10:12:45
4 01 10:13:00
合并的所需结果将是:
OBS ID Time Outcome
1 01 10:12:10 Normal
3 02 10:12:45 Weird
所需结果:数据帧 B 的结果从 A 合并。请注意,观察 2 和 4 被删除,因为尽管它们与记录上的 ID 匹配在 A 中,它们不属于给定的任何时间间隔内。
问题
是否可以在 R 中执行此类操作以及如何开始?如果没有,您能推荐一个替代工具吗?
Let me begin by saying this question pertains to R (stat programming language) but I'm open straightforward suggestions for other environments.
The goal is to merge outcomes from dataframe (df) A to sub-elements in df B. This is a one to many relationship but, here's the twist, once the records are matched by keys they also have to match over a specific frame of time given by a start time and duration.
For example, a few records in df A:
OBS ID StartTime Duration Outcome
1 01 10:12:06 00:00:10 Normal
2 02 10:12:30 00:00:30 Weird
3 01 10:15:12 00:01:15 Normal
4 02 10:45:00 00:00:02 Normal
And from df B:
OBS ID Time
1 01 10:12:10
2 01 10:12:17
3 02 10:12:45
4 01 10:13:00
The desired outcome from the merge would be:
OBS ID Time Outcome
1 01 10:12:10 Normal
3 02 10:12:45 Weird
Desired result: dataframe B with outcomes merged in from A. Notice observations 2 and 4 were dropped because although they matched IDs on records in A they did not fall within any of the time intervals given.
Question
Is it possible to perform this sort of operation in R and how would you get started? If not, can you suggest an alternative tool?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
发布评论
评论(3)
这是一个示例:
# first, merge by ID
z <- merge(A[, -1], B, by = "ID")
# convert string to POSIX time
z <- transform(z,
s_t = as.numeric(strptime(as.character(z$StartTime), "%H:%M:%S")),
dur = as.numeric(strptime(as.character(z$Duration), "%H:%M:%S")) -
as.numeric(strptime("00:00:00", "%H:%M:%S")),
tim = as.numeric(strptime(as.character(z$Time), "%H:%M:%S")))
# subset by time range
subset(z, s_t < tim & tim < s_t + dur)
输出:
ID StartTime Duration Outcome OBS Time s_t dur tim
1 1 10:12:06 00:00:10 Normal 1 10:12:10 1321665126 10 1321665130
2 1 10:12:06 00:00:10 Normal 2 10:12:15 1321665126 10 1321665135
7 2 10:12:30 00:00:30 Weird 3 10:12:45 1321665150 30 1321665165
OBS #2 看起来在范围内。这有意义吗?
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
设置数据
首先设置输入数据框。我们创建两个版本的数据框:
A
和B
仅使用字符列表示时间,At
和Bt
> 使用 chron 包"times"
类来表示时间(它比"character"
类有优势,可以对它们进行加减运算):sqldf with times class
现在我们可以使用 sqldf 包。我们使用
method="raw"
(它不会为输出分配类),因此我们必须将"time"
类分配给输出"Time"< /code> 列我们自己:
结果是:
使用 sqldf 的开发版本,无需使用
method="raw"
即可完成此操作,并且"Time"
列将自动设置为"times"
类sqldf 类分配启发式:具有字符类的 sqldf
实际上可以通过在 sqlite 中执行所有时间计算来不使用
“times”
类使用 sqlite 的 strftime 函数的字符串。不幸的是,SQL 语句涉及更多一些:编辑:
一系列编辑修复了语法,添加了额外的方法并修复/改进了 read.table 语句。
编辑:
简化/改进的最终 sqldf 语句。
Set up data
First set up the input data frames. We create two versions of the data frames:
A
andB
just use character columns for the times andAt
andBt
use the chron package"times"
class for the times (which has the advantage over"character"
class that one can add and subtract them):sqldf with times class
Now we can perform the calculation using the sqldf package. We use
method="raw"
(which does not assign classes to the output) so we must assign the"times"
class to the output"Time"
column ourself:The result is:
With the development version of sqldf this can be done without using
method="raw"
and the"Time"
column will automatically be set to"times"
class by the sqldf class assignment heuristic:sqldf with character class
Its actually possible to not use the
"times"
class by performing all time calculations in sqlite out of character strings employing sqlite's strftime function. The SQL statement is unfortunately a bit more involved:EDIT:
A series of edits which fixed grammar, added additional approaches and fixed/improved the
read.table
statements.EDIT:
Simplified/improved final sqldf statement.