data.table: count observations that are close in distance and time to the current observation
I am looking to calculate a new column "congestion" by counting the number of rows whose values are within sec +/- 5, within x +/- 5 and within y +/- 5 of the current row. Essentially, I want to find observations that occur within a close distance (x, y) and time period (sec) of the current observation, which amounts to one big counting ifelse statement. All values are numeric.
Current data.table:
library(data.table)
data <- data.table(x = c(1,3,10,15,6),
                   y = c(5,5,11,14,19),
                   sec = c(1,3,5,6,9))
Desired output:
data <- data.table(x = c(1,3,10,15,6),
                   y = c(5,5,11,14,6),
                   sec = c(1,3,5,6,7),
                   congestion = c(1,2,1,1,2))
A data.table solution is preferred, but I am happy to work in dplyr.
Comments (2)
I think your "desired output" is incorrect given the criteria you've specified.
However, if your data is small enough, you can do a full join of the data with itself and filter out the invalid combinations.
A slightly more efficient approach might be a by = .EACHI join (borrowing the concept from this answer).
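The two approaches described above might look roughly like the following. This is only a sketch, not the answerer's original code: the helper names (id, pairs, bounds, full_join_count, eachi_count) are made up for illustration, the +/- 5 limits are assumed to be inclusive, and a row is assumed not to count itself.

```r
library(data.table)

data <- data.table(x   = c(1, 3, 10, 15, 6),
                   y   = c(5, 5, 11, 14, 19),
                   sec = c(1, 3, 5, 6, 9))
data[, id := .I]  # row id (assumed helper), used to pair rows

# Approach 1: all ordered pairs of distinct rows, then apply the +/- 5 rule
pairs <- CJ(a = data$id, b = data$id)[a != b]
pairs[, near := abs(data$x[a] - data$x[b]) <= 5 &
                abs(data$y[a] - data$y[b]) <= 5 &
                abs(data$sec[a] - data$sec[b]) <= 5]
full_join_count <- pairs[, .(congestion = sum(near)), by = a]

# Approach 2: non-equi self-join, counting matches per row with by = .EACHI
bounds <- data[, .(xlo = x - 5, xhi = x + 5,
                   ylo = y - 5, yhi = y + 5,
                   slo = sec - 5, shi = sec + 5)]
eachi_count <- data[bounds,
                    on = .(x >= xlo, x <= xhi,
                           y >= ylo, y <= yhi,
                           sec >= slo, sec <= shi),
                    .(congestion = .N - 1L),  # minus 1 drops the row's match with itself
                    by = .EACHI]

# The join returns one row per row of `bounds`, in the same order as `data`
data[, congestion := eachi_count$congestion]
```

The cross join materialises n * (n - 1) pairs, so it only suits small tables; the by = .EACHI version counts matches per row without building the full pair table.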
You could define the limits and join on them.
According to the +/- 5 rule, I find fewer congested observations than in your expected result. If I understood the constraints correctly, this seems correct to me.
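The code for this answer did not survive, so here is one way the "define the limits and join on them" idea could be written. It is a sketch under assumptions: the limit column names (x_min, x_max, and so on), the id helper and the neighbour column are hypothetical, the limits are taken as inclusive, and a row does not count itself.

```r
library(data.table)

data <- data.table(x   = c(1, 3, 10, 15, 6),
                   y   = c(5, 5, 11, 14, 19),
                   sec = c(1, 3, 5, 6, 9))
data[, id := .I]  # row id (assumed helper)

# Define the +/- 5 limits around every row
limits <- data[, .(id,
                   x_min = x - 5,     x_max = x + 5,
                   y_min = y - 5,     y_max = y + 5,
                   sec_min = sec - 5, sec_max = sec + 5)]

# Non-equi join: list every row of `data` that falls inside each row's limits
matches <- data[limits,
                on = .(x >= x_min, x <= x_max,
                       y >= y_min, y <= y_max,
                       sec >= sec_min, sec <= sec_max),
                .(id = i.id, neighbour = x.id)]

# Count neighbours per row, ignoring the match of a row with itself
counts <- matches[, .(congestion = sum(neighbour != id)), by = id]

# Write the counts back onto the original table
data[counts, congestion := i.congestion, on = "id"]
```

Under these assumptions the five rows from the question's current data get congestion values of 1, 1, 1, 1 and 0, which is lower than the question's desired output, in line with the remark above.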