data.table: count observations that are close in distance and time to the current observation

Posted on 2025-01-25 15:54:26

I want to compute a new column "congestion" that counts, for each row, how many rows have sec within +/- 5, x within +/- 5, and y within +/- 5 of that row's values. Essentially, I want to find observations that occur close in distance (x, y) and time (sec) to the current observation, which amounts to one big counting ifelse statement. All values are numeric.

current data.table

data <- data.table(x = c(1,3,10,15,6), 
y = c(5,5,11,14,19), 
sec=c(1,3,5,6,9))

desired output

data <- data.table(x = c(1,3,10,15,6), 
y = c(5,5,11,14,6), 
sec=c(1,3,5,6,7),
congestion = c(1,2,1,1,2))

A data.table solution is preferred, but I am happy to work in dplyr.

郁金香雨 2025-02-01 15:54:26

I think your "desired output" is incorrect given the criteria you've specified.

However, if your data is small enough, you can join the data with itself so that every row is paired with every other row, and then filter out the invalid combinations:

library(data.table)

data <- data.table(x = c(1, 3, 10, 15, 6),
                   y = c(5, 5, 11, 14, 19),
                   sec = c(1, 3, 5, 6, 9))

data[, join_key := 1L ]     ## constant key, so every row joins to every row

data[
  data
  , on = .(join_key)                        ## Full join to put all possible combinations together
  , allow.cartesian = TRUE
][
  (x >= i.x - 5 & x <= i.x + 5) &           ## Keep pairs within +/- 5 on x, y and sec
    (y >= i.y - 5 & y <= i.y + 5) &
    (sec >= i.sec - 5 & sec <= i.sec + 5)
  , .(
    congestion = .N                          ## the count includes the row itself
  )
  , by = .(x, y, sec)
]

#     x  y sec congestion
# 1:  1  5   1          2
# 2:  3  5   3          2
# 3: 10 11   5          2
# 4: 15 14   6          2
# 5:  6 19   9          1

A slightly more efficient approach might be to do a by = .EACHI join (borrowing the concept from this answer):

data[, row_idx := 1L]     ## constant key, so each row is compared with all rows

data[
  data
  , {
    ## for the current row of i, flag the rows within +/- 5 on x, y and sec
    idx = (x >= i.x - 5 & x <= i.x + 5) &
      (y >= i.y - 5 & y <= i.y + 5) &
      (sec >= i.sec - 5 & sec <= i.sec + 5)
    .(
      x = x[ idx ]
      , y = y[ idx ]
      , sec = sec[ idx ]
    )
  }
  , on = .(row_idx)
  , by = .EACHI
][
  , .(congestion = .N)      ## the count includes the row itself
  , by = .(x, y, sec)
]

#     x  y sec congestion
# 1:  1  5   1          2
# 2:  3  5   3          2
# 3: 10 11   5          2
# 4: 15 14   6          2
# 5:  6 19   9          1
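
As an independent cross-check of the +/- 5 rule stated in the question, here is a minimal base R sketch that compares every pair of rows directly. The names df, tol and near are introduced here for illustration only, and it assumes that "within +/- 5" means an absolute difference of at most 5 on each of x, y and sec. It builds an n-by-n matrix, so like the self-join it only suits small data.

```r
# Same sample data as in the question, as a plain data.frame
df <- data.frame(x   = c(1, 3, 10, 15, 6),
                 y   = c(5, 5, 11, 14, 19),
                 sec = c(1, 3, 5, 6, 9))

tol <- 5  # assumed half-width of the window on every dimension

# near[i, j] is TRUE when row j lies within +/- tol of row i on x, y and sec
near <- abs(outer(df$x,   df$x,   "-")) <= tol &
        abs(outer(df$y,   df$y,   "-")) <= tol &
        abs(outer(df$sec, df$sec, "-")) <= tol

# Every row is "near" itself, so subtract 1 to count only the other rows
df$congestion_incl_self <- rowSums(near)
df$congestion           <- rowSums(near) - 1

print(df)
```

Whether the row itself should be counted is a choice the question leaves open, so both variants are kept as separate columns.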


那支青花 2025-02-01 15:54:26

You could define the limits and join on them:

data[,`:=`(x_high = x +5,
           x_low = x - 5,
           y_high = y + 5,
           y_low = y - 5,
           sec_high = sec +5,
           sec_low = sec - 5)]

data[data,.(x,y,sec,x.x,x.y,x.sec),
          on=.(x>=x_low,
               x<=x_high,
               y>=y_low,
               y<=y_high,
               sec>=sec_low,
               sec<=sec_high)][
      !(x==x.x&y==x.y&sec==x.sec),.(congestion=.N),by=.(x,y,sec)]

       x     y   sec congestion
   <num> <num> <num>      <int>
1:     1     5     1          1
2:     3     5     3          1
3:    10    11     5          1
4:    15    14     6          1

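According to the +/- 5 rule, I find fewer congested observations than your expected result. If I understood the constraints correctly, this seems correct to me. Note that the last row (6, 19, 9) has no neighbours within the window, so it does not appear in the result at all.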
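
If the count should end up as a column on the original table, with rows that have no neighbours kept as 0, one possible variation is to count the matches per row with by = .EACHI instead of grouping afterwards. This is only a sketch under stated assumptions, not the answerer's code: the names dt and counts are illustrative, and it assumes the row itself should not be counted.

```r
library(data.table)

dt <- data.table(x   = c(1, 3, 10, 15, 6),
                 y   = c(5, 5, 11, 14, 19),
                 sec = c(1, 3, 5, 6, 9))

# Window bounds for every row (helper columns used only for the join)
dt[, `:=`(x_low   = x - 5,   x_high   = x + 5,
          y_low   = y - 5,   y_high   = y + 5,
          sec_low = sec - 5, sec_high = sec + 5)]

# Non-equi self-join: for each row of the inner dt, .N is the number of
# rows whose x, y and sec fall inside that row's window (itself included)
counts <- dt[dt,
             on = .(x >= x_low, x <= x_high,
                    y >= y_low, y <= y_high,
                    sec >= sec_low, sec <= sec_high),
             .N, by = .EACHI]$N

# by = .EACHI returns one row per row of the inner dt, in the same order,
# so the counts line up with dt; subtract 1 to drop the self-match
dt[, congestion := counts - 1L]

print(dt[, .(x, y, sec, congestion)])
```

Because every row always matches itself, no row is lost in the join, which is why the isolated last row shows up with a count of 0 here.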
