data.table: count observations that are close in distance and time to the current observation

Posted 2025-01-25 15:54:26 · 424 words · 3 views · 0 comments

I want to calculate a new column, "congestion", by counting the rows whose values are within sec +/- 5, x +/- 5 and y +/- 5 of the current row. Essentially, I want to find the observations that occur within a close distance (x, y) and time period (sec) of the current observation, which amounts to one big counting ifelse statement. All values are numeric.

Current data.table:

data <- data.table(x = c(1,3,10,15,6), 
y = c(5,5,11,14,19), 
sec=c(1,3,5,6,9))

Desired output:

data <- data.table(x = c(1,3,10,15,6), 
y = c(5,5,11,14,6), 
sec=c(1,3,5,6,7),
congestion = c(1,2,1,1,2))

A data.table solution is preferable, but I am happy to work in dplyr.

Comments (2)

郁金香雨 2025-02-01 15:54:26

I think your "desired output" is incorrect given the criteria you've specified.
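The rule can be checked by brute force on the input table as posted. This is a minimal base R sketch, assuming the window is inclusive (a difference of exactly 5 still counts); `window`, `n_within` and `neighbours` are illustrative names, not from the question:

```r
# Input data as posted in the question
data <- data.frame(x   = c(1, 3, 10, 15, 6),
                   y   = c(5, 5, 11, 14, 19),
                   sec = c(1, 3, 5, 6, 9))

window <- 5  # assumed inclusive tolerance on x, y and sec

# For every row, count the rows within +/- window on all three columns.
# The row itself always qualifies, so it is included in this count.
n_within <- vapply(seq_len(nrow(data)), function(i) {
  sum(abs(data$x - data$x[i]) <= window &
        abs(data$y - data$y[i]) <= window &
        abs(data$sec - data$sec[i]) <= window)
}, integer(1))

# Neighbours excluding the row itself
neighbours <- n_within - 1L

print(data.frame(data, n_within, neighbours))
```

Under these assumptions rows 1 and 2 are neighbours of each other, as are rows 3 and 4, and the last row has none, so no row of the posted input reaches two neighbours. The loop is quadratic in the number of rows, which is fine for a check but not for large data.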

However, if your data is small enough, you can do a full join of the data with itself and filter out the invalid combinations:

library(data.table)

data <- data.table(x = c(1, 3, 10, 15, 6),
                   y = c(5, 5, 11, 14, 19),
                   sec = c(1, 3, 5, 6, 9))

data[, join_key := 1L]     ## constant key, so the join pairs every row with every row

data[
  data
  , on = .(join_key)                        ## full (cartesian) self-join: all row combinations
  , allow.cartesian = TRUE
][
  (x >= i.x - 5 & x <= i.x + 5) &           ## keep only pairs within +/- 5 on x, y and sec
    (y >= i.y - 5 & y <= i.y + 5) &
    (sec >= i.sec - 5 & sec <= i.sec + 5)
  , .(
    congestion = .N                         ## each row also matches itself, so the count includes it
  )
  , by = .(x, y, sec)
]

#     x  y sec congestion
# 1:  1  5   1          2
# 2:  3  5   3          2
# 3: 10 11   5          2
# 4: 15 14   6          2
# 5:  6 19   9          1

A slightly more efficient approach might be a by = .EACHI join (borrowing the concept from this answer), which filters the matches for each row as it goes instead of materialising every combination first:

data[, row_idx := 1L]     ## constant key again, so each row of i matches every row of x

data[
  data
  , {
    ## for the current row of i, flag the rows of x within +/- 5 on x, y and sec
    idx = (x >= i.x - 5 & x <= i.x + 5) &
      (y >= i.y - 5 & y <= i.y + 5) &
      (sec >= i.sec - 5 & sec <= i.sec + 5)
    .(
      x = x[ idx ]
      , y = y[ idx ]
      , sec = sec[ idx ]
    )
  }
  , on = .(row_idx)
  , by = .EACHI
][
  , .(congestion = .N)     ## as before, the count includes the row itself
  , by = .(x, y, sec)
]

#     x  y sec congestion
# 1:  1  5   1          2
# 2:  3  5   3          2
# 3: 10 11   5          2
# 4: 15 14   6          2
# 5:  6 19   9          1

那支青花 2025-02-01 15:54:26

You could define the limits and join on them:

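## lower and upper bounds of the +/- 5 window around each row
data[, `:=`(x_high = x + 5,
            x_low = x - 5,
            y_high = y + 5,
            y_low = y - 5,
            sec_high = sec + 5,
            sec_low = sec - 5)]

## non-equi self-join: pair each row with the rows that fall inside its window
data[data, .(x, y, sec, x.x, x.y, x.sec),
     on = .(x >= x_low,
            x <= x_high,
            y >= y_low,
            y <= y_high,
            sec >= sec_low,
            sec <= sec_high)][
       ## drop each row's match with itself, then count what is left
       !(x == x.x & y == x.y & sec == x.sec), .(congestion = .N), by = .(x, y, sec)]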

       x     y   sec congestion
   <num> <num> <num>      <int>
1:     1     5     1          1
2:     3     5     3          1
3:    10    11     5          1
4:    15    14     6          1

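Applying the +/- 5 rule, I find fewer congested rows than your expected result shows. If I have understood the constraints correctly, this output is the right one. Each row's match with itself is excluded, so the row (6, 19, 9), which has no neighbours, does not appear in the result.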
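If the count should end up as a column on the original table, with rows that have no neighbours kept, one option is the same non-equi self-join combined with by = .EACHI. This is a sketch, assuming the window is inclusive and that a row should not count itself; `counts` is an illustrative name:

```r
library(data.table)

data <- data.table(x = c(1, 3, 10, 15, 6),
                   y = c(5, 5, 11, 14, 19),
                   sec = c(1, 3, 5, 6, 9))

# Bounds of the +/- 5 window around each row
data[, `:=`(x_low = x - 5, x_high = x + 5,
            y_low = y - 5, y_high = y + 5,
            sec_low = sec - 5, sec_high = sec + 5)]

# One result row per row of i, in i's order;
# N is the number of rows that fall inside that row's window
counts <- data[data,
               on = .(x >= x_low, x <= x_high,
                      y >= y_low, y <= y_high,
                      sec >= sec_low, sec <= sec_high),
               .N, by = .EACHI]

# Every row falls inside its own window, so subtract 1 to exclude it
data[, congestion := counts$N - 1L]

# Drop the helper columns
data[, c("x_low", "x_high", "y_low", "y_high", "sec_low", "sec_high") := NULL]

print(data)
```

Because the counts come back in the order of the original rows, they can be assigned directly, and a row without neighbours gets 0 instead of dropping out of the result.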
