Is it possible to filter out outliers in R based on overlapping time intervals?

Posted on 2025-01-25 03:19:45

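How do I delete outliers with filter() by checking for overlapping date/time ranges within the same ID?

For example, the first two rows below both have ID = 1 and overlap in time, so they need to be deleted.

ID  Time start           Time end
1   2015-03-16 10:40:00  2015-03-16 11:10:00
1   2015-03-16 10:50:00  2015-03-16 10:59:00
2   2015-03-16 10:40:00  2015-03-16 10:45:00
1   2015-03-16 11:20:00  2015-03-16 11:28:56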
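
To make "overlap" concrete: two closed intervals overlap when each one starts no later than the other ends. The sketch below checks that condition on the sample rows in base R. The helper name intervals_overlap and the choice of UTC as the time zone are assumptions made for illustration, and touching endpoints are counted as an overlap.

```r
# Hypothetical helper: TRUE when [start_a, end_a] and [start_b, end_b] share
# at least one instant (touching endpoints count as overlapping).
intervals_overlap <- function(start_a, end_a, start_b, end_b) {
  start_a <= end_b & start_b <= end_a
}

# Parse a timestamp; UTC is an assumption, the question does not state a zone.
ts <- function(x) as.POSIXct(x, tz = "UTC")

# First and second sample rows (both ID = 1)
rows_1_2 <- intervals_overlap(
  ts("2015-03-16 10:40:00"), ts("2015-03-16 11:10:00"),
  ts("2015-03-16 10:50:00"), ts("2015-03-16 10:59:00")
)

# First and fourth sample rows (both ID = 1)
rows_1_4 <- intervals_overlap(
  ts("2015-03-16 10:40:00"), ts("2015-03-16 11:10:00"),
  ts("2015-03-16 11:20:00"), ts("2015-03-16 11:28:56")
)

print(rows_1_2)  # TRUE
print(rows_1_4)  # FALSE
```

The third sample row covers a similar time window but has ID = 2, so it should only ever be compared with other ID = 2 rows.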

Comments (1)

花间憩 2025-02-01 03:19:45

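Try this to remove any time overlaps within a group. It keeps the first interval of each overlapping run and drops the later ones. Please test it with more data to see whether it does what you want; I only tried the small sample below.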

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(slider)

tribble(
  ~id, ~start, ~end,
  1, "2015-03-16 10:40:00", "2015-03-16 11:10:00",
  1, "2015-03-16 10:50:00", "2015-03-16 10:59:00",
  1, "2015-03-16 11:09:00", "2015-03-16 11:11:00",
  2, "2015-03-16 10:40:00", "2015-03-16 10:45:00",
  1, "2015-03-16 11:20:00", "2015-03-16 11:28:56",
  1, "2015-03-16 11:27:00", "2015-03-16 11:30:56",
  2, "2015-03-16 10:44:00", "2015-03-16 11:45:00"
) |>
  mutate(
    start = ymd_hms(start, tz = Sys.timezone()),
    end = ymd_hms(end, tz = Sys.timezone())
  ) |>
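  # Sort so that, within each id, earlier intervals come first
  arrange(id, start, end) |>
  group_by(id) |>
  mutate(
    # Running earliest start and latest end seen so far within the id
    roll_start = slide_vec(start, min, .before = Inf),
    roll_end = slide_vec(end, max, .before = Inf),
    # A row overlaps if its start or end falls inside the span of the earlier rows
    overlap = if_else((start >= lag(roll_start) & start <= lag(roll_end)) |
      (end >= lag(roll_start) & end <= lag(roll_end)), "yes", "no")
  ) |>
  # The first row of each id has no predecessor, so overlap is NA; keep it
  filter(overlap == "no" | is.na(overlap)) |>
  select(-c(starts_with("roll_"), overlap))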
#> # A tibble: 3 × 3
#> # Groups:   id [2]
#>      id start               end                
#>   <dbl> <dttm>              <dttm>             
#> 1     1 2015-03-16 10:40:00 2015-03-16 11:10:00
#> 2     1 2015-03-16 11:20:00 2015-03-16 11:28:56
#> 3     2 2015-03-16 10:40:00 2015-03-16 10:45:00
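
The pipeline above keeps the first interval of each overlapping run. If the goal is instead to drop every row that overlaps another row with the same id, which is one reading of the question, a base R sketch might look like the following. flag_overlaps is a hypothetical helper name, not a function from dplyr, lubridate, or slider. The sketch assumes UTC timestamps, no missing values, and that touching endpoints count as an overlap.

```r
# Hypothetical helper: TRUE for every row whose interval overlaps another
# row with the same id. Assumes no NA values in id, start, or end.
flag_overlaps <- function(id, start, end) {
  s <- as.numeric(start)  # cummax() is not defined for POSIXct, so use seconds
  e <- as.numeric(end)
  flagged <- logical(length(id))
  for (rows in split(seq_along(id), id)) {
    ord <- rows[order(s[rows], e[rows])]  # this id's row positions, by start time
    n <- length(ord)
    if (n < 2) next                       # a lone interval cannot overlap anything
    # Latest end among all earlier rows (none for the first row)
    prev_max_end <- c(-Inf, cummax(e[ord])[-n])
    # Start of the next row (none for the last row)
    next_start <- c(s[ord][-1], Inf)
    flagged[ord] <- s[ord] <= prev_max_end | e[ord] >= next_start
  }
  flagged
}

# The four sample rows from the question; the UTC time zone is an assumption
df <- data.frame(
  id = c(1, 1, 2, 1),
  start = as.POSIXct(c("2015-03-16 10:40:00", "2015-03-16 10:50:00",
                       "2015-03-16 10:40:00", "2015-03-16 11:20:00"), tz = "UTC"),
  end = as.POSIXct(c("2015-03-16 11:10:00", "2015-03-16 10:59:00",
                     "2015-03-16 10:45:00", "2015-03-16 11:28:56"), tz = "UTC")
)

df$overlaps <- flag_overlaps(df$id, df$start, df$end)
cleaned <- df[!df$overlaps, c("id", "start", "end")]
print(cleaned)  # keeps the id 2 row and the 11:20 row for id 1
```

Because the helper handles the per-id grouping itself, it should also work inside an ungrouped dplyr call such as filter(!flag_overlaps(id, start, end)).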

Created on 2022-04-30 by the reprex package (v2.0.1)
