Extracting co-anomalies that share a time duration in R
I need to extract co-anomalies from a data frame that already contains univariate anomalies.
# Libraries
library(dplyr)
library(lubridate)
library(stringr)

# Create input dataframe
DF <- data.frame(
  rowID = as.factor(c(1,2,3,4,5,6,7,8)),
  Start = as_datetime(c('2022-01-01 09:00:00', '2022-01-01 12:00:00', '2022-01-02 15:00:00',
                        '2022-01-02 23:30:00', '2022-01-03 00:10:00', '2022-01-29 00:10:00',
                        '2023-12-25 06:00:00', '2023-12-25 08:00:00')),
  Finish = as_datetime(c('2022-01-01 11:00:00', '2022-01-01 15:00:00','2022-01-03 01:00:00',
                         '2022-01-02 23:50:00', '2022-01-03 03:00:00', '2022-01-31 03:00:00',
                         '2023-12-25 11:00:00', '2023-12-25 12:00:00')),
  Process = c('Process1', 'Process2', 'Process1', 'Process2', 'Process3', 'Process3', 'Process3', 'Process3'),
  Anomaly = c('Y','N','Y','Y','Y', 'Y', 'Y', 'Y')
) %>%
  arrange(Start, Process) %>%
  mutate(Interval = interval(Start, Finish)) %>%
  as_tibble()
I'm able to successfully tag co-anomalies which occurred over similar time periods as the process of interest (Process3).
# Declare process of interest
c <- 'Process3'

# Extract co-anomalies within and between Process3
Result <- DF %>%
  filter(int_overlaps(Interval, Interval[Process == c]) == TRUE) %>%
  mutate(coAnomaly = ifelse(Anomaly == 'Y', 'Y', 'N')) %>%
  left_join(DF, ., by = c('rowID' = 'rowID')) %>%
  select(contains('.x'), coAnomaly) %>%
  rename_with(~str_remove(., '.x'))
The code correctly tags co-anomalies between Process3 and other processes, but it makes errors when comparing Process3 against itself.
Row 6 is an error: that anomaly does not co-occur with another Process3 or with any other process.
I'm trying to correctly tag:
- Which Process3 rows co-occurred with other processes (Between, LHS)
- Which other processes co-occurred with Process3 rows (Between, RHS)
- Which Process3 rows co-occurred with other Process3 rows (Within)
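A likely explanation for the row 6 error, offered as an assumption rather than a confirmed diagnosis: int_overlaps() compares its two arguments element by element and recycles the shorter one, so in the 8-row example each Process3 row ends up being compared with itself. The sketch below reproduces this with only the two intervals from rows 5 and 6; the names p3, self_check and cross_check are introduced here for illustration.

```r
library(lubridate)

# Intervals of rows 5 and 6 from the example data (both Process3)
p3 <- interval(
  as_datetime(c('2022-01-03 00:10:00', '2022-01-29 00:10:00')),
  as_datetime(c('2022-01-03 03:00:00', '2022-01-31 03:00:00'))
)

# Element-wise comparison: each interval is only checked against the
# interval in the same position, which here is the interval itself
self_check <- int_overlaps(p3, p3)

# Row 6 checked against the other Process3 interval (row 5)
cross_check <- int_overlaps(p3[2], p3[1])

print(self_check)
print(cross_check)
```

If this is what happens in the filter() call, any fix needs to compare each row against all other rows and exclude the row itself.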
Comments (1)
continue
You can try this approach using rowwise():
Output:
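The answer's original code is not reproduced here. As a minimal sketch of how a rowwise() approach could work, the block below checks each row against every other row and flags it when the pair overlaps and involves the process of interest. Several things are assumptions: the variable name poi (used instead of c so that base::c is not masked), the helper column hasOverlap, comparing Start and Finish directly instead of using the Interval column, and writing 'N' rather than NA for rows without an overlap.

```r
library(dplyr)
library(lubridate)

# Same example data as in the question
DF <- tibble(
  rowID   = factor(1:8),
  Start   = as_datetime(c('2022-01-01 09:00:00', '2022-01-01 12:00:00',
                          '2022-01-02 15:00:00', '2022-01-02 23:30:00',
                          '2022-01-03 00:10:00', '2022-01-29 00:10:00',
                          '2023-12-25 06:00:00', '2023-12-25 08:00:00')),
  Finish  = as_datetime(c('2022-01-01 11:00:00', '2022-01-01 15:00:00',
                          '2022-01-03 01:00:00', '2022-01-02 23:50:00',
                          '2022-01-03 03:00:00', '2022-01-31 03:00:00',
                          '2023-12-25 11:00:00', '2023-12-25 12:00:00')),
  Process = c('Process1', 'Process2', 'Process1', 'Process2',
              'Process3', 'Process3', 'Process3', 'Process3'),
  Anomaly = c('Y', 'N', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y')
)

poi <- 'Process3'  # process of interest (hypothetical name)

Result <- DF %>%
  rowwise() %>%
  mutate(
    # TRUE if this row overlaps any *other* row and the pair involves poi
    hasOverlap = any(
      as.character(DF$rowID) != as.character(rowID) &
        Start <= DF$Finish & DF$Start <= Finish &
        (Process == poi | DF$Process == poi)
    ),
    coAnomaly = if_else(Anomaly == 'Y' & hasOverlap, 'Y', 'N')
  ) %>%
  ungroup()

print(Result)
```

With the example data this flags rows 3, 5, 7 and 8. Row 6 is left unflagged because it no longer counts as overlapping itself.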
Updated, given OP's additional request of separating Between/Within, and new frame:
Output:
All types of overlaps:
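The updated code is likewise not reproduced here. One way to separate Between from Within, sketched under the same assumptions as before (poi instead of c, direct Start/Finish comparison), is to compute two logical flags per row. The helper overlaps_row() and the column names Between and Within are hypothetical.

```r
library(dplyr)
library(lubridate)

# Same example data as in the question
DF <- tibble(
  rowID   = factor(1:8),
  Start   = as_datetime(c('2022-01-01 09:00:00', '2022-01-01 12:00:00',
                          '2022-01-02 15:00:00', '2022-01-02 23:30:00',
                          '2022-01-03 00:10:00', '2022-01-29 00:10:00',
                          '2023-12-25 06:00:00', '2023-12-25 08:00:00')),
  Finish  = as_datetime(c('2022-01-01 11:00:00', '2022-01-01 15:00:00',
                          '2022-01-03 01:00:00', '2022-01-02 23:50:00',
                          '2022-01-03 03:00:00', '2022-01-31 03:00:00',
                          '2023-12-25 11:00:00', '2023-12-25 12:00:00')),
  Process = c('Process1', 'Process2', 'Process1', 'Process2',
              'Process3', 'Process3', 'Process3', 'Process3'),
  Anomaly = c('Y', 'N', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y')
)

poi <- 'Process3'  # process of interest (hypothetical name)

# Logical vector over all rows of DF: which *other* rows overlap this one?
overlaps_row <- function(id, start, finish) {
  as.character(DF$rowID) != as.character(id) &
    start <= DF$Finish & DF$Start <= finish
}

Result <- DF %>%
  rowwise() %>%
  mutate(
    # Overlap with a different process, where one side of the pair is poi
    Between = any(overlaps_row(rowID, Start, Finish) &
                    DF$Process != Process &
                    (Process == poi | DF$Process == poi)),
    # Overlap between two rows that both belong to poi
    Within = any(overlaps_row(rowID, Start, Finish) &
                   Process == poi & DF$Process == poi)
  ) %>%
  ungroup()

print(Result)
```

On the example data, rows 3 and 5 come out as Between and rows 7 and 8 as Within. Filtering on Anomaly == 'Y' afterwards would restrict the flags to anomalous rows.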
Here is another approach, which does not depend on indicating a Process of interest (i.e. no need for c="Process3").
rowwise and unnest
Output:
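The code for this approach is also missing, so how rowwise and unnest were used in it is not known. A minimal sketch of one possibility: collect, for every row, the IDs of all other overlapping rows in a list column, then expand that column with tidyr::unnest() to get one row per overlapping pair. The column names partner, partnerProcess and type are hypothetical, and "Within" here means both rows share the same Process, whichever process that is.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Same example data as in the question
DF <- tibble(
  rowID   = factor(1:8),
  Start   = as_datetime(c('2022-01-01 09:00:00', '2022-01-01 12:00:00',
                          '2022-01-02 15:00:00', '2022-01-02 23:30:00',
                          '2022-01-03 00:10:00', '2022-01-29 00:10:00',
                          '2023-12-25 06:00:00', '2023-12-25 08:00:00')),
  Finish  = as_datetime(c('2022-01-01 11:00:00', '2022-01-01 15:00:00',
                          '2022-01-03 01:00:00', '2022-01-02 23:50:00',
                          '2022-01-03 03:00:00', '2022-01-31 03:00:00',
                          '2023-12-25 11:00:00', '2023-12-25 12:00:00')),
  Process = c('Process1', 'Process2', 'Process1', 'Process2',
              'Process3', 'Process3', 'Process3', 'Process3'),
  Anomaly = c('Y', 'N', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y')
)

pairs <- DF %>%
  rowwise() %>%
  # List column holding the IDs of every other row that overlaps this one
  mutate(partner = list(as.character(DF$rowID[
    as.character(DF$rowID) != as.character(rowID) &
      Start <= DF$Finish & DF$Start <= Finish
  ]))) %>%
  ungroup() %>%
  # One output row per overlapping pair; rows with no partner are dropped
  unnest(partner) %>%
  mutate(
    partnerProcess = DF$Process[match(partner, as.character(DF$rowID))],
    type = if_else(Process == partnerProcess, 'Within', 'Between')
  )

print(pairs)
```

Each overlapping pair appears twice, once from each side, which corresponds to the LHS and RHS views asked for in the question. Row 6 has no partner and is dropped from the result.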