Using a sliding time window/bins - KQL query to check whether values stay within a threshold range

Posted 2025-01-24 20:40:10 · 3448 characters · 5 views · 0 comments

I would like to write a sliding window query in KQL that checks whether the speed of a car is ALWAYS within a certain speed range (e.g. between 100 and 150 km/h) for a time window of 5 mins.

Following is a sample dataset for it:

Timestamp Speed Temperature
2022-01-01 00:00:00.0000000 142.5 25.1
2022-01-01 00:01:00.0000000 147.4 25.5
2022-01-01 00:02:00.0000000 158.2 25.4
2022-01-01 00:03:00.0000000 134.8 25.6
2022-01-01 00:04:00.0000000 125.3 25.5
2022-01-01 00:05:00.0000000 118.4 25.4
2022-01-01 00:06:00.0000000 106.3 26.3
2022-01-01 00:07:00.0000000 119.6 26.5
2022-01-01 00:08:00.0000000 134.7 25.4
2022-01-01 00:09:00.0000000 153.2 26.6
2022-01-01 00:10:00.0000000 137.5 25.5
2022-01-01 00:11:00.0000000 129.9 27.4
2022-01-01 00:12:00.0000000 118.1 26.3
2022-01-01 00:13:00.0000000 105.4 25.7
2022-01-01 00:14:00.0000000 101.7 24.4
2022-01-01 00:15:00.0000000 100.8 25.6
2022-01-01 00:16:00.0000000 95.4 26.2
2022-01-01 00:17:00.0000000 105.6 26.7

First, the window would check whether the speed is in the defined range from 0-4 mins, then 1-5 mins, then 2-6 mins, then 3-7 mins, then 4-8 mins and so on, until 10-14 mins, then 11-15 mins, then 12-16 mins and finally 13-17 mins. If the speed is continuously in the 100-150 km/h range, the query would return those rows as output.

I would expect the following output:

Timestamp Speed Temperature
2022-01-01 00:03:00.0000000 134.8 25.6
2022-01-01 00:04:00.0000000 125.3 25.5
2022-01-01 00:05:00.0000000 118.4 25.4
2022-01-01 00:06:00.0000000 106.3 26.3
2022-01-01 00:07:00.0000000 119.6 26.5
2022-01-01 00:08:00.0000000 134.7 25.4
2022-01-01 00:10:00.0000000 137.5 25.5
2022-01-01 00:11:00.0000000 129.9 27.4
2022-01-01 00:12:00.0000000 118.1 26.3
2022-01-01 00:13:00.0000000 105.4 25.7
2022-01-01 00:14:00.0000000 101.7 24.4
2022-01-01 00:15:00.0000000 100.8 25.6

In the output dataset, timestamps from 0-2mins are filtered out because when we check from 0-4mins there is a value (158.2km/h) out of the range (100-150km/h). Similarly, we find this value when checking from 1-5mins and also when checking from 2-6mins.

From 3-7 mins, all the speed values stay within the range for the full 5 mins, and the same holds from 4-8 mins, which is why these rows are kept.

In the end, I would just like to plot the temperature for all the 5 min time windows where the speed was always within the range. The plotting part is clear so I just need help with filtering the rows.

Thanks in Advance!


Comments (1)

错爱 2025-01-31 20:40:10

Here is a solution without any assumptions on your data.
Additional information, such as the granularity of the timestamps, could help to simplify it.

  1. Break the rows into groups, treating records with out-of-range (OOR) speed as boundaries. Also add a synthetic record that can later be used as a minimal boundary.
  2. For each group, find its boundaries, aligned to the sliding interval (1m). The main challenge here is finding the lower boundary for each group: for the upper boundary we use bin, which is a synonym for floor, but we don't have an equivalent for ceiling.
  3. Remove the groups that are smaller than the defined window (5m).
  4. Join the groups' boundaries with the groups' records and drop the OOR records.
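
The ceiling workaround from step 2 can be looked at in isolation. The following is a minimal sketch, not part of the original answer: the two sample timestamps are made up for illustration, one already aligned to the sliding interval and one not. The ceiling is the floor (bin) plus one interval, unless the value was already aligned.

```kql
// Sketch of the "ceiling" workaround from step 2.
// The two timestamps below are hypothetical sample values.
let p_sliding_interval = 1m;
datatable (from_timestamp:datetime)
[
     datetime(2022-01-01 00:02:00)
    ,datetime(2022-01-01 00:02:30)
]
// bin() rounds down to the start of the interval (floor)
| extend bin_from_timestamp = bin(from_timestamp, p_sliding_interval)
// add one interval only when the value was not already aligned
| extend ceil_from_timestamp = bin_from_timestamp + iff(bin_from_timestamp == from_timestamp, 0ms, p_sliding_interval)
```

For the aligned value 00:02:00 both the floor and the ceiling stay at 00:02:00; for 00:02:30 the floor is 00:02:00 and the ceiling becomes 00:03:00. The complete query below applies the same expression to each group's lower boundary.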

let p_sliding_interval = 1m;
let p_window = 5m;
let t = 
datatable (Timestamp:datetime ,Speed:real ,Temperature:real)
[
     '2022-01-01 00:00:00.0000000' ,142.5 ,25.1
    ,'2022-01-01 00:01:00.0000000' ,147.4 ,25.5
    ,'2022-01-01 00:02:00.0000000' ,158.2 ,25.4
    ,'2022-01-01 00:03:00.0000000' ,134.8 ,25.6
    ,'2022-01-01 00:04:00.0000000' ,125.3 ,25.5
    ,'2022-01-01 00:05:00.0000000' ,118.4 ,25.4
    ,'2022-01-01 00:06:00.0000000' ,106.3 ,26.3
    ,'2022-01-01 00:07:00.0000000' ,119.6 ,26.5
    ,'2022-01-01 00:08:00.0000000' ,134.7 ,25.4
    ,'2022-01-01 00:09:00.0000000' ,153.2 ,26.6
    ,'2022-01-01 00:10:00.0000000' ,137.5 ,25.5
    ,'2022-01-01 00:11:00.0000000' ,129.9 ,27.4
    ,'2022-01-01 00:12:00.0000000' ,118.1 ,26.3
    ,'2022-01-01 00:13:00.0000000' ,105.4 ,25.7
    ,'2022-01-01 00:14:00.0000000' ,101.7 ,24.4
    ,'2022-01-01 00:15:00.0000000' ,100.8 ,25.6
    ,'2022-01-01 00:16:00.0000000' ,95.4  ,26.2
    ,'2022-01-01 00:17:00.0000000' ,105.6 ,26.7
];
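// Overall time range of the data, used as fallback boundaries.
let min_Timestamp = toscalar(t | summarize min(Timestamp));
let max_Timestamp = toscalar(t | summarize max(Timestamp));
// Flag out-of-range (OOR) records and add a synthetic OOR record (null timestamp)
// that sorts first, so the leading in-range records also get a group boundary.
let row_level = 
t
| extend out_of_range_record_flag = iff(Speed !between (100 .. 150),1,0)
| union (print Timestamp = datetime(null) , out_of_range_record_flag = 1)
| order by Timestamp asc nulls first 
// Each OOR record starts a new group; the in-range records that follow share its id.
| extend in_range_group_id = row_cumsum(out_of_range_record_flag);
// One row per group: from its opening OOR record to the next OOR record (or the end of the data).
let group_boundries =
row_level 
| where out_of_range_record_flag == 1
| project in_range_group_id, from_timestamp = coalesce(Timestamp, min_Timestamp)
| order by in_range_group_id asc
| extend to_timestamp = coalesce(next(from_timestamp), max_Timestamp)
// Floor and ceiling of the lower boundary, aligned to the sliding interval.
| extend bin_from_timestamp = bin(from_timestamp, p_sliding_interval)
| extend ceil_from_timestamp = bin_from_timestamp + iff(bin_from_timestamp == from_timestamp, 0ms, p_sliding_interval)
// Keep only the groups whose aligned span is at least one full window.
| extend in_range_window = bin(to_timestamp, p_sliding_interval) - ceil_from_timestamp
| where in_range_window >= p_window;
// Return the in-range records of the surviving groups.
group_boundries
| join kind=inner row_level on in_range_group_id
| where out_of_range_record_flag == 0
| project Timestamp, Speed, Temperature

Timestamp Speed Temperature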
2022-01-01T00:03:00Z 134.8 25.6
2022-01-01T00:04:00Z 125.3 25.5
2022-01-01T00:05:00Z 118.4 25.4
2022-01-01T00:06:00Z 106.3 26.3
2022-01-01T00:07:00Z 119.6 26.5
2022-01-01T00:08:00Z 134.7 25.4
2022-01-01T00:10:00Z 137.5 25.5
2022-01-01T00:11:00Z 129.9 27.4
2022-01-01T00:12:00Z 118.1 26.3
2022-01-01T00:13:00Z 105.4 25.7
2022-01-01T00:14:00Z 101.7 24.4
2022-01-01T00:15:00Z 100.8 25.6

Fiddle
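
As mentioned at the top, knowing the timestamp granularity could simplify the query. The following is a rough sketch of one such simplification, not part of the original answer. It assumes exactly one record per minute with no gaps or duplicates, so that "5 minutes continuously in range" can be read as "at least 5 consecutive in-range records" (minutes 0-4 of a window). The parameter name p_min_records is made up for this sketch.

```kql
// Assumption: exactly one record per minute, no gaps and no duplicates.
// p_min_records is a hypothetical parameter: the number of consecutive
// in-range records needed to cover one window (minutes 0-4 => 5 records).
let p_min_records = 5;
let t =
datatable (Timestamp:datetime, Speed:real, Temperature:real)
[
     datetime(2022-01-01 00:00:00), 142.5, 25.1
    ,datetime(2022-01-01 00:01:00), 147.4, 25.5
    ,datetime(2022-01-01 00:02:00), 158.2, 25.4
    ,datetime(2022-01-01 00:03:00), 134.8, 25.6
    ,datetime(2022-01-01 00:04:00), 125.3, 25.5
    ,datetime(2022-01-01 00:05:00), 118.4, 25.4
    ,datetime(2022-01-01 00:06:00), 106.3, 26.3
    ,datetime(2022-01-01 00:07:00), 119.6, 26.5
    ,datetime(2022-01-01 00:08:00), 134.7, 25.4
    ,datetime(2022-01-01 00:09:00), 153.2, 26.6
    ,datetime(2022-01-01 00:10:00), 137.5, 25.5
    ,datetime(2022-01-01 00:11:00), 129.9, 27.4
    ,datetime(2022-01-01 00:12:00), 118.1, 26.3
    ,datetime(2022-01-01 00:13:00), 105.4, 25.7
    ,datetime(2022-01-01 00:14:00), 101.7, 24.4
    ,datetime(2022-01-01 00:15:00), 100.8, 25.6
    ,datetime(2022-01-01 00:16:00), 95.4, 26.2
    ,datetime(2022-01-01 00:17:00), 105.6, 26.7
];
// Number the runs: the running count of OOR records seen so far
// is constant within each run of consecutive in-range records.
let in_range =
t
| order by Timestamp asc
| extend in_range_group_id = row_cumsum(iff(Speed between (100 .. 150), 0, 1))
| where Speed between (100 .. 150);
// Keep only the runs that are long enough, then return their records.
in_range
| summarize record_count = count() by in_range_group_id
| where record_count >= p_min_records
| join kind=inner in_range on in_range_group_id
| project Timestamp, Speed, Temperature
| order by Timestamp asc
```

On the sample data the in-range runs are 00:00-00:01 (2 records), 00:03-00:08 (6), 00:10-00:15 (6) and 00:17 (1), so only the two 6-record runs pass the filter, which matches the 12 rows shown above. If records can be missing or irregularly spaced, counting records no longer measures elapsed time, and the general query is the safer choice.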
