如何在Sql Server中查找相邻记录中具有相同值的记录？（我相信正确的术语是一个区域？？）

发布于 2024-11-19 06:51:16 字数 582 浏览 2 评论 0原文

查找具有相同值的相邻记录的开始时间和结束时间？

我有一个包含心率读数（以每分钟心跳次数为单位）和日期时间字段的表。（实际上这些字段是 heartrate_id、heartrate 和 datetime。）数据由每 6 次记录一次心率和时间的设备生成。秒。有时，心率监测器会给出错误的读数，并且记录的每分钟心跳数会“停留”一段时间。通过棍棒，我的意思是相邻时间的每分钟节拍值将相同。

基本上我需要找到心率相同的所有记录（例如每分钟 5 次心跳、每分钟 100 次心跳等），但仅限于相邻记录。如果设备在 3 个连续读数（或 100 个连续读数）中记录每分钟 25 次心跳，我需要找到这些事件。结果需要有心率、心率开始时间和心率结束时间，理想情况下结果看起来更像是这样：

heartrate starttime endtime
--------- --------- --------
1.00      21:12:00  21:12:24
35.00     07:00:12  07:00:36

我已经尝试了几种不同的方法，但到目前为止我还没有成功。任何帮助将不胜感激！

原文

Finding the start and end time for adjacent records that have the same value?

I have a table that contains heart rate readings (in beats per minute) and datetime field. (Actually the fields are heartrate_id, heartrate, and datetime.) The data are generated by a device that records the heart rate and time every 6 seconds. Sometimes the heart rate monitor will give false readings and the recorded beats per minute will "stick" for an period of time. By sticks, I mean the beats per minute value will be identical in adjacent times.

Basically I need to find all the records where the heart rate is the same (e.g. 5 beats per minute, 100 beats per minute, etc.) in but only on adjacent records. If the device records 25 beats per minute for 3 consecutive reading (or 100 consecutive readings) I need to locate these events. The results need to have the heartrate, time the heartrate started, and the time the heart rate ended and ideally the results would look more of less like this:

heartrate starttime endtime
--------- --------- --------
1.00      21:12:00  21:12:24
35.00     07:00:12  07:00:36

I've tried several different approaches but so far I'm striking out. Any help would be greatly appreciated!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

桃扇骨 2024-11-26 06:51:16

编辑：

经审查，我对这个答案的原始工作都不是很好。这实际上属于差距和岛屿，这个修改后的答案将使用我从第一次回答这个问题以来从类似问题中收集/学到的信息。

事实证明，这个查询可以比我最初想象的更简单地完成：

WITH Grouped_Run AS (SELECT heartRate, dateTime,
                            ROW_NUMBER() OVER(ORDER BY dateTime) -  
                            ROW_NUMBER() OVER(PARTITION BY heartRate ORDER BY dateTime) AS groupingId
                     FROM HeartRate)

SELECT heartRate, MIN(dateTime), MAX(dateTime)
FROM Grouped_Run
GROUP BY heartRate, groupingId
HAVING COUNT(*) > 2

SQL Fiddle 演示

那么这里发生了什么？间隙和岛屿问题的定义之一是需要连续值的“组”（或缺乏连续值）。通常会生成序列来解决这个问题，利用一个经常被忽视/过于直观的事实：减去序列会产生一个常数值。

例如，想象以下序列和减法（行中的值不重要）：

position   positionInGroup  subtraction
=========================================
1          1                0
2          2                0
3          3                0
4          1                3
5          2                3
6          1                5
7          4                3
8          5                3

position 是在所有记录上生成的简单序列。
positionInGroup 是为每组不同记录生成的简单序列。在本例中，实际上有 3 组不同的记录（从 position = 1, 4, 6 开始）。
减法是其他两列之间的差异的结果。请注意，不同组的值可能会重复！
序列必须共享的关键属性之一是它们必须以相同的顺序在数据行上生成，否则就会中断。

那么 SQL 是如何做到这一点的呢？通过使用ROW_NUMBER()，该函数将在记录“窗口”上生成数字序列：

ROW_NUMBER() OVER(ORDER BY dateTime)

将生成位置序列。

ROW_NUMBER() OVER(PARTITION BY heartRate ORDER BY dateTime)

将生成 positionInGroup 序列，每个 heartRate 是一个不同的组。
在大多数此类查询的情况下，两个序列的值并不重要，重要的是减法（以获得序列组），因此我们只需要减法的结果。
我们还需要 heartRate 以及它们发生的时间来提供答案。

最初的答案询问每次心跳“运行”的开始和结束时间。这是一个标准的 MIN(...)/MAX(...)，这意味着 GROUP BY。我们需要使用两者原始heartRate列（因为这是一个非聚合列）和我们生成的groupingId （它标识每个卡住值的当前“运行”）。

部分问题仅针对重复三次或以上的运行。 HAVING COUNT(*) > 2 是忽略长度为 2 或更小的游程的指令；它计算每组的行数。

EDIT:

Upon review, none of my original work on this answer was very good. This actually belongs to the class of problems known as gaps-and-islands, and this revised answer will use information I've gleaned from similar questions/learned since first answering this question.

It turns out this query can be done a lot more simply than I originally thought:

WITH Grouped_Run AS (SELECT heartRate, dateTime,
                            ROW_NUMBER() OVER(ORDER BY dateTime) -  
                            ROW_NUMBER() OVER(PARTITION BY heartRate ORDER BY dateTime) AS groupingId
                     FROM HeartRate)

SELECT heartRate, MIN(dateTime), MAX(dateTime)
FROM Grouped_Run
GROUP BY heartRate, groupingId
HAVING COUNT(*) > 2

SQL Fiddle Demo

So what's happening here? One of the definitions of gaps-and-islands problems is the need for "groups" of consecutive values (or lack thereof). Often sequences are generated to solve this, exploiting an often overlooked/too-intuitive fact: subtracting sequences yields a constant value.

For example, imagine the following sequences, and the subtraction (the values in the rows are unimportant):

position   positionInGroup  subtraction
=========================================
1          1                0
2          2                0
3          3                0
4          1                3
5          2                3
6          1                5
7          4                3
8          5                3

position is a simple sequence generated over all records.
positionInGroup is a simple sequence generated for each set of different records. In this case, there's actually 3 different sets of records (starting at position = 1, 4, 6).
subtraction is the result of the difference between the other two columns. Note that values may repeat for different groups!
One of the key properties the sequences must share is they must be generated over the rows of data in the same order, or this breaks.

So how is SQL doing this? Through the use of ROW_NUMBER() this function will generate a sequence of numbers over a "window" of records:

ROW_NUMBER() OVER(ORDER BY dateTime)

will generate the position sequence.

ROW_NUMBER() OVER(PARTITION BY heartRate ORDER BY dateTime)

will generate the positionInGroup sequence, with each heartRate being a different group.
In the case of most queries of this type, the values of the two sequences is unimportant, it's the subtraction (to get the sequence group) that matters, so we just need the result of the subtraction.
We'll also need the heartRate and the times in which they occurred to provide the answer.

The original answer asked for the start and end times of each of the "runs" of stuck heartbeats. That's a standard MIN(...)/MAX(...), which means a GROUP BY. We need to use both the original heartRate column (because that's a non-aggregate column) and our generated groupingId (which identifies the current "run" per stuck value).

Part of the question asked for only runs that repeated three or more times. The HAVING COUNT(*) > 2 is an instruction to ignore runs of length 2 or less; it counts rows per-group.

回复收藏 0 原文