SQL中基于地点和时间的记录聚类

发布于 2024-10-21 23:38:25 字数 1030 浏览 4 评论 0原文

我想知道你们中是否有人编写过基于重叠时间间隔和位置的记录聚类查询。 我的应用程序中的数据表示为一个人从开始时间到结束时间在任何给定位置的单个事件。位置定义为纬度和经度。在一天中,一个人将有多个不同的地点以及开始和结束时间。我需要找到同时在同一地点的一群人。一个人很可能在一天内参加多个小组。

Example:
Person A can be with Person B at the office from 10 AM to 11 AM.
Then Person A leaves the office for gym.
There he is with Person C from 12 noon to 1PM.
At 12:30 Person C leaves gym for the office.
At 1:30PM I have Person B and C at the office.
Persons B and C leave the office at 5PM.

在此示例中,

  • 集群 1(人员 A 和 B 在办公室)从上午 10 点到上午 11 点,
  • 集群 2(人员 A 和 C 在健身房)从中午 12 点到下午 1 点,
  • 集群 3(人员 B 和 C 在办公室)下午 1:30 至下午 5 点。

每个人的位置不会与另一个人的位置完全匹配。我使用 SQL 地理点类型和邻近阈值的 STBuffer 并检查 STIntersects。我还将表本身加入进来以检查时间重叠。但是,当 A 聚集在自己身上而没有其他人加入他时,我遇到了一些奇怪的行为。

我想知道是否有一种设计模式可以处理这样的情况。理想情况下,我会将记录集按“重叠时间段”和“任意几何形状的质心”分组,但无法弄清楚如何获取重叠时间段和任意几何形状。

欢迎任何想法并高度赞赏。

PS 编写 Windows 应用程序不是一种选择,除非它是唯一的方法。

编辑:没有提到聚类的位置永远不会提前知道。我的两个或多个客户可能聚集在无数个位置。我不知道办公室、健身房、公园或公交车站是否会出现聚集现象。聚类位置(我认为)将是由所有聚集的人的纬度和经度表示的多边形的质心。

I would like to know if any of you guys have written a query for record clustering based on overlapping time intervals AND locations.
Data in my application is represented as individual events of a person being at any given location from start time to end time. Location is defined as latitude and longitude. During a day one person will have multiple different locations and start and end times. I need to get groups of persons who were at the same location and at the same time. One person will most likely be in several groups during a day.

Example:
Person A can be with Person B at the office from 10 AM to 11 AM.
Then Person A leaves the office for gym.
There he is with Person C from 12 noon to 1PM.
At 12:30 Person C leaves gym for the office.
At 1:30PM I have Person B and C at the office.
Persons B and C leave the office at 5PM.

In this example I have

  • Cluster 1 (Person A and B at the office) from 10AM to 11AM,
  • Cluster 2 (Person A and C at the gym) from 12 noon to 1PM, and
  • Cluster 3 (Person B and C at the office) from 1:30PM to 5PM.

The Location of each individual person will not match exactly to another person's location. I'm using SQL geography point type with the STBuffer of proximity threshold and check for STIntersects. I'm also joining the table on itself to check time overlaps. But i'm experiencing some weird behaviors when Person A gets clustered on itself without other person ever joining him.

I'm wondering if there's a design pattern for handling situations like this. Ideally i would have the recordset grouped on "Overlapping Time Period" and "Centroid of an arbitrary geometry" but can't figure out how to get the overlapping time period and the arbitrary geometry.

Any ideas are welcome and highly appreciated.

P.S. writing a windows application is not an option unless it's the only way.

EDIT: Failed to mention that locations of clustering is never known in advance. There can be indefinite number of locations where two or more of my customers may cluster. I don't know if clustering will happen in the office, gym, some park or at a bus station. Clustering location (i think ) will be the Centroid of a polygon represented by all congregated people's Latitudes and Longitudes.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

小耗子 2024-10-28 23:38:25

代码会是这样的吗

select a.person,a.eventtime,a.eventplace,
b.person,b.eventtime,b.eventplace
from people a
join people b on a.eventtime between dateadd(hh,-2,b.eventime) and dateadd(hh,2,b.eventime)
and yourdistancefunction(a.eventplace ,b.eventplace) < 5 -- don't know what you are measuring
and a.person<>b.person

Would the code be something like

select a.person,a.eventtime,a.eventplace,
b.person,b.eventtime,b.eventplace
from people a
join people b on a.eventtime between dateadd(hh,-2,b.eventime) and dateadd(hh,2,b.eventime)
and yourdistancefunction(a.eventplace ,b.eventplace) < 5 -- don't know what you are measuring
and a.person<>b.person
小鸟爱天空丶 2024-10-28 23:38:25

我通过首先获取给定时间段的整个数据集来解决这个难题。循环遍历记录集并为所有重叠位置生成 STUnion 形状。然后将生成的临时表连接到初始数据上,并仅获取与 STUnion 形状 AND 及时相交的记录。
使用了三个临时表,但是嘿,谁在乎它是否能完成工作:)

I solved the puzzle by first getting the entire dataset for the given time period. Looping through the recordset and generating STUnion shapes for all overlapping locations. Then joining the generated temporary table on the initial datased and getting only the records that intersected with STUnion shapes AND with each other in time.
Used three temp tables but hey, who cares if it does the job :)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文