单一维度内不重叠范围的数据结构

发布于 2024-07-06 15:42:05 字数 715 浏览 8 评论 0原文

我需要一个可以在单个维度内存储不重叠范围的数据结构。 不需要完全覆盖维度的整个范围。

一个例子是会议室调度程序。 维度是时间。 两个时间表不得重叠。 会议室并不总是预定的。 换句话说,对于给定时间最多可以有一个时间表。

一个快速的解决方案是使用一个范围来存储开始和结束时间。

Range {
    Date start
    Date end
}

这是非标准化的,需要容器强制不重叠。 对于两个相邻的范围,前一个的结束将与下一个的开始冗余。

另一种方案可能涉及为每个范围存储一个边界值。 但对于一系列连续的范围,总是会比范围多一个边界值。 为了解决这个问题,序列可以表示为交替的边界值和范围:

B = 边界值,r = 范围

BrBrB

数据结构可能如下所示:

Boundary {
    Date value
    Range prev
    Range next
}

Range {
    Boundary start
    Boundary end
}

本质上,它是一个具有交替类型的双向链表。

最终,我使用的任何数据结构都将在内存(应用程序代码)和关系数据库中表示。

我很好奇学术界或业界尝试过哪些解决方案。

I need a data structure that can store non-overlapping ranges within a single dimension. The entire range of the dimension need not be completely covered.

An example would be a conference room scheduler. The dimension is time. No two schedules may overlap. The conference room isn't always scheduled. In other words, for a given time there can be at most one schedule.

A quick solution is for a range to store the start and end times.

Range {
    Date start
    Date end
}

This is non-normalized and requires the container to enforce no overlapping. For two adjacent ranges, the previous' end will be redundant with the next's start.

Another scheme might involve storing one boundary value with each range. But for a contiguous sequence of ranges, there will always be one more boundary values than ranges. To get around this the sequence could be represented as alternating boundary values and ranges:

B = boundary value, r = range

B-r-B-r-B

The data structure might look like:

Boundary {
    Date value
    Range prev
    Range next
}

Range {
    Boundary start
    Boundary end
}

In essence it's a doubly linked list with alternating types.

Ultimately, whatever data structure I use will be represented in both memory (application code) and a relational database.

I'm curious what academic or industry tried solutions exists.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

简单气质女生网名 2024-07-13 15:42:05

表示数据的标准化方式是存储每个时间单位的记录。 这可以在会议安排应用程序的示例中完成。 您的约束将是一个唯一的约束。

(RoomId, StartTime)

在连续范围的情况下,您必须存储 2 个东西,一个边界和第二个边界或长度。 通常是通过存储第二个边界,然后在该类型的两个边界上创建一个约束来完成的,

(boundary not between colBoudaryA and colBoundaryB)

附加约束为

(startBoundary < endBoundary)

The normalized way to represent your data would be to store a record for each unit of time. This can be done in the example of the conference scheduling application. Your constraint would be a unique constraint for

(RoomId, StartTime)

In the case of continuous ranges, you necessarily need to store 2 things, one boundary and either the second boundary or the length. It is usually done by storing the second boundary and then creating a constraint on both boundary of the kind

(boundary not between colBoudaryA and colBoundaryB)

with the additional constraint that

(startBoundary < endBoundary)
时间海 2024-07-13 15:42:05

双向链表效果很好,因为您只使用与填充范围一样多的内存,并且您只需要检查插入的重叠 - 此时这样做几乎是微不足道的。 如果存在重叠,则新项目将被拒绝。

RoomID
ReservationID
PreviousReservationID
NextReservationID
StartTimeDate
EndTimeDate
Priority
UserID

优先级和用户 ID 允许日程表具有优先级(教授可能比学生团体拥有更大的影响力),以便新项目可以在插入过程中“击倒”优先级较低的项目,并且用户 ID 允许电子邮件发送给被撞的会议组织者。

您需要考虑添加一个指向每天第一次会议的表,以便优化搜索。

-亚当

A doubly linked list works well because you only use as much memory as you have filled ranges, and you need only check overlapping on insertions - it's almost trivial to do so at that point. If there's overlap the new item is rejected.

RoomID
ReservationID
PreviousReservationID
NextReservationID
StartTimeDate
EndTimeDate
Priority
UserID

The priority and UserID allow for schedules to have priorities (professor might have more clout than a student group) so that a new item can 'knock' the lower priority items out of the way during an insertion, and the UserID allows an email to be sent to the bumped meeting organizers.

You'll want to consider adding a table that points to the first meeting of each day so that searches can be optimized.

-Adam

思慕 2024-07-13 15:42:05
  1. 对于不重叠的间隔,您可以只用起始点对间隔进行排序。 当您向此结构添加新间隔时,您只需检查起点和终点是否不属于此间隔集。 要检查某个点 X 是否属于区间集,您可以使用二分搜索来查找最近的起点并检查 X 是否属于其区间。
    这种方法对于修改操作来说并不是最佳选择。

  2. 您可以查看间隔树结构 - 对于非重叠间隔,它具有最佳查询并修改操作。

  1. For non-overlapping intervals you could just sort you intervals with starting point. When you add a new interval to this structure, you could just check that start and end points do not belong to this interval set. To check whether some point X belong interval set you could use binary search to find the nearest start point and check that X belongs it's interval.
    This approach is not so optimal for modify operations.

  2. You could look at Interval tree structure - for non-overlapping intervals it has optimal query and modify operations.

待天淡蓝洁白时 2024-07-13 15:42:05

如果您足够幸运(!)使用 Postgres,则可以使用 tstzrange 列,并应用约束来防止重叠。 使用范围类型的好处是它本质上可以防止开始大于结束。

ALTER TABLE "booking" 
ADD CONSTRAINT "overlapping_bookings" 
EXCLUDE USING gist ("period" WITH &&, "room" WITH =);

您可能需要CREATE EXTENSION IF NOT EXISTS btree_gist,就像使用 && 创建要点一样。 如果没有该扩展名,则不支持。

If you are lucky (!) enough to be using Postgres, you can use a tstzrange column, and apply a constraint to prevent overlaps. The bonus of using a range type is that it will inherently prevent start being greater than finish.

ALTER TABLE "booking" 
ADD CONSTRAINT "overlapping_bookings" 
EXCLUDE USING gist ("period" WITH &&, "room" WITH =);

You may need to CREATE EXTENSION IF NOT EXISTS btree_gist, as creating a gist using && is not supported without that extension.

宁愿没拥抱 2024-07-13 15:42:05

很大程度上取决于您将如何处理数据,以及哪些操作需要高效。 但是,我会考虑在 Start 和 End 的 setter 中使用逻辑的双向链表来检查它现在是否与其邻居重叠,如果是的话则缩小它们(或者抛出异常,或者您想要处理尝试的情况)重叠)。

这提供了一个很好的简单的预订周期链接列表以供阅读,但没有容器负责维护无重叠规则。

A lot depends on what you'll be doing with the data, and therefore which operations need to be efficient. However, I'd consider a doubly linked list of Ranges with logic in the setters of Start and End to check whether it now overlaps its neighbours, and to shrink them if so (or throw an exception, or however you want to handle an attempted overlap).

That gives a nice simple linked list of booked periods to read, but no container responsible for maintaining the no-overlap rule.

っ〆星空下的拥抱 2024-07-13 15:42:05

这在约束编程世界中称为“一元资源”约束。 这方面有很多研究,特别是针对事件时间不固定的情况,并且您需要为每个事件找到时间段。
有一个商业 C++ 软件包可以解决您的问题以及更多Ilog CP,但很可能矫枉过正。 还有一个名为 eclipse 的开源版本(与 IDE 无关)。

This is called the "Unary Resource" constraint in the Constraint Programming world. There is a lot of research in this area, specifically for the case when the event times aren't fixed, and you need to find time-slots for each of them.
There is a commercial C++ package that does your problem and more Ilog CP, but it is likely overkill. There is also a somewhat open-source version called eclipse (no relation to the IDE).

梦里梦着梦中梦 2024-07-13 15:42:05

这很重要,因为(在数据库世界中)您必须比较多行以确定不重叠的范围。 显然,当信息在内存中时,其他表示形式(例如按时间顺序排列的列表)也是可能的。 不过,我认为即使在列表中,您最好使用“开始+结束”符号。

有关于该主题的整本书 - “时态数据库”处理的一部分。 你可以看看 Darwen、Date 和 Lorentzos “时态数据和关系模型”和(完全不同的极端)“在 SQL 中开发面向时间的数据库应用程序”,Richard T. Snodgrass,Morgan Kaufmann Publishers, Inc.,旧金山,1999 年 7 月,504+xxiii 页,ISBN 1-55860-436-7。 该文件已绝版,但可在其网站 cs.arizona.edu 上获取 PDF 版本 (因此 Google 搜索很容易找到)。

我相信,相关的数据结构之一是 R-Tree。 这通常用于二维结构,但对于一维结构也有效。

您还可以查找“艾伦关系”了解间隔 - 它们可能对您有帮助。

This is non-trivial because (in the database world) you have to compare multiple rows to determine non-overlapping ranges. Clearly, when the information is in memory, then other representations such as lists in time order are possible. I think, though, that you'd be best off with your 'start + end' notation, even in a list.

There are whole books on the subject - part of 'Temporal Database' handling. Two you could look at are Darwen, Date and Lorentzos "Temporal Data and the Relational Model" and (at a radically different extreme) "Developing Time-Oriented Database Applications in SQL", Richard T. Snodgrass, Morgan Kaufmann Publishers, Inc., San Francisco, July, 1999, 504+xxiii pages, ISBN 1-55860-436-7. That is out of print but available as PDF on his web site at cs.arizona.edu (so a Google search makes it pretty easy to find).

One of the relevant data structures is, I believe, an R-Tree. That is often used for 2-dimensional structures, but can also be effective for 1-dimensional structures.

You can also look for "Allen's Relations" for intervals - they may be helpful to you.

始终不够 2024-07-13 15:42:05

我已成功存储开始时间和持续时间。 重叠的测试就像

WHERE NOT EXISTS (
   SELECT 1 FROM table
   WHERE BeginTime < NewBeginTime AND BeginTime + Duration > NewBeginTime
)
AND NOT EXISTS (
   SELECT 1 FROM table
   WHERE NewBeginTime < BeginTime AND NewBeginTime + NewDuration > BeginTime
)

我想的那样,无需测试,但希望你能明白

I've had success storing a beginning time and duration. The test for overlap would be something like

WHERE NOT EXISTS (
   SELECT 1 FROM table
   WHERE BeginTime < NewBeginTime AND BeginTime + Duration > NewBeginTime
)
AND NOT EXISTS (
   SELECT 1 FROM table
   WHERE NewBeginTime < BeginTime AND NewBeginTime + NewDuration > BeginTime
)

I think without testing, but hopefully you get the drift

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文