MySQL：自连接生成日期对

发布于 2024-12-03 13:20:16 字数 611 浏览 0 评论 0原文

我有一个表，其中包含“丢失”和“找到”的项目条目。每行都有一个事件的日期。我希望通过将表连接到自身来构建一个具有匹配的“itemid”、“丢失日期”、“找到日期”对的查询。

这在一定程度上是有效的：不幸的是，如果给定项目有多个失物招领对，则每个“丢失日期”将与其后面的所有“找到日期”连接起来。

还在我身边吗？

查询类似于：

select c0.ItemId, c0.ChangeDate, c1.ChangeDate from Changes c0
join Changes c1 on 
   c0.ItemId = c1.ItemId and c1.ChangeDate >= c0.ChangeDate
where c0.ChangeType = 9 (lost) and c1.ChangeType = 10 (found);

我希望实现的是某种形式的给定“丢失日期”仅与序列中的下一个“找到日期”配对（如果不存在“找到日期”，则为 NULL）。我（非常）确定这是可能的，但我没有看到路径。

我想知道如何在第一个连接中放置一个子选择并使用 LIMIT 1 来仅获取一条记录，但我不知道如何将其连接到选择主要部分中的适当行。 MySQL 告诉我它不存在。很公平。

原文

I have a table with entries for Items as being 'lost' and 'found'. Each row has a date for the event. Im hoping to build a query with matching pairs of 'itemid', 'lost date', 'found date' by joining the table to itself.

This works to a point: unfortunately if there are multiple lost and found pairs for a given item each 'lost date' will be joined with all the 'found dates' that follow it.

Still with me?

The query goes something like:

select c0.ItemId, c0.ChangeDate, c1.ChangeDate from Changes c0
join Changes c1 on 
   c0.ItemId = c1.ItemId and c1.ChangeDate >= c0.ChangeDate
where c0.ChangeType = 9 (lost) and c1.ChangeType = 10 (found);

What Im hoping to achieve is some form of a given 'lost date' paired with only the next 'found date' in sequence (or NULL if no 'found date' exists). Im (pretty) sure this is possible but Im not seeing the path.

I was wondering about putting a sub-select in the first join and using a LIMIT 1 to get only one record but I don't see how to join this to the appropriate row in the main part of the select. MySQL tells me it doesn't exist. Fair enough.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

半葬歌 2024-12-10 13:20:16

这里的技巧是规定“并且在丢失日期和找到日期之间没有其他丢失或找到日期”，或者，在 SQL 中：

SELECT c0.ItemId, c0.ChangeDate, c1.ChangeDate
  FROM Changes AS c0
  JOIN Changes AS c1 ON c0.ItemId = c1.ItemId AND c1.ChangeDate >= c0.ChangeDate
 WHERE c0.ChangeType = 9   -- Lost
   AND c1.ChangeType = 10  -- Found
   AND NOT EXISTS(SELECT *
                    FROM Changes AS c2
                   WHERE c2.ItemId = c1.ItemID
                     AND c2.ChangeType IN (9, 10) -- Lost or Found
                     AND c2.ChangeDate BETWEEN c0.ChangeDate AND c1.ChangeDate
                     AND (c2.ChangeDate != c0.ChangeDate AND c2.ChangeDate != c1.ChangeDate)
                 );

因为这是一个相关子查询，它往往会减慢查询速度，但它应该产生正确的行。

关于我通过规定 c2 中的行的 ChangeDate 应不同于丢失日期或找到日期来消除 c0 和 c1 行的方式，有一个重要的警告。但是，主查询似乎允许在丢失的同一天找到物品。可能还有一些其他列 - 例如 ChangeId 列 - 查询中尚未提及，但可以使用它来代替：

AND c2.ChangeID NOT IN (c0.ChangeID, c1.ChangeID)

您需要考虑如果某个项目在 2011-06-07 丢失，会发生什么情况，并于2011-06-14再次丢失，直到2011-06-21才找到。如果2011-06-28也发现了怎么办？此类问题应该通过数据输入处理来防止，因此上面的查询假设不会出现此类问题。

The trick here is to stipulate 'and there is no other lost or found date between the lost and found dates', or, in SQL:

SELECT c0.ItemId, c0.ChangeDate, c1.ChangeDate
  FROM Changes AS c0
  JOIN Changes AS c1 ON c0.ItemId = c1.ItemId AND c1.ChangeDate >= c0.ChangeDate
 WHERE c0.ChangeType = 9   -- Lost
   AND c1.ChangeType = 10  -- Found
   AND NOT EXISTS(SELECT *
                    FROM Changes AS c2
                   WHERE c2.ItemId = c1.ItemID
                     AND c2.ChangeType IN (9, 10) -- Lost or Found
                     AND c2.ChangeDate BETWEEN c0.ChangeDate AND c1.ChangeDate
                     AND (c2.ChangeDate != c0.ChangeDate AND c2.ChangeDate != c1.ChangeDate)
                 );

Because that is a correlated sub-query, it tends to slow down the query, but it should produce the correct rows.

There is an important caveat about the way I've eliminated the c0 and c1 rows by stipulating that the ChangeDate for the row in c2 should be different from either the lost date or the found date. However, the main query seems to allow for an item to be found on the same day that it is lost. There might be some other column - such as a ChangeId column - that is not mentioned in the query yet that could be used instead:

AND c2.ChangeID NOT IN (c0.ChangeID, c1.ChangeID)

You'll need to think about what happens if an item is lost on, say, 2011-06-07, and lost again on 2011-06-14, and only found on 2011-06-21. And what about if it is also found on 2011-06-28? Such problems should be prevented by the data entry processing, so the query above assumes there won't be such issues.

回复收藏 0 原文

俯瞰星空 2024-12-10 13:20:16

通常，在处理日期对（例如计划的开始/结束）时，建议不要将它们放在单独的行上。将它们放在同一行的两列中。请参阅 Joe Celko 的 SQL 编程风格。

但话虽如此，您可以通过搜索执行另一个自连接来搜索两者之间的 ChangeDate，从而使用当前模式解决该问题。如果没有找到（即，如果 c2.* 由于外连接而为空），则 c0 和 c1 是“相邻的”。

select c0.ItemId, c0.ChangeDate, c1.ChangeDate 
from Changes c0
inner join Changes c1 on 
   c0.ItemId = c1.ItemId and c1.ChangeDate > c0.ChangeDate
left outer join Changes c2 on
   c0.ItemId = c2.ItemId and c2.ChangeDate > c0.ChangeDate 
                         and c2.ChangeDate < c1.ChangeDate 
                         and c2.ChangeType IN (9,10) -- edit
where c0.ChangeType = 9 (lost) and c1.ChangeType = 10 (found)
   and c2.ItemId IS NULL;

在上面的示例中，我假设 ChangeDate 是唯一的，并将 >= 更改为 >。如果 ChangeDate 不是唯一的，则您必须想出一些其他表达式来测试 c0 和 c1“之间”的 c2。

Generally when dealing with pairs of dates (e.g. start/end for scheduling) the advice is don't put them on separate rows. Put them in two columns of the same row. See Joe Celko's SQL Programming Style.

But that said, you can solve it with your current schema by searching doing another self-join to search for a ChangeDate between the two. If none is found (that is, if c2.* is null because of the outer join), then c0 and c1 are "adjacent."

select c0.ItemId, c0.ChangeDate, c1.ChangeDate 
from Changes c0
inner join Changes c1 on 
   c0.ItemId = c1.ItemId and c1.ChangeDate > c0.ChangeDate
left outer join Changes c2 on
   c0.ItemId = c2.ItemId and c2.ChangeDate > c0.ChangeDate 
                         and c2.ChangeDate < c1.ChangeDate 
                         and c2.ChangeType IN (9,10) -- edit
where c0.ChangeType = 9 (lost) and c1.ChangeType = 10 (found)
   and c2.ItemId IS NULL;

In the above example, I've assumed that ChangeDate is unique, and I changed the >= to >. If ChangeDate is not unique, you'll have to come up with some other expression to test for c2 "between" c0 and c1.

回复收藏 0 原文

~没有更多了~