在PostgreSQL中过滤子集时间段

发布于 2025-02-07 05:30:25 字数 2987 浏览 3 评论 0 原文

一个人如何查询表_1之类的表(如下所示示例),以便只有最大时间段没有其他行作为时间段的子集。例如,表_1应该给出表_2中的内容。我试图使用存在各种条件使用的地方,但似乎无法提取所需的行。

我尝试了以下查询:

    SELECT * FROM table_1 x WHERE NOT EXISTS (SELECT * FROM table_2 y WHERE (y.start < x.start and y.end > x.end) or (y.start < x.start and y.end <= x.end) or (y.start >= x.start and y.end < x.end) and y.symbol = x.symbol);

任何帮助将不胜感激。使用PostgreSQL。

编辑: 我减少了表_1的大小以减少长度。到最长时间,我的意思是所有不相交的时期,因此它们之间没有任何时期。即,时间段是最大的 - &nbsp;那就是第r行最大,r.start&lt; = r'.start&lt; = r'.end&lt&lt; = r.end r.end r',其中r'是表中的其他任何行,相同的符号值。

因此,AJR返回两行的原因是因为没有在2021-01-23和2021-05-08之间开始或结束的行。但是例如,表_1的前4行不会在第5行的时间段内返回。

table_2:

符号 启动
AJR 2021-01-02 2021-01-01-23
AJR 2021-05 -08 2021-06-05
BBB 2021-07-17 2021-07-24
CCC 2021-10-23 2021-11-20

符号 开始 末端
table_1 -09
AJR 2021-01-02 2021-01-16
AJR 2021-01-02 2021-01-01-16
AJR 2021-01-09 2021-01-23
AJR 2021-01-01-09 2021-09
AJR 2021-09 2021-01-23 2021-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01-01 -16 2021-01-23
AJR 2021-05-08 2021-05-15
AJR 2021-05-08 2021-05-22
AJR 2021-05-08 2021-05-08
AJR 2021-05-29 05
AJR 2021-05-15 2021-05-22
AJR 2021-05-15 2021-05-29
AJR 2021-05-15 2021-06-06-05
AJR 2021-05-22 2021-22
2021-2021-29 ​22 2021-06-05
AJR 2021-05-29 2021-06-05
BBB 2021-07-17 2021-07-24
CCC 2021-10-23 2021-10-0-30
CCC 2021-10-23 2021-11-11-11-11-13
CCC 2021-10-23 2021-11-20
CCC 2021-10-30 2021-11-13
CCC 2021-10-30 2021-11-11-20
CCC 2021-11-11-13 2021-11-20

How would one query a table like table_1 (example shown below) such that only the maximum time period with no other rows being a subset of the time period. For example Table_1 should give whats in table_2. I have tried to use various where conditions using Exists but I can not seem to extract the desired rows.

I have tried queries like the following:

    SELECT * FROM table_1 x WHERE NOT EXISTS (SELECT * FROM table_2 y WHERE (y.start < x.start and y.end > x.end) or (y.start < x.start and y.end <= x.end) or (y.start >= x.start and y.end < x.end) and y.symbol = x.symbol);

Any help would be appreciated. Using postgresql.

Edit:
I have reduced the size of table_1 to reduce the length. By maximum time period I mean all disjoint periods such that there is no period in between them in other rows. i.e. that the time period is maximal – that is for row R to be maximal, R.start<= R'.start<=R'.end<=R.end where R' is any other row in the table with the same symbol value.

So the reason why two rows are return for AJR is because there is no rows that start or end between 2021-01-23 and 2021-05-08. But for example the first 4 rows of table_1 are not returned as they fall within the time period of row 5.

Table_2:

symbol start end
AJR 2021-01-02 2021-01-23
AJR 2021-05-08 2021-06-05
BBB 2021-07-17 2021-07-24
CCC 2021-10-23 2021-11-20

Table_1:

symbol start end
AJR 2021-01-02 2021-01-09
AJR 2021-01-02 2021-01-16
AJR 2021-01-02 2021-01-23
AJR 2021-01-09 2021-01-16
AJR 2021-01-09 2021-01-23
AJR 2021-01-16 2021-01-23
AJR 2021-05-08 2021-05-15
AJR 2021-05-08 2021-05-22
AJR 2021-05-08 2021-05-29
AJR 2021-05-08 2021-06-05
AJR 2021-05-15 2021-05-22
AJR 2021-05-15 2021-05-29
AJR 2021-05-15 2021-06-05
AJR 2021-05-22 2021-05-29
AJR 2021-05-22 2021-06-05
AJR 2021-05-29 2021-06-05
BBB 2021-07-17 2021-07-24
CCC 2021-10-23 2021-10-30
CCC 2021-10-23 2021-11-13
CCC 2021-10-23 2021-11-20
CCC 2021-10-30 2021-11-13
CCC 2021-10-30 2021-11-20
CCC 2021-11-13 2021-11-20

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

划一舟意中人 2025-02-14 05:30:25

您可以应用一个组合所有启动的自连接到符号字段上匹配的所有末端,然后从“ start ”值中删除。最终,您可以通过在“ symbol ”和“ start ”字段上分组在“ end ”字段上汇总。

SELECT t1.symbol,
       t1.start,
       MAX(COALESCE(t2.end, t1.end))    AS "end"
FROM       tab t1
LEFT JOIN tab t2
       ON t1.end = t2.start
WHERE CONCAT(t1.symbol, t1.start) NOT IN (
    SELECT CONCAT(symbol, "end") 
    FROM tab
) 
GROUP BY t1.symbol, 
         t1.start

检查演示在这里

注释:如果只有一行(因此,自连接将在最后一个“ end end ”字段中返回零值),您可以使用一个 cocece

You could apply a self join that combines all starts to all ends that match on the symbol field, then remove from the "start" values any corresponding "end" values. Eventually you can aggregate on the "end" field by grouping on the "symbol" and "start" field.

SELECT t1.symbol,
       t1.start,
       MAX(COALESCE(t2.end, t1.end))    AS "end"
FROM       tab t1
LEFT JOIN tab t2
       ON t1.end = t2.start
WHERE CONCAT(t1.symbol, t1.start) NOT IN (
    SELECT CONCAT(symbol, "end") 
    FROM tab
) 
GROUP BY t1.symbol, 
         t1.start

Check the demo here.

Note: In the case that there is only one row (hence the self join will return a NULL value in the last "end" field), you can fix this using a COALESCE.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文