如果序列没有破坏,则获取多行的总时间间隔

发布于 2024-12-02 06:13:07 字数 993 浏览 2 评论 0原文

我有 WorkPerson 表(这些只是理解问题的示例)。

结构

Work

id INTEGER
person_id INTEGER
dt_from DATETIME
dt_to DATETIME

Person

person_id INTEGER
name VARCHAR(50)

数据

Work

id | person_id | dt_from    | dt_to
-------------------------------------------------
1  | 1         | 2011-01-01 | 2011-02-02
2  | 1         | 2011-02-02 | 2011-04-04
3  | 1         | 2011-06-06 | 2011-09-09
4  | 2         | 2011-01-01 | 2011-02-02
5  | 2         | 2011-02-02 | 2011-03-03
....etc.

Person

仅包含人员 ID 的人员姓名

预期输出

Person 1 : 2011-01-01 - 2011-04-04
Person 1 : 2011-06-06 - 2011-09-09
Person 2 : 2011-01-01 - 2011-03-03

间隔必须在顺序。它不能在中间的某个地方被破坏。这就是为什么第一个人有两个间隔。

如果它改变了某些东西,我正在使用 postgres。你有什么想法吗? 我想在一个查询中完成它,但如果没有这样的解决方案,我将在 php 中进行一些间隔合并。

I have Work and Person tables (these are just examples to understand problem).

Structure

Work table

id INTEGER
person_id INTEGER
dt_from DATETIME
dt_to DATETIME

Person table

person_id INTEGER
name VARCHAR(50)

Data

Work table

id | person_id | dt_from    | dt_to
-------------------------------------------------
1  | 1         | 2011-01-01 | 2011-02-02
2  | 1         | 2011-02-02 | 2011-04-04
3  | 1         | 2011-06-06 | 2011-09-09
4  | 2         | 2011-01-01 | 2011-02-02
5  | 2         | 2011-02-02 | 2011-03-03
....etc.

Person table

Just person names with person id

Expected output

Person 1 : 2011-01-01 - 2011-04-04
Person 1 : 2011-06-06 - 2011-09-09
Person 2 : 2011-01-01 - 2011-03-03

Interval must be in sequence. It cannot be broken somewhere in middle. Thats why Person 1 have two intervals.

I'm using postgres if it changes something. Have you any thougths?
I wanted do it in one query, but if there is no such solution i will do some interval merge in php.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

负佳期 2024-12-09 06:13:07

也许有一种方法可以在一个 SQL 选择中做到这一点,但我没有想到。不过,我设法用一个存储函数来做到这一点。这是我为测试所做的:

create table work
(id integer, start_date date, end_date date);

insert into work values (1, '2011-01-01','2011-02-02');
insert into work values (1, '2011-02-02','2011-04-04');
insert into work values (1, '2011-06-06','2011-09-09');
insert into work values (2, '2011-01-01','2011-02-02');
insert into work values (2, '2011-02-02','2011-03-03');

create or replace function get_data() returns setof work as
$body$
declare
    res work%rowtype;
    sd  date := null;
begin
    for res in
        select
            w1.id,
            case when exists (select 1 from work w2 where w1.id=w2.id and w2.end_date=w1.start_date) then null else w1.start_date end,
            case when exists (select 1 from work w2 where w1.id=w2.id and w2.start_date=w1.end_date) then null else w1.end_date end
        from
            work w1
        order by
            id, start_date, end_date
    loop
        if res.start_date is not null and res.end_date is not null then
            return next res;
        elsif res.start_date is not null then
            sd := res.start_date;
        elsif res.end_date is not null then
            res.start_date := sd;
            return next res;
        end if;
    end loop;

    return;
end;$body$
  language 'plpgsql';

然后

select * from get_data() order by id, start_date;

返回了这个结果:

 id | start_date |  end_date
----+------------+------------
  1 | 2011-01-01 | 2011-04-04
  1 | 2011-06-06 | 2011-09-09
  2 | 2011-01-01 | 2011-03-03
(3 rows)

我认为这就是您所追求的。

There may be a way to do this in one SQL select, but it escapes me. I managed to do it with one stored function though. Here's what I did for my testing:

create table work
(id integer, start_date date, end_date date);

insert into work values (1, '2011-01-01','2011-02-02');
insert into work values (1, '2011-02-02','2011-04-04');
insert into work values (1, '2011-06-06','2011-09-09');
insert into work values (2, '2011-01-01','2011-02-02');
insert into work values (2, '2011-02-02','2011-03-03');

create or replace function get_data() returns setof work as
$body$
declare
    res work%rowtype;
    sd  date := null;
begin
    for res in
        select
            w1.id,
            case when exists (select 1 from work w2 where w1.id=w2.id and w2.end_date=w1.start_date) then null else w1.start_date end,
            case when exists (select 1 from work w2 where w1.id=w2.id and w2.start_date=w1.end_date) then null else w1.end_date end
        from
            work w1
        order by
            id, start_date, end_date
    loop
        if res.start_date is not null and res.end_date is not null then
            return next res;
        elsif res.start_date is not null then
            sd := res.start_date;
        elsif res.end_date is not null then
            res.start_date := sd;
            return next res;
        end if;
    end loop;

    return;
end;$body$
  language 'plpgsql';

Then

select * from get_data() order by id, start_date;

returned this result:

 id | start_date |  end_date
----+------------+------------
  1 | 2011-01-01 | 2011-04-04
  1 | 2011-06-06 | 2011-09-09
  2 | 2011-01-01 | 2011-03-03
(3 rows)

which is, I think, what you're after.

抠脚大汉 2024-12-09 06:13:07

您可以尝试 postgres 的 WITH RECURSIVE 构造。 (毕竟,链表是一种树)获得正确的边界条件将是一个问题,但至少它可以解决问题而不需要循环。

更新:添加了代码。 RECURSIVE 的问题是您只能指定“尾部”边界条件。要指定“head”条件,您需要将其包装到视图中。

CREATE VIEW collected_time AS (
    WITH RECURSIVE ztree(person_id, dt_from, dt_to)  AS ( 
    -- Terminal part
    SELECT  pr.person_id, pr.dt_from, pr.dt_to
    FROM prikklok pr
    WHERE NOT EXISTS (
            SELECT * FROM prikklok px
            WHERE px.person_id = pr.person_id AND px.dt_from = pr.dt_to
            )
    UNION
    -- Recursive part
    SELECT  p1.person_id AS person_id
    , p1.dt_from AS dt_from
    , p2.dt_to AS dt_to
    FROM prikklok AS p1
    , ztree AS p2
    WHERE p1.person_id = p2.person_id
    AND p1.dt_to = p2.dt_from
    )
SELECT *
FROM ztree zt
WHERE NOT EXISTS (select *
    FROM prikklok p3
    WHERE p3.person_id = zt.person_id
        AND p3.dt_to = zt.dt_from
            )
    );

SELECT * FROM collected_time;

-- 现在生成一些有间隙的数据

INSERT INTO prikklok
    SELECT serie_n
    , serie_t
    , serie_t + '1 month'::interval
FROM generate_series (1,10) serie_n
    , generate_series ( '1970-01-01 00:00:00' , '2011-09-01 00:00:00' , '1 month' ::interval) serie_t
    ;

DELETE FROM prikklok
WHERE random() <  0.001
    ;

-- a few indexes won't hurt
ALTER TABLE prikklok ADD PRIMARY KEY (person_id,dt_from)
    ;
CREATE UNIQUE INDEX ON prikklok (person_id,dt_to);

生成的查询计划看起来很完美:

                                                                              QUERY PLAN                                                                           
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Anti Join  (cost=1389.73..1469.09 rows=1 width=20) (actual time=13.580..40.920 rows=16 loops=1)
   CTE ztree
     ->  Recursive Union  (cost=0.00..1389.73 rows=11 width=20) (actual time=0.136..27.405 rows=5004 loops=1)
           ->  Merge Anti Join  (cost=0.00..638.92 rows=1 width=20) (actual time=0.130..10.011 rows=16 loops=1)
                 Merge Cond: ((pr.person_id = px.person_id) AND (pr.dt_to = px.dt_from))
                 ->  Index Scan using prikklok_person_id_dt_to_idx on prikklok pr  (cost=0.00..291.31 rows=5004 width=20) (actual time=0.063..2.273 rows=5004 loops=1)
                 ->  Index Scan using prikklok_pkey on prikklok px  (cost=0.00..291.31 rows=5004 width=12) (actual time=0.012..2.204 rows=5004 loops=1)
           ->  Nested Loop  (cost=0.00..75.06 rows=1 width=20) (actual time=0.002..0.027 rows=10 loops=501)
                 ->  WorkTable Scan on ztree p2  (cost=0.00..0.20 rows=10 width=20) (actual time=0.000..0.001 rows=10 loops=501)
                 ->  Index Scan using prikklok_person_id_dt_to_idx on prikklok p1  (cost=0.00..7.47 rows=1 width=20) (actual time=0.002..0.002 rows=1 loops=5004)
                       Index Cond: ((p1.person_id = p2.person_id) AND (p1.dt_to = p2.dt_from))
   ->  CTE Scan on ztree zt  (cost=0.00..0.22 rows=11 width=20) (actual time=0.138..29.887 rows=5004 loops=1)
   ->  Index Scan using prikklok_person_id_dt_to_idx on prikklok p3  (cost=0.00..7.18 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=5004)
         Index Cond: ((p3.person_id = zt.person_id) AND (p3.dt_to = zt.dt_from))
 Total runtime: 41.354 ms
(15 rows)

You could try postgres's WITH RECURSIVE construct. (after all, a linked list is a kind of tree) Getting the boundary conditions right will be a problem, but at least it would solve the problem without the need for loops.

UPDATE: Added code. The problem with RECURSIVE is that you can only specify only the "tail" boundary condition. To specify the "head" condition, you need to wrap it into a view.

CREATE VIEW collected_time AS (
    WITH RECURSIVE ztree(person_id, dt_from, dt_to)  AS ( 
    -- Terminal part
    SELECT  pr.person_id, pr.dt_from, pr.dt_to
    FROM prikklok pr
    WHERE NOT EXISTS (
            SELECT * FROM prikklok px
            WHERE px.person_id = pr.person_id AND px.dt_from = pr.dt_to
            )
    UNION
    -- Recursive part
    SELECT  p1.person_id AS person_id
    , p1.dt_from AS dt_from
    , p2.dt_to AS dt_to
    FROM prikklok AS p1
    , ztree AS p2
    WHERE p1.person_id = p2.person_id
    AND p1.dt_to = p2.dt_from
    )
SELECT *
FROM ztree zt
WHERE NOT EXISTS (select *
    FROM prikklok p3
    WHERE p3.person_id = zt.person_id
        AND p3.dt_to = zt.dt_from
            )
    );

SELECT * FROM collected_time;

-- now generate some data with gaps

INSERT INTO prikklok
    SELECT serie_n
    , serie_t
    , serie_t + '1 month'::interval
FROM generate_series (1,10) serie_n
    , generate_series ( '1970-01-01 00:00:00' , '2011-09-01 00:00:00' , '1 month' ::interval) serie_t
    ;

DELETE FROM prikklok
WHERE random() <  0.001
    ;

-- a few indexes won't hurt
ALTER TABLE prikklok ADD PRIMARY KEY (person_id,dt_from)
    ;
CREATE UNIQUE INDEX ON prikklok (person_id,dt_to);

The resulting query plan looks perfect:

                                                                              QUERY PLAN                                                                           
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Anti Join  (cost=1389.73..1469.09 rows=1 width=20) (actual time=13.580..40.920 rows=16 loops=1)
   CTE ztree
     ->  Recursive Union  (cost=0.00..1389.73 rows=11 width=20) (actual time=0.136..27.405 rows=5004 loops=1)
           ->  Merge Anti Join  (cost=0.00..638.92 rows=1 width=20) (actual time=0.130..10.011 rows=16 loops=1)
                 Merge Cond: ((pr.person_id = px.person_id) AND (pr.dt_to = px.dt_from))
                 ->  Index Scan using prikklok_person_id_dt_to_idx on prikklok pr  (cost=0.00..291.31 rows=5004 width=20) (actual time=0.063..2.273 rows=5004 loops=1)
                 ->  Index Scan using prikklok_pkey on prikklok px  (cost=0.00..291.31 rows=5004 width=12) (actual time=0.012..2.204 rows=5004 loops=1)
           ->  Nested Loop  (cost=0.00..75.06 rows=1 width=20) (actual time=0.002..0.027 rows=10 loops=501)
                 ->  WorkTable Scan on ztree p2  (cost=0.00..0.20 rows=10 width=20) (actual time=0.000..0.001 rows=10 loops=501)
                 ->  Index Scan using prikklok_person_id_dt_to_idx on prikklok p1  (cost=0.00..7.47 rows=1 width=20) (actual time=0.002..0.002 rows=1 loops=5004)
                       Index Cond: ((p1.person_id = p2.person_id) AND (p1.dt_to = p2.dt_from))
   ->  CTE Scan on ztree zt  (cost=0.00..0.22 rows=11 width=20) (actual time=0.138..29.887 rows=5004 loops=1)
   ->  Index Scan using prikklok_person_id_dt_to_idx on prikklok p3  (cost=0.00..7.18 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=5004)
         Index Cond: ((p3.person_id = zt.person_id) AND (p3.dt_to = zt.dt_from))
 Total runtime: 41.354 ms
(15 rows)
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文