Another date gap-filling SQL puzzle

Posted on 2024-12-23 05:17:48

Unfortunately, I'm using Vertica, which precludes me from using CROSS APPLY. And apparently Vertica has no CTEs either.

Here's what I've got:

t:
    day    | id | metric | d_metric
-----------+----+--------+----------
2011-12-01 |  1 | 10     | 10
2011-12-03 |  1 | 12     | 2
2011-12-04 |  1 | 15     | 3

Note that on the first day, the delta is equal to the metric value.
I'd like to fill in the gaps, like this:

t_fill:
    day    | id | metric | d_metric
-----------+----+--------+----------
2011-12-01 |  1 | 10     | 10
2011-12-02 |  1 | 10     | 0 -- a delta of 0
2011-12-03 |  1 | 12     | 2
2011-12-04 |  1 | 15     | 3

I've thought of a way to do this day by day, but what I'd really like is a solution that works in one go.

I think I could get something working with LAST_VALUE, but I can't come up with the right JOIN statements that will let me properly partition and order on each id's day-by-day history.

edit:
assume I have a table like this:

calendar:
    day 
------------
2011-01-01
2011-01-02
   ...

This table can be used in joins. My intent is to maintain the date range in calendar so that it matches the date range in t.
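One way to keep calendar in step with t is to regenerate it from the minimum and maximum day in t. The sketch below is only an illustration: it uses Python's sqlite3 module as a stand-in for Vertica and stores dates as ISO text, both of which are assumptions and not part of the original setup.

```python
import sqlite3
from datetime import date, timedelta

# In-memory SQLite database standing in for Vertica (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (day TEXT, id INTEGER, metric INTEGER, d_metric INTEGER);
    INSERT INTO t VALUES
        ('2011-12-01', 1, 10, 10),
        ('2011-12-03', 1, 12, 2),
        ('2011-12-04', 1, 15, 3);
    CREATE TABLE calendar (day TEXT PRIMARY KEY);
""")

# Find the first and last day present in t.
first, last = conn.execute("SELECT MIN(day), MAX(day) FROM t").fetchone()
start, end = date.fromisoformat(first), date.fromisoformat(last)

# One calendar row per day in the closed range [start, end].
days = [(start + timedelta(days=n)).isoformat()
        for n in range((end - start).days + 1)]
conn.executemany("INSERT INTO calendar (day) VALUES (?)", [(d,) for d in days])

calendar_days = [row[0] for row in
                 conn.execute("SELECT day FROM calendar ORDER BY day")]
print(calendar_days)
```

With the sample data this yields the four days from 2011-12-01 through 2011-12-04, including the missing 2011-12-02.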

edit:
A few more notes on what I'm looking for, just to be specific:

In generating t_fill, I'd like to exactly cover the date range in t, as well as any dates that are missing in between. So a correct t_fill will start on the same date and end on the same date as t.
t_fill has two properties:

1) once an id appears on some date, it will always have a row for each later date. This is the gap-filling implied in the original question.

2) Should no row for an id ever appear again after some date, the t_fill solution should merrily generate rows with the same metric value (and 0 delta) from the date of that last data point up to the end date of t.

A solution might also backfill earlier dates, back to the start of the date range in t. That is, for any id that first appears after the first date in t, the rows between the first date in t and the first date for that id would be filled with metric=0 and d_metric=0. I don't prefer this kind of solution, since it has a higher growth factor for each id that enters the system. But I could easily deal with it by selecting into a new table only the rows where metric!=0 or d_metric!=0 (that is, dropping the rows where both are 0, while keeping the gap-filled rows whose delta is 0).
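The two properties can be pinned down with a small runnable sketch. It is not Vertica-specific: SQLite (via Python's sqlite3) stands in for Vertica, dates are stored as ISO text, and the second id (id 2) is hypothetical data added to exercise property 2. Each id is joined to every calendar day on or after its own first day, so nothing is backfilled, and the metric is taken from that id's most recent row on or before the calendar day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for Vertica (assumption)
conn.executescript("""
    CREATE TABLE t (day TEXT, id INTEGER, metric INTEGER, d_metric INTEGER);
    INSERT INTO t VALUES
        ('2011-12-01', 1, 10, 10),
        ('2011-12-03', 1, 12, 2),
        ('2011-12-04', 1, 15, 3),
        ('2011-12-02', 2, 5, 5);   -- hypothetical second id, seen only once
    CREATE TABLE calendar (day TEXT PRIMARY KEY);
    INSERT INTO calendar VALUES
        ('2011-12-01'), ('2011-12-02'), ('2011-12-03'), ('2011-12-04');
""")

FILL_SQL = """
SELECT c.day, f.id, p.metric,
       -- real rows keep their delta, filler rows get a delta of 0
       CASE WHEN p.day = c.day THEN p.d_metric ELSE 0 END AS d_metric
FROM calendar c
-- property 1: start each id at its own first day (no backfill)
JOIN (SELECT id, MIN(day) AS first_day FROM t GROUP BY id) f
  ON c.day >= f.first_day
JOIN t p ON p.id = f.id
-- carry forward the latest row on or before the calendar day;
-- this also covers property 2, up to the last calendar day
WHERE p.day = (SELECT MAX(q.day) FROM t q
               WHERE q.id = f.id AND q.day <= c.day)
ORDER BY f.id, c.day
"""

filled = conn.execute(FILL_SQL).fetchall()
for row in filled:
    print(row)
```

For id 2 this produces rows for 2011-12-02 through 2011-12-04 (metric 5, delta 0 after the first day) and no row for 2011-12-01.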

平安喜乐 2024-12-30 05:17:48

This is about what Jonathan Leffler proposed, but in old-fashioned low-level SQL (without fancy CTEs, window functions, or aggregating subqueries):

SET search_path='tmp';
DROP TABLE IF EXISTS ttable CASCADE;
CREATE TABLE ttable
        ( zday date NOT NULL
        , id INTEGER NOT NULL
        , metric INTEGER NOT NULL
        , d_metric INTEGER NOT NULL
        , PRIMARY KEY (id,zday)
        );
INSERT INTO ttable(zday,id,metric,d_metric) VALUES
 ('2011-12-01',1,10,10)
,('2011-12-03',1,12,2)
,('2011-12-04',1,15,3)
        ;

DROP TABLE IF EXISTS ctable CASCADE;
CREATE TABLE ctable
        ( zday date NOT NULL
        , PRIMARY KEY (zday)
        );
INSERT INTO ctable(zday) VALUES
 ('2011-12-01')
,('2011-12-02')
,('2011-12-03')
,('2011-12-04')
        ;

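CREATE VIEW v_cte AS (
        -- rows that already exist in ttable, restricted to calendar days
        SELECT t.zday,t.id,t.metric,t.d_metric
        FROM ttable t
        JOIN ctable c ON c.zday = t.zday
        UNION
        -- filler rows: carry an earlier metric forward with a delta of 0
        SELECT c.zday,t.id,t.metric, 0
        FROM ctable c, ttable t
        WHERE t.zday < c.zday
        -- the id has no row of its own on this calendar day
        AND NOT EXISTS ( SELECT *
                FROM ttable nx
                WHERE nx.id = t.id
                AND nx.zday = c.zday
                )
        -- and t is the latest row for the id before this calendar day
        AND NOT EXISTS ( SELECT *
                FROM ttable nx
                WHERE nx.id = t.id
                AND nx.zday < c.zday
                AND nx.zday > t.zday
                )
        )
        ;
-- a view has no inherent row order, so sort explicitly
SELECT * FROM v_cte ORDER BY id, zday;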

The results:

    zday    | id | metric | d_metric 
------------+----+--------+----------
 2011-12-01 |  1 |     10 |       10
 2011-12-02 |  1 |     10 |        0
 2011-12-03 |  1 |     12 |        2
 2011-12-04 |  1 |     15 |        3
(4 rows)
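To see how the query behaves for an id that stops reporting, the same UNION can be run against data with a second id. This is a rough check only, with assumptions: SQLite (via Python's sqlite3) stands in for the real database, the view wrapper is dropped so the query runs directly, dates are stored as ISO text, and id 2 is hypothetical data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in (assumption)
conn.executescript("""
    CREATE TABLE ttable (
        zday TEXT NOT NULL, id INTEGER NOT NULL,
        metric INTEGER NOT NULL, d_metric INTEGER NOT NULL,
        PRIMARY KEY (id, zday));
    INSERT INTO ttable (zday, id, metric, d_metric) VALUES
        ('2011-12-01', 1, 10, 10),
        ('2011-12-03', 1, 12, 2),
        ('2011-12-04', 1, 15, 3),
        ('2011-12-02', 2, 5, 5);   -- hypothetical second id, seen only once
    CREATE TABLE ctable (zday TEXT NOT NULL PRIMARY KEY);
    INSERT INTO ctable (zday) VALUES
        ('2011-12-01'), ('2011-12-02'), ('2011-12-03'), ('2011-12-04');
""")

# The body of v_cte from the answer, sorted by id and then day.
QUERY = """
SELECT t.zday, t.id, t.metric, t.d_metric
FROM ttable t
JOIN ctable c ON c.zday = t.zday
UNION
SELECT c.zday, t.id, t.metric, 0
FROM ctable c, ttable t
WHERE t.zday < c.zday
  AND NOT EXISTS (SELECT * FROM ttable nx
                  WHERE nx.id = t.id AND nx.zday = c.zday)
  AND NOT EXISTS (SELECT * FROM ttable nx
                  WHERE nx.id = t.id AND nx.zday < c.zday AND nx.zday > t.zday)
ORDER BY 2, 1
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

In this run id 2 is carried forward to the last calendar day with a delta of 0, and no row is generated for it before its first appearance, because the filler branch requires t.zday < c.zday.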
