优化 Vertica SQL 查询以执行运行总计

发布于 2024-11-28 06:42:44 字数 731 浏览 1 评论 0原文

我有一个表 S,其中包含如下时间序列数据:

key   day   delta

对于给定的键,有可能但不太可能会丢失几天。

我想从增量值(正整数)构造一个累积列,以便将此累积数据插入到另一个表中。这就是我到目前为止所得到的:

SELECT key, day,
   SUM(delta) OVER (PARTITION BY key ORDER BY day asc RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
   delta
FROM S

在我的 SQL 风格中,默认窗口子句是 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,但我将其留在那里以明确。

这个查询真的很慢,比旧的损坏查询慢了一个数量级,旧的查询在累积计数中填充了 0。对于其他生成累积数字的方法有什么建议吗?

我确实在这里查看了解决方案: 按表中分组记录运行总计

我使用的是Vertica。 Vertica SQL 排除了第一个子选择解决方案,并且其查询规划器预测第二个左外连接解决方​​案的成本比我上面显示的分析形式高出大约 100 倍。

I have a table S with time series data like this:

key   day   delta

For a given key, it's possible but unlikely that days will be missing.

I'd like to construct a cumulative column from the delta values (positive INTs), for the purposes of inserting this cumulative data into another table. This is what I've got so far:

SELECT key, day,
   SUM(delta) OVER (PARTITION BY key ORDER BY day asc RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
   delta
FROM S

In my SQL flavor, default window clause is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, but I left that in there to be explicit.

This query is really slow, like order of magnitude slower than the old broken query, which filled in 0s for the cumulative count. Any suggestions for other methods to generate the cumulative numbers?

I did look at the solutions here:
Running total by grouped records in table

The RDBMs I'm using is Vertica. Vertica SQL precludes the first subselect solution there, and its query planner predicts that the 2nd left outer join solution is about 100 times more costly than the analytic form I show above.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

何以心动 2024-12-05 06:42:44

我认为你基本上就在那里。您可能只需要稍微更新一下语法:

SELECT s_qty, 
   Sum(s_price) 
     OVER( 
       partition BY NULL 
       ORDER BY s_qty ASC rows UNBOUNDED PRECEDING ) "Cumulative Sum" 
FROM   sample_sales;

输出:

S_QTY | Cumulative Sum 
------+----------------
1     | 1000
100   | 11000
150   | 26000
200   | 28000
250   | 53000
300   | 83000
2000  | 103000
(7 rows)

参考链接:

https://dwgeek.com/vertica-cumulative-sum-average-and-example.html/

I think you're essentially there. You may just need to update the syntax a bit:

SELECT s_qty, 
   Sum(s_price) 
     OVER( 
       partition BY NULL 
       ORDER BY s_qty ASC rows UNBOUNDED PRECEDING ) "Cumulative Sum" 
FROM   sample_sales;

Output:

S_QTY | Cumulative Sum 
------+----------------
1     | 1000
100   | 11000
150   | 26000
200   | 28000
250   | 53000
300   | 83000
2000  | 103000
(7 rows)

reference link:

https://dwgeek.com/vertica-cumulative-sum-average-and-example.html/
回眸一遍 2024-12-05 06:42:44

有时,仅使用相关子查询会更快:

SELECT 
    [key]
    , [day]
    , delta
    , (SELECT SUM(delta) FROM S WHERE [key] < t1.[key]) AS DeltaSum
FROM S t1

Sometimes it's faster to just use a correlated subquery:

SELECT 
    [key]
    , [day]
    , delta
    , (SELECT SUM(delta) FROM S WHERE [key] < t1.[key]) AS DeltaSum
FROM S t1
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文