优化 Vertica SQL 查询以执行运行总计
我有一个表 S,其中包含如下时间序列数据:
key day delta
对于给定的键,有可能但不太可能会丢失几天。
我想从增量值(正整数)构造一个累积列,以便将此累积数据插入到另一个表中。这就是我到目前为止所得到的:
SELECT key, day,
SUM(delta) OVER (PARTITION BY key ORDER BY day asc RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
delta
FROM S
在我的 SQL 风格中,默认窗口子句是 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,但我将其留在那里以明确。
这个查询真的很慢,比旧的损坏查询慢了一个数量级,旧的查询在累积计数中填充了 0。对于其他生成累积数字的方法有什么建议吗?
我确实在这里查看了解决方案: 按表中分组记录运行总计
我使用的是Vertica。 Vertica SQL 排除了第一个子选择解决方案,并且其查询规划器预测第二个左外连接解决方案的成本比我上面显示的分析形式高出大约 100 倍。
I have a table S with time series data like this:
key day delta
For a given key, it's possible but unlikely that days will be missing.
I'd like to construct a cumulative column from the delta values (positive INTs), for the purposes of inserting this cumulative data into another table. This is what I've got so far:
SELECT key, day,
SUM(delta) OVER (PARTITION BY key ORDER BY day asc RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
delta
FROM S
In my SQL flavor, default window clause is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, but I left that in there to be explicit.
This query is really slow, like order of magnitude slower than the old broken query, which filled in 0s for the cumulative count. Any suggestions for other methods to generate the cumulative numbers?
I did look at the solutions here:
Running total by grouped records in table
The RDBMs I'm using is Vertica. Vertica SQL precludes the first subselect solution there, and its query planner predicts that the 2nd left outer join solution is about 100 times more costly than the analytic form I show above.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我认为你基本上就在那里。您可能只需要稍微更新一下语法:
输出:
参考链接:
I think you're essentially there. You may just need to update the syntax a bit:
Output:
reference link:
有时,仅使用相关子查询会更快:
Sometimes it's faster to just use a correlated subquery: