使用单个 SQL 相关子查询获取两列
我的问题由以下查询表示:
SELECT
b.row_id, b.x, b.y, b.something,
(SELECT a.x FROM my_table a WHERE a.row_id = (b.row_id - 1), a.something != 42 ) AS source_x,
(SELECT a.y FROM my_table a WHERE a.row_id = (b.row_id - 1), a.something != 42 ) AS source_y
FROM
my_table b
我使用相同的子查询语句两次,以获取 source_x
和 source_y
。 这就是为什么我想知道是否可以仅使用一个子查询来完成它?
因为一旦我对真实数据(数百万行)运行此查询,它似乎永远不会完成,并且需要数小时甚至数天(我的连接在结束前挂断)。
我正在使用 PostgreSQL 8.4
My problem is represented by the following query:
SELECT
b.row_id, b.x, b.y, b.something,
(SELECT a.x FROM my_table a WHERE a.row_id = (b.row_id - 1), a.something != 42 ) AS source_x,
(SELECT a.y FROM my_table a WHERE a.row_id = (b.row_id - 1), a.something != 42 ) AS source_y
FROM
my_table b
I'm using the same subquery statement twice, for getting both source_x
and source_y
.
That's why I'm wondering if it's possible to do it using one subquery only?
Because once I run this query on my real data (millions of rows) it seems to never finish and take hours, if not days (my connection hang up before the end).
I am using PostgreSQL 8.4
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
我认为你可以使用这种方法:
I think you can use this approach:
@DavidEG 发布了查询的最佳语法。
但是,您的问题绝对不仅仅是查询技术。使用
JOIN
代替两个子查询最多可以将速度提高两倍。很可能更少。这并不能解释“小时”。即使有数百万行,设置得当的 Postgres 也应该在几秒钟而不是几小时内完成简单的查询。首先值得注意的是查询中的语法错误:
这里需要
AND
或OR
,而不是逗号。接下来要检查的是索引。如果
row_id
不是主键,则可能没有索引。为了获得此特定查询的最佳性能,请创建一个多列索引(row_id, some) 上的strong>,如下所示:如果过滤器每次都排除 相同的值 <代码>某事!= 42您还可以使用部分索引 而是为了额外加速:
只有当
42
是一个通用值或某物
是一个比整数更大的列时,这才会产生实质性的差异。 (由于数据对齐,具有两个整数列的索引通常在磁盘上占用的大小与仅具有一列的索引相同。请参阅:当性能成为问题时,它始终是一个问题检查您的设置是个好主意。资源在许多发行版都无法处理“数百万行”。
根据您的 Postgres 实际版本,升级到当前版本(撰写本文时为 9.1)可能会有很大帮助。
最终,硬件始终也是一个因素。调整和优化只能让您到目前为止。
@DavidEG posted the best syntax for the query.
However, your problem is definitely not just with the query technique. A
JOIN
instead of two subqueries can speed up things by a factor of two at best. Most likely less. That doesn't explain "hours". Even with millions of rows, a decently set up Postgres should finish the simple query in seconds, not hours.First thing that stands out is the syntax error in your query:
AND
orOR
is needed here, not a comma.Next thing to check are indexes. If
row_id
is not the primary key, you may not have an index on it. For optimum performance of this particular query create a multi-column index on(row_id, something)
like this:If the filter excludes the same value every time in
something != 42
you can also use a partial index instead for additional speed up:This will only make a substantial difference if
42
is a common value orsomething
is a bigger column than just an integer. (An index with twointeger
columns normally occupies the the same size on disk as an index with just one, due to data alignment. See:When performance is an issue, it is always a good idea to check your settings. Standard settings in Postgres use minimal resources in many distributions and are not up to handling "millions of rows".
Depending on your actual version of Postgres, an upgrade to a current version (9.1 at the time of writing) may help a lot.
Ultimately, hardware is always a factor, too. Tuning and optimizing can only get you so far.
老式语法:
Join 语法:
old-fashioned syntax:
Join-syntax:
Postgres:
MsSQL
Postgres:
MsSQL