使用单个 SQL 相关子查询获取两列

发布于 2024-12-14 00:22:37 字数 493 浏览 0 评论 0原文

我的问题由以下查询表示:

SELECT 
  b.row_id, b.x, b.y, b.something,
  (SELECT a.x FROM my_table a WHERE a.row_id = (b.row_id - 1), a.something != 42 ) AS source_x,
  (SELECT a.y FROM my_table a WHERE a.row_id = (b.row_id - 1), a.something != 42 ) AS source_y
FROM 
  my_table b

我使用相同的子查询语句两次,以获取 source_xsource_y。 这就是为什么我想知道是否可以仅使用一个子查询来完成它?

因为一旦我对真实数据(数百万行)运行此查询,它似乎永远不会完成,并且需要数小时甚至数天(我的连接在结束前挂断)。

我正在使用 PostgreSQL 8.4

My problem is represented by the following query:

SELECT 
  b.row_id, b.x, b.y, b.something,
  (SELECT a.x FROM my_table a WHERE a.row_id = (b.row_id - 1), a.something != 42 ) AS source_x,
  (SELECT a.y FROM my_table a WHERE a.row_id = (b.row_id - 1), a.something != 42 ) AS source_y
FROM 
  my_table b

I'm using the same subquery statement twice, for getting both source_x and source_y.
That's why I'm wondering if it's possible to do it using one subquery only?

Because once I run this query on my real data (millions of rows) it seems to never finish and take hours, if not days (my connection hang up before the end).

I am using PostgreSQL 8.4

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

转身泪倾城 2024-12-21 00:22:37

我认为你可以使用这种方法:

SELECT b.row_id
     , b.x
     , b.y
     , b.something
     , a.x
     , a.y
  FROM my_table b
  left join my_table a on a.row_id = (b.row_id - 1)
                      and a.something != 42

I think you can use this approach:

SELECT b.row_id
     , b.x
     , b.y
     , b.something
     , a.x
     , a.y
  FROM my_table b
  left join my_table a on a.row_id = (b.row_id - 1)
                      and a.something != 42
风月客 2024-12-21 00:22:37

@DavidEG 发布了查询的最佳语法。

但是,您的问题绝对不仅仅是查询技术。使用 JOIN 代替两个子查询最多可以将速度提高两倍。很可能更少。这并不能解释“小时”。即使有数百万行,设置得当的 Postgres 也应该在几秒钟而不是几小时内完成简单的查询。

  • 首先值得注意的是查询中的语法错误

    ... WHERE a.row_id = (b.row_id - 1), a.something != 42
    

    这里需要

    ANDOR,而不是逗号。

  • 接下来要检查的是索引。如果 row_id 不是主键,则可能没有索引。为了获得此特定查询的最佳性能,请创建一个多列索引(row_id, some) 上的strong>,如下所示:

    在 my_table (row_id,某事)上创建索引 my_table_row_id_something_idx
    

  • 如果过滤器每次都排除 相同的值 <代码>某事!= 42您还可以使用部分索引 而是为了额外加速:

    在 my_table (row_id) 上创建索引 my_table_row_id_something_idx
    哪里有东西!= 42
    

    只有当42是一个通用值某物是一个比整数更大的列时,这才会产生实质性的差异。 (由于数据对齐,具有两个整数列的索引通常在磁盘上占用的大小与仅具有一列的索引相同。请参阅:

  • 当性能成为问题时,它始终是一个问题检查您的设置是个好主意。资源在许多发行版都无法处理“数百万行”。

  • 根据您的 Postgres 实际版本,升级到当前版本(撰写本文时为 9.1)可能会有很大帮助。

  • 最终,硬件始终也是一个因素。调整和优化只能让您到目前为止。

@DavidEG posted the best syntax for the query.

However, your problem is definitely not just with the query technique. A JOIN instead of two subqueries can speed up things by a factor of two at best. Most likely less. That doesn't explain "hours". Even with millions of rows, a decently set up Postgres should finish the simple query in seconds, not hours.

  • First thing that stands out is the syntax error in your query:

    ... WHERE a.row_id = (b.row_id - 1), a.something != 42
    

    AND or OR is needed here, not a comma.

  • Next thing to check are indexes. If row_id is not the primary key, you may not have an index on it. For optimum performance of this particular query create a multi-column index on (row_id, something) like this:

    CREATE INDEX my_table_row_id_something_idx ON my_table (row_id, something)
    
  • If the filter excludes the same value every time in something != 42you can also use a partial index instead for additional speed up:

    CREATE INDEX my_table_row_id_something_idx ON my_table (row_id)
    WHERE something != 42
    

    This will only make a substantial difference if 42 is a common value or something is a bigger column than just an integer. (An index with two integer columns normally occupies the the same size on disk as an index with just one, due to data alignment. See:

  • When performance is an issue, it is always a good idea to check your settings. Standard settings in Postgres use minimal resources in many distributions and are not up to handling "millions of rows".

  • Depending on your actual version of Postgres, an upgrade to a current version (9.1 at the time of writing) may help a lot.

  • Ultimately, hardware is always a factor, too. Tuning and optimizing can only get you so far.

悍妇囚夫 2024-12-21 00:22:37

老式语法:

SELECT 
  b.row_id, b.x, b.y, b.something
  , a.x AS source_x
  , a.y AS source
FROM my_table b
    ,my_table a 
WHERE a.row_id = b.row_id - 1
  AND a.something != 42
  ;

Join 语法:

SELECT 
  b.row_id, b.x, b.y, b.something
  , a.x AS source_x
  , a.y AS source
FROM my_table b
JOIN my_table a 
  ON (a.row_id = b.row_id - 1)
WHERE a.something != 42
  ;

old-fashioned syntax:

SELECT 
  b.row_id, b.x, b.y, b.something
  , a.x AS source_x
  , a.y AS source
FROM my_table b
    ,my_table a 
WHERE a.row_id = b.row_id - 1
  AND a.something != 42
  ;

Join-syntax:

SELECT 
  b.row_id, b.x, b.y, b.something
  , a.x AS source_x
  , a.y AS source
FROM my_table b
JOIN my_table a 
  ON (a.row_id = b.row_id - 1)
WHERE a.something != 42
  ;
还不是爱你 2024-12-21 00:22:37
SELECT b.row_id, b.x, b.y, b.something, a.x, a.y
  FROM my_table b
  LEFT JOIN (
    SELECT row_id + 1, x, y
      FROM my_table
      WHERE something != 42
  ) AS a ON a.row_id = b.row_id;
SELECT b.row_id, b.x, b.y, b.something, a.x, a.y
  FROM my_table b
  LEFT JOIN (
    SELECT row_id + 1, x, y
      FROM my_table
      WHERE something != 42
  ) AS a ON a.row_id = b.row_id;
云雾 2024-12-21 00:22:37

Postgres:

    SELECT 
      b.row_id, b.x, b.y, b.something,
      source_x,
      source_y
    FROM 
      my_table b,
LATERAL(SELECT a.x AS source_x, a.y AS source_y FROM my_table a WHERE a.row_id = (b.row_id - 1), a.something != 42 )

MsSQL

SELECT 
      b.row_id, b.x, b.y, b.something,
      source_x,
      source_y
    FROM 
      my_table b
OUTER APPLY(SELECT a.x AS source_x, a.y AS source_y FROM my_table a WHERE a.row_id = (b.row_id - 1), a.something != 42 )

Postgres:

    SELECT 
      b.row_id, b.x, b.y, b.something,
      source_x,
      source_y
    FROM 
      my_table b,
LATERAL(SELECT a.x AS source_x, a.y AS source_y FROM my_table a WHERE a.row_id = (b.row_id - 1), a.something != 42 )

MsSQL

SELECT 
      b.row_id, b.x, b.y, b.something,
      source_x,
      source_y
    FROM 
      my_table b
OUTER APPLY(SELECT a.x AS source_x, a.y AS source_y FROM my_table a WHERE a.row_id = (b.row_id - 1), a.something != 42 )
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文