Optimizing a SQL WHERE clause on a query with subqueries

Posted 2024-12-12 11:28:40

Let's say I have the following hypothetical data structure:

create table "country"
(
  country_id integer,  
  country_name varchar(50),
  continent varchar(50),
  constraint country_pkey primary key (country_id)
);

create table "person"
(
  person_id integer,
  person_name varchar(100),
  country_id integer,
  constraint person_pkey primary key (person_id)
);

create table "event"
(
  event_id integer,
  event_desc varchar(100),
  country_id integer,
  constraint event_pkey primary key (event_id)
);

I want to count the rows in person and event per country. I decided to use subqueries.

select c.country_name, sum(sub1.person_count) as person_count, sum(sub2.event_count) as event_count
from
  "country" c
  left join (select country_id, count(*) as person_count from "person" group by country_id) sub1
    on (c.country_id=sub1.country_id)
  left join (select country_id, count(*) as event_count from "event" group by country_id) sub2
    on (c.country_id=sub2.country_id)
group by c.country_name

I know this can also be done with select statements in the field list, but the advantage of the subquery approach is flexibility: I can easily change the SQL to summarize by a different field. For example, to display the counts by continent, I only need to replace "c.country_name" with "c.continent".

My problem is regarding filtering. If we add a where clause like so:

select c.country_name, 
  sum(sub1.person_count) as person_count, 
  sum(sub2.event_count) as event_count
from
  "country" c
  left join (select country_id, count(*) as person_count from "person" group by country_id) sub1
    on (c.country_id=sub1.country_id)
  left join (select country_id, count(*) as event_count from "event" group by country_id) sub2
    on (c.country_id=sub2.country_id)
where c.country_name='UNITED STATES'
group by c.country_name

The subqueries still seem to perform the counting for all countries. Assume that the person and event tables are huge and that all tables already have indexes on country_id. The query is really slow. Shouldn't the database execute the subqueries only for the country that was filtered? Do I have to repeat the country filter in each subquery (which is tedious and makes the code hard to modify)? I am using both PostgreSQL 8.3 and 9.0, by the way, but I guess the same happens in other databases.

知你几分 2024-12-19 11:28:40

Shouldn't the database only execute the subqueries for the country
that was filtered?

No. A query like yours behaves as if the first step were to build a working table from all of the table expressions in the FROM clause. The WHERE clause is evaluated after that.
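
One way to check what the planner actually does with the filtered query is to prefix it with EXPLAIN. The statement below is a minimal sketch using the tables from the question. The plan you get depends on your PostgreSQL version, table statistics, and indexes, so no output is shown here.

```sql
-- Show the chosen plan without running the query.
-- Use "explain analyze" instead to also execute it and report actual timings.
explain
select c.country_name,
  sum(sub1.person_count) as person_count,
  sum(sub2.event_count) as event_count
from
  "country" c
  left join (select country_id, count(*) as person_count from "person" group by country_id) sub1
    on (c.country_id=sub1.country_id)
  left join (select country_id, count(*) as event_count from "event" group by country_id) sub2
    on (c.country_id=sub2.country_id)
where c.country_name='UNITED STATES'
group by c.country_name;
```

If the plan shows an aggregate sitting on top of a full scan of "person" and "event", the counting is being done for every country before the filter on country_name takes effect.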

Imagine how you'd do this if sub1 and sub2 were both base tables instead of subselects. They'd both have two columns, and they'd both have one row for each country_id. And if you wanted to JOIN all the rows, you'd write it like this.

from
  "country" c
  left join sub1 on (c.country_id=sub1.country_id)
  left join sub2 on (c.country_id=sub2.country_id)

But if you wanted to JOIN on a single row, you'd write something equivalent to this.

from
  "country" c
  left join (select * from sub1 where country_id = ?) sub1
    on (c.country_id=sub1.country_id)
  left join (select * from sub2 where country_id = ?) sub2
    on (c.country_id=sub2.country_id)
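
Applied to the query from the question, that idea could look like the sketch below. It is only one possible way to write it: the filter is repeated inside each derived table by looking up the matching country_id from the country name, which gives the planner the chance to use the country_id indexes. Whether it actually does so depends on your data and PostgreSQL version.

```sql
select c.country_name,
  sum(sub1.person_count) as person_count,
  sum(sub2.event_count) as event_count
from
  "country" c
  -- Count only the persons of the filtered country.
  left join (select country_id, count(*) as person_count
             from "person"
             where country_id in (select country_id from "country"
                                  where country_name='UNITED STATES')
             group by country_id) sub1
    on (c.country_id=sub1.country_id)
  -- Count only the events of the filtered country.
  left join (select country_id, count(*) as event_count
             from "event"
             where country_id in (select country_id from "country"
                                  where country_name='UNITED STATES')
             group by country_id) sub2
    on (c.country_id=sub2.country_id)
where c.country_name='UNITED STATES'
group by c.country_name;
```

The outer WHERE clause is still needed. Without it, every other country would appear in the result with NULL counts.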

Joe Celko, who helped develop early SQL standards, has often written on Usenet about how SQL's order of evaluation appears to work.
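
If repeating the filter in every derived table is too tedious, one alternative is the approach the question mentions in passing: correlated scalar subqueries in the select list. This is not part of the answer above, only a hedged sketch. The counts are correlated to each country row, so they are typically computed only for rows that pass the WHERE clause, and the filter is written once. Wrapping the result in a derived table (aliased t here, a name chosen for illustration) keeps the outer grouping easy to change.

```sql
select t.country_name,
  sum(t.person_count) as person_count,
  sum(t.event_count) as event_count
from (
  select c.country_name, c.continent,
    -- Each count is evaluated per country row produced by this inner query.
    (select count(*) from "person" p where p.country_id = c.country_id) as person_count,
    (select count(*) from "event" e where e.country_id = c.country_id) as event_count
  from "country" c
  where c.country_name = 'UNITED STATES'
) t
group by t.country_name;
```

To summarize by continent instead, replace t.country_name with t.continent in the outer select list and group by, and adjust the inner filter accordingly. Note one difference from the left join version: a country with no matching rows gets a count of 0 here rather than NULL.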
