Optimizing a SQL "WHERE" clause for a query with subqueries
Let's say I have the following hypothetical data structure:
create table "country"
(
country_id integer,
country_name varchar(50),
continent varchar(50),
constraint country_pkey primary key (country_id)
);
create table "person"
(
person_id integer,
person_name varchar(100),
country_id integer,
constraint person_pkey primary key (person_id)
);
create table "event"
(
event_id integer,
event_desc varchar(100),
country_id integer,
constraint event_pkey primary key (event_id)
);
I want to query the number of rows of people and events per country. I decided to use a subquery.
select c.country_name, sum(sub1.person_count) as person_count, sum(sub2.event_count) as event_count
from
"country" c
left join (select country_id, count(*) as person_count from "person" group by country_id) sub1
on (c.country_id=sub1.country_id)
left join (select country_id, count(*) as event_count from "event" group by country_id) sub2
on (c.country_id=sub2.country_id)
group by c.country_name
I know you can do this with select statements in the field list, but the advantage of subqueries is that the SQL is easier to change when I want to summarize by another field. For example, to display the counts by continent, I only need to replace the field "c.country_name" with "c.continent".
My problem is regarding filtering. If we add a where clause like so:
select c.country_name,
sum(sub1.person_count) as person_count,
sum(sub2.event_count) as event_count
from
"country" c
left join (select country_id, count(*) as person_count from "person" group by country_id) sub1
on (c.country_id=sub1.country_id)
left join (select country_id, count(*) as event_count from "event" group by country_id) sub2
on (c.country_id=sub2.country_id)
where c.country_name='UNITED STATES'
group by c.country_name
The subqueries still seem to execute the counting for all countries. Assume that the person and event tables are huge and that all tables already have indexes on country_id. It's really slow. Shouldn't the database execute the subqueries only for the country that was filtered? Do I have to re-create the country filter in each subquery (this is very tedious, and the code is not easily modifiable)? I am using both PostgreSQL 8.3 and 9.0, by the way, but I guess the same happens in other databases.
Comments (2)
No. A query like yours behaves as if its first step is to build a working table from all of the table constructors in the FROM clause. The WHERE clause is evaluated after that.
Imagine how you'd do this if sub1 and sub2 were both base tables instead of subselects. They'd both have two columns, and they'd both have one row for each country_id. And if you wanted to JOIN all the rows, you'd write it like this.
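The query for this case is not shown here. As a minimal sketch, assuming sub1 and sub2 exist as base tables with the hypothetical definitions below (the table definitions and column types are assumptions, not part of the question), joining all of their rows might look like this:

```sql
-- Hypothetical base tables standing in for the two subselects:
-- one row per country_id, holding a precomputed count.
create table sub1 (country_id integer primary key, person_count bigint);
create table sub2 (country_id integer primary key, event_count bigint);

-- Join every row of both tables to "country".
select c.country_name,
       sum(sub1.person_count) as person_count,
       sum(sub2.event_count) as event_count
from "country" c
left join sub1 on (c.country_id = sub1.country_id)
left join sub2 on (c.country_id = sub2.country_id)
group by c.country_name;
```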
But if you wanted to JOIN on a single row, you'd write something equivalent to this.
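The single-row version is not shown here either. One possible equivalent, sketched against the tables from the question, restricts each derived table to the wanted country before it is joined. This is an illustration of the idea, not necessarily the exact query the answer had in mind:

```sql
select c.country_name,
       sum(sub1.person_count) as person_count,
       sum(sub2.event_count) as event_count
from "country" c
left join (select country_id, count(*) as person_count
           from "person"
           -- count only the rows of the filtered country
           where country_id in (select country_id
                                from "country"
                                where country_name = 'UNITED STATES')
           group by country_id) sub1
  on (c.country_id = sub1.country_id)
left join (select country_id, count(*) as event_count
           from "event"
           -- same filter, repeated for the second derived table
           where country_id in (select country_id
                                from "country"
                                where country_name = 'UNITED STATES')
           group by country_id) sub2
  on (c.country_id = sub2.country_id)
where c.country_name = 'UNITED STATES'
group by c.country_name;
```

The filter uses in rather than = so the sketch still works if more than one country row carries the same name.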
Joe Celko, who helped develop early SQL standards, has often written on Usenet about how SQL's order of evaluation appears to work.
Can you use country_id, not country_name, to filter/group the rows? I suppose you do not have an index on the name.
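A minimal sketch of this suggestion, assuming the row for 'UNITED STATES' has country_id = 1 (a hypothetical value used only for illustration). Whether the planner then restricts the subqueries to that one country depends on the database and version, so it is worth comparing the EXPLAIN output before and after:

```sql
select c.country_id,
       c.country_name,
       sum(sub1.person_count) as person_count,
       sum(sub2.event_count) as event_count
from "country" c
left join (select country_id, count(*) as person_count
           from "person" group by country_id) sub1
  on (c.country_id = sub1.country_id)
left join (select country_id, count(*) as event_count
           from "event" group by country_id) sub2
  on (c.country_id = sub2.country_id)
-- hypothetical id; filters on the join column instead of the name
where c.country_id = 1
group by c.country_id, c.country_name;
```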