How does SELECT DISTINCT on a primary key work in SQL?

Posted on 2025-01-06 04:22:17

How does the SQL optimizer handle SELECT DISTINCT on a primary key? For example:

Employee Table - (EmpId int primary key, EmpName)

select distinct EmpId from Employee.

Would the optimizer reject the distinct keyword considering EmpId is already distinct?

Comments (5)

梓梦 2025-01-13 04:22:17

Try it both with and without the DISTINCT keyword and compare execution plans.

SQL Server will produce identical plans for both. YMMV for other RDBMSs.
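
As a rough illustration of that comparison, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption on my part (it is not one of the engines discussed in this thread), and the helper name query_plan and the sample rows are made up for the example. It only shows the method; what the plans look like on your own RDBMS may well differ.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The detail text is the last column of each plan row.
    return [row[-1] for row in rows]


# In-memory stand-in for the Employee table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpId INTEGER PRIMARY KEY, EmpName TEXT)")
conn.executemany(
    "INSERT INTO Employee (EmpId, EmpName) VALUES (?, ?)",
    [(i, f"emp{i}") for i in range(1, 1001)],  # made-up sample data
)
conn.commit()

plain_sql = "SELECT EmpId FROM Employee"
distinct_sql = "SELECT DISTINCT EmpId FROM Employee"

plain_plan = query_plan(conn, plain_sql)
distinct_plan = query_plan(conn, distinct_sql)

print("without DISTINCT:", plain_plan)
print("with DISTINCT:   ", distinct_plan)
print("plans identical: ", plain_plan == distinct_plan)
```

If the two printed plans match, the engine has treated DISTINCT as redundant for this query; if an extra step shows up only in the second plan, it has not.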

情定在深秋 2025-01-13 04:22:17

I tried this under Oracle 11g, and the two queries produce different plans.

select USERID from TPM_USER

Produces:

SELECT STATEMENT    7.0 7   242479  667 2668    7                   ALL_ROWS                                            
TABLE ACCESS (FULL) 7.0 7   242479  667 2668    1   TPMDBO  TPM_USER    FULL    TABLE   ANALYZED    1                                       

And:

select distinct USERID from TPM_USER

Produces:

SELECT STATEMENT    8.0 7   24725738    667 2668    8                   ALL_ROWS                                            
HASH (UNIQUE)   8.0 7   24725738    667 2668    1           UNIQUE                                                  
TABLE ACCESS (FULL) 7.0 7   242479  667 2668    1   TPMDBO  TPM_USER    FULL    TABLE   ANALYZED    1                                       

USERID is the primary key on TPM_USER in case that wasn't clear.

UPDATE:

I was somewhat irked that Oracle could be this stupid, so I tried the same queries on our production server which has about a thousand times more data. This time, the two query plans were identical (the index was used both times, and no full table scan was done). This leads me to believe the planner will take the table statistics into account when deciding which route to go.

Production (with or without DISTINCT):

SELECT STATEMENT    3.0 3   461492  3521    14084   3                   ALL_ROWS                                            
INDEX (FAST FULL SCAN)  3.0 3   461492  3521    14084   1   TPMDBO  TPM_USER_PK FAST FULL SCAN  INDEX (UNIQUE)  ANALYZED                                            
2025-01-13 04:22:17

There can't be a fixed rule for your question; each database can handle this in its own way. You have to look at your database's execution plan for the query.

But I believe modern production databases such as Oracle, MySQL, Postgres, ... just ignore the distinct keyword and iterate over the index (not the table).

栖迟 2025-01-13 04:22:17

SQL Server doesn't produce identical plans for me.

Querying with distinct on a non-indexed column uses a hash match, whereas querying on the primary key doesn't bother to do that.
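
The same contrast can be sketched outside SQL Server. The snippet below again assumes SQLite via Python's sqlite3 module, with a made-up helper plan_details and made-up data in which every name appears twice. It prints the plan for DISTINCT on the primary key next to the plan for DISTINCT on the non-indexed EmpName column, so you can see whether the engine adds a de-duplication step only for the latter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpId INTEGER PRIMARY KEY, EmpName TEXT)")
# Made-up data: 100 rows, 50 distinct names, and no index on EmpName.
conn.executemany(
    "INSERT INTO Employee (EmpId, EmpName) VALUES (?, ?)",
    [(i, f"name{i % 50}") for i in range(100)],
)
conn.commit()


def plan_details(sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row (hypothetical helper)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


key_plan = plan_details("SELECT DISTINCT EmpId FROM Employee")
name_plan = plan_details("SELECT DISTINCT EmpName FROM Employee")

print("DISTINCT on primary key:      ", key_plan)
print("DISTINCT on non-indexed column:", name_plan)

# The results themselves: the key is already unique, the names are not.
distinct_ids = conn.execute("SELECT DISTINCT EmpId FROM Employee").fetchall()
distinct_names = conn.execute("SELECT DISTINCT EmpName FROM Employee").fetchall()
print(len(distinct_ids), "distinct ids,", len(distinct_names), "distinct names")
```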

逆蝶 2025-01-13 04:22:17

I'm not sure what you mean by rejected, but given that, by definition, primary keys are non-NULL and unique, I'm sure the sorting/grouping operation is simply ignored as unnecessary when generating an execution plan in most RDBMSs.
