这个连接提示危险吗?

发布于 2024-07-10 15:17:03 字数 799 浏览 14 评论 0原文

一位同事要求我查看一些表的索引,因为他的查询运行时间很长。 一个多小时。

select count(1)
from databaseA.dbo.table1
inner join databaseA.dbo.table2 on (table1.key = table2.key)
inner join databaseB.dbo.table3 on (table1.key = table3.key)

注意不同的数据库。 这是从 DatabaseB 运行的,

表 1 和表 2 的记录长度超过 200 万条。 表3有十几条记录。

我查看了查询计划,优化器决定对表 1 和表 2 进行嵌套循环索引查找,并以表 3 作为驱动表!

我的第一个假设是表 1 和表 2 上的统计数据严重混乱。 2 但在更新统计数据之前,我尝试添加连接提示:

select count(1)
from databaseA.dbo.table1
inner HASH join databaseA.dbo.table2 on (table1.key = table2.key)
inner join databaseB.dbo.table3 on (table1.key = table3.key)

结果在 15 秒内返回。

由于时间有限,我将结果传回给他,但我担心这可能会导致以后出现问题。

我应该重新审视统计问题并以这种方式解决问题吗? 错误的查询计划是否是由于连接来自单独的数据库而导致的?

有人可以根据您的经验向我提供一些想法吗?

A coworker asked me to look at indexing on some tables because his query was running very long. Over an hour.

select count(1)
from databaseA.dbo.table1
inner join databaseA.dbo.table2 on (table1.key = table2.key)
inner join databaseB.dbo.table3 on (table1.key = table3.key)

Note the different databases. This was being run from DatabaseB

Tables 1 and 2 were over 2 million records long. Table3 had a dozen records or so.

I looked at the query plan and the optimizer decided to do nested-loop index seeks into tables 1 and 2 with Table3 as the driving table!

My first assumption was that statistics were seriously messed up on Tables1 & 2 but before updating statistics I tried adding a join hint thusly:

select count(1)
from databaseA.dbo.table1
inner HASH join databaseA.dbo.table2 on (table1.key = table2.key)
inner join databaseB.dbo.table3 on (table1.key = table3.key)

Results returned in 15 seconds.

Since I was short on time, I passed the results back to him but I'm worried that this might result in problems down the road.

Should I revisit the statistics issue and resolve the problem that way? Could the bad query plan have resulted from the join being from a separate databases?

Can anyone offer me some ideas based on your experience?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

时光礼记 2024-07-17 15:17:03

我首先怀疑统计数据。

您无疑知道,在 99% 的情况下应避免使用连接提示,并且仅在您有证据证明绝对需要时才使用连接提示。

I would suspect the statistics first.

As you are no doubt aware, Join hints should be avoided in 99% of cases and used only when you have proof that they are absolutely required.

和我恋爱吧 2024-07-17 15:17:03

首先检查表上的统计信息和索引。 索引提示可能会导致问题。 如果表中的数据发生更改,优化器将无法选择更有效的计划,因为您强制它始终使用哈希。

Check statistics, and indexing on the table first. Index hints can cause problems. If the data in the tables changes the optimizer will be unable to choose a more efficent plan since you have forced it to always use a hash.

小清晰的声音 2024-07-17 15:17:03

嵌套循环不是最合适的吗? 从表 3 中获取 12 条记录,与表 1 中的 12 条记录匹配,与表 2 中的 12 条记录匹配。

否则,您的散列连接也会强制排序 - 这意味着您将从表 1 和表中散列 100 万条记录2,然后连接到表 3 中的 12 条记录。

我会查看这两个计划的统计信息 - 我怀疑循环连接实际上更有效,但被阻止或者您的散列连接正在利用缓存的数据。

但是 - 是的 - 一般来说,连接提示是最后的手段。

Wouldn't a nested loop be the most appropiate? Take the 12 records from Table 3, ,match to the 12 records in Table 1, match to 12 records in Table 2.

Otherwise, your hash join would enforce ordering as well - meaning you'd hash 1 million records from Table 1 and Table 2, then join to the 12 records in Table 3.

I'd look at statistics for both the plans - and I'd suspect the loop join is actually more efficient, but was blocked or your hash join was taking advantage of cached data.

But - yeah - in general, join hints are a last resort.

不忘初心 2024-07-17 15:17:03

涉及链接服务器的运行缓慢的查询可能与排序规则有关。
请参阅此处了解一些背景信息: http://blogs.msdn.com/psssql/archive/2008/02/14/how-it-works-linked-servers-and-collat​​ion-compatibility.aspx
哈希连接提示强制排序顺序,因此可以解释性能的提升。

设置选项的方法如下:

EXEC master.dbo.sp_serveroption 
    @server=N'databaseA', 
    @optname=N'collation compatible', 
    @optvalue=N'true'

EXEC master.dbo.sp_serveroption 
    @server=N'databaseA', 
    @optname=N'use remote collation', 
    @optvalue=N'false'

-Edoode

A slow-running query involving linked servers might have to do with collation.
See here for some background: http://blogs.msdn.com/psssql/archive/2008/02/14/how-it-works-linked-servers-and-collation-compatibility.aspx
The hash join hint forces the sortorder, so that explains the performance gain.

Here's how to set the options:

EXEC master.dbo.sp_serveroption 
    @server=N'databaseA', 
    @optname=N'collation compatible', 
    @optvalue=N'true'

EXEC master.dbo.sp_serveroption 
    @server=N'databaseA', 
    @optname=N'use remote collation', 
    @optvalue=N'false'

-Edoode

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文