Query is super slow... what am I doing wrong?

Posted 2024-10-08 04:39:31


You guys are amazing. I've posted here twice in the past couple of days - a new user - and I've been blown away by the help. So, I figured I'd take the slowest query I've got in my software and see if anyone can help me speed it up. I use this query as a view, so it's important that it be fast (and it isn't!).

First, I have a contacts table that stores my company's customers. The table has a JobTitle column containing an ID that is defined in the Contacts_Def_JobFunctions table. There is also a table called contacts_link_jobfunctions which holds the ContactID number and any additional job functions the customer has - also defined in the Contacts_Def_JobFunctions table.

Secondly, the Contacts_Def_JobFunctions table records have a parent/child relationship with themselves. In this manner, we cluster similar job functions (for example: maid, laundry service, housekeeping, cleaning, etc. are all the same basic job - while the job title may vary). Job functions which we don't currently work with are maintained as children of ParentJobID 1841.

Third, the institutionswithzipcodesadditional table simply provides geographical data for the final result.

Lastly, like all responsible companies, we maintain a remove list for any of our customers that wish to opt-out of our newsletter (after opting in).

I use the following query to build a table of those people who have opted-in to receive our newsletter and who have a job function or job title relevant to the services/products we offer.

Here's my UGLY query:

SELECT DISTINCT 
    dbo.contacts_link_emails.Email, dbo.contacts.ContactID, dbo.contacts.First AS ContactFirstName, dbo.contacts.Last AS ContactLastName, dbo.contacts.InstitutionID, 
    dbo.institutionswithzipcodesadditional.CountyID, dbo.institutionswithzipcodesadditional.StateID, dbo.institutionswithzipcodesadditional.DistrictID
FROM         
    dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3 
INNER JOIN
    dbo.contacts 
INNER JOIN
    dbo.contacts_link_emails 
        ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID 
        ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle 
INNER JOIN
    dbo.institutionswithzipcodesadditional 
        ON dbo.contacts.InstitutionID = dbo.institutionswithzipcodesadditional.InstitutionID 
LEFT OUTER JOIN
    dbo.contacts_def_jobfunctions 
INNER JOIN
    dbo.contacts_link_jobfunctions 
        ON dbo.contacts_def_jobfunctions.JobID = dbo.contacts_link_jobfunctions.JobID 
        ON dbo.contacts.ContactID = dbo.contacts_link_jobfunctions.ContactID
WHERE     
        (dbo.contacts.JobTitle IN
        (SELECT     JobID
        FROM          dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_1
        WHERE      (ParentJobID <> '1841'))) 
    AND
        (dbo.contacts_link_emails.Email NOT IN
        (SELECT     EmailAddress
        FROM          dbo.newsletterremovelist)) 
OR
        (dbo.contacts_link_jobfunctions.JobID IN
        (SELECT     JobID
        FROM          dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_2
        WHERE      (ParentJobID <> '1841')))
    AND 
        (dbo.contacts_link_emails.Email NOT IN
        (SELECT     EmailAddress
        FROM          dbo.newsletterremovelist AS newsletterremovelist))

I'm hoping some of you superstars can help me tune this up.

Thanks so much,

Russell Schutte

UPDATE - UPDATE - UPDATE - UPDATE - UPDATE

After getting several feedback messages, most notably from Khanzor, I've worked hard on tuning this query and have come up with the following:

SELECT  DISTINCT
                  contacts_link_emails.Email, contacts.ContactID, contacts.First AS ContactFirstName, contacts.Last AS ContactLastName, contacts.InstitutionID, 
                  institutionswithzipcodesadditional.CountyID, institutionswithzipcodesadditional.StateID, institutionswithzipcodesadditional.DistrictID
FROM contacts 
INNER JOIN
    contacts_def_jobfunctions ON contacts.jobtitle = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
INNER JOIN
    contacts_link_jobfunctions ON contacts_link_jobfunctions.JobID = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
INNER JOIN
    contacts_link_emails ON contacts.ContactID = contacts_link_emails.ContactID 
INNER JOIN
    institutionswithzipcodesadditional ON contacts.InstitutionID =  institutionswithzipcodesadditional.InstitutionID
LEFT JOIN
    newsletterremovelist ON newsletterremovelist.emailaddress = contacts_link_emails.email
WHERE    
    newsletterremovelist.emailaddress IS NULL

This isn't quite perfect (I suspect I should have made something an outer join or a right join or something, and I'm not really sure). My result set is about 40% of the records my original query provided (which I'm no longer 100% positive was a perfect query).
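One plausible reason for the smaller result set is the INNER JOIN onto contacts_link_jobfunctions: an inner join silently drops every contact that has no row in the joined table. The sketch below illustrates only that effect. It runs against an in-memory SQLite database from Python purely so it can be executed; the two trimmed-down tables, their sample rows, and the join on ContactID are assumptions made for the demo, not the real schema or the exact join condition used above.

```python
import sqlite3

# Hypothetical, trimmed-down tables: only the columns needed for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (ContactID INTEGER PRIMARY KEY, JobTitle INTEGER);
    CREATE TABLE contacts_link_jobfunctions (ContactID INTEGER, JobID INTEGER);
    INSERT INTO contacts VALUES (1, 10), (2, 10), (3, 10);
    -- Only contact 1 has an additional job function on file.
    INSERT INTO contacts_link_jobfunctions VALUES (1, 20);
""")


def contact_ids(join_type):
    """Return the contact IDs that survive the given join type."""
    sql = f"""
        SELECT DISTINCT c.ContactID
        FROM contacts AS c
        {join_type} JOIN contacts_link_jobfunctions AS ljf
            ON ljf.ContactID = c.ContactID
        ORDER BY c.ContactID
    """
    return [row[0] for row in conn.execute(sql)]


inner_ids = contact_ids("INNER")       # contacts 2 and 3 are dropped
left_ids = contact_ids("LEFT OUTER")   # every contact is kept
print(inner_ids)  # [1]
print(left_ids)   # [1, 2, 3]
```

If most contacts have a job title but no additional job functions, this alone could account for a large share of the missing rows.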

To clean things up, I took out all the "dbo." prefixes that SQL Studio adds. Do they do anything?

What am I doing wrong now?

Thanks,

Russell Schutte

== == == == ==
== ANOTHER UPDATE == ANOTHER UPDATE == ANOTHER UPDATE == ANOTHER UPDATE == ANOTHER UPDATE
== == == == ==

I've been working on this one query for several hours now. I've got it down to this:

SELECT DISTINCT 
                      contacts_link_emails.Email, contacts.contactID,  contacts.First AS ContactFirstName, contacts.Last AS ContactLastName, contacts.InstitutionID, 
                      institutionswithzipcodesadditional.CountyID, institutionswithzipcodesadditional.StateID, institutionswithzipcodesadditional.DistrictID
FROM         
    contacts INNER JOIN institutionswithzipcodesadditional
        ON contacts.InstitutionID = institutionswithzipcodesadditional.InstitutionID
    INNER JOIN contacts_link_emails 
        ON contacts.ContactID = contacts_link_emails.ContactID
    LEFT OUTER JOIN contacts_def_jobfunctions 
        ON contacts.JobTitle = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
    LEFT OUTER JOIN contacts_link_jobfunctions
        ON contacts_link_jobfunctions.JobID = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841' 
    LEFT OUTER JOIN
        newsletterremovelist ON newsletterremovelist.EmailAddress = contacts_link_emails.Email
WHERE     (newsletterremovelist.EmailAddress IS NULL)

Disappointingly, I'm just not able to fill in the gaps in my knowledge. I'm new to joins, except when I have the visual tool build them for me. I'm thinking I want everything from contacts, institutionswithzipcodesadditional, and contacts_link_emails, so I've INNER JOINed them (above).

I am stumped on the next bit. If I INNER JOIN them, then I get people who have the proper jobs (<> 1841) - but I'm thinking I LOSE out on people who don't have an entry for both JobTitle AND JobFunctions. In many cases, this isn't right. I could have a JobTitle "Custodian" which I'd want to keep on our newsletter list, but if he doesn't also have a JobFunction entry, I think he'll fall off the list if I use INNER JOIN.

BUT, if I do the query with LEFT OUTER JOINs, as above, I think I get lots of people with the wrong JobTitles, simply because anyone who is lacking EITHER a JobTitle OR a JobFunction would be ON my list - they could be a "High Level Executive" with no JobFunction, and they'd be on the list - which isn't right. We no longer have services appropriate to "High Level Executives".
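One way out of this dilemma, offered as a sketch rather than a tested fix for the real view, is to keep the LEFT OUTER JOINs but add a WHERE condition requiring that at least one of them actually found a relevant job. The example runs on an in-memory SQLite database from Python so it can be executed; the sample rows, the aliases title_jf, ljf and func_jf, and the integer comparison against 1841 are assumptions. Note that it links contacts_link_jobfunctions to contacts by ContactID, which the query above does not do.

```python
import sqlite3

# Hypothetical sample data; only the columns needed for the filter are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (ContactID INTEGER PRIMARY KEY, JobTitle INTEGER);
    CREATE TABLE contacts_def_jobfunctions (JobID INTEGER PRIMARY KEY, ParentJobID INTEGER);
    CREATE TABLE contacts_link_jobfunctions (ContactID INTEGER, JobID INTEGER);

    -- Jobs 10 and 12 are active; 11 and 13 are retired (children of 1841).
    INSERT INTO contacts_def_jobfunctions VALUES (10, 100), (11, 1841), (12, 100), (13, 1841);
    -- 1: active title, no functions     2: retired title, no functions
    -- 3: retired title, active function 4: retired title, retired function
    -- 5: no title at all
    INSERT INTO contacts VALUES (1, 10), (2, 11), (3, 11), (4, 11), (5, NULL);
    INSERT INTO contacts_link_jobfunctions VALUES (3, 12), (4, 13);
""")

keep_sql = """
    SELECT DISTINCT c.ContactID
    FROM contacts AS c
    LEFT OUTER JOIN contacts_def_jobfunctions AS title_jf
        ON title_jf.JobID = c.JobTitle AND title_jf.ParentJobID <> 1841
    LEFT OUTER JOIN contacts_link_jobfunctions AS ljf
        ON ljf.ContactID = c.ContactID
    LEFT OUTER JOIN contacts_def_jobfunctions AS func_jf
        ON func_jf.JobID = ljf.JobID AND func_jf.ParentJobID <> 1841
    -- Keep a contact only if at least one of the two lookups found a match.
    WHERE title_jf.JobID IS NOT NULL OR func_jf.JobID IS NOT NULL
    ORDER BY c.ContactID
"""
kept_ids = [row[0] for row in conn.execute(keep_sql)]
print(kept_ids)  # [1, 3]
```

The custodian with no job functions (contact 1) stays on the list, while the executive with no relevant function (contact 2) falls off. One caveat: a job whose ParentJobID is NULL never satisfies `<> 1841`, so such rows would need separate handling if they exist.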

Then I see how the LEFT OUTER JOIN works for the newsletterremovelist. It's pretty slick and I think I've done it right...

But I'm still stuck. Hopefully someone can see what I'm trying to do here and steer me in the right direction.

Thanks,

Russell Schutte

UPDATE AGAIN

Sadly, this thread seems to have died, without a perfect solution - but I'm getting close. Please see a new thread started which restarts the discussion: click here

(awarded a correct answer for the massive amount of work provided - even while a correct answer hasn't quite been reached).

Thanks!

Russell Schutte


长不大的小祸害 2024-10-15 04:39:32


Move the subqueries in your WHERE clause out to actual joins. Subqueries in a WHERE clause, and correlated subqueries in particular, are the work of Voldemort. Written as joins, they can often be handled more efficiently by the optimizer, which may speed up your query.

For the NOT IN sections, use a left outer join, and check that the column you joined on is NULL.
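As a minimal sketch of that pattern, the snippet below compares NOT IN with the LEFT JOIN ... IS NULL form on a tiny in-memory SQLite database driven from Python. The sample addresses are made up, and the NULL row in newsletterremovelist is an assumption added to show a known difference between the two forms: when the subquery returns a NULL, NOT IN matches nothing at all.

```python
import sqlite3

# Hypothetical sample data, reduced to the columns the pattern needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts_link_emails (ContactID INTEGER, Email TEXT);
    CREATE TABLE newsletterremovelist (EmailAddress TEXT);
    INSERT INTO contacts_link_emails VALUES
        (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
    -- One real opt-out plus a stray NULL row (assumed, to show the pitfall).
    INSERT INTO newsletterremovelist VALUES ('b@example.com'), (NULL);
""")

not_in_sql = """
    SELECT e.Email
    FROM contacts_link_emails AS e
    WHERE e.Email NOT IN (SELECT EmailAddress FROM newsletterremovelist)
    ORDER BY e.Email
"""
anti_join_sql = """
    SELECT e.Email
    FROM contacts_link_emails AS e
    LEFT JOIN newsletterremovelist AS rl
        ON rl.EmailAddress = e.Email
    -- Rows with no partner in the remove list come back with NULLs on the right.
    WHERE rl.EmailAddress IS NULL
    ORDER BY e.Email
"""
not_in_rows = [row[0] for row in conn.execute(not_in_sql)]
anti_join_rows = [row[0] for row in conn.execute(anti_join_sql)]
print(not_in_rows)     # []
print(anti_join_rows)  # ['a@example.com', 'c@example.com']
```

Without the NULL row both queries return the same two addresses, so the join form is at least as safe as NOT IN here.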

Also, avoid using OR in WHERE clauses where possible - remember that OR is not necessarily a short-circuit operation.
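A common way to sidestep the OR, sketched here under assumptions, is to write each condition as its own query and combine them with UNION, which also removes duplicates. The snippet runs on an in-memory SQLite database from Python; the sample rows and the aliases are hypothetical, and whether this is actually faster than OR on SQL Server depends on the optimizer and the indexes, so it is worth comparing execution plans.

```python
import sqlite3

# Hypothetical sample data; jobs 10 and 12 are active, 11 is a child of 1841.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (ContactID INTEGER PRIMARY KEY, JobTitle INTEGER);
    CREATE TABLE contacts_def_jobfunctions (JobID INTEGER PRIMARY KEY, ParentJobID INTEGER);
    CREATE TABLE contacts_link_jobfunctions (ContactID INTEGER, JobID INTEGER);
    INSERT INTO contacts_def_jobfunctions VALUES (10, 100), (11, 1841), (12, 100);
    INSERT INTO contacts VALUES (1, 10), (2, 11), (3, 11);
    -- Contact 1 qualifies both by title and by function; contact 3 by function only.
    INSERT INTO contacts_link_jobfunctions VALUES (1, 12), (3, 12);
""")

union_sql = """
    -- Branch 1: contacts whose job title is still relevant.
    SELECT c.ContactID
    FROM contacts AS c
    INNER JOIN contacts_def_jobfunctions AS jf
        ON jf.JobID = c.JobTitle AND jf.ParentJobID <> 1841
    UNION
    -- Branch 2: contacts with at least one relevant additional job function.
    SELECT ljf.ContactID
    FROM contacts_link_jobfunctions AS ljf
    INNER JOIN contacts_def_jobfunctions AS jf
        ON jf.JobID = ljf.JobID AND jf.ParentJobID <> 1841
"""
union_ids = sorted(row[0] for row in conn.execute(union_sql))
print(union_ids)  # [1, 3]
```

Contact 1 matches both branches but appears only once, because UNION (unlike UNION ALL) discards duplicate rows.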

An example is as follows:

SELECT 
    *
FROM
    dbo.contacts AS c
INNER JOIN
    dbo.contacts_def_jobfunctions AS jf
    ON c.JobTitle = jf.JobId AND jf.ParentJobID <> '1841'
INNER JOIN
    dbo.contacts_link_emails AS e
    ON c.ContactID = e.ContactID AND jf.JobID = c.JobTitle 
LEFT JOIN
    dbo.newsletterremovelist AS rl
    ON e.Email = rl.EmailAddress
WHERE    
    rl.EmailAddress IS NULL

Please don't use this as-is; it's almost certainly incorrect (not to mention the SELECT *). I've ignored the logic for contacts_def_jobfunctions_3 to provide a simple example.

For a (really) nice explanation of joins, try this visual explanation of joins
