查询超慢...我做错了什么?
你们太棒了。在过去的几天里,我已经在这里发帖两次(作为一个新用户),我对这些帮助感到震惊。因此,我想我应该采用软件中最慢的查询,看看是否有人可以帮助我加快速度。我使用此查询作为视图,因此速度快很重要(但事实并非如此!)。
首先,我有一个联系人表,用于存储我公司的客户。该表中有一个 JobTitle 列,其中包含在 Contacts_Def_JobFunctions 表中定义的 ID。还有一个名为 contacts_link_job_functions 的表,其中包含客户的 contactID 号和其他工作职能 - 也在 Contacts_Def_JobFunctions 表中定义。
其次,Contacts_Def_JobFunctions 表记录与其自身具有父/子关系。通过这种方式,我们将相似的工作职能聚集在一起(例如:女佣、洗衣服务、家政、清洁等都是相同的基本工作 - 尽管职位名称可能有所不同)。我们当前不使用的工作职能将作为 ParentJobID 1841 的子项进行维护。
第三,具有额外邮政编码的机构只是为最终结果提供地理数据。
最后,像所有负责任的公司一样,我们为任何希望选择退出我们的新闻通讯(选择加入后)的客户保留了一份删除列表。
我使用以下查询来构建一个表,其中包含已选择接收我们的新闻通讯以及具有与我们提供的服务/产品相关的工作职能或职位的人员。
这是我的丑陋的查询:
SELECT DISTINCT
dbo.contacts_link_emails.Email, dbo.contacts.ContactID, dbo.contacts.First AS ContactFirstName, dbo.contacts.Last AS ContactLastName, dbo.contacts.InstitutionID,
dbo.institutionswithzipcodesadditional.CountyID, dbo.institutionswithzipcodesadditional.StateID, dbo.institutionswithzipcodesadditional.DistrictID
FROM
dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3
INNER JOIN
dbo.contacts
INNER JOIN
dbo.contacts_link_emails
ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID
ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle
INNER JOIN
dbo.institutionswithzipcodesadditional
ON dbo.contacts.InstitutionID = dbo.institutionswithzipcodesadditional.InstitutionID
LEFT OUTER JOIN
dbo.contacts_def_jobfunctions
INNER JOIN
dbo.contacts_link_jobfunctions
ON dbo.contacts_def_jobfunctions.JobID = dbo.contacts_link_jobfunctions.JobID
ON dbo.contacts.ContactID = dbo.contacts_link_jobfunctions.ContactID
WHERE
(dbo.contacts.JobTitle IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_1
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist))
OR
(dbo.contacts_link_jobfunctions.JobID IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_2
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist AS newsletterremovelist))
我希望你们中的一些超级明星可以帮助我调整这个。
非常感谢,
Russell Schutte
更新 - 更新 - 更新 - 更新 - 更新
在收到几条反馈消息(尤其是来自 Khanzor 的反馈消息)后,我努力调整此查询,并提出了以下建议:
SELECT DISTINCT
contacts_link_emails.Email, contacts.ContactID, contacts.First AS ContactFirstName, contacts.Last AS ContactLastName, contacts.InstitutionID,
institutionswithzipcodesadditional.CountyID, institutionswithzipcodesadditional.StateID, institutionswithzipcodesadditional.DistrictID
FROM contacts
INNER JOIN
contacts_def_jobfunctions ON contacts.jobtitle = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
INNER JOIN
contacts_link_jobfunctions ON contacts_link_jobfunctions.JobID = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
INNER JOIN
contacts_link_emails ON contacts.ContactID = contacts_link_emails.ContactID
INNER JOIN
institutionswithzipcodesadditional ON contacts.InstitutionID = institutionswithzipcodesadditional.InstitutionID
LEFT JOIN
newsletterremovelist ON newsletterremovelist.emailaddress = contacts_link_emails.email
WHERE
newsletterremovelist.emailaddress IS NULL
这并不十分完美(我怀疑我应该做一些外连接或右连接或其他东西,我不太确定)。我的结果集大约是我原始查询提供的记录的 40%(我不再 100% 肯定这是一个完美的查询)。
为了清理一切,我取出了所有“dbo”。 SQL Studio 添加的前缀。他们有做任何事情吗?
我现在做错了什么?
谢谢,
拉塞尔·舒特
== == == == == == 另一个更新 == 另一个更新 == 另一个更新 == 另一个更新 == 另一个更新 == == == == ==
我已经在这个查询上工作了几个小时了。我的结论是:
SELECT DISTINCT
contacts_link_emails.Email, contacts.contactID, contacts.First AS ContactFirstName, contacts.Last AS ContactLastName, contacts.InstitutionID,
institutionswithzipcodesadditional.CountyID, institutionswithzipcodesadditional.StateID, institutionswithzipcodesadditional.DistrictID
FROM
contacts INNER JOIN institutionswithzipcodesadditional
ON contacts.InstitutionID = institutionswithzipcodesadditional.InstitutionID
INNER JOIN contacts_link_emails
ON contacts.ContactID = contacts_link_emails.ContactID
LEFT OUTER JOIN contacts_def_jobfunctions
ON contacts.JobTitle = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
LEFT OUTER JOIN contacts_link_jobfunctions
ON contacts_link_jobfunctions.JobID = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
LEFT OUTER JOIN
newsletterremovelist ON newsletterremovelist.EmailAddress = contacts_link_emails.Email
WHERE (newsletterremovelist.EmailAddress IS NULL)
令人失望的是,我无法填补我知识上的空白。我是加入新手,除非我有可视化工具为我构建它们,所以我想我想要联系人、带有邮政编码的机构和附加的联系人链接电子邮件中的所有内容,所以我已经内部加入了它们(上图)。
我对下一点感到困惑。如果我内部加入他们,那么我会找到拥有适当工作的人(<> 1841) - 但我想我会失去那些没有 JobTitle 和 JobFunctions 条目的人。在很多情况下,这是不对的。我可以有一个 JobTitle“Custodian”,我想将其保留在我们的时事通讯列表中,但如果他没有 JobFunction 条目,我想如果我使用 INNER JOIN,他就会从列表中消失。
但是,如果我使用 LEFT OUTER JOIN 进行查询,如上所述,我想我会得到很多人的 JobTitles 错误,仅仅是因为任何缺少 JobTitle 或 JobFunction 的人都会在我的列表中 - 他们可能是“高级管理人员”,没有工作职能,他们就会出现在名单上 - 这是不对的。我们不再提供适合“高级管理人员”的服务。
然后我看到 LEFT OUTER JOIN 如何用于 newsletterremovelist。它非常光滑,我认为我做得对......
但我仍然卡住了。希望有人能看到我在这里想做的事情并引导我走向正确的方向。
谢谢,
Russell Schutte
再次更新
可悲的是,这个线程似乎已经死了,没有一个完美的解决方案 - 但我已经接近了。请查看启动的新线程,该线程重新启动讨论:单击这里
(为所提供的大量工作提供了正确答案 - 即使尚未完全达到正确答案)。
谢谢!
拉塞尔·舒特
You guys are amazing. I've posted here twice in the past couple of days - a new user - and I've been blown away by the help. So, I figured I'd take the slowest query I've got in my software and see if anyone can help me speed it up. I use this query as a view, so it's important that it be fast (and it isn't!).
First, I have a Contacts Table that store my company's customers. In the table is a JobTitle column which contains an ID which is defined in the Contacts_Def_JobFunctions table. There is also a table called contacts_link_job_functions which holds the contactID number and additional job functions the customer has - also defined in the Contacts_Def_JobFunctions table.
Secondly, the Contacts_Def_JobFunctions table records have a parent/child relationship with themselves. In this manner, we cluster similar job functions (for example: maid, laundry service, housekeeping, cleaning, etc. are all the same basic job - while the job title may vary). Job functions which we don't currently work with are maintained as children of ParentJobID 1841.
Third, the institutionswithzipcodesadditional simply provides geographical data to the final result.
Lastly, like all responsible companies, we maintain a remove list for any of our customers that wish to opt-out of our newsletter (after opting in).
I use the following query to build a table of those people who have opted-in to receive our newsletter and who have a job function or job title relevant to the services/products we offer.
Here's my UGLY query:
SELECT DISTINCT
dbo.contacts_link_emails.Email, dbo.contacts.ContactID, dbo.contacts.First AS ContactFirstName, dbo.contacts.Last AS ContactLastName, dbo.contacts.InstitutionID,
dbo.institutionswithzipcodesadditional.CountyID, dbo.institutionswithzipcodesadditional.StateID, dbo.institutionswithzipcodesadditional.DistrictID
FROM
dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3
INNER JOIN
dbo.contacts
INNER JOIN
dbo.contacts_link_emails
ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID
ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle
INNER JOIN
dbo.institutionswithzipcodesadditional
ON dbo.contacts.InstitutionID = dbo.institutionswithzipcodesadditional.InstitutionID
LEFT OUTER JOIN
dbo.contacts_def_jobfunctions
INNER JOIN
dbo.contacts_link_jobfunctions
ON dbo.contacts_def_jobfunctions.JobID = dbo.contacts_link_jobfunctions.JobID
ON dbo.contacts.ContactID = dbo.contacts_link_jobfunctions.ContactID
WHERE
(dbo.contacts.JobTitle IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_1
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist))
OR
(dbo.contacts_link_jobfunctions.JobID IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_2
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist AS newsletterremovelist))
I'm hoping some of you superstars can help me tune this up.
Thanks so much,
Russell Schutte
UPDATE - UPDATE - UPDATE - UPDATE - UPDATE
After getting several feedback messages, most notably from Khanzor, I've worked hard on tuning this query and have come up with the following:
SELECT DISTINCT
contacts_link_emails.Email, contacts.ContactID, contacts.First AS ContactFirstName, contacts.Last AS ContactLastName, contacts.InstitutionID,
institutionswithzipcodesadditional.CountyID, institutionswithzipcodesadditional.StateID, institutionswithzipcodesadditional.DistrictID
FROM contacts
INNER JOIN
contacts_def_jobfunctions ON contacts.jobtitle = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
INNER JOIN
contacts_link_jobfunctions ON contacts_link_jobfunctions.JobID = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
INNER JOIN
contacts_link_emails ON contacts.ContactID = contacts_link_emails.ContactID
INNER JOIN
institutionswithzipcodesadditional ON contacts.InstitutionID = institutionswithzipcodesadditional.InstitutionID
LEFT JOIN
newsletterremovelist ON newsletterremovelist.emailaddress = contacts_link_emails.email
WHERE
newsletterremovelist.emailaddress IS NULL
This isn't quite perfect (I suspect I should have made something an outer join or a right join or something, and I'm not really sure). My result set is about 40% of the records my original query provided (which I'm no longer 100% positive was a perfect query).
To clean things up, I took out all the "dbo." prefixes that SQL Studio adds. Do they do anything?
What am I doing wrong now?
Thanks,
Russell Schutte
== == == == ==
== ANOTHER UPDATE == ANOTHER UPDATE == ANOTHER UPDATE == ANOTHER UPDATE == ANOTHER UPDATE
== == == == ==
I've been working on this one query for several hours now. I've got it down to this:
SELECT DISTINCT
contacts_link_emails.Email, contacts.contactID, contacts.First AS ContactFirstName, contacts.Last AS ContactLastName, contacts.InstitutionID,
institutionswithzipcodesadditional.CountyID, institutionswithzipcodesadditional.StateID, institutionswithzipcodesadditional.DistrictID
FROM
contacts INNER JOIN institutionswithzipcodesadditional
ON contacts.InstitutionID = institutionswithzipcodesadditional.InstitutionID
INNER JOIN contacts_link_emails
ON contacts.ContactID = contacts_link_emails.ContactID
LEFT OUTER JOIN contacts_def_jobfunctions
ON contacts.JobTitle = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
LEFT OUTER JOIN contacts_link_jobfunctions
ON contacts_link_jobfunctions.JobID = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
LEFT OUTER JOIN
newsletterremovelist ON newsletterremovelist.EmailAddress = contacts_link_emails.Email
WHERE (newsletterremovelist.EmailAddress IS NULL)
Disappointingly, I'm just not able to fill in the gaps in my knowledge. I'm new to joins, except when I have the visual tool build them for me, so I'm thinking I want everything from contacts, institutionswithzipcodesadditional, and contacts_link_emails, so I've INNER JOINed them (above).
I am stumped on the next bit. If I INNER JOIN them, then I get people who have the proper jobs (<> 1841) - but I'm thinking I LOSE out on people who don't have an entry for both JobTitle AND JobFunctions. In many cases, this isn't right. I could have a JobTitle "Custodian" which I'd want to keep on our newsletter list, but if he doesn't also have a JobFunction entry, I think he'll fall off the list if I use INNER JOIN.
BUT, if I do the query with LEFT OUTER JOINs, as above, I think I get lots of people with the wrong JobTitles, simply because anyone who is lacking EITHER a JobTitle OR a JobFunction would be ON my list - they could be a "High Level Executive" with no JobFunction, and they'd be on the list - which isn't right. We no longer have services appropriate to "High Level Executives".
Then I see how the LEFT OUTER JOIN works for the newsletterremovelist. It's pretty slick and I think I've done it right...
But I'm still stuck. Hopefully someone can see what I'm trying to do here and steer me in the right direction.
Thanks,
Russell Schutte
UPDATE AGAIN
Sadly, this thread seems to have died, without a perfect solution - but I'm getting close. Please see a new thread started which restarts the discussion: click here
(awarded a correct answer for the massive amount of work provided - even while a correct answer hasn't quite been reached).
Thanks!
Russell Schutte
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
将
WHERE
中的查询移出到实际的联接中。这些被称为相关子查询,是 Voldemort 的作品。如果它们是联接,则它们仅执行一次,并且会加快查询速度。对于
NOT IN
部分,使用左外连接,并检查您连接的列是否为NULL
。另外,尽可能避免在
WHERE
查询中使用OR
- 请记住OR
不一定是短路操作。一个例子如下:
请不要使用这个,因为它几乎肯定是不正确的(更不用说
SELECT *
),我忽略了contacts_ref_jobfunctions_3的逻辑来提供一个简单的例子。有关连接的(非常)好的解释,请尝试 这个连接的视觉解释
Move the queries in your
WHERE
out to actual joins. These are called correlated subqueries, and are the work of the Voldemort. If they are joins, they are only executed once, and will speed up your query.For the
NOT IN
sections, use a left outer join, and check that the column you joined on isNULL
.Also, avoid using
OR
inWHERE
queries where possible - remember thatOR
is not neccesarily a short circuit operation.An example is as follows:
Please don't use this, as it's almost certainly incorrect (not to mention
SELECT *
), I've ignored the logic for contacts_ref_jobfunctions_3 to provide a simple example.For a (really) nice explanation of joins, try this visual explanation of joins
创建一些表示您所做的一些常见关联的视图,以便您的子查询更简单。此外,视图执行得更快一些,因为它们不需要每次运行时都进行解释。
Create some views representing some common associations that you make so that your sub-query is simpler. Also views execute a bit quicker as they do not need to be interpreted each time they are run.
它可以是任意数量的事情。我的第一个问题是您要加入的列是否已编入索引?
更好的是,做一个
SHOWPLAN
并将其粘贴到您的问题中。It could be any number of things. My first question is are the columns you're joining on indexed?
Better yet, do a
SHOWPLAN
and paste it into your question.