加入时SQL Server性能问题
我有一个查询或一批查询,但我无法很好地发挥作用:)
SELECT
C.ID AS cust_id,
C.State AS cust_state,
C.PrevState AS cust_previous_state,
C.ProjectID AS cust_imported_on_project_id,
C.CampaignID AS cust_imported_on_campaign_id,
C.Priority AS cust_priority,
C.Name AS cust_firstname,
C.Name2 AS cust_lastname,
C.AllocatedUser AS cust_allocated_user,
C.ED1 AS cust_social_security_number,
C.ED2 AS cust_customer_number,
C.ED3 AS cust_type,
C.ED4 AS cust_initial_fact_1,
C.ED5 AS cust_initial_fact_2,
C.ED6 AS cust_initial_fact_3,
C.ED7 AS cust_initial_fact_4,
C.ED8 AS cust_initial_fact_5,
C.ED9 AS cust_extra_1,
C.ED10 AS cust_extra_2,
CED2.ED11 AS cust_extra_3,
CED2.ED12 AS cust_extra_4,
CED2.ED13 AS cust_extra_5,
A.Serial AS address_serial,
A.PostAddress AS address_postal_address,
A.PostCode AS address_postal_code,
A.PostOffice AS address_city,
A.PhoneNr AS address_phonenumber,
A.FaxNr AS address_faxnumber,
A.EMail AS address_email,
A.Notes AS address_notes,
A.ED1 AS address_secondary_phonenumber,
A.ED2 AS address_origin_file,
A.State AS address_state
FROM TCustomers C WITH (NOLOCK)
LEFT JOIN TCustED2 CED2 WITH (NOLOCK) ON C.ID = CED2.CustomerID
LEFT JOIN TAddresses A WITH (NOLOCK) ON C.AddressNr = A.Serial AND C.ID = A.CustomerID
WHERE C.CampaignID IN(SELECT items FROM dbo.Split('196,195,210,206,205,207,204,200,209,213,197,198,214', ',')) AND C.InsDate > '2011-04-03 00:00:00'
这个查询本身非常敏捷,可以在几秒钟内执行超过一百万行。 问题是当我想使用订单表中的信息来过滤结果时。
我尝试加入 Orders 表,
SELECT
C.ID AS cust_id,
C.State AS cust_state,
C.PrevState AS cust_previous_state,
C.ProjectID AS cust_imported_on_project_id,
C.CampaignID AS cust_imported_on_campaign_id,
C.Priority AS cust_priority,
C.Name AS cust_firstname,
C.Name2 AS cust_lastname,
C.AllocatedUser AS cust_allocated_user,
C.ED1 AS cust_social_security_number,
C.ED2 AS cust_customer_number,
C.ED3 AS cust_type,
C.ED4 AS cust_initial_fact_1,
C.ED5 AS cust_initial_fact_2,
C.ED6 AS cust_initial_fact_3,
C.ED7 AS cust_initial_fact_4,
C.ED8 AS cust_initial_fact_5,
C.ED9 AS cust_extra_1,
C.ED10 AS cust_extra_2,
CED2.ED11 AS cust_extra_3,
CED2.ED12 AS cust_extra_4,
CED2.ED13 AS cust_extra_5,
A.Serial AS address_serial,
A.PostAddress AS address_postal_address,
A.PostCode AS address_postal_code,
A.PostOffice AS address_city,
A.PhoneNr AS address_phonenumber,
A.FaxNr AS address_faxnumber,
A.EMail AS address_email,
A.Notes AS address_notes,
A.ED1 AS address_secondary_phonenumber,
A.ED2 AS address_origin_file,
A.State AS address_state
FROM TCustomers C WITH (NOLOCK)
LEFT JOIN TCustED2 CED2 WITH (NOLOCK) ON C.ID = CED2.CustomerID
LEFT JOIN TAddresses A WITH (NOLOCK) ON C.AddressNr = A.Serial AND C.ID = A.CustomerID
LEFT JOIN TOrders O WITH (NOLOCK) ON C.ID = O.CustomerID
WHERE C.CampaignID IN(SELECT items FROM dbo.Split('196,195,210,206,205,207,204,200,209,213,197,198,214', ',')) AND
(C.InsDate > '2011-04-03 00:00:00' OR O.CustomerID NOT NULL OR O.[Date] > '2011-04-03 00:00:00' OR O.Exported IS NULL);
这会导致执行时间为几分钟。 如果我只是从 TOrders
运行有趣的部分,则只需几秒钟即可执行。
SELECT CustomerID
FROM TOrders
WHERE O.CustomerID NOT NULL
OR O.[Date] > '2011-04-03 00:00:00'
OR O.Exported IS NULL;
所以问题在于将两者结合起来。我尝试运行 TOrders 查询并将生成的 CustomerID 直接粘贴到主查询中,这很快,大约需要 5-10 秒。我尝试从 TOrders 中预取有趣的数据并将其放入临时表中,但这并没有使其速度更快。
CREATE TABLE #ORDERCUSTOMERS (
CustomerID int
);
CREATE UNIQUE CLUSTERED INDEX IX_1 on #ORDERCUSTOMERS (CustomerID);
INSERT #ORDERCUSTOMERS SELECT DISTINCT O.CustomerID FROM LPD1_8.dbo.TOrders O WHERE O.CampaignID IN(SELECT items FROM dbo.Split('196,195,210,206,205,207,204,200,209,213,197,198,214', ',')) AND (O.[Date] > '2011-04-03 00:00:00' OR O.Exported IS NULL);
SELECT
C.ID AS cust_id,
C.State AS cust_state,
C.PrevState AS cust_previous_state,
C.ProjectID AS cust_imported_on_project_id,
C.CampaignID AS cust_imported_on_campaign_id,
C.Priority AS cust_priority,
C.Name AS cust_firstname,
C.Name2 AS cust_lastname,
C.AllocatedUser AS cust_allocated_user,
C.ED1 AS cust_social_security_number,
C.ED2 AS cust_customer_number,
C.ED3 AS cust_type,
C.ED4 AS cust_initial_fact_1,
C.ED5 AS cust_initial_fact_2,
C.ED6 AS cust_initial_fact_3,
C.ED7 AS cust_initial_fact_4,
C.ED8 AS cust_initial_fact_5,
C.ED9 AS cust_extra_1,
C.ED10 AS cust_extra_2,
CED2.ED11 AS cust_extra_3,
CED2.ED12 AS cust_extra_4,
CED2.ED13 AS cust_extra_5,
A.Serial AS address_serial,
A.PostAddress AS address_postal_address,
A.PostCode AS address_postal_code,
A.PostOffice AS address_city,
A.PhoneNr AS address_phonenumber,
A.FaxNr AS address_faxnumber,
A.EMail AS address_email,
A.Notes AS address_notes,
A.ED1 AS address_secondary_phonenumber,
A.ED2 AS address_origin_file,
A.State AS address_state
FROM LPD1_8.dbo.TCustomers C WITH (NOLOCK)
LEFT JOIN LPD1_8.dbo.TCustED2 CED2 WITH (NOLOCK) ON C.ID = CED2.CustomerID
LEFT JOIN LPD1_8.dbo.TAddresses A WITH (NOLOCK) ON C.AddressNr = A.Serial AND C.ID = A.CustomerID
LEFT JOIN #ORDERCUSTOMERS O ON C.ID = O.CustomerID
WHERE C.CampaignID IN(SELECT items FROM dbo.Split('196,195,210,206,205,207,204,200,209,213,197,198,214', ',')) AND
(C.InsDate > '2011-04-03 00:00:00' OR O.CustomerID NOT NULL);
DROP TABLE #ORDERCUSTOMERS;
那么你们有什么想法吗?这将驻留在存储过程中,因此我希望尽可能避免动态 SQL。否则,这可能是一种方法,从 TOrders 获取数据并将返回的 CustomerID 注入主查询的 IN 子句中。 但是是否有类似的方法可以将 id 数组“注入”到 IN 子句或其他内容中?
I have a query, or batch of queries, that I just can't get to play nice :)
SELECT
C.ID AS cust_id,
C.State AS cust_state,
C.PrevState AS cust_previous_state,
C.ProjectID AS cust_imported_on_project_id,
C.CampaignID AS cust_imported_on_campaign_id,
C.Priority AS cust_priority,
C.Name AS cust_firstname,
C.Name2 AS cust_lastname,
C.AllocatedUser AS cust_allocated_user,
C.ED1 AS cust_social_security_number,
C.ED2 AS cust_customer_number,
C.ED3 AS cust_type,
C.ED4 AS cust_initial_fact_1,
C.ED5 AS cust_initial_fact_2,
C.ED6 AS cust_initial_fact_3,
C.ED7 AS cust_initial_fact_4,
C.ED8 AS cust_initial_fact_5,
C.ED9 AS cust_extra_1,
C.ED10 AS cust_extra_2,
CED2.ED11 AS cust_extra_3,
CED2.ED12 AS cust_extra_4,
CED2.ED13 AS cust_extra_5,
A.Serial AS address_serial,
A.PostAddress AS address_postal_address,
A.PostCode AS address_postal_code,
A.PostOffice AS address_city,
A.PhoneNr AS address_phonenumber,
A.FaxNr AS address_faxnumber,
A.EMail AS address_email,
A.Notes AS address_notes,
A.ED1 AS address_secondary_phonenumber,
A.ED2 AS address_origin_file,
A.State AS address_state
FROM TCustomers C WITH (NOLOCK)
LEFT JOIN TCustED2 CED2 WITH (NOLOCK) ON C.ID = CED2.CustomerID
LEFT JOIN TAddresses A WITH (NOLOCK) ON C.AddressNr = A.Serial AND C.ID = A.CustomerID
WHERE C.CampaignID IN(SELECT items FROM dbo.Split('196,195,210,206,205,207,204,200,209,213,197,198,214', ',')) AND C.InsDate > '2011-04-03 00:00:00'
This query in itself is very snappy and executes on a million+ rows in a few seconds.
The problem is when I want to use information from the order table to filter my result.
I have tried to join on the Orders table
SELECT
C.ID AS cust_id,
C.State AS cust_state,
C.PrevState AS cust_previous_state,
C.ProjectID AS cust_imported_on_project_id,
C.CampaignID AS cust_imported_on_campaign_id,
C.Priority AS cust_priority,
C.Name AS cust_firstname,
C.Name2 AS cust_lastname,
C.AllocatedUser AS cust_allocated_user,
C.ED1 AS cust_social_security_number,
C.ED2 AS cust_customer_number,
C.ED3 AS cust_type,
C.ED4 AS cust_initial_fact_1,
C.ED5 AS cust_initial_fact_2,
C.ED6 AS cust_initial_fact_3,
C.ED7 AS cust_initial_fact_4,
C.ED8 AS cust_initial_fact_5,
C.ED9 AS cust_extra_1,
C.ED10 AS cust_extra_2,
CED2.ED11 AS cust_extra_3,
CED2.ED12 AS cust_extra_4,
CED2.ED13 AS cust_extra_5,
A.Serial AS address_serial,
A.PostAddress AS address_postal_address,
A.PostCode AS address_postal_code,
A.PostOffice AS address_city,
A.PhoneNr AS address_phonenumber,
A.FaxNr AS address_faxnumber,
A.EMail AS address_email,
A.Notes AS address_notes,
A.ED1 AS address_secondary_phonenumber,
A.ED2 AS address_origin_file,
A.State AS address_state
FROM TCustomers C WITH (NOLOCK)
LEFT JOIN TCustED2 CED2 WITH (NOLOCK) ON C.ID = CED2.CustomerID
LEFT JOIN TAddresses A WITH (NOLOCK) ON C.AddressNr = A.Serial AND C.ID = A.CustomerID
LEFT JOIN TOrders O WITH (NOLOCK) ON C.ID = O.CustomerID
WHERE C.CampaignID IN(SELECT items FROM dbo.Split('196,195,210,206,205,207,204,200,209,213,197,198,214', ',')) AND
(C.InsDate > '2011-04-03 00:00:00' OR O.CustomerID NOT NULL OR O.[Date] > '2011-04-03 00:00:00' OR O.Exported IS NULL);
This results in an execution time of several minutes.
If I just run the interesting part from TOrders
it just takes a few seconds to execute.
SELECT CustomerID
FROM TOrders
WHERE O.CustomerID NOT NULL
OR O.[Date] > '2011-04-03 00:00:00'
OR O.Exported IS NULL;
So the problem is when combining the two. I tried running the TOrders query and pasting the resulting CustomerID's into the main query directly and that was snappy and took around 5-10 seconds. I tried prefetching the interesting data from TOrders and putting it into a temporary table but that didn't make it any faster.
CREATE TABLE #ORDERCUSTOMERS (
CustomerID int
);
CREATE UNIQUE CLUSTERED INDEX IX_1 on #ORDERCUSTOMERS (CustomerID);
INSERT #ORDERCUSTOMERS SELECT DISTINCT O.CustomerID FROM LPD1_8.dbo.TOrders O WHERE O.CampaignID IN(SELECT items FROM dbo.Split('196,195,210,206,205,207,204,200,209,213,197,198,214', ',')) AND (O.[Date] > '2011-04-03 00:00:00' OR O.Exported IS NULL);
SELECT
C.ID AS cust_id,
C.State AS cust_state,
C.PrevState AS cust_previous_state,
C.ProjectID AS cust_imported_on_project_id,
C.CampaignID AS cust_imported_on_campaign_id,
C.Priority AS cust_priority,
C.Name AS cust_firstname,
C.Name2 AS cust_lastname,
C.AllocatedUser AS cust_allocated_user,
C.ED1 AS cust_social_security_number,
C.ED2 AS cust_customer_number,
C.ED3 AS cust_type,
C.ED4 AS cust_initial_fact_1,
C.ED5 AS cust_initial_fact_2,
C.ED6 AS cust_initial_fact_3,
C.ED7 AS cust_initial_fact_4,
C.ED8 AS cust_initial_fact_5,
C.ED9 AS cust_extra_1,
C.ED10 AS cust_extra_2,
CED2.ED11 AS cust_extra_3,
CED2.ED12 AS cust_extra_4,
CED2.ED13 AS cust_extra_5,
A.Serial AS address_serial,
A.PostAddress AS address_postal_address,
A.PostCode AS address_postal_code,
A.PostOffice AS address_city,
A.PhoneNr AS address_phonenumber,
A.FaxNr AS address_faxnumber,
A.EMail AS address_email,
A.Notes AS address_notes,
A.ED1 AS address_secondary_phonenumber,
A.ED2 AS address_origin_file,
A.State AS address_state
FROM LPD1_8.dbo.TCustomers C WITH (NOLOCK)
LEFT JOIN LPD1_8.dbo.TCustED2 CED2 WITH (NOLOCK) ON C.ID = CED2.CustomerID
LEFT JOIN LPD1_8.dbo.TAddresses A WITH (NOLOCK) ON C.AddressNr = A.Serial AND C.ID = A.CustomerID
LEFT JOIN #ORDERCUSTOMERS O ON C.ID = O.CustomerID
WHERE C.CampaignID IN(SELECT items FROM dbo.Split('196,195,210,206,205,207,204,200,209,213,197,198,214', ',')) AND
(C.InsDate > '2011-04-03 00:00:00' OR O.CustomerID NOT NULL);
DROP TABLE #ORDERCUSTOMERS;
So do you guys have any idea? This will reside in a stored procedure so I want to avoid dynamic SQL if possible. Otherwise that would probably be one way, to fetch the data from TOrders and inject the returned CustomerID's into the IN clause of the main query.
But prehaps there is a similar way to "inject" an array of id's into an IN clause or something?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
您在 where 子句中使用订单表。这意味着查询计划程序无法尽早开始从客户表中删除行。您是否尝试过早些时候加入订单表?
you are using the order table in your where clause. this means that the query planner can't start removing lines from the customer table early enough. have you tried joining in the order table earlier?
您是否尝试过使用内置的 SQL Server 查询优化器工具来分析查询?
Have you tried analyzing your query using the built-in SQL Server Query Optimizer tool?
坏消息。您希望动态地进行类似的事情,因为您至少希望在 SP 上“重新编译”,或者您的第一次运行决定您的查询计划。
也就是说,这看起来是一项昂贵的操作。更多详细信息取决于查询计划。也许你有msisan索引或两个?请公布执行计划(假设的和执行的)。
BAD news. You want dynamic on soemthing like that in that you at least want "with recompile" on the SP, or your first run decides your query plan.
That said, it looks like an expensive oepration. More details depend on the query plan. Maybe you msisan index or two? Please publish the execution plans (assumed & from execution).
查询优化器的方式并不总是可预测的。假设您在要连接/过滤的所有列上都有索引,我会尝试以下操作:
看起来多余 - 当您在连接条件中使用该列时,它应该始终评估为 TRUE。这应该意味着查询返回每条记录,无论日期如何。
另外,您可能会发现替换
为
可以加快速度。我发现函数求值可以以不明显的方式改变查询优化器的工作方式。
The ways of the query optimizer are not always predictable. Assuming you have indices on all columns you're joining/filtering on, I'd try the following:
appears redundant - as you're using that column in the join condition, it should always evaluate to TRUE. This should mean that the query returns every record, regardless of the dates.
Also, you may find that replacing
with
could speed things up. I've found that function evaluation can change the way the query optimizer works in ways that aren't obvious.