T-SQL - performing a join on a delimited column - performance and optimisation issues
I have the following (slightly simplified in the columns returned) query.
select Products.Product, Products.ID, Products.Customers
from Products
where Products.orderCompleteDate is null
This would return, as an example
productA 1 Bob
productA 1 Jane
productB 2 John,Dave
Note that Customers can be a comma-delimited list. What I want to add is a 'Customer Locations' column, so that the above becomes
productA 1 Bob Ireland
productA 1 Jane Wales
productB 2 John,Dave Scotland,England
I created the function below, where fn_Split returns one row per delimited item.
CREATE FUNCTION [dbo].[GetLocations] (@CustomerNames varchar(256))
RETURNS @TempLocations TABLE (CustomerLocations varchar(256)) AS
BEGIN
    DECLARE @NameStr varchar(256)
    DECLARE @temp TABLE (singleLoc varchar(256))

    -- One row per customer found in the delimited list, holding that customer's location
    INSERT INTO @temp
    SELECT CustomerLocation.Location
    FROM CustomerLocation
    INNER JOIN Customers ON Customers.ID = CustomerLocation.ID
    INNER JOIN dbo.fn_Split(@CustomerNames, ',') split ON split.Item = Customers.Name

    -- Concatenate the locations back into a single comma-delimited string
    SELECT @NameStr = COALESCE(@NameStr + ',', '') + singleLoc
    FROM @temp

    INSERT INTO @TempLocations VALUES (@NameStr)
    RETURN
END
I then applied it to the original query as follows:
select Products.Product, Products.ID, Products.Customers, Locations.CustomerLocations
from Products
OUTER APPLY dbo.GetLocations(Products.Customers) AS Locations
where Products.orderCompleteDate is null
However, this is extremely slow: the query takes ~10 seconds on a table with a mere 2000 rows (the initial query runs almost instantly). This suggests that the query could not be optimised and that the function is being evaluated row by row. I stayed away from scalar-valued functions for this reason, and tried to stick to table-valued functions. Is there any glaring fault in my logic/code?
Comments (2)
I'd normally suggest creating a view over the unnormalized table that does the normalization, and then using that view as the basis for any future queries. Unfortunately, I can't identify a PK for your current Products table, but you'd hopefully create this view using schemabinding, and hopefully be able to turn it into an indexed view (indexing on PK + customer name). Querying this view (using Enterprise Edition, or the NOEXPAND option) should then give you performance comparable to what you'd get if the normalized table existed.
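As a rough sketch of what such a view might look like (not tested against your schema): indexed views cannot contain APPLY or table-valued functions, so this version splits the list by joining to a numbers table instead of calling fn_Split. The column ProductKey (a unique key on Products), the table dbo.Numbers (integers 1 to 256) and the view and index names are all assumptions made for illustration; the Customers/CustomerLocation join is copied from the question.

```sql
-- Assumptions (not in the original post):
--   dbo.Products has a unique key column ProductKey
--   dbo.Numbers(Number int PRIMARY KEY) holds the integers 1..256
-- Indexed views also need specific SET options (e.g. QUOTED_IDENTIFIER ON, ANSI_NULLS ON).
CREATE VIEW dbo.vProductCustomer
WITH SCHEMABINDING
AS
SELECT  p.ProductKey,
        n.Number AS StartPos,   -- position where this name starts in the list
        SUBSTRING(p.Customers, n.Number,
                  CHARINDEX(',', p.Customers + ',', n.Number) - n.Number) AS CustomerName
FROM    dbo.Products AS p
JOIN    dbo.Numbers  AS n
        ON  n.Number <= LEN(p.Customers)
        -- a name starts wherever the previous character is a comma
        -- (or at position 1, thanks to the prepended comma)
        AND SUBSTRING(',' + p.Customers, n.Number, 1) = ',';
GO

-- The first index on a view must be unique and clustered.
-- (ProductKey, StartPos) is unique even if a name repeats within one list.
CREATE UNIQUE CLUSTERED INDEX IX_vProductCustomer
    ON dbo.vProductCustomer (ProductKey, StartPos);
GO

-- One row per product/customer pair; NOEXPAND asks the optimizer to use
-- the view's index rather than expanding the view definition.
SELECT  pc.ProductKey, pc.CustomerName, cl.Location
FROM    dbo.vProductCustomer AS pc WITH (NOEXPAND)
JOIN    dbo.Customers        AS c  ON c.Name = pc.CustomerName
JOIN    dbo.CustomerLocation AS cl ON cl.ID  = c.ID;
```

SQL Server may still refuse to index the view if some other indexed-view restriction applies to your tables, in which case the plain schema-bound view is still usable, just without the stored index.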
One option would be to create a second table that normalises the product table, kept in sync by triggers that call the split function when rows are inserted.
Pros: you get standard performance and simple SQL queries.
Cons: the tables can go out of sync should anything go wrong (you can always schedule a job to rebuild the new table from scratch periodically).
Obviously the best answer would be to redesign the product table, but I assume that's not possible for you, given that you're messing with split functions etc.
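A minimal sketch of that idea, assuming Products has a unique key column (called ProductKey here, which is not in the original post) and that fn_Split returns a column named Item as in the question. The table and trigger names are made up for illustration.

```sql
-- Hypothetical normalised table: one row per product/customer pair.
CREATE TABLE dbo.ProductCustomerMap (
    ProductKey   int          NOT NULL,  -- assumed unique key of dbo.Products
    CustomerName varchar(256) NOT NULL
);
GO

CREATE TRIGGER dbo.trg_Products_SyncCustomers
ON dbo.Products
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Drop the old pairs for every product that was updated or deleted.
    DELETE m
    FROM   dbo.ProductCustomerMap AS m
    WHERE  m.ProductKey IN (SELECT d.ProductKey FROM deleted AS d);

    -- Re-split the delimited list for every product that was inserted or updated.
    INSERT INTO dbo.ProductCustomerMap (ProductKey, CustomerName)
    SELECT i.ProductKey, s.Item
    FROM   inserted AS i
    CROSS APPLY dbo.fn_Split(i.Customers, ',') AS s;
END;
GO
```

The split function still runs, but only for the rows touched by each insert or update rather than for every row on every read. Queries can then join dbo.ProductCustomerMap to Customers and CustomerLocation with ordinary joins.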