Complex SQL query: finding items that match multiple different foreign keys

Posted on 2024-09-28 20:21:29

So imagine that you have a table of Products (ID int, Name nvarchar(200)), and two other tables, ProductsCategories (ProductID int, CategoryID int) and InvoiceProducts (InvoiceID int, ProductID int).

I need to write a query to produce a set of products that match a given set of invoice ids and category ids such that the list of products match all the specified categories and all the specified invoices, without falling back to dynamic SQL. Imagine I need to find a list of products that are in both categories 1 and 2 and in invoices 3 and 4.

As a start, I've written a stored procedure that accepts the category ids and invoice ids as strings and parses them into tables:

 CREATE PROCEDURE dbo.SearchProducts (@categories varchar(max), @invoices varchar(max))
 AS BEGIN
      -- every CTE column needs a name, hence the [value] alias
      with catids as (select cast([value] as int) as [value] from dbo.split(@categories, ' ')),
           invoiceids as (select cast([value] as int) as [value] from dbo.split(@invoices, ' '))
           select * from products --- insert awesomeness here
 END

The different solutions I've come up with look awful and perform even worse. The best approach I've found is a view composed of left joins over all the criteria, but that seems very expensive and doesn't solve the problem of matching all of the specified keys.

Update: This is an example query I wrote that yields the expected results. Am I missing any optimization opportunities? Like magical unicorn matrix operations by ninjas?

with catids as (select distinct cast([value] as int) [value] from dbo.split(@categories, ' ')),
  invoiceids as (select distinct cast([value] as int) [value] from dbo.split(@invoices, ' '))

  select pc.ProductID from ProductsCategories pc (nolock)
    inner join catids c on c.value = pc.CategoryID 
    group by pc.ProductID 
    having COUNT(*) = (select COUNT(*) from catids)  
  intersect
  select ip.ProductID from InvoiceProducts ip (nolock)
    inner join invoiceids i on i.value = ip.InvoiceID 
    group by ip.ProductID 
    having COUNT(*) = (select COUNT(*) from invoiceids)

愛放△進行李 2024-10-05 20:21:29

Provided that you have unique indices on both (ProductID, CategoryID) and (ProductID, InvoiceID):

SELECT  ProductID
FROM    (
        SELECT  ProductID
        FROM    InvoiceProducts
        WHERE   InvoiceID IN (1, 2)
        UNION ALL
        SELECT  ProductID
        FROM    ProductsCategories pc
        WHERE   CategoryID IN (3, 4)
        ) q
GROUP BY
        ProductID
HAVING  COUNT(*) = 4

or, if your values are passed in CSV strings:

WITH    catids(value) AS
        (
        SELECT  DISTINCT CAST([value] AS INT)
        FROM    dbo.split(@categories, ' ')
        ), 
        invoiceids(value) AS
        (
        SELECT  DISTINCT CAST([value] AS INT)
        FROM    dbo.split(@invoices, ' ')
        )
SELECT  ProductID
FROM    (
        SELECT  ProductID
        FROM    InvoiceProducts
        WHERE   InvoiceID IN
                (
                SELECT  value
                FROM    invoiceids
                )
        UNION ALL
        SELECT  ProductID
        FROM    ProductsCategories pc
        WHERE   CategoryID IN
                (
                SELECT  value
                FROM    catids
                )
        ) q
GROUP BY
        ProductID
HAVING  COUNT(*) = 
        (
        SELECT  COUNT(*)
        FROM    catids
        ) + 
        (
        SELECT  COUNT(*)
        FROM    invoiceids
        )

Note that in SQL Server 2008 and later you can pass table-valued parameters into stored procedures.
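
As a rough sketch of the table-valued parameter route, the procedure could take two ID tables instead of delimited strings. The type name `dbo.IntList` and the procedure name `dbo.SearchProductsTvp` are made up for illustration; table and column names follow the question, and the unique indices mentioned at the top of this answer are still assumed.

```sql
-- Hypothetical table type; dbo.IntList is an assumed name, not from the question.
-- The primary key keeps the ID list free of duplicates.
CREATE TYPE dbo.IntList AS TABLE (value int NOT NULL PRIMARY KEY);
GO

CREATE PROCEDURE dbo.SearchProductsTvp
    @categories dbo.IntList READONLY,  -- table-valued parameters must be READONLY
    @invoices   dbo.IntList READONLY
AS
BEGIN
    SELECT  ProductID
    FROM    (
            SELECT  ProductID
            FROM    InvoiceProducts
            WHERE   InvoiceID IN (SELECT value FROM @invoices)
            UNION ALL
            SELECT  ProductID
            FROM    ProductsCategories
            WHERE   CategoryID IN (SELECT value FROM @categories)
            ) q
    GROUP BY
            ProductID
    -- a product qualifies only if it matched every requested ID
    HAVING  COUNT(*) = (SELECT COUNT(*) FROM @categories)
                     + (SELECT COUNT(*) FROM @invoices);
END
GO

-- Usage: categories 1 and 2, invoices 3 and 4
DECLARE @c dbo.IntList, @i dbo.IntList;
INSERT INTO @c (value) VALUES (1), (2);
INSERT INTO @i (value) VALUES (3), (4);
EXEC dbo.SearchProductsTvp @categories = @c, @invoices = @i;
```

This avoids string splitting and casting altogether, at the cost of requiring callers to build the ID tables.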

情绪少女 2024-10-05 20:21:29

I'd start with something like this, using the ID tables built from your parameters. Temp tables can help with subquery speed.

select p.*
from
(
    select pc.*
    from catids c
    inner join ProductsCategories pc
        on pc.CategoryID = c.value
) catMatch
inner join
(
    select pin.*
    from invoiceids i
    inner join InvoiceProducts pin
        on pin.InvoiceID = i.value
) invMatch
    on invMatch.ProductID = catMatch.ProductID
inner join Products p
    on p.ID = invMatch.ProductID
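
The temp-table idea might be sketched roughly as follows. It assumes the asker's `dbo.split` function and the space-delimited inputs from the question; the temp table names `#catids` and `#invoiceids` are chosen here for illustration.

```sql
-- Sample inputs in the question's space-delimited format.
DECLARE @categories varchar(max) = '1 2',
        @invoices   varchar(max) = '3 4';

-- Materialize the parsed IDs once; dbo.split is the asker's function.
SELECT DISTINCT CAST([value] AS int) AS value
INTO   #catids
FROM   dbo.split(@categories, ' ');

SELECT DISTINCT CAST([value] AS int) AS value
INTO   #invoiceids
FROM   dbo.split(@invoices, ' ');

-- Same join logic as the query above, reading from the temp tables.
-- DISTINCT is added because the joins yield one row per matching pair.
SELECT DISTINCT p.*
FROM   #catids c
INNER JOIN ProductsCategories pc ON pc.CategoryID = c.value
INNER JOIN InvoiceProducts pin   ON pin.ProductID = pc.ProductID
INNER JOIN #invoiceids i         ON i.value = pin.InvoiceID
INNER JOIN Products p            ON p.ID = pc.ProductID;

DROP TABLE #catids;
DROP TABLE #invoiceids;
```

Like the query above, this returns products that match at least one requested category and at least one requested invoice. Requiring every ID to match would still need a count check such as the HAVING clause in the question's update.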

以可爱出名 2024-10-05 20:21:29

Ideally, ProductsCategories should have a clustered index on (CategoryId, ProductId) and InvoiceProducts should have one on (InvoiceId, ProductId). That lets the server find product ids for a given CategoryId or InvoiceId using only the data in the clustered indexes.
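
A minimal sketch of those indexes, assuming neither table already has a clustered index (for example from a clustered primary key) and that each pair occurs at most once. The index names are made up; table and column names follow the question.

```sql
-- Hypothetical index names. UNIQUE assumes no duplicate pairs exist.
CREATE UNIQUE CLUSTERED INDEX IX_ProductsCategories_Category_Product
    ON ProductsCategories (CategoryID, ProductID);

CREATE UNIQUE CLUSTERED INDEX IX_InvoiceProducts_Invoice_Product
    ON InvoiceProducts (InvoiceID, ProductID);
```

If a table already has a clustered index on other columns, a nonclustered index on the same column pair would cover these lookups just as well, since both columns are in the key.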

You could use a function to return a table of ints given a string. Google "CsvToInt" and click on the first link from SqlTeam to see the code.

Then you could:

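SELECT *
FROM Products
WHERE ID IN (SELECT DISTINCT ProductId 
        FROM ProductsCategories
        -- a table-valued function has to be selected from; IntValue is the assumed output column name
        WHERE CategoryId IN (SELECT IntValue FROM dbo.CsvToInt(@categories))
    ) AND ID IN (SELECT DISTINCT ProductId 
        FROM InvoiceProducts
        WHERE InvoiceId IN (SELECT IntValue FROM dbo.CsvToInt(@invoices))
    )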

不可一世的女人 2024-10-05 20:21:29

How about a recursive CTE?

First add row numbers to the criteria tables, then some pseudo SQL if you will:

;WITH cte AS(
Base case: Select productid, criteria from products left join criteria where row_number = 1 if it matches criteria from both row 1s or one is null.
UNION ALL
Recursive case: Select n+1 criteria row from products left join criteria where row_number = cte.row_number + 1 AND matches criteria from both row_number + 1 or one or the other (but not both) is null
)
SELECT *
WHERE criteria = maximum id from criteria table.

This will give you a way of performing AND on multiple criteria, and should perform well.

Does this make any sense at all? I've done some pretty cool fast stuff with CTEs lately, and can elaborate if necessary.

I removed the CTE code because it was wrong and not really worth fixing, given that a much better solution has already been posted.
