你什么时候会放弃 SQL 中的集合操作并转向过程化操作?

发布于 2024-07-09 03:46:14 字数 408 浏览 8 评论 0原文

我曾经被要求在 RDBMS 中执行此任务:

给定表客户、订单、订单行和产品。 一切都使用常用字段和关系完成,并在订单行表上添加注释备注字段。

对于一个客户,检索客户曾经订购过的所有产品的列表,其中包含产品名称、首次购买年份、最近 3 次购买的日期、最新订单的评论、该产品-客户组合过去 12 个月的总收入总和。

几天后,我放弃了将其作为查询,而是选择只获取客户的每个订单行和每个产品,并按程序运行数据以构建所需的客户端表。

我认为这是以下一个或多个的症状:

  • 我是一个懒惰的白痴,应该知道如何在 SQL 中执行此
  • 操作 集合操作不如过程操作那样具有表现力
  • SQL 没有应有的表现力

我这样做了吗正确的事情? 我还有其他选择吗?

I was once given this task to do in an RDBMS:

Given tables customer, order, orderlines and product. Everything done with the usual fields and relationships, with a comment memo field on the orderline table.

For one customer retrieve a list of all products that customer has ever ordered with product name, year of first purchase, dates of three last purchases, comment of the latest order, sum of total income for that product-customer combination last 12 months.

After a couple of days I gave up doing it as a Query and opted to just fetch every orderline for a customer, and every product and run through the data procedurally to build the required table clientside.

I regard this a symptom of one or more of the following:

  • I'm a lazy idiot and should have seen how to do it in SQL
  • Set operations are not as expressive as procedural operations
  • SQL is not as expressive as it should be

Did I do the right thing? Did I have other options?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

澜川若宁 2024-07-16 03:46:14

您绝对应该能够完成此练习,而无需执行与应用程序代码中的 JOIN 等效的工作,即从订单行和产品中获取所有行并迭代它们。 您不必是一名 SQL 向导也能做到这一点。 JOIN 对于 SQL 来说就像循环对于过程语言 —— 因为两者都是您应该知道如何使用的基本语言功能。

人们容易陷入的一个陷阱是认为整个报告必须在单个 SQL 查询中生成。 不对! 正如托尼·安德鲁斯指出的那样,大多数报告都不适合放在一个矩形中。 有很多汇总、摘要、特殊情况等,因此在单独的查询中获取报表的各个部分既简单又高效。 同样,在过程语言中,您不会尝试在一行代码中完成所有计算,甚至在单个函数中(希望如此)。

一些报告工具坚持认为报告是从单个查询生成的,并且您没有机会合并多个查询。 如果是这样,那么您需要生成多份报告(如果老板希望将其放在一页上,那么您需要手动进行一些粘贴)。

要获取所有订购的产品(带有产品名称)、最近三个购买的日期以及对最新订单的评论的列表,很简单:

SELECT o.*, l.*, p.*
FROM Orders o
 JOIN OrderLines l USING (order_id)
 JOIN Products p USING (product_id)
WHERE o.customer_id = ?
ORDER BY o.order_date;

可以逐行迭代结果以提取最新订单的日期和评论,因为无论如何您都会获取这些行。 但是,通过要求数据库返回按日期排序的结果,可以让自己轻松一些。

首次购买年份可从之前的查询中获取,如果您按 order_date 排序并逐行获取结果,您将可以访问第一个订单。 否则,您可以这样做:

SELECT YEAR(MIN(o.order_date)) FROM Orders o WHERE o.customer_id = ?;

过去 12 个月的产品购买总和最好通过单独的查询来计算:

SELECT SUM(l.quantity * p.price)
FROM Orders o
 JOIN OrderLines l USING (order_id)
 JOIN Products p USING (product_id)
WHERE o.customer_id = ?
 AND o.order_date > CURDATE() - INTERVAL 1 YEAR;

编辑:您在另一条评论中说您'我想了解如何使用标准 SQL 获取最近 3 次购买的日期:

SELECT o1.order_date
FROM Orders o1
  LEFT OUTER JOIN Orders o2 
  ON (o1.customer_id = o2.customer_id AND (o1.order_date < o2.order_date 
      OR (o1.order_date = o2.order_date AND o1.order_id < o2.order_id)))
WHERE o1.customer_id = ?
GROUP BY o1.order_id
HAVING COUNT(*) <= 3;

如果您可以使用一点点特定于供应商的 SQL 功能,则可以使用 Microsoft/Sybase TOP n ,或 MySQL/PostgreSQL LIMIT

SELECT TOP 3 order_date
FROM Orders
WHERE customer_id = ?
ORDER BY order_date DESC;

SELECT order_date
FROM Orders
WHERE customer_id = ?
ORDER BY order_date DESC
LIMIT 3;

You definitely should be able to do this exercise without doing the work equivalent to a JOIN in application code, i.e. by fetching all rows from both orderlines and products and iterating through them. You don't have to be an SQL wizard to do that one. JOIN is to SQL what a loop is to a procedural language -- in that both are fundamental language features that you should know how to use.

One trap people fall into is thinking that the whole report has to be produced in a single SQL query. Not true! Most reports don't fit into a rectangle, as Tony Andrews points out. There are lots of rollups, summaries, special cases, etc. so it's both simpler and more efficient to fetch parts of the report in separate queries. Likewise, in a procedural language you wouldn't try do all your computation in a single line of code, or even in a single function (hopefully).

Some reporting tools insist that a report is generated from a single query, and you have no opportunity to merge in multiple queries. If so, then you need to produce multiple reports (and if the boss wants it on one page, then you need to do some paste-up manually).

To get a list of all products ordered (with product name), dates of last three purchases, and comment on latest order is straightforward:

SELECT o.*, l.*, p.*
FROM Orders o
 JOIN OrderLines l USING (order_id)
 JOIN Products p USING (product_id)
WHERE o.customer_id = ?
ORDER BY o.order_date;

It's fine to iterate over the result row-by-row to extract the dates and comments on the latest orders, since you're fetching those rows anyway. But make it easy on yourself by asking the database to return the results sorted by date.

Year of first purchase is available from the previous query, if you sort by the order_date and fetch the result row-by-row, you'll have access to the first order. Otherwise, you can do it this way:

SELECT YEAR(MIN(o.order_date)) FROM Orders o WHERE o.customer_id = ?;

Sum of product purchases for the last 12 months is best calculated by a separate query:

SELECT SUM(l.quantity * p.price)
FROM Orders o
 JOIN OrderLines l USING (order_id)
 JOIN Products p USING (product_id)
WHERE o.customer_id = ?
 AND o.order_date > CURDATE() - INTERVAL 1 YEAR;

edit: You said in another comment that you'd like to see how to get the dates of the last three purchases in standard SQL:

SELECT o1.order_date
FROM Orders o1
  LEFT OUTER JOIN Orders o2 
  ON (o1.customer_id = o2.customer_id AND (o1.order_date < o2.order_date 
      OR (o1.order_date = o2.order_date AND o1.order_id < o2.order_id)))
WHERE o1.customer_id = ?
GROUP BY o1.order_id
HAVING COUNT(*) <= 3;

If you can use a wee bit of vendor-specific SQL features, you can use Microsoft/Sybase TOP n, or MySQL/PostgreSQL LIMIT:

SELECT TOP 3 order_date
FROM Orders
WHERE customer_id = ?
ORDER BY order_date DESC;

SELECT order_date
FROM Orders
WHERE customer_id = ?
ORDER BY order_date DESC
LIMIT 3;
做个少女永远怀春 2024-07-16 03:46:14

集合操作不如过程操作那么具有表现力

也许更像是:“对于习惯过程语言的开发人员来说,集合操作不像过程操作那么熟悉”;-)

像您现在所做的那样迭代地执行它对于小数据集来说是很好的,但根本不以同样的方式扩展。 您是否做了正确的事情的答案取决于您是否对当前的性能感到满意和/或预计数据量不会增加太多。

如果您可以提供一些示例代码,我们也许能够帮助您找到基于集合的解决方案,该解决方案的启动速度会更快,并且可扩展性会更好。 正如 GalicateCowboy 提到的,临时表等技术可以帮助提高语句的可读性,同时在很大程度上保留性能优势。

Set operations are not as expressive as procedural operations

Perhaps more like: "Set operations are not as familiar as procedural operations to a developer used to procedural languages" ;-)

Doing it iteratively as you have done now is fine for small sets of data, but simply doesn't scale the same way. The answer to whether you did the right thing depends on whether you are satisfied with the performance right now and/or don't expect the amount of data to increase much.

If you could provide some sample code, we might be able to help you find a set-based solution, which will be faster to begin with and scale far, far better. As GalacticCowboy mentioned, techniques such as temporary tables can help make the statements far more readable while largely retaining the performance benefits.

盛装女皇 2024-07-16 03:46:14

在大多数 RDBMS 中,您可以选择使用临时表或本地表变量,将此类任务分解为可管理的块。

我没有看到任何方法可以轻松地将其作为单个查询(没有一些令人讨厌的子查询)来完成,但它仍然应该是可行的,而无需退出过程代码,如果你使用临时表。

In most RDBMS you have the option of temporary tables or local table variables that you can use to break up a task like this into manageable chunks.

I don't see any way to easily do this as a single query (without some nasty subqueries), but it still should be doable without dropping out to procedural code, if you use temp tables.

零度° 2024-07-16 03:46:14

此问题可能无法通过一个查询来解决。 我看到几个不同的部分...

对于一位客户

  1. 获取所有订购产品的列表(带有产品名称)
  2. 获取首次购买的年份
  3. 获取最近 3 次购买的日期
  4. 获取对最新订单的评论
  5. 获取过去 12 个月的产品购买总和

您的过程是步骤 1 - 5,SQL 会获取数据。

This problem may not have been solvable by one query. I see several distinct parts...

For one customer

  1. Get a list of all products ordered (with product name)
  2. Get year of first purchase
  3. Get dates of last three purchases
  4. Get comment on latest order
  5. Get sum of product purchases for the last 12 months

Your procedure is steps 1 - 5 and SQL gets you the data.

盗心人 2024-07-16 03:46:14

对我来说听起来像是一个数据仓库项目。 如果您需要诸如“最近的三件事”和“过去 12 个月的某件事的总和”之类的内容,请存储它们,即非规范化。

Sounds like a data warehouse project to me. If you need things like "three most recent things" and "sum of something over the last 12 months" then store them i.e. denormalize.

爱你不解释 2024-07-16 03:46:14

编辑:这是一个全新的解决方案,不使用临时表或奇怪的子子子查询。 但是,它仅适用于 SQL 2005 或更高版本,因为它使用该版本中新增的“pivot”命令。

根本问题是所需的从一组行(数据中)到输出中的列的数据透视。 在思考这个问题时,我想起 SQL Server 现在有一个“pivot”运算符来处理这个问题。

这仅适用于 SQL 2005,使用 Northwind 示例数据。

-- This could be a parameter to a stored procedure
-- I picked this one because he has products that he ordered 4 or more times
declare @customerId nchar(5)
set @customerId = 'ERNSH'

select c.CustomerID, p.ProductName, products_ordered_by_cust.FirstOrderYear,
    latest_order_dates_pivot.LatestOrder1 as LatestOrderDate,
    latest_order_dates_pivot.LatestOrder2 as SecondLatestOrderDate,
    latest_order_dates_pivot.LatestOrder3 as ThirdLatestOrderDate,
    'If I had a comment field it would go here' as LatestOrderComment,
    isnull(last_year_revenue_sum.ItemGrandTotal, 0) as LastYearIncome
from
    -- Find all products ordered by customer, along with first year product was ordered
    (
        select c.CustomerID, od.ProductID,
            datepart(year, min(o.OrderDate)) as FirstOrderYear
        from Customers c
            join Orders o on o.CustomerID = c.CustomerID
            join [Order Details] od on od.OrderID = o.OrderID
        group by c.CustomerID, od.ProductID
    ) products_ordered_by_cust
    -- Find the grand total for product purchased within last year - note fudged date below (Northwind)
    join (
        select o.CustomerID, od.ProductID, 
            sum(cast(round((od.UnitPrice * od.Quantity) - ((od.UnitPrice * od.Quantity) * od.Discount), 2) as money)) as ItemGrandTotal
        from
            Orders o
            join [Order Details] od on od.OrderID = o.OrderID
        -- The Northwind database only contains orders from 1998 and earlier, otherwise I would just use getdate()
        where datediff(yy, o.OrderDate, dateadd(year, -10, getdate())) = 0
        group by o.CustomerID, od.ProductID
    ) last_year_revenue_sum on last_year_revenue_sum.CustomerID = products_ordered_by_cust.CustomerID
        and last_year_revenue_sum.ProductID = products_ordered_by_cust.ProductID
    -- THIS is where the magic happens.  I will walk through the individual pieces for you
    join (
        select CustomerID, ProductID,
            max([1]) as LatestOrder1,
            max([2]) as LatestOrder2,
            max([3]) as LatestOrder3
        from
        (
            -- For all orders matching the customer and product, assign them a row number based on the order date, descending
            -- So, the most recent is row # 1, next is row # 2, etc.
            select o.CustomerID, od.ProductID, o.OrderID, o.OrderDate,
                row_number() over (partition by o.CustomerID, od.ProductID order by o.OrderDate desc) as RowNumber
            from Orders o join [Order Details] od on o.OrderID = od.OrderID
        ) src
        -- Now, produce a pivot table that contains the first three row #s from our result table,
        -- pivoted into columns by customer and product
        pivot
        (
            max(OrderDate)
            for RowNumber in ([1], [2], [3])
        ) as pvt
        group by CustomerID, ProductID
    ) latest_order_dates_pivot on products_ordered_by_cust.CustomerID = latest_order_dates_pivot.CustomerID
        and products_ordered_by_cust.ProductID = latest_order_dates_pivot.ProductID
    -- Finally, join back to our other tables to get more details
    join Customers c on c.CustomerID = products_ordered_by_cust.CustomerID
    join Orders o on o.CustomerID = products_ordered_by_cust.CustomerID and o.OrderDate = latest_order_dates_pivot.LatestOrder1
    join [Order Details] od on od.OrderID = o.OrderID and od.ProductID = products_ordered_by_cust.ProductID
    join Products p on p.ProductID = products_ordered_by_cust.ProductID
where c.CustomerID = @customerId
order by CustomerID, p.ProductID

EDIT: This is a completely new take on the solution, using no temp tables or strange sub-sub-sub queries. However, it will ONLY work on SQL 2005 or newer, as it uses the "pivot" command that is new in that version.

The fundamental problem is the desired pivot from a set of rows (in the data) into columns in the output. While noodling on the issue, I recalled that SQL Server now has a "pivot" operator to deal with this.

This works on SQL 2005 only, using the Northwind sample data.

-- This could be a parameter to a stored procedure
-- I picked this one because he has products that he ordered 4 or more times
declare @customerId nchar(5)
set @customerId = 'ERNSH'

select c.CustomerID, p.ProductName, products_ordered_by_cust.FirstOrderYear,
    latest_order_dates_pivot.LatestOrder1 as LatestOrderDate,
    latest_order_dates_pivot.LatestOrder2 as SecondLatestOrderDate,
    latest_order_dates_pivot.LatestOrder3 as ThirdLatestOrderDate,
    'If I had a comment field it would go here' as LatestOrderComment,
    isnull(last_year_revenue_sum.ItemGrandTotal, 0) as LastYearIncome
from
    -- Find all products ordered by customer, along with first year product was ordered
    (
        select c.CustomerID, od.ProductID,
            datepart(year, min(o.OrderDate)) as FirstOrderYear
        from Customers c
            join Orders o on o.CustomerID = c.CustomerID
            join [Order Details] od on od.OrderID = o.OrderID
        group by c.CustomerID, od.ProductID
    ) products_ordered_by_cust
    -- Find the grand total for product purchased within last year - note fudged date below (Northwind)
    join (
        select o.CustomerID, od.ProductID, 
            sum(cast(round((od.UnitPrice * od.Quantity) - ((od.UnitPrice * od.Quantity) * od.Discount), 2) as money)) as ItemGrandTotal
        from
            Orders o
            join [Order Details] od on od.OrderID = o.OrderID
        -- The Northwind database only contains orders from 1998 and earlier, otherwise I would just use getdate()
        where datediff(yy, o.OrderDate, dateadd(year, -10, getdate())) = 0
        group by o.CustomerID, od.ProductID
    ) last_year_revenue_sum on last_year_revenue_sum.CustomerID = products_ordered_by_cust.CustomerID
        and last_year_revenue_sum.ProductID = products_ordered_by_cust.ProductID
    -- THIS is where the magic happens.  I will walk through the individual pieces for you
    join (
        select CustomerID, ProductID,
            max([1]) as LatestOrder1,
            max([2]) as LatestOrder2,
            max([3]) as LatestOrder3
        from
        (
            -- For all orders matching the customer and product, assign them a row number based on the order date, descending
            -- So, the most recent is row # 1, next is row # 2, etc.
            select o.CustomerID, od.ProductID, o.OrderID, o.OrderDate,
                row_number() over (partition by o.CustomerID, od.ProductID order by o.OrderDate desc) as RowNumber
            from Orders o join [Order Details] od on o.OrderID = od.OrderID
        ) src
        -- Now, produce a pivot table that contains the first three row #s from our result table,
        -- pivoted into columns by customer and product
        pivot
        (
            max(OrderDate)
            for RowNumber in ([1], [2], [3])
        ) as pvt
        group by CustomerID, ProductID
    ) latest_order_dates_pivot on products_ordered_by_cust.CustomerID = latest_order_dates_pivot.CustomerID
        and products_ordered_by_cust.ProductID = latest_order_dates_pivot.ProductID
    -- Finally, join back to our other tables to get more details
    join Customers c on c.CustomerID = products_ordered_by_cust.CustomerID
    join Orders o on o.CustomerID = products_ordered_by_cust.CustomerID and o.OrderDate = latest_order_dates_pivot.LatestOrder1
    join [Order Details] od on od.OrderID = o.OrderID and od.ProductID = products_ordered_by_cust.ProductID
    join Products p on p.ProductID = products_ordered_by_cust.ProductID
where c.CustomerID = @customerId
order by CustomerID, p.ProductID
杀お生予夺 2024-07-16 03:46:14

SQL 查询以包含行和列的单个“平面”表的形式返回结果。 报告要求通常比这更复杂,需要像您的示例一样的“锯齿状”结果集。 通过“程序化”来解决此类需求,或者使用位于数据库之上的报告工具,并没有什么问题。 但是,您应该尽可能使用 SQL 以获得数据库的最佳性能。

SQL queries return results in the form of a single "flat" table of rows and columns. Reporting requirements are often more complex than this, demanding a "jagged" set of results like your example. There is nothing wrong with "going procedural" to solve such requirements, or using a reporting tool that sits on top of the database. However, you should use SQL as far as possible to get the best performance from the database.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文