你什么时候会放弃 SQL 中的集合操作并转向过程化操作?
我曾经被要求在 RDBMS 中执行此任务:
给定表客户、订单、订单行和产品。 一切都使用常用字段和关系完成,并在订单行表上添加注释备注字段。
对于一个客户,检索客户曾经订购过的所有产品的列表,其中包含产品名称、首次购买年份、最近 3 次购买的日期、最新订单的评论、该产品-客户组合过去 12 个月的总收入总和。
几天后,我放弃了将其作为查询,而是选择只获取客户的每个订单行和每个产品,并按程序运行数据以构建所需的客户端表。
我认为这是以下一个或多个的症状:
- 我是一个懒惰的白痴,应该知道如何在 SQL 中执行此
- 操作 集合操作不如过程操作那样具有表现力
- SQL 没有应有的表现力
我这样做了吗正确的事情? 我还有其他选择吗?
I was once given this task to do in an RDBMS:
Given tables customer, order, orderlines and product. Everything done with the usual fields and relationships, with a comment memo field on the orderline table.
For one customer retrieve a list of all products that customer has ever ordered with product name, year of first purchase, dates of three last purchases, comment of the latest order, sum of total income for that product-customer combination last 12 months.
After a couple of days I gave up doing it as a Query and opted to just fetch every orderline for a customer, and every product and run through the data procedurally to build the required table clientside.
I regard this a symptom of one or more of the following:
- I'm a lazy idiot and should have seen how to do it in SQL
- Set operations are not as expressive as procedural operations
- SQL is not as expressive as it should be
Did I do the right thing? Did I have other options?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
您绝对应该能够完成此练习,而无需执行与应用程序代码中的
JOIN
等效的工作,即从订单行和产品中获取所有行并迭代它们。 您不必是一名 SQL 向导也能做到这一点。JOIN
对于 SQL 来说就像循环对于过程语言 —— 因为两者都是您应该知道如何使用的基本语言功能。人们容易陷入的一个陷阱是认为整个报告必须在单个 SQL 查询中生成。 不对! 正如托尼·安德鲁斯指出的那样,大多数报告都不适合放在一个矩形中。 有很多汇总、摘要、特殊情况等,因此在单独的查询中获取报表的各个部分既简单又高效。 同样,在过程语言中,您不会尝试在一行代码中完成所有计算,甚至在单个函数中(希望如此)。
一些报告工具坚持认为报告是从单个查询生成的,并且您没有机会合并多个查询。 如果是这样,那么您需要生成多份报告(如果老板希望将其放在一页上,那么您需要手动进行一些粘贴)。
要获取所有订购的产品(带有产品名称)、最近三个购买的日期以及对最新订单的评论的列表,很简单:
可以逐行迭代结果以提取最新订单的日期和评论,因为无论如何您都会获取这些行。 但是,通过要求数据库返回按日期排序的结果,可以让自己轻松一些。
首次购买年份可从之前的查询中获取,如果您按
order_date
排序并逐行获取结果,您将可以访问第一个订单。 否则,您可以这样做:过去 12 个月的产品购买总和最好通过单独的查询来计算:
编辑:您在另一条评论中说您'我想了解如何使用标准 SQL 获取最近 3 次购买的日期:
如果您可以使用一点点特定于供应商的 SQL 功能,则可以使用 Microsoft/Sybase
TOP
n ,或 MySQL/PostgreSQLLIMIT
:You definitely should be able to do this exercise without doing the work equivalent to a
JOIN
in application code, i.e. by fetching all rows from both orderlines and products and iterating through them. You don't have to be an SQL wizard to do that one.JOIN
is to SQL what a loop is to a procedural language -- in that both are fundamental language features that you should know how to use.One trap people fall into is thinking that the whole report has to be produced in a single SQL query. Not true! Most reports don't fit into a rectangle, as Tony Andrews points out. There are lots of rollups, summaries, special cases, etc. so it's both simpler and more efficient to fetch parts of the report in separate queries. Likewise, in a procedural language you wouldn't try do all your computation in a single line of code, or even in a single function (hopefully).
Some reporting tools insist that a report is generated from a single query, and you have no opportunity to merge in multiple queries. If so, then you need to produce multiple reports (and if the boss wants it on one page, then you need to do some paste-up manually).
To get a list of all products ordered (with product name), dates of last three purchases, and comment on latest order is straightforward:
It's fine to iterate over the result row-by-row to extract the dates and comments on the latest orders, since you're fetching those rows anyway. But make it easy on yourself by asking the database to return the results sorted by date.
Year of first purchase is available from the previous query, if you sort by the
order_date
and fetch the result row-by-row, you'll have access to the first order. Otherwise, you can do it this way:Sum of product purchases for the last 12 months is best calculated by a separate query:
edit: You said in another comment that you'd like to see how to get the dates of the last three purchases in standard SQL:
If you can use a wee bit of vendor-specific SQL features, you can use Microsoft/Sybase
TOP
n, or MySQL/PostgreSQLLIMIT
:也许更像是:“对于习惯过程语言的开发人员来说,集合操作不像过程操作那么熟悉”;-)
像您现在所做的那样迭代地执行它对于小数据集来说是很好的,但根本不以同样的方式扩展。 您是否做了正确的事情的答案取决于您是否对当前的性能感到满意和/或预计数据量不会增加太多。
如果您可以提供一些示例代码,我们也许能够帮助您找到基于集合的解决方案,该解决方案的启动速度会更快,并且可扩展性会更好。 正如 GalicateCowboy 提到的,临时表等技术可以帮助提高语句的可读性,同时在很大程度上保留性能优势。
Perhaps more like: "Set operations are not as familiar as procedural operations to a developer used to procedural languages" ;-)
Doing it iteratively as you have done now is fine for small sets of data, but simply doesn't scale the same way. The answer to whether you did the right thing depends on whether you are satisfied with the performance right now and/or don't expect the amount of data to increase much.
If you could provide some sample code, we might be able to help you find a set-based solution, which will be faster to begin with and scale far, far better. As GalacticCowboy mentioned, techniques such as temporary tables can help make the statements far more readable while largely retaining the performance benefits.
在大多数 RDBMS 中,您可以选择使用临时表或本地表变量,将此类任务分解为可管理的块。
我没有看到任何方法可以轻松地将其作为单个查询(没有一些令人讨厌的子查询)来完成,但它仍然应该是可行的,而无需退出过程代码,如果你使用临时表。
In most RDBMS you have the option of temporary tables or local table variables that you can use to break up a task like this into manageable chunks.
I don't see any way to easily do this as a single query (without some nasty subqueries), but it still should be doable without dropping out to procedural code, if you use temp tables.
此问题可能无法通过一个查询来解决。 我看到几个不同的部分...
对于一位客户
您的过程是步骤 1 - 5,SQL 会获取数据。
This problem may not have been solvable by one query. I see several distinct parts...
For one customer
Your procedure is steps 1 - 5 and SQL gets you the data.
对我来说听起来像是一个数据仓库项目。 如果您需要诸如“最近的三件事”和“过去 12 个月的某件事的总和”之类的内容,请存储它们,即非规范化。
Sounds like a data warehouse project to me. If you need things like "three most recent things" and "sum of something over the last 12 months" then store them i.e. denormalize.
编辑:这是一个全新的解决方案,不使用临时表或奇怪的子子子查询。 但是,它仅适用于 SQL 2005 或更高版本,因为它使用该版本中新增的“pivot”命令。
根本问题是所需的从一组行(数据中)到输出中的列的数据透视。 在思考这个问题时,我想起 SQL Server 现在有一个“pivot”运算符来处理这个问题。
这仅适用于 SQL 2005,使用 Northwind 示例数据。
EDIT: This is a completely new take on the solution, using no temp tables or strange sub-sub-sub queries. However, it will ONLY work on SQL 2005 or newer, as it uses the "pivot" command that is new in that version.
The fundamental problem is the desired pivot from a set of rows (in the data) into columns in the output. While noodling on the issue, I recalled that SQL Server now has a "pivot" operator to deal with this.
This works on SQL 2005 only, using the Northwind sample data.
SQL 查询以包含行和列的单个“平面”表的形式返回结果。 报告要求通常比这更复杂,需要像您的示例一样的“锯齿状”结果集。 通过“程序化”来解决此类需求,或者使用位于数据库之上的报告工具,并没有什么问题。 但是,您应该尽可能使用 SQL 以获得数据库的最佳性能。
SQL queries return results in the form of a single "flat" table of rows and columns. Reporting requirements are often more complex than this, demanding a "jagged" set of results like your example. There is nothing wrong with "going procedural" to solve such requirements, or using a reporting tool that sits on top of the database. However, you should use SQL as far as possible to get the best performance from the database.