How do I join to the most recent row in a table?
I frequently run into problems of this form and haven't found a good solution yet:
Assume we have two database tables representing an e-commerce system.
userData (userId, name, ...)
orderData (orderId, userId, orderType, createDate, ...)
For all users in the system, select their user information, their most recent order information with type = '1', and their most recent order information with type = '2'. I want to do this in one query. Here is an example result:
(userId, name, ..., orderId1, orderType1, createDate1, ..., orderId2, orderType2, createDate2, ...)
(101, 'Bob', ..., 472, '1', '4/25/2008', ..., 382, '2', '3/2/2008', ...)
Comments (11)
This should work; you'll have to adjust the table and column names:
Denormalizing your data might also be a good idea, because this type of query is fairly expensive to run. For example, you might add a last_order_date column to userData.
I have provided three different approaches for solving this problem:
All of the solutions assume we are determining the "most recent" order based on the orderId column. Using the createDate column would add complexity due to timestamp collisions and seriously hinder performance, since createDate is probably not part of the indexed key. I have only tested these queries using MS SQL Server 2005, so I have no idea if they will work on your server.
Solutions (1) and (2) perform almost identically. In fact, they both result in the same number of reads from the database.
Solution (3) is not the preferred approach when working with large data sets. It consistently makes hundreds more logical reads than (1) and (2). When filtering for one specific user, approach (3) is comparable to the other methods. In the single-user case, a drop in CPU time helps to counter the significantly higher number of reads; however, as the disk drive becomes busier and cache misses occur, this slight advantage will disappear.
Conclusion
For the presented scenario, use the pivot approach if it is supported by your DBMS. It requires less code than the case statement and simplifies adding order types in the future.
Please note, in some cases, PIVOT is not flexible enough and characteristic value functions using case statements are the way to go.
Code
Approach (1) using PIVOT:
Approach (2) using Case Statements:
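A minimal sketch of the CASE-statement idea, not the original listing. It assumes hypothetical sample rows, ISO-formatted dates, and SQLite (through Python's built-in sqlite3 module) as a stand-in for SQL Server, so the syntax may need adjusting on other servers. As the answer assumes, "most recent" means the highest orderId:

```python
import sqlite3

# In-memory database with a tiny, made-up data set (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE userData (userId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orderData (
    orderId INTEGER PRIMARY KEY,
    userId INTEGER,
    orderType TEXT,
    createDate TEXT
);
INSERT INTO userData VALUES (101, 'Bob'), (102, 'Ann');
INSERT INTO orderData VALUES
    (382, 101, '2', '2008-03-02'),
    (401, 101, '1', '2008-03-10'),
    (472, 101, '1', '2008-04-25'),
    (480, 102, '1', '2008-04-26');
""")

# Step 1: one row per user, with the highest orderId of each type picked
#         by MAX(CASE ...). Step 2: join back to fetch the other columns.
rows = conn.execute("""
WITH latest AS (
    SELECT u.userId, u.name,
           MAX(CASE WHEN o.orderType = '1' THEN o.orderId END) AS orderId1,
           MAX(CASE WHEN o.orderType = '2' THEN o.orderId END) AS orderId2
    FROM userData u
    LEFT JOIN orderData o ON o.userId = u.userId
    GROUP BY u.userId, u.name
)
SELECT l.userId, l.name,
       l.orderId1, o1.createDate,
       l.orderId2, o2.createDate
FROM latest l
LEFT JOIN orderData o1 ON o1.orderId = l.orderId1
LEFT JOIN orderData o2 ON o2.orderId = l.orderId2
ORDER BY l.userId
""").fetchall()

for row in rows:
    print(row)
```

Adding another order type in this style means adding one more MAX(CASE ...) column and one more join, which is the extra code the conclusion above refers to.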
Approach (3) using inline queries in the where clause (based on Steve K.'s response):
Script to generate tables and 1000 users with 100 orders each:
Small snippet for testing query performance on MS SQL Server in addition to SQL Profiler:
Sorry, I don't have Oracle in front of me, but this is the basic structure of what I would do in Oracle:
T-SQL sample solution (MS SQL):
In SQL Server 2005 you could also use the RANK() OVER function. (But AFAIK it's a completely MSSQL-specific feature.)
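A rough sketch of the RANK() OVER idea, under a few assumptions: the sample rows are hypothetical, ranking is by orderId, and SQLite 3.25 or later (via Python's sqlite3 module) stands in for SQL Server. Window functions are part of standard SQL and are available in several other databases as well, so this is not limited to MSSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orderData (
    orderId INTEGER PRIMARY KEY,
    userId INTEGER,
    orderType TEXT,
    createDate TEXT
);
-- Hypothetical sample rows.
INSERT INTO orderData VALUES
    (382, 101, '2', '2008-03-02'),
    (401, 101, '1', '2008-03-10'),
    (472, 101, '1', '2008-04-25'),
    (480, 102, '1', '2008-04-26');
""")

# Rank each user's orders within an order type, newest (highest orderId)
# first, then keep only the rows ranked 1.
rows = conn.execute("""
SELECT userId, orderType, orderId, createDate
FROM (
    SELECT o.*,
           RANK() OVER (PARTITION BY userId, orderType
                        ORDER BY orderId DESC) AS rnk
    FROM orderData o
) AS ranked
WHERE rnk = 1
ORDER BY userId, orderType
""").fetchall()

for row in rows:
    print(row)
```

Because orderId is unique, RANK() yields exactly one rank-1 row per user and type here. Ranking by a column that can tie, such as createDate, would return every tied row; ROW_NUMBER() is the usual alternative when exactly one row is wanted.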
You might be able to do a union query for this. The exact syntax needs some work, especially the group by section, but the union should be able to do it.
For example:
By their newest, do you mean everything new in the current day? You can always check createDate and get all user and order data where createDate >= the current day.
UPDATED
Here is what you want after your comment here:
:)
I use things like this in MySQL:
In short, use MAX() to get the newest row by prepending the criteria field (createDate) to the interesting field(s) (otherfield). SUBSTRING_INDEX() then strips off the date.
OTOH, if you need an arbitrary number of orders (if orderType can be any number, and not a limited ENUM), it's better to handle it with a separate query, something like this:
for each user.
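The MAX()-on-a-packed-string trick described above can be sketched as follows. This is an illustration rather than the original MySQL query: it runs on SQLite through Python's sqlite3 module, so substr() and instr() stand in for SUBSTRING_INDEX(), the '|' separator and sample rows are made up, and the dates are assumed to be fixed-width ISO strings so that text comparison matches chronological order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orderData (
    orderId INTEGER PRIMARY KEY,
    userId INTEGER,
    orderType TEXT,
    createDate TEXT
);
-- Hypothetical sample rows.
INSERT INTO orderData VALUES
    (382, 101, '2', '2008-03-02'),
    (401, 101, '1', '2008-03-10'),
    (472, 101, '1', '2008-04-25'),
    (480, 102, '1', '2008-04-26');
""")

# Inner query: prepend createDate to the field of interest and take MAX(),
# so the string belonging to the newest date wins.
# Outer query: strip everything up to and including the separator.
rows = conn.execute("""
SELECT userId, orderType,
       substr(packed, instr(packed, '|') + 1) AS latestOrderId
FROM (
    SELECT userId, orderType,
           MAX(createDate || '|' || orderId) AS packed
    FROM orderData
    GROUP BY userId, orderType
) AS t
ORDER BY userId, orderType
""").fetchall()

for row in rows:
    print(row)
```

Note that the unpacked value comes back as text, and the separator must not occur inside the date itself.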
Assuming orderId is monotonically increasing with time:
Then pivot at the client or, if you are using SQL Server, use its PIVOT functionality.
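One way this could look, as a sketch only: the query returns one row per user and order type (the row with the highest orderId), and the client turns those rows into one record per user. The sample data and the dictionary layout are assumptions, with Python's sqlite3 module playing both database and client:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orderData (
    orderId INTEGER PRIMARY KEY,
    userId INTEGER,
    orderType TEXT,
    createDate TEXT
);
-- Hypothetical sample rows.
INSERT INTO orderData VALUES
    (382, 101, '2', '2008-03-02'),
    (401, 101, '1', '2008-03-10'),
    (472, 101, '1', '2008-04-25'),
    (480, 102, '1', '2008-04-26');
""")

# Server side: latest order per (userId, orderType), one row each.
rows = conn.execute("""
SELECT o.userId, o.orderType, o.orderId, o.createDate
FROM orderData o
JOIN (
    SELECT userId, orderType, MAX(orderId) AS maxId
    FROM orderData
    GROUP BY userId, orderType
) AS m ON m.maxId = o.orderId
ORDER BY o.userId, o.orderType
""").fetchall()

# Client side: pivot the rows into {userId: {orderType: (orderId, date)}}.
pivoted = {}
for user_id, order_type, order_id, create_date in rows:
    pivoted.setdefault(user_id, {})[order_type] = (order_id, create_date)

print(pivoted)
```

Pivoting at the client keeps the SQL independent of how many order types exist; a user with no order of a given type simply has no entry for it.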
Here is one way to move the type 1 and type 2 data onto the same row:
(by placing the type 1 and type 2 information into their own selects that then get used in the from clause.)
Here's how I do it. This is standard SQL and works in any brand of database.
Note that if you have multiple orders of either type whose dates are equal to the latest date, you'll get multiple rows in the result set. If you have multiple orders of both types, you'll get N x M rows in the result set. So I would recommend that you fetch the rows of each type in separate queries.
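The N x M effect is easy to reproduce. The sketch below is not the query from this answer; it is one plausible date-based formulation, run on SQLite through Python's sqlite3 module, with hypothetical data in which one user has two type '1' orders and three type '2' orders sharing their latest dates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE userData (userId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orderData (
    orderId INTEGER PRIMARY KEY,
    userId INTEGER,
    orderType TEXT,
    createDate TEXT
);
INSERT INTO userData VALUES (101, 'Bob');
-- Two type '1' orders and three type '2' orders tie on their latest date.
INSERT INTO orderData VALUES
    (1, 101, '1', '2008-04-25'),
    (2, 101, '1', '2008-04-25'),
    (3, 101, '2', '2008-03-02'),
    (4, 101, '2', '2008-03-02'),
    (5, 101, '2', '2008-03-02'),
    (6, 101, '1', '2008-01-01');
""")

# "Latest" is decided by createDate, so every tied order qualifies.
rows = conn.execute("""
SELECT u.userId, o1.orderId, o2.orderId
FROM userData u
JOIN orderData o1 ON o1.userId = u.userId AND o1.orderType = '1'
JOIN orderData o2 ON o2.userId = u.userId AND o2.orderType = '2'
WHERE o1.createDate = (SELECT MAX(createDate) FROM orderData
                       WHERE userId = u.userId AND orderType = '1')
  AND o2.createDate = (SELECT MAX(createDate) FROM orderData
                       WHERE userId = u.userId AND orderType = '2')
ORDER BY o1.orderId, o2.orderId
""").fetchall()

print(len(rows))  # 2 tied type '1' orders x 3 tied type '2' orders
for row in rows:
    print(row)
```

Breaking ties on a unique column such as orderId, or fetching each type in its own query as suggested above, avoids the multiplication.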
Steve K is absolutely right, thanks! I did rewrite his answer a little to account for the fact that there might be no order for a particular type (which I failed to mention, so I can't fault Steve K.)
Here's what I wound up using:
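The final query itself is not reproduced here. As a sketch of the idea described (picking the latest order with a correlated subquery, while keeping users who have no order of some type), the following compares an inner join with a LEFT JOIN. The sample rows are hypothetical, "latest" is taken to mean the highest orderId, and SQLite via Python's sqlite3 module stands in for the original database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE userData (userId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orderData (
    orderId INTEGER PRIMARY KEY,
    userId INTEGER,
    orderType TEXT,
    createDate TEXT
);
INSERT INTO userData VALUES (101, 'Bob'), (102, 'Ann');
-- Ann (102) has no type '2' order.
INSERT INTO orderData VALUES
    (382, 101, '2', '2008-03-02'),
    (401, 101, '1', '2008-03-10'),
    (472, 101, '1', '2008-04-25'),
    (480, 102, '1', '2008-04-26');
""")

# The same query text is used twice; only the join keyword changes.
template = """
SELECT u.userId, u.name,
       o1.orderId, o1.createDate,
       o2.orderId, o2.createDate
FROM userData u
{join} orderData o1
  ON o1.orderId = (SELECT MAX(orderId) FROM orderData
                   WHERE userId = u.userId AND orderType = '1')
{join} orderData o2
  ON o2.orderId = (SELECT MAX(orderId) FROM orderData
                   WHERE userId = u.userId AND orderType = '2')
ORDER BY u.userId
"""

inner_rows = conn.execute(template.format(join="JOIN")).fetchall()
left_rows = conn.execute(template.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # Ann is dropped: she has no type '2' order
print(left_rows)   # Ann is kept, with NULLs for the missing order
```

With the inner join, a user missing either order type disappears from the result; the LEFT JOIN keeps the user and fills the missing order columns with NULL.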