从多列中选择最小值的最佳方法是什么?
给出 SQL Server 2005 中的下表:
ID Col1 Col2 Col3
-- ---- ---- ----
1 3 34 76
2 32 976 24
3 7 235 3
4 245 1 792
编写产生以下结果的查询的最佳方式是什么(即产生最后一列 - 包含 Col1、Col2 和 Col 3 中的最小值的列每行)?
ID Col1 Col2 Col3 TheMin
-- ---- ---- ---- ------
1 3 34 76 3
2 32 976 24 24
3 7 235 3 3
4 245 1 792 1
更新:
为了澄清(正如我在评论中所说),在真实场景中,数据库被正确规范化。 这些“数组”列不在实际的表中,而是在报告所需的结果集中。 新的要求是报表也需要这个 MinValue 列。 我无法更改底层结果集,因此我希望使用 T-SQL 来获得方便的“出狱卡”。
我尝试了下面提到的CASE方法,虽然有点麻烦,但它确实有效。 它也比答案中所述的更复杂,因为您需要考虑同一行中有两个最小值这一事实。
不管怎样,我想我应该发布我当前的解决方案,考虑到我的限制,该解决方案效果很好。 它使用 UNPIVOT 运算符:
with cte (ID, Col1, Col2, Col3)
as
(
select ID, Col1, Col2, Col3
from TestTable
)
select cte.ID, Col1, Col2, Col3, TheMin from cte
join
(
select
ID, min(Amount) as TheMin
from
cte
UNPIVOT (Amount for AmountCol in (Col1, Col2, Col3)) as unpvt
group by ID
) as minValues
on cte.ID = minValues.ID
我首先要说的是,我不希望这提供最佳性能,但考虑到这种情况(我无法仅针对新的 MinValue 列要求重新设计所有查询),它是一个相当不错的选择。优雅的“出狱卡”。
Given the following table in SQL Server 2005:
ID Col1 Col2 Col3
-- ---- ---- ----
1 3 34 76
2 32 976 24
3 7 235 3
4 245 1 792
What is the best way to write the query that yields the following result (i.e. one that yields the final column - a column containing the minium values out of Col1, Col2, and Col 3 for each row)?
ID Col1 Col2 Col3 TheMin
-- ---- ---- ---- ------
1 3 34 76 3
2 32 976 24 24
3 7 235 3 3
4 245 1 792 1
UPDATE:
For clarification (as I have said in the coments) in the real scenario the database is properly normalized. These "array" columns are not in an actual table but are in a result set that is required in a report. And the new requirement is that the report also needs this MinValue column. I can't change the underlying result set and therefore I was looking to T-SQL for a handy "get out of jail card".
I tried the CASE approach mentioned below and it works, although it is a bit cumbersome. It is also more complicated than stated in the answers because you need to cater for the fact that there are two min values in the same row.
Anyway, I thought I'd post my current solution which, given my constraints, works pretty well. It uses the UNPIVOT operator:
with cte (ID, Col1, Col2, Col3)
as
(
select ID, Col1, Col2, Col3
from TestTable
)
select cte.ID, Col1, Col2, Col3, TheMin from cte
join
(
select
ID, min(Amount) as TheMin
from
cte
UNPIVOT (Amount for AmountCol in (Col1, Col2, Col3)) as unpvt
group by ID
) as minValues
on cte.ID = minValues.ID
I'll say upfront that I don't expect this to offer the best performance, but given the circumstances (I can't redesign all the queries just for the new MinValue column requirement), it is a pretty elegant "get out of jail card".
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(21)
可能有很多方法可以实现这一目标。 我的建议是使用 Case/When 来执行此操作。 有 3 列,还不错。
There are likely to be many ways to accomplish this. My suggestion is to use Case/When to do it. With 3 columns, it's not too bad.
使用
CROSS APPLY
:SQL Fiddle
Using
CROSS APPLY
:SQL Fiddle
OP 被标记为
sql-server
,这是 SQL Server 特定的答案。 可能不适用于其他 DBMS。 (搜索least
查找 Postgres 派生的数据库和其他一些数据库)。The OP was tagged as
sql-server
, and this is an SQL Server specific answer. May not work on other DBMSs. (Search forleast
for Postgres derived DBs and some others).在 MySQL 上,使用以下命令:
On MySQL, use this:
从 SQL Server 2022 开始,您可以使用
LEAST
函数。 还有最棒的
功能。对于早期版本,您可以使用“蛮力”方法,但需要注意的是:
当第一个条件失败时,它保证 Col1 不是最小值,因此您可以从其余条件中消除它。 对于后续条件也是如此。 对于五列,您的查询将变为:
请注意,如果两列或多列之间存在联系,则
<=
确保我们尽早退出CASE
语句。As of SQL Server 2022, you can use the
LEAST
function for this. There is also theGREATEST
function.For earlier versions, you can use the "brute force" approach with a twist:
When the first when condition fails it guarantees that Col1 is not the smallest value therefore you can eliminate it from rest of the conditions. Likewise for subsequent conditions. For five columns your query becomes:
Note that if there is a tie between two or more columns then
<=
ensures that we exit theCASE
statement as early as possible.如果列是整数(如您的示例中所示),我将创建一个函数:
那么当我需要使用它时,我会这样做:
如果您有 5 列,则上面的内容将变为
If the columns were integers as in your example I would create a function:
then when I need to use it I would do :
if you have 5 colums then the above becomes
最好的方法可能是不这样做 - 奇怪的是,人们坚持以需要 SQL“体操”来提取有意义的信息的方式存储数据,而实际上有更简单的方法来提取有意义的信息。如果您只是更好地构建您的架构,即可获得所需的结果:-)
在我看来,正确的方法是使用下表:
使用
ID/Col code> 作为主键(也可能将
Col
作为额外键,具体取决于您的需要)。 然后,您的查询就变成了一个简单的select min(val) from tbl
,并且您仍然可以通过在其他查询中使用where col = 2
来单独处理各个“旧列”。 如果“旧列”数量增加,这也可以轻松扩展。这使您的查询变得更加容易。 我倾向于使用的一般准则是,如果您曾经在数据库行中有一些看起来像数组的东西,那么您可能做错了什么,应该考虑重组数据。
但是,如果由于某种原因您无法更改这些列,我建议使用插入和更新触发器并添加另一个列,这些触发器将其设置为最小值 <代码>Col1/2/3。 这会将操作的“成本”从选择转移到它所属的更新/插入 - 根据我的经验,大多数数据库表的读取次数远远多于写入次数,因此随着时间的推移,写入成本往往会更加有效。
换句话说,一行的最小值仅在其他列之一发生变化时才发生变化,因此您应该在此时计算它,而不是每次选择时(如果数据不正确,这就会浪费)改变)。 然后,您最终会得到一个如下表:
任何其他必须在
select
时间做出决定的选项通常在性能方面都是一个坏主意,因为数据仅在插入/更新时更改 - 添加另一列在数据库中占用更多空间,插入和更新速度会稍慢,但选择速度可能快得多 - 首选方法应取决于您的优先级,但如上所述,大多数表阅读的次数远多于编写的次数。The best way to do that is probably not to do it - it's strange that people insist on storing their data in a way that requires SQL "gymnastics" to extract meaningful information, when there are far easier ways to achieve the desired result if you just structure your schema a little better :-)
The right way to do this, in my opinion, is to have the following table:
with
ID/Col
as the primary key (and possiblyCol
as an extra key, depending on your needs). Then your query becomes a simpleselect min(val) from tbl
and you can still treat the individual 'old columns' separately by usingwhere col = 2
in your other queries. This also allows for easy expansion should the number of 'old columns' grow.This makes your queries so much easier. The general guideline I tend to use is, if you ever have something that looks like an array in a database row, you're probably doing something wrong and should think about restructuring the data.
However, if for some reason you can't change those columns, I'd suggest using insert and update triggers and add another column which these triggers set to the minimum on
Col1/2/3
. This will move the 'cost' of the operation away from the select to the update/insert where it belongs - most database tables in my experience are read far more often than written so incurring the cost on write tends to be more efficient over time.In other words, the minimum for a row only changes when one of the other columns change, so that's when you should be calculating it, not every time you select (which is wasted if the data isn't changing). You would then end up with a table like:
Any other option that has to make decisions at
select
time is usually a bad idea performance-wise, since the data only changes on insert/update - the addition of another column takes up more space in the DB and will be slightly slower for the inserts and updates but can be much faster for selects - the preferred approach should depend on your priorities there but, as stated, most tables are read far more often than they're written.这是蛮力,但有效
...因为 min() 仅适用于一列,而不适用于跨列。
This is brute force but works
... because min() works only on one column and not across columns.
您也可以使用联合查询来完成此操作。 随着列数的增加,您需要修改查询,但至少这是一个直接的修改。
You could also do this with a union query. As the number of columns increase, you would need to modify the query, but at least it would be a straight forward modification.
两个这个问题
并且 这个问题尝试回答这个问题。
回顾一下,Oracle 为此有一个内置函数,而使用 Sql Server,您要么定义用户定义函数,要么使用 case 语句。
Both this question
And this question try to answer this.
The recap is that Oracle has a built in function for this, with Sql Server you are stuck either defining a user-defined-function or using case statements.
对于多列,最好使用 CASE 语句,但是对于两个数字列 i 和 j,您可以使用简单的数学:
min(i,j) = (i+j)/2 - abs(ij)/2
此公式可以是用于获取多列的最小值,但过去 2 确实很混乱,min(i,j,k) 将是 min(i,min(j,k))
For multiple columns its best to use a CASE statement, however for two numeric columns i and j you can use simple math:
min(i,j) = (i+j)/2 - abs(i-j)/2
This formula can be used to get the minimum value of multiple columns but its really messy past 2, min(i,j,k) would be min(i,min(j,k))
如果您能够创建一个存储过程,它可能需要一个值数组,您可以直接调用它。
If you're able to make a stored procedure, it could take an array of values, and you could just call that.
联合查询的一点变化:
A little twist on the union query:
如果您使用 SQL 2005,您可以做一些像这样的巧妙的事情:
这样您就不会迷失在这么多运算符中:)
但是,这可能比其他选择慢。
这是你的选择...
If you use SQL 2005 you can do something neat like this:
This way you don't get lost in so many operators :)
However, this could be slower than the other choice.
It's your choice...
如果您知道要查找的值(通常是状态代码),则以下内容可能会有所帮助:
If you know what values you are looking for, usually a status code, the following can be helpful:
下面我使用临时表来获取几个日期中的最小值。 第一个临时表查询多个连接表以获取各种日期(以及查询的其他值),然后第二个临时表使用与日期列一样多的遍数获取各种列和最小日期。
这本质上类似于联合查询,需要相同数量的传递,但可能更有效(基于经验,但需要测试)。 在本例中(8,000 条记录),效率不是问题。 可以索引等。
Below I use a temp table to get the minimum of several dates. The first temp table queries several joined tables to get various dates (as well as other values for the query), the second temp table then gets the various columns and the minimum date using as many passes as there are date columns.
This is essentially like the union query, the same number of passes are required, but may be more efficient (based on experience, but would need testing). Efficiency wasn't an issue in this case (8,000 records). One could index etc.
我知道这个问题已经很老了,但我仍然需要答案,并且对其他答案不满意,所以我必须设计自己的答案,这是 @paxdiablo 的一个扭曲 答案。
我来自 SAP ASE 16.0,我只需要查看某些数据的统计信息,恕我直言,这些数据有效地存储在单行的不同列中(它们代表不同的时间 - 计划到达的时间、预计到达的时间)行动开始,最后是实际时间)。 因此,我将列转置到临时表的行中,并像往常一样对其执行查询。
注意不是万能的解决方案!
这在 630000 行的源集上花费了大约 30 秒并且仅使用索引数据,因此不适合在时间关键的过程中运行,但是对于一次性数据检查或日终报告之类的事情,您可能没问题(但请与您的同事或上级核实!)。
这种风格对我来说的主要好处是,我可以轻松地使用更多/更少的列并更改分组、过滤等,尤其是在复制数据后。
附加数据(
columnName
、max
es、...)是为了帮助我进行搜索,因此您可能不需要它们; 我把它们留在这里也许是为了激发一些想法:-)。I know that question is old, but I was still in the need of the answer and was not happy with other answers so I had to devise my own which is a twist on @paxdiablo´s answer.
I came from land of SAP ASE 16.0, and I only needed a peek at statistics of certain data which are IMHO validly stored in different columns of a single row (they represent different times - when arrival of something was planned, what it was expected when the action started and finally what was the actual time). Thus I had transposed columns into the rows of temporary table and preformed my query over this as usually.
N.B. Not the one-size-fits-all solution ahead!
This took some 30 seconds on source set of 630000 rows and used only index-data, so not the thing to run in time-critical process but for things like one-time data inspection or end-of-the-day report you might be fine (but verify this with your peers or superiors, please!).
Main bonus of this style for me was that I could readily use more/less columns and change grouping, filtering, etc., especially once data was copyied over.
The additional data (
columnName
,max
es, ...) were to aid me in my search, so you might not need them; I left them here to maybe spark some ideas :-).案件
当 Col1 < Col2 和 Col1 < 第3栏
然后是第 1 列
当 Col2 为空且 Col3 为空时
然后是第 1 列
当 Col1 < Col2 和 Col3 为空
然后是第 1 列
当 Col1 < Col3 和 Col2 为空
然后是第 1 列
当Col2 < Col1 和 Col2 < 第3栏
然后是第 2 列
当 Col1 为空且 Col3 为空时
然后是第 2 列
当Col2 < Col1 和 Col3 为空
然后是第 2 列
当Col2 < Col3 和 Col1 为空
然后是第 2 列
当Col3 < Col1 和 Col3 < 第2栏
然后是第 3 列
当 Col1 为 null 并且 Col2 为 null 时
然后是第 3 列
当Col3 < Col1 和 Col2 为空
然后是第 3 列
当Col3 < Col2 和 Col1 为空
然后是第 3 列
当 Col2 = Col3 时
然后是第 2 列
当列 1 = 列 3 时
然后是第 1 列
当列 1 = 列 2 时
然后是第 1 列
当 Col2 = Col3 且 Col1 = Col3 时
然后 Col1
否则 null 结尾为 'MIN'
case
when Col1 < Col2 and Col1 < Col3
then Col1
when Col2 is null and Col3 is null
then Col1
when Col1 < Col2 and Col3 is null
then Col1
when Col1 < Col3 and Col2 is null
then Col1
when Col2 < Col1 and Col2 < Col3
then Col2
when Col1 is null and Col3 is null
then Col2
when Col2 < Col1 and Col3 is null
then Col2
when Col2 < Col3 and Col1 is null
then Col2
when Col3 < Col1 and Col3 < Col2
then Col3
when Col1 is null and Col2 is null
then Col3
when Col3 < Col1 and Col2 is null
then Col3
when Col3 < Col2 and Col1 is null
then Col3
when Col2 = Col3
then Col2
when Col1 = Col3
then Col1
when Col1 = Col2
then Col1
when Col2 = Col3 and Col1 = Col3
then Col1
else null end as 'MIN'
从 SQL Server 2022(和 Azure SQL 数据库)开始,您可以使用
LEAST()
函数。https://learn.microsoft.com/en-us/sql/t-sql/functions/tical-functions-least-transact-sql?view=azure-sqldw-latest
As of SQL Server 2022 (and Azure SQL Database), you can use
LEAST()
function.https://learn.microsoft.com/en-us/sql/t-sql/functions/logical-functions-least-transact-sql?view=azure-sqldw-latest