为多列选择不同值时,group by、distinct、Union 之间有什么区别?
这个问题解释了获取多列的不同组合的方法。 但我想知道DISTINCT、UNION、GROUP BY关键字方法为此目的的方法之间的区别。 使用它们时我得到了不同的结果。 我的查询是这样的 查询1。
select
column1,
column2,
column3
from table
group by 1,2,3
查询2。
select distinct
column1,
column2,
column3
from table
查询3。
SELECT DISTINCT(ans) FROM (
SELECT column1 AS ans FROM sametable
UNION
SELECT column2 AS ans FROM sametable
UNION
SELECT column3 AS ans FROM sametable
) AS Temp
我为上述查询获得了不同的行数(编辑:前两个给出了相同的行数,但最后一个给出了不同的行数)。 任何人都可以解释上述查询在做什么吗? 尤其是第三个?
编辑:请注意,我正在同一张表上进行 UNION 操作。 那样的话会发生什么呢?
This question explained about a way of getting distinct combination of multiple columns. But I want to know the difference between the methods of DISTINCT, UNION, GROUP BY keyword method for this purpose. I am getting different results when using them.
My queries are like this
Query 1.
select
column1,
column2,
column3
from table
group by 1,2,3
Query 2.
select distinct
column1,
column2,
column3
from table
Query 3.
SELECT DISTINCT(ans) FROM (
SELECT column1 AS ans FROM sametable
UNION
SELECT column2 AS ans FROM sametable
UNION
SELECT column3 AS ans FROM sametable
) AS Temp
I am getting different number of rows for above queries(Edit: The first two are giving equal number of rows but last one is giving differnetly). Can any body explain what the above queries are doing? Especially the third one?
EDIT: Note that I am doing UNION on same table. In that case what will happen?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
如果您包含“实际执行计划”(MS SQL Management Studio 中的 control + M),它将为您提供 SQL 引擎如何优化每个语句的图表。 理解这一点将帮助您编写更好的查询。
If you include "Actual Execution Plan" (control + M in MS SQL Management Studio), it will give you a diagram of how the SQL engine optimises each of your statements. Understanding this will help you write better queries.
让我们假设这是您的数据库数据:
第一个查询
在第一个示例中,您将从数据库获取所有列组合(因为
GROUP BY 1,2,3
不执行任何操作),包括重复项,因此它将返回:第二个查询
第二个示例采用列元组的唯一值,因此您将以
第三个查询
结束最后一个查询采用三列中的所有值,然后从该集中删除重复项。 因此,您将从任何表中获取所有值。 最后这个会返回吗,
这说明清楚了吗?
Lets assume this is your db data:
First query
In the first example you will get all column combinations from the db (as
GROUP BY 1,2,3
does nothing) including duplicates, so it will return:2nd query
Second example takes unique values for column tuples so you will end with
3rd query
Last query takes all values from three columns and then it removes duplicates from that set. So you will get all values from any of the tables. In the end this will return
Does this makes it clear?
从我认为最简单的、独特的开始,确实就是这样。 它返回不同的行组合。 想想这个数据集:
这将返回 3 行,因为数据集中的第 4 行与第一行完全匹配。
结果:
GROUP BY 经常用于汇总和其他计算
选择 COL1、SUM(COL2)
从表
按列 1 分组;
对于此数据集:
将返回
UNION 仅获取不同查询的结果并将它们呈现为 1 个结果集:
结果集:
执行并集时,列数据类型必须匹配。 您不能将数字列与字符列联合起来(除非您显式执行数据转换)
Starting with what I think is the simplest, DISTINCT, really is just that. It returns the distinct combinations of rows. Think of this dataset:
This will return 3 rows because the 4th row in the dataset exactly matches the first row.
Result:
The GROUP BY is frequently used for summaries and other calculations
select COL1, SUM(COL2)
from table
group by column1;
For this dataset:
would return
a UNION just takes results from different queries and presents them as 1 result set:
Result Set:
When performing a union, the column datatypes must match up. You can't UNION a number column with a char column (unless you explicitly perform a data conversion)
让我们使用一组示例数据
Distinct 基本上返回给定记录的单个实例,结果集中的整个列集没有重复项。 例如:“从订单中选择不同的客户”将返回“A”、“B”、“C”,默认按所选列的字母顺序。
Group by 是在查询中的给定字段集内进行聚合。 例如:
select customer, count(*) as NumberOfOrders from Orders group by 1
您还可以在查询中应用重复(仅一次),但在给定组内..
select customer, count(*) as NumberOfOrders, count( unique {month of orderdate} ) as CustomerMonths from orders group by customer
Unions 是必须具有完全相同的结果格式、列名和字段序列的查询。 假设您有一个订单表,其结构也与数据的存档版本完全相同。 您只保留最近一年的当前数据,所有历史数据都会被推送到存档。 如果您想在一个查询中获取给定客户的所有订单活动,您需要
从 CurrentOrders 中进行联合选择 customerid、orderdate、amount,其中 customerid = ?? 按 2 降序排列
联盟
从 ArchivedOrders 中选择客户 ID、订单日期、金额,其中客户 ID = ??
第一个选择的 ORDER by 子句将驱动结果,所有后续记录都被拉入结果中。 这就像 SQL 说转到表一,获取所有符合条件的内容,然后排序。 然后,转到表二,获取所有符合条件的内容,并将其拉入表一中现有的排序列表。 最终结果是所有记录。
华泰
Lets go with a sample set of data
Distinct basically returns a single instance of a given record with no duplicates of the entire set of columns in the result set. Ex: "select distinct customer from orders" would return "A", "B", "C" defaulted in alpha order of column(s) chosen.
Group by is to do aggregations within a given set of fields in a query. Ex:
select customer, count(*) as NumberOfOrders from Orders group by 1
You can also apply distinct (only once), within a query, but within a given group..
select customer, count(*) as NumberOfOrders, count( distinct {month of orderdate} ) as CustomerMonths from orders group by customer
Unions are queries that must be the exact same result format, column names and sequence of fields. Lets say you have an orders table that is the exact same structure as an archived version of data too. You only keep current data over the most current year, all historical is pushed to archive. If you wanted to get ALL order activity for a given customer in one query, you would want to do a union
select customerid, orderdate, amount from CurrentOrders where customerid = ?? order by 2 descending
UNION
select customerid, orderdate, amount from ArchivedOrders where customerid = ??
The ORDER by clause of the first select will drive the results all all subsequent records being pulled into the results. Its like SQL saying go to table one, get all that qualify, then sort. Then, go to table two, get all that qualify there and pull into the existing sorted list from table one. Final result is ALL records.
HTH