为多列选择不同值时,group by、distinct、Union 之间有什么区别?

发布于 2024-08-01 23:28:19 字数 796 浏览 5 评论 0原文

这个问题解释了获取多列的不同组合的方法。 但我想知道DISTINCT、UNION、GROUP BY关键字方法为此目的的方法之间的区别。 使用它们时我得到了不同的结果。 我的查询是这样的 查询1。

select 
column1,
column2,
column3
from table
group by 1,2,3

查询2。

select distinct 
column1,
column2,
column3
from table

查询3。

SELECT DISTINCT(ans) FROM (
    SELECT column1 AS ans FROM sametable
    UNION
    SELECT column2 AS ans FROM sametable
    UNION
    SELECT column3 AS ans FROM sametable
) AS Temp

我为上述查询获得了不同的行数(编辑:前两个给出了相同的行数,但最后一个给出了不同的行数)。 任何人都可以解释上述查询在做什么吗? 尤其是第三个?

编辑:请注意,我正在同一张表上进行 UNION 操作。 那样的话会发生什么呢?

This question explained about a way of getting distinct combination of multiple columns. But I want to know the difference between the methods of DISTINCT, UNION, GROUP BY keyword method for this purpose. I am getting different results when using them.
My queries are like this
Query 1.

select 
column1,
column2,
column3
from table
group by 1,2,3

Query 2.

select distinct 
column1,
column2,
column3
from table

Query 3.

SELECT DISTINCT(ans) FROM (
    SELECT column1 AS ans FROM sametable
    UNION
    SELECT column2 AS ans FROM sametable
    UNION
    SELECT column3 AS ans FROM sametable
) AS Temp

I am getting different number of rows for above queries(Edit: The first two are giving equal number of rows but last one is giving differnetly). Can any body explain what the above queries are doing? Especially the third one?

EDIT: Note that I am doing UNION on same table. In that case what will happen?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

情感失落者 2024-08-08 23:28:19

如果您包含“实际执行计划”(MS SQL Management Studio 中的 control + M),它将为您提供 SQL 引擎如何优化每个语句的图表。 理解这一点将帮助您编写更好的查询。

If you include "Actual Execution Plan" (control + M in MS SQL Management Studio), it will give you a diagram of how the SQL engine optimises each of your statements. Understanding this will help you write better queries.

无敌元气妹 2024-08-08 23:28:19

让我们假设这是您的数据库数据:

column1 | column2 | column3
1       | 2       | 1
1       | 2       | 2
1       | 2       | 1
3       | 1       | 2
1       | 2       | 2
1       | 2       | 2
1       | 2       | 2

第一个查询

在第一个示例中,您将从数据库获取所有列组合(因为 GROUP BY 1,2,3 不执行任何操作),包括重复项,因此它将返回:

1       | 2       | 1
1       | 2       | 2
1       | 2       | 1
3       | 1       | 2
1       | 2       | 2
1       | 2       | 2
1       | 2       | 2

第二个查询

第二个示例采用列元组的唯一值,因此您将以

1       | 2       | 1
1       | 2       | 2
3       | 1       | 2

第三个查询

结束最后一个查询采用三列中的所有值,然后从该集中删除重复项。 因此,您将从任何表中获取所有值。 最后这个会返回吗,

1
2
3

这说明清楚了吗?

Lets assume this is your db data:

column1 | column2 | column3
1       | 2       | 1
1       | 2       | 2
1       | 2       | 1
3       | 1       | 2
1       | 2       | 2
1       | 2       | 2
1       | 2       | 2

First query

In the first example you will get all column combinations from the db (as GROUP BY 1,2,3 does nothing) including duplicates, so it will return:

1       | 2       | 1
1       | 2       | 2
1       | 2       | 1
3       | 1       | 2
1       | 2       | 2
1       | 2       | 2
1       | 2       | 2

2nd query

Second example takes unique values for column tuples so you will end with

1       | 2       | 1
1       | 2       | 2
3       | 1       | 2

3rd query

Last query takes all values from three columns and then it removes duplicates from that set. So you will get all values from any of the tables. In the end this will return

1
2
3

Does this makes it clear?

注定孤独终老 2024-08-08 23:28:19

从我认为最简单的、独特的开始,确实就是这样。 它返回不同的行组合。 想想这个数据集:

COL1      COL2      COL3
A         B         C
D         E         F
G         H         I
A         B         C   <- duplicate of row 1

这将返回 3 行,因为数据集中的第 4 行与第一行完全匹配。
结果:

COL1      COL2      COL3
A         B         C
D         E         F
G         H         I

GROUP BY 经常用于汇总和其他计算
选择 COL1、SUM(COL2)
从表
按列 1 分组;

对于此数据集:

COL1      COL2
A         5
A         6
B         2
C         3
C         4
C         5

将返回

COL1     SUM(COL2)
A        11
B        2
C        12

UNION 仅获取不同查询的结果并将它们呈现为 1 个结果集:

Table1
COL1
A

Table2
COLX
B

Table3
WHATEVER_COLUMN_NAME
Giddyup

select COL1 from Table1
UNION
select COLX from Table2
UNION 
select WHATEVER_COLUMN_NAME from Table3;

结果集:

A
B
Giddyup

执行并集时,列数据类型必须匹配。 您不能将数字列与字符列联合起来(除非您显式执行数据转换)

Starting with what I think is the simplest, DISTINCT, really is just that. It returns the distinct combinations of rows. Think of this dataset:

COL1      COL2      COL3
A         B         C
D         E         F
G         H         I
A         B         C   <- duplicate of row 1

This will return 3 rows because the 4th row in the dataset exactly matches the first row.
Result:

COL1      COL2      COL3
A         B         C
D         E         F
G         H         I

The GROUP BY is frequently used for summaries and other calculations
select COL1, SUM(COL2)
from table
group by column1;

For this dataset:

COL1      COL2
A         5
A         6
B         2
C         3
C         4
C         5

would return

COL1     SUM(COL2)
A        11
B        2
C        12

a UNION just takes results from different queries and presents them as 1 result set:

Table1
COL1
A

Table2
COLX
B

Table3
WHATEVER_COLUMN_NAME
Giddyup

select COL1 from Table1
UNION
select COLX from Table2
UNION 
select WHATEVER_COLUMN_NAME from Table3;

Result Set:

A
B
Giddyup

When performing a union, the column datatypes must match up. You can't UNION a number column with a char column (unless you explicitly perform a data conversion)

忱杏 2024-08-08 23:28:19

让我们使用一组示例数据

orderid    customer orderdate
1          B        July 29
2          A        Aug 1
3          A        Aug 4
4          C        Aug 5
5          B        Aug 6
6          A        Aug 11

Distinct 基本上返回给定记录的单个实例,结果集中的整个列集没有重复项。 例如:“从订单中选择不同的客户”将返回“A”、“B”、“C”,默认按所选列的字母顺序。

Group by 是在查询中的给定字段集内进行聚合。 例如:

select customer, count(*) as NumberOfOrders from Orders group by 1

Would result with...
A    3
B    2
C    1

您还可以在查询中应用重复(仅一次),但在给定组内..

select customer, count(*) as NumberOfOrders, count( unique {month of orderdate} ) as CustomerMonths from orders group by customer

Would result with
A    3    1  (all orders were in August)
B    2    2  (had orders in July and August)
C    1    1  (only one order in August)

Unions 是必须具有完全相同的结果格式、列名和字段序列的查询。 假设您有一个订单表,其结构也与数据的存档版本完全相同。 您只保留最近一年的当前数据,所有历史数据都会被推送到存档。 如果您想在一个查询中获取给定客户的所有订单活动,您需要

从 CurrentOrders 中进行联合选择 customerid、orderdate、amount,其中 customerid = ?? 按 2 降序排列
联盟
从 ArchivedOrders 中选择客户 ID、订单日期、金额,其中客户 ID = ??

第一个选择的 ORDER by 子句将驱动结果,所有后续记录都被拉入结果中。 这就像 SQL 说转到表一,获取所有符合条件的内容,然后排序。 然后,转到表二,获取所有符合条件的内容,并将其拉入表一中现有的排序列表。 最终结果是所有记录。

华泰

Lets go with a sample set of data

orderid    customer orderdate
1          B        July 29
2          A        Aug 1
3          A        Aug 4
4          C        Aug 5
5          B        Aug 6
6          A        Aug 11

Distinct basically returns a single instance of a given record with no duplicates of the entire set of columns in the result set. Ex: "select distinct customer from orders" would return "A", "B", "C" defaulted in alpha order of column(s) chosen.

Group by is to do aggregations within a given set of fields in a query. Ex:

select customer, count(*) as NumberOfOrders from Orders group by 1

Would result with...
A    3
B    2
C    1

You can also apply distinct (only once), within a query, but within a given group..

select customer, count(*) as NumberOfOrders, count( distinct {month of orderdate} ) as CustomerMonths from orders group by customer

Would result with
A    3    1  (all orders were in August)
B    2    2  (had orders in July and August)
C    1    1  (only one order in August)

Unions are queries that must be the exact same result format, column names and sequence of fields. Lets say you have an orders table that is the exact same structure as an archived version of data too. You only keep current data over the most current year, all historical is pushed to archive. If you wanted to get ALL order activity for a given customer in one query, you would want to do a union

select customerid, orderdate, amount from CurrentOrders where customerid = ?? order by 2 descending
UNION
select customerid, orderdate, amount from ArchivedOrders where customerid = ??

The ORDER by clause of the first select will drive the results all all subsequent records being pulled into the results. Its like SQL saying go to table one, get all that qualify, then sort. Then, go to table two, get all that qualify there and pull into the existing sorted list from table one. Final result is ALL records.

HTH

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文