SQL: how do I get unique values from a table, but for only one column?

Posted 2025-02-01 18:28:52 | Length 1496 | Views 2 | Comments 0

I have a table named table1. I sort it by name and then by the time column:

SELECT name, value, time
FROM table1
WHERE time >= '2022-05-23 00:00:00' AND time <= '2022-05-23 01:00:00'  
ORDER BY name, time 

As a result I get the following table:

    name,   value,  time
    A,      5,      2022-05-23 01:01:12
    A,      9,      2022-05-23 01:02:11
    A,      7,      2022-05-23 01:03:21
    B,      5,      2022-05-23 01:04:23
    B,      6,      2022-05-23 01:05:33
    C,      7,      2022-05-23 01:06:30
    C,      8,      2022-05-23 01:07:41
    C,      3,      2022-05-23 01:08:44
    C,      7,      2022-05-23 01:09:50

Then I need to select each unique name at its lowest time, together with the value from that row.

The result should look like this:

name,   value,  time
A,      5,      2022-05-23 01:01:12
B,      5,      2022-05-23 01:04:23
C,      7,      2022-05-23 01:06:30

=====================================================================

In other words, we just take the first row for each unique name from the table:

name,   value,  time
A,      5,      2022-05-23 01:01:12 <- take this one
A,      9,      2022-05-23 01:02:11
A,      7,      2022-05-23 01:03:21 
B,      5,      2022-05-23 01:04:23 <- take this one
B,      6,      2022-05-23 01:05:33
C,      7,      2022-05-23 01:06:30 <- take this one
C,      8,      2022-05-23 01:07:41
C,      3,      2022-05-23 01:08:44
C,      7,      2022-05-23 01:09:50

My goal is to take each unique name with its lowest timestamp (or, equivalently, the first row for each name, because the table is already sorted by timestamp).

I'm not sure how to get the result I want.
I tried "SELECT DISTINCT name, value", but that returns every unique combination of name AND value. I need ONLY one row per name: the name plus the value from the row where time is lowest.
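
To make the DISTINCT behaviour described above concrete, here is a minimal sketch in Python. It assumes SQLite (through the standard sqlite3 module) as a stand-in for ClickHouse, and the helper name distinct_pairs is made up for the illustration. It loads the nine rows shown above and counts what SELECT DISTINCT name, value returns.

```python
import sqlite3

# Sample rows copied from the question; SQLite stands in for ClickHouse here.
ROWS = [
    ("A", 5, "2022-05-23 01:01:12"),
    ("A", 9, "2022-05-23 01:02:11"),
    ("A", 7, "2022-05-23 01:03:21"),
    ("B", 5, "2022-05-23 01:04:23"),
    ("B", 6, "2022-05-23 01:05:33"),
    ("C", 7, "2022-05-23 01:06:30"),
    ("C", 8, "2022-05-23 01:07:41"),
    ("C", 3, "2022-05-23 01:08:44"),
    ("C", 7, "2022-05-23 01:09:50"),
]


def distinct_pairs():
    """Hypothetical helper: run SELECT DISTINCT name, value on the sample."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (name TEXT, value INTEGER, time TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)
    # DISTINCT applies to the whole (name, value) pair, not to name alone.
    result = conn.execute(
        "SELECT DISTINCT name, value FROM table1 ORDER BY name, value"
    ).fetchall()
    conn.close()
    return result


pairs = distinct_pairs()
print(len(pairs))  # 8: only the repeated (C, 7) pair collapses into one row
```

So DISTINCT removes a row only when the entire selected tuple repeats, which is why it cannot reduce the result to one row per name.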


Comments (2)

娇纵 2025-02-08 18:28:52

You can use the argMin aggregate function (https://clickhouse.com/docs/en/sql-reference/aggregate-functions/reference/argmin/) to achieve the desired result. If you need the latest row instead, there is the argMax aggregate function (https://clickhouse.com/docs/en/sql-reference/aggregate-functions/reference/argmax/).

The idea of argMin(arg, val) is to return the arg from the row where val is smallest; argMax(arg, val) does the same for the largest val. In your use case val is time.
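
As a rough illustration of that idea, the following Python sketch mimics the behaviour on a single group. The helper names arg_min and arg_max are made up for this example and are not ClickHouse APIs; the input is the (value, time) pairs for name A from the question.

```python
def arg_min(pairs):
    """Return the arg from the (arg, val) pair whose val is smallest."""
    arg, _val = min(pairs, key=lambda pair: pair[1])
    return arg


def arg_max(pairs):
    """Return the arg from the (arg, val) pair whose val is largest."""
    arg, _val = max(pairs, key=lambda pair: pair[1])
    return arg


# (value, time) pairs for name A; ISO-style timestamps sort correctly as text.
a_rows = [
    (5, "2022-05-23 01:01:12"),
    (9, "2022-05-23 01:02:11"),
    (7, "2022-05-23 01:03:21"),
]

print(arg_min(a_rows))  # 5, the value recorded at the earliest time
print(arg_max(a_rows))  # 7, the value recorded at the latest time
```

On ties Python's min and max return the first matching pair; do not assume the database breaks ties the same way.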

Sample data:

select * from table1; 

SELECT *
FROM table1

┌─name─┬─value─┬────────────────time─┐
│ A    │     5 │ 2022-05-23 01:01:12 │
│ A    │     5 │ 2022-05-23 01:01:12 │
│ A    │     5 │ 2022-05-23 01:01:12 │
│ A    │     5 │ 2022-05-23 01:01:12 │
│ A    │     9 │ 2022-05-23 01:02:11 │
│ A    │     7 │ 2022-05-23 01:03:21 │
│ B    │     5 │ 2022-05-23 01:04:23 │
│ B    │     6 │ 2022-05-23 01:05:33 │
│ C    │     7 │ 2022-05-23 01:06:30 │
│ C    │     8 │ 2022-05-23 01:07:41 │
│ C    │     3 │ 2022-05-23 01:08:44 │
│ C    │     7 │ 2022-05-23 01:09:50 │
└──────┴───────┴─────────────────────┘

12 rows in set. Elapsed: 0.002 sec.

Because you need the whole row, and argMin/argMax take a single arg expression, the row has to be packed into another structure (a tuple in this example) and then unpacked back into named columns. The following query returns the earliest row for each name (that is, the tuple for which time is smallest):

SELECT 
    tpl.1 AS name, 
    tpl.2 AS value, 
    tpl.3 AS time
FROM (
    SELECT 
        argMin(tuple(*), time) as tpl
    FROM 
        table1 
    GROUP BY name
)
ORDER BY name


┌─name─┬─value─┬────────────────time─┐
│ A    │     5 │ 2022-05-23 01:01:12 │
│ B    │     5 │ 2022-05-23 01:04:23 │
│ C    │     7 │ 2022-05-23 01:06:30 │
└──────┴───────┴─────────────────────┘


3 rows in set. Elapsed: 0.004 sec. 
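
The pack-into-a-tuple, group, then unpack pattern can be sketched outside the database as well. The Python below is only an emulation of what the query above computes, not how ClickHouse executes it; first_row_per_name is a hypothetical helper, and the rows are the sample data from this answer, including the repeated A row.

```python
def first_row_per_name(rows):
    """Keep, for each name, the whole (name, value, time) tuple with the smallest time."""
    best = {}
    for row in rows:
        name, _value, time = row
        # Strict '<' keeps the first row seen when several share the minimum time.
        if name not in best or time < best[name][2]:
            best[name] = row
    # Mirror ORDER BY name from the query.
    return [best[name] for name in sorted(best)]


rows = [
    ("A", 5, "2022-05-23 01:01:12"),
    ("A", 5, "2022-05-23 01:01:12"),
    ("A", 5, "2022-05-23 01:01:12"),
    ("A", 5, "2022-05-23 01:01:12"),
    ("A", 9, "2022-05-23 01:02:11"),
    ("A", 7, "2022-05-23 01:03:21"),
    ("B", 5, "2022-05-23 01:04:23"),
    ("B", 6, "2022-05-23 01:05:33"),
    ("C", 7, "2022-05-23 01:06:30"),
    ("C", 8, "2022-05-23 01:07:41"),
    ("C", 3, "2022-05-23 01:08:44"),
    ("C", 7, "2022-05-23 01:09:50"),
]

for name, value, time in first_row_per_name(rows):
    print(name, value, time)
```

Because the aggregation keeps exactly one tuple per group, the four identical A rows still yield a single output row.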

argMax works the same way but picks the row with the maximum val, so this query gives you the latest (most recent) name/value/time row for each name:

SELECT
    tpl.1 AS name,
    tpl.2 AS value,
    tpl.3 AS time
FROM
(
    SELECT argMax(tuple(*), time) AS tpl
    FROM table1
    GROUP BY name
)
ORDER BY name ASC


┌─name─┬─value─┬────────────────time─┐
│ A    │     7 │ 2022-05-23 01:03:21 │
│ B    │     6 │ 2022-05-23 01:05:33 │
│ C    │     7 │ 2022-05-23 01:09:50 │
└──────┴───────┴─────────────────────┘

3 rows in set. Elapsed: 0.002 sec.
橪书 2025-02-08 18:28:52

Generally you would use ROW_NUMBER, but older ClickHouse versions do not support window (analytic) functions. A join approach works instead:

SELECT t1.*
FROM table1 t1
INNER JOIN
(
    SELECT name, MIN(time) AS min_time
    FROM table1
    WHERE time >= '2022-05-23 00:00:00' AND time <= '2022-05-23 01:00:00'
    GROUP BY name
) t2
    ON t2.name = t1.name AND
       t2.min_time = t1.time
WHERE
    time >= '2022-05-23 00:00:00' AND time <= '2022-05-23 01:00:00';
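
For comparison, here is a hedged sketch that runs both the join approach and a ROW_NUMBER variant. It assumes SQLite 3.25 or newer (through Python's sqlite3 module) as a stand-in for ClickHouse, a shortened sample with one duplicated earliest A row, and time bounds chosen to cover the sample timestamps; none of those choices come from the original answer.

```python
import sqlite3

ROWS = [
    ("A", 5, "2022-05-23 01:01:12"),
    ("A", 5, "2022-05-23 01:01:12"),  # duplicate of the earliest A row
    ("A", 9, "2022-05-23 01:02:11"),
    ("B", 5, "2022-05-23 01:04:23"),
    ("B", 6, "2022-05-23 01:05:33"),
    ("C", 7, "2022-05-23 01:06:30"),
    ("C", 3, "2022-05-23 01:08:44"),
]
# Assumed bounds, picked so that the sample rows fall inside the window.
LO, HI = "2022-05-23 01:00:00", "2022-05-23 02:00:00"

JOIN_SQL = """
SELECT t1.name, t1.value, t1.time
FROM table1 t1
INNER JOIN (
    SELECT name, MIN(time) AS min_time
    FROM table1
    WHERE time >= :lo AND time <= :hi
    GROUP BY name
) t2 ON t2.name = t1.name AND t2.min_time = t1.time
WHERE t1.time >= :lo AND t1.time <= :hi
ORDER BY t1.name
"""

ROW_NUMBER_SQL = """
SELECT name, value, time
FROM (
    SELECT name, value, time,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY time) AS rn
    FROM table1
    WHERE time >= :lo AND time <= :hi
) AS ranked
WHERE rn = 1
ORDER BY name
"""


def run(sql):
    """Load the sample into a fresh in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (name TEXT, value INTEGER, time TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)
    result = conn.execute(sql, {"lo": LO, "hi": HI}).fetchall()
    conn.close()
    return result


join_rows = run(JOIN_SQL)
ranked_rows = run(ROW_NUMBER_SQL)
print(len(join_rows), len(ranked_rows))  # 4 3
```

The join returns every row that ties for the minimum time, so the duplicated A row appears twice, while ROW_NUMBER keeps exactly one row per name. If ties are possible in your data, that difference is worth checking before choosing between the two.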