SQLite group_concat 排序

发布于 2024-08-14 18:35:07 字数 247 浏览 8 评论 0原文

在 Sqlite 中,我可以使用 group_concat 来执行以下操作:

1...A
1...B
1...C
2...A
2...B
2...C

1...C,B,A
2...C,B,A

但根据文档,连接的顺序是随机的。

我需要将 group_concat 的输出排序为

1...A,B,C
2...A,B,C

我该怎么做?

In Sqlite I can use group_concat to do:

1...A
1...B
1...C
2...A
2...B
2...C

1...C,B,A
2...C,B,A

but the order of the concatenation is random - according to docs.

I need to sort the output of group_concat to be

1...A,B,C
2...A,B,C

How can I do this?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

木緿 2024-08-21 18:35:07

您不能使用带有 order by 子句的子选择,然后对值进行分组连接吗?

像这样的东西

SELECT ID, GROUP_CONCAT(Val)
FROM (
   SELECT ID, Val
   FROM YourTable
   ORDER BY ID, Val
   )
GROUP BY ID;

Can you not use a subselect with the order by clause in, and then group concat the values?

Something like

SELECT ID, GROUP_CONCAT(Val)
FROM (
   SELECT ID, Val
   FROM YourTable
   ORDER BY ID, Val
   )
GROUP BY ID;
陪我终i 2024-08-21 18:35:07

更准确地说,根据 docs

连接元素的顺序是任意的。

它并不真正意味着随机,它只是意味着开发人员保留使用他们想要的任何顺序的权利,甚至可以针对不同的查询或不同的 SQLite 版本使用不同的顺序。

在当前版本中,这一顺序可能是 Adrian Stander 的答案所暗示的顺序,因为他的代码似乎确实有效。因此,您可能只是通过一些单元测试来保护自己,然后就到此为止了。但是,如果不仔细检查 SQLite 的源代码,您永远无法 100% 确定它始终有效。

如果您愿意从源代码构建 SQLite,您也可以尝试编写自己的 用户定义的聚合函数,但还有一种更简单的方法。

幸运的是,从版本 3.25.0 开始,您有了窗口函数,提供了有保证的工作,虽然对你的问题的解决方案有点丑陋。

正如您在文档中所看到的,窗口函数有自己的 ORDER BY 子句:

在上面的示例中,窗口框架由前一行 (“1 PRECEDING”) 和后一行 (“1 FOLLOWING”) 之间的所有行组成,其中包含行,其中行根据 ORDER BY 子句进行排序window-defn(在本例中为“ORDER BY a”)。

请注意,仅此一点并不一定意味着所有聚合函数都遵循窗口框架内的顺序,但如果您查看单元测试,您可以看到实际情况是这样的:

do_execsql_test 4.10.1 {
  SELECT a, 
    count() OVER (ORDER BY a DESC),
    group_concat(a, '.') OVER (ORDER BY a DESC) 
  FROM t2 ORDER BY a DESC
} {
  6 1 6
  5 2 6.5
  4 3 6.5.4
  3 4 6.5.4.3
  2 5 6.5.4.3.2
  1 6 6.5.4.3.2.1
  0 7 6.5.4.3.2.1.0
}

因此,总而言之,您可以编写

SELECT ID, GROUP_CONCAT(Val) OVER (PARTITION BY ID ORDER BY Val) FROM YourTable;

结果:

1|A
1|A,B
1|A,B,C
2|A
2|A,B
2|A,B,C

不幸的是,它还包含您所需聚合的每个前缀。相反,您希望指定窗口框架始终包含完整范围,然后丢弃冗余值,如下所示:

SELECT DISTINCT ID, GROUP_CONCAT(Val)
OVER (PARTITION BY ID ORDER BY Val ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM YourTable;

或如下所示:

SELECT * FROM (
    SELECT ID, GROUP_CONCAT(Val)
    OVER (PARTITION BY ID ORDER BY Val ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    FROM YourTable
)
GROUP BY ID;

To be more precise, according to the docs:

The order of the concatenated elements is arbitrary.

It does not really mean random, it just means that the developers reserve the right to use whatever ordering they whish, even different ones for different queries or in different SQLite versions.

With the current version, this ordering might be the one implied by Adrian Stander's answer, as his code does seem to work. So you might just guard yourself with some unit tests and call it a day. But without examining the source code of SQLite really closely you can never be 100% sure this will always work.

If you are willing to build SQLite from source, you can also try to write your own user-defined aggregate function, but there is an easier way.

Fortunately, since version 3.25.0, you have window functions, providing a guaranteed-to-work, although somewhat ugly solution to your problem.

As you can see in the documentation, window functions have their own ORDER BY clauses:

In the example above, the window frame consists of all rows between the previous row ("1 PRECEDING") and the following row ("1 FOLLOWING"), inclusive, where rows are sorted according to the ORDER BY clause in the window-defn (in this case "ORDER BY a").

Note, that this alone would not necessarily mean that all aggregate functions respect the ordering inside a window frame, but if you take a look at the unit tests, you can see this is actually the case:

do_execsql_test 4.10.1 {
  SELECT a, 
    count() OVER (ORDER BY a DESC),
    group_concat(a, '.') OVER (ORDER BY a DESC) 
  FROM t2 ORDER BY a DESC
} {
  6 1 6
  5 2 6.5
  4 3 6.5.4
  3 4 6.5.4.3
  2 5 6.5.4.3.2
  1 6 6.5.4.3.2.1
  0 7 6.5.4.3.2.1.0
}

So, to sum it up, you can write

SELECT ID, GROUP_CONCAT(Val) OVER (PARTITION BY ID ORDER BY Val) FROM YourTable;

resulting in:

1|A
1|A,B
1|A,B,C
2|A
2|A,B
2|A,B,C

Which unfortunately also contains every prefix of your desired aggregations. Instead you want to specify the window frames to always contain the full range, then discard the redundant values, like this:

SELECT DISTINCT ID, GROUP_CONCAT(Val)
OVER (PARTITION BY ID ORDER BY Val ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM YourTable;

or like this:

SELECT * FROM (
    SELECT ID, GROUP_CONCAT(Val)
    OVER (PARTITION BY ID ORDER BY Val ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    FROM YourTable
)
GROUP BY ID;
九局 2024-08-21 18:35:07

偶然发现潜在的排序问题,我尝试了以下方法:
(...在 10.4.18-MariaDB 上)

select GROUP_CONCAT(ex.ID) as ID_list
FROM (
SELECT usr.ID
FROM (
SELECT u1.ID as ID
FROM table_users u1
) usr
GROUP BY ID
) ex

...并发现序列化的 ID_list 已排序!
但我对这个现在“正确”(?)的结果没有解释。

Stumbling upon the underlying sorting-problem I tried this:
(... on 10.4.18-MariaDB)

select GROUP_CONCAT(ex.ID) as ID_list
FROM (
SELECT usr.ID
FROM (
SELECT u1.ID as ID
FROM table_users u1
) usr
GROUP BY ID
) ex

... and found the serialized ID_list ordered!
But I don't have an explanation for this now "correct" (?) result.

祁梦 2024-08-21 18:35:07

替代解决方案:使用递归而不是 GROUP_CONCAT。
对于演示,这里有一个 WORKGROUPS 表:

create table Workgroups as 
select 1 as workgroup,'Daniel' as name union all
select 2,'Marc' union all
select 3,'Chris' union all
select 3,'Evelyn' union all
select 2,'Valentine' union all
select 1,'John' union all
select 3,'Luca' union all
select 2,'Thomas' union all
select 3,'Harry' union all
select 4,'Tom' union all
select 4,'Marilyn' union all
select 1,'Ben' union all
select 2,'Ann';

现在,我在WITH子句中准备一个排名视图,使用两个排名:
rk 为输出的一般顺序,以允许递归链接。 rk2 稍后将每组的最后一次出现识别为排名 1。
诀窍在于 Iif 子句,每当发生组中断时,它就会清除聚合链。

with Ranked as (
select workgroup,
  rank() over (order by workgroup,name) as rk,
  rank() over (partition by workgroup order by name desc) as rk2,
  name 
 from Workgroups
),Recursed as ( --follows the initial part
 select Ranked.workgroup,Ranked.rk,Ranked.rk2, Ranked.name as names from Ranked 
  where Ranked.rk=1 
   union all --follows the recursion part
 select Ranked.workgroup,Ranked.rk,Ranked.rk2, 
   iif(Recursed.workgroup=Ranked.workgroup,names || ', ','') || Ranked.name as names 
 from Recursed 
  join Ranked on Recursed.rk+1=Ranked.rk 
)
select workgroup,names from Recursed where rk2=1;

看起来有点笨拙,但很优雅,不是吗?自己尝试一下。

相对于 GROUP_CONCAT 的优点可能是,在 Iif 子句中,您可以做任何您想做的事情:复杂的分隔符、决定如何处理空(Null)条目、处理多个字段,甚至构建 HTML 代码等。

欢迎评论!

ALTERNATIVE solution: use recursion instead of GROUP_CONCAT.
For the demo, here's a WORKGROUPS table:

create table Workgroups as 
select 1 as workgroup,'Daniel' as name union all
select 2,'Marc' union all
select 3,'Chris' union all
select 3,'Evelyn' union all
select 2,'Valentine' union all
select 1,'John' union all
select 3,'Luca' union all
select 2,'Thomas' union all
select 3,'Harry' union all
select 4,'Tom' union all
select 4,'Marilyn' union all
select 1,'Ben' union all
select 2,'Ann';

Now, I prepare a Ranked view in a WITH clause, using two rankings:
rk for the general order of the output, in order to allow recursive chaining. rk2 to later identify the LAST occurrence of each group as rank 1.
The trick resides in the Iif clause, which clears the aggregated chain whenever a group break occurs.

with Ranked as (
select workgroup,
  rank() over (order by workgroup,name) as rk,
  rank() over (partition by workgroup order by name desc) as rk2,
  name 
 from Workgroups
),Recursed as ( --follows the initial part
 select Ranked.workgroup,Ranked.rk,Ranked.rk2, Ranked.name as names from Ranked 
  where Ranked.rk=1 
   union all --follows the recursion part
 select Ranked.workgroup,Ranked.rk,Ranked.rk2, 
   iif(Recursed.workgroup=Ranked.workgroup,names || ', ','') || Ranked.name as names 
 from Recursed 
  join Ranked on Recursed.rk+1=Ranked.rk 
)
select workgroup,names from Recursed where rk2=1;

Looks bit clumsy, but pretty elegant, is it not ? Try it yourself.

The advantage over GROUP_CONCAT may be that within the Iif clause, you can do whatever you wish: complex separators, decide what to do with empty (Null) entries, process multiple fields, even build HTML code etc.

Comments welcome !

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文