PostgreSQL ORDER BY 问题 - 自然排序

发布于 2025-01-03 03:49:22 字数 584 浏览 1 评论 0原文

我在下表中遇到了 Postgres ORDER BY 问题:

em_code  name
EM001    AAA
EM999    BBB
EM1000   CCC

要向表中插入新记录,

  1. 我使用 SELECT * FROMEmployees ORDER BY em_code DESC
  2. 使用 reg exp 从 em_code 中剥离字母并存储在 ec_alpha
  3. 将重新匹配部分转换为整数 ec_num
  4. 递增 1 ec_num++
  5. 再次填充足够的 zeors 和前缀 ec_alpha

em_code 达到 EM1000 时,上述算法失败。

第一步将返回 EM999 而不是 EM1000,并且它将再次生成 EM1000 作为新的 em_code,从而打破唯一键约束。

您知道如何选择 EM1000 吗?

I've got a Postgres ORDER BY issue with the following table:

em_code  name
EM001    AAA
EM999    BBB
EM1000   CCC

To insert a new record to the table,

  1. I select the last record with SELECT * FROM employees ORDER BY em_code DESC
  2. Strip alphabets from em_code usiging reg exp and store in ec_alpha
  3. Cast the remating part to integer ec_num
  4. Increment by one ec_num++
  5. Pad with sufficient zeors and prefix ec_alpha again

When em_code reaches EM1000, the above algorithm fails.

First step will return EM999 instead EM1000 and it will again generate EM1000 as new em_code, breaking the unique key constraint.

Any idea how to select EM1000?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

以可爱出名 2025-01-10 03:49:22

从 Postgres 9.6 开始,可以指定一个排序规则来自然地对带有数字的列进行排序。

https://www.postgresql.org/docs/10/collat​​ion.html

-- First create a collation with numeric sorting
CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');

-- Alter table to use the collation
ALTER TABLE "employees" ALTER COLUMN "em_code" type TEXT COLLATE numeric;

现在只需像其他方式一样查询即可。

SELECT * FROMEmployees ORDER BY em_code

在我的数据上,我按以下顺序得到结果(请注意,它还会对外国数字进行排序):

0
0001
001
1
06
6
13
13
14

Since Postgres 9.6, it is possible to specify a collation which will sort columns with numbers naturally.

https://www.postgresql.org/docs/10/collation.html

-- First create a collation with numeric sorting
CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');

-- Alter table to use the collation
ALTER TABLE "employees" ALTER COLUMN "em_code" type TEXT COLLATE numeric;

Now just query as you would otherwise.

SELECT * FROM employees ORDER BY em_code

On my data, I get results in this order (note that it also sorts foreign numerals):

Value
0
0001
001
1
06
6
13
۱۳
14
往日 2025-01-10 03:49:22

您可以采取的一种方法是为此创建一个 naturalsort 函数。这是一个由 Postgres legend RhodiumToad 编写的示例。

create or replace function naturalsort(text)
    returns bytea language sql immutable strict as $f$
    select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'),'\x00')
    from regexp_matches($1, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;

来源:http://www.rhodiumtoad.org.uk/junk/naturalsort.sql

要使用它,只需按以下方式调用您的订单中的函数:

SELECT * FROM employees ORDER BY naturalsort(em_code) DESC

One approach you can take is to create a naturalsort function for this. Here's an example, written by Postgres legend RhodiumToad.

create or replace function naturalsort(text)
    returns bytea language sql immutable strict as $f$
    select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'),'\x00')
    from regexp_matches($1, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;

Source: http://www.rhodiumtoad.org.uk/junk/naturalsort.sql

To use it simply call the function in your order by:

SELECT * FROM employees ORDER BY naturalsort(em_code) DESC
蓝海 2025-01-10 03:49:22

原因是字符串按字母顺序排序(而不是像您希望的那样按数字排序),并且 1 排序在 9 之前。
您可以这样解决:

SELECT * FROM employees
ORDER  BY substring(em_code, 3)::int DESC;

如果可以的话,从您的 em_code 中删除多余的“EM”并首先保存一个整数会更有效。

评论中问题的答案

要从字符串中删除所有非数字:

SELECT regexp_replace(em_code, E'\\D','','g')
FROM   employees;

\D 是正则表达式 class-shorthand 表示“非数字”。
'g' 作为第四个参数是“全局”开关,用于将替换应用于字符串中的每个匹配项,而不仅仅是第一个匹配项。

将所有非数字替换为空字符串后,仅保留数字。

The reason is that the string sorts alphabetically (instead of numerically like you would want it) and 1 sorts before 9.
You could solve it like this:

SELECT * FROM employees
ORDER  BY substring(em_code, 3)::int DESC;

It would be more efficient to drop the redundant 'EM' from your em_code - if you can - and save an integer number to begin with.

Answer to question in comment

To strip any and all non-digits from a string:

SELECT regexp_replace(em_code, E'\\D','','g')
FROM   employees;

\D is the regular expression class-shorthand for "non-digits".
'g' as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.

After replacing every non-digit with the empty string, only digits remain.

此生挚爱伱 2025-01-10 03:49:22

在我自己的发展中,这总是会出现问题,我最终厌倦了这样做的棘手方法。我最终崩溃并将其实现为 PostgreSQL 扩展:

https://github.com/Bjond/pg_natural_sort_order

它是免费使用的,麻省理工学院许可证。

基本上,它只是规范化字符串中的数字(零前置数字),以便您可以创建一个索引列以进行自然的全速排序。自述文件解释了。

优点是您可以让触发器来完成工作,而不是您的应用程序代码。它将在 PostgreSQL 服务器上以机器速度进行计算,并且添加列的迁移变得简单而快速。

This always comes up in questions and in my own development and I finally tired of tricky ways of doing this. I finally broke down and implemented it as a PostgreSQL extension:

https://github.com/Bjond/pg_natural_sort_order

It's free to use, MIT license.

Basically it just normalizes the numerics (zero pre-pending numerics) within strings such that you can create an index column for full-speed sorting au naturel. The readme explains.

The advantage is you can have a trigger do the work and not your application code. It will be calculated at machine-speed on the PostgreSQL server and migrations adding columns become simple and fast.

守不住的情 2025-01-10 03:49:22

你可以只使用这一行
“按长度排序(子字符串(em_code FROM '[0-9]+')),em_code”

you can use just this line
"ORDER BY length(substring(em_code FROM '[0-9]+')), em_code"

‖放下 2025-01-10 03:49:22

我在这个相关问题中详细介绍了这一点:

混合单词和数字字符串的人性化或自然数排序

(我发布此答案仅作为有用的交叉引用,因此它是社区维基)。

I wrote about this in detail in this related question:

Humanized or natural number sorting of mixed word-and-number strings

(I'm posting this answer as a useful cross-reference only, so it's community wiki).

凉墨 2025-01-10 03:49:22

我想出了一些稍微不同的东西。

基本思想是创建一个元组数组(integer, string),然后按它们排序。幻数 2147483647 是 int32_max,用于将字符串排序在数字之后。

  ORDER BY ARRAY(
    SELECT ROW(
      CAST(COALESCE(NULLIF(match[1], ''), '2147483647') AS INTEGER),
      match[2]
    )
    FROM REGEXP_MATCHES(col_to_sort_by, '(\d*)|(\D*)', 'g')
    AS match
  )

I came up with something slightly different.

The basic idea is to create an array of tuples (integer, string) and then order by these. The magic number 2147483647 is int32_max, used so that strings are sorted after numbers.

  ORDER BY ARRAY(
    SELECT ROW(
      CAST(COALESCE(NULLIF(match[1], ''), '2147483647') AS INTEGER),
      match[2]
    )
    FROM REGEXP_MATCHES(col_to_sort_by, '(\d*)|(\D*)', 'g')
    AS match
  )
猫卆 2025-01-10 03:49:22

以下解决方案结合了另一个问题中提出的各种想法,以及经典解决方案

create function natsort(s text) returns text immutable language sql as $
  select string_agg(r[1] || E'\x01' || lpad(r[2], 20, '0'), '')
  from regexp_matches(s, '(\D*)(\d*)', 'g') r;
$;

该函数的设计目标是简单和纯字符串操作(没有自定义类型,也没有数组),因此它可以很容易地用作嵌入式解决方案,并且索引起来很简单。

注意:如果您期望数字超过 20 位,则必须将函数中的硬编码最大长度 20 替换为合适的较大长度。请注意,这将直接影响结果字符串的长度,因此不要使该值大于所需的值。

The following solution is a combination of various ideas presented in another question, as well as some ideas from the classic solution:

create function natsort(s text) returns text immutable language sql as $
  select string_agg(r[1] || E'\x01' || lpad(r[2], 20, '0'), '')
  from regexp_matches(s, '(\D*)(\d*)', 'g') r;
$;

The design goals of this function were simplicity and pure string operations (no custom types and no arrays), so it can easily be used as a drop-in solution, and is trivial to be indexed over.

Note: If you expect numbers with more than 20 digits, you'll have to replace the hard-coded maximum length 20 in the function with a suitable larger length. Note that this will directly affect the length of the resulting strings, so don't make that value larger than needed.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文