PostgreSQL ORDER BY 问题 - 自然排序

发布于 2025-01-03 03:49:22 字数 584 浏览 6 评论 0原文

我在下表中遇到了 Postgres ORDER BY 问题：

em_code  name
EM001    AAA
EM999    BBB
EM1000   CCC

要向表中插入新记录，

我使用 SELECT * FROMEmployees ORDER BY em_code DESC
使用 reg exp 从 em_code 中剥离字母并存储在 ec_alpha 中
将重新匹配部分转换为整数 ec_num
递增 1 ec_num++
再次填充足够的 zeors 和前缀 ec_alpha

当 em_code 达到 EM1000 时，上述算法失败。

第一步将返回 EM999 而不是 EM1000，并且它将再次生成 EM1000 作为新的 em_code，从而打破唯一键约束。

您知道如何选择 EM1000 吗？

原文

I've got a Postgres ORDER BY issue with the following table:

em_code  name
EM001    AAA
EM999    BBB
EM1000   CCC

To insert a new record to the table,

I select the last record with SELECT * FROM employees ORDER BY em_code DESC
Strip alphabets from em_code usiging reg exp and store in ec_alpha
Cast the remating part to integer ec_num
Increment by one ec_num++
Pad with sufficient zeors and prefix ec_alpha again

When em_code reaches EM1000, the above algorithm fails.

First step will return EM999 instead EM1000 and it will again generate EM1000 as new em_code, breaking the unique key constraint.

Any idea how to select EM1000?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

以可爱出名 2025-01-10 03:49:22

从 Postgres 9.6 开始，可以指定一个排序规则来自然地对带有数字的列进行排序。

https://www.postgresql.org/docs/10/collation.html

-- First create a collation with numeric sorting
CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');

-- Alter table to use the collation
ALTER TABLE "employees" ALTER COLUMN "em_code" type TEXT COLLATE numeric;

现在只需像其他方式一样查询即可。

SELECT * FROMEmployees ORDER BY em_code

在我的数据上，我按以下顺序得到结果（请注意，它还会对外国数字进行排序）：

值
0
0001
001
1
06
6
13
13
14

Since Postgres 9.6, it is possible to specify a collation which will sort columns with numbers naturally.

https://www.postgresql.org/docs/10/collation.html

-- First create a collation with numeric sorting
CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');

-- Alter table to use the collation
ALTER TABLE "employees" ALTER COLUMN "em_code" type TEXT COLLATE numeric;

Now just query as you would otherwise.

SELECT * FROM employees ORDER BY em_code

On my data, I get results in this order (note that it also sorts foreign numerals):

Value
0
0001
001
1
06
6
13
۱۳
14

回复收藏 0 原文

往日 2025-01-10 03:49:22

您可以采取的一种方法是为此创建一个 naturalsort 函数。这是一个由 Postgres legend RhodiumToad 编写的示例。

create or replace function naturalsort(text)
    returns bytea language sql immutable strict as $f$
    select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'),'\x00')
    from regexp_matches($1, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;

来源：http://www.rhodiumtoad.org.uk/junk/naturalsort.sql

要使用它，只需按以下方式调用您的订单中的函数：

SELECT * FROM employees ORDER BY naturalsort(em_code) DESC

One approach you can take is to create a naturalsort function for this. Here's an example, written by Postgres legend RhodiumToad.

create or replace function naturalsort(text)
    returns bytea language sql immutable strict as $f$
    select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'),'\x00')
    from regexp_matches($1, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;

Source: http://www.rhodiumtoad.org.uk/junk/naturalsort.sql

To use it simply call the function in your order by:

SELECT * FROM employees ORDER BY naturalsort(em_code) DESC

回复收藏 0 原文

蓝海 2025-01-10 03:49:22

原因是字符串按字母顺序排序（而不是像您希望的那样按数字排序），并且 1 排序在 9 之前。
您可以这样解决：

SELECT * FROM employees
ORDER  BY substring(em_code, 3)::int DESC;

如果可以的话，从您的 em_code 中删除多余的“EM”并首先保存一个整数会更有效。

评论中问题的答案

要从字符串中删除所有非数字：

SELECT regexp_replace(em_code, E'\\D','','g')
FROM   employees;

\D 是正则表达式 class-shorthand 表示“非数字”。
'g' 作为第四个参数是“全局”开关，用于将替换应用于字符串中的每个匹配项，而不仅仅是第一个匹配项。

将所有非数字替换为空字符串后，仅保留数字。

The reason is that the string sorts alphabetically (instead of numerically like you would want it) and 1 sorts before 9.
You could solve it like this:

SELECT * FROM employees
ORDER  BY substring(em_code, 3)::int DESC;

It would be more efficient to drop the redundant 'EM' from your em_code - if you can - and save an integer number to begin with.

Answer to question in comment

To strip any and all non-digits from a string:

SELECT regexp_replace(em_code, E'\\D','','g')
FROM   employees;

\D is the regular expression class-shorthand for "non-digits".
'g' as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.

After replacing every non-digit with the empty string, only digits remain.

回复收藏 0 原文

此生挚爱伱 2025-01-10 03:49:22

在我自己的发展中，这总是会出现问题，我最终厌倦了这样做的棘手方法。我最终崩溃并将其实现为 PostgreSQL 扩展：

https://github.com/Bjond/pg_natural_sort_order

它是免费使用的，麻省理工学院许可证。

基本上，它只是规范化字符串中的数字（零前置数字），以便您可以创建一个索引列以进行自然的全速排序。自述文件解释了。

优点是您可以让触发器来完成工作，而不是您的应用程序代码。它将在 PostgreSQL 服务器上以机器速度进行计算，并且添加列的迁移变得简单而快速。

回复收藏 0 原文

守不住的情 2025-01-10 03:49:22

你可以只使用这一行
“按长度排序（子字符串（em_code FROM '[0-9]+'）），em_code”

回复收藏 0 原文

‖放下 2025-01-10 03:49:22

我在这个相关问题中详细介绍了这一点：

混合单词和数字字符串的人性化或自然数排序

（我发布此答案仅作为有用的交叉引用，因此它是社区维基）。

回复收藏 0 原文

凉墨 2025-01-10 03:49:22

我想出了一些稍微不同的东西。

基本思想是创建一个元组数组(integer, string)，然后按它们排序。幻数 2147483647 是 int32_max，用于将字符串排序在数字之后。

  ORDER BY ARRAY(
    SELECT ROW(
      CAST(COALESCE(NULLIF(match[1], ''), '2147483647') AS INTEGER),
      match[2]
    )
    FROM REGEXP_MATCHES(col_to_sort_by, '(\d*)|(\D*)', 'g')
    AS match
  )

I came up with something slightly different.

The basic idea is to create an array of tuples (integer, string) and then order by these. The magic number 2147483647 is int32_max, used so that strings are sorted after numbers.

  ORDER BY ARRAY(
    SELECT ROW(
      CAST(COALESCE(NULLIF(match[1], ''), '2147483647') AS INTEGER),
      match[2]
    )
    FROM REGEXP_MATCHES(col_to_sort_by, '(\d*)|(\D*)', 'g')
    AS match
  )

回复收藏 0 原文

猫卆 2025-01-10 03:49:22

以下解决方案结合了另一个问题中提出的各种想法，以及经典解决方案：

create function natsort(s text) returns text immutable language sql as $
  select string_agg(r[1] || E'\x01' || lpad(r[2], 20, '0'), '')
  from regexp_matches(s, '(\D*)(\d*)', 'g') r;
$;

该函数的设计目标是简单和纯字符串操作（没有自定义类型，也没有数组），因此它可以很容易地用作嵌入式解决方案，并且索引起来很简单。

注意：如果您期望数字超过 20 位，则必须将函数中的硬编码最大长度 20 替换为合适的较大长度。请注意，这将直接影响结果字符串的长度，因此不要使该值大于所需的值。

The following solution is a combination of various ideas presented in another question, as well as some ideas from the classic solution:

create function natsort(s text) returns text immutable language sql as $
  select string_agg(r[1] || E'\x01' || lpad(r[2], 20, '0'), '')
  from regexp_matches(s, '(\D*)(\d*)', 'g') r;
$;

The design goals of this function were simplicity and pure string operations (no custom types and no arrays), so it can easily be used as a drop-in solution, and is trivial to be indexed over.

Note: If you expect numbers with more than 20 digits, you'll have to replace the hard-coded maximum length 20 in the function with a suitable larger length. Note that this will directly affect the length of the resulting strings, so don't make that value larger than needed.

回复收藏 0 原文

~没有更多了~