PostgreSQL ORDER BY 问题 - 自然排序
我在下表中遇到了 Postgres ORDER BY
问题:
em_code name
EM001 AAA
EM999 BBB
EM1000 CCC
要向表中插入新记录,
- 我使用
SELECT * FROMEmployees ORDER BY em_code DESC
- 使用 reg exp 从 em_code 中剥离字母并存储在
ec_alpha
中 - 将重新匹配部分转换为整数
ec_num
- 递增 1
ec_num++
- 再次填充足够的 zeors 和前缀
ec_alpha
当 em_code
达到 EM1000 时,上述算法失败。
第一步将返回 EM999 而不是 EM1000,并且它将再次生成 EM1000 作为新的 em_code
,从而打破唯一键约束。
您知道如何选择 EM1000 吗?
I've got a Postgres ORDER BY
issue with the following table:
em_code name
EM001 AAA
EM999 BBB
EM1000 CCC
To insert a new record to the table,
- I select the last record with
SELECT * FROM employees ORDER BY em_code DESC
- Strip alphabets from em_code usiging reg exp and store in
ec_alpha
- Cast the remating part to integer
ec_num
- Increment by one
ec_num++
- Pad with sufficient zeors and prefix
ec_alpha
again
When em_code
reaches EM1000, the above algorithm fails.
First step will return EM999 instead EM1000 and it will again generate EM1000 as new em_code
, breaking the unique key constraint.
Any idea how to select EM1000?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
从 Postgres 9.6 开始,可以指定一个排序规则来自然地对带有数字的列进行排序。
https://www.postgresql.org/docs/10/collation.html
现在只需像其他方式一样查询即可。
SELECT * FROMEmployees ORDER BY em_code
在我的数据上,我按以下顺序得到结果(请注意,它还会对外国数字进行排序):
Since Postgres 9.6, it is possible to specify a collation which will sort columns with numbers naturally.
https://www.postgresql.org/docs/10/collation.html
Now just query as you would otherwise.
SELECT * FROM employees ORDER BY em_code
On my data, I get results in this order (note that it also sorts foreign numerals):
您可以采取的一种方法是为此创建一个
naturalsort
函数。这是一个由 Postgres legend RhodiumToad 编写的示例。来源:http://www.rhodiumtoad.org.uk/junk/naturalsort.sql
要使用它,只需按以下方式调用您的订单中的函数:
One approach you can take is to create a
naturalsort
function for this. Here's an example, written by Postgres legend RhodiumToad.Source: http://www.rhodiumtoad.org.uk/junk/naturalsort.sql
To use it simply call the function in your order by:
原因是字符串按字母顺序排序(而不是像您希望的那样按数字排序),并且
1
排序在9
之前。您可以这样解决:
如果可以的话,从您的
em_code
中删除多余的“EM”并首先保存一个整数会更有效。评论中问题的答案
要从字符串中删除所有非数字:
\D
是正则表达式 class-shorthand 表示“非数字”。'g'
作为第四个参数是“全局”开关,用于将替换应用于字符串中的每个匹配项,而不仅仅是第一个匹配项。将所有非数字替换为空字符串后,仅保留数字。
The reason is that the string sorts alphabetically (instead of numerically like you would want it) and
1
sorts before9
.You could solve it like this:
It would be more efficient to drop the redundant 'EM' from your
em_code
- if you can - and save an integer number to begin with.Answer to question in comment
To strip any and all non-digits from a string:
\D
is the regular expression class-shorthand for "non-digits".'g'
as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.After replacing every non-digit with the empty string, only digits remain.
在我自己的发展中,这总是会出现问题,我最终厌倦了这样做的棘手方法。我最终崩溃并将其实现为 PostgreSQL 扩展:
https://github.com/Bjond/pg_natural_sort_order
它是免费使用的,麻省理工学院许可证。
基本上,它只是规范化字符串中的数字(零前置数字),以便您可以创建一个索引列以进行自然的全速排序。自述文件解释了。
优点是您可以让触发器来完成工作,而不是您的应用程序代码。它将在 PostgreSQL 服务器上以机器速度进行计算,并且添加列的迁移变得简单而快速。
This always comes up in questions and in my own development and I finally tired of tricky ways of doing this. I finally broke down and implemented it as a PostgreSQL extension:
https://github.com/Bjond/pg_natural_sort_order
It's free to use, MIT license.
Basically it just normalizes the numerics (zero pre-pending numerics) within strings such that you can create an index column for full-speed sorting au naturel. The readme explains.
The advantage is you can have a trigger do the work and not your application code. It will be calculated at machine-speed on the PostgreSQL server and migrations adding columns become simple and fast.
你可以只使用这一行
“按长度排序(子字符串(em_code FROM '[0-9]+')),em_code”
you can use just this line
"ORDER BY length(substring(em_code FROM '[0-9]+')), em_code"
我在这个相关问题中详细介绍了这一点:
混合单词和数字字符串的人性化或自然数排序
(我发布此答案仅作为有用的交叉引用,因此它是社区维基)。
I wrote about this in detail in this related question:
Humanized or natural number sorting of mixed word-and-number strings
(I'm posting this answer as a useful cross-reference only, so it's community wiki).
我想出了一些稍微不同的东西。
基本思想是创建一个元组数组
(integer, string)
,然后按它们排序。幻数 2147483647 是 int32_max,用于将字符串排序在数字之后。I came up with something slightly different.
The basic idea is to create an array of tuples
(integer, string)
and then order by these. The magic number 2147483647 is int32_max, used so that strings are sorted after numbers.以下解决方案结合了另一个问题中提出的各种想法,以及经典解决方案:
该函数的设计目标是简单和纯字符串操作(没有自定义类型,也没有数组),因此它可以很容易地用作嵌入式解决方案,并且索引起来很简单。
注意:如果您期望数字超过
20
位,则必须将函数中的硬编码最大长度20
替换为合适的较大长度。请注意,这将直接影响结果字符串的长度,因此不要使该值大于所需的值。The following solution is a combination of various ideas presented in another question, as well as some ideas from the classic solution:
The design goals of this function were simplicity and pure string operations (no custom types and no arrays), so it can easily be used as a drop-in solution, and is trivial to be indexed over.
Note: If you expect numbers with more than
20
digits, you'll have to replace the hard-coded maximum length20
in the function with a suitable larger length. Note that this will directly affect the length of the resulting strings, so don't make that value larger than needed.