当前位置：文江博客话题详情

MySQL 中的自然排序

发布于 2024-07-07 00:15:05 字数 603 浏览 17 评论 0原文

有没有一种优雅的方法可以在 MySQL 数据库中实现高性能、自然的排序？

例如，如果我有以下数据集：

Final Fantasy
最终幻想 4
Final Fantasy 10
Final Fantasy 12
Final Fantasy 12: Chains of Promathia
Final Fantasy Adventure
Final Fantasy Origins
Final Fantasy Tactics

除了拆分之外的任何其他优雅解决方案将游戏名称添加到其组件中

标题：“最终幻想”
编号：“12”
副标题：“Promathia 之链”

以确保它们以正确的顺序出现？（10 在 4 之后，而不是在 2 之前）。

这样做是一件很痛苦的事情，因为时不时就会有另一款游戏打破游戏标题的解析机制（例如“战锤 40,000”、“詹姆斯·邦德 007”）

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

伤感在游骋 2024-07-14 00:15:06

刚刚发现：

SELECT names FROM your_table ORDER BY games + 0 ASC

当数字位于前面时进行自然排序，也可能适用于中间。

Just found this:

SELECT names FROM your_table ORDER BY games + 0 ASC

Does a natural sort when the numbers are at the front, might work for middle as well.

回复收藏 0 原文

听不够的曲调 2024-07-14 00:15:06

我不久前为 MSSQL 2000 编写了这个函数：

/**
 * Returns a string formatted for natural sorting. This function is very useful when having to sort alpha-numeric strings.
 *
 * @author Alexandre Potvin Latreille (plalx)
 * @param {nvarchar(4000)} string The formatted string.
 * @param {int} numberLength The length each number should have (including padding). This should be the length of the longest number. Defaults to 10.
 * @param {char(50)} sameOrderChars A list of characters that should have the same order. Ex: '.-/'. Defaults to empty string.
 *
 * @return {nvarchar(4000)} A string for natural sorting.
 * Example of use: 
 * 
 *      SELECT Name FROM TableA ORDER BY Name
 *  TableA (unordered)              TableA (ordered)
 *  ------------                    ------------
 *  ID  Name                    ID  Name
 *  1.  A1.                 1.  A1-1.       
 *  2.  A1-1.                   2.  A1.
 *  3.  R1      -->         3.  R1
 *  4.  R11                 4.  R11
 *  5.  R2                  5.  R2
 *
 *  
 *  As we can see, humans would expect A1., A1-1., R1, R2, R11 but that's not how SQL is sorting it.
 *  We can use this function to fix this.
 *
 *      SELECT Name FROM TableA ORDER BY dbo.udf_NaturalSortFormat(Name, default, '.-')
 *  TableA (unordered)              TableA (ordered)
 *  ------------                    ------------
 *  ID  Name                    ID  Name
 *  1.  A1.                 1.  A1.     
 *  2.  A1-1.                   2.  A1-1.
 *  3.  R1      -->         3.  R1
 *  4.  R11                 4.  R2
 *  5.  R2                  5.  R11
 */
CREATE FUNCTION dbo.udf_NaturalSortFormat(
    @string nvarchar(4000),
    @numberLength int = 10,
    @sameOrderChars char(50) = ''
)
RETURNS varchar(4000)
AS
BEGIN
    DECLARE @sortString varchar(4000),
        @numStartIndex int,
        @numEndIndex int,
        @padLength int,
        @totalPadLength int,
        @i int,
        @sameOrderCharsLen int;

    SELECT 
        @totalPadLength = 0,
        @string = RTRIM(LTRIM(@string)),
        @sortString = @string,
        @numStartIndex = PATINDEX('%[0-9]%', @string),
        @numEndIndex = 0,
        @i = 1,
        @sameOrderCharsLen = LEN(@sameOrderChars);

    -- Replace all char that has to have the same order by a space.
    WHILE (@i <= @sameOrderCharsLen)
    BEGIN
        SET @sortString = REPLACE(@sortString, SUBSTRING(@sameOrderChars, @i, 1), ' ');
        SET @i = @i + 1;
    END

    -- Pad numbers with zeros.
    WHILE (@numStartIndex <> 0)
    BEGIN
        SET @numStartIndex = @numStartIndex + @numEndIndex;
        SET @numEndIndex = @numStartIndex;

        WHILE(PATINDEX('[0-9]', SUBSTRING(@string, @numEndIndex, 1)) = 1)
        BEGIN
            SET @numEndIndex = @numEndIndex + 1;
        END

        SET @numEndIndex = @numEndIndex - 1;

        SET @padLength = @numberLength - (@numEndIndex + 1 - @numStartIndex);

        IF @padLength < 0
        BEGIN
            SET @padLength = 0;
        END

        SET @sortString = STUFF(
            @sortString,
            @numStartIndex + @totalPadLength,
            0,
            REPLICATE('0', @padLength)
        );

        SET @totalPadLength = @totalPadLength + @padLength;
        SET @numStartIndex = PATINDEX('%[0-9]%', RIGHT(@string, LEN(@string) - @numEndIndex));
    END

    RETURN @sortString;
END

GO

I've written this function for MSSQL 2000 a while ago:

/**
 * Returns a string formatted for natural sorting. This function is very useful when having to sort alpha-numeric strings.
 *
 * @author Alexandre Potvin Latreille (plalx)
 * @param {nvarchar(4000)} string The formatted string.
 * @param {int} numberLength The length each number should have (including padding). This should be the length of the longest number. Defaults to 10.
 * @param {char(50)} sameOrderChars A list of characters that should have the same order. Ex: '.-/'. Defaults to empty string.
 *
 * @return {nvarchar(4000)} A string for natural sorting.
 * Example of use: 
 * 
 *      SELECT Name FROM TableA ORDER BY Name
 *  TableA (unordered)              TableA (ordered)
 *  ------------                    ------------
 *  ID  Name                    ID  Name
 *  1.  A1.                 1.  A1-1.       
 *  2.  A1-1.                   2.  A1.
 *  3.  R1      -->         3.  R1
 *  4.  R11                 4.  R11
 *  5.  R2                  5.  R2
 *
 *  
 *  As we can see, humans would expect A1., A1-1., R1, R2, R11 but that's not how SQL is sorting it.
 *  We can use this function to fix this.
 *
 *      SELECT Name FROM TableA ORDER BY dbo.udf_NaturalSortFormat(Name, default, '.-')
 *  TableA (unordered)              TableA (ordered)
 *  ------------                    ------------
 *  ID  Name                    ID  Name
 *  1.  A1.                 1.  A1.     
 *  2.  A1-1.                   2.  A1-1.
 *  3.  R1      -->         3.  R1
 *  4.  R11                 4.  R2
 *  5.  R2                  5.  R11
 */
CREATE FUNCTION dbo.udf_NaturalSortFormat(
    @string nvarchar(4000),
    @numberLength int = 10,
    @sameOrderChars char(50) = ''
)
RETURNS varchar(4000)
AS
BEGIN
    DECLARE @sortString varchar(4000),
        @numStartIndex int,
        @numEndIndex int,
        @padLength int,
        @totalPadLength int,
        @i int,
        @sameOrderCharsLen int;

    SELECT 
        @totalPadLength = 0,
        @string = RTRIM(LTRIM(@string)),
        @sortString = @string,
        @numStartIndex = PATINDEX('%[0-9]%', @string),
        @numEndIndex = 0,
        @i = 1,
        @sameOrderCharsLen = LEN(@sameOrderChars);

    -- Replace all char that has to have the same order by a space.
    WHILE (@i <= @sameOrderCharsLen)
    BEGIN
        SET @sortString = REPLACE(@sortString, SUBSTRING(@sameOrderChars, @i, 1), ' ');
        SET @i = @i + 1;
    END

    -- Pad numbers with zeros.
    WHILE (@numStartIndex <> 0)
    BEGIN
        SET @numStartIndex = @numStartIndex + @numEndIndex;
        SET @numEndIndex = @numStartIndex;

        WHILE(PATINDEX('[0-9]', SUBSTRING(@string, @numEndIndex, 1)) = 1)
        BEGIN
            SET @numEndIndex = @numEndIndex + 1;
        END

        SET @numEndIndex = @numEndIndex - 1;

        SET @padLength = @numberLength - (@numEndIndex + 1 - @numStartIndex);

        IF @padLength < 0
        BEGIN
            SET @padLength = 0;
        END

        SET @sortString = STUFF(
            @sortString,
            @numStartIndex + @totalPadLength,
            0,
            REPLICATE('0', @padLength)
        );

        SET @totalPadLength = @totalPadLength + @padLength;
        SET @numStartIndex = PATINDEX('%[0-9]%', RIGHT(@string, LEN(@string) - @numEndIndex));
    END

    RETURN @sortString;
END

GO

回复收藏 0 原文

九命猫 2024-07-14 00:15:06

我认为这就是为什么很多事情都是按发布日期排序的。

解决方案可能是在表中为“SortKey”创建另一列。这可能是标题的净化版本，符合您为轻松排序或计数器而创建的模式。

回复收藏 0 原文

兮颜 2024-07-14 00:15:06

与 @plalx 发布的功能相同，但重写为 MySQL：

DROP FUNCTION IF EXISTS `udf_FirstNumberPos`;
DELIMITER ;;
CREATE FUNCTION `udf_FirstNumberPos` (`instring` varchar(4000)) 
RETURNS int
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE position int;
    DECLARE tmp_position int;
    SET position = 5000;
    SET tmp_position = LOCATE('0', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF; 
    SET tmp_position = LOCATE('1', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('2', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('3', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('4', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('5', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('6', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('7', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('8', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('9', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;

    IF (position = 5000) THEN RETURN 0; END IF;
    RETURN position;
END
;;

DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50)) 
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE sortString varchar(4000);
    DECLARE numStartIndex int;
    DECLARE numEndIndex int;
    DECLARE padLength int;
    DECLARE totalPadLength int;
    DECLARE i int;
    DECLARE sameOrderCharsLen int;

    SET totalPadLength = 0;
    SET instring = TRIM(instring);
    SET sortString = instring;
    SET numStartIndex = udf_FirstNumberPos(instring);
    SET numEndIndex = 0;
    SET i = 1;
    SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);

    WHILE (i <= sameOrderCharsLen) DO
        SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
        SET i = i + 1;
    END WHILE;

    WHILE (numStartIndex <> 0) DO
        SET numStartIndex = numStartIndex + numEndIndex;
        SET numEndIndex = numStartIndex;

        WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
            SET numEndIndex = numEndIndex + 1;
        END WHILE;

        SET numEndIndex = numEndIndex - 1;

        SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);

        IF padLength < 0 THEN
            SET padLength = 0;
        END IF;

        SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));

        SET totalPadLength = totalPadLength + padLength;
        SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
    END WHILE;

    RETURN sortString;
END
;;

用法：

SELECT name FROM products ORDER BY udf_NaturalSortFormat(name, 10, ".")

Same function as posted by @plalx, but rewritten to MySQL:

DROP FUNCTION IF EXISTS `udf_FirstNumberPos`;
DELIMITER ;;
CREATE FUNCTION `udf_FirstNumberPos` (`instring` varchar(4000)) 
RETURNS int
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE position int;
    DECLARE tmp_position int;
    SET position = 5000;
    SET tmp_position = LOCATE('0', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF; 
    SET tmp_position = LOCATE('1', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('2', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('3', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('4', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('5', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('6', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('7', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('8', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('9', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;

    IF (position = 5000) THEN RETURN 0; END IF;
    RETURN position;
END
;;

DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50)) 
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE sortString varchar(4000);
    DECLARE numStartIndex int;
    DECLARE numEndIndex int;
    DECLARE padLength int;
    DECLARE totalPadLength int;
    DECLARE i int;
    DECLARE sameOrderCharsLen int;

    SET totalPadLength = 0;
    SET instring = TRIM(instring);
    SET sortString = instring;
    SET numStartIndex = udf_FirstNumberPos(instring);
    SET numEndIndex = 0;
    SET i = 1;
    SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);

    WHILE (i <= sameOrderCharsLen) DO
        SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
        SET i = i + 1;
    END WHILE;

    WHILE (numStartIndex <> 0) DO
        SET numStartIndex = numStartIndex + numEndIndex;
        SET numEndIndex = numStartIndex;

        WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
            SET numEndIndex = numEndIndex + 1;
        END WHILE;

        SET numEndIndex = numEndIndex - 1;

        SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);

        IF padLength < 0 THEN
            SET padLength = 0;
        END IF;

        SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));

        SET totalPadLength = totalPadLength + padLength;
        SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
    END WHILE;

    RETURN sortString;
END
;;

Usage:

SELECT name FROM products ORDER BY udf_NaturalSortFormat(name, 10, ".")

回复收藏 0 原文

瀟灑尐姊 2024-07-14 00:15:06

关于 Richard Toth 的最佳回应 https://stackoverflow.com/a/12257917/4052357

注意 UTF8 编码的字符串包含2字节（或更多）字符和数字，例如

12 南新宿

在udf_NaturalSortFormat函数中使用MySQL的LENGTH()将返回字符串的字节长度并且不正确，而是使用CHAR_LENGTH() 将返回正确的字符长度。

在我的例子中，使用 LENGTH() 导致查询永远无法完成，并导致 MySQL 的 CPU 使用率达到 100%

DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50)) 
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE sortString varchar(4000);
    DECLARE numStartIndex int;
    DECLARE numEndIndex int;
    DECLARE padLength int;
    DECLARE totalPadLength int;
    DECLARE i int;
    DECLARE sameOrderCharsLen int;

    SET totalPadLength = 0;
    SET instring = TRIM(instring);
    SET sortString = instring;
    SET numStartIndex = udf_FirstNumberPos(instring);
    SET numEndIndex = 0;
    SET i = 1;
    SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);

    WHILE (i <= sameOrderCharsLen) DO
        SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
        SET i = i + 1;
    END WHILE;

    WHILE (numStartIndex <> 0) DO
        SET numStartIndex = numStartIndex + numEndIndex;
        SET numEndIndex = numStartIndex;

        WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
            SET numEndIndex = numEndIndex + 1;
        END WHILE;

        SET numEndIndex = numEndIndex - 1;

        SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);

        IF padLength < 0 THEN
            SET padLength = 0;
        END IF;

        SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));

        SET totalPadLength = totalPadLength + padLength;
        SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
    END WHILE;

    RETURN sortString;
END
;;

。我会将此添加为原始评论，但我还没有足够的声誉

Regarding the best response from Richard Toth https://stackoverflow.com/a/12257917/4052357

Watch out for UTF8 encoded strings that contain 2byte (or more) characters and numbers e.g.

12 南新宿

Using MySQL's LENGTH() in udf_NaturalSortFormat function will return the byte length of the string and be incorrect, instead use CHAR_LENGTH() which will return the correct character length.

In my case using LENGTH() caused queries to never complete and result in 100% CPU usage for MySQL

DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50)) 
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE sortString varchar(4000);
    DECLARE numStartIndex int;
    DECLARE numEndIndex int;
    DECLARE padLength int;
    DECLARE totalPadLength int;
    DECLARE i int;
    DECLARE sameOrderCharsLen int;

    SET totalPadLength = 0;
    SET instring = TRIM(instring);
    SET sortString = instring;
    SET numStartIndex = udf_FirstNumberPos(instring);
    SET numEndIndex = 0;
    SET i = 1;
    SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);

    WHILE (i <= sameOrderCharsLen) DO
        SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
        SET i = i + 1;
    END WHILE;

    WHILE (numStartIndex <> 0) DO
        SET numStartIndex = numStartIndex + numEndIndex;
        SET numEndIndex = numStartIndex;

        WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
            SET numEndIndex = numEndIndex + 1;
        END WHILE;

        SET numEndIndex = numEndIndex - 1;

        SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);

        IF padLength < 0 THEN
            SET padLength = 0;
        END IF;

        SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));

        SET totalPadLength = totalPadLength + padLength;
        SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
    END WHILE;

    RETURN sortString;
END
;;

p.s. I would have added this as a comment to the original but I don't have enough reputation (yet)

回复收藏 0 原文

无远思近则忧 2024-07-14 00:15:06

在表格中添加排序键（排名）。 ORDER BY排名
利用“发布日期”列。 ORDER BY release_date
当从 SQL 中提取数据时，让你的对象进行排序，例如，如果提取到 Set 中，则将其设为 TreeSet，并使你的数据模型实现 Comparable 并制定集合中时，您将从 SQL 中逐一读取行）

回复收藏 0 原文

獨角戲 2024-07-14 00:15:06

订购：
0
1
2
10
23
101
205
1000
一个
aac
b
卡萨萨萨
css

使用此查询：

SELECT 
    column_name 
FROM 
    table_name 
ORDER BY
    column_name REGEXP '^\d*[^\da-z&\.\' \-\"\!\@\#\$\%\^\*\(\)\;\:\\,\?\/\~\`\|\_\-]' DESC, 
    column_name + 0, 
    column_name;

To order:
0
1
2
10
23
101
205
1000
a
aac
b
casdsadsa
css

Use this query:

SELECT 
    column_name 
FROM 
    table_name 
ORDER BY
    column_name REGEXP '^\d*[^\da-z&\.\' \-\"\!\@\#\$\%\^\*\(\)\;\:\\,\?\/\~\`\|\_\-]' DESC, 
    column_name + 0, 
    column_name;

回复收藏 0 原文

初吻给了烟 2024-07-14 00:15:06

添加一个“排序键”字段，将所有数字字符串用零填充到固定长度，然后在该字段上进行排序。

如果您可能有很长的数字字符串，另一种方法是在每个数字字符串前面添加数字位数（固定宽度、零填充）。例如，如果连续数字不超过 99 位，则对于“Super Blast 10 Ultra”，排序键将为“Super Blast 0210 Ultra”。

回复收藏 0 原文

我一向站在原地 2024-07-14 00:15:06

如果您不想重新发明轮子或对大量不起作用的代码感到头疼，只需使用 Drupal Natural Sort ...只需运行压缩的 SQL（MySQL 或 Postgre），就这样。进行查询时，只需使用以下命令即可：

... ORDER BY natsort_canon(column_name, 'natural')

If you do not want to reinvent the wheel or have a headache with lot of code that does not work, just use Drupal Natural Sort ... Just run the SQL that comes zipped (MySQL or Postgre), and that's it. When making a query, simply order using:

... ORDER BY natsort_canon(column_name, 'natural')

回复收藏 0 原文

朮生 2024-07-14 00:15:06

另一种选择是从 mysql 拉取数据后在内存中进行排序。虽然从性能的角度来看这不是最好的选择，但如果您不对巨大的列表进行排序，那么应该没问题。

如果您查看 Jeff 的帖子，您可以找到适合您可能使用的任何语言的大量算法。
人类排序：自然排序

回复收藏 0 原文

赢得她心 2024-07-14 00:15:06

您还可以以动态方式创建“排序列”：

SELECT name, (name = '-') boolDash, (name = '0') boolZero, (name+0 > 0) boolNum 
FROM table 
ORDER BY boolDash DESC, boolZero DESC, boolNum DESC, (name+0), name

这样，您可以创建要排序的组。

在我的查询中，我希望所有内容前面都有“-”，然后是数字，然后是文本。这可能会导致类似的结果：

-
0    
1
2
3
4
5
10
13
19
99
102
Chair
Dog
Table
Windows

这样，您在添加数据时不必以正确的顺序维护排序列。您还可以根据需要更改排序顺序。

You can also create in a dynamic way the "sort column" :

SELECT name, (name = '-') boolDash, (name = '0') boolZero, (name+0 > 0) boolNum 
FROM table 
ORDER BY boolDash DESC, boolZero DESC, boolNum DESC, (name+0), name

That way, you can create groups to sort.

In my query, I wanted the '-' in front of everything, then the numbers, then the text. Which could result in something like :

-
0    
1
2
3
4
5
10
13
19
99
102
Chair
Dog
Table
Windows

That way you don't have to maintain the sort column in the correct order as you add data. You can also change your sort order depending on what you need.

回复收藏 0 原文

岁吢 2024-07-14 00:15:06

我在这里看到的许多其他答案（以及在重复的问题中）基本上只适用于非常特定格式的数据，例如完全是数字的字符串，或者有固定长度的字母前缀。这在一般情况下是行不通的。

确实，没有任何方法可以在 MySQL 中实现 100% 通用 nat-sort，因为要做到这一点，您真正需要的是修改后的比较函数，该函数可以切换如果/当遇到数字时，介于字符串的字典排序和数字排序之间。此类代码可以实现您想要识别和比较两个字符串中的数字部分的任何算法。但不幸的是，MySQL 中的比较函数是其代码内部的，用户无法更改。

这会留下某种黑客行为，您尝试为字符串创建一个排序键，其中数字部分被重新格式化，以便标准词典排序实际上按照您的方式对它们进行排序想要。

对于最大位数的普通整数，明显的解决方案是简单地用零填充它们，以便它们都是固定宽度的。这是 Drupal 插件采用的方法，也是 @plalx / @RichardToth 的解决方案。（@Christian 有一个不同的、更复杂的解决方案，但它没有提供我所看到的优势）。

正如 @tye 指出的，您可以通过在每个数字前面添加固定数字长度来改进这一点，而不是简单地向左填充它。不过，即使考虑到本质上是一个尴尬的黑客攻击的局限性，您还有很多很多可以改进的地方。然而，似乎没有任何预先构建的解决方案！

例如：

加号和减号怎么样？ +10 vs 10 vs -10
小数？ 8.2、8.5、1.006、.75
前导零？ 020, 030, 00000922
千位分隔符？ “1,001 斑点狗”与“1001 斑点狗”
版本号？ MariaDB v10.3.18 与 MariaDB v10.3.3
非常长的数字？ 103,768,276,592,092,364,859,236,487,687,870,234,598.55

扩展@tye的方法，我创建了一个相当紧凑的NatSortKey()存储函数，它将把任意字符串转换为nat排序键，并且处理所有上述情况，相当高效，并保留总排序-order（没有两个不同的字符串具有比较相等的排序键）。第二个参数可用于限制每个字符串中处理的数字数量（例如，前 10 个数字），这可用于确保输出适合给定长度。

注意：使用第二个参数的给定值生成的排序键字符串只能与使用该参数的相同值生成的其他字符串进行排序，否则它们可能无法正确排序！< /strong>

您可以直接在排序中使用它，例如，

SELECT myString FROM myTable ORDER BY NatSortKey(myString,0);  ### 0 means process all numbers - resulting sort key might be quite long for certain inputs

但是为了对大型表进行有效排序，最好将排序键预先存储在另一列中（可能带有索引）：

INSERT INTO myTable (myString,myStringNSK) VALUES (@theStringValue,NatSortKey(@theStringValue,10)), ...
...
SELECT myString FROM myTable ORDER BY myStringNSK;

[理想情况下，您可以通过将键列创建为计算存储列来自动实现这一点，使用如下内容：

CREATE TABLE myTable (
...
myString varchar(100),
myStringNSK varchar(150) AS (NatSortKey(myString,10)) STORED,
...
KEY (myStringNSK),
...);

但目前 MySQL 和 MariaDB 都不允许在计算列中存储函数，所以不幸的是 < strong>您还不能执行此操作。]

我的函数仅影响数字排序。如果您想要执行其他排序规范化操作，例如删除所有标点符号，或修剪两端的空格，或用单个空格替换多空格序列，您可以扩展该函数，也可以在 < 之前或之后完成code>NatSortKey() 应用于您的数据。（我建议使用 REGEXP_REPLACE() 为此目的）。

这也有点以英语为中心，因为我认为“。” 对于小数点和“，”对于千位分隔符，但是如果您想要相反，或者如果您希望它可以作为参数进行切换，那么修改它应该很容易。

可能可以通过其他方式进一步改进；例如，它当前按绝对值对负数进行排序，因此 -1 位于 -2 之前，而不是相反。也无法指定数字的 DESC 排序顺序，同时保留文本的 ASC 字典排序。这两个问题都可以通过更多的工作来解决；如果/当我有时间时，我会更新代码。

还有许多其他细节需要注意 - 包括对您正在使用的 chaset 和排序规则的一些关键依赖项 - 但我已将它们全部放入 SQL 代码中的注释块中。 请在亲自使用该函数之前仔细阅读本文！

所以，这是代码。如果您发现错误，或者有我没有提到的改进，请在评论中告诉我！

delimiter $
CREATE DEFINER=CURRENT_USER FUNCTION NatSortKey (s varchar(100), n int) RETURNS varchar(350) DETERMINISTIC
BEGIN
/****
  Converts numbers in the input string s into a format such that sorting results in a nat-sort.
  Numbers of up to 359 digits (before the decimal point, if one is present) are supported.  Sort results are undefined if the input string contains numbers longer than this.
  For n>0, only the first n numbers in the input string will be converted for nat-sort (so strings that differ only after the first n numbers will not nat-sort amongst themselves).
  Total sort-ordering is preserved, i.e. if s1!=s2, then NatSortKey(s1,n)!=NatSortKey(s2,n), for any given n.
  Numbers may contain ',' as a thousands separator, and '.' as a decimal point.  To reverse these (as appropriate for some European locales), the code would require modification.
  Numbers preceded by '+' sort with numbers not preceded with either a '+' or '-' sign.
  Negative numbers (preceded with '-') sort before positive numbers, but are sorted in order of ascending absolute value (so -7 sorts BEFORE -1001).
  Numbers with leading zeros sort after the same number with no (or fewer) leading zeros.
  Decimal-part-only numbers (like .75) are recognised, provided the decimal point is not immediately preceded by either another '.', or by a letter-type character.
  Numbers with thousand separators sort after the same number without them.
  Thousand separators are only recognised in numbers with no leading zeros that don't immediately follow a ',', and when they format the number correctly.
  (When not recognised as a thousand separator, a ',' will instead be treated as separating two distinct numbers).
  Version-number-like sequences consisting of 3 or more numbers separated by '.' are treated as distinct entities, and each component number will be nat-sorted.
  The entire entity will sort after any number beginning with the first component (so e.g. 10.2.1 sorts after both 10 and 10.995, but before 11)
  Note that The first number component in an entity like this is also permitted to contain thousand separators.

  To achieve this, numbers within the input string are prefixed and suffixed according to the following format:
  - The number is prefixed by a 2-digit base-36 number representing its length, excluding leading zeros.  If there is a decimal point, this length only includes the integer part of the number.
  - A 3-character suffix is appended after the number (after the decimals if present).
    - The first character is a space, or a '+' sign if the number was preceded by '+'.  Any preceding '+' sign is also removed from the front of the number.
    - This is followed by a 2-digit base-36 number that encodes the number of leading zeros and whether the number was expressed in comma-separated form (e.g. 1,000,000.25 vs 1000000.25)
    - The value of this 2-digit number is: (number of leading zeros)*2 + (1 if comma-separated, 0 otherwise)
  - For version number sequences, each component number has the prefix in front of it, and the separating dots are removed.
    Then there is a single suffix that consists of a ' ' or '+' character, followed by a pair base-36 digits for each number component in the sequence.

  e.g. here is how some simple sample strings get converted:
  'Foo055' --> 'Foo0255 02'
  'Absolute zero is around -273 centigrade' --> 'Absolute zero is around -03273 00 centigrade'
  'The $1,000,000 prize' --> 'The $071000000 01 prize'
  '+99.74 degrees' --> '0299.74+00 degrees'
  'I have 0 apples' --> 'I have 00 02 apples'
  '.5 is the same value as 0000.5000' --> '00.5 00 is the same value as 00.5000 08'
  'MariaDB v10.3.0018' --> 'MariaDB v02100130218 000004'

  The restriction to numbers of up to 359 digits comes from the fact that the first character of the base-36 prefix MUST be a decimal digit, and so the highest permitted prefix value is '9Z' or 359 decimal.
  The code could be modified to handle longer numbers by increasing the size of (both) the prefix and suffix.
  A higher base could also be used (by replacing CONV() with a custom function), provided that the collation you are using sorts the "digits" of the base in the correct order, starting with 0123456789.
  However, while the maximum number length may be increased this way, note that the technique this function uses is NOT applicable where strings may contain numbers of unlimited length.

  The function definition does not specify the charset or collation to be used for string-type parameters or variables:  The default database charset & collation at the time the function is defined will be used.
  This is to make the function code more portable.  However, there are some important restrictions:

  - Collation is important here only when comparing (or storing) the output value from this function, but it MUST order the characters " +0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" in that order for the natural sort to work.
    This is true for most collations, but not all of them, e.g. in Lithuanian 'Y' comes before 'J' (according to Wikipedia).
    To adapt the function to work with such collations, replace CONV() in the function code with a custom function that emits "digits" above 9 that are characters ordered according to the collation in use.

  - For efficiency, the function code uses LENGTH() rather than CHAR_LENGTH() to measure the length of strings that consist only of digits 0-9, '.', and ',' characters.
    This works for any single-byte charset, as well as any charset that maps standard ASCII characters to single bytes (such as utf8 or utf8mb4).
    If using a charset that maps these characters to multiple bytes (such as, e.g. utf16 or utf32), you MUST replace all instances of LENGTH() in the function definition with CHAR_LENGTH()

  Length of the output:

  Each number converted adds 5 characters (2 prefix + 3 suffix) to the length of the string. n is the maximum count of numbers to convert;
  This parameter is provided as a means to limit the maximum output length (to input length + 5*n).
  If you do not require the total-ordering property, you could edit the code to use suffixes of 1 character (space or plus) only; this would reduce the maximum output length for any given n.
  Since a string of length L has at most ((L+1) DIV 2) individual numbers in it (every 2nd character a digit), for n<=0 the maximum output length is (inputlength + 5*((inputlength+1) DIV 2))
  So for the current input length of 100, the maximum output length is 350.
  If changing the input length, the output length must be modified according to the above formula.  The DECLARE statements for x,y,r, and suf must also be modified, as the code comments indicate.
****/
  DECLARE x,y varchar(100);            # need to be same length as input s
  DECLARE r varchar(350) DEFAULT '';   # return value:  needs to be same length as return type
  DECLARE suf varchar(101);   # suffix for a number or version string. Must be (((inputlength+1) DIV 2)*2 + 1) chars to support version strings (e.g. '1.2.33.5'), though it's usually just 3 chars. (Max version string e.g. 1.2. ... .5 has ((length of input + 1) DIV 2) numeric components)
  DECLARE i,j,k int UNSIGNED;
  IF n<=0 THEN SET n := -1; END IF;   # n<=0 means "process all numbers"
  LOOP
    SET i := REGEXP_INSTR(s,'\\d');   # find position of next digit
    IF i=0 OR n=0 THEN RETURN CONCAT(r,s); END IF;   # no more numbers to process -> we're done
    SET n := n-1, suf := ' ';
    IF i>1 THEN
      IF SUBSTRING(s,i-1,1)='.' AND (i=2 OR SUBSTRING(s,i-2,1) RLIKE '[^.\\p{L}\\p{N}\\p{M}\\x{608}\\x{200C}\\x{200D}\\x{2100}-\\x{214F}\\x{24B6}-\\x{24E9}\\x{1F130}-\\x{1F149}\\x{1F150}-\\x{1F169}\\x{1F170}-\\x{1F189}]') AND (SUBSTRING(s,i) NOT RLIKE '^\\d++\\.\\d') THEN SET i:=i-1; END IF;   # Allow decimal number (but not version string) to begin with a '.', provided preceding char is neither another '.', nor a member of the unicode character classes: "Alphabetic", "Letter", "Block=Letterlike Symbols" "Number", "Mark", "Join_Control"
      IF i>1 AND SUBSTRING(s,i-1,1)='+' THEN SET suf := '+', j := i-1; ELSE SET j := i; END IF;   # move any preceding '+' into the suffix, so equal numbers with and without preceding "+" signs sort together
      SET r := CONCAT(r,SUBSTRING(s,1,j-1)); SET s = SUBSTRING(s,i);   # add everything before the number to r and strip it from the start of s; preceding '+' is dropped (not included in either r or s)
    END IF;
    SET x := REGEXP_SUBSTR(s,IF(SUBSTRING(s,1,1) IN ('0','.') OR (SUBSTRING(r,-1)=',' AND suf=' '),'^\\d*+(?:\\.\\d++)*','^(?:[1-9]\\d{0,2}(?:,\\d{3}(?!\\d))++|\\d++)(?:\\.\\d++)*+'));   # capture the number + following decimals (including multiple consecutive '.<digits>' sequences)
    SET s := SUBSTRING(s,LENGTH(x)+1);   # NOTE: LENGTH() can be safely used instead of CHAR_LENGTH() here & below PROVIDED we're using a charset that represents digits, ',' and '.' characters using single bytes (e.g. latin1, utf8)
    SET i := INSTR(x,'.');
    IF i=0 THEN SET y := ''; ELSE SET y := SUBSTRING(x,i); SET x := SUBSTRING(x,1,i-1); END IF;   # move any following decimals into y
    SET i := LENGTH(x);
    SET x := REPLACE(x,',','');
    SET j := LENGTH(x);
    SET x := TRIM(LEADING '0' FROM x);   # strip leading zeros
    SET k := LENGTH(x);
    SET suf := CONCAT(suf,LPAD(CONV(LEAST((j-k)*2,1294) + IF(i=j,0,1),10,36),2,'0'));   # (j-k)*2 + IF(i=j,0,1) = (count of leading zeros)*2 + (1 if there are thousands-separators, 0 otherwise)  Note the first term is bounded to <= base-36 'ZY' as it must fit within 2 characters
    SET i := LOCATE('.',y,2);
    IF i=0 THEN
      SET r := CONCAT(r,LPAD(CONV(LEAST(k,359),10,36),2,'0'),x,y,suf);   # k = count of digits in number, bounded to be <= '9Z' base-36
    ELSE   # encode a version number (like 3.12.707, etc)
      SET r := CONCAT(r,LPAD(CONV(LEAST(k,359),10,36),2,'0'),x);   # k = count of digits in number, bounded to be <= '9Z' base-36
      WHILE LENGTH(y)>0 AND n!=0 DO
        IF i=0 THEN SET x := SUBSTRING(y,2); SET y := ''; ELSE SET x := SUBSTRING(y,2,i-2); SET y := SUBSTRING(y,i); SET i := LOCATE('.',y,2); END IF;
        SET j := LENGTH(x);
        SET x := TRIM(LEADING '0' FROM x);   # strip leading zeros
        SET k := LENGTH(x);
        SET r := CONCAT(r,LPAD(CONV(LEAST(k,359),10,36),2,'0'),x);   # k = count of digits in number, bounded to be <= '9Z' base-36
        SET suf := CONCAT(suf,LPAD(CONV(LEAST((j-k)*2,1294),10,36),2,'0'));   # (j-k)*2 = (count of leading zeros)*2, bounded to fit within 2 base-36 digits
        SET n := n-1;
      END WHILE;
      SET r := CONCAT(r,y,suf);
    END IF;
  END LOOP;
END
$
delimiter ;

A lot of other answers I see here (and in the duplicate questions) basically only work for very specifically formatted data, e.g. a string that's entirely a number, or for which there's a fixed-length alphabetic prefix. This isn't going to work in the general case.

It's true that there's not really any way to implement a 100% general nat-sort in MySQL, because to do it what you really need is a modified comparison function, that switches between lexicographic sorting of the strings and numeric sort if/when it encounters a number. Such code could implement any algorithm you could desire for recognising and comparing the numeric portions within two strings. Unfortunately, though, the comparison function in MySQL is internal to its code, and cannot be changed by the user.

This leaves a hack of some kind, where you try to create a sort key for your string in which the numeric parts are re-formatted so that the standard lexicographic sort actually sorts them the way you want.

For plain integers up to some maximum number of digits, the obvious solution is to simply left-pad them with zeros so that they're all fixed width. This is the approach taken by the Drupal plugin, and the solutions of @plalx / @RichardToth. (@Christian has a different and much more complex solution, but it offers no advantages that I can see).

As @tye points out, you can improve on this by prepending a fixed-digit length to each number, rather than simply left-padding it. There's much, much more you can improve on, though, even given the limitations of what is essentially an awkward hack. Yet, there doesn't seem to be any pre-built solutions out there!

For example, what about:

Plus and minus signs? +10 vs 10 vs -10
Decimals? 8.2, 8.5, 1.006, .75
Leading zeros? 020, 030, 00000922
Thousand separators? "1,001 Dalmations" vs "1001 Dalmations"
Version numbers? MariaDB v10.3.18 vs MariaDB v10.3.3
Very long numbers? 103,768,276,592,092,364,859,236,487,687,870,234,598.55

Extending on @tye's method, I've created a fairly compact NatSortKey() stored function that will convert an arbitrary string into a nat-sort key, and that handles all of the above cases, is reasonably efficient, and preserves a total sort-order (no two different strings have sort keys that compare equal). A second parameter can be used to limit the number of numbers processed in each string (e.g. to the first 10 numbers, say), which can be used to ensure the output fits within a given length.

NOTE: Sort-key string generated with a given value of this 2nd parameter should only be sorted against other strings generated with the same value for the parameter, or else they might not sort correctly!

You can use it directly in ordering, e.g.

SELECT myString FROM myTable ORDER BY NatSortKey(myString,0);  ### 0 means process all numbers - resulting sort key might be quite long for certain inputs

But for efficient sorting of large tables, it's better to pre-store the sort key in another column (possibly with an index on it):

INSERT INTO myTable (myString,myStringNSK) VALUES (@theStringValue,NatSortKey(@theStringValue,10)), ...
...
SELECT myString FROM myTable ORDER BY myStringNSK;

[Ideally, you'd make this happen automatically by creating the key column as a computed stored column, using something like:

CREATE TABLE myTable (
...
myString varchar(100),
myStringNSK varchar(150) AS (NatSortKey(myString,10)) STORED,
...
KEY (myStringNSK),
...);

But for now neither MySQL nor MariaDB allow stored functions in computed columns, so unfortunately you can't yet do this.]

My function affects sorting of numbers only. If you want to do other sort-normalization things, such as removing all punctuation, or trimming whitespace off each end, or replacing multi-whitespace sequences with single spaces, you could either extend the function, or it could be done before or after NatSortKey() is applied to your data. (I'd recommend using REGEXP_REPLACE() for this purpose).

It's also somewhat Anglo-centric in that I assume '.' for a decimal point and ',' for the thousands-separator, but it should be easy enough to modify if you want the reverse, or if you want that to be switchable as a parameter.

It might be amenable to further improvement in other ways; for example it currently sorts negative numbers by absolute value, so -1 comes before -2, rather than the other way around. There's also no way to specify a DESC sort order for numbers while retaining ASC lexicographical sort for text. Both of these issues can be fixed with a little more work; I will updated the code if/when I get the time.

There are lots of other details to be aware of - including some critical dependencies on the chaset and collation that you're using - but I've put them all into a comment block within the SQL code. Please read this carefully before using the function for yourself!

So, here's the code. If you find a bug, or have an improvement I haven't mentioned, please let me know in the comments!

delimiter $
CREATE DEFINER=CURRENT_USER FUNCTION NatSortKey (s varchar(100), n int) RETURNS varchar(350) DETERMINISTIC
BEGIN
/****
  Converts numbers in the input string s into a format such that sorting results in a nat-sort.
  Numbers of up to 359 digits (before the decimal point, if one is present) are supported.  Sort results are undefined if the input string contains numbers longer than this.
  For n>0, only the first n numbers in the input string will be converted for nat-sort (so strings that differ only after the first n numbers will not nat-sort amongst themselves).
  Total sort-ordering is preserved, i.e. if s1!=s2, then NatSortKey(s1,n)!=NatSortKey(s2,n), for any given n.
  Numbers may contain ',' as a thousands separator, and '.' as a decimal point.  To reverse these (as appropriate for some European locales), the code would require modification.
  Numbers preceded by '+' sort with numbers not preceded with either a '+' or '-' sign.
  Negative numbers (preceded with '-') sort before positive numbers, but are sorted in order of ascending absolute value (so -7 sorts BEFORE -1001).
  Numbers with leading zeros sort after the same number with no (or fewer) leading zeros.
  Decimal-part-only numbers (like .75) are recognised, provided the decimal point is not immediately preceded by either another '.', or by a letter-type character.
  Numbers with thousand separators sort after the same number without them.
  Thousand separators are only recognised in numbers with no leading zeros that don't immediately follow a ',', and when they format the number correctly.
  (When not recognised as a thousand separator, a ',' will instead be treated as separating two distinct numbers).
  Version-number-like sequences consisting of 3 or more numbers separated by '.' are treated as distinct entities, and each component number will be nat-sorted.
  The entire entity will sort after any number beginning with the first component (so e.g. 10.2.1 sorts after both 10 and 10.995, but before 11)
  Note that The first number component in an entity like this is also permitted to contain thousand separators.

  To achieve this, numbers within the input string are prefixed and suffixed according to the following format:
  - The number is prefixed by a 2-digit base-36 number representing its length, excluding leading zeros.  If there is a decimal point, this length only includes the integer part of the number.
  - A 3-character suffix is appended after the number (after the decimals if present).
    - The first character is a space, or a '+' sign if the number was preceded by '+'.  Any preceding '+' sign is also removed from the front of the number.
    - This is followed by a 2-digit base-36 number that encodes the number of leading zeros and whether the number was expressed in comma-separated form (e.g. 1,000,000.25 vs 1000000.25)
    - The value of this 2-digit number is: (number of leading zeros)*2 + (1 if comma-separated, 0 otherwise)
  - For version number sequences, each component number has the prefix in front of it, and the separating dots are removed.
    Then there is a single suffix that consists of a ' ' or '+' character, followed by a pair base-36 digits for each number component in the sequence.

  e.g. here is how some simple sample strings get converted:
  'Foo055' --> 'Foo0255 02'
  'Absolute zero is around -273 centigrade' --> 'Absolute zero is around -03273 00 centigrade'
  'The $1,000,000 prize' --> 'The $071000000 01 prize'
  '+99.74 degrees' --> '0299.74+00 degrees'
  'I have 0 apples' --> 'I have 00 02 apples'
  '.5 is the same value as 0000.5000' --> '00.5 00 is the same value as 00.5000 08'
  'MariaDB v10.3.0018' --> 'MariaDB v02100130218 000004'

  The restriction to numbers of up to 359 digits comes from the fact that the first character of the base-36 prefix MUST be a decimal digit, and so the highest permitted prefix value is '9Z' or 359 decimal.
  The code could be modified to handle longer numbers by increasing the size of (both) the prefix and suffix.
  A higher base could also be used (by replacing CONV() with a custom function), provided that the collation you are using sorts the "digits" of the base in the correct order, starting with 0123456789.
  However, while the maximum number length may be increased this way, note that the technique this function uses is NOT applicable where strings may contain numbers of unlimited length.

  The function definition does not specify the charset or collation to be used for string-type parameters or variables:  The default database charset & collation at the time the function is defined will be used.
  This is to make the function code more portable.  However, there are some important restrictions:

  - Collation is important here only when comparing (or storing) the output value from this function, but it MUST order the characters " +0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" in that order for the natural sort to work.
    This is true for most collations, but not all of them, e.g. in Lithuanian 'Y' comes before 'J' (according to Wikipedia).
    To adapt the function to work with such collations, replace CONV() in the function code with a custom function that emits "digits" above 9 that are characters ordered according to the collation in use.

  - For efficiency, the function code uses LENGTH() rather than CHAR_LENGTH() to measure the length of strings that consist only of digits 0-9, '.', and ',' characters.
    This works for any single-byte charset, as well as any charset that maps standard ASCII characters to single bytes (such as utf8 or utf8mb4).
    If using a charset that maps these characters to multiple bytes (such as, e.g. utf16 or utf32), you MUST replace all instances of LENGTH() in the function definition with CHAR_LENGTH()

  Length of the output:

  Each number converted adds 5 characters (2 prefix + 3 suffix) to the length of the string. n is the maximum count of numbers to convert;
  This parameter is provided as a means to limit the maximum output length (to input length + 5*n).
  If you do not require the total-ordering property, you could edit the code to use suffixes of 1 character (space or plus) only; this would reduce the maximum output length for any given n.
  Since a string of length L has at most ((L+1) DIV 2) individual numbers in it (every 2nd character a digit), for n<=0 the maximum output length is (inputlength + 5*((inputlength+1) DIV 2))
  So for the current input length of 100, the maximum output length is 350.
  If changing the input length, the output length must be modified according to the above formula.  The DECLARE statements for x,y,r, and suf must also be modified, as the code comments indicate.
****/
  DECLARE x,y varchar(100);            # need to be same length as input s
  DECLARE r varchar(350) DEFAULT '';   # return value:  needs to be same length as return type
  DECLARE suf varchar(101);   # suffix for a number or version string. Must be (((inputlength+1) DIV 2)*2 + 1) chars to support version strings (e.g. '1.2.33.5'), though it's usually just 3 chars. (Max version string e.g. 1.2. ... .5 has ((length of input + 1) DIV 2) numeric components)
  DECLARE i,j,k int UNSIGNED;
  IF n<=0 THEN SET n := -1; END IF;   # n<=0 means "process all numbers"
  LOOP
    SET i := REGEXP_INSTR(s,'\\d');   # find position of next digit
    IF i=0 OR n=0 THEN RETURN CONCAT(r,s); END IF;   # no more numbers to process -> we're done
    SET n := n-1, suf := ' ';
    IF i>1 THEN
      IF SUBSTRING(s,i-1,1)='.' AND (i=2 OR SUBSTRING(s,i-2,1) RLIKE '[^.\\p{L}\\p{N}\\p{M}\\x{608}\\x{200C}\\x{200D}\\x{2100}-\\x{214F}\\x{24B6}-\\x{24E9}\\x{1F130}-\\x{1F149}\\x{1F150}-\\x{1F169}\\x{1F170}-\\x{1F189}]') AND (SUBSTRING(s,i) NOT RLIKE '^\\d++\\.\\d') THEN SET i:=i-1; END IF;   # Allow decimal number (but not version string) to begin with a '.', provided preceding char is neither another '.', nor a member of the unicode character classes: "Alphabetic", "Letter", "Block=Letterlike Symbols" "Number", "Mark", "Join_Control"
      IF i>1 AND SUBSTRING(s,i-1,1)='+' THEN SET suf := '+', j := i-1; ELSE SET j := i; END IF;   # move any preceding '+' into the suffix, so equal numbers with and without preceding "+" signs sort together
      SET r := CONCAT(r,SUBSTRING(s,1,j-1)); SET s = SUBSTRING(s,i);   # add everything before the number to r and strip it from the start of s; preceding '+' is dropped (not included in either r or s)
    END IF;
    SET x := REGEXP_SUBSTR(s,IF(SUBSTRING(s,1,1) IN ('0','.') OR (SUBSTRING(r,-1)=',' AND suf=' '),'^\\d*+(?:\\.\\d++)*','^(?:[1-9]\\d{0,2}(?:,\\d{3}(?!\\d))++|\\d++)(?:\\.\\d++)*+'));   # capture the number + following decimals (including multiple consecutive '.<digits>' sequences)
    SET s := SUBSTRING(s,LENGTH(x)+1);   # NOTE: LENGTH() can be safely used instead of CHAR_LENGTH() here & below PROVIDED we're using a charset that represents digits, ',' and '.' characters using single bytes (e.g. latin1, utf8)
    SET i := INSTR(x,'.');
    IF i=0 THEN SET y := ''; ELSE SET y := SUBSTRING(x,i); SET x := SUBSTRING(x,1,i-1); END IF;   # move any following decimals into y
    SET i := LENGTH(x);
    SET x := REPLACE(x,',','');
    SET j := LENGTH(x);
    SET x := TRIM(LEADING '0' FROM x);   # strip leading zeros
    SET k := LENGTH(x);
    SET suf := CONCAT(suf,LPAD(CONV(LEAST((j-k)*2,1294) + IF(i=j,0,1),10,36),2,'0'));   # (j-k)*2 + IF(i=j,0,1) = (count of leading zeros)*2 + (1 if there are thousands-separators, 0 otherwise)  Note the first term is bounded to <= base-36 'ZY' as it must fit within 2 characters
    SET i := LOCATE('.',y,2);
    IF i=0 THEN
      SET r := CONCAT(r,LPAD(CONV(LEAST(k,359),10,36),2,'0'),x,y,suf);   # k = count of digits in number, bounded to be <= '9Z' base-36
    ELSE   # encode a version number (like 3.12.707, etc)
      SET r := CONCAT(r,LPAD(CONV(LEAST(k,359),10,36),2,'0'),x);   # k = count of digits in number, bounded to be <= '9Z' base-36
      WHILE LENGTH(y)>0 AND n!=0 DO
        IF i=0 THEN SET x := SUBSTRING(y,2); SET y := ''; ELSE SET x := SUBSTRING(y,2,i-2); SET y := SUBSTRING(y,i); SET i := LOCATE('.',y,2); END IF;
        SET j := LENGTH(x);
        SET x := TRIM(LEADING '0' FROM x);   # strip leading zeros
        SET k := LENGTH(x);
        SET r := CONCAT(r,LPAD(CONV(LEAST(k,359),10,36),2,'0'),x);   # k = count of digits in number, bounded to be <= '9Z' base-36
        SET suf := CONCAT(suf,LPAD(CONV(LEAST((j-k)*2,1294),10,36),2,'0'));   # (j-k)*2 = (count of leading zeros)*2, bounded to fit within 2 base-36 digits
        SET n := n-1;
      END WHILE;
      SET r := CONCAT(r,y,suf);
    END IF;
  END LOOP;
END
$
delimiter ;

回复收藏 0 原文

樱花细雨 2024-07-14 00:15:06

我知道这个话题很古老，但我想我已经找到了一种方法来做到这一点：

SELECT * FROM `table` ORDER BY 
CONCAT(
  GREATEST(
    LOCATE('1', name),
    LOCATE('2', name),
    LOCATE('3', name),
    LOCATE('4', name),
    LOCATE('5', name),
    LOCATE('6', name),
    LOCATE('7', name),
    LOCATE('8', name),
    LOCATE('9', name)
   ),
   name
) ASC

废弃那个，它错误地对以下集合进行了排序（这是无用的哈哈）：

最终幻想1
最终幻想2
最终幻想5
最终幻想7
最终幻想 7：降临之子
最终幻想12
最终幻想112
FF1
FF2

I know this topic is ancient but I think I've found a way to do this:

SELECT * FROM `table` ORDER BY 
CONCAT(
  GREATEST(
    LOCATE('1', name),
    LOCATE('2', name),
    LOCATE('3', name),
    LOCATE('4', name),
    LOCATE('5', name),
    LOCATE('6', name),
    LOCATE('7', name),
    LOCATE('8', name),
    LOCATE('9', name)
   ),
   name
) ASC

Scrap that, it sorted the following set incorrectly (It's useless lol):

Final Fantasy 1
Final Fantasy 2
Final Fantasy 5
Final Fantasy 7
Final Fantasy 7: Advent Children
Final Fantasy 12
Final Fantasy 112
FF1
FF2

回复收藏 0 原文

╰沐子 2024-07-14 00:15:06

该函数将为给定的输入字符串生成自然排序键。
该函数接受输入字符串，修剪前导和尾随空格，将其转换为小写以保持一致性，然后通过连接以下组件创建排序键：

SOUNDEX 值：SOUNDEX 值是以下内容的语音表示形式：输入字符串，这有助于捕获听起来相似的字符串。

字符串长度：原始字符串的长度附加到排序键。这可确保当 SOUNDEX 值相同时，字符串按长度排序。

原始字符串：最后，将原始字符串附加到排序键。这可确保在相同长度和 SOUNDEX 值的字符串内，保持字典顺序。

此函数对于需要考虑语音相似性和字符串长度的自然排序顺序的场景特别有用。

DROP FUNCTION IF EXISTS `Natural_Sort`;
DELIMITER $
CREATE FUNCTION `Natural_Sort` (`in_string` varchar(500)) 
RETURNS varchar(500)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    SET in_string = TRIM(LOWER(in_string));
    RETURN CONCAT(RPAD(SOUNDEX(in_string),25, '0') ,'_' ,LENGTH(in_string) ,'_', in_string);
END
$

用法：

-- Example Usage:
SELECT Generate_Natural_Sort_Key('Example String') AS SortKey;

上面的查询将根据输入字符串“示例字符串”所描述的算法返回排序键。

This function will generate a natural sort key for a given input string.
The function takes an input string, trims leading and trailing whitespaces, converts it to lowercase for consistency, and then creates a sort key by concatenating the following components:

SOUNDEX Value: The SOUNDEX value is a phonetic representation of the input string, which helps in capturing similar-sounding strings.

Length of the String: The length of the original string is appended to the sort key. This ensures that strings are sorted by length when their SOUNDEX values are identical.

Original String: Finally, the original string is appended to the sort key. This ensures that within strings of the same length and SOUNDEX value, the lexicographical order is maintained.

This function is particularly useful for scenarios where a natural sorting order is desired, considering both phonetic similarity and string length.

DROP FUNCTION IF EXISTS `Natural_Sort`;
DELIMITER $
CREATE FUNCTION `Natural_Sort` (`in_string` varchar(500)) 
RETURNS varchar(500)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    SET in_string = TRIM(LOWER(in_string));
    RETURN CONCAT(RPAD(SOUNDEX(in_string),25, '0') ,'_' ,LENGTH(in_string) ,'_', in_string);
END
$

Usage:

-- Example Usage:
SELECT Generate_Natural_Sort_Key('Example String') AS SortKey;

The above query will return a sort key based on the described algorithm for the input string 'Example String'.

回复收藏 0 原文

伴我心暖 2024-07-14 00:15:06

我尝试了几种解决方案，但实际上很简单：

SELECT test_column FROM test_table ORDER BY LENGTH(test_column) DESC, test_column DESC

/* 
Result 
--------
value_1
value_2
value_3
value_4
value_5
value_6
value_7
value_8
value_9
value_10
value_11
value_12
value_13
value_14
value_15
...
*/

I have tried several solutions but the actually it is very simple:

SELECT test_column FROM test_table ORDER BY LENGTH(test_column) DESC, test_column DESC

/* 
Result 
--------
value_1
value_2
value_3
value_4
value_5
value_6
value_7
value_8
value_9
value_10
value_11
value_12
value_13
value_14
value_15
...
*/

回复收藏 0 原文

只是偏爱你 2024-07-14 00:15:06

还有natsort。它旨在成为 drupal 插件的一部分，但它可以独立运行。

回复收藏 0 原文

青萝楚歌 2024-07-14 00:15:06

如果标题仅将版本作为数字，则这是一个简单的方法：

ORDER BY CAST(REGEXP_REPLACE(title, "[a-zA-Z]+", "") AS INT)';

否则，如果您使用模式，则可以使用简单的 SQL（此模式在版本之前使用 #）：

create table titles(title);

insert into titles (title) values 
('Final Fantasy'),
('Final Fantasy #03'),
('Final Fantasy #11'),
('Final Fantasy #10'),
('Final Fantasy #2'),
('Bond 007 ##2'),
('Final Fantasy #01'),
('Bond 007'),
('Final Fantasy #11}');

select REGEXP_REPLACE(title, "#([0-9]+)", "\\1") as title from titles
ORDER BY REGEXP_REPLACE(title, "#[0-9]+", ""),
CAST(REGEXP_REPLACE(title, ".*#([0-9]+).*", "\\1") AS INT);     
+-------------------+
| title             |
+-------------------+
| Bond 007          |
| Bond 007 #2       |
| Final Fantasy     |
| Final Fantasy 01  |
| Final Fantasy 2   |
| Final Fantasy 03  |
| Final Fantasy 10  |
| Final Fantasy 11  |
| Final Fantasy 11} |
+-------------------+
8 rows in set, 2 warnings (0.001 sec)

如果需要，您可以使用其他模式。
例如，如果您有一部电影“我#1”和“我#1 第 2 部分”，则可以包装该版本，例如“最终幻想 {11}”

Here is a simple one if titles only have the version as a number:

ORDER BY CAST(REGEXP_REPLACE(title, "[a-zA-Z]+", "") AS INT)';

Otherwise you can use simple SQL if you use a pattern (this pattern uses a # before the version):

create table titles(title);

insert into titles (title) values 
('Final Fantasy'),
('Final Fantasy #03'),
('Final Fantasy #11'),
('Final Fantasy #10'),
('Final Fantasy #2'),
('Bond 007 ##2'),
('Final Fantasy #01'),
('Bond 007'),
('Final Fantasy #11}');

select REGEXP_REPLACE(title, "#([0-9]+)", "\\1") as title from titles
ORDER BY REGEXP_REPLACE(title, "#[0-9]+", ""),
CAST(REGEXP_REPLACE(title, ".*#([0-9]+).*", "\\1") AS INT);     
+-------------------+
| title             |
+-------------------+
| Bond 007          |
| Bond 007 #2       |
| Final Fantasy     |
| Final Fantasy 01  |
| Final Fantasy 2   |
| Final Fantasy 03  |
| Final Fantasy 10  |
| Final Fantasy 11  |
| Final Fantasy 11} |
+-------------------+
8 rows in set, 2 warnings (0.001 sec)

You can use other patterns if needed.
For example if you have a movie "I'm #1" and "I'm #1 part 2" then maybe wrap the version e.g. "Final Fantasy {11}"

回复收藏 0 原文

薄情伤 2024-07-14 00:15:06

@plaix/Richard Toth/Luke Hoggett 最佳响应的简化非 udf 版本仅适用于字段中的第一个整数，如下

SELECT name,
LEAST(
    IFNULL(NULLIF(LOCATE('0', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('1', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('2', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('3', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('4', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('5', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('6', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('7', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('8', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('9', name), 0), ~0)
) AS first_int
FROM table
ORDER BY IF(first_int = ~0, name, CONCAT(
    SUBSTR(name, 1, first_int - 1),
    LPAD(CAST(SUBSTR(name, first_int) AS UNSIGNED), LENGTH(~0), '0'),
    SUBSTR(name, first_int + LENGTH(CAST(SUBSTR(name, first_int) AS UNSIGNED)))
)) ASC

A simplified non-udf version of the best response of @plaix/Richard Toth/Luke Hoggett, which works only for the first integer in the field, is

SELECT name,
LEAST(
    IFNULL(NULLIF(LOCATE('0', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('1', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('2', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('3', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('4', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('5', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('6', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('7', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('8', name), 0), ~0),
    IFNULL(NULLIF(LOCATE('9', name), 0), ~0)
) AS first_int
FROM table
ORDER BY IF(first_int = ~0, name, CONCAT(
    SUBSTR(name, 1, first_int - 1),
    LPAD(CAST(SUBSTR(name, first_int) AS UNSIGNED), LENGTH(~0), '0'),
    SUBSTR(name, first_int + LENGTH(CAST(SUBSTR(name, first_int) AS UNSIGNED)))
)) ASC

回复收藏 0 原文

水波映月 2024-07-14 00:15:06

其他答案是正确的，但您可能想知道 MariaDB 10.11 LTS 有一个 natural_sort_key() 函数。该函数记录在此处。

回复收藏 0 原文

优雅的叶子 2024-07-14 00:15:06

如果您使用 PHP，您可以在 php 中进行自然排序。

$keys = array();
$values = array();
foreach ($results as $index => $row) {
   $key = $row['name'].'__'.$index; // Add the index to create an unique key.
   $keys[] = $key;
   $values[$key] = $row; 
}
natsort($keys);
$sortedValues = array(); 
foreach($keys as $index) {
  $sortedValues[] = $values[$index]; 
}

我希望MySQL在未来的版本中实现自然排序，但是功能请求(#1588)< /a> 自 2003 年以来一直开放，所以我不会屏住呼吸。

If you're using PHP you can do the the natural sort in php.

$keys = array();
$values = array();
foreach ($results as $index => $row) {
   $key = $row['name'].'__'.$index; // Add the index to create an unique key.
   $keys[] = $key;
   $values[$key] = $row; 
}
natsort($keys);
$sortedValues = array(); 
foreach($keys as $index) {
  $sortedValues[] = $values[$index]; 
}

I hope MySQL will implement natural sorting in a future version, but the feature request (#1588) is open since 2003, So I wouldn't hold my breath.

回复收藏 0 原文

甜味超标? 2024-07-14 00:15:06

因此，虽然我知道您已经找到了满意的答案，但我在这个问题上挣扎了一段时间，而且我们之前已经确定它无法在 SQL 中很好地完成，因此我们将不得不在 JSON 上使用 javascript大批。

以下是我仅使用 SQL 解决该问题的方法。希望这对其他人有帮助：

我有这样的数据：

Scene 1
Scene 1A
Scene 1B
Scene 2A
Scene 3
...
Scene 101
Scene XXA1
Scene XXA2

我实际上没有“投射”东西，尽管我认为这也可能有效。

我首先替换了数据中不变的部分，在本例中为“场景”，然后进行了 LPAD 来排列内容。这似乎可以很好地对字母字符串和编号字符串进行正确排序。

我的 ORDER BY 子句看起来像：

ORDER BY LPAD(REPLACE(`table`.`column`,'Scene ',''),10,'0')

显然这对解决不太统一的原始问题没有帮助 - 但我想这可能适用于许多其他相关问题，所以把它放在那里。

So, while I know that you have found a satisfactory answer, I was struggling with this problem for awhile, and we'd previously determined that it could not be done reasonably well in SQL and we were going to have to use javascript on a JSON array.

Here's how I solved it just using SQL. Hopefully this is helpful for others:

I had data such as:

Scene 1
Scene 1A
Scene 1B
Scene 2A
Scene 3
...
Scene 101
Scene XXA1
Scene XXA2

I actually didn't "cast" things though I suppose that may also have worked.

I first replaced the parts that were unchanging in the data, in this case "Scene ", and then did a LPAD to line things up. This seems to allow pretty well for the alpha strings to sort properly as well as the numbered ones.

My ORDER BY clause looks like:

ORDER BY LPAD(REPLACE(`table`.`column`,'Scene ',''),10,'0')

Obviously this doesn't help with the original problem which was not so uniform - but I imagine this would probably work for many other related problems, so putting it out there.

回复收藏 0 原文

前事休说 2024-07-14 00:15:06

MySQL 不允许这种“自然排序”，因此看起来获得您想要的内容的最佳方法是按照上面所述拆分数据设置（单独的 id 字段等），否则会失败即，根据非标题元素、数据库中的索引元素（日期、数据库中插入的 ID 等）执行排序。

让数据库为您进行排序几乎总是比将大型数据集读入您选择的编程语言并在那里进行排序要快，因此，如果您对此处的数据库模式有任何控制，那么请考虑添加如上所述，可以轻松排序字段，从长远来看，它将为您节省很多麻烦和维护工作。

MySQL bug 和讨论论坛，许多解决方案都围绕剥离取出数据的特定部分并将它们转换为查询的 ORDER BY 部分，例如

SELECT * FROM table ORDER BY CAST(mid(name, 6, LENGTH(c) -5) AS unsigned)

这种解决方案可以在上面的最终幻想示例中使用，但不是特别灵活恐怕不太可能完全扩展到包括“战锤 40,000”和“詹姆斯·邦德 007”在内的数据集。

MySQL doesn't allow this sort of "natural sorting", so it looks like the best way to get what you're after is to split your data set up as you've described above (separate id field, etc), or failing that, perform a sort based on a non-title element, indexed element in your db (date, inserted id in the db, etc).

Having the db do the sorting for you is almost always going to be quicker than reading large data sets into your programming language of choice and sorting it there, so if you've any control at all over the db schema here, then look at adding easily-sorted fields as described above, it'll save you a lot of hassle and maintenance in the long run.

Requests to add a "natural sort" come up from time to time on the MySQL bugs and discussion forums, and many solutions revolve around stripping out specific parts of your data and casting them for the ORDER BY part of the query, e.g.

SELECT * FROM table ORDER BY CAST(mid(name, 6, LENGTH(c) -5) AS unsigned)

This sort of solution could just about be made to work on your Final Fantasy example above, but isn't particularly flexible and unlikely to extend cleanly to a dataset including, say, "Warhammer 40,000" and "James Bond 007" I'm afraid.

回复收藏 0 原文

不知所踪 2024-07-14 00:15:05

这是一个快速解决方案：

SELECT alphanumeric, 
       integer
FROM sorting_test
ORDER BY LENGTH(alphanumeric), alphanumeric

Here is a quick solution:

SELECT alphanumeric, 
       integer
FROM sorting_test
ORDER BY LENGTH(alphanumeric), alphanumeric

回复收藏 0 原文

~没有更多了~

关于作者

寄意

暂无简介

文章

28 人气

关注发私信

友情链接

文江博客

MySQL 中的自然排序

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（23）

关于作者

相关话题

热门标签

推荐作者

夢野间

百度③文鱼

小草泠泠

zhuwenyan

weirdo

坚持沉默

友情链接

MySQL 中的自然排序

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（23）

关于作者

相关话题

热门标签

推荐作者

夢野间

百度③文鱼

小草泠泠

zhuwenyan

weirdo

坚持沉默

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。