Microsoft SQL 2005 中的自然(人类字母数字)排序

发布于 2024-07-04 03:00:02 字数 1383 浏览 11 评论 0原文

我们有一个大型数据库,在该数据库上有数据库端分页。 速度很快,只需不到一秒的时间即可从数百万条记录中返回 50 行的页面。

用户可以定义自己的排序,基本上是选择排序依据的列。 列是动态的 - 有些具有数值,有些具有日期,有些具有文本。

虽然大多数按预期排序,但文本以一种愚蠢的方式排序。 好吧,我说愚蠢,这对计算机来说有意义,但让用户感到沮丧。

例如,按字符串记录 id 排序会得到类似:

rec1
rec10
rec14
rec2
rec20
rec3
rec4

...等等。

我希望它考虑到数字,所以:

rec1
rec2
rec3
rec4
rec10
rec14
rec20

我无法控制输入(否则我只会以前导 000 进行格式化)并且我不能依赖单一格式 - 有些是“{alpha code}” -{部门代码}-{记录 ID}”。

我知道在 C# 中执行此操作的几种方法,但无法拉下所有记录来对它们进行排序,因为那样会很慢。

有谁知道在 Sql server 中快速应用自然排序的方法吗?


我们正在使用:

ROW_NUMBER() over (order by {field name} asc)

然后我们通过它进行分页。

我们可以添加触发器,尽管我们不会。 他们的所有输入都是参数化的等等,但我无法更改格式 - 如果他们输入“rec2”和“rec10”,他们希望它们像那样以自然顺序返回。


我们拥有针对不同客户采用不同格式的有效用户输入。

一个可能会去rec1,rec2,rec3,...rec100,rec101

而另一个可能会去:grp1rec1,grp1rec2,...grp20rec300,grp20rec301

当我说我们无法控制输入时,我的意思是我们不能强迫用户更改这些标准 - 它们具有类似于 grp1rec1 的值,我无法将其重新格式化为 grp01rec001,因为这将更改用于查找和链接到外部系统的内容。

这些格式差异很大,但通常是字母和数字的混合。

在 C# 中对它们进行排序很容易 - 只需将其分解为 { "grp", 20, "rec", 301 },然后依次比较序列值即可。

然而可能有数百万条记录并且数据是分页的,我需要在SQL服务器上完成排序。

SQL Server 按值排序,而不是比较 - 在​​ C# 中,我可以将值拆分出来进行比较,但在 SQL 中,我需要一些逻辑来(非常快地)获得一致排序的单个值。

@moebius - 你的答案可能有效,但为所有这些文本值添加排序键确实感觉像是一个丑陋的妥协。

We have a large database on which we have DB side pagination. This is quick, returning a page of 50 rows from millions of records in a small fraction of a second.

Users can define their own sort, basically choosing what column to sort by. Columns are dynamic - some have numeric values, some dates and some text.

While most sort as expected text sorts in a dumb way. Well, I say dumb, it makes sense to computers, but frustrates users.

For instance, sorting by a string record id gives something like:

rec1
rec10
rec14
rec2
rec20
rec3
rec4

...and so on.

I want this to take account of the number, so:

rec1
rec2
rec3
rec4
rec10
rec14
rec20

I can't control the input (otherwise I'd just format in leading 000s) and I can't rely on a single format - some are things like "{alpha code}-{dept code}-{rec id}".

I know a few ways to do this in C#, but can't pull down all the records to sort them, as that would be to slow.

Does anyone know a way to quickly apply a natural sort in Sql server?


We're using:

ROW_NUMBER() over (order by {field name} asc)

And then we're paging by that.

We can add triggers, although we wouldn't. All their input is parametrised and the like, but I can't change the format - if they put in "rec2" and "rec10" they expect them to be returned just like that, and in natural order.


We have valid user input that follows different formats for different clients.

One might go rec1, rec2, rec3, ... rec100, rec101

While another might go: grp1rec1, grp1rec2, ... grp20rec300, grp20rec301

When I say we can't control the input I mean that we can't force users to change these standards - they have a value like grp1rec1 and I can't reformat it as grp01rec001, as that would be changing something used for lookups and linking to external systems.

These formats vary a lot, but are often mixtures of letters and numbers.

Sorting these in C# is easy - just break it up into { "grp", 20, "rec", 301 } and then compare sequence values in turn.

However there may be millions of records and the data is paged, I need the sort to be done on the SQL server.

SQL server sorts by value, not comparison - in C# I can split the values out to compare, but in SQL I need some logic that (very quickly) gets a single value that consistently sorts.

@moebius - your answer might work, but it does feel like an ugly compromise to add a sort-key for all these text values.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(14

鹿童谣 2024-07-11 03:00:02

我知道这是一个老问题,但我刚刚遇到它,因为它没有得到公认的答案。

我一直使用与此类似的方法:

SELECT [Column] FROM [Table]
ORDER BY RIGHT(REPLICATE('0', 1000) + LTRIM(RTRIM(CAST([Column] AS VARCHAR(MAX)))), 1000)

唯一常见的问题是您的列不会转换为 VARCHAR(MAX),或者 LEN([Column]) > > 1000(但如果您愿意,您可以将 1000 更改为其他值),但您可以根据您的需要使用这个粗略的想法。

此外,这比正常的 ORDER BY [Column] 性能要差得多,但它确实为您提供了 OP 中要求的结果。

编辑:为了进一步澄清,如果您有十进制值,例如 11.151.5(它们将排序为 {1, 1.5, 1.15}),因为这不是 OP 中所要求的,但可以通过以下方式轻松完成:

SELECT [Column] FROM [Table]
ORDER BY REPLACE(RIGHT(REPLICATE('0', 1000) + LTRIM(RTRIM(CAST([Column] AS VARCHAR(MAX)))) + REPLICATE('0', 100 - CHARINDEX('.', REVERSE(LTRIM(RTRIM(CAST([Column] AS VARCHAR(MAX))))), 1)), 1000), '.', '0')

结果:{1, 1.15, 1.5}

而且仍然完全在 SQL 内。 这不会对 IP 地址进行排序,因为您现在进入的是非常具体的数字组合,而不是简单的文本 + 数字。

I know this is an old question but I just came across it and since it's not got an accepted answer.

I have always used ways similar to this:

SELECT [Column] FROM [Table]
ORDER BY RIGHT(REPLICATE('0', 1000) + LTRIM(RTRIM(CAST([Column] AS VARCHAR(MAX)))), 1000)

The only common times that this has issues is if your column won't cast to a VARCHAR(MAX), or if LEN([Column]) > 1000 (but you can change that 1000 to something else if you want), but you can use this rough idea for what you need.

Also this is much worse performance than normal ORDER BY [Column], but it does give you the result asked for in the OP.

Edit: Just to further clarify, this the above will not work if you have decimal values such as having 1, 1.15 and 1.5, (they will sort as {1, 1.5, 1.15}) as that is not what is asked for in the OP, but that can easily be done by:

SELECT [Column] FROM [Table]
ORDER BY REPLACE(RIGHT(REPLICATE('0', 1000) + LTRIM(RTRIM(CAST([Column] AS VARCHAR(MAX)))) + REPLICATE('0', 100 - CHARINDEX('.', REVERSE(LTRIM(RTRIM(CAST([Column] AS VARCHAR(MAX))))), 1)), 1000), '.', '0')

Result: {1, 1.15, 1.5}

And still all entirely within SQL. This will not sort IP addresses because you're now getting into very specific number combinations as opposed to simple text + number.

梨涡少年 2024-07-11 03:00:02

RedFilter 的答案对于索引并不重要的合理大小的数据集非常有用,但是如果您想要索引,则需要进行一些调整。

首先,将函数标记为不进行任何数据访问,并且具有确定性和精确性:

[SqlFunction(DataAccess = DataAccessKind.None,
                          SystemDataAccess = SystemDataAccessKind.None,
                          IsDeterministic = true, IsPrecise = true)]

接下来,MSSQL 对索引键大小有 900 字节的限制,因此如果归化值是索引中的唯一值,则它最多不得超过 450长字符。 如果索引包含多列,则返回值必须更小。 两个变化:

CREATE FUNCTION Naturalize(@str AS nvarchar(max)) RETURNS nvarchar(450)
    EXTERNAL NAME ClrExtensions.Util.Naturalize

在 C# 代码中:

const int maxLength = 450;

最后,您需要向表中添加计算列,并且必须将其持久化(因为 MSSQL 无法证明 Naturalize 是确定性和精确的),这意味着归化值实际上存储在表中,但仍然自动维护:

ALTER TABLE YourTable ADD nameNaturalized AS dbo.Naturalize(name) PERSISTED

您现在可以创建索引!

CREATE INDEX idx_YourTable_n ON YourTable (nameNaturalized)

我还对 RedFilter 的代码进行了一些更改:为了清晰起见,使用字符、将重复空格删除合并到主循环中、一旦结果长于限制就退出、设置不带子字符串的最大长度等。结果如下:

using System.Data.SqlTypes;
using System.Text;
using Microsoft.SqlServer.Server;

public static class Util
{
    [SqlFunction(DataAccess = DataAccessKind.None, SystemDataAccess = SystemDataAccessKind.None, IsDeterministic = true, IsPrecise = true)]
    public static SqlString Naturalize(string str)
    {
        if (string.IsNullOrEmpty(str))
            return str;

        const int maxLength = 450;
        const int padLength = 15;

        bool isDecimal = false;
        bool wasSpace = false;
        int numStart = 0;
        int numLength = 0;

        var sb = new StringBuilder();
        for (var i = 0; i < str.Length; i++)
        {
            char c = str[i];
            if (c >= '0' && c <= '9')
            {
                if (numLength == 0)
                    numStart = i;
                numLength++;
            }
            else
            {
                if (numLength > 0)
                {
                    sb.Append(pad(str.Substring(numStart, numLength), isDecimal, padLength));
                    numLength = 0;
                }
                if (c != ' ' || !wasSpace)
                    sb.Append(c);
                isDecimal = c == '.';
                if (sb.Length > maxLength)
                    break;
            }
            wasSpace = c == ' ';
        }
        if (numLength > 0)
            sb.Append(pad(str.Substring(numStart, numLength), isDecimal, padLength));

        if (sb.Length > maxLength)
            sb.Length = maxLength;
        return sb.ToString();
    }

    private static string pad(string num, bool isDecimal, int padLength)
    {
        return isDecimal ? num.PadRight(padLength, '0') : num.PadLeft(padLength, '0');
    }
}

RedFilter's answer is great for reasonably sized datasets where indexing is not critical, however if you want an index, several tweaks are required.

First, mark the function as not doing any data access and being deterministic and precise:

[SqlFunction(DataAccess = DataAccessKind.None,
                          SystemDataAccess = SystemDataAccessKind.None,
                          IsDeterministic = true, IsPrecise = true)]

Next, MSSQL has a 900 byte limit on the index key size, so if the naturalized value is the only value in the index, it must be at most 450 characters long. If the index includes multiple columns, the return value must be even smaller. Two changes:

CREATE FUNCTION Naturalize(@str AS nvarchar(max)) RETURNS nvarchar(450)
    EXTERNAL NAME ClrExtensions.Util.Naturalize

and in the C# code:

const int maxLength = 450;

Finally, you will need to add a computed column to your table, and it must be persisted (because MSSQL cannot prove that Naturalize is deterministic and precise), which means the naturalized value is actually stored in the table but is still maintained automatically:

ALTER TABLE YourTable ADD nameNaturalized AS dbo.Naturalize(name) PERSISTED

You can now create the index!

CREATE INDEX idx_YourTable_n ON YourTable (nameNaturalized)

I've also made a couple of changes to RedFilter's code: using chars for clarity, incorporating duplicate space removal into the main loop, exiting once the result is longer than the limit, setting maximum length without substring etc. Here's the result:

using System.Data.SqlTypes;
using System.Text;
using Microsoft.SqlServer.Server;

public static class Util
{
    [SqlFunction(DataAccess = DataAccessKind.None, SystemDataAccess = SystemDataAccessKind.None, IsDeterministic = true, IsPrecise = true)]
    public static SqlString Naturalize(string str)
    {
        if (string.IsNullOrEmpty(str))
            return str;

        const int maxLength = 450;
        const int padLength = 15;

        bool isDecimal = false;
        bool wasSpace = false;
        int numStart = 0;
        int numLength = 0;

        var sb = new StringBuilder();
        for (var i = 0; i < str.Length; i++)
        {
            char c = str[i];
            if (c >= '0' && c <= '9')
            {
                if (numLength == 0)
                    numStart = i;
                numLength++;
            }
            else
            {
                if (numLength > 0)
                {
                    sb.Append(pad(str.Substring(numStart, numLength), isDecimal, padLength));
                    numLength = 0;
                }
                if (c != ' ' || !wasSpace)
                    sb.Append(c);
                isDecimal = c == '.';
                if (sb.Length > maxLength)
                    break;
            }
            wasSpace = c == ' ';
        }
        if (numLength > 0)
            sb.Append(pad(str.Substring(numStart, numLength), isDecimal, padLength));

        if (sb.Length > maxLength)
            sb.Length = maxLength;
        return sb.ToString();
    }

    private static string pad(string num, bool isDecimal, int padLength)
    {
        return isDecimal ? num.PadRight(padLength, '0') : num.PadLeft(padLength, '0');
    }
}
过潦 2024-07-11 03:00:02

这是为 SQL 2000 编写的解决方案。它可能可以针对较新的 SQL 版本进行改进。

/**
 * Returns a string formatted for natural sorting. This function is very useful when having to sort alpha-numeric strings.
 *
 * @author Alexandre Potvin Latreille (plalx)
 * @param {nvarchar(4000)} string The formatted string.
 * @param {int} numberLength The length each number should have (including padding). This should be the length of the longest number. Defaults to 10.
 * @param {char(50)} sameOrderChars A list of characters that should have the same order. Ex: '.-/'. Defaults to empty string.
 *
 * @return {nvarchar(4000)} A string for natural sorting.
 * Example of use: 
 * 
 *      SELECT Name FROM TableA ORDER BY Name
 *  TableA (unordered)              TableA (ordered)
 *  ------------                    ------------
 *  ID  Name                        ID  Name
 *  1.  A1.                         1.  A1-1.       
 *  2.  A1-1.                       2.  A1.
 *  3.  R1             -->          3.  R1
 *  4.  R11                         4.  R11
 *  5.  R2                          5.  R2
 *
 *  
 *  As we can see, humans would expect A1., A1-1., R1, R2, R11 but that's not how SQL is sorting it.
 *  We can use this function to fix this.
 *
 *      SELECT Name FROM TableA ORDER BY dbo.udf_NaturalSortFormat(Name, default, '.-')
 *  TableA (unordered)              TableA (ordered)
 *  ------------                    ------------
 *  ID  Name                        ID  Name
 *  1.  A1.                         1.  A1.     
 *  2.  A1-1.                       2.  A1-1.
 *  3.  R1              -->         3.  R1
 *  4.  R11                         4.  R2
 *  5.  R2                          5.  R11
 */
ALTER FUNCTION [dbo].[udf_NaturalSortFormat](
    @string nvarchar(4000),
    @numberLength int = 10,
    @sameOrderChars char(50) = ''
)
RETURNS varchar(4000)
AS
BEGIN
    DECLARE @sortString varchar(4000),
        @numStartIndex int,
        @numEndIndex int,
        @padLength int,
        @totalPadLength int,
        @i int,
        @sameOrderCharsLen int;

    SELECT 
        @totalPadLength = 0,
        @string = RTRIM(LTRIM(@string)),
        @sortString = @string,
        @numStartIndex = PATINDEX('%[0-9]%', @string),
        @numEndIndex = 0,
        @i = 1,
        @sameOrderCharsLen = LEN(@sameOrderChars);

    -- Replace all char that have the same order by a space.
    WHILE (@i <= @sameOrderCharsLen)
    BEGIN
        SET @sortString = REPLACE(@sortString, SUBSTRING(@sameOrderChars, @i, 1), ' ');
        SET @i = @i + 1;
    END

    -- Pad numbers with zeros.
    WHILE (@numStartIndex <> 0)
    BEGIN
        SET @numStartIndex = @numStartIndex + @numEndIndex;
        SET @numEndIndex = @numStartIndex;

        WHILE(PATINDEX('[0-9]', SUBSTRING(@string, @numEndIndex, 1)) = 1)
        BEGIN
            SET @numEndIndex = @numEndIndex + 1;
        END

        SET @numEndIndex = @numEndIndex - 1;

        SET @padLength = @numberLength - (@numEndIndex + 1 - @numStartIndex);

        IF @padLength < 0
        BEGIN
            SET @padLength = 0;
        END

        SET @sortString = STUFF(
            @sortString,
            @numStartIndex + @totalPadLength,
            0,
            REPLICATE('0', @padLength)
        );

        SET @totalPadLength = @totalPadLength + @padLength;
        SET @numStartIndex = PATINDEX('%[0-9]%', RIGHT(@string, LEN(@string) - @numEndIndex));
    END

    RETURN @sortString;
END

Here's a solution written for SQL 2000. It can probably be improved for newer SQL versions.

/**
 * Returns a string formatted for natural sorting. This function is very useful when having to sort alpha-numeric strings.
 *
 * @author Alexandre Potvin Latreille (plalx)
 * @param {nvarchar(4000)} string The formatted string.
 * @param {int} numberLength The length each number should have (including padding). This should be the length of the longest number. Defaults to 10.
 * @param {char(50)} sameOrderChars A list of characters that should have the same order. Ex: '.-/'. Defaults to empty string.
 *
 * @return {nvarchar(4000)} A string for natural sorting.
 * Example of use: 
 * 
 *      SELECT Name FROM TableA ORDER BY Name
 *  TableA (unordered)              TableA (ordered)
 *  ------------                    ------------
 *  ID  Name                        ID  Name
 *  1.  A1.                         1.  A1-1.       
 *  2.  A1-1.                       2.  A1.
 *  3.  R1             -->          3.  R1
 *  4.  R11                         4.  R11
 *  5.  R2                          5.  R2
 *
 *  
 *  As we can see, humans would expect A1., A1-1., R1, R2, R11 but that's not how SQL is sorting it.
 *  We can use this function to fix this.
 *
 *      SELECT Name FROM TableA ORDER BY dbo.udf_NaturalSortFormat(Name, default, '.-')
 *  TableA (unordered)              TableA (ordered)
 *  ------------                    ------------
 *  ID  Name                        ID  Name
 *  1.  A1.                         1.  A1.     
 *  2.  A1-1.                       2.  A1-1.
 *  3.  R1              -->         3.  R1
 *  4.  R11                         4.  R2
 *  5.  R2                          5.  R11
 */
ALTER FUNCTION [dbo].[udf_NaturalSortFormat](
    @string nvarchar(4000),
    @numberLength int = 10,
    @sameOrderChars char(50) = ''
)
RETURNS varchar(4000)
AS
BEGIN
    DECLARE @sortString varchar(4000),
        @numStartIndex int,
        @numEndIndex int,
        @padLength int,
        @totalPadLength int,
        @i int,
        @sameOrderCharsLen int;

    SELECT 
        @totalPadLength = 0,
        @string = RTRIM(LTRIM(@string)),
        @sortString = @string,
        @numStartIndex = PATINDEX('%[0-9]%', @string),
        @numEndIndex = 0,
        @i = 1,
        @sameOrderCharsLen = LEN(@sameOrderChars);

    -- Replace all char that have the same order by a space.
    WHILE (@i <= @sameOrderCharsLen)
    BEGIN
        SET @sortString = REPLACE(@sortString, SUBSTRING(@sameOrderChars, @i, 1), ' ');
        SET @i = @i + 1;
    END

    -- Pad numbers with zeros.
    WHILE (@numStartIndex <> 0)
    BEGIN
        SET @numStartIndex = @numStartIndex + @numEndIndex;
        SET @numEndIndex = @numStartIndex;

        WHILE(PATINDEX('[0-9]', SUBSTRING(@string, @numEndIndex, 1)) = 1)
        BEGIN
            SET @numEndIndex = @numEndIndex + 1;
        END

        SET @numEndIndex = @numEndIndex - 1;

        SET @padLength = @numberLength - (@numEndIndex + 1 - @numStartIndex);

        IF @padLength < 0
        BEGIN
            SET @padLength = 0;
        END

        SET @sortString = STUFF(
            @sortString,
            @numStartIndex + @totalPadLength,
            0,
            REPLICATE('0', @padLength)
        );

        SET @totalPadLength = @totalPadLength + @padLength;
        SET @numStartIndex = PATINDEX('%[0-9]%', RIGHT(@string, LEN(@string) - @numEndIndex));
    END

    RETURN @sortString;
END
指尖上的星空 2024-07-11 03:00:02

我知道这在这一点上有点老了,但在我寻找更好的解决方案时,我遇到了这个问题。 我目前正在使用一个函数来排序。 它可以很好地满足我对以混合字母数字命名的记录(“项目 1”、“项目 10”、“项目 2”等)进行排序的目的,

CREATE FUNCTION [dbo].[fnMixSort]
(
    @ColValue NVARCHAR(255)
)
RETURNS NVARCHAR(1000)
AS

BEGIN
    DECLARE @p1 NVARCHAR(255),
        @p2 NVARCHAR(255),
        @p3 NVARCHAR(255),
        @p4 NVARCHAR(255),
        @Index TINYINT

    IF @ColValue LIKE '[a-z]%'
        SELECT  @Index = PATINDEX('%[0-9]%', @ColValue),
            @p1 = LEFT(CASE WHEN @Index = 0 THEN @ColValue ELSE LEFT(@ColValue, @Index - 1) END + REPLICATE(' ', 255), 255),
            @ColValue = CASE WHEN @Index = 0 THEN '' ELSE SUBSTRING(@ColValue, @Index, 255) END
    ELSE
        SELECT  @p1 = REPLICATE(' ', 255)

    SELECT  @Index = PATINDEX('%[^0-9]%', @ColValue)

    IF @Index = 0
        SELECT  @p2 = RIGHT(REPLICATE(' ', 255) + @ColValue, 255),
            @ColValue = ''
    ELSE
        SELECT  @p2 = RIGHT(REPLICATE(' ', 255) + LEFT(@ColValue, @Index - 1), 255),
            @ColValue = SUBSTRING(@ColValue, @Index, 255)

    SELECT  @Index = PATINDEX('%[0-9,a-z]%', @ColValue)

    IF @Index = 0
        SELECT  @p3 = REPLICATE(' ', 255)
    ELSE
        SELECT  @p3 = LEFT(REPLICATE(' ', 255) + LEFT(@ColValue, @Index - 1), 255),
            @ColValue = SUBSTRING(@ColValue, @Index, 255)

    IF PATINDEX('%[^0-9]%', @ColValue) = 0
        SELECT  @p4 = RIGHT(REPLICATE(' ', 255) + @ColValue, 255)
    ELSE
        SELECT  @p4 = LEFT(@ColValue + REPLICATE(' ', 255), 255)

    RETURN  @p1 + @p2 + @p3 + @p4

END

然后调用

select item_name from my_table order by fnMixSort(item_name)

它可以轻松地将简单数据读取的处理时间增加两倍,因此它可能不是完美的解决方案。

I know this is a bit old at this point, but in my search for a better solution, I came across this question. I'm currently using a function to order by. It works fine for my purpose of sorting records which are named with mixed alpha numeric ('item 1', 'item 10', 'item 2', etc)

CREATE FUNCTION [dbo].[fnMixSort]
(
    @ColValue NVARCHAR(255)
)
RETURNS NVARCHAR(1000)
AS

BEGIN
    DECLARE @p1 NVARCHAR(255),
        @p2 NVARCHAR(255),
        @p3 NVARCHAR(255),
        @p4 NVARCHAR(255),
        @Index TINYINT

    IF @ColValue LIKE '[a-z]%'
        SELECT  @Index = PATINDEX('%[0-9]%', @ColValue),
            @p1 = LEFT(CASE WHEN @Index = 0 THEN @ColValue ELSE LEFT(@ColValue, @Index - 1) END + REPLICATE(' ', 255), 255),
            @ColValue = CASE WHEN @Index = 0 THEN '' ELSE SUBSTRING(@ColValue, @Index, 255) END
    ELSE
        SELECT  @p1 = REPLICATE(' ', 255)

    SELECT  @Index = PATINDEX('%[^0-9]%', @ColValue)

    IF @Index = 0
        SELECT  @p2 = RIGHT(REPLICATE(' ', 255) + @ColValue, 255),
            @ColValue = ''
    ELSE
        SELECT  @p2 = RIGHT(REPLICATE(' ', 255) + LEFT(@ColValue, @Index - 1), 255),
            @ColValue = SUBSTRING(@ColValue, @Index, 255)

    SELECT  @Index = PATINDEX('%[0-9,a-z]%', @ColValue)

    IF @Index = 0
        SELECT  @p3 = REPLICATE(' ', 255)
    ELSE
        SELECT  @p3 = LEFT(REPLICATE(' ', 255) + LEFT(@ColValue, @Index - 1), 255),
            @ColValue = SUBSTRING(@ColValue, @Index, 255)

    IF PATINDEX('%[^0-9]%', @ColValue) = 0
        SELECT  @p4 = RIGHT(REPLICATE(' ', 255) + @ColValue, 255)
    ELSE
        SELECT  @p4 = LEFT(@ColValue + REPLICATE(' ', 255), 255)

    RETURN  @p1 + @p2 + @p3 + @p4

END

Then call

select item_name from my_table order by fnMixSort(item_name)

It easily triples the processing time for a simple data read, so it may not be the perfect solution.

清旖 2024-07-11 03:00:02

这是我喜欢的另一个解决方案:
http://www.dreamchain.com/sql-and-alpha -numeric-sort-order/

它不是 Microsoft SQL,但由于我在寻找 Postgres 解决方案时最终来到这里,我认为在这里添加它会对其他人有所帮助。

编辑:这是代码,以防链接消失。

CREATE or REPLACE FUNCTION pad_numbers(text) RETURNS text AS $
  SELECT regexp_replace(regexp_replace(regexp_replace(regexp_replace(($1 collate "C"),
    E'(^|\\D)(\\d{1,3}($|\\D))', E'\\1000\\2', 'g'),
      E'(^|\\D)(\\d{4,6}($|\\D))', E'\\1000\\2', 'g'),
        E'(^|\\D)(\\d{7}($|\\D))', E'\\100\\2', 'g'),
          E'(^|\\D)(\\d{8}($|\\D))', E'\\10\\2', 'g');
$ LANGUAGE SQL;

“C”是 postgresql 中的默认排序规则; 您可以指定所需的任何排序规则,或者如果您可以确定表列永远不会分配不确定的排序规则,则可以删除排序规则语句。

用法:

SELECT * FROM wtf w 
  WHERE TRUE
  ORDER BY pad_numbers(w.my_alphanumeric_field)

Here is an other solution that I like:
http://www.dreamchain.com/sql-and-alpha-numeric-sort-order/

It's not Microsoft SQL, but since I ended up here when I was searching for a solution for Postgres, I thought adding this here would help others.

EDIT: Here is the code, in case the link goes away.

CREATE or REPLACE FUNCTION pad_numbers(text) RETURNS text AS $
  SELECT regexp_replace(regexp_replace(regexp_replace(regexp_replace(($1 collate "C"),
    E'(^|\\D)(\\d{1,3}($|\\D))', E'\\1000\\2', 'g'),
      E'(^|\\D)(\\d{4,6}($|\\D))', E'\\1000\\2', 'g'),
        E'(^|\\D)(\\d{7}($|\\D))', E'\\100\\2', 'g'),
          E'(^|\\D)(\\d{8}($|\\D))', E'\\10\\2', 'g');
$ LANGUAGE SQL;

"C" is the default collation in postgresql; you may specify any collation you desire, or remove the collation statement if you can be certain your table columns will never have a nondeterministic collation assigned.

usage:

SELECT * FROM wtf w 
  WHERE TRUE
  ORDER BY pad_numbers(w.my_alphanumeric_field)
爱给你人给你 2024-07-11 03:00:02

只需您排序

ORDER BY 
cast (substring(name,(PATINDEX('%[0-9]%',name)),len(name))as int)

 ##

Simply you sort by

ORDER BY 
cast (substring(name,(PATINDEX('%[0-9]%',name)),len(name))as int)

 ##
痞味浪人 2024-07-11 03:00:02

如果您在从数据库加载数据以在 C# 中进行排序时遇到问题,那么我确信您会对在数据库中以编程方式执行此操作的任何方法感到失望。 当服务器要排序时,它必须像您每次那样计算“感知”顺序。

我建议您在首次插入数据时使用某种 C# 方法添加一个附加列来存储预处理的可排序字符串。 例如,您可能尝试将数字转换为固定宽度范围,因此“xyz1”将变成“xyz00000001”。 然后你就可以使用普通的 SQL Server 排序了。

冒着自吹自擂的风险,我写了一篇 CodeProject 文章,实现了 CodingHorror 文章中提出的问题。 请随意窃取我的代码

If you're having trouble loading the data from the DB to sort in C#, then I'm sure you'll be disappointed with any approach at doing it programmatically in the DB. When the server is going to sort, it's got to calculate the "perceived" order just as you would have -- every time.

I'd suggest that you add an additional column to store the preprocessed sortable string, using some C# method, when the data is first inserted. You might try to convert the numerics into fixed-width ranges, for example, so "xyz1" would turn into "xyz00000001". Then you could use normal SQL Server sorting.

At the risk of tooting my own horn, I wrote a CodeProject article implementing the problem as posed in the CodingHorror article. Feel free to steal from my code.

爱已欠费 2024-07-11 03:00:02

对于以下 varchar 数据:

BR1
BR2
External Location
IR1
IR2
IR3
IR4
IR5
IR6
IR7
IR8
IR9
IR10
IR11
IR12
IR13
IR14
IR16
IR17
IR15
VCR

这最适合我:

ORDER BY substring(fieldName, 1, 1), LEN(fieldName)

For the following varchar data:

BR1
BR2
External Location
IR1
IR2
IR3
IR4
IR5
IR6
IR7
IR8
IR9
IR10
IR11
IR12
IR13
IR14
IR16
IR17
IR15
VCR

This worked best for me:

ORDER BY substring(fieldName, 1, 1), LEN(fieldName)
懷念過去 2024-07-11 03:00:02

我像往常一样时髦地迟到了。 尽管如此,这是我对一个似乎效果很好的答案的尝试(我会这么说)。 它假定文本末尾带有数字,就像原始示例数据中一样。

首先,这个函数不会很快赢得“漂亮的 SQL”竞赛。

CREATE FUNCTION udfAlphaNumericSortHelper (
@string varchar(max)
)
RETURNS @results TABLE (
    txt varchar(max),
    num float
)
AS
BEGIN

  DECLARE @txt varchar(max) = @string
  DECLARE @numStr varchar(max) = ''
  DECLARE @num float = 0
  DECLARE @lastChar varchar(1) = ''

  set @lastChar = RIGHT(@txt, 1)
  WHILE @lastChar <> '' and @lastChar is not null
  BEGIN 
    IF ISNUMERIC(@lastChar) = 1
    BEGIN 
        set @numStr = @lastChar + @numStr
        set @txt = Substring(@txt, 0, len(@txt))
        set @lastChar = RIGHT(@txt, 1)
    END
    ELSE
    BEGIN 
        set @lastChar = null
    END
  END
  SET @num = CAST(@numStr as float)

  INSERT INTO @results select @txt, @num
  RETURN;
END

然后像下面这样调用它:

declare @str nvarchar(250) = 'sox,fox,jen1,Jen0,jen15,jen02,jen0004,fox00,rec1,rec10,jen3,rec14,rec2,rec20,rec3,rec4,zip1,zip1.32,zip1.33,zip1.3,TT0001,TT01,TT002'


SELECT tbl.value  --, sorter.txt, sorter.num
FROM STRING_SPLIT(@str, ',') as tbl
CROSS APPLY dbo.udfAlphaNumericSortHelper(value) as sorter
ORDER BY sorter.txt, sorter.num, len(tbl.value)

结果:
狐狸
狐狸00
珍0
珍1
珍02
珍3
珍0004
珍15
记录1
记录2
记录3
记录4
记录10
记录14
记录20
袜队
TT01
TT0001
TT002
邮编1
zip1.3
邮编1.32
邮编1.33

I'm fashionably late to the party as usual. Nevertheless, here is my attempt at an answer that seems to work well (I would say that). It assumes text with digits at the end, like in the original example data.

First a function that won't end up winning a "pretty SQL" competition anytime soon.

CREATE FUNCTION udfAlphaNumericSortHelper (
@string varchar(max)
)
RETURNS @results TABLE (
    txt varchar(max),
    num float
)
AS
BEGIN

  DECLARE @txt varchar(max) = @string
  DECLARE @numStr varchar(max) = ''
  DECLARE @num float = 0
  DECLARE @lastChar varchar(1) = ''

  set @lastChar = RIGHT(@txt, 1)
  WHILE @lastChar <> '' and @lastChar is not null
  BEGIN 
    IF ISNUMERIC(@lastChar) = 1
    BEGIN 
        set @numStr = @lastChar + @numStr
        set @txt = Substring(@txt, 0, len(@txt))
        set @lastChar = RIGHT(@txt, 1)
    END
    ELSE
    BEGIN 
        set @lastChar = null
    END
  END
  SET @num = CAST(@numStr as float)

  INSERT INTO @results select @txt, @num
  RETURN;
END

Then call it like below:

declare @str nvarchar(250) = 'sox,fox,jen1,Jen0,jen15,jen02,jen0004,fox00,rec1,rec10,jen3,rec14,rec2,rec20,rec3,rec4,zip1,zip1.32,zip1.33,zip1.3,TT0001,TT01,TT002'


SELECT tbl.value  --, sorter.txt, sorter.num
FROM STRING_SPLIT(@str, ',') as tbl
CROSS APPLY dbo.udfAlphaNumericSortHelper(value) as sorter
ORDER BY sorter.txt, sorter.num, len(tbl.value)

With results:
fox
fox00
Jen0
jen1
jen02
jen3
jen0004
jen15
rec1
rec2
rec3
rec4
rec10
rec14
rec20
sox
TT01
TT0001
TT002
zip1
zip1.3
zip1.32
zip1.33

优雅的叶子 2024-07-11 03:00:02

您可以使用以下代码来解决该问题:

Select *, 
    substring(Cote,1,len(Cote) - Len(RIGHT(Cote, LEN(Cote) - PATINDEX('%[0-9]%', Cote)+1)))alpha,
    CAST(RIGHT(Cote, LEN(Cote) - PATINDEX('%[0-9]%', Cote)+1) AS INT)intv 
FROM Documents 
   left outer join Sites ON Sites.IDSite = Documents.IDSite 
Order BY alpha, intv

问候,
[电子邮件受保护]

You can use the following code to resolve the problem:

Select *, 
    substring(Cote,1,len(Cote) - Len(RIGHT(Cote, LEN(Cote) - PATINDEX('%[0-9]%', Cote)+1)))alpha,
    CAST(RIGHT(Cote, LEN(Cote) - PATINDEX('%[0-9]%', Cote)+1) AS INT)intv 
FROM Documents 
   left outer join Sites ON Sites.IDSite = Documents.IDSite 
Order BY alpha, intv

regards,
[email protected]

归属感 2024-07-11 03:00:02

我刚刚在某处读到一篇关于此类主题的文章。 关键点是:您只需要整数值即可对数据进行排序,而“rec”字符串属于 UI。 您可以将信息拆分为两个字段,例如 alpha 和 num,按 alpha 和 num(分别)排序,然后显示由 alpha + num 组成的字符串。 您可以使用计算列来组成字符串或视图。
希望能帮助到你

I've just read a article somewhere about such a topic. The key point is: you only need the integer value to sort data, while the 'rec' string belongs to the UI. You could split the information in two fields, say alpha and num, sort by alpha and num (separately) and then showing a string composed by alpha + num. You could use a computed column to compose the string, or a view.
Hope it helps

楠木可依 2024-07-11 03:00:02

我还是不明白(可能是因为我的英语不好)。

您可以尝试:

ROW_NUMBER() OVER (ORDER BY dbo.human_sort(field_name) ASC)

但它不适用于数百万条记录。

这就是为什么我建议使用触发器,用填充单独列来人类价值

而且:

  • 内置的 T-SQL 函数确实
    速度慢,微软建议使用
    .NET 函数取而代之。
  • 人类价值是恒定的,因此没有必要每次都计算它
    当查询运行时。

I still don't understand (probably because of my poor English).

You could try:

ROW_NUMBER() OVER (ORDER BY dbo.human_sort(field_name) ASC)

But it won't work for millions of records.

That why I suggested to use trigger which fills separate column with human value.

Moreover:

  • built-in T-SQL functions are really
    slow and Microsoft suggest to use
    .NET functions instead.
  • human value is constant so there is no point calculating it each time
    when query runs.
忆依然 2024-07-11 03:00:02
order by LEN(value), value

并不完美,但在很多情况下效果很好。

order by LEN(value), value

Not perfect, but works well in a lot of cases.

踏雪无痕 2024-07-11 03:00:02

我见过的大多数基于 SQL 的解决方案在数据变得足够复杂时(例如其中超过一两个数字)就会崩溃。 最初,我尝试在 T-SQL 中实现 NaturalSort 函数来满足我的要求(除其他外,处理字符串中的任意数量的数字),但性能方式太慢。

最终,我用 C# 编写了一个标量 CLR 函数,以实现自然排序,即使使用未优化的代码,从 SQL Server 调用它的性能也快得令人眼花缭乱。 它具有以下特征:

  • 将正确排序前 1,000 个字符左右(很容易在代码中修改或制成参数)
  • 正确排序小数,因此由于上述原因,123.333 排在 123.45 之前
  • ,可能无法正确排序 IP 地址等内容; 如果您希望不同的行为,请修改代码,
  • 支持对其中包含任意数量数字的字符串进行排序,
  • 将正确对最多 25 位数字长的数字进行排序(可以轻松地在代码中修改或制成参数)

代码在这里:

using System;
using System.Data.SqlTypes;
using System.Text;
using Microsoft.SqlServer.Server;

public class UDF
{
    [SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic=true)]
    public static SqlString Naturalize(string val)
    {
        if (String.IsNullOrEmpty(val))
            return val;

        while(val.Contains("  "))
            val = val.Replace("  ", " ");

        const int maxLength = 1000;
        const int padLength = 25;

        bool inNumber = false;
        bool isDecimal = false;
        int numStart = 0;
        int numLength = 0;
        int length = val.Length < maxLength ? val.Length : maxLength;

        //TODO: optimize this so that we exit for loop once sb.ToString() >= maxLength
        var sb = new StringBuilder();
        for (var i = 0; i < length; i++)
        {
            int charCode = (int)val[i];
            if (charCode >= 48 && charCode <= 57)
            {
                if (!inNumber)
                {
                    numStart = i;
                    numLength = 1;
                    inNumber = true;
                    continue;
                }
                numLength++;
                continue;
            }
            if (inNumber)
            {
                sb.Append(PadNumber(val.Substring(numStart, numLength), isDecimal, padLength));
                inNumber = false;
            }
            isDecimal = (charCode == 46);
            sb.Append(val[i]);
        }
        if (inNumber)
            sb.Append(PadNumber(val.Substring(numStart, numLength), isDecimal, padLength));

        var ret = sb.ToString();
        if (ret.Length > maxLength)
            return ret.Substring(0, maxLength);

        return ret;
    }

    static string PadNumber(string num, bool isDecimal, int padLength)
    {
        return isDecimal ? num.PadRight(padLength, '0') : num.PadLeft(padLength, '0');
    }
}

要注册此代码要从 SQL Server 调用它,请在查询分析器中运行以下命令:

CREATE ASSEMBLY SqlServerClr FROM 'SqlServerClr.dll' --put the full path to DLL here
go
CREATE FUNCTION Naturalize(@val as nvarchar(max)) RETURNS nvarchar(1000) 
EXTERNAL NAME SqlServerClr.UDF.Naturalize
go

然后,您可以像这样使用它:

select *
from MyTable
order by dbo.Naturalize(MyTextField)

注意:如果您在 SQL Server 中遇到类似于 的错误.NET Framework 中的用户代码执行被禁用。 启用“clr启用”配置选项。,按照说明此处启用它。 执行此操作之前,请确保考虑安全隐患。 如果您不是数据库管理员,请确保在对服务器配置进行任何更改之前与管理员讨论此问题。

注2:此代码无法正确支持国际化(例如,假设小数点标记为“.”,未针对速度进行优化等。欢迎提出改进建议!

编辑: 将函数重命名为 Naturalize,而不是 NaturalSort,因为它不执行任何实际排序。

Most of the SQL-based solutions I have seen break when the data gets complex enough (e.g. more than one or two numbers in it). Initially I tried implementing a NaturalSort function in T-SQL that met my requirements (among other things, handles an arbitrary number of numbers within the string), but the performance was way too slow.

Ultimately, I wrote a scalar CLR function in C# to allow for a natural sort, and even with unoptimized code the performance calling it from SQL Server is blindingly fast. It has the following characteristics:

  • will sort the first 1,000 characters or so correctly (easily modified in code or made into a parameter)
  • properly sorts decimals, so 123.333 comes before 123.45
  • because of above, will likely NOT sort things like IP addresses correctly; if you wish different behaviour, modify the code
  • supports sorting a string with an arbitrary number of numbers within it
  • will correctly sort numbers up to 25 digits long (easily modified in code or made into a parameter)

The code is here:

using System;
using System.Data.SqlTypes;
using System.Text;
using Microsoft.SqlServer.Server;

public class UDF
{
    [SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic=true)]
    public static SqlString Naturalize(string val)
    {
        if (String.IsNullOrEmpty(val))
            return val;

        while(val.Contains("  "))
            val = val.Replace("  ", " ");

        const int maxLength = 1000;
        const int padLength = 25;

        bool inNumber = false;
        bool isDecimal = false;
        int numStart = 0;
        int numLength = 0;
        int length = val.Length < maxLength ? val.Length : maxLength;

        //TODO: optimize this so that we exit for loop once sb.ToString() >= maxLength
        var sb = new StringBuilder();
        for (var i = 0; i < length; i++)
        {
            int charCode = (int)val[i];
            if (charCode >= 48 && charCode <= 57)
            {
                if (!inNumber)
                {
                    numStart = i;
                    numLength = 1;
                    inNumber = true;
                    continue;
                }
                numLength++;
                continue;
            }
            if (inNumber)
            {
                sb.Append(PadNumber(val.Substring(numStart, numLength), isDecimal, padLength));
                inNumber = false;
            }
            isDecimal = (charCode == 46);
            sb.Append(val[i]);
        }
        if (inNumber)
            sb.Append(PadNumber(val.Substring(numStart, numLength), isDecimal, padLength));

        var ret = sb.ToString();
        if (ret.Length > maxLength)
            return ret.Substring(0, maxLength);

        return ret;
    }

    static string PadNumber(string num, bool isDecimal, int padLength)
    {
        return isDecimal ? num.PadRight(padLength, '0') : num.PadLeft(padLength, '0');
    }
}

To register this so that you can call it from SQL Server, run the following commands in Query Analyzer:

CREATE ASSEMBLY SqlServerClr FROM 'SqlServerClr.dll' --put the full path to DLL here
go
CREATE FUNCTION Naturalize(@val as nvarchar(max)) RETURNS nvarchar(1000) 
EXTERNAL NAME SqlServerClr.UDF.Naturalize
go

Then, you can use it like so:

select *
from MyTable
order by dbo.Naturalize(MyTextField)

Note: If you get an error in SQL Server along the lines of Execution of user code in the .NET Framework is disabled. Enable "clr enabled" configuration option., follow the instructions here to enable it. Make sure you consider the security implications before doing so. If you are not the db admin, make sure you discuss this with your admin before making any changes to the server configuration.

Note2: This code does not properly support internationalization (e.g., assumes the decimal marker is ".", is not optimized for speed, etc. Suggestions on improving it are welcome!

Edit: Renamed the function to Naturalize instead of NaturalSort, since it does not do any actual sorting.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文