T-SQL function for generating slugs?

Posted on 2024-09-06 05:52:09

Quick check to see if anyone has or knows of a T-SQL function capable of generating slugs from a given nvarchar input, e.g.:

"Hello World" > "hello-world"
"This is a test" > "this-is-a-test"

I have a C# function that I normally use for these purposes, but in this case I have a large amount of data to parse and turn into slugs, so it makes more sense to do it on the SQL Server rather than have to transfer data over the wire.

As an aside, I don't have Remote Desktop access to the box, so I can't run code (.NET, PowerShell, etc.) against it.

Thanks in advance.

EDIT:
As per request, here's the function I generally use to generate slugs:

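// Requires: using System.Text.RegularExpressions;
public static string GenerateSlug(string n, int maxLength)
{
    string s = n.ToLower();
    // Drop everything except letters, digits, whitespace, and hyphens.
    s = Regex.Replace(s, @"[^a-z0-9\s-]", "");
    // Collapse runs of whitespace and hyphens into a single space.
    s = Regex.Replace(s, @"[\s-]+", " ").Trim();
    // Truncate to maxLength, then trim again in case the cut left a trailing space.
    s = s.Substring(0, s.Length <= maxLength ? s.Length : maxLength).Trim();
    // Turn the remaining spaces into hyphens.
    s = Regex.Replace(s, @"\s", "-");
    return s;
}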

挽梦忆笙歌 2024-09-13 05:52:09

You can use LOWER and REPLACE to do this:

SELECT REPLACE(LOWER(origString), ' ', '-')
FROM myTable

For a wholesale update of the column (this sets the slug column from the value of the origString column):

UPDATE myTable
SET slug = REPLACE(LOWER(origString), ' ', '-')
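
A small check, not part of the original answer, of how this one-liner treats punctuation and repeated spaces. It uses only literal values, so no table is needed; the VALUES table constructor assumes SQL Server 2008 or later, and the alias v is purely illustrative.

```sql
-- LOWER + REPLACE only lowercases and swaps spaces for hyphens;
-- punctuation and runs of spaces pass through unchanged.
SELECT v.origString,
       REPLACE(LOWER(v.origString), ' ', '-') AS slug
FROM (VALUES ('Hello World'),
             ('Hello,  World!')) AS v(origString);
-- 'Hello World'    -> 'hello-world'
-- 'Hello,  World!' -> 'hello,--world!'
```

If the source data is already clean (letters, digits, single spaces), this is likely enough; otherwise one of the functions in the later answers may be a better fit.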

可遇━不可求 2024-09-13 05:52:09

This is what I've come up with as a solution. Feel free to fix / modify where needed.

I should mention that the database I'm currently developing against is case-insensitive, hence the LOWER(@str).

CREATE FUNCTION [dbo].[UDF_GenerateSlug]
(   
    @str VARCHAR(100)
)
RETURNS VARCHAR(100)
AS
BEGIN
    DECLARE @IncorrectCharLoc SMALLINT
    SET @str = LOWER(@str)
    -- Strip every character that is not a digit, a lowercase letter, or a space.
    SET @IncorrectCharLoc = PATINDEX('%[^0-9a-z ]%', @str)
    WHILE @IncorrectCharLoc > 0
    BEGIN
        SET @str = STUFF(@str, @IncorrectCharLoc, 1, '')
        SET @IncorrectCharLoc = PATINDEX('%[^0-9a-z ]%', @str)
    END
    SET @str = REPLACE(@str, ' ', '-')
    RETURN @str
END

Credit to http://blog.sqlauthority.com/2007/05/13/sql-server-udf-function-to-parse-alphanumeric-characters-from-string/ for the original code.
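
A usage sketch, assuming the function above has already been created in the dbo schema. The table and column names in the UPDATE (myTable, slug, origString) are placeholders borrowed from the earlier answer, not something this answer defines.

```sql
-- Spot-check with the examples from the question.
SELECT dbo.UDF_GenerateSlug('Hello World')     AS slug1,  -- expected: hello-world
       dbo.UDF_GenerateSlug('This is a test!') AS slug2;  -- expected: this-is-a-test

-- Set-based update, so the data never leaves the server.
UPDATE myTable
SET slug = dbo.UDF_GenerateSlug(origString);
```

Note that the parameter is declared as VARCHAR(100), so longer input is silently truncated to 100 characters before the slug is built; widen the parameter and return type if the source column is longer.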

哥，最终变帅啦 2024-09-13 05:52:09

I know this is an old thread, but for future readers: I found a function that even handles accents:

CREATE function [dbo].[slugify](@string varchar(4000)) 
    RETURNS varchar(4000) AS BEGIN 
    declare @out varchar(4000)

    --convert to ASCII
    set @out = lower(@string COLLATE SQL_Latin1_General_CP1251_CS_AS)

    declare @pi int 
    --I'm sorry T-SQL have no regex. Thanks for patindex, MS .. :-)
    set @pi = patindex('%[^a-z0-9 -]%',@out)
    while @pi>0 begin
        set @out = replace(@out, substring(@out,@pi,1), '')
        --set @out = left(@out,@pi-1) + substring(@out,@pi+1,8000)
        set @pi = patindex('%[^a-z0-9 -]%',@out)
    end

    set @out = ltrim(rtrim(@out))

   -- replace space to hyphen   
   set @out = replace(@out, ' ', '-')

   -- remove double hyphen
   while CHARINDEX('--', @out) > 0 set @out = replace(@out, '--', '-')

   return (@out)
END

情定在深秋 2024-09-13 05:52:09

Here's a variation of Jeremy's response. This might not technically be slugifying since I'm doing a couple of custom things like replacing "." with "-dot-", and stripping out apostrophes. Main improvement is this one also strips out all consecutive spaces, and doesn't strip out preexisting dashes.

create function dbo.Slugify(@str nvarchar(max)) returns nvarchar(max)
as
begin
    declare @IncorrectCharLoc int
    set @str = replace(replace(lower(@str),'.',' dot '),'''','')

    -- remove non alphanumerics:
    set @IncorrectCharLoc = patindex('%[^0-9a-z -]%',@str)
    while @IncorrectCharLoc > 0
    begin
        set @str = stuff(@str,@IncorrectCharLoc,1,' ')
        set @IncorrectCharLoc = patindex('%[^0-9a-z -]%',@str)
    end
    -- remove consecutive spaces:
    while charindex('  ',@str) > 0
    begin
        set @str = replace(@str, '  ', ' ')
    end
    set @str = replace(@str,' ','-')
    return @str
end
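
A quick illustration of the custom rules described above, assuming this version of dbo.Slugify is the one currently installed (a later answer reuses the same function name). The sample input is made up for the example.

```sql
-- '.' becomes ' dot ', the apostrophe is dropped,
-- and the double space collapses before spaces turn into hyphens.
SELECT dbo.Slugify(N'John''s  site.com') AS slug;
-- expected: johns-site-dot-com
```

Leading and trailing spaces are not trimmed by this version, so input such as N' Hello ' would be expected to come back with a hyphen at each end.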

￠蛋碎的人ぎ生 2024-09-13 05:52:09

I took Jeremy's response a couple of steps further by removing all consecutive dashes, even after spaces are replaced, and by removing leading and trailing dashes.

create function dbo.Slugify(@str nvarchar(max)) returns nvarchar(max) as
begin
    declare @IncorrectCharLoc int
    set @str = replace(replace(lower(@str),'.','-'),'''','')

    -- remove non alphanumerics:
    set @IncorrectCharLoc = patindex('%[^0-9a-z -]%',@str)
    while @IncorrectCharLoc > 0
    begin
        set @str = stuff(@str,@IncorrectCharLoc,1,' ')
        set @IncorrectCharLoc = patindex('%[^0-9a-z -]%',@str)
    end

    -- replace all spaces with dashes
    set @str = replace(@str,' ','-')

    -- remove consecutive dashes:
    while charindex('--',@str) > 0
    begin
        set @str = replace(@str, '--', '-')
    end

    -- remove leading dashes
    while charindex('-', @str) = 1
    begin
        set @str = RIGHT(@str, len(@str) - 1)
    end

    -- remove trailing dashes
    while len(@str) > 0 AND substring(@str, len(@str), 1) = '-'
    begin
        set @str = LEFT(@str, len(@str) - 1)
    end
    return @str
end
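
A usage sketch for this version, assuming it is the dbo.Slugify currently installed. The input is a deliberately messy, made-up string that exercises each cleanup step.

```sql
-- Periods become dashes, spaces become dashes, runs of dashes collapse,
-- then leading and trailing dashes are stripped.
SELECT dbo.Slugify(N'  --Hello...World--  ') AS slug;
-- expected: hello-world
```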

几度春秋 2024-09-13 05:52:09

-- Converts a title such as "This is a Test" to an all lower case string such
-- as "this-is-a-test" for use as the slug in a URL.  All runs of separators
-- (whitespace, underscore, or hyphen) are converted to a single hyphen.
-- This is implemented as a state machine having the following four states:
--
--     0 - initial state
--     1 - in a sequence consisting of valid characters (a-z, A-Z, or 0-9)
--     2 - in a sequence of separators (whitespace, underscore, or hyphen)
--     3 - encountered a character that is neither valid nor a separator
--
-- Once the next state has been determined, the return value string is
-- built based on the transitions from the current state to the next state.
--
-- State 0 skips any initial whitespace.  State 1 includes all valid slug
-- characters.  State 2 converts multiple separators into a single hyphen
-- and skips trailing whitespace.  State 3 skips any punctuation between
-- characters and, if no additional whitespace is encountered,
-- then the punctuation is not treated as a word separator.
--
CREATE FUNCTION ToSlug(@title AS NVARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @retval AS VARCHAR(MAX) = ''; -- return value
    DECLARE @i AS INT = 1;                -- title index
    DECLARE @c AS CHAR(1);                -- current character
    DECLARE @state AS INT = 0;            -- current state
    DECLARE @nextState AS INT;            -- next state
    DECLARE @tab AS CHAR(1) = CHAR(9);    -- tab
    DECLARE @lf AS CHAR(1) = CHAR(10);    -- line feed
    DECLARE @cr AS CHAR(1) = CHAR(13);    -- carriage return
    DECLARE @separators AS CHAR(8) = '[' + @tab + @lf + @cr + ' _-]';
    DECLARE @validchars AS CHAR(11) = '[a-zA-Z0-9]';

    WHILE (@i <= LEN(@title))
    BEGIN
        SELECT @c = SUBSTRING(@title, @i, 1),

        @nextState = CASE
            WHEN @c LIKE @validchars THEN 1
            WHEN @state = 0 THEN 0
            WHEN @state = 1 THEN CASE
                WHEN @c LIKE @separators THEN 2
                ELSE 3 -- unknown character
                END
            WHEN @state = 2 THEN 2
            WHEN @state = 3 THEN CASE
                WHEN @c LIKE @separators THEN 2
                ELSE 3 -- stay in state 3
                END
            END,

        @retval = @retval + CASE
            WHEN @nextState != 1 THEN ''
            WHEN @state = 0 THEN LOWER(@c)
            WHEN @state = 1 THEN LOWER(@c)
            WHEN @state = 2 THEN '-' + LOWER(@c)
            WHEN @state = 3 THEN LOWER(@c)
            END,

        @state = @nextState,

        @i = @i + 1
    END
    RETURN @retval;
END
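
A few sample calls that walk the state machine through each state. This is a sketch that assumes the function was created in the dbo schema (scalar functions need a two-part name when called); the inputs are made up, and the expected values were traced by hand rather than taken from the original answer.

```sql
SELECT dbo.ToSlug(N'This is a Test')      AS s1,  -- expected: this-is-a-test
       dbo.ToSlug(N'  Hello,   World!  ') AS s2,  -- expected: hello-world
       dbo.ToSlug(N'Don''t_stop -- now')  AS s3;  -- expected: dont-stop-now
```

In s2 the leading spaces are skipped in state 0 and the comma moves the machine to state 3 before the spaces produce a single hyphen. In s3 the apostrophe is dropped without splitting the word, while the underscore and the run of spaces and hyphens each collapse to one hyphen.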

零時差 2024-09-13 05:52:09

To generate slugs from Vietnamese Unicode text:

CREATE function [dbo].[toslug](@string nvarchar(4000)) 
    RETURNS varchar(4000) AS BEGIN 
    declare @out nvarchar(4000)
    declare @from nvarchar(255)
    declare @to varchar(255)
    --convert to ASCII dbo.slugify
    set @string = lower(@string)
    set @out = @string
    set @from = N'ýỳỷỹỵáàảãạâấầẩẫậăắằẳẵặéèẻẽẹêếềểễệúùủũụưứừửữựíìỉĩịóòỏõọơớờởỡợôốồổỗộđ·/_,:;'
    set @to = 'yyyyyaaaaaaaaaaaaaaaaaeeeeeeeeeeeuuuuuuuuuuuiiiiioooooooooooooooood------'
    declare @pi int 
    set @pi = 1
    --I'm sorry T-SQL have no regex. Thanks for patindex, MS .. :-)
    while @pi<=len(@from) begin
        set @out = replace(@out, substring(@from,@pi,1), substring(@to,@pi,1))
        set @pi = @pi + 1
    end
    set @out = ltrim(rtrim(@out))

   -- replace space to hyphen   
   set @out = replace(@out, ' ', '-')

   -- remove double hyphen
   while CHARINDEX('--', @out) > 0 set @out = replace(@out, '--', '-')

   return (@out)
END
