如何从 SQL Server 中的字符串中删除所有非字母字符?

发布于 2024-07-24 08:44:07 字数 80 浏览 7 评论 0原文

如何从字符串中删除所有非字母字符?

非字母数字怎么办?

这是否必须是自定义函数,还是还有更通用的解决方案?

How could you remove all characters that are not alphabetic from a string?

What about non-alphanumeric?

Does this have to be a custom function or are there also more generalizable solutions?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(21

风蛊 2024-07-31 08:44:07

尝试这个函数:

Create Function [dbo].[RemoveNonAlphaCharacters](@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin

    Declare @KeepValues as varchar(50)
    Set @KeepValues = '%[^a-z]%'
    While PatIndex(@KeepValues, @Temp) > 0
        Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')

    Return @Temp
End

像这样调用它:

Select dbo.RemoveNonAlphaCharacters('abc1234def5678ghi90jkl')

一旦理解了代码,您应该会发现更改它以删除其他字符也相对简单。 您甚至可以使其足够动态以传递您的搜索模式。

Try this function:

Create Function [dbo].[RemoveNonAlphaCharacters](@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin

    Declare @KeepValues as varchar(50)
    Set @KeepValues = '%[^a-z]%'
    While PatIndex(@KeepValues, @Temp) > 0
        Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')

    Return @Temp
End

Call it like this:

Select dbo.RemoveNonAlphaCharacters('abc1234def5678ghi90jkl')

Once you understand the code, you should see that it is relatively simple to change it to remove other characters, too. You could even make this dynamic enough to pass in your search pattern.

演多会厌 2024-07-31 08:44:07

George Mastros' 很棒的答案

CREATE FUNCTION [dbo].[fn_StripCharacters]
(
    @String NVARCHAR(MAX), 
    @MatchExpression VARCHAR(255)
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    SET @MatchExpression =  '%['+@MatchExpression+']%'
    
    WHILE PatIndex(@MatchExpression, @String) > 0
        SET @String = Stuff(@String, PatIndex(@MatchExpression, @String), 1, '')
    
    RETURN @String
    
END

仅字母:

SELECT dbo.fn_StripCharacters('a1!s2@d3#f4

仅数字:

SELECT dbo.fn_StripCharacters('a1!s2@d3#f4

仅字母数字:

SELECT dbo.fn_StripCharacters('a1!s2@d3#f4

非字母数字:

SELECT dbo.fn_StripCharacters('a1!s2@d3#f4
, '^a-z')

仅数字:


仅字母数字:


非字母数字:


, '^0-9')

仅字母数字:


非字母数字:


, '^a-z')

仅数字:


仅字母数字:


非字母数字:


, '^a-z0-9')

非字母数字:

, '^a-z')

仅数字:

仅字母数字:

非字母数字:

, '^0-9')

仅字母数字:

非字母数字:

, '^a-z')

仅数字:

仅字母数字:

非字母数字:

, 'a-z0-9') , '^a-z')

仅数字:

仅字母数字:

非字母数字:

, '^0-9')

仅字母数字:

非字母数字:

, '^a-z')

仅数字:

仅字母数字:

非字母数字:

, '^a-z0-9')

非字母数字:

, '^a-z')

仅数字:

仅字母数字:

非字母数字:

, '^0-9')

仅字母数字:

非字母数字:

, '^a-z')

仅数字:

仅字母数字:

非字母数字:

Parameterized version of George Mastros' awesome answer:

CREATE FUNCTION [dbo].[fn_StripCharacters]
(
    @String NVARCHAR(MAX), 
    @MatchExpression VARCHAR(255)
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    SET @MatchExpression =  '%['+@MatchExpression+']%'
    
    WHILE PatIndex(@MatchExpression, @String) > 0
        SET @String = Stuff(@String, PatIndex(@MatchExpression, @String), 1, '')
    
    RETURN @String
    
END

Alphabetic only:

SELECT dbo.fn_StripCharacters('a1!s2@d3#f4

Numeric only:

SELECT dbo.fn_StripCharacters('a1!s2@d3#f4

Alphanumeric only:

SELECT dbo.fn_StripCharacters('a1!s2@d3#f4

Non-alphanumeric:

SELECT dbo.fn_StripCharacters('a1!s2@d3#f4
, '^a-z')

Numeric only:


Alphanumeric only:


Non-alphanumeric:


, '^0-9')

Alphanumeric only:


Non-alphanumeric:


, '^a-z')

Numeric only:


Alphanumeric only:


Non-alphanumeric:


, '^a-z0-9')

Non-alphanumeric:

, '^a-z')

Numeric only:

Alphanumeric only:

Non-alphanumeric:

, '^0-9')

Alphanumeric only:

Non-alphanumeric:

, '^a-z')

Numeric only:

Alphanumeric only:

Non-alphanumeric:

, 'a-z0-9') , '^a-z')

Numeric only:

Alphanumeric only:

Non-alphanumeric:

, '^0-9')

Alphanumeric only:

Non-alphanumeric:

, '^a-z')

Numeric only:

Alphanumeric only:

Non-alphanumeric:

, '^a-z0-9')

Non-alphanumeric:

, '^a-z')

Numeric only:

Alphanumeric only:

Non-alphanumeric:

, '^0-9')

Alphanumeric only:

Non-alphanumeric:

, '^a-z')

Numeric only:

Alphanumeric only:

Non-alphanumeric:

執念 2024-07-31 08:44:07

不管你信不信,在我的系统中,这个丑陋的函数比 G Mastros 优雅的函数表现得更好。

CREATE FUNCTION dbo.RemoveSpecialChar (@s VARCHAR(256)) 
RETURNS VARCHAR(256) 
WITH SCHEMABINDING
    BEGIN
        IF @s IS NULL
            RETURN NULL
        DECLARE @s2 VARCHAR(256) = '',
                @l INT = LEN(@s),
                @p INT = 1

        WHILE @p <= @l
            BEGIN
                DECLARE @c INT
                SET @c = ASCII(SUBSTRING(@s, @p, 1))
                IF @c BETWEEN 48 AND 57
                   OR  @c BETWEEN 65 AND 90
                   OR  @c BETWEEN 97 AND 122
                    SET @s2 = @s2 + CHAR(@c)
                SET @p = @p + 1
            END

        IF LEN(@s2) = 0
            RETURN NULL

        RETURN @s2

Believe it or not, in my system this ugly function performs better than G Mastros elegant one.

CREATE FUNCTION dbo.RemoveSpecialChar (@s VARCHAR(256)) 
RETURNS VARCHAR(256) 
WITH SCHEMABINDING
    BEGIN
        IF @s IS NULL
            RETURN NULL
        DECLARE @s2 VARCHAR(256) = '',
                @l INT = LEN(@s),
                @p INT = 1

        WHILE @p <= @l
            BEGIN
                DECLARE @c INT
                SET @c = ASCII(SUBSTRING(@s, @p, 1))
                IF @c BETWEEN 48 AND 57
                   OR  @c BETWEEN 65 AND 90
                   OR  @c BETWEEN 97 AND 122
                    SET @s2 = @s2 + CHAR(@c)
                SET @p = @p + 1
            END

        IF LEN(@s2) = 0
            RETURN NULL

        RETURN @s2
﹏雨一样淡蓝的深情 2024-07-31 08:44:07

我知道 SQL 不擅长字符串操作,但我没想到会这么困难。 这是一个简单的函数,用于从字符串中删除所有数字。 会有更好的方法来做到这一点,但这只是一个开始。

CREATE FUNCTION dbo.AlphaOnly (
    @String varchar(100)
)
RETURNS varchar(100)
AS BEGIN
  RETURN (
    REPLACE(
      REPLACE(
        REPLACE(
          REPLACE(
            REPLACE(
              REPLACE(
                REPLACE(
                  REPLACE(
                    REPLACE(
                      REPLACE(
                        @String,
                      '9', ''),
                    '8', ''),
                  '7', ''),
                '6', ''),
              '5', ''),
            '4', ''),
          '3', ''),
        '2', ''),
      '1', ''),
    '0', '')
  )
END
GO

-- ==================
DECLARE @t TABLE (
    ColID       int,
    ColString   varchar(50)
)

INSERT INTO @t VALUES (1, 'abc1234567890')

SELECT ColID, ColString, dbo.AlphaOnly(ColString)
FROM @t

输出

ColID ColString
----- ------------- ---
    1 abc1234567890 abc

第 2 轮 - 数据驱动的黑名单

-- ============================================
-- Create a table of blacklist characters
-- ============================================
IF EXISTS (SELECT * FROM sys.tables WHERE [object_id] = OBJECT_ID('dbo.CharacterBlacklist'))
  DROP TABLE dbo.CharacterBlacklist
GO
CREATE TABLE dbo.CharacterBlacklist (
    CharID              int         IDENTITY,
    DisallowedCharacter nchar(1)    NOT NULL
)
GO
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'0')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'1')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'2')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'3')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'4')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'5')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'6')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'7')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'8')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'9')
GO

-- ====================================
IF EXISTS (SELECT * FROM sys.objects WHERE [object_id] = OBJECT_ID('dbo.StripBlacklistCharacters'))
  DROP FUNCTION dbo.StripBlacklistCharacters
GO
CREATE FUNCTION dbo.StripBlacklistCharacters (
    @String nvarchar(100)
)
RETURNS varchar(100)
AS BEGIN
  DECLARE @blacklistCt  int
  DECLARE @ct           int
  DECLARE @c            nchar(1)

  SELECT @blacklistCt = COUNT(*) FROM dbo.CharacterBlacklist

  SET @ct = 0
  WHILE @ct < @blacklistCt BEGIN
    SET @ct = @ct + 1

    SELECT @String = REPLACE(@String, DisallowedCharacter, N'')
    FROM dbo.CharacterBlacklist
    WHERE CharID = @ct
  END

  RETURN (@String)
END
GO

-- ====================================
DECLARE @s  nvarchar(24)
SET @s = N'abc1234def5678ghi90jkl'

SELECT
    @s                  AS OriginalString,
    dbo.StripBlacklistCharacters(@s)   AS ResultString

输出

OriginalString           ResultString
------------------------ ------------
abc1234def5678ghi90jkl   abcdefghijkl

我向读者提出的挑战:你能提高效率吗? 使用递归怎么样?

I knew that SQL was bad at string manipulation, but I didn't think it would be this difficult. Here's a simple function to strip out all the numbers from a string. There would be better ways to do this, but this is a start.

CREATE FUNCTION dbo.AlphaOnly (
    @String varchar(100)
)
RETURNS varchar(100)
AS BEGIN
  RETURN (
    REPLACE(
      REPLACE(
        REPLACE(
          REPLACE(
            REPLACE(
              REPLACE(
                REPLACE(
                  REPLACE(
                    REPLACE(
                      REPLACE(
                        @String,
                      '9', ''),
                    '8', ''),
                  '7', ''),
                '6', ''),
              '5', ''),
            '4', ''),
          '3', ''),
        '2', ''),
      '1', ''),
    '0', '')
  )
END
GO

-- ==================
DECLARE @t TABLE (
    ColID       int,
    ColString   varchar(50)
)

INSERT INTO @t VALUES (1, 'abc1234567890')

SELECT ColID, ColString, dbo.AlphaOnly(ColString)
FROM @t

Output

ColID ColString
----- ------------- ---
    1 abc1234567890 abc

Round 2 - Data-Driven Blacklist

-- ============================================
-- Create a table of blacklist characters
-- ============================================
IF EXISTS (SELECT * FROM sys.tables WHERE [object_id] = OBJECT_ID('dbo.CharacterBlacklist'))
  DROP TABLE dbo.CharacterBlacklist
GO
CREATE TABLE dbo.CharacterBlacklist (
    CharID              int         IDENTITY,
    DisallowedCharacter nchar(1)    NOT NULL
)
GO
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'0')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'1')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'2')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'3')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'4')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'5')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'6')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'7')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'8')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'9')
GO

-- ====================================
IF EXISTS (SELECT * FROM sys.objects WHERE [object_id] = OBJECT_ID('dbo.StripBlacklistCharacters'))
  DROP FUNCTION dbo.StripBlacklistCharacters
GO
CREATE FUNCTION dbo.StripBlacklistCharacters (
    @String nvarchar(100)
)
RETURNS varchar(100)
AS BEGIN
  DECLARE @blacklistCt  int
  DECLARE @ct           int
  DECLARE @c            nchar(1)

  SELECT @blacklistCt = COUNT(*) FROM dbo.CharacterBlacklist

  SET @ct = 0
  WHILE @ct < @blacklistCt BEGIN
    SET @ct = @ct + 1

    SELECT @String = REPLACE(@String, DisallowedCharacter, N'')
    FROM dbo.CharacterBlacklist
    WHERE CharID = @ct
  END

  RETURN (@String)
END
GO

-- ====================================
DECLARE @s  nvarchar(24)
SET @s = N'abc1234def5678ghi90jkl'

SELECT
    @s                  AS OriginalString,
    dbo.StripBlacklistCharacters(@s)   AS ResultString

Output

OriginalString           ResultString
------------------------ ------------
abc1234def5678ghi90jkl   abcdefghijkl

My challenge to readers: Can you make this more efficient? What about using recursion?

遮云壑 2024-07-31 08:44:07

原始答案

SQL Server 2017+ 的另一个可能选项(没有循环和/或递归)是使用 TRANSLATE()REPLACE()< 的基于字符串的方法/代码>。

T-SQL 语句:

DECLARE @pattern varchar(52) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

SELECT 
   v.[Text], 
   REPLACE(
      TRANSLATE(
         v.[Text],
         REPLACE(TRANSLATE(v.[Text], @pattern, REPLICATE('a', LEN(@pattern))), 'a', ''),
         REPLICATE('0', LEN(REPLACE(TRANSLATE(v.[Text], @pattern, REPLICATE('a', LEN(@pattern))), 'a', '')))
      ),
      '0',
      ''
   ) AS AlphabeticCharacters
FROM (VALUES
   ('abc1234def5678ghi90jkl#@
amp;'),
   ('1234567890'),
   ('JAHDBESBN%*#*@*($E*sd55bn')
) v ([Text])

或作为函数:

CREATE FUNCTION dbo.RemoveNonAlphabeticCharacters (@Text varchar(1000)) 
RETURNS varchar(1000)
AS BEGIN

   DECLARE @pattern varchar(52) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
   SET @text = REPLACE(
      TRANSLATE(
         @Text,
         REPLACE(TRANSLATE(@Text, @pattern, REPLICATE('a', LEN(@pattern))), 'a', ''),
         REPLICATE('0', LEN(REPLACE(TRANSLATE(@Text, @pattern, REPLICATE('a', LEN(@pattern))), 'a', '')))
      ),
      '0',
      ''
   )
   
   RETURN @Text
END

更新

感谢@ttugates 的评论,我发现这种方法存在一个小问题。 实际的边缘情况是 'V 0 S' 作为输入。 LEN() 函数会截断尾随空格,从而导致“TRANSLATE 内置函数的第二个和第三个参数必须包含相同数量的字符”错误消息。 解决方案是对长度计算进行小的更改,但方法保持不变:

  • 输入文本中的所有字母字符(或给定模式中的字符)都将转换为 'a' ,然后替换与''
  • 第一步之后的剩余文本仅包含非字母字符(或不属于给定模式的字符)。
  • 输入文本中的所有非字母字符都会转换为 '0',然后替换为 ''
  • 剩下的文字就是实际结果。
  • 新的长度计算为 LEN(REPLACE(@text, ' ', '.'))

更新功能:

CREATE FUNCTION dbo.RemoveNonAlphabeticCharacters (@Text varchar(1000)) 
RETURNS varchar(1000)
AS BEGIN

   DECLARE @pattern varchar(52) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
   DECLARE @translations varchar(1000)

   SET @translations = TRANSLATE(@Text, @pattern, REPLICATE('a', LEN(@pattern))) 
   SET @translations = REPLACE(@translations, 'a', '')

   SET @Text = TRANSLATE(@Text, @translations, REPLICATE('0', LEN(REPLACE(@translations, ' ', '.'))))
   SET @Text = REPLACE(@Text, '0', '')
   
   RETURN @Text

END

Original answer

Another possibe option for SQL Server 2017+, without loops and/or recursion, is a string-based approach using TRANSLATE() and REPLACE().

T-SQL statement:

DECLARE @pattern varchar(52) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

SELECT 
   v.[Text], 
   REPLACE(
      TRANSLATE(
         v.[Text],
         REPLACE(TRANSLATE(v.[Text], @pattern, REPLICATE('a', LEN(@pattern))), 'a', ''),
         REPLICATE('0', LEN(REPLACE(TRANSLATE(v.[Text], @pattern, REPLICATE('a', LEN(@pattern))), 'a', '')))
      ),
      '0',
      ''
   ) AS AlphabeticCharacters
FROM (VALUES
   ('abc1234def5678ghi90jkl#@
amp;'),
   ('1234567890'),
   ('JAHDBESBN%*#*@*($E*sd55bn')
) v ([Text])

or as a function:

CREATE FUNCTION dbo.RemoveNonAlphabeticCharacters (@Text varchar(1000)) 
RETURNS varchar(1000)
AS BEGIN

   DECLARE @pattern varchar(52) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
   SET @text = REPLACE(
      TRANSLATE(
         @Text,
         REPLACE(TRANSLATE(@Text, @pattern, REPLICATE('a', LEN(@pattern))), 'a', ''),
         REPLICATE('0', LEN(REPLACE(TRANSLATE(@Text, @pattern, REPLICATE('a', LEN(@pattern))), 'a', '')))
      ),
      '0',
      ''
   )
   
   RETURN @Text
END

Update

Thanks to the @ttugates's comment, I've found a small issue in this approach. The actual edge case is the 'V 0 S' as an input. The LEN() function truncates trailing spaces, which results in "The second and third arguments of the TRANSLATE built-in function must contain an equal number of characters" error message. A solution is a small change in the length calculation, but the approach remains the same:

  • All alphabetic characters (or characters from the given pattern) in the input text are translated to 'a' and after that replaced with ''.
  • The remaining text after the first step includes only non-alphabetic characters (or characters not in the given pattern).
  • All non-alphabetic characters in the input text are translated to '0' and after that replaced with ''.
  • The remaining text is the actual result.
  • The new length calculation is LEN(REPLACE(@text, ' ', '.')).

Updated function:

CREATE FUNCTION dbo.RemoveNonAlphabeticCharacters (@Text varchar(1000)) 
RETURNS varchar(1000)
AS BEGIN

   DECLARE @pattern varchar(52) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
   DECLARE @translations varchar(1000)

   SET @translations = TRANSLATE(@Text, @pattern, REPLICATE('a', LEN(@pattern))) 
   SET @translations = REPLACE(@translations, 'a', '')

   SET @Text = TRANSLATE(@Text, @translations, REPLICATE('0', LEN(REPLACE(@translations, ' ', '.'))))
   SET @Text = REPLACE(@Text, '0', '')
   
   RETURN @Text

END
遗失的美好 2024-07-31 08:44:07

这是一个不需要创建函数或列出要替换的字符的所有实例的解决方案。 它结合使用递归WITH 语句和PATINDEX 来查找不需要的字符。 它将替换列中所有不需要的字符 - 任何给定字符串中最多包含 100 个唯一的错误字符。 (例如“ABC123DEF234”将包含 4 个坏字符 1、2、3 和 4)100 限制是WITH 语句中允许的最大递归数,但这不会对要处理的行数施加限制,仅受可用内存的限制。
如果您不想要不同的结果,可以从代码中删除这两个选项。

-- Create some test data:
SELECT * INTO #testData 
FROM (VALUES ('ABC DEF,K.l(p)'),('123H,J,234'),('ABCD EFG')) as t(TXT)

-- Actual query:
-- Remove non-alpha chars: '%[^A-Z]%'
-- Remove non-alphanumeric chars: '%[^A-Z0-9]%'
DECLARE @BadCharacterPattern VARCHAR(250) = '%[^A-Z]%';

WITH recurMain as (
    SELECT DISTINCT CAST(TXT AS VARCHAR(250)) AS TXT, PATINDEX(@BadCharacterPattern, TXT) AS BadCharIndex
    FROM #testData
    UNION ALL
    SELECT CAST(TXT AS VARCHAR(250)) AS TXT, PATINDEX(@BadCharacterPattern, TXT) AS BadCharIndex
    FROM (
        SELECT 
            CASE WHEN BadCharIndex > 0 
                THEN REPLACE(TXT, SUBSTRING(TXT, BadCharIndex, 1), '')
                ELSE TXT 
            END AS TXT
        FROM recurMain
        WHERE BadCharIndex > 0
    ) badCharFinder
)
SELECT DISTINCT TXT
FROM recurMain
WHERE BadCharIndex = 0;

Here's a solution that doesn't require creating a function or listing all instances of characters to replace. It uses a recursive WITH statement in combination with a PATINDEX to find unwanted chars. It will replace all unwanted chars in a column - up to 100 unique bad characters contained in any given string. (E.G. "ABC123DEF234" would contain 4 bad characters 1, 2, 3 and 4) The 100 limit is the maximum number of recursions allowed in a WITH statement, but this doesn't impose a limit on the number of rows to process, which is only limited by the memory available.
If you don't want DISTINCT results, you can remove the two options from the code.

-- Create some test data:
SELECT * INTO #testData 
FROM (VALUES ('ABC DEF,K.l(p)'),('123H,J,234'),('ABCD EFG')) as t(TXT)

-- Actual query:
-- Remove non-alpha chars: '%[^A-Z]%'
-- Remove non-alphanumeric chars: '%[^A-Z0-9]%'
DECLARE @BadCharacterPattern VARCHAR(250) = '%[^A-Z]%';

WITH recurMain as (
    SELECT DISTINCT CAST(TXT AS VARCHAR(250)) AS TXT, PATINDEX(@BadCharacterPattern, TXT) AS BadCharIndex
    FROM #testData
    UNION ALL
    SELECT CAST(TXT AS VARCHAR(250)) AS TXT, PATINDEX(@BadCharacterPattern, TXT) AS BadCharIndex
    FROM (
        SELECT 
            CASE WHEN BadCharIndex > 0 
                THEN REPLACE(TXT, SUBSTRING(TXT, BadCharIndex, 1), '')
                ELSE TXT 
            END AS TXT
        FROM recurMain
        WHERE BadCharIndex > 0
    ) badCharFinder
)
SELECT DISTINCT TXT
FROM recurMain
WHERE BadCharIndex = 0;
念﹏祤嫣 2024-07-31 08:44:07

在查看了所有给定的解决方案后,我认为必须有一个纯 SQL 方法,不需要函数或 CTE / XML 查询,并且不涉及难以维护的嵌套 REPLACE 语句。 这是我的解决方案:

SELECT 
  x
  ,CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 1, 1) + '%' THEN '' ELSE SUBSTRING(x, 1, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 2, 1) + '%' THEN '' ELSE SUBSTRING(x, 2, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 3, 1) + '%' THEN '' ELSE SUBSTRING(x, 3, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 4, 1) + '%' THEN '' ELSE SUBSTRING(x, 4, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 5, 1) + '%' THEN '' ELSE SUBSTRING(x, 5, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 6, 1) + '%' THEN '' ELSE SUBSTRING(x, 6, 1) END
-- Keep adding rows until you reach the column size 
    AS stripped_column
FROM (SELECT 
        column_to_strip AS x
        ,'ABCDEFGHIJKLMNOPQRSTUVWXYZ' AS a 
      FROM my_table) a

这样做的优点是有效字符包含在子查询的一个字符串中,从而可以轻松地重新配置一组不同的字符。

缺点是您必须为每个字符添加一行 SQL,最多可达列的大小。 为了使该任务更容易,我只使用了下面的 Powershell 脚本,此示例针对 VARCHAR(64):

1..64 | % {
  "    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, {0}, 1) + '%' THEN '' ELSE SUBSTRING(x, {0}, 1) END" -f $_
} | clip.exe

Having looked at all the given solutions I thought that there has to be a pure SQL method that does not require a function or a CTE / XML query, and doesn't involve difficult to maintain nested REPLACE statements. Here is my solution:

SELECT 
  x
  ,CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 1, 1) + '%' THEN '' ELSE SUBSTRING(x, 1, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 2, 1) + '%' THEN '' ELSE SUBSTRING(x, 2, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 3, 1) + '%' THEN '' ELSE SUBSTRING(x, 3, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 4, 1) + '%' THEN '' ELSE SUBSTRING(x, 4, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 5, 1) + '%' THEN '' ELSE SUBSTRING(x, 5, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 6, 1) + '%' THEN '' ELSE SUBSTRING(x, 6, 1) END
-- Keep adding rows until you reach the column size 
    AS stripped_column
FROM (SELECT 
        column_to_strip AS x
        ,'ABCDEFGHIJKLMNOPQRSTUVWXYZ' AS a 
      FROM my_table) a

The advantage of doing it this way is that the valid characters are contained in the one string in the sub query making easy to reconfigure for a different set of characters.

The downside is that you have to add a row of SQL for each character up to the size of your column. To make that task easier I just used the Powershell script below, this example if for a VARCHAR(64):

1..64 | % {
  "    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, {0}, 1) + '%' THEN '' ELSE SUBSTRING(x, {0}, 1) END" -f $_
} | clip.exe
傲娇萝莉攻 2024-07-31 08:44:07

如果您像我一样,无法向生产数据添加函数,但仍想执行这种过滤,那么这里有一个纯 SQL 解决方案,使用 PIVOT 表将过滤后的部分重新组合在一起。

注意 我对表格进行了硬编码,最多可包含 40 个字符,如果要过滤的字符串较长,则必须添加更多字符。

SET CONCAT_NULL_YIELDS_NULL OFF;

with 
    ToBeScrubbed
as (
    select 1 as id, '*SOME 222@ !@* #* BOGUS !@*&! DATA' as ColumnToScrub
),

Scrubbed as (
    select 
        P.Number as ValueOrder,
        isnull ( substring ( t.ColumnToScrub , number , 1 ) , '' ) as ScrubbedValue,
        t.id
    from
        ToBeScrubbed t
        left join master..spt_values P
            on P.number between 1 and len(t.ColumnToScrub)
            and type ='P'
    where
        PatIndex('%[^a-z]%', substring(t.ColumnToScrub,P.number,1) ) = 0
)

SELECT
    id, 
    [1]+ [2]+ [3]+ [4]+ [5]+ [6]+ [7]+ [8] +[9] +[10]
    +  [11]+ [12]+ [13]+ [14]+ [15]+ [16]+ [17]+ [18] +[19] +[20]
    +  [21]+ [22]+ [23]+ [24]+ [25]+ [26]+ [27]+ [28] +[29] +[30]
    +  [31]+ [32]+ [33]+ [34]+ [35]+ [36]+ [37]+ [38] +[39] +[40] as ScrubbedData
FROM (
    select 
        *
    from 
        Scrubbed
    ) 
    src
    PIVOT (
        MAX(ScrubbedValue) FOR ValueOrder IN (
        [1], [2], [3], [4], [5], [6], [7], [8], [9], [10],
        [11], [12], [13], [14], [15], [16], [17], [18], [19], [20],
        [21], [22], [23], [24], [25], [26], [27], [28], [29], [30],
        [31], [32], [33], [34], [35], [36], [37], [38], [39], [40]
        )
    ) pvt

If you are like me and don't have access to just add functions to your production data but still want to perform this kind of filtering, here's a pure SQL solution using a PIVOT table to put the filtered pieces back together again.

N.B. I hardcoded the table up to 40 characters, you'll have to add more if you have longer strings to filter.

SET CONCAT_NULL_YIELDS_NULL OFF;

with 
    ToBeScrubbed
as (
    select 1 as id, '*SOME 222@ !@* #* BOGUS !@*&! DATA' as ColumnToScrub
),

Scrubbed as (
    select 
        P.Number as ValueOrder,
        isnull ( substring ( t.ColumnToScrub , number , 1 ) , '' ) as ScrubbedValue,
        t.id
    from
        ToBeScrubbed t
        left join master..spt_values P
            on P.number between 1 and len(t.ColumnToScrub)
            and type ='P'
    where
        PatIndex('%[^a-z]%', substring(t.ColumnToScrub,P.number,1) ) = 0
)

SELECT
    id, 
    [1]+ [2]+ [3]+ [4]+ [5]+ [6]+ [7]+ [8] +[9] +[10]
    +  [11]+ [12]+ [13]+ [14]+ [15]+ [16]+ [17]+ [18] +[19] +[20]
    +  [21]+ [22]+ [23]+ [24]+ [25]+ [26]+ [27]+ [28] +[29] +[30]
    +  [31]+ [32]+ [33]+ [34]+ [35]+ [36]+ [37]+ [38] +[39] +[40] as ScrubbedData
FROM (
    select 
        *
    from 
        Scrubbed
    ) 
    src
    PIVOT (
        MAX(ScrubbedValue) FOR ValueOrder IN (
        [1], [2], [3], [4], [5], [6], [7], [8], [9], [10],
        [11], [12], [13], [14], [15], [16], [17], [18], [19], [20],
        [21], [22], [23], [24], [25], [26], [27], [28], [29], [30],
        [31], [32], [33], [34], [35], [36], [37], [38], [39], [40]
        )
    ) pvt
仙女 2024-07-31 08:44:07

这是使用 iTVF 删除非字母字符的另一种方法。 首先,您需要一个基于模式的字符串拆分器。 以下摘自 Dwain Camp 的文章

-- PatternSplitCM will split a string based on a pattern of the form 
-- supported by LIKE and PATINDEX 
-- 
-- Created by: Chris Morris 12-Oct-2012 
CREATE FUNCTION [dbo].[PatternSplitCM]
(
       @List                VARCHAR(8000) = NULL
       ,@Pattern            VARCHAR(50)
) RETURNS TABLE WITH SCHEMABINDING 
AS 

RETURN
    WITH numbers AS (
        SELECT TOP(ISNULL(DATALENGTH(@List), 0))
            n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
        FROM
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n),
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n),
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n),
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n)
    )

    SELECT
        ItemNumber = ROW_NUMBER() OVER(ORDER BY MIN(n)),
        Item = SUBSTRING(@List,MIN(n),1+MAX(n)-MIN(n)),
        [Matched]
    FROM (
        SELECT n, y.[Matched], Grouper = n - ROW_NUMBER() OVER(ORDER BY y.[Matched],n)
        FROM numbers
        CROSS APPLY (
            SELECT [Matched] = CASE WHEN SUBSTRING(@List,n,1) LIKE @Pattern THEN 1 ELSE 0 END
        ) y
    ) d
    GROUP BY [Matched], Grouper

现在您有了基于模式的拆分器,您需要拆分与模式匹配的字符串:

[a-z]

然后将它们连接回去以获得所需的结果:

SELECT *
FROM tbl t
CROSS APPLY(
    SELECT Item + ''
    FROM dbo.PatternSplitCM(t.str, '[a-z]')
    WHERE Matched = 1
    ORDER BY ItemNumber
    FOR XML PATH('')
) x (a)

示例

结果:

| Id |              str |              a |
|----|------------------|----------------|
|  1 |    test“te d'abc |     testtedabc |
|  2 |            anr¤a |           anra |
|  3 |  gs-re-C“te d'ab |     gsreCtedab |
|  4 |         M‚fe, DF |          MfeDF |
|  5 |           R™temd |          Rtemd |
|  6 |          ™jad”ji |          jadji |
|  7 |      Cje y ret¢n |       Cjeyretn |
|  8 |        J™kl™balu |        Jklbalu |
|  9 |       le“ne-iokd |       leneiokd |
| 10 |   liode-Pyr‚n‚ie |    liodePyrnie |
| 11 |         V„s G”ta |          VsGta |
| 12 |        Sƒo Paulo |        SoPaulo |
| 13 |  vAstra gAtaland | vAstragAtaland |
| 14 |  ¥uble / Bio-Bio |     ubleBioBio |
| 15 | U“pl™n/ds VAsb-y |    UplndsVAsby |

Here is another way to remove non-alphabetic characters using an iTVF. First, you need a pattern-based string splitter. Here is one taken from Dwain Camp's article:

-- PatternSplitCM will split a string based on a pattern of the form 
-- supported by LIKE and PATINDEX 
-- 
-- Created by: Chris Morris 12-Oct-2012 
CREATE FUNCTION [dbo].[PatternSplitCM]
(
       @List                VARCHAR(8000) = NULL
       ,@Pattern            VARCHAR(50)
) RETURNS TABLE WITH SCHEMABINDING 
AS 

RETURN
    WITH numbers AS (
        SELECT TOP(ISNULL(DATALENGTH(@List), 0))
            n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
        FROM
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n),
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n),
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n),
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n)
    )

    SELECT
        ItemNumber = ROW_NUMBER() OVER(ORDER BY MIN(n)),
        Item = SUBSTRING(@List,MIN(n),1+MAX(n)-MIN(n)),
        [Matched]
    FROM (
        SELECT n, y.[Matched], Grouper = n - ROW_NUMBER() OVER(ORDER BY y.[Matched],n)
        FROM numbers
        CROSS APPLY (
            SELECT [Matched] = CASE WHEN SUBSTRING(@List,n,1) LIKE @Pattern THEN 1 ELSE 0 END
        ) y
    ) d
    GROUP BY [Matched], Grouper

Now that you have a pattern-based splitter, you need to split the strings that match the pattern:

[a-z]

and then concatenate them back to get the desired result:

SELECT *
FROM tbl t
CROSS APPLY(
    SELECT Item + ''
    FROM dbo.PatternSplitCM(t.str, '[a-z]')
    WHERE Matched = 1
    ORDER BY ItemNumber
    FOR XML PATH('')
) x (a)

SAMPLE

Result:

| Id |              str |              a |
|----|------------------|----------------|
|  1 |    test“te d'abc |     testtedabc |
|  2 |            anr¤a |           anra |
|  3 |  gs-re-C“te d'ab |     gsreCtedab |
|  4 |         M‚fe, DF |          MfeDF |
|  5 |           R™temd |          Rtemd |
|  6 |          ™jad”ji |          jadji |
|  7 |      Cje y ret¢n |       Cjeyretn |
|  8 |        J™kl™balu |        Jklbalu |
|  9 |       le“ne-iokd |       leneiokd |
| 10 |   liode-Pyr‚n‚ie |    liodePyrnie |
| 11 |         V„s G”ta |          VsGta |
| 12 |        Sƒo Paulo |        SoPaulo |
| 13 |  vAstra gAtaland | vAstragAtaland |
| 14 |  ¥uble / Bio-Bio |     ubleBioBio |
| 15 | U“pl™n/ds VAsb-y |    UplndsVAsby |
执着的年纪 2024-07-31 08:44:07

该解决方案受到 Allen 先生的解决方案的启发,需要一个整数Numbers 表(如果您想要以良好的性能执行严格的查询操作,您应该拥有该表)。 它不需要 CTE。 您可以更改 NOT IN (...) 表达式以排除特定字符,或将其更改为 IN (...) OR LIKE 表达式仅保留某些字符。

SELECT (
    SELECT  SUBSTRING([YourString], N, 1)
    FROM    dbo.Numbers
    WHERE   N > 0 AND N <= CONVERT(INT, LEN([YourString]))
        AND SUBSTRING([YourString], N, 1) NOT IN ('(',')',',','.')
    FOR XML PATH('')
) AS [YourStringTransformed]
FROM ...

This solution, inspired by Mr. Allen's solution, requires a Numbers table of integers (which you should have on hand if you want to do serious query operations with good performance). It does not require a CTE. You can change the NOT IN (...) expression to exclude specific characters, or change it to an IN (...) OR LIKE expression to retain only certain characters.

SELECT (
    SELECT  SUBSTRING([YourString], N, 1)
    FROM    dbo.Numbers
    WHERE   N > 0 AND N <= CONVERT(INT, LEN([YourString]))
        AND SUBSTRING([YourString], N, 1) NOT IN ('(',')',',','.')
    FOR XML PATH('')
) AS [YourStringTransformed]
FROM ...
如此安好 2024-07-31 08:44:07

从性能角度来看,我会使用内联函数:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[udf_RemoveNumericCharsFromString]
(
@List NVARCHAR(4000)
)
RETURNS TABLE 
AS RETURN

    WITH GetNums AS (
       SELECT TOP(ISNULL(DATALENGTH(@List), 0))
        n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
        FROM
          (VALUES (0),(0),(0),(0)) d (n),
          (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n),
          (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n),
          (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n)
            )

    SELECT StrOut = ''+
        (SELECT Chr
         FROM GetNums
            CROSS APPLY (SELECT SUBSTRING(@List , n,1)) X(Chr)
         WHERE Chr LIKE '%[^0-9]%' 
         ORDER BY N
         FOR XML PATH (''),TYPE).value('.','NVARCHAR(MAX)')


   /*How to Use
   SELECT StrOut FROM dbo.udf_RemoveNumericCharsFromString ('vv45--9gut')
   Result: vv--gut
   */

From performance perspective I'd use Inline Function:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[udf_RemoveNumericCharsFromString]
(
@List NVARCHAR(4000)
)
RETURNS TABLE 
AS RETURN

    WITH GetNums AS (
       SELECT TOP(ISNULL(DATALENGTH(@List), 0))
        n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
        FROM
          (VALUES (0),(0),(0),(0)) d (n),
          (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n),
          (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n),
          (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n)
            )

    SELECT StrOut = ''+
        (SELECT Chr
         FROM GetNums
            CROSS APPLY (SELECT SUBSTRING(@List , n,1)) X(Chr)
         WHERE Chr LIKE '%[^0-9]%' 
         ORDER BY N
         FOR XML PATH (''),TYPE).value('.','NVARCHAR(MAX)')


   /*How to Use
   SELECT StrOut FROM dbo.udf_RemoveNumericCharsFromString ('vv45--9gut')
   Result: vv--gut
   */
可爱咩 2024-07-31 08:44:07

这是另一个递归 CTE 解决方案,基于@Gerhard Weiss 此处的答案。 您应该能够将整个代码块复制并粘贴到 SSMS 中并在那里使用。 结果包括一些额外的列,以帮助我们了解正在发生的事情。 我花了一段时间才理解 PATINDEX (RegEx) 和递归 CTE 的所有原理。

DECLARE @DefineBadCharPattern varchar(30)
SET @DefineBadCharPattern = '%[^A-z]%'  --Means anything NOT between A and z characters (according to ascii char value) is "bad"
SET @DefineBadCharPattern = '%[^a-z0-9]%'  --Means anything NOT between a and z characters or numbers 0 through 9 (according to ascii char value) are "bad"
SET @DefineBadCharPattern = '%[^ -~]%'  --Means anything NOT between space and ~ characters (all non-printable characters) is "bad"
--Change @ReplaceBadCharWith to '' to strip "bad" characters from string
--Change to some character if you want to 'see' what's being replaced. NOTE: It must be allowed accoring to @DefineBadCharPattern above
DECLARE @ReplaceBadCharWith varchar(1) = '#'  --Change this to whatever you want to replace non-printable chars with 
IF patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, @ReplaceBadCharWith) > 0
    BEGIN
        RAISERROR('@ReplaceBadCharWith value (%s) must be a character allowed by PATINDEX pattern of %s',16,1,@ReplaceBadCharWith, @DefineBadCharPattern)
        RETURN
    END
--A table of values to play with:
DECLARE @temp TABLE (OriginalString varchar(100))
INSERT @temp SELECT ' 1hello' + char(13) + char(10) + 'there' + char(30) + char(9) + char(13) + char(10)
INSERT @temp SELECT '2hello' + char(30) + 'there' + char(30)
INSERT @temp SELECT ' 3hello there'
INSERT @temp SELECT ' tab' + char(9) + ' character'
INSERT @temp SELECT 'good bye'

--Let the magic begin:
;WITH recurse AS (
    select
    OriginalString,
    OriginalString as CleanString,
    patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString) as [Position],
    substring(OriginalString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString),1) as [InvalidCharacter],
    ascii(substring(OriginalString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString),1)) as [ASCIICode]
    from @temp
   UNION ALL
    select
    OriginalString,
    CONVERT(varchar(100),REPLACE(CleanString,InvalidCharacter,@ReplaceBadCharWith)),
    patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) as [Position],
    substring(CleanString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString),1),
    ascii(substring(CleanString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString),1))
    from recurse
    where patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) > 0
)
SELECT * FROM recurse
--optionally comment out this last WHERE clause to see more of what the recursion is doing:
WHERE patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) = 0

Here's another recursive CTE solution, based on @Gerhard Weiss's answer here. You should be able to copy and paste the whole code block into SSMS and play with it there. The results include a few extra columns to help us understand what's going on. It took me a while until I understood all that's going on with both PATINDEX (RegEx) and the recursive CTE.

DECLARE @DefineBadCharPattern varchar(30)
SET @DefineBadCharPattern = '%[^A-z]%'  --Means anything NOT between A and z characters (according to ascii char value) is "bad"
SET @DefineBadCharPattern = '%[^a-z0-9]%'  --Means anything NOT between a and z characters or numbers 0 through 9 (according to ascii char value) are "bad"
SET @DefineBadCharPattern = '%[^ -~]%'  --Means anything NOT between space and ~ characters (all non-printable characters) is "bad"
--Change @ReplaceBadCharWith to '' to strip "bad" characters from string
--Change to some character if you want to 'see' what's being replaced. NOTE: It must be allowed accoring to @DefineBadCharPattern above
DECLARE @ReplaceBadCharWith varchar(1) = '#'  --Change this to whatever you want to replace non-printable chars with 
IF patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, @ReplaceBadCharWith) > 0
    BEGIN
        RAISERROR('@ReplaceBadCharWith value (%s) must be a character allowed by PATINDEX pattern of %s',16,1,@ReplaceBadCharWith, @DefineBadCharPattern)
        RETURN
    END
--A table of values to play with:
DECLARE @temp TABLE (OriginalString varchar(100))
INSERT @temp SELECT ' 1hello' + char(13) + char(10) + 'there' + char(30) + char(9) + char(13) + char(10)
INSERT @temp SELECT '2hello' + char(30) + 'there' + char(30)
INSERT @temp SELECT ' 3hello there'
INSERT @temp SELECT ' tab' + char(9) + ' character'
INSERT @temp SELECT 'good bye'

--Let the magic begin:
;WITH recurse AS (
    select
    OriginalString,
    OriginalString as CleanString,
    patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString) as [Position],
    substring(OriginalString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString),1) as [InvalidCharacter],
    ascii(substring(OriginalString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString),1)) as [ASCIICode]
    from @temp
   UNION ALL
    select
    OriginalString,
    CONVERT(varchar(100),REPLACE(CleanString,InvalidCharacter,@ReplaceBadCharWith)),
    patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) as [Position],
    substring(CleanString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString),1),
    ascii(substring(CleanString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString),1))
    from recurse
    where patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) > 0
)
SELECT * FROM recurse
--optionally comment out this last WHERE clause to see more of what the recursion is doing:
WHERE patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) = 0
长途伴 2024-07-31 08:44:07

对于 SQL Server >= 2017...

declare @text varchar(max)

-- create some sample text
select
@text=
'
Lorem @ipsum  *&dolor-= sit?! amet, {consectetur } adipiscing\ elit. Vivamus commodo justo metus, sed facilisis ante 
congue eget. Proin ac bibendum sem/.
'

-- the characters to be removed
declare @unwanted varchar(max)='''.,!?/<>"[]{}|`~@#$%^&*()-+=/\:;'+char(13)+char(10)

-- interim replaced with
declare @replace_with char(1)=' '

-- call the translate function that will change unwanted characters to spaces
-- in this sample
declare @translated varchar(max)
select @translated=TRANSLATE(@text,@unwanted,REPLICATE(@replace_with,len(@unwanted)))

-- In this case, I want to preserve one space
select  string_agg(trim(value),' ')
from    STRING_SPLIT(@translated,' ')
where   trim(value)<>''

-- Result
'Lorem ipsum dolor sit amet consectetur adipiscing elit Vivamus commodo justo metus sed facilisis ante congue eget Proin ac bibendum sem'

For SQL Server >= 2017...

declare @text varchar(max)

-- create some sample text
select
@text=
'
Lorem @ipsum  *&dolor-= sit?! amet, {consectetur } adipiscing\ elit. Vivamus commodo justo metus, sed facilisis ante 
congue eget. Proin ac bibendum sem/.
'

-- the characters to be removed
declare @unwanted varchar(max)='''.,!?/<>"[]{}|`~@#$%^&*()-+=/\:;'+char(13)+char(10)

-- interim replaced with
declare @replace_with char(1)=' '

-- call the translate function that will change unwanted characters to spaces
-- in this sample
declare @translated varchar(max)
select @translated=TRANSLATE(@text,@unwanted,REPLICATE(@replace_with,len(@unwanted)))

-- In this case, I want to preserve one space
select  string_agg(trim(value),' ')
from    STRING_SPLIT(@translated,' ')
where   trim(value)<>''

-- Result
'Lorem ipsum dolor sit amet consectetur adipiscing elit Vivamus commodo justo metus sed facilisis ante congue eget Proin ac bibendum sem'
む无字情书 2024-07-31 08:44:07

我把它放在调用 PatIndex 的两个地方。

PatIndex('%[^A-Za-z0-9]%', @Temp)

对于上面的自定义函数RemoveNonAlphaCharacters,并将其重命名为RemoveNonAlphaNumericCharacters

I put this in both places where PatIndex is called.

PatIndex('%[^A-Za-z0-9]%', @Temp)

for the custom function above RemoveNonAlphaCharacters and renamed it RemoveNonAlphaNumericCharacters

顾挽 2024-07-31 08:44:07

--首先创建一个函数

CREATE FUNCTION [dbo].[GetNumericonly]
(@strAlphaNumeric VARCHAR(256))
RETURNS VARCHAR(256)
AS
BEGIN
     DECLARE @intAlpha INT
     SET @intAlpha = PATINDEX('%[^0-9]%', @strAlphaNumeric)
BEGIN
     WHILE @intAlpha > 0
   BEGIN
          SET @strAlphaNumeric = STUFF(@strAlphaNumeric, @intAlpha, 1, '' )
          SET @intAlpha = PATINDEX('%[^0-9]%', @strAlphaNumeric )
   END
END
RETURN ISNULL(@strAlphaNumeric,0)
END

现在调用这个函数,就像

select [dbo].[GetNumericonly]('Abhi12shek23jaiswal')

它的结果一样

1223

--First create one function

CREATE FUNCTION [dbo].[GetNumericonly]
(@strAlphaNumeric VARCHAR(256))
RETURNS VARCHAR(256)
AS
BEGIN
     DECLARE @intAlpha INT
     SET @intAlpha = PATINDEX('%[^0-9]%', @strAlphaNumeric)
BEGIN
     WHILE @intAlpha > 0
   BEGIN
          SET @strAlphaNumeric = STUFF(@strAlphaNumeric, @intAlpha, 1, '' )
          SET @intAlpha = PATINDEX('%[^0-9]%', @strAlphaNumeric )
   END
END
RETURN ISNULL(@strAlphaNumeric,0)
END

Now call this function like

select [dbo].[GetNumericonly]('Abhi12shek23jaiswal')

Its result like

1223
仙气飘飘 2024-07-31 08:44:07

使用 CTE 生成的数字表来检查每个字符,然后使用 FOR XML 连接到保留值的字符串,您可以...

CREATE FUNCTION [dbo].[PatRemove](
    @pattern varchar(50),
    @expression varchar(8000) 
    )
RETURNS varchar(8000)
AS
BEGIN
    WITH 
        d(d) AS (SELECT d FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) digits(d)),
        nums(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM d d1, d d2, d d3, d d4),
        chars(c) AS (SELECT SUBSTRING(@expression, n, 1) FROM nums WHERE n <= LEN(@expression))
    SELECT 
        @expression = (SELECT c AS [text()] FROM chars WHERE c NOT LIKE @pattern FOR XML PATH(''));

    RETURN @expression;
END

Using a CTE generated numbers table to examine each character, then FOR XML to concat to a string of kept values you can...

CREATE FUNCTION [dbo].[PatRemove](
    @pattern varchar(50),
    @expression varchar(8000) 
    )
RETURNS varchar(8000)
AS
BEGIN
    WITH 
        d(d) AS (SELECT d FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) digits(d)),
        nums(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM d d1, d d2, d d3, d d4),
        chars(c) AS (SELECT SUBSTRING(@expression, n, 1) FROM nums WHERE n <= LEN(@expression))
    SELECT 
        @expression = (SELECT c AS [text()] FROM chars WHERE c NOT LIKE @pattern FOR XML PATH(''));

    RETURN @expression;
END
俏︾媚 2024-07-31 08:44:07
DECLARE @vchVAlue NVARCHAR(255) = 'SWP, Lettering Position 1: 4 Ω, 2: 8 Ω, 3: 16 Ω, 4:  , 5:  , 6:  , Voltage Selector, Solder, 6, Step switch, : w/o fuseholder '


WHILE PATINDEX('%?%' , CAST(@vchVAlue AS VARCHAR(255))) > 0
  BEGIN
    SELECT @vchVAlue = STUFF(@vchVAlue,PATINDEX('%?%' , CAST(@vchVAlue AS VARCHAR(255))),1,' ')
  END 

SELECT @vchVAlue
DECLARE @vchVAlue NVARCHAR(255) = 'SWP, Lettering Position 1: 4 Ω, 2: 8 Ω, 3: 16 Ω, 4:  , 5:  , 6:  , Voltage Selector, Solder, 6, Step switch, : w/o fuseholder '


WHILE PATINDEX('%?%' , CAST(@vchVAlue AS VARCHAR(255))) > 0
  BEGIN
    SELECT @vchVAlue = STUFF(@vchVAlue,PATINDEX('%?%' , CAST(@vchVAlue AS VARCHAR(255))),1,' ')
  END 

SELECT @vchVAlue
孤独陪着我 2024-07-31 08:44:07

这种方法对我来说不起作用,因为我试图保留阿拉伯字母,我试图替换正则表达式,但它也不起作用。 我编写了另一种在 ASCII 级别上工作的方法,因为这是我唯一的选择并且它有效。

 Create function [dbo].[RemoveNonAlphaCharacters] (@s varchar(4000)) returns varchar(4000)
   with schemabinding
begin
   if @s is null
      return null
   declare @s2 varchar(4000)
   set @s2 = ''
   declare @l int
   set @l = len(@s)
   declare @p int
   set @p = 1
   while @p <= @l begin
      declare @c int
      set @c = ascii(substring(@s, @p, 1))
      if @c between 48 and 57 or @c between 65 and 90 or @c between 97 and 122 or @c between 165 and 253 or @c between 32 and 33
         set @s2 = @s2 + char(@c)
      set @p = @p + 1
      end
   if len(@s2) = 0
      return null
   return @s2
   end

this way didn't work for me as i was trying to keep the Arabic letters i tried to replace the regular expression but also it didn't work. i wrote another method to work on ASCII level as it was my only choice and it worked.

 Create function [dbo].[RemoveNonAlphaCharacters] (@s varchar(4000)) returns varchar(4000)
   with schemabinding
begin
   if @s is null
      return null
   declare @s2 varchar(4000)
   set @s2 = ''
   declare @l int
   set @l = len(@s)
   declare @p int
   set @p = 1
   while @p <= @l begin
      declare @c int
      set @c = ascii(substring(@s, @p, 1))
      if @c between 48 and 57 or @c between 65 and 90 or @c between 97 and 122 or @c between 165 and 253 or @c between 32 and 33
         set @s2 = @s2 + char(@c)
      set @p = @p + 1
      end
   if len(@s2) = 0
      return null
   return @s2
   end

GO

苏大泽ㄣ 2024-07-31 08:44:07
CREATE FUNCTION remove_spc_char(@str VARCHAR(MAX))
  RETURNS VARCHAR(MAX) 
AS
BEGIN
  DECLARE @resp    VARCHAR(MAX) = '';
  DECLARE @str_val   VARCHAR(MAX) = UPPER(@str);
  DECLARE @i       INTEGER= 1;
  DECLARE @v_asc   INTEGER;
   WHILE @i <= (LEN(@str_val))
   BEGIN
     SET @v_asc = (ASCII(SUBSTRING(@str_val, @i, 1))) 
        BEGIN
        IF @v_asc in (192,193,194,195,196,65) 
            begin
                SET @v_asc = 65;
                SET @resp = concat(@resp, CHAR(@v_asc));
            end;
        IF @v_asc in (200,201,202,203,233,69)
            begin
                SET @v_asc = 69;
                SET @resp = concat(@resp, CHAR(@v_asc));
            end;
        IF @v_asc in (204,205,206,207,296,73)
            begin
                SET @v_asc = 73;
                SET @resp = concat(@resp, CHAR(@v_asc));
            end;
        IF @v_asc in (210,211,212,213,214,79)
            begin
                SET @v_asc = 79;
                SET @resp = concat(@resp, CHAR(@v_asc));
            end;
        IF @v_asc in (217,218,219,220,85)
            begin
                SET @v_asc = 85;
                SET @resp = concat(@resp, CHAR(@v_asc));
            end;
        IF @v_asc in (199,231,67)
            begin
                SET @v_asc = 67;
                SET @resp = concat(@resp, CHAR(@v_asc));
            end;
        IF @v_asc in (209,78)
            begin
                SET @v_asc = 78;
                SET @resp = concat(@resp, CHAR(@v_asc));
            end;
        IF @v_asc in (924,181,358,216,222,330,272,208,198,42,37,38,34,36,35,
64,33,39,41,40,43,61,95,45,62,60,63,47,176,183,124,166,174,359,248,254,
180,170,186,126,312,331,273,172,178,179,163,162,123,91,93,125,92,167,240,
223,230,171,187,169,185,168)
            begin
                SET @resp = concat(@resp, '');
            end;
        ELSE 
            begin
                if @v_asc not in (65,67,69,73,78,79,85)
                begin
                    SET @resp = concat(@resp, CHAR(@v_asc));
                end;
            end;
        END;
      SET @i = @i + 1
    END;
    RETURN @resp;
END;
CREATE FUNCTION remove_spc_char(@str VARCHAR(MAX))
  RETURNS VARCHAR(MAX) 
AS
BEGIN
  DECLARE @resp    VARCHAR(MAX) = '';
  DECLARE @str_val   VARCHAR(MAX) = UPPER(@str);
  DECLARE @i       INTEGER= 1;
  DECLARE @v_asc   INTEGER;
   WHILE @i <= (LEN(@str_val))
   BEGIN
     SET @v_asc = (ASCII(SUBSTRING(@str_val, @i, 1))) 
        BEGIN
        IF @v_asc in (192,193,194,195,196,65) 
            begin
                SET @v_asc = 65;
                SET @resp = concat(@resp, CHAR(@v_asc));
            end;
        IF @v_asc in (200,201,202,203,233,69)
            begin
                SET @v_asc = 69;
                SET @resp = concat(@resp, CHAR(@v_asc));
            end;
        IF @v_asc in (204,205,206,207,296,73)
            begin
                SET @v_asc = 73;
                SET @resp = concat(@resp, CHAR(@v_asc));
            end;
        IF @v_asc in (210,211,212,213,214,79)
            begin
                SET @v_asc = 79;
                SET @resp = concat(@resp, CHAR(@v_asc));
            end;
        IF @v_asc in (217,218,219,220,85)
            begin
                SET @v_asc = 85;
                SET @resp = concat(@resp, CHAR(@v_asc));
            end;
        IF @v_asc in (199,231,67)
            begin
                SET @v_asc = 67;
                SET @resp = concat(@resp, CHAR(@v_asc));
            end;
        IF @v_asc in (209,78)
            begin
                SET @v_asc = 78;
                SET @resp = concat(@resp, CHAR(@v_asc));
            end;
        IF @v_asc in (924,181,358,216,222,330,272,208,198,42,37,38,34,36,35,
64,33,39,41,40,43,61,95,45,62,60,63,47,176,183,124,166,174,359,248,254,
180,170,186,126,312,331,273,172,178,179,163,162,123,91,93,125,92,167,240,
223,230,171,187,169,185,168)
            begin
                SET @resp = concat(@resp, '');
            end;
        ELSE 
            begin
                if @v_asc not in (65,67,69,73,78,79,85)
                begin
                    SET @resp = concat(@resp, CHAR(@v_asc));
                end;
            end;
        END;
      SET @i = @i + 1
    END;
    RETURN @resp;
END;
难理解 2024-07-31 08:44:07

虽然帖子有点老了,但我还是想说一下。
我在上述解决方案中遇到的问题是,它不会过滤掉 ç、ë、ï 等字符。我按如下方式修改了一个函数(我只使用了 80 varchar 字符串来节省内存):

create FUNCTION dbo.udf_Cleanchars (@InputString varchar(80)) 
RETURNS varchar(80) 
AS 

BEGIN 
declare @return varchar(80) , @length int , @counter int , @cur_char char(1) 
SET @return = '' 
SET @length = 0 
SET @counter = 1 
SET @length = LEN(@InputString) 
IF @length > 0 
BEGIN WHILE @counter <= @length 

BEGIN SET @cur_char = SUBSTRING(@InputString, @counter, 1) IF ((ascii(@cur_char) in (32,44,46)) or (ascii(@cur_char) between 48 and 57) or (ascii(@cur_char) between 65 and 90) or (ascii(@cur_char) between 97 and 122))
BEGIN SET @return = @return + @cur_char END 
SET @counter = @counter + 1 
END END 

RETURN @return END

Though post is a bit old, I would like to say the following.
Issue I had with above solution is that it does not filter out characters like ç, ë, ï, etc. I adapted a function as follows (I only used an 80 varchar string to save memory):

create FUNCTION dbo.udf_Cleanchars (@InputString varchar(80)) 
RETURNS varchar(80) 
AS 

BEGIN 
declare @return varchar(80) , @length int , @counter int , @cur_char char(1) 
SET @return = '' 
SET @length = 0 
SET @counter = 1 
SET @length = LEN(@InputString) 
IF @length > 0 
BEGIN WHILE @counter <= @length 

BEGIN SET @cur_char = SUBSTRING(@InputString, @counter, 1) IF ((ascii(@cur_char) in (32,44,46)) or (ascii(@cur_char) between 48 and 57) or (ascii(@cur_char) between 65 and 90) or (ascii(@cur_char) between 97 and 122))
BEGIN SET @return = @return + @cur_char END 
SET @counter = @counter + 1 
END END 

RETURN @return END
德意的啸 2024-07-31 08:44:07

我刚刚发现它内置于 Oracle 10g(如果您使用的是 Oracle 10g)。 我必须去掉所有特殊字符才能进行电话号码比较。

regexp_replace(c.phone, '[^0-9]', '')

I just found this built into Oracle 10g if that is what you're using. I had to strip all the special characters out for a phone number compare.

regexp_replace(c.phone, '[^0-9]', '')
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文