使用 SQL Server 在 varchar 列中查找非 ASCII 字符

发布于 2024-09-26 08:16:14 字数 713 浏览 0 评论 0原文

如何使用 SQL Server 返回包含非 ASCII 字符的行？
如果您能展示如何在一列中执行此操作，那就太好了。

我现在正在做类似的事情，但它不起作用

select *
from Staging.APARMRE1 as ar
where ar.Line like '%[^!-~ ]%'

为了额外的信用，如果它可以跨越表中的所有 varchar 列，那么会很出色！在此解决方案中，最好返回三列：

该记录的标识字段。（这将允许使用另一个查询检查整个记录。）
列名
包含无效字符的文本

 Id | FieldName | InvalidText       |
----+-----------+-------------------+
 25 | LastName  | Solís             |
 56 | FirstName | François          |
100 | Address1  | 123 Ümlaut street |

无效字符可以是空格 (32₁₀) 到 ~ 范围之外的任何字符 (127₁₀)

原文

How can rows with non-ASCII characters be returned using SQL Server?
If you can show how to do it for one column would be great.

I am doing something like this now, but it is not working

select *
from Staging.APARMRE1 as ar
where ar.Line like '%[^!-~ ]%'

For extra credit, if it can span all varchar columns in a table, that would be outstanding! In this solution, it would be nice to return three columns:

The identity field for that record. (This will allow the whole record to be reviewed with another query.)
The column name
The text with the invalid character

 Id | FieldName | InvalidText       |
----+-----------+-------------------+
 25 | LastName  | Solís             |
 56 | FirstName | François          |
100 | Address1  | 123 Ümlaut street |

Invalid characters would be any outside the range of SPACE (32₁₀) through ~ (127₁₀)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

踏雪无痕 2024-10-03 08:16:14

这是使用 PATINDEX 进行单列搜索的解决方案。
它还显示 StartPosition、InvalidCharacter 和 ASCII 代码。

select line,
  patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,Line) as [Position],
  substring(line,patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,Line),1) as [InvalidCharacter],
  ascii(substring(line,patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,Line),1)) as [ASCIICode]
from  staging.APARMRE1
where patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,Line) >0

Here is a solution for the single column search using PATINDEX.
It also displays the StartPosition, InvalidCharacter and ASCII code.

select line,
  patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,Line) as [Position],
  substring(line,patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,Line),1) as [InvalidCharacter],
  ascii(substring(line,patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,Line),1)) as [ASCIICode]
from  staging.APARMRE1
where patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,Line) >0

回复收藏 0 原文

清风挽心 2024-10-03 08:16:14

我已经成功运行了这段代码

declare @UnicodeData table (
     data nvarchar(500)
)
insert into 
    @UnicodeData
values 
    (N'Horse�')
    ,(N'Dog')
    ,(N'Cat')

select
    data
from
    @UnicodeData 
where
    data collate LATIN1_GENERAL_BIN != cast(data as varchar(max))

，这对于已知的列效果很好。

为了额外加分，我编写了这个快速脚本来搜索给定表中的所有 nvarchar 列中的 Unicode 字符。

declare 
    @sql    varchar(max)    = ''
    ,@table sysname         = 'mytable' -- enter your table here

;with ColumnData as (
    select
        RowId               = row_number() over (order by c.COLUMN_NAME)
        ,c.COLUMN_NAME
        ,ColumnName         = '[' + c.COLUMN_NAME + ']'
        ,TableName          = '[' + c.TABLE_SCHEMA + '].[' + c.TABLE_NAME + ']' 
    from
        INFORMATION_SCHEMA.COLUMNS c
    where
        c.DATA_TYPE         = 'nvarchar'
        and c.TABLE_NAME    = @table
)
select
    @sql = @sql + 'select FieldName = ''' + c.ColumnName + ''',         InvalidCharacter = [' + c.COLUMN_NAME + ']  from ' + c.TableName + ' where ' + c.ColumnName + ' collate LATIN1_GENERAL_BIN != cast(' + c.ColumnName + ' as varchar(max)) '  +  case when c.RowId <> (select max(RowId) from ColumnData) then  ' union all ' else '' end + char(13)
from
    ColumnData c

-- check
-- print @sql
exec (@sql)

我不喜欢动态 SQL，但它确实可以用于像这样的探索性查询。

I've been running this bit of code with success

declare @UnicodeData table (
     data nvarchar(500)
)
insert into 
    @UnicodeData
values 
    (N'Horse�')
    ,(N'Dog')
    ,(N'Cat')

select
    data
from
    @UnicodeData 
where
    data collate LATIN1_GENERAL_BIN != cast(data as varchar(max))

Which works well for known columns.

For extra credit, I wrote this quick script to search all nvarchar columns in a given table for Unicode characters.

declare 
    @sql    varchar(max)    = ''
    ,@table sysname         = 'mytable' -- enter your table here

;with ColumnData as (
    select
        RowId               = row_number() over (order by c.COLUMN_NAME)
        ,c.COLUMN_NAME
        ,ColumnName         = '[' + c.COLUMN_NAME + ']'
        ,TableName          = '[' + c.TABLE_SCHEMA + '].[' + c.TABLE_NAME + ']' 
    from
        INFORMATION_SCHEMA.COLUMNS c
    where
        c.DATA_TYPE         = 'nvarchar'
        and c.TABLE_NAME    = @table
)
select
    @sql = @sql + 'select FieldName = ''' + c.ColumnName + ''',         InvalidCharacter = [' + c.COLUMN_NAME + ']  from ' + c.TableName + ' where ' + c.ColumnName + ' collate LATIN1_GENERAL_BIN != cast(' + c.ColumnName + ' as varchar(max)) '  +  case when c.RowId <> (select max(RowId) from ColumnData) then  ' union all ' else '' end + char(13)
from
    ColumnData c

-- check
-- print @sql
exec (@sql)

I'm not a fan of dynamic SQL but it does have its uses for exploratory queries like this.

回复收藏 0 原文

染墨丶若流云 2024-10-03 08:16:14

尝试这样的事情：

DECLARE @YourTable table (PK int, col1 varchar(20), col2 varchar(20), col3 varchar(20));
INSERT @YourTable VALUES (1, 'ok','ok','ok');
INSERT @YourTable VALUES (2, 'BA'+char(182)+'D','ok','ok');
INSERT @YourTable VALUES (3, 'ok',char(182)+'BAD','ok');
INSERT @YourTable VALUES (4, 'ok','ok','B'+char(182)+'AD');
INSERT @YourTable VALUES (5, char(182)+'BAD','ok',char(182)+'BAD');
INSERT @YourTable VALUES (6, 'BAD'+char(182),'B'+char(182)+'AD','BAD'+char(182)+char(182)+char(182));

--if you have a Numbers table use that, other wise make one using a CTE
WITH AllNumbers AS
(   SELECT 1 AS Number
    UNION ALL
    SELECT Number+1
        FROM AllNumbers
        WHERE Number<1000
)
SELECT 
    pk, 'Col1' BadValueColumn, CONVERT(varchar(20),col1) AS BadValue --make the XYZ in convert(varchar(XYZ), ...) the largest value of col1, col2, col3
    FROM @YourTable           y
        INNER JOIN AllNumbers n ON n.Number <= LEN(y.col1)
    WHERE ASCII(SUBSTRING(y.col1, n.Number, 1))<32 OR ASCII(SUBSTRING(y.col1, n.Number, 1))>127
UNION
SELECT 
    pk, 'Col2' BadValueColumn, CONVERT(varchar(20),col2) AS BadValue --make the XYZ in convert(varchar(XYZ), ...) the largest value of col1, col2, col3
    FROM @YourTable           y
        INNER JOIN AllNumbers n ON n.Number <= LEN(y.col2)
    WHERE ASCII(SUBSTRING(y.col2, n.Number, 1))<32 OR ASCII(SUBSTRING(y.col2, n.Number, 1))>127
UNION
SELECT 
    pk, 'Col3' BadValueColumn, CONVERT(varchar(20),col3) AS BadValue --make the XYZ in convert(varchar(XYZ), ...) the largest value of col1, col2, col3
    FROM @YourTable           y
        INNER JOIN AllNumbers n ON n.Number <= LEN(y.col3)
    WHERE ASCII(SUBSTRING(y.col3, n.Number, 1))<32 OR ASCII(SUBSTRING(y.col3, n.Number, 1))>127
order by 1
OPTION (MAXRECURSION 1000);

输出：

pk          BadValueColumn BadValue
----------- -------------- --------------------
2           Col1           BA¶D
3           Col2           ¶BAD
4           Col3           B¶AD
5           Col1           ¶BAD
5           Col3           ¶BAD
6           Col1           BAD¶
6           Col2           B¶AD
6           Col3           BAD¶¶¶

(8 row(s) affected)

try something like this:

DECLARE @YourTable table (PK int, col1 varchar(20), col2 varchar(20), col3 varchar(20));
INSERT @YourTable VALUES (1, 'ok','ok','ok');
INSERT @YourTable VALUES (2, 'BA'+char(182)+'D','ok','ok');
INSERT @YourTable VALUES (3, 'ok',char(182)+'BAD','ok');
INSERT @YourTable VALUES (4, 'ok','ok','B'+char(182)+'AD');
INSERT @YourTable VALUES (5, char(182)+'BAD','ok',char(182)+'BAD');
INSERT @YourTable VALUES (6, 'BAD'+char(182),'B'+char(182)+'AD','BAD'+char(182)+char(182)+char(182));

--if you have a Numbers table use that, other wise make one using a CTE
WITH AllNumbers AS
(   SELECT 1 AS Number
    UNION ALL
    SELECT Number+1
        FROM AllNumbers
        WHERE Number<1000
)
SELECT 
    pk, 'Col1' BadValueColumn, CONVERT(varchar(20),col1) AS BadValue --make the XYZ in convert(varchar(XYZ), ...) the largest value of col1, col2, col3
    FROM @YourTable           y
        INNER JOIN AllNumbers n ON n.Number <= LEN(y.col1)
    WHERE ASCII(SUBSTRING(y.col1, n.Number, 1))<32 OR ASCII(SUBSTRING(y.col1, n.Number, 1))>127
UNION
SELECT 
    pk, 'Col2' BadValueColumn, CONVERT(varchar(20),col2) AS BadValue --make the XYZ in convert(varchar(XYZ), ...) the largest value of col1, col2, col3
    FROM @YourTable           y
        INNER JOIN AllNumbers n ON n.Number <= LEN(y.col2)
    WHERE ASCII(SUBSTRING(y.col2, n.Number, 1))<32 OR ASCII(SUBSTRING(y.col2, n.Number, 1))>127
UNION
SELECT 
    pk, 'Col3' BadValueColumn, CONVERT(varchar(20),col3) AS BadValue --make the XYZ in convert(varchar(XYZ), ...) the largest value of col1, col2, col3
    FROM @YourTable           y
        INNER JOIN AllNumbers n ON n.Number <= LEN(y.col3)
    WHERE ASCII(SUBSTRING(y.col3, n.Number, 1))<32 OR ASCII(SUBSTRING(y.col3, n.Number, 1))>127
order by 1
OPTION (MAXRECURSION 1000);

OUTPUT:

pk          BadValueColumn BadValue
----------- -------------- --------------------
2           Col1           BA¶D
3           Col2           ¶BAD
4           Col3           B¶AD
5           Col1           ¶BAD
5           Col3           ¶BAD
6           Col1           BAD¶
6           Col2           B¶AD
6           Col3           BAD¶¶¶

(8 row(s) affected)

回复收藏 0 原文

小伙你站住 2024-10-03 08:16:14

该脚本在一列中搜索非 ASCII 字符。它生成一个包含所有有效字符的字符串，此处为代码点 32 到 127。然后它搜索与列表不匹配的行：

declare @str varchar(128);
declare @i int;
set @str = '';
set @i = 32;
while @i <= 127
    begin
    set @str = @str + '|' + char(@i);
    set @i = @i + 1;
    end;

select  col1
from    YourTable
where   col1 like '%[^' + @str + ']%' escape '|';

This script searches for non-ascii characters in one column. It generates a string of all valid characters, here code point 32 to 127. Then it searches for rows that don't match the list:

declare @str varchar(128);
declare @i int;
set @str = '';
set @i = 32;
while @i <= 127
    begin
    set @str = @str + '|' + char(@i);
    set @i = @i + 1;
    end;

select  col1
from    YourTable
where   col1 like '%[^' + @str + ']%' escape '|';

回复收藏 0 原文

像你 2024-10-03 08:16:14

在一些现实世界的数据上运行各种解决方案 - 12M 行 varchar 长度~30，大约 9k 可疑行，没有全文索引，patIndex 解决方案是最快的，并且它还选择最多的行。

（预运行 km。将缓存设置为已知状态，运行 3 个进程，最后再次运行 km - 最后 2 次运行 km 给出的时间在 2 秒内）

Gerhard Weiss 的 patindex 解决方案 - 运行时 0:38， MT返回 9144 行子

select dodgyColumn from myTable fcc
WHERE  patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,dodgyColumn ) >0

串数字解。 -- 运行时 1:16，返回 8996 行

select dodgyColumn from myTable fcc
INNER JOIN dbo.Numbers32k dn ON dn.number<(len(fcc.dodgyColumn ))
WHERE ASCII(SUBSTRING(fcc.dodgyColumn , dn.Number, 1))<32 
    OR ASCII(SUBSTRING(fcc.dodgyColumn , dn.Number, 1))>127

Deon Robertson 的 udf 解决方案 -- 运行时 3:47，返回 7316 行

select dodgyColumn 
from myTable 
where dbo.udf_test_ContainsNonASCIIChars(dodgyColumn , 1) = 1

running the various solutions on some real world data - 12M rows varchar length ~30, around 9k dodgy rows, no full text index in play, the patIndex solution is the fastest, and it also selects the most rows.

(pre-ran km. to set the cache to a known state, ran the 3 processes, and finally ran km again - the last 2 runs of km gave times within 2 seconds)

patindex solution by Gerhard Weiss -- Runtime 0:38, returns 9144 rows

select dodgyColumn from myTable fcc
WHERE  patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,dodgyColumn ) >0

the substring-numbers solution by MT. -- Runtime 1:16, returned 8996 rows

select dodgyColumn from myTable fcc
INNER JOIN dbo.Numbers32k dn ON dn.number<(len(fcc.dodgyColumn ))
WHERE ASCII(SUBSTRING(fcc.dodgyColumn , dn.Number, 1))<32 
    OR ASCII(SUBSTRING(fcc.dodgyColumn , dn.Number, 1))>127

udf solution by Deon Robertson -- Runtime 3:47, returns 7316 rows

select dodgyColumn 
from myTable 
where dbo.udf_test_ContainsNonASCIIChars(dodgyColumn , 1) = 1

回复收藏 0 原文

白馒头 2024-10-03 08:16:14

网络上有一个用户定义的函数“解析字母数字”。 Google UDF 解析字母数字，您应该找到它的代码。此用户定义的函数删除所有不适合 0-9、az 和 AZ 之间的字符。

Select * from Staging.APARMRE1 ar
where udf_parsealpha(ar.last_name) <> ar.last_name

这应该会带回任何包含您的姓氏和无效字符的记录...尽管您的奖励积分问题更具挑战性，但我认为案例陈述可以处理它。这是有点伪代码，我不完全确定它是否有效。

Select id, case when udf_parsealpha(ar.last_name) <> ar.last_name then 'last name'
when udf_parsealpha(ar.first_name) <> ar.first_name then 'first name'
when udf_parsealpha(ar.Address1) <> ar.last_name then 'Address1'
end, 
case when udf_parsealpha(ar.last_name) <> ar.last_name then ar.last_name
when udf_parsealpha(ar.first_name) <> ar.first_name then ar.first_name
when udf_parsealpha(ar.Address1) <> ar.last_name then ar.Address1
end
from Staging.APARMRE1 ar
where udf_parsealpha(ar.last_name) <> ar.last_name or
udf_parsealpha(ar.first_name) <> ar.first_name or
udf_parsealpha(ar.Address1) <> ar.last_name

我在论坛邮箱中写了这个......所以我不太确定它是否会按原样运行，但它应该接近。我不太确定如果单个记录有两个带有无效字符的字段，它将如何表现。

作为替代方案，您应该能够将 from 子句从单个表更改为看起来像这样的子查询：

select id,fieldname,value from (
Select id,'last_name' as 'fieldname', last_name as 'value'
from Staging.APARMRE1 ar
Union
Select id,'first_name' as 'fieldname', first_name as 'value'
from Staging.APARMRE1 ar
---(and repeat unions for each field)
)
where udf_parsealpha(value) <> value

这里的好处是对于每一列，您只需要在此处扩展 union 语句，而您需要将对该脚本的 case 语句版本中的每一列进行 3 次比较

There is a user defined function available on the web 'Parse Alphanumeric'. Google UDF parse alphanumeric and you should find the code for it. This user defined function removes all characters that doesn't fit between 0-9, a-z, and A-Z.

Select * from Staging.APARMRE1 ar
where udf_parsealpha(ar.last_name) <> ar.last_name

That should bring back any records that have a last_name with invalid chars for you...though your bonus points question is a bit more of a challenge, but I think a case statement could handle it. This is a bit psuedo code, I'm not entirely sure if it'd work.

Select id, case when udf_parsealpha(ar.last_name) <> ar.last_name then 'last name'
when udf_parsealpha(ar.first_name) <> ar.first_name then 'first name'
when udf_parsealpha(ar.Address1) <> ar.last_name then 'Address1'
end, 
case when udf_parsealpha(ar.last_name) <> ar.last_name then ar.last_name
when udf_parsealpha(ar.first_name) <> ar.first_name then ar.first_name
when udf_parsealpha(ar.Address1) <> ar.last_name then ar.Address1
end
from Staging.APARMRE1 ar
where udf_parsealpha(ar.last_name) <> ar.last_name or
udf_parsealpha(ar.first_name) <> ar.first_name or
udf_parsealpha(ar.Address1) <> ar.last_name

I wrote this in the forum post box...so I'm not quite sure if that'll function as is, but it should be close. I'm not quite sure how it will behave if a single record has two fields with invalid chars either.

As an alternative, you should be able to change the from clause away from a single table and into a subquery that looks something like:

select id,fieldname,value from (
Select id,'last_name' as 'fieldname', last_name as 'value'
from Staging.APARMRE1 ar
Union
Select id,'first_name' as 'fieldname', first_name as 'value'
from Staging.APARMRE1 ar
---(and repeat unions for each field)
)
where udf_parsealpha(value) <> value

Benefit here is for every column you'll only need to extend the union statement here, while you need to put that comparisson three times for every column in the case statement version of this script

回复收藏 0 原文

迷爱 2024-10-03 08:16:14

要查找哪个字段包含无效字符：

SELECT * FROM Staging.APARMRE1 FOR XML AUTO, TYPE

您可以使用以下查询对其进行测试：

SELECT top 1 'char 31: '+char(31)+' (hex 0x1F)' field
from sysobjects
FOR XML AUTO, TYPE

结果将是：

消息 6841，级别 16，状态 1，第 3 行 FOR XML 无法序列化
节点“field”的数据，因为它包含一个字符（0x001F）
XML 中不允许。要使用 FOR XML 检索此数据，请将其转换
为 binary、varbinary 或 image 数据类型并使用 BINARY BASE64
指令。

当您编写 xml 文件并在验证它时收到无效字符错误时，它非常有用。

To find which field has invalid characters:

SELECT * FROM Staging.APARMRE1 FOR XML AUTO, TYPE

You can test it with this query:

SELECT top 1 'char 31: '+char(31)+' (hex 0x1F)' field
from sysobjects
FOR XML AUTO, TYPE

The result will be:

Msg 6841, Level 16, State 1, Line 3 FOR XML could not serialize the
data for node 'field' because it contains a character (0x001F) which
is not allowed in XML. To retrieve this data using FOR XML, convert it
to binary, varbinary or image data type and use the BINARY BASE64
directive.

It is very useful when you write xml files and get error of invalid characters when validate it.

回复收藏 0 原文

尬尬 2024-10-03 08:16:14

这是我构建的 UDF，用于检测具有扩展 ASCII 字符的列。它速度很快，并且您可以扩展要检查的字符集。第二个参数允许您在检查标准字符集之外的任何内容或允许扩展集之间切换：

create function [dbo].[udf_ContainsNonASCIIChars]
(
@string nvarchar(4000),
@checkExtendedCharset bit
)
returns bit
as
begin

    declare @pos int = 0;
    declare @char varchar(1);
    declare @return bit = 0;

    while @pos < len(@string)
    begin
        select @char = substring(@string, @pos, 1)
        if ascii(@char) < 32 or ascii(@char) > 126 
            begin
                if @checkExtendedCharset = 1
                    begin
                        if ascii(@char) not in (9,124,130,138,142,146,150,154,158,160,170,176,180,181,183,184,185,186,192,193,194,195,196,197,199,200,201,202,203,204,205,206,207,209,210,211,212,213,214,216,217,218,219,220,221,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,248,249,250,251,252,253,254,255)
                            begin
                                select @return = 1;
                                select @pos = (len(@string) + 1)
                            end
                        else
                            begin
                                select @pos = @pos + 1
                            end
                    end
                else
                    begin
                        select @return = 1;
                        select @pos = (len(@string) + 1)    
                    end
            end
        else
            begin
                select @pos = @pos + 1
            end
    end

    return @return;

end

用法：

select Address1 
from PropertyFile_English
where udf_ContainsNonASCIIChars(Address1, 1) = 1

Here is a UDF I built to detectc columns with extended ascii charaters. It is quick and you can extended the character set you want to check. The second parameter allows you to switch between checking anything outside the standard character set or allowing an extended set:

create function [dbo].[udf_ContainsNonASCIIChars]
(
@string nvarchar(4000),
@checkExtendedCharset bit
)
returns bit
as
begin

    declare @pos int = 0;
    declare @char varchar(1);
    declare @return bit = 0;

    while @pos < len(@string)
    begin
        select @char = substring(@string, @pos, 1)
        if ascii(@char) < 32 or ascii(@char) > 126 
            begin
                if @checkExtendedCharset = 1
                    begin
                        if ascii(@char) not in (9,124,130,138,142,146,150,154,158,160,170,176,180,181,183,184,185,186,192,193,194,195,196,197,199,200,201,202,203,204,205,206,207,209,210,211,212,213,214,216,217,218,219,220,221,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,248,249,250,251,252,253,254,255)
                            begin
                                select @return = 1;
                                select @pos = (len(@string) + 1)
                            end
                        else
                            begin
                                select @pos = @pos + 1
                            end
                    end
                else
                    begin
                        select @return = 1;
                        select @pos = (len(@string) + 1)    
                    end
            end
        else
            begin
                select @pos = @pos + 1
            end
    end

    return @return;

end

USAGE:

select Address1 
from PropertyFile_English
where udf_ContainsNonASCIIChars(Address1, 1) = 1

回复收藏 0 原文

挽袖吟 2024-10-03 08:16:14

我采用了 Gerhard Weiss 的脚本并将其变成了我自己的脚本，因为我的业务场景非常简单，只需要更新 10K 行。

--CONTAINS() function requires a full text catalog
--interesting to me is that CONTAINS() can find the ASCII characters, but REPLACE() cannot
WITH cte_AsciiCharacterList AS
(select LineID
    , LineSyntax
    , Position = patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,LineSyntax)
    , InvalidCharacter = substring(LineSyntax,patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,LineSyntax),1)
    , ASCIICode = ascii(substring(LineSyntax,patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,LineSyntax),1))
    , LineSyntaxPart1 = SUBSTRING(LineSyntax, 1, patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,LineSyntax) - 1)
    , LineSyntaxPart2 = SUBSTRING(LineSyntax, patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,LineSyntax) + 3, 10000)
    , NewLineSyntax =  SUBSTRING(LineSyntax, 1, patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,LineSyntax) - 1) + '-' + SUBSTRING(LineSyntax, patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,LineSyntax) + 3, 10000)
FROM stg.TableName
WHERE Contains(linesyntax, '"ΓÇö"')
)
UPDATE stg.TableName
SET LineSyntax = NewLineSyntax
FROM stg.TableName x
INNER JOIN cte_AsciiCharacterList y on x.LineID = y.LineID

I took Gerhard Weiss's script and made it my own as my business scenario was pretty simple and only needed to update 10K rows.

--CONTAINS() function requires a full text catalog
--interesting to me is that CONTAINS() can find the ASCII characters, but REPLACE() cannot
WITH cte_AsciiCharacterList AS
(select LineID
    , LineSyntax
    , Position = patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,LineSyntax)
    , InvalidCharacter = substring(LineSyntax,patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,LineSyntax),1)
    , ASCIICode = ascii(substring(LineSyntax,patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,LineSyntax),1))
    , LineSyntaxPart1 = SUBSTRING(LineSyntax, 1, patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,LineSyntax) - 1)
    , LineSyntaxPart2 = SUBSTRING(LineSyntax, patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,LineSyntax) + 3, 10000)
    , NewLineSyntax =  SUBSTRING(LineSyntax, 1, patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,LineSyntax) - 1) + '-' + SUBSTRING(LineSyntax, patindex('%[^ !-~]%' COLLATE Latin1_General_BIN,LineSyntax) + 3, 10000)
FROM stg.TableName
WHERE Contains(linesyntax, '"ΓÇö"')
)
UPDATE stg.TableName
SET LineSyntax = NewLineSyntax
FROM stg.TableName x
INNER JOIN cte_AsciiCharacterList y on x.LineID = y.LineID

回复收藏 0 原文

~没有更多了~