Unicode characters cause problems with string comparison in SQL Server 2005

Posted on 2024-08-21 18:12:52

This query:

select *
from op.tag
where tag = 'fussball'

Returns a result which has a tag column value of "fußball". Column "tag" is defined as nvarchar(150).

While I understand they are similar words grammatically, can anyone explain and defend this behavior? I assume it is related to the same collation settings which allow you to change case sensitivity on a column/table, but who would want this behavior? A unique constraint on the column also causes failure on inserts of one value when the other exists due to a constraint violation. How do I turn this off?

Follow-up bonus point question. Explain why this query does not return any rows:

select 1 
where 'fußball' = 'fussball'

Bonus question (answer?): @ScottCher pointed out to me privately that this is due to the string literal "fussball" being treated as a varchar. This query DOES return a result:

select 1 
where 'fußball' = cast('fussball' as nvarchar)

But then again, this one does not:

select 1 
where cast('fußball' as varchar) = cast('fussball' as varchar)

I'm confused.

Comments (5)

燃情 2024-08-28 18:12:52

I guess the Unicode collation set for your connection/table/database specifies that ss == ß. The latter behavior would be because it's on a faulty fast path, or maybe it does a binary comparison, or maybe you're not passing in the ß in the right encoding (I agree it's stupid).

http://unicode.org/reports/tr10/#Searching mentions that U+00DF is special-cased. Here's an insightful excerpt:

Language-sensitive searching and matching are closely related to collation. Strings that compare as equal at some strength level are those that should be matched when doing language-sensitive matching. For example, at a primary strength, "ß" would match against "ss" according to the UCA, and "aa" would match "å" in a Danish tailoring of the UCA.

楠木可依 2024-08-28 18:12:52

The SELECT does return a row with collation Latin1_General_CI_AS (SQL2000).

It does not with collation Latin1_General_BIN.

You can assign a collation to a table column by adding a COLLATE < collation > clause after the NVARCHAR/VARCHAR type in the column definition.

You can also compare strings with a specific collation using the syntax

string1 = string2 COLLATE < collation >
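
As a minimal sketch of both forms, assuming a SQL Server instance where Latin1_General_CI_AS and Latin1_General_BIN are available. The table name dbo.tag_demo and the constraint name are hypothetical and not part of the original question. Keep in mind that a BIN collation also makes comparisons case-sensitive.

```sql
-- Per-comparison override: COLLATE applies to this expression only.
-- Expected to return a row (ß is treated as equal to ss under this collation).
SELECT 1 WHERE N'fußball' = N'fussball' COLLATE Latin1_General_CI_AS;

-- Expected to return no row (binary comparison of the code points).
SELECT 1 WHERE N'fußball' = N'fussball' COLLATE Latin1_General_BIN;

-- Per-column collation: hypothetical demo table with a unique constraint.
CREATE TABLE dbo.tag_demo (
    tag nvarchar(150) COLLATE Latin1_General_BIN NOT NULL,
    CONSTRAINT UQ_tag_demo_tag UNIQUE (tag)
);

-- With a binary collation on the column, these should count as two
-- distinct values, so the second insert should not violate the constraint.
INSERT INTO dbo.tag_demo (tag) VALUES (N'fußball');
INSERT INTO dbo.tag_demo (tag) VALUES (N'fussball');
```
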

夜巴黎 2024-08-28 18:12:52

This isn't an answer that explains behavior, but may be relevant:

In this question, I learned that using the collation of

Latin1_General_Bin 

will avoid most collation quirks.

塔塔猫 2024-08-28 18:12:52

Some helper answers - not the complete one to your question, but still maybe helpful:

If you try:

SELECT 1 WHERE N'fußball' = N'fussball'  

you'll get "1" - when using the "N" to signify Unicode, the two strings are considered the same - why that's the case, I don't know (yet).

To find the default collation for a server, use

SELECT SERVERPROPERTY('Collation')

To find the collation of a given column in a database, use this query:

SELECT
  name AS [Column Name],
  OBJECT_NAME(object_id) AS [Table Name],
  collation_name
FROM sys.columns
WHERE object_id = OBJECT_ID('your-table-name')  -- replace with your table name
  AND name = 'your-column-name'                 -- replace with your column name
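
The database default sits between the server and column levels: character columns created without a COLLATE clause inherit it, so it may be worth checking too. A short sketch, assuming it is run in the database that holds the table. DATABASEPROPERTYEX and fn_helpcollations are built-in functions; the LIKE pattern is only an illustrative filter.

```sql
-- Default collation of the current database.
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS DatabaseCollation;

-- List available collations with a description of their
-- case, accent and binary sensitivity.
SELECT name, description
FROM fn_helpcollations()
WHERE name LIKE 'Latin1_General%';
```
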

影子的影子 2024-08-28 18:12:52

Bonus question (answer?): @ScottCher pointed out to me privately that this is due to the string literal "fussball" being treated as a varchar. This query DOES return a result:

select 1 where 'fußball' = cast('fussball' as nvarchar)

Here you're dealing with the SQL Server data type precedence rules, as stated in Data Type Precedence. Comparisons are always done using the type with the higher precedence:

When an operator combines two expressions of different data types, the rules for data type precedence specify that the data type with the lower precedence is converted to the data type with the higher precedence.

Since nvarchar has a higher precedence than varchar, the comparison in your example is performed using the nvarchar type, so it is effectively the same as select 1 where N'fußball' = N'fussball' (i.e. using Unicode types). I hope this also makes it clear why your last case doesn't return any rows: both sides are varchar there, so nothing is converted to nvarchar and the comparison behaves like your original 'fußball' = 'fussball'.
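
To see which type each literal has before precedence applies, the built-in SQL_VARIANT_PROPERTY function can report the base type. This is a small sketch: the comparison results noted in the comments are the ones reported in this thread and depend on the collation in effect. The explicit nvarchar(150) length is my own choice, matching the column in the question.

```sql
-- A plain literal is varchar; an N-prefixed literal is nvarchar.
SELECT SQL_VARIANT_PROPERTY('fussball',  'BaseType') AS plain_literal,
       SQL_VARIANT_PROPERTY(N'fussball', 'BaseType') AS n_prefixed_literal;

-- varchar = varchar: no conversion takes place
-- (reported in the question as returning no rows).
SELECT 1 WHERE 'fußball' = 'fussball';

-- varchar = nvarchar: the varchar side is converted to nvarchar first
-- (reported in the question as returning a row).
SELECT 1 WHERE 'fußball' = CAST('fussball' AS nvarchar(150));

-- nvarchar = nvarchar: the form the previous query is converted to
-- (reported in another answer as returning 1).
SELECT 1 WHERE N'fußball' = N'fussball';
```
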
