Unicode characters causing issues in SQL Server 2005 string comparison
This query:
select *
from op.tag
where tag = 'fussball'
Returns a result which has a tag column value of "fußball". Column "tag" is defined as nvarchar(150).
While I understand they are similar words grammatically, can anyone explain and defend this behavior? I assume it is related to the same collation settings which allow you to change case sensitivity on a column/table, but who would want this behavior? A unique constraint on the column also causes failure on inserts of one value when the other exists due to a constraint violation. How do I turn this off?
Follow-up bonus point question. Explain why this query does not return any rows:
select 1
where 'fußball' = 'fussball'
Bonus question (answer?): @ScottCher pointed out to me privately that this is due to the string literal "fussball" being treated as a varchar. This query DOES return a result:
select 1
where 'fußball' = cast('fussball' as nvarchar)
But then again, this one does not:
select 1
where cast('fußball' as varchar) = cast('fussball' as varchar)
I'm confused.
Comments (5)
I guess the Unicode collation set for your connection/table/database specifies that ss == ß. The latter behavior would be because it's on a faulty fast path, or maybe it does a binary comparison, or maybe you're not passing in the ß in the right encoding (I agree it's stupid).
http://unicode.org/reports/tr10/#Searching mentions that U+00DF is special-cased. Here's an insightful excerpt:
The SELECT does return a row with collation Latin1_General_CI_AS (SQL2000).
It does not with collation Latin1_General_BIN.
You can assign a collation to a table column by adding the COLLATE <collation> clause after the N/VARCHAR type.
You can also compare strings with a specific collation using the syntax
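The snippet that followed appears to have been lost. As a minimal sketch of what a per-comparison collation override looks like in T-SQL (the collation name is one of the two mentioned above; the second table is a hypothetical stand-in, not the asker's actual schema):

```sql
-- Force a binary comparison for this one predicate, so the two
-- spellings are compared by their binary values rather than by
-- linguistic rules.
SELECT *
FROM op.tag
WHERE tag = N'fussball' COLLATE Latin1_General_BIN;

-- Hypothetical table (name and column are illustrative only) showing
-- the COLLATE clause placed after the column's data type.
CREATE TABLE dbo.tag_example (
    tag nvarchar(150) COLLATE Latin1_General_BIN NOT NULL
);
```

With a column declared this way, a unique constraint would treat the two spellings as different values, but every comparison on the column also becomes case-sensitive and accent-sensitive.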
This isn't an answer that explains behavior, but may be relevant:
In this question, I learned that using the collation of
will avoid most collation quirks.
Some helper answers - not the complete one to your question, but still maybe helpful:
If you try:
you'll get "1" - when using the "N" to signify Unicode, the two strings are considered the same - why that's the case, I don't know (yet).
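The query itself is missing from this copy of the answer. Judging from the description, it was presumably something along these lines (a reconstruction, not the author's exact text):

```sql
-- Both literals carry the N prefix, so both are nvarchar and are
-- compared using the Unicode rules of the collation in effect.
SELECT 1
WHERE N'fußball' = N'fussball';
```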
To find the default collation for a server, use
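The statement is missing here. One built-in way to read the server-level default, offered as a sketch rather than the author's original, is SERVERPROPERTY:

```sql
-- Name of the server's default collation.
SELECT SERVERPROPERTY('Collation') AS server_collation;

-- Default collation of the current database, for comparison.
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS database_collation;
```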
To find the collation of a given column in a database, use this query:
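The query is missing here as well. A sketch using the INFORMATION_SCHEMA.COLUMNS view, with the schema, table, and column names taken from the question (op.tag, column tag) as assumptions:

```sql
-- Look up the collation assigned to one specific column.
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, COLLATION_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'op'
  AND TABLE_NAME = 'tag'
  AND COLUMN_NAME = 'tag';
```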
Here you're dealing with the SQL Server data type precedence rules, as stated in Data Type Precedence. Comparisons are always done using the higher-precedence type.
Since nvarchar has a higher precedence than varchar, the comparison in your example will occur using the nvarchar type, so it's really exactly the same as
select 1 where N'fußball' = N'fussball'
(i.e. using Unicode types). I hope this also makes it clear why your last case doesn't return any rows.
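To make the precedence rule concrete, here are the question's comparisons side by side, annotated with the types involved. The outcomes noted in the comments are the ones reported in the question; they depend on the collation in effect, so treat this as a sketch rather than guaranteed output.

```sql
-- varchar = varchar: no conversion takes place, both operands stay
-- varchar. The question reports no rows.
SELECT 1 WHERE 'fußball' = 'fussball';

-- varchar = nvarchar: the varchar operand is implicitly converted to
-- nvarchar, the higher-precedence type. The question reports one row.
SELECT 1 WHERE 'fußball' = CAST('fussball' AS nvarchar(30));

-- Equivalent form with both operands written as nvarchar literals.
SELECT 1 WHERE N'fußball' = N'fussball';

-- Both sides explicitly cast to varchar: same situation as the first
-- query. The question reports no rows.
SELECT 1 WHERE CAST('fußball' AS varchar(30)) = CAST('fussball' AS varchar(30));
```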