Unicode characters causing problems with string comparison in SQL Server 2005

Posted 2024-08-21 18:12:52

This query:

select *
from op.tag
where tag = 'fussball'

Returns a result which has a tag column value of "fußball". Column "tag" is defined as nvarchar(150).

While I understand they are similar words grammatically, can anyone explain and defend this behavior? I assume it is related to the same collation settings that let you change case sensitivity on a column/table, but who would want this behavior? A unique constraint on the column also makes inserting one value fail with a constraint violation when the other already exists. How do I turn this off?

Follow-up bonus point question. Explain why this query does not return any rows:

select 1 
where 'fußball' = 'fussball'

Bonus question (answer?): @ScottCher pointed out to me privately that this is due to the string literal "fussball" being treated as a varchar. This query DOES return a result:

select 1 
where 'fußball' = cast('fussball' as nvarchar)

But then again, this one does not:

select 1 
where cast('fußball' as varchar) = cast('fussball' as varchar)

I'm confused.

燃情 2024-08-28 18:12:52

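I'd guess that the Unicode collation set for your connection/table/database specifies that ss == ß. The latter behavior (the bonus query returning no rows) could be because it goes down a faulty fast path, or it may be doing a binary comparison, or you may not be passing the ß in the right encoding (I agree that's dumb).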

http://unicode.org/reports/tr10/#Searching mentions that U+00DF (ß) is special-cased. Here's an insightful excerpt:

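Language-sensitive searching and matching are closely related to collation. Strings that compare as equal at some strength level are those that should be matched when doing language-sensitive matching. For example, at a primary strength, "ß" would match against "ss" according to the UCA, and "aa" would match "å" in a Danish tailoring of the UCA.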
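SQL Server does not expose UCA strength levels directly, but the sensitivity suffixes in a Windows collation name (CI/CS for case, AI/AS for accents) play a loosely similar role. The T-SQL sketch below compares literals under explicitly chosen collations; the collation names are real SQL Server collations, while treating the suffixes as "strength levels" is only an analogy, not something the excerpt above states.

```sql
-- Compare literals under explicitly chosen Windows collations.
-- The COLLATE clause on the right-hand literal decides which rules the = uses.
SELECT
    -- ß vs ss: the pair the question saw matching
    CASE WHEN N'fußball' = N'fussball' COLLATE Latin1_General_CI_AS
         THEN 1 ELSE 0 END AS eszett_ci_as,
    -- accent-insensitive (AI): e and é are treated as equal
    CASE WHEN N'resume' = N'résumé' COLLATE Latin1_General_CI_AI
         THEN 1 ELSE 0 END AS accents_ci_ai,
    -- accent-sensitive (AS): e and é are treated as different
    CASE WHEN N'resume' = N'résumé' COLLATE Latin1_General_CI_AS
         THEN 1 ELSE 0 END AS accents_ci_as;
```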

楠木可依 2024-08-28 18:12:52

The SELECT does return a row with collation Latin1_General_CI_AS (SQL2000).

It does not with collation Latin1_General_BIN.

You can assign a collation to a table column by adding a COLLATE <collation> clause after the N/VARCHAR type.

You can also compare strings with a specific collation using the syntax:

string1 = string2 COLLATE <collation>
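A minimal sketch of both forms, using a table variable as a hypothetical stand-in for the question's op.tag table (the name @tag is made up for illustration). The column is declared with Latin1_General_CI_AS, the collation reported above as returning the row; the second query overrides it with Latin1_General_BIN for that one comparison only.

```sql
-- Hypothetical stand-in for op.tag, with an explicit column-level collation.
DECLARE @tag TABLE (
    tag nvarchar(150) COLLATE Latin1_General_CI_AS NOT NULL
);

INSERT INTO @tag (tag) VALUES (N'fußball');

-- Uses the column's collation (Latin1_General_CI_AS).
SELECT tag FROM @tag WHERE tag = N'fussball';

-- Overrides the collation for this comparison only (binary, code-point based).
SELECT tag FROM @tag WHERE tag = N'fussball' COLLATE Latin1_General_BIN;
```

Putting COLLATE in the WHERE clause leaves the column definition untouched, but such a predicate may no longer be able to seek on an index built with the column's own collation.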

夜巴黎 2024-08-28 18:12:52

This isn't an answer that explains behavior, but may be relevant:

In this question, I learned that using the collation of

Latin1_General_Bin

will avoid most collation quirks.
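For the unique-constraint part of the question, a sketch assuming a binary collation is acceptable for the tag column: under Latin1_General_BIN the constraint compares code points, so both spellings can be stored. The temp table name #tag_unique is hypothetical.

```sql
-- Recreate the hypothetical temp table so the script can be rerun in one session.
IF OBJECT_ID('tempdb..#tag_unique') IS NOT NULL
    DROP TABLE #tag_unique;

CREATE TABLE #tag_unique (
    tag nvarchar(150) COLLATE Latin1_General_BIN NOT NULL UNIQUE
);

-- Under a binary collation these are different keys, so neither insert
-- raises a unique constraint violation.
INSERT INTO #tag_unique (tag) VALUES (N'fußball');
INSERT INTO #tag_unique (tag) VALUES (N'fussball');

SELECT tag FROM #tag_unique;
```

The trade-off is that a binary collation is also case- and accent-sensitive, so N'Fussball' and N'fussball' would become distinct tags as well. Changing the collation of the existing op.tag column with ALTER TABLE ... ALTER COLUMN will typically require dropping the unique constraint (and any index) on that column first and recreating it afterwards.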

塔塔猫 2024-08-28 18:12:52

Some helper answers - not the complete one to your question, but still maybe helpful:

If you try:

SELECT 1 WHERE N'fußball' = N'fussball'

you'll get "1": when you use the N prefix to mark the literals as Unicode, the two strings are considered the same. Why that's the case, I don't know (yet).

To find the default collation for a server, use

SELECT SERVERPROPERTY('Collation')
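Two related lookups that may help when checking which rules apply: the current database's default collation (which regular table columns receive when no COLLATE clause is given) and the built-in descriptions of the Latin1_General collations, which spell out the case and accent sensitivity of each.

```sql
-- Default collation of the current database.
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS database_collation;

-- Human-readable descriptions of the Latin1_General collations,
-- e.g. to compare Latin1_General_CI_AS with Latin1_General_BIN.
SELECT name, description
FROM sys.fn_helpcollations()
WHERE name LIKE 'Latin1_General%';
```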

To find the collation of a given column in a database, use this query:

SELECT
  name AS [Column Name],
  OBJECT_NAME(object_id) AS [Table Name],
  collation_name
FROM sys.columns
WHERE object_id = OBJECT_ID('your-table-name')
AND name = 'your-column-name'

影子的影子 2024-08-28 18:12:52

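Bonus question (answer?): @ScottCher pointed out to me privately that this is due to the string literal "fussball" being treated as a varchar. This query DOES return a result:

select 1
where 'fußball' = cast('fussball' as nvarchar)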

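Here you are dealing with SQL Server's data type precedence rules, as described in Data Type Precedence. The comparison is always done using the type with the higher precedence: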

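When an operator combines two expressions of different data types, the rules for data type precedence specify that the data type with the lower precedence is converted to the data type with the higher precedence.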

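Since nvarchar has a higher precedence than varchar, the comparison in your example is done using the nvarchar type, so it is exactly the same as select 1 where N'fußball' = N'fussball' (i.e. using Unicode types). I hope this also makes it clear why your last case does not return any rows.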
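The precedence rule can be observed with SQL_VARIANT_PROPERTY, which reports the base type of an expression: a plain literal is varchar, an N'' literal is nvarchar, and combining the two yields nvarchar. The second query repeats the question's varchar-only comparison next to an nvarchar one under an explicit SQL_Latin1_General_CP1_CI_AS collation. That this is the question's database collation is an assumption (the question doesn't say); under a SQL collation, varchar comparisons use the non-Unicode sort order rather than the Unicode rules that equate ß with ss.

```sql
-- Base types of the literals involved in the question's queries.
SELECT
    SQL_VARIANT_PROPERTY('fussball', 'BaseType')      AS plain_literal,    -- varchar
    SQL_VARIANT_PROPERTY(N'fussball', 'BaseType')     AS n_literal,        -- nvarchar
    -- varchar + nvarchar: the varchar operand is converted to nvarchar
    SQL_VARIANT_PROPERTY('fußball' + N'', 'BaseType') AS mixed_expression; -- nvarchar

-- Varchar-only vs nvarchar-only comparison under an explicit SQL collation
-- (assumed, not confirmed, to match the question's setup).
SELECT
    CASE WHEN CAST('fußball' AS varchar(30)) = CAST('fussball' AS varchar(30))
              COLLATE SQL_Latin1_General_CP1_CI_AS
         THEN 1 ELSE 0 END AS varchar_only,
    CASE WHEN CAST('fußball' AS nvarchar(30)) = CAST('fussball' AS nvarchar(30))
              COLLATE SQL_Latin1_General_CP1_CI_AS
         THEN 1 ELSE 0 END AS nvarchar_only;
```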