String StartsWith() issue with Danish text
Can anyone explain this behaviour?
var culture = new CultureInfo("da-DK");
Thread.CurrentThread.CurrentCulture = culture;
"daab".StartsWith("da"); //false
I know that it can be fixed by specifying StringComparison.InvariantCulture, but I'm just confused by the behavior.
I also know that "aA" and "AA" are not considered the same in a Danish case-insensitive comparison, see http://msdn.microsoft.com/en-us/library/xk2wykcz.aspx, which explains this:
String.Compare("aA", "AA", new CultureInfo("da-DK"), CompareOptions.IgnoreCase) // -1 (not equal)
Is this linked to the behavior of the first code snippet?
Here is a test that illustrates the problem. daab and dåb (the same word in old and modern spelling respectively) mean baptism/christening.
All the above tests should be successful as far as I understand the language, and I'm Danish!
I ain't got no degree in grammar though. :-)
Seems like a bug to me.
Like Nappy said, it's a feature of the Danish language, where "aa" and "å" are still treated as the same. Danish has another two letters, æ and ø, but I am not sure if they can be written using two letters as well.
I think in the second example "aA" is not changed while "AA" is changed to "Å". Just to confuse things even more, "Aa" is considered equal to "AA" and "aa" only when using a case-insensitive comparison.
The modern spelling of "baptism" in Danish, namely dåb, is certainly not considered to start with da, for a Danophone. If daab is supposed to be an old-fashioned spelling of dåb, it is a bit philosophical whether it starts with da or not. But for (modern) collation purposes, it does not: alphabetically, daab in this sense goes after disk, not before.
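The collation point can be checked with a small sketch. The word list and the class name DanishSortDemo are made up for illustration, and the Danish result depends on the globalization data (NLS or ICU) available on the machine, so no Danish output is claimed here; only the ordinal order is fixed.

```csharp
using System;
using System.Globalization;

class DanishSortDemo
{
    static void Main()
    {
        // Hypothetical sample words; "daab" is the old spelling of "dåb".
        string[] words = { "disk", "daab", "dag" };

        // Culture-aware sort using the Danish collation rules of the platform.
        StringComparer danish = StringComparer.Create(new CultureInfo("da-DK"), false);
        string[] byDanish = (string[])words.Clone();
        Array.Sort(byDanish, danish);
        Console.WriteLine("da-DK:   " + string.Join(", ", byDanish));

        // Ordinal sort compares UTF-16 code units only.
        string[] byOrdinal = (string[])words.Clone();
        Array.Sort(byOrdinal, StringComparer.Ordinal);
        Console.WriteLine("Ordinal: " + string.Join(", ", byOrdinal)); // daab, dag, disk
    }
}
```

If the Danish rules treat "aa" as "å", the first line should list daab last, which is the ordering described above.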
However, if your string is not supposed to represent natural language, but is instead some kind of technical code, like hexadecimal digits, surely you do not want to use any culture-specific rules. Switching to the invariant culture is not the solution here, because the invariant culture has (English) rules itself!
Instead, you want to use ordinal comparison.
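As a minimal sketch of the difference (the class name OrdinalDemo and the é example are my own illustration, not part of the original answer): the ordinal overloads ignore the current culture entirely, while the invariant culture still applies linguistic rules, so its result is left unannotated because it depends on the platform's globalization data.

```csharp
using System;

class OrdinalDemo
{
    static void Main()
    {
        // Ordinal: "daab" begins with the code units 'd', 'a',
        // whatever the current culture is.
        Console.WriteLine("daab".StartsWith("da", StringComparison.Ordinal)); // True

        string composed = "\u00E9";    // é as a single code unit
        string decomposed = "e\u0301"; // e followed by a combining acute accent

        // Ordinal sees two different code unit sequences.
        Console.WriteLine(string.Equals(composed, decomposed, StringComparison.Ordinal)); // False

        // The invariant culture is still a linguistic comparison;
        // the result depends on the globalization data in use.
        Console.WriteLine(string.Equals(composed, decomposed, StringComparison.InvariantCulture));
    }
}
```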
Ordinal comparison simply compares the strings char by char, without any assumptions of what sequences are "equivalent" in some sense. (Technical remark: Each char is a UTF-16 code unit, not a "character". Ordinal comparison is ignorant of the rules of Unicode normalization.)
I think the confusion arises because, by default, some string methods use a culture-aware comparison, and other string methods use the ordinal comparison.
The following examples all use a culture-aware comparison:
Each of the above may depend on the .NET version as well! (As an example, the first one gives true if the current culture is the invariant culture and you are under .NET Framework 4.8; but it gives false if the current culture is the invariant culture and you use .NET 6.)
But these examples use ordinal comparison:
So remember to check what the default comparison is for the string method you use, and specify the opposite one if needed. (Or always specify the comparison, even when redundant, if you prefer.)
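To make the "always specify" advice concrete, here is a hedged sketch; it is not the set of examples from the original answer, and the class name ExplicitComparisonDemo is hypothetical. As far as I know, StartsWith(string), EndsWith(string), IndexOf(string), CompareTo and String.Compare(string, string) default to the current culture, while Equals, == and Contains(string) are ordinal. Only the ordinal results are annotated, since the culture-aware ones depend on the .NET version and globalization data.

```csharp
using System;
using System.Globalization;
using System.Threading;

class ExplicitComparisonDemo
{
    static void Main()
    {
        Thread.CurrentThread.CurrentCulture = new CultureInfo("da-DK");
        string s = "daab";

        // Default overload: culture-aware, uses the current (Danish) culture.
        Console.WriteLine(s.StartsWith("da"));

        // Explicit comparisons: the intent is visible at the call site.
        Console.WriteLine(s.StartsWith("da", StringComparison.CurrentCulture));
        Console.WriteLine(s.StartsWith("da", StringComparison.Ordinal));            // True
        Console.WriteLine(s.IndexOf("aa", StringComparison.Ordinal));               // 1
        Console.WriteLine(string.Compare(s, "disk", StringComparison.Ordinal) < 0); // True

        // Same comparison, but with Danish collation rules.
        Console.WriteLine(string.Compare(s, "disk", StringComparison.CurrentCulture) < 0);
    }
}
```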