String StartsWith() issue with Danish text

Posted 2024-11-18 09:49:37 | 574 characters | 2 views | 0 comments

Can anyone explain this behaviour?

var culture = new CultureInfo("da-DK");
Thread.CurrentThread.CurrentCulture = culture;
"daab".StartsWith("da"); //false

I know that it can be fixed by specifying StringComparison.InvariantCulture. But I'm just confused by the behavior.

I also know that "aA" and "AA" are not considered the same in a Danish case-insensitive comparison (see http://msdn.microsoft.com/en-us/library/xk2wykcz.aspx), which explains this:

String.Compare("aA", "AA", new CultureInfo("da-DK"), CompareOptions.IgnoreCase) // -1 (not equal)

Is this linked to the behavior of the first code snippet?

梦纸 2024-11-25 09:49:37

Here is a test that illustrates the problem. daab and dåb (the same word in old and modern spelling respectively) mean baptism/christening.

public class can_handle_remnant_of_danish_language
{
    [Fact]
    public void daab_start_with_då()
    {
        var culture = new CultureInfo("da-DK"); Thread.CurrentThread.CurrentCulture = culture;
        Assert.True("daab".StartsWith("då")); // Fails
    }

    [Fact]
    public void daab_start_with_da()
    {
        var culture = new CultureInfo("da-DK"); Thread.CurrentThread.CurrentCulture = culture;
        Assert.True("daab".StartsWith("da")); // Fails
    }

    [Fact]
    public void daab_start_with_daa()
    {
        var culture = new CultureInfo("da-DK"); Thread.CurrentThread.CurrentCulture = culture;
        Assert.True("daab".StartsWith("daa")); // Succeeds
    }

    [Fact]
    public void dåb_start_with_daa()
    {
        var culture = new CultureInfo("da-DK"); Thread.CurrentThread.CurrentCulture = culture;
        Assert.True("dåb".StartsWith("daa")); // Fails
    }

    [Fact]
    public void dåb_start_with_da()
    {
        var culture = new CultureInfo("da-DK"); Thread.CurrentThread.CurrentCulture = culture;
        Assert.True("dåb".StartsWith("da")); // Fails
    }

    [Fact]
    public void dåb_start_with_då()
    {
        var culture = new CultureInfo("da-DK"); Thread.CurrentThread.CurrentCulture = culture;
        Assert.True("dåb".StartsWith("då")); // Succeeds
    }
}

All of the above tests should succeed according to my understanding of the language, and I'm Danish!
I don't have a degree in grammar, though. :-)

Seems like a bug to me.
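For anyone who wants to rerun these six cases without mutating Thread.CurrentThread.CurrentCulture, here is a minimal sketch that passes the culture explicitly through CompareInfo.IsPrefix and shows the ordinal result next to it. The names DanishPrefixProbe and Probe are made up for illustration. The culture-aware results are deliberately not hard-coded, because they depend on the runtime and the globalization data it uses.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of the original tests.
public static class DanishPrefixProbe
{
    // The six (text, prefix) pairs from the tests above.
    public static readonly (string Text, string Prefix)[] Cases =
    {
        ("daab", "då"), ("daab", "da"), ("daab", "daa"),
        ("dåb", "daa"), ("dåb", "da"), ("dåb", "då"),
    };

    // Returns one line per case with the culture-aware and the ordinal result.
    public static List<string> Probe(string cultureName = "da-DK")
    {
        CompareInfo compareInfo = CultureInfo.GetCultureInfo(cultureName).CompareInfo;
        var lines = new List<string>();
        foreach (var (text, prefix) in Cases)
        {
            // Culture-aware prefix check: may treat "aa" as a single letter.
            bool cultureAware = compareInfo.IsPrefix(text, prefix, CompareOptions.None);
            // Ordinal prefix check: compares UTF-16 code units only.
            bool ordinal = text.StartsWith(prefix, StringComparison.Ordinal);
            lines.Add($"\"{text}\" starts with \"{prefix}\": {cultureName}={cultureAware}, ordinal={ordinal}");
        }
        return lines;
    }
}
```

Printing each line returned by DanishPrefixProbe.Probe() gives both columns side by side, which makes it easy to see where the Danish rules and the plain code-unit comparison disagree on a given machine.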

甜嗑 2024-11-25 09:49:37

Like Nappy said, it's a feature of the Danish language, where "aa" and "å" are still treated as the same letter. Danish has two more letters, æ and ø, but I am not sure whether they can be written using two letters as well.

I think in the second example "aA" is left unchanged while "AA" is changed to "Å". Just to confuse things even more, "Aa" is considered equal to "AA" and "aa" only when using a case-insensitive comparison.
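To check how these spellings compare on a particular machine, a small probe along the following lines may help. It is only a sketch: DanishAaProbe, CompareIgnoreCase and Report are hypothetical names, and the output is not listed here because it can differ between .NET versions and platforms.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for inspecting Danish "aa" handling.
public static class DanishAaProbe
{
    // Compares two strings under the given culture, ignoring case.
    // Only the sign of the result (negative, zero, positive) is meaningful.
    public static int CompareIgnoreCase(string left, string right, string cultureName = "da-DK")
    {
        CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);
        return string.Compare(left, right, culture, CompareOptions.IgnoreCase);
    }

    // Builds one report line per spelling, each compared against "AA".
    public static string[] Report()
    {
        string[] spellings = { "aa", "Aa", "aA", "AA" };
        var lines = new string[spellings.Length];
        for (int i = 0; i < spellings.Length; i++)
        {
            int result = CompareIgnoreCase(spellings[i], "AA");
            lines[i] = $"Compare(\"{spellings[i]}\", \"AA\") = {Math.Sign(result)}";
        }
        return lines;
    }
}
```

A line ending in 0 means the two spellings are treated as equal under the Danish rules with case ignored; anything else means they are not.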

雪花飘飘的天空 2024-11-25 09:49:37

The modern spelling of "baptism" in Danish, namely dåb, is certainly not considered to start with da by a Danish speaker. If daab is taken to be an old-fashioned spelling of dåb, it is a bit philosophical whether it starts with da or not. But for (modern) collation purposes it does not: alphabetically, daab sorts after disk, not before.

However, if your string is not supposed to represent natural language, but is instead some kind of technical code, like hexadecimal digits, surely you do not want to use any culture-specific rules. The solution here is not to use the invariant culture. The invariant culture has (English) rules itself!

Instead, you want to use ordinal comparison.

Ordinal comparison simply compares the strings char by char, without any assumptions of what sequences are "equivalent" in some sense. (Technical remark: Each char is a UTF-16 code unit, not a "character". Ordinal comparison is ignorant of the rules of Unicode normalization.)
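The technical remark can be made concrete with a minimal sketch. It assumes "née" is stored once with a combining accent and once with the precomposed letter U+00E9; the class and member names are made up for illustration.

```csharp
using System;
using System.Text;

// Hypothetical demo of ordinal comparison versus Unicode normalization.
public static class OrdinalDemo
{
    // "née" with a combining acute accent: n, e, U+0301, e (4 UTF-16 code units).
    public const string Decomposed = "ne\u0301e";

    // "née" with the precomposed letter U+00E9: n, é, e (3 UTF-16 code units).
    public const string Precomposed = "n\u00E9e";

    // Ordinal equality compares UTF-16 code units one by one.
    public static bool OrdinalEquals(string a, string b) =>
        string.Equals(a, b, StringComparison.Ordinal);

    // Normalizing both sides to form C first makes the two spellings identical.
    public static bool OrdinalEqualsAfterNfc(string a, string b) =>
        string.Equals(a.Normalize(NormalizationForm.FormC),
                      b.Normalize(NormalizationForm.FormC),
                      StringComparison.Ordinal);
}
```

OrdinalDemo.OrdinalEquals(Decomposed, Precomposed) is false because the code units differ, while OrdinalEqualsAfterNfc returns true. If canonically equivalent text has to match under ordinal rules, normalizing first is one way to get there.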

I think the confusion arises because, by default, some string methods use a culture-aware comparison, and other string methods use the ordinal comparison.

The following examples all use a culture-aware comparison:

"Straße".StartsWith("Strasse", StringComparison.CurrentCulture)
"Straße".Equals("Strasse", StringComparison.CurrentCulture)
"ne\u0301e".StartsWith("née", StringComparison.CurrentCulture)
"ne\u0301e".Equals("née", StringComparison.CurrentCulture)

"Straße".StartsWith("Strasse")  // CurrentCulture is default for 'StartsWith'!
"ne\u0301e".StartsWith("née")   // CurrentCulture is default for 'StartsWith'!

Each of the above may depend on the .NET version as well! (As an example, the first one gives true if the current culture is the invariant culture and you are under .NET Framework 4.8; but it gives false if the current culture is the invariant culture and you use .NET 6.)
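Since the result depends on the runtime, it can be worth printing the runtime description next to the result when comparing machines. The following is a sketch only; RuntimeCollationProbe and Describe are hypothetical names, and no output is shown for the culture-aware part because it varies.

```csharp
using System;
using System.Globalization;
using System.Runtime.InteropServices;

// Hypothetical helper that reports which runtime produced which result.
public static class RuntimeCollationProbe
{
    public static string Describe()
    {
        // Culture-aware prefix check under the invariant culture (version dependent).
        bool cultureAware = CultureInfo.InvariantCulture.CompareInfo
            .IsPrefix("Straße", "Strasse", CompareOptions.None);

        // Ordinal prefix check: 'ß' and 's' are different code units.
        bool ordinal = "Straße".StartsWith("Strasse", StringComparison.Ordinal);

        return $"{RuntimeInformation.FrameworkDescription}: invariant={cultureAware}, ordinal={ordinal}";
    }
}
```

The ordinal part is False on every runtime, so any difference between two machines shows up only in the invariant part of the line.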

But these examples use ordinal comparison:

"Straße".StartsWith("Strasse", StringComparison.Ordinal)
"Straße".Equals("Strasse", StringComparison.Ordinal)
"ne\u0301e".StartsWith("née", StringComparison.Ordinal)
"ne\u0301e".Equals("née", StringComparison.Ordinal)

"Straße".Equals("Strasse")  // Ordinal is default for 'Equals'!
"ne\u0301e".Equals("née")   // Ordinal is default for 'Equals'!

So remember to check what the default comparison is for the string method you use, and specify the opposite one if needed. (Or always specify the comparison, even when redundant, if you prefer.)
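For strings that are technical codes rather than natural language, one way to follow this advice is to route every check through a small helper that always names the comparison. This is a sketch under that assumption; TechnicalCode and its methods are hypothetical names.

```csharp
using System;

// Hypothetical helper for identifiers such as hexadecimal strings.
public static class TechnicalCode
{
    // Case-sensitive prefix check on UTF-16 code units, independent of any culture.
    public static bool HasPrefix(string code, string prefix) =>
        code.StartsWith(prefix, StringComparison.Ordinal);

    // Same check, but treating upper-case and lower-case letters as equal.
    public static bool HasPrefixIgnoreCase(string code, string prefix) =>
        code.StartsWith(prefix, StringComparison.OrdinalIgnoreCase);
}
```

With this, TechnicalCode.HasPrefix("daab", "da") is true regardless of the current culture, including da-DK.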
