Which case should be used when comparing strings in Microsoft programming languages?

Posted 2024-07-25 15:13:53

Note: This is a question I'm asking more out of historical interest, as I realise that modern languages have built-in regular expressions and case-insensitive string comparison methods.

When comparing two strings of unknown case, I remember reading that Microsoft's case-conversion methods were optimized for uppercase rather than lowercase. So:

if (stringA.ToUpper() == stringB.ToUpper()) { ... }

would be quicker than:

if (stringA.ToLower() == stringB.ToLower()) { ... }

If this is true, would it be better to store string data in upper rather than lower case when you need to search it?

Comments (3)

平定天下 2024-08-01 15:13:53

In .NET we could do something like the following:

if (String.Compare(stringA, stringB, StringComparison.InvariantCultureIgnoreCase) == 0) {...}

and not need to worry about turning the strings into upper or lower case. More on this here.

梦中楼上月下 2024-08-01 15:13:53

There is no safe case to use in the general case.

Whatever choice you make, it will fail in some cases.

  • Some languages have no case (not really a problem).
  • Some languages have a third "title" case.
  • Some characters do not round-trip. For example, ToUpper("ß") is "SS" and ToLower("SS") is "ss", but some words are distinguished only by "ß" versus "ss", so mapping to upper case gives a false-positive match (and breaks the assumption that case mapping does not change string length).
  • Case mapping is language dependent. E.g. ToLower("I") is "i" unless you are working in Turkish or Azeri, where the result is "ı" (Latin Small Letter Dotless I), and ToUpper("i") is "İ" (Latin Capital Letter I With Dot Above).
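
To make the round-trip problem concrete, here is a minimal Python sketch. Python is used only for illustration: its str.upper() and str.lower() apply the default, locale-independent Unicode mappings, so the Turkish/Azeri rule from the last bullet is not applied (that would need a culture-aware API such as .NET's ToUpper(CultureInfo) overload). The helper names equal_via_upper and equal_via_lower are hypothetical. The point is that comparing through upper case and comparing through lower case can disagree on the same pair of strings.

```python
def equal_via_upper(a: str, b: str) -> bool:
    # The approach from the question: map both strings to upper case first.
    return a.upper() == b.upper()


def equal_via_lower(a: str, b: str) -> bool:
    # The alternative: map both strings to lower case first.
    return a.lower() == b.lower()


# "ß" upper-cases to the two-character string "SS", so the length changes.
print(len("ß"), len("ß".upper()))  # 1 2

# Upper-casing merges "ß" and "ss" (both words become "MASSE");
# lower-casing keeps them apart ("maße" vs "masse").
print(equal_via_upper("Maße", "Masse"))  # True
print(equal_via_lower("Maße", "Masse"))  # False

# Dotless "ı" (U+0131) upper-cases to plain "I", the same as "i" does.
print(equal_via_upper("ı", "i"))  # True
print(equal_via_lower("ı", "i"))  # False
```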

In the past, approaches based on ToUpper and ToLower were making assumptions about working only with English text and ignored the majority of the world's glyphs and characters. To be more enlightened, you need to use case mapping tables as the basis for case-insensitive comparisons.
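
One way to act on this, sketched in Python under the assumption that Unicode case folding is the kind of case mapping table meant here: str.casefold() applies the Unicode case-folding data, which exists specifically for caseless comparison, and wrapping it in NFD normalization follows the Unicode Standard's definition of a canonical caseless match, so precomposed and decomposed accents compare equal. The function name caseless_equal is hypothetical. Full case folding still maps "ß" to "ss", so it inherits the false positive from the list above; as the answer says, no single mapping is safe in every case.

```python
import unicodedata


def caseless_equal(a: str, b: str) -> bool:
    """Canonical caseless match: compare NFD(casefold(NFD(s))) for both strings."""

    def key(s: str) -> str:
        # Decompose first so accents are separate code points, fold case with
        # the Unicode case-folding table, then decompose again because folding
        # can produce characters that have decompositions.
        return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())

    return key(a) == key(b)


print(caseless_equal("Café", "CAFE\u0301"))  # True: precomposed vs combining accent
print(caseless_equal("Cafe", "Café"))        # False: the accent still matters
print(caseless_equal("straße", "STRASSE"))   # True: folding maps "ß" to "ss"
```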

凡尘雨 2024-08-01 15:13:53

In ANSI/ASCII codes, uppercase letters have lower values than lowercase letters: "A" is code 65 and "a" is code 97 (binary 01000001 and 01100001). The difference between lowercase and uppercase letters is thus a single bit.
But does this matter for speed? In all cases all 8 bits have to be compared, so a speed difference could only be explained if comparing two bits were faster when both bits are 0. That doesn't make much sense to me, but then again, on some older processors this might have been true.
But nowadays? I don't think you'll notice any difference.
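
The single-bit difference can be sketched in Python (an illustration, not how any Microsoft runtime actually implements conversion): for ASCII letters, converting in either direction is one bitwise operation on bit 5 (0x20), so neither direction has an inherent cost advantage. The helper names are hypothetical, and both deliberately leave every character outside A-Z and a-z unchanged.

```python
def ascii_to_upper(s: str) -> str:
    # Clear bit 5 (0x20) for 'a'..'z' only; everything else passes through.
    return "".join(chr(ord(c) & ~0x20) if "a" <= c <= "z" else c for c in s)


def ascii_to_lower(s: str) -> str:
    # Set bit 5 (0x20) for 'A'..'Z' only; everything else passes through.
    return "".join(chr(ord(c) | 0x20) if "A" <= c <= "Z" else c for c in s)


print(format(ord("A"), "08b"), format(ord("a"), "08b"))  # 01000001 01100001
print(ord("a") ^ ord("A"))                                # 32
print(ascii_to_upper("Hello, World"))                     # HELLO, WORLD
print(ascii_to_lower("Hello, World"))                     # hello, world
```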


However, there could be a speed difference in converting lowercase to uppercase or vice versa, especially when you have to support letters with accents or other non-ANSI letters. In these cases a special mapping must be used, which might have been optimized for one direction. It's not the comparison that would be slow; it would be the conversion slowing things down.
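
A rough Python illustration of why the two directions can behave differently: the full Unicode upper-case and lower-case mappings are separate tables with different special cases. This sketch (the function name is hypothetical) scans every code point and collects those whose mapping expands to more than one character; many characters expand when upper-cased (for example "ß" to "SS"), while very few expand when lower-cased. It does not measure speed; it only shows that the mapping data is asymmetric, which is what would allow an implementation to optimize one direction differently. Scanning all code points takes a moment.

```python
import sys


def multi_char_mappings(direction: str) -> list[str]:
    """Return every character whose full case mapping yields more than one character.

    direction is "upper" or "lower"; Python's str.upper()/str.lower() apply
    the default (locale-independent) Unicode full case mappings.
    """
    result = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        mapped = ch.upper() if direction == "upper" else ch.lower()
        if len(mapped) > 1:
            result.append(ch)
    return result


to_upper_expanding = multi_char_mappings("upper")
to_lower_expanding = multi_char_mappings("lower")

print(len(to_upper_expanding), len(to_lower_expanding))
print("ß" in to_upper_expanding, "İ" in to_lower_expanding)  # True True
```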
