What case should be used when comparing strings in Microsoft programming languages?
Note: This is a question I'm asking more out of historical interest, as I realise that modern languages have built-in regular expressions and case-insensitive string comparison methods.
When comparing two strings of unknown case, I remember reading that Microsoft-based conversion methods were optimized for uppercase rather than lowercase. So:
if (stringA.ToUpper() == stringB.ToUpper()) { ... }
would be quicker than:
if (stringA.ToLower() == stringB.ToLower()) { ... }
If this is true, would it be better to store string data in upper rather than lower case when you need to search it?
Comments (3)
In .NET we could use one of the framework's built-in case-insensitive string comparisons and not need to worry about turning the strings into upper or lower case. More on this here.
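As a minimal sketch of what such a comparison might look like: the class and method names below are hypothetical, and choosing `StringComparison.OrdinalIgnoreCase` is an assumption (a culture-aware option such as `StringComparison.CurrentCultureIgnoreCase` may suit user-facing text better).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the .NET class library.
public static class CaseInsensitiveDemo
{
    // Compares two strings without first creating upper- or lower-cased copies.
    public static bool SameIgnoringCase(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    // Builds a lookup whose keys match regardless of case, so the stored
    // data does not have to be normalized to one case before searching.
    public static Dictionary<string, int> BuildLookup()
    {
        var lookup = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        lookup["Hello"] = 1;
        return lookup;
    }
}
```

With this sketch, `CaseInsensitiveDemo.SameIgnoringCase("Hello", "HELLO")` is true, and `CaseInsensitiveDemo.BuildLookup().ContainsKey("hELLO")` finds the entry stored as "Hello", which sidesteps the question of which case to store the data in.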
There is no safe case to use in the general case.
Whatever choice you make, it will fail in some cases.
In the past, approaches based on ToUpper and ToLower were making the assumption that they would only ever work with English text, ignoring the majority of the world's glyphs and characters. To be more enlightened, you need to use case mapping tables as the basis for case-insensitive comparisons.
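To illustrate how the two conversions can disagree outside English text, here is a small sketch. The helper names are hypothetical, and the behaviour described afterwards assumes a runtime that carries full Unicode casing data.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class CaseRoundTripDemo
{
    // Compares two strings after converting both to upper case.
    public static bool EqualViaUpper(string a, string b)
    {
        return a.ToUpperInvariant() == b.ToUpperInvariant();
    }

    // Compares two strings after converting both to lower case.
    public static bool EqualViaLower(string a, string b)
    {
        return a.ToLowerInvariant() == b.ToLowerInvariant();
    }
}
```

Greek has two lowercase sigmas, "σ" (U+03C3) and the word-final "ς" (U+03C2), which share the single uppercase form "Σ". Under the assumption above, `EqualViaUpper("σ", "ς")` treats them as equal while `EqualViaLower("σ", "ς")` does not, so the answer depends on which direction was picked.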
In ANSI/ASCII codes, uppercase letters have lower values than lowercase letters: "A" is code 65 and "a" is code 97 (binary 01000001 and 01100001). The difference between a lowercase letter and its uppercase counterpart is thus a single bit.
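The single-bit difference can be sketched in code as follows. The class and member names are hypothetical, and the helpers deliberately handle only the 26 unaccented ASCII letters.

```csharp
using System;

// Hypothetical helper for illustration; ASCII letters only.
public static class AsciiCaseBitDemo
{
    // Value 0x20 (decimal 32) is the one bit that differs between 'A'..'Z' and 'a'..'z'.
    public const int CaseBit = 0x20;

    // Clears the case bit for 'a'..'z'; every other character is returned unchanged.
    public static char ToUpperAscii(char c)
    {
        return (c >= 'a' && c <= 'z') ? (char)(c & ~CaseBit) : c;
    }

    // Sets the case bit for 'A'..'Z'; every other character is returned unchanged.
    public static char ToLowerAscii(char c)
    {
        return (c >= 'A' && c <= 'Z') ? (char)(c | CaseBit) : c;
    }
}
```

Here `'A' ^ 'a'` evaluates to 32, `AsciiCaseBitDemo.ToUpperAscii('a')` gives 'A', and `AsciiCaseBitDemo.ToLowerAscii('A')` gives 'a'. Either direction costs one range check and one bit operation, so for plain ASCII neither direction has an inherent advantage.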
But does this matter for speed? In all cases all 8 bits have to be compared, so a speed difference could only be explained if comparing two bits were faster when both bits are 0. That doesn't make much sense to me, but then again, on some older processors this could have been true.
But nowadays? I don't think you'll notice any difference.
However, there could be a speed difference in converting lowercase to uppercase or vice versa, especially when you have to support letters with accents or other non-ANSI letters. In these cases a special mapping must be used, which might have been optimized for one direction. It's not the comparison that would be slow; it would be the conversion slowing things down.