Does Java's toLowerCase() preserve the original string length?
Assume two Java String objects:

String str = "<my string>";
String strLower = str.toLowerCase();

Is it then true that for every value of <my string> the expression

str.length() == strLower.length()

evaluates to true?

So, does String.toLowerCase() preserve the original string length for any value of String?
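Surprisingly, it does not!

From the Java docs of toLowerCase(Locale):

"Since case mappings are not always 1:1 char mappings, the resulting String may be a different length than the original String."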
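A minimal sketch (not the answer's original example) that lowercases a few single characters whose lowercase form is known to have a different length. It uses only String.toLowerCase(Locale) and Locale.forLanguageTag; the class name LowerCaseLengthDemo is hypothetical. U+0130 (İ) becomes "i" plus U+0307 COMBINING DOT ABOVE outside Turkish/Azeri, and in Lithuanian U+00CC (Ì) becomes three chars.

```java
import java.util.Locale;

// Hypothetical demo class: shows toLowerCase changing the string length.
public class LowerCaseLengthDemo {
    public static void main(String[] args) {
        String dottedI = "\u0130"; // LATIN CAPITAL LETTER I WITH DOT ABOVE, 1 char

        // Outside Turkish/Azeri: 'i' + U+0307 COMBINING DOT ABOVE.
        String root = dottedI.toLowerCase(Locale.ROOT);
        System.out.println(dottedI.length() + " -> " + root.length()); // 1 -> 2

        // In Turkish the same char lowercases to a plain 'i'.
        String turkish = dottedI.toLowerCase(Locale.forLanguageTag("tr"));
        System.out.println(dottedI.length() + " -> " + turkish.length()); // 1 -> 1

        // In Lithuanian, U+00CC (I with grave) becomes 'i' + U+0307 + U+0300.
        String iGrave = "\u00CC";
        String lithuanian = iGrave.toLowerCase(Locale.forLanguageTag("lt"));
        System.out.println(iGrave.length() + " -> " + lithuanian.length()); // 1 -> 3
    }
}
```

Passing an explicit Locale keeps the demo independent of the JVM's default locale, which the no-argument toLowerCase() would otherwise use.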
First of all, I'd like to point out that I absolutely agree with the (currently highest-rated) answer of @codaddict.
But I wanted to do an experiment, so here it is:
It's not a formal proof, but this code ran for me without ever reaching the inside of the if (using JDK 1.6.0 Update 16 on Ubuntu).

Edit: Here's some updated code that handles Locales as well:
Running that code with the locale names mentioned in the accepted answer will print some examples. Running it without an argument will try all available locales (and take quite a while!).
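The answer's code itself is not reproduced here. The following is a minimal sketch of a locale-aware check in the same spirit, under stated assumptions: it tests only single BMP chars (not the two-character combinations described above) to keep the run short, it parses arguments as language tags via Locale.forLanguageTag, and the names LowerCaseLengthCheck and countLengthChanges are hypothetical. Results can vary between JDK versions, since they follow the Unicode casing rules each JDK implements.

```java
import java.util.Locale;

// Hypothetical re-creation of the experiment, not the answer's original code.
// Assumption: only single BMP chars are tested (no two-char combinations).
public class LowerCaseLengthCheck {

    // Counts the chars whose toLowerCase(locale) result has a different length;
    // if print is true, each hit is also written to stdout.
    public static int countLengthChanges(Locale locale, boolean print) {
        int count = 0;
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            String s = String.valueOf((char) c);
            String lower = s.toLowerCase(locale);
            if (s.length() != lower.length()) {
                count++;
                if (print) {
                    System.out.printf("%s: U+%04X -> %d chars%n",
                            locale.toLanguageTag(), c, lower.length());
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // With arguments, test only the given language tags (e.g. "lt", "tr");
        // without arguments, test every available locale (this is slow).
        Locale[] locales;
        if (args.length > 0) {
            locales = new Locale[args.length];
            for (int i = 0; i < args.length; i++) {
                locales[i] = Locale.forLanguageTag(args[i]);
            }
        } else {
            locales = Locale.getAvailableLocales();
        }
        for (Locale locale : locales) {
            countLengthChanges(locale, true);
        }
    }
}
```

For example, `java LowerCaseLengthCheck lt tr` checks only Lithuanian and Turkish; running it without arguments walks every available locale.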
It's not exhaustive, because theoretically there could be multi-character Strings that behave differently, but it's a good first approximation.

Also note that many of the two-character combinations produced this way are probably invalid UTF-16, so the fact that nothing explodes in this code can only be blamed on a very robust String API in Java.
And last but not least: even if the assumption is true for the current implementation of Java, that can easily change once future versions of Java implement future versions of the Unicode standard, in which the rules for new characters may introduce situations where this no longer holds true.
So depending on this is still a pretty bad idea.
Also remember that toUpperCase() does not preserve the length either. Example: “straße” becomes “STRASSE” for the German locale. So you're more or less screwed if you're working with case-sensitive strings and you need to store the index for something.
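To illustrate the index problem, here is a minimal sketch (the class name UpperCaseIndexDemo is hypothetical) that records an index in the original string and then reuses it on the uppercased string. Because ß expands to "SS", every index after it shifts by one.

```java
import java.util.Locale;

// Hypothetical demo: an index taken from the original string no longer
// points at the same character after toUpperCase().
public class UpperCaseIndexDemo {
    public static void main(String[] args) {
        String original = "stra\u00DFe"; // "straße", 6 chars
        String upper = original.toUpperCase(Locale.GERMAN); // "STRASSE", 7 chars

        int idx = original.indexOf('e'); // 5
        System.out.println(original.length() + " -> " + upper.length()); // 6 -> 7

        // The stored index now lands on the second 'S' instead of 'E'.
        System.out.println(upper.charAt(idx)); // S
        System.out.println(upper.indexOf('E')); // 6
    }
}
```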
Edit:
Here is the code for @julaine. It is ugly old-style code, but I am no longer actively coding in Java.
If you run it, you will see that I am right:
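The code referred to above is not reproduced here. A minimal sketch of the same check, assuming the point is that the result does not depend on the locale (the class name SharpSUpperCaseDemo and the choice of locales are hypothetical):

```java
import java.util.Locale;

// Hypothetical demo: uppercase "straße" in several locales and print the length.
public class SharpSUpperCaseDemo {
    public static void main(String[] args) {
        String word = "stra\u00DFe"; // "straße", 6 chars
        Locale[] locales = {
            Locale.GERMAN, Locale.ENGLISH, Locale.ROOT, Locale.forLanguageTag("tr")
        };
        for (Locale locale : locales) {
            // ß maps to "SS" regardless of locale, so the length grows to 7.
            String upper = word.toUpperCase(locale);
            System.out.println(locale.toLanguageTag() + ": " + upper + " (" + upper.length() + ")");
        }
    }
}
```

Each line should read STRASSE (7); Locale.ROOT prints its language tag as "und".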
The reason is simply that until 2017 (or so), the only official capitalization of ß was SS (or the rarer SZ). No dictionary is needed in the locale to handle that :-)