C# UTF-32 ToLower
I'm looking for a way to convert a Unicode UTF-32 code point (int) to lower case. In Java, something like this would do the trick:
Character.toChars(Character.toLowerCase(Character.codePointAt(text, i)))
I have the UTF-32 value from Char.ConvertToUtf32, but there doesn't seem to be a way to lower-case that value.
UPDATE:
I'm dealing with a stream/array of chars, and I've found the code points by looking for the high surrogate, somewhat similar to the Java snippet above. Converting back and forth to String is going to be too inefficient.
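For reference, the scan described in the update might look roughly like the sketch below. The class and method names (CodePointScanner, ReadCodePoints) are made up for illustration, and passing unpaired surrogates through unchanged is an assumption, not something the question specifies.

```csharp
using System;
using System.Collections.Generic;

static class CodePointScanner
{
    // Walks a char array and yields one int per code point.
    // A high surrogate followed by a low surrogate is combined into a single
    // supplementary code point; everything else is passed through unchanged.
    public static List<int> ReadCodePoints(char[] text)
    {
        var result = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                result.Add(char.ConvertToUtf32(c, text[i + 1]));
                i++; // skip the low surrogate that was just consumed
            }
            else
            {
                result.Add(c); // BMP char, or an unpaired surrogate kept as-is (assumption)
            }
        }
        return result;
    }
}
```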
Comments (1)
The only built-in way to do this is to convert the UTF-32 value to a String. Something like the following should work:
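A minimal sketch of that string round trip, not necessarily the exact snippet this answer originally showed. The helper name ToLowerViaString is hypothetical, and using ToLowerInvariant (rather than the culture-sensitive ToLower) is an assumption about what is wanted here.

```csharp
using System;

static class CodePointCasing
{
    // Lower-cases a single code point by going through a short string.
    // Throws ArgumentOutOfRangeException if codePoint is not a valid
    // Unicode scalar value (that check is done by ConvertFromUtf32).
    public static int ToLowerViaString(int codePoint)
    {
        string s = char.ConvertFromUtf32(codePoint); // one or two UTF-16 chars
        string lower = s.ToLowerInvariant();         // culture-independent mapping
        return char.ConvertToUtf32(lower, 0);        // back to a single int
    }
}
```

Whether supplementary-plane letters are actually mapped may depend on the runtime and its globalization data, so it is worth checking a few of the characters you care about.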
You indicate that this is inefficient for your needs. Have you benchmarked it?
If you still insist on doing casing on UTF-32, then you will need to roll your own. Luckily, the Unicode Consortium has done most of the hard work. Take a look at the Unicode case folding file. Parse this file and store the data in an appropriate structure. The casing can then be done directly against that structure, with your data in whatever format you prefer.
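A rough sketch of the roll-your-own route, assuming the input lines follow the CaseFolding.txt layout ("code; status; mapping; # name"). The names SimpleCaseFolder, Load and Fold are invented for this example, and keeping only the C and S entries (the one-to-one mappings) is a simplifying assumption. Note that case folding is meant for caseless comparison, so its result is close to, but not always identical to, lower-casing.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class SimpleCaseFolder
{
    // Builds a code point -> folded code point table from lines in the
    // CaseFolding.txt layout: "<code>; <status>; <mapping>; # <name>".
    public static Dictionary<int, int> Load(IEnumerable<string> lines)
    {
        var table = new Dictionary<int, int>();
        foreach (string raw in lines)
        {
            // Drop the trailing comment, then skip blank and comment-only lines.
            int hash = raw.IndexOf('#');
            string line = (hash >= 0 ? raw.Substring(0, hash) : raw).Trim();
            if (line.Length == 0) continue;

            string[] fields = line.Split(';');
            if (fields.Length < 3) continue;

            // C = common, S = simple: both map one code point to one code point.
            // F (full, multi-code-point) and T (Turkic) entries are skipped here.
            string status = fields[1].Trim();
            if (status != "C" && status != "S") continue;

            int from = int.Parse(fields[0].Trim(), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            int to = int.Parse(fields[2].Trim(), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            table[from] = to;
        }
        return table;
    }

    // Returns the folded code point, or the input itself when no mapping exists.
    public static int Fold(Dictionary<int, int> table, int codePoint)
    {
        return table.TryGetValue(codePoint, out int folded) ? folded : codePoint;
    }
}
```

In practice the table would be loaded once, for example with SimpleCaseFolder.Load(System.IO.File.ReadLines(path)) where path points at a local copy of the file, and Fold would then be called per code point straight from the char stream, with no intermediate strings.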