C# UTF-32 ToLower

Posted 2024-10-09 19:44:40

I'm looking for a way to convert a Unicode UTF-32 code point (int) to lowercase. In Java, something like this would do the trick:

Character.toChars(Character.toLowerCase(Character.codePointAt(text, i)))

I have UTF-32 from Char.ConvertToUtf32, but there doesn't seem to be a way to lowercase that value.

UPDATE:
I'm dealing with a stream/array of chars. I've found the code points by looking for the high surrogate, somewhat similar to the Java snippet above. Converting back and forth to String is going to be too inefficient.
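The scan described in the update can be sketched in C# as follows. This is a minimal illustration, assuming the input is a char[] of UTF-16 code units; the class and method names (Utf16Scan, ToCodePoints) are hypothetical. Passing unpaired surrogates through as their own values, rather than rejecting them, is also an assumption you may want to change.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: decodes a UTF-16 char array into UTF-32 code points,
// roughly the C# analogue of looping with Java's Character.codePointAt.
public static class Utf16Scan
{
    public static int[] ToCodePoints(char[] text)
    {
        var result = new List<int>(text.Length);
        int i = 0;
        while (i < text.Length)
        {
            char c = text[i];
            // A high surrogate followed by a low surrogate forms one supplementary code point.
            if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                result.Add(char.ConvertToUtf32(c, text[i + 1]));
                i += 2;
            }
            else
            {
                // BMP character, or an unpaired surrogate passed through unchanged (assumption).
                result.Add(c);
                i += 1;
            }
        }
        return result.ToArray();
    }
}
```

For example, the array for "A\uD83D\uDE00b" yields three code points: 0x41, 0x1F600 and 0x62.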



Answer by 只涨不跌, 2024-10-16 19:44:40


The only built-in way to do this is to convert the UTF-32 value to a String. Something like the following should work:

static Int32 ToLower(Int32 c)
{
    // Convert UTF-32 character to a UTF-16 String.
    var strC = Char.ConvertFromUtf32(c);

    // Casing rules depend on the current culture.
    // Consider ToLowerInvariant() for culture-independent results.
    var lower = strC.ToLower();

    // Convert the UTF-16 String back to UTF-32 character and return it.
    return Char.ConvertToUtf32(lower, 0);
}

You indicate that this is inefficient for your needs. Have you benchmarked it?
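If benchmarking does show the string round-trip to be a bottleneck, one cheaper variant skips it for code points in the Basic Multilingual Plane, which cover most characters in typical text, and only allocates for supplementary code points. This is a sketch, not the answer's method; the names Utf32Lower and ToLowerInvariant are hypothetical, and it uses invariant-culture casing. On .NET Core 3.0 and later, System.Text.Rune.ToLowerInvariant(Rune) lowercases a Unicode scalar value directly, which may make this workaround unnecessary.

```csharp
using System;

// Hypothetical variant of the ToLower above: avoids allocating a string
// for BMP characters.
public static class Utf32Lower
{
    public static int ToLowerInvariant(int codePoint)
    {
        // BMP code points (excluding surrogates) fit in a single char,
        // so char.ToLowerInvariant can handle them without a string.
        if (codePoint >= 0 && codePoint <= 0xFFFF && (codePoint < 0xD800 || codePoint > 0xDFFF))
        {
            return char.ToLowerInvariant((char)codePoint);
        }

        // Supplementary code points still take the string round-trip.
        // char.ConvertFromUtf32 throws for surrogates, negatives and values above 0x10FFFF.
        string s = char.ConvertFromUtf32(codePoint);
        return char.ConvertToUtf32(s.ToLowerInvariant(), 0);
    }
}
```

With this, Utf32Lower.ToLowerInvariant(0x41) returns 0x61, and a caseless supplementary code point such as 0x1F600 comes back unchanged.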

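If you still insist on doing casing on UTF-32, then you will need to roll your own. Luckily, the Unicode Consortium has done most of the hard work. Take a look at the Unicode case folding file (CaseFolding.txt). Parse this file, storing the data in an appropriate structure. Then the casing can be done directly against that structure, with your data in whatever format you prefer. Note that case folding is meant for caseless matching and differs from lowercasing for a few characters (for example, final sigma U+03C2 folds to U+03C3); the simple lowercase mappings in UnicodeData.txt correspond more closely to ToLower.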
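A minimal sketch of the "parse the file into a structure" step, assuming the standard CaseFolding.txt line format `<code>; <status>; <mapping>; # <name>`. It keeps only the C (common) and S (simple) entries, which map one code point to one code point; F (full, possibly multi-code-point) and T (Turkic) entries are skipped. The class and method names (CaseFoldingTable, LoadSimple, Fold) are hypothetical, and the resulting lookup applies simple case folding, not exactly ToLower.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical loader for the Unicode CaseFolding.txt data file.
// Each data line looks like: "0041; C; 0061; # LATIN CAPITAL LETTER A".
public static class CaseFoldingTable
{
    public static Dictionary<int, int> LoadSimple(TextReader reader)
    {
        var map = new Dictionary<int, int>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Drop trailing comments and skip blank lines.
            int hash = line.IndexOf('#');
            if (hash >= 0) line = line.Substring(0, hash);
            if (line.Trim().Length == 0) continue;

            string[] fields = line.Split(';');
            if (fields.Length < 3) continue;

            // C + S entries together give the one-to-one simple folding.
            string status = fields[1].Trim();
            if (status != "C" && status != "S") continue;

            int from = int.Parse(fields[0].Trim(), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            int to = int.Parse(fields[2].Trim(), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            map[from] = to;
        }
        return map;
    }

    // Folds one code point; code points without an entry map to themselves.
    public static int Fold(Dictionary<int, int> map, int codePoint) =>
        map.TryGetValue(codePoint, out int folded) ? folded : codePoint;
}
```

In practice you would pass a StreamReader over a downloaded copy of CaseFolding.txt to LoadSimple once and reuse the dictionary; a sorted array or a two-level table would be more compact if memory matters.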
