How do I get a Unicode character's code?

Posted on 2024-08-16 20:03:17

Let's say I have this:

char registered = '®';

or an umlaut, or any other Unicode character. How can I get its code?

Comments (7)

快乐很简单 2024-08-23 20:03:18

There is an open-source library, MgntUtils, that has a utility class StringUnicodeEncoderDecoder. The class provides static methods that convert any String into a sequence of Unicode escapes and vice versa. It is simple and useful. To convert a String you just do:

String codes = StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence(myString);

For example, the String "Hello World" will be converted into

"\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064"

It works with any language. Here is the link to the article that explains all the details about the library: MgntUtils. Look for the subtitle "String Unicode converter". The library can be obtained as a Maven artifact or taken from Github (including source code and Javadoc).
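To make the output format above concrete, here is a minimal sketch that produces the same kind of escape sequence with the plain JDK. It is not MgntUtils' implementation; the class `UnicodeEscapeSketch` and the method `toUnicodeSequence` are hypothetical names chosen for illustration, and the sketch assumes each UTF-16 code unit should be written as a lowercase four-digit hex escape.

```java
public class UnicodeEscapeSketch {

    // Hypothetical helper (not part of MgntUtils): writes every UTF-16 code unit
    // of the input as a backslash, the letter u, and four lowercase hex digits.
    public static String toUnicodeSequence(String text) {
        StringBuilder sb = new StringBuilder(text.length() * 6);
        for (int i = 0; i < text.length(); i++) {
            // charAt returns a char; casting to int exposes its numeric code
            sb.append(String.format("\\u%04x", (int) text.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUnicodeSequence("Hello World"));
    }
}
```

Because the loop works on code units, a character outside the Basic Multilingual Plane comes out as two escapes (its surrogate pair).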

落花随流水 2024-08-23 20:03:18

Dear friend, Jon Skeet's answer gives you the character's code as a decimal number, but Unicode code points are conventionally written in hexadecimal, so you may prefer to present the code in hex rather than decimal.

There is an open-source tool at http://unicode.codeplex.com that provides complete information about a character or a sentence.

So it is better to write a helper that takes a char as a parameter and returns its hex code as a String:

public static String getHexCode(char character)
    {
        // %04X: uppercase hex, zero-padded to four digits
        return String.format("%04X", (int) character);
    }//end

Hope it helps.

我不咬妳我踢妳 2024-08-23 20:03:18

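// Assigning a char to an int gives you its Unicode value (a UTF-16 code unit)

int a = 'a';
// 'a' is the letter or symbol whose code you want; a is now 97

// Going the other way, an escape sequence in a literal produces the character for a given code

System.out.println("\123");
// prints S: \123 is an octal escape (octal 123 = decimal 83), not a decimal or hex Unicode value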
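When the code is only known at run time, an escape in a literal is not an option. The following is a small sketch of the round trip using standard JDK calls (`Character.toChars` and a cast to `char`); the class name `CodeToCharSketch` and the helper `fromCodePoint` are made up for this example.

```java
public class CodeToCharSketch {

    // Hypothetical helper: turns a Unicode code point back into text.
    // Character.toChars returns one char for BMP code points and two
    // (a surrogate pair) for supplementary ones.
    public static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int code = 'a';            // implicit char-to-int widening
        char back = (char) code;   // explicit narrowing back to a char
        System.out.println(code + " -> " + back);

        // 0x00AE is the registered sign from the question
        System.out.println(fromCodePoint(0x00AE));
    }
}
```

The cast to `char` is only safe for values up to 0xFFFF; for anything larger, going through `Character.toChars` avoids silently truncating the code point.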

指尖微凉心微凉 2024-08-23 20:03:17

Just convert it to int:

char registered = '®';
int code = (int) registered;

In fact there's an implicit conversion from char to int so you don't have to specify it explicitly as I've done above, but I would do so in this case to make it obvious what you're trying to do.

This will give the UTF-16 code unit - which is the same as the Unicode code point for any character defined in the Basic Multilingual Plane. (And only BMP characters can be represented as char values in Java.) As Andrzej Doyle's answer says, if you want the Unicode code point from an arbitrary string, use Character.codePointAt().

Once you've got the UTF-16 code unit or Unicode code points, both of which are integers, it's up to you what you do with them. If you want a string representation, you need to decide exactly what kind of representation you want. (For example, if you know the value will always be in the BMP, you might want a fixed 4-digit hex representation prefixed with U+, e.g. "U+0020" for space.) That's beyond the scope of this question though, as we don't know what the requirements are.
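As one possible choice of representation, the "U+" notation mentioned above can be produced with `String.format`. This is only a sketch of that one option; the class `CodePointLabelSketch` and the method `label` are illustrative names, and the four-digit minimum width is an assumption that matches the usual way code points are written.

```java
public class CodePointLabelSketch {

    // Hypothetical helper: formats a code point as U+ followed by uppercase hex,
    // zero-padded to at least four digits (longer values are not truncated).
    public static String label(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        char registered = '\u00AE';             // the registered sign, written as an escape
        System.out.println(label(' '));         // space
        System.out.println(label(registered));  // char widens to int automatically
        System.out.println(label(0x1F600));     // a supplementary code point
    }
}
```

Writing the character as an escape in the source keeps the example independent of the file encoding used by the compiler.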

失去的东西太少 2024-08-23 20:03:17

A more complete, albeit more verbose, way of doing this would be to use the Character.codePointAt method. This handles supplementary characters, which a String stores as a surrogate pair (a high surrogate followed by a low surrogate) because their code points do not fit in the range a single char can represent.

In the example you've given this is not strictly necessary - if the (Unicode) character can fit inside a single (Java) char (such as the registered local variable) then it must fall within the \u0000 to \uffff range, and you won't need to worry about surrogate pairs. But if you're looking at potentially higher code points, from within a String/char array, then calling this method is wise in order to cover the edge cases.

For example, instead of

String input = ...;
char fifthChar = input.charAt(4);
int codePoint = (int)fifthChar;

use

String input = ...;
int codePoint = Character.codePointAt(input, 4);

Not only is this slightly less code in this instance, but it will handle detection of surrogate pairs for you.
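The difference only shows up with a character outside the BMP. Here is a short sketch that uses U+1F600 (written as its surrogate pair so the source file stays ASCII); the class `SurrogatePairSketch` and its helper `codePointsOf` are hypothetical names, and `String.codePoints()` assumes Java 8 or later.

```java
public class SurrogatePairSketch {

    // Hypothetical helper: returns every code point of the text,
    // with each surrogate pair combined into a single value.
    public static int[] codePointsOf(String text) {
        return text.codePoints().toArray();
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, stored as a high and a low surrogate
        String input = "a\uD83D\uDE00";

        // charAt sees only the high surrogate, which is not a character on its own
        System.out.println(Integer.toHexString(input.charAt(1)));

        // codePointAt combines the pair into the real code point
        System.out.println(Integer.toHexString(Character.codePointAt(input, 1)));

        // length() counts code units, codePointsOf counts characters
        System.out.println(input.length() + " units, " + codePointsOf(input).length + " code points");
    }
}
```

Note that the index passed to `codePointAt` is still a code-unit index, so when walking a string by hand the index should advance by `Character.charCount(codePoint)` rather than by one.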

冷…雨湿花 2024-08-23 20:03:17

In Java, char is technically a "16-bit integer", so you can simply cast it to int and you'll get its code.
From Oracle:

The char data type is a single 16-bit Unicode character. It has a
minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or
65,535 inclusive).

So you can simply cast it to int.

char registered = '®';
System.out.println(String.format("This is an int-code: %d", (int) registered));
System.out.println(String.format("And this is a hex code: %x", (int) registered));

感情废物 2024-08-23 20:03:17

For me, only "Integer.toHexString(registered)" worked the way I wanted:

char registered = '®';
System.out.println("Answer:"+Integer.toHexString(registered));

This answer gives you only the string representation that is usually shown in Unicode tables. Jon Skeet's answer explains more.
