Char to byte? (Java)

Posted 2024-10-17 02:52:32, 517 words, 8 views, 0 comments

How come this happens:

char a = '\uffff'; //Highest value that char can take - 65535
byte b = (byte)a; //Casting a 16-bit value into 8-bit data type...! Isn't data lost here?
char c = (char)b; //Let's get the value back
int d = (int)c;
System.out.println(d); //65535... how?

Basically, I read that a char is 16 bits. So if you cast it to a byte, how come no data is lost? (The value is the same after casting back to an int.)

Thanks in advance for answering this little ignorant question of mine. :P

EDIT: Whoa, it turns out my original output actually behaved as expected, so I have updated the code above. Basically, a character is cast to a byte and then back to a char, and its original 2-byte value is retained. How does this happen?


Comments (4)

錯遇了你 2024-10-24 02:52:32

As trojanfoe states, your confusion on the results of your code is partly due to sign-extension. I'll try to add a more detailed explanation that may help with your confusion.

char a = '\uffff';
byte b = (byte)a;  // b = 0xFF

As you noted, this DOES result in the loss of information. This is considered a narrowing conversion. Converting a char to a byte "simply discards all but the n lowest order bits", where n is the number of bits in the target type (8 for byte).
The result is: 0xFFFF -> 0xFF
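To see this narrowing step in isolation, here is a minimal sketch; the class NarrowCharToByte and its narrow method are hypothetical names used only for illustration. The cast keeps just the low 8 bits, so '\uffff' and '\u12ff' both collapse to the same byte, 0xFF, which Java's signed byte reads as -1.

```java
// Hypothetical demo class: narrowing char -> byte keeps only the low 8 bits.
class NarrowCharToByte {
    // Narrowing primitive conversion: the high 8 bits of the char are discarded.
    static byte narrow(char c) {
        return (byte) c;
    }

    public static void main(String[] args) {
        byte fromFfff = narrow('\uffff'); // 0xFFFF -> 0xFF
        byte from12ff = narrow('\u12ff'); // 0x12FF -> 0xFF: the 0x12 is gone
        // byte is signed, so the bit pattern 0xFF prints as -1
        System.out.println(fromFfff); // -1
        System.out.println(from12ff); // -1
        // Masking with 0xff shows the raw 8-bit pattern that survived
        System.out.println(Integer.toHexString(fromFfff & 0xff)); // ff
    }
}
```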

char c = (char)b;  // c = 0xFFFF

Converting a byte to a char is considered a special conversion. It actually performs TWO conversions. First, the byte is SIGN-extended (the new high order bits are copied from the old sign bit) to an int (a normal widening conversion). Second, the int is converted to a char with a narrowing conversion.
The result is: 0xFF -> 0xFFFFFFFF -> 0xFFFF
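The two hidden steps can be spelled out in code. This is a sketch that assumes, per the rule just described, that going through an explicit int reproduces what (char) b does; ByteToCharDemo, widen, and toChar are hypothetical names. A negative byte picks up 1s in its new high bits, while a byte with its sign bit clear picks up 0s.

```java
// Hypothetical demo class: byte -> char is "widening and narrowing" (byte -> int -> char).
class ByteToCharDemo {
    // Step 1 made explicit: widen byte to int; the sign bit is copied into the new high bits.
    static int widen(byte b) {
        return b;
    }

    // Step 2 made explicit: narrow int to char; only the low 16 bits are kept.
    static char toChar(byte b) {
        return (char) widen(b); // same result as (char) b
    }

    public static void main(String[] args) {
        byte negative = (byte) 0xFF; // -1, sign bit set
        byte positive = (byte) 0x7F; // 127, sign bit clear
        System.out.println(Integer.toHexString(widen(negative)));  // ffffffff
        System.out.println(Integer.toHexString(toChar(negative))); // ffff
        System.out.println(Integer.toHexString(toChar(positive))); // 7f
    }
}
```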

int d = (int)c;  // d = 0x0000FFFF

Converting a char to an int is considered a widening conversion. When a char type is widened to an integral type, it is ZERO-extended (the new high order bits are set to 0).
The result is: 0xFFFF -> 0x0000FFFF. When printed, this will give you 65535.
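A short sketch contrasting the two widening rules (the class WideningDemo and its methods are hypothetical): a byte holding 0xFF widens directly to -1, but after the detour through char it prints as 65535, because char is unsigned and zero-extends while byte is signed and sign-extends.

```java
// Hypothetical demo class: widening char -> int zero-extends, byte -> int sign-extends.
class WideningDemo {
    static int fromChar(char c) {
        return c; // zero-extended: the new high 16 bits are 0
    }

    static int fromByte(byte b) {
        return b; // sign-extended: the new high 24 bits copy bit 7
    }

    public static void main(String[] args) {
        char c = (char) (byte) 0xFF; // 0xFFFF, as in the question
        System.out.println(fromChar(c));                      // 65535
        System.out.println(Integer.toHexString(fromChar(c))); // ffff
        System.out.println(fromByte((byte) 0xFF));            // -1
    }
}
```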

The three links I provided (JLS §5.1.3, narrowing primitive conversion; §5.1.4, widening and narrowing primitive conversion; and §5.1.2, widening primitive conversion) are the official Java Language Specification details on primitive type conversions. I HIGHLY recommend you take a look. They are not terribly verbose (and in this case relatively straightforward), and they describe exactly what Java does behind the scenes with type conversions. This is a common area of misunderstanding for many developers. Post a comment if you are still confused by any step.

剑心龙吟 2024-10-24 02:52:32

It's sign extension. Try \u1234 instead of \uffff and see what happens.
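A minimal sketch of the suggested experiment, running the question's exact cast chain on several chars (RoundTripDemo and roundTrip are hypothetical names). With '\u1234' the low byte is 0x34, whose sign bit is clear, so the round trip gives 52 (0x34, the character '4') rather than 4660 (0x1234): the high byte really is lost. With '\uffff' the low byte 0xFF happens to sign-extend back to 0xFFFF, which is why that particular value only looks preserved; even '\u00ff' (255) comes back as 65535.

```java
// Hypothetical demo class: the question's cast chain applied to different chars.
class RoundTripDemo {
    // Same steps as the question: char -> byte -> char -> int.
    static int roundTrip(char a) {
        byte b = (byte) a; // keep only the low 8 bits
        char c = (char) b; // sign-extend to int, then keep the low 16 bits
        return (int) c;    // zero-extend to int
    }

    public static void main(String[] args) {
        System.out.println(roundTrip('\uffff')); // 65535: looks "preserved"
        System.out.println(roundTrip('\u1234')); // 52 (0x34): the 0x12 byte is lost
        System.out.println(roundTrip('\u00ff')); // 65535: 0xFF sign-extends to 0xFFFF
    }
}
```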

风筝在阴天搁浅。 2024-10-24 02:52:32

Java's byte is signed, which is counterintuitive. In almost all situations where a byte is used, programmers want an unsigned byte instead. If a byte is cast directly to an int, it is very likely a bug.

This does the intended conversion correctly in almost all programs:

int c = 0xff & b;
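A minimal sketch of the masking idiom next to a plain cast, assuming JDK 8 or later for the Byte.toUnsignedInt comparison (that library method is not part of the original answer; UnsignedByteDemo is a hypothetical name). The mask 0xff clears the 24 high bits that sign extension filled with 1s, leaving a value in 0..255.

```java
// Hypothetical demo class: reading a Java byte as an unsigned value 0..255.
class UnsignedByteDemo {
    // The idiom from the answer: the mask clears the sign-extended high bits.
    static int unsigned(byte b) {
        return 0xff & b;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;
        System.out.println((int) b);               // -1  (sign-extended)
        System.out.println(unsigned(b));           // 255 (masked)
        System.out.println(Byte.toUnsignedInt(b)); // 255 (JDK 8+ library equivalent)
    }
}
```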

Empirically, making byte signed was a design mistake.

海螺姑娘 2024-10-24 02:52:32

Something rather strange is happening on your machine. Take a look at the Java Language Specification, section 4.2.1:

The values of the integral types are integers in the following ranges:

For byte, from -128 to 127, inclusive

(... others snipped ...)

If your JVM is standards compliant, then printing b should output -1.
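A quick check of the quoted range using the standard Byte.MIN_VALUE and Byte.MAX_VALUE constants; ByteRangeDemo and its helper methods are hypothetical. Every int from -128 to 127 survives a cast to byte unchanged, while 0xFFFF (Character.MAX_VALUE) lies outside that range and comes out as -1, which is what printing b in the question's code shows.

```java
// Hypothetical demo class: the byte range quoted from JLS 4.2.1.
class ByteRangeDemo {
    // True if every int in [Byte.MIN_VALUE, Byte.MAX_VALUE] survives a cast to byte unchanged.
    static boolean rangeRoundTrips() {
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
            if ((byte) i != i) {
                return false;
            }
        }
        return true;
    }

    // The question's first cast: Character.MAX_VALUE is the char 0xFFFF.
    static byte narrowMax() {
        return (byte) Character.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(Byte.MIN_VALUE);    // -128
        System.out.println(Byte.MAX_VALUE);    // 127
        System.out.println(rangeRoundTrips()); // true
        System.out.println(narrowMax());       // -1
    }
}
```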
