Char to byte? (Java)

Posted on 2024-10-17 02:52:32

How come this happens:

char a = '\uffff'; //Highest value that char can take - 65535
byte b = (byte)a; //Casting a 16-bit value into 8-bit data type...! Isn't data lost here?
char c = (char)b; //Let's get the value back
int d = (int)c;
System.out.println(d); //65535... how?

Basically, I saw that a char is 16-bit. Therefore, if you cast it into a byte, how come no data is lost? (Value is the same after casting into an int)

Thanks in advance for answering this little ignorant question of mine. :P

EDIT: Woah, found out that my original output actually did as expected, but I just updated the code above. Basically, a character is cast into a byte and then cast back into a char, and its original, 2-byte value is retained. How does this happen?

錯遇了你 2024-10-24 02:52:32

As trojanfoe states, your confusion on the results of your code is partly due to sign-extension. I'll try to add a more detailed explanation that may help with your confusion.

char a = '\uffff';
byte b = (byte)a;  // b = 0xFF

As you noted, this DOES result in the loss of information. This is considered a narrowing conversion. Converting a char to a byte "simply discards all but the n lowest order bits".
The result is: 0xFFFF -> 0xFF
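
A minimal sketch of this narrowing step, in case it helps to see it run. The class and method names (CharToByteDemo, narrow) are made up for illustration and are not from the question:

```java
final class CharToByteDemo {
    // Narrowing primitive conversion: only the low 8 bits of the 16-bit char survive.
    static byte narrow(char c) {
        return (byte) c;
    }

    public static void main(String[] args) {
        byte b = narrow((char) 0xFFFF);
        // The bit pattern 0xFF read as a signed 8-bit value is -1.
        System.out.println(b);                              // -1
        // Masking with 0xFF shows the raw bit pattern that was kept.
        System.out.println(Integer.toHexString(b & 0xFF));  // ff
    }
}
```

The upper byte is gone at this point. The original value only appears to come back later because every discarded bit happened to be 1.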

char c = (char)b;  // c = 0xFFFF

Converting a byte to a char is considered a special conversion. It actually performs TWO conversions. First, the byte is SIGN-extended (the new high order bits are copied from the old sign bit) to an int (a normal widening conversion). Second, the int is converted to a char with a narrowing conversion.
The result is: 0xFF -> 0xFFFFFFFF -> 0xFFFF
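
To make the two steps visible, here is a small sketch that performs them separately. The helper names (widenToInt, narrowToChar) are hypothetical; a plain (char) cast does both steps at once:

```java
final class ByteToCharDemo {
    // Step 1: byte -> int is a widening conversion that copies the sign bit upward.
    static int widenToInt(byte b) {
        return b;
    }

    // Step 2: int -> char is a narrowing conversion that keeps the low 16 bits.
    static char narrowToChar(int i) {
        return (char) i;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;
        int widened = widenToInt(b);
        char c = narrowToChar(widened);
        System.out.println(Integer.toHexString(widened)); // ffffffff
        System.out.println(Integer.toHexString(c));       // ffff
        // The single cast gives the same result as the two explicit steps.
        System.out.println(c == (char) b);                // true
    }
}
```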

int d = (int)c;  // d = 0x0000FFFF

Converting a char to an int is considered a widening conversion. When a char type is widened to an integral type, it is ZERO-extended (the new high order bits are set to 0).
The result is: 0xFFFF -> 0x0000FFFF. When printed, this will give you 65535.
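
As a rough illustration of zero extension, the sketch below widens the same 16-bit pattern once as a char and once as a short. The short comparison is my addition for contrast (it is not part of the question), and the class and method names are invented:

```java
final class WideningDemo {
    // char is unsigned, so widening to int fills the new high bits with zeros.
    static int fromChar(char c) {
        return c;
    }

    // short is signed, so widening to int copies the sign bit instead.
    static int fromShort(short s) {
        return s;
    }

    public static void main(String[] args) {
        System.out.println(fromChar((char) 0xFFFF));   // 65535
        System.out.println(fromShort((short) 0xFFFF)); // -1
    }
}
```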

The three links I provided are the official Java Language Specification sections on primitive type conversions. I HIGHLY recommend you take a look. They are not terribly verbose (and in this case relatively straightforward), and they detail exactly what Java does behind the scenes during type conversions. This is a common area of misunderstanding for many developers. Post a comment if you are still confused about any step.

剑心龙吟 2024-10-24 02:52:32

It's sign extension. Try \u1234 instead of \uffff and see what happens.
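
A quick sketch of that experiment, assuming a hypothetical helper roundTrip that repeats the casts from the question. The 0x0080 case is an extra input I added to show sign extension changing the value:

```java
final class RoundTripDemo {
    // Repeats the question's casts: char -> byte -> char -> int.
    static int roundTrip(char a) {
        byte b = (byte) a;  // keeps only the low 8 bits
        char c = (char) b;  // sign-extends to int, then keeps the low 16 bits
        return c;           // zero-extends to int
    }

    public static void main(String[] args) {
        // All 16 bits are 1, so sign extension happens to rebuild the original value.
        System.out.println(roundTrip((char) 0xFFFF)); // 65535
        // The high byte 0x12 is lost; the low byte 0x34 has a 0 sign bit.
        System.out.println(roundTrip((char) 0x1234)); // 52
        // The low byte 0x80 has a 1 sign bit, so the result is 0xFF80.
        System.out.println(roundTrip((char) 0x0080)); // 65408
    }
}
```

Only the first input survives the round trip, which shows that data really is lost in the cast to byte.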

风筝在阴天搁浅。 2024-10-24 02:52:32

Java's byte is signed, which is counterintuitive. In almost all situations where a byte is used, programmers want an unsigned byte instead. Casting a byte directly to int is very likely a bug.

This does the intended conversion correctly in almost all programs: