Why does the Java char primitive take 2 bytes of memory?
Is there any reason why Java char primitive data type is 2 bytes unlike C which is 1 byte?
Thanks
8 Answers
When Java was originally designed, it was anticipated that any Unicode character would fit in 2 bytes (16 bits), so char and Character were designed accordingly. In fact, a Unicode character can now require up to 4 bytes. Thus, UTF-16, the internal Java encoding, requires that supplementary characters use 2 code units. Characters in the Basic Multilingual Plane (the most common ones) still use 1. A Java char is used for each code unit. This Sun article explains it well.
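A small sketch makes the code unit vs. code point distinction concrete (the class name and the sample character U+1F600 are just illustrative):

public class CodeUnitDemo {
    public static void main(String[] args) {
        String bmp = "A";               // U+0041, inside the Basic Multilingual Plane
        String emoji = "\uD83D\uDE00";  // U+1F600, a supplementary character (a surrogate pair)

        // A BMP character is one char (one UTF-16 code unit) and one code point
        System.out.println(bmp.length());                          // 1
        System.out.println(bmp.codePointCount(0, bmp.length()));   // 1

        // A supplementary character is still one code point,
        // but it needs two chars (two UTF-16 code units)
        System.out.println(emoji.length());                            // 2
        System.out.println(emoji.codePointCount(0, emoji.length()));   // 1
    }
}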
char in Java is UTF-16 encoded, which requires a minimum of 16 bits of storage for each character.
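You can confirm the fixed 16-bit width from code; Character.BYTES needs Java 8 or later:

public class CharWidthDemo {
    public static void main(String[] args) {
        // A char always occupies 16 bits (2 bytes), no matter which character it holds
        System.out.println(Character.SIZE);   // 16 bits
        System.out.println(Character.BYTES);  // 2 bytes (Java 8+)
    }
}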
In Java, a character is encoded in UTF-16, which uses 2 bytes, while a normal C string is more or less just a bunch of bytes. When C was designed, using ASCII (which only covers the English language character set) was deemed sufficient, while the Java designers already accounted for internationalization. If you want to use Unicode with C strings, the UTF-8 encoding is the preferred way, as it has ASCII as a subset and does not use the 0 byte (unlike UTF-16), which is used as an end-of-string marker in C. Such an end-of-string marker is not necessary in Java, as a string is a complex type there, with an explicit length.
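To illustrate the 0-byte point, here is a rough sketch that prints the raw bytes of the same string in both encodings (the sample string is arbitrary):

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingDemo {
    public static void main(String[] args) {
        String s = "Aä";  // 'A' is ASCII, 'ä' (U+00E4) is not

        // UTF-8: ASCII stays a single byte, non-ASCII uses multi-byte sequences, and no 0 byte ever appears
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(utf8));   // [65, -61, -92]

        // UTF-16 (big-endian, no BOM): every BMP character takes 2 bytes,
        // so 'A' becomes 0x00 0x41 and the 0x00 would terminate a C string early
        byte[] utf16 = s.getBytes(StandardCharsets.UTF_16BE);
        System.out.println(Arrays.toString(utf16));  // [0, 65, 0, -28]
    }
}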
In earlier languages like C, ASCII notation is used: the range 0 to 127 gives 128 unique symbols and language characters.
Java, on the other hand, comes with a feature called "internationalization": all human-readable characters (including regional symbols) are included, so the range is much larger and more memory is required. The system that unifies all these symbols is the standard Unicode system, and that unification is what requires the additional byte in Java.
The first byte stays as it is, with ASCII characters in the 0 to 127 range as in C and C++, and the unified characters are then appended above them.
So char is 16 bits in Java and 8 bits in C.
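A tiny sketch of the "ASCII stays put, everything else sits above it" idea (the sample characters are arbitrary):

public class AsciiCompatDemo {
    public static void main(String[] args) {
        // ASCII characters keep the same numeric values in Java's 16-bit char as in C's 8-bit char
        System.out.println((int) 'A');  // 65, identical to the ASCII code
        System.out.println((int) '~');  // 126, still inside the 0 to 127 ASCII range
        // Non-ASCII characters simply use the higher values of the 16-bit range
        System.out.println((int) 'é');  // 233 (U+00E9)
    }
}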
Java™ Tutorials:
Java uses the Unicode (universal code) representation, which accepts all the language formats in the world.
In this representation 1 byte is reserved for ASCII and the remaining 1 byte can accept any other language => 2 bytes for char,
while C/C++ use only the ASCII representation => 1 byte for char.
Java was designed with internationalization in mind, so it works with many different languages and needs more than one byte of space per character; that's why char takes 2 bytes of space.
For example, the Chinese language can't be handled with one byte per character.
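As a concrete illustration of that point, a Chinese character fits in a single Java char, but its value is far larger than one byte can hold (the character chosen here is arbitrary):

public class ChineseCharDemo {
    public static void main(String[] args) {
        char zhong = '中';  // U+4E2D
        // One byte can only represent the values 0 to 255; this character's value is 20013
        System.out.println((int) zhong);                       // 20013
        System.out.println(Character.UnicodeBlock.of(zhong));  // CJK_UNIFIED_IDEOGRAPHS
    }
}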
As we know, C supports ASCII, whereas Java supports Unicode, which contains three things:
1 - ASCII
2 - extended ASCII
3 - local language characters
ASCII is a subset of Unicode. ASCII supports only the English language, whereas Unicode supports multinational languages. Besides that, a Java character is encoded in UTF-16, which uses 2 bytes. For all of these reasons, and because Unicode is an extended version of ASCII, it uses 16 bits instead of 8.