Why does the Java char primitive data type take 2 bytes of memory?

Posted on 2024-09-27 20:49:23

Is there any reason why Java char primitive data type is 2 bytes unlike C which is 1 byte?

Thanks

Comments (8)

遇见了你 2024-10-04 20:49:23

When Java was originally designed, it was anticipated that any Unicode character would fit in 2 bytes (16 bits), so char and Character were designed accordingly. In fact, a Unicode character can now require up to 4 bytes. Thus UTF-16, Java's internal encoding, requires that supplementary characters use 2 code units, while characters in the Basic Multilingual Plane (the most common ones) still use 1. Each code unit occupies one Java char. This Sun article explains it well.
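
A rough sketch of the code-unit point (the class name and sample character below are placeholders of my own, not from the answer): a BMP character fits in a single char, while a supplementary character such as U+1F600 needs two.

    public class CodeUnitsDemo {
        public static void main(String[] args) {
            String bmp = "A";              // U+0041, a Basic Multilingual Plane character
            String emoji = "\uD83D\uDE00"; // U+1F600, a supplementary character (surrogate pair)

            // A BMP character occupies one char (one UTF-16 code unit).
            System.out.println(bmp.length());                         // 1
            System.out.println(bmp.codePointCount(0, bmp.length()));  // 1

            // A supplementary character occupies two chars (two UTF-16 code units).
            System.out.println(emoji.length());                          // 2
            System.out.println(emoji.codePointCount(0, emoji.length())); // 1
            System.out.println(Character.charCount(0x1F600));            // 2
        }
    }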

听风吹 2024-10-04 20:49:23

char in Java is UTF-16 encoded, which requires a minimum of 16 bits of storage for each character.

云淡风轻 2024-10-04 20:49:23

In Java, a character is encoded in UTF-16, which uses 2 bytes, while a normal C string is more or less just a bunch of bytes. When C was designed, using ASCII (which only covers the English-language character set) was deemed sufficient, while the Java designers already accounted for internationalization. If you want to use Unicode with C strings, the UTF-8 encoding is the preferred way, as it has ASCII as a subset and does not use the 0 byte (unlike UTF-16), which is used as an end-of-string marker in C. Such an end-of-string marker is not necessary in Java, since a string there is a complex type with an explicit length.
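
A small sketch of the encoding difference described above (the string and class name are arbitrary choices of mine): UTF-8 output for ASCII text contains no 0 bytes, whereas UTF-16 output does.

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class EncodingDemo {
        public static void main(String[] args) {
            String s = "Hi";

            // UTF-8 keeps ASCII as a one-byte subset and produces no 0 bytes here,
            // so the result could live in a NUL-terminated C string.
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            System.out.println(Arrays.toString(utf8));    // [72, 105]

            // UTF-16 (big-endian, no BOM) uses 2 bytes per BMP character and
            // contains 0 bytes for ASCII text, which would cut a C string short.
            byte[] utf16 = s.getBytes(StandardCharsets.UTF_16BE);
            System.out.println(Arrays.toString(utf16));   // [0, 72, 0, 105]
        }
    }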

最美的太阳 2024-10-04 20:49:23

Earlier languages such as C use the ASCII notation, whose range of 0–127 covers 128 unique symbols and language characters.

Java, by contrast, was built with internationalization in mind: all human-readable characters, including regional symbols, are included, so the range is much larger and more memory is needed. The system that unifies all of these symbols is the standard Unicode system, and this unification is what requires the additional byte in Java.

The first byte stays as it is, with ASCII characters in the 0–127 range just as in C and C++, and the unified characters are then appended after them.

So char is 16 bits in Java and 8 bits in C.
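
A minimal sketch of this layering (my own illustration, not part of the answer): ASCII characters keep the same values they have in C, and the rest of the 16-bit range holds the additional Unicode characters.

    public class AsciiOverlapDemo {
        public static void main(String[] args) {
            // ASCII code points keep the same numeric values in a Java char as in a C char.
            System.out.println((int) 'A');      // 65, same as in C
            System.out.println((char) 65);      // A

            // The rest of the 16-bit range holds the characters "appended" beyond ASCII.
            System.out.println((int) '\u00E9'); // 233, é (Latin-1 supplement)
        }
    }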

秉烛思 2024-10-04 20:49:23

Java™ Tutorials:

The char data type is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).
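
A quick check of the quoted range (illustrative code of my own, not part of the tutorial):

    public class CharRangeDemo {
        public static void main(String[] args) {
            // char is an unsigned 16-bit type: '\u0000' (0) through '\uffff' (65,535).
            System.out.println((int) Character.MIN_VALUE); // 0
            System.out.println((int) Character.MAX_VALUE); // 65535

            char c = '\uffff';
            c++;                          // wraps around, like unsigned 16-bit arithmetic
            System.out.println((int) c);  // 0
        }
    }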

轮廓§ 2024-10-04 20:49:23

Java uses the Unicode (universal code) representation, which accepts all the language formats in the world, whereas older encodings each covered only one region:

     ASCII: American Standard Code for Information Interchange

     ISO 8859-1: western European countries

     KOI-8: Russian

     GB18030 & Big5: Chinese

In this representation, one byte covers the ASCII range and the remaining byte can accept any other language, so char is 2 bytes, while C/C++ use only the ASCII representation, so char is 1 byte.
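
For illustration only (the characters chosen here are arbitrary), a single char value can hold a character from any of the scripts listed above:

    public class MultiScriptDemo {
        public static void main(String[] args) {
            // One and the same 16-bit char type covers all of these scripts.
            char latin    = '\u00FC'; // ü  (ISO 8859-1 / western European)
            char cyrillic = '\u0416'; // Ж  (Cyrillic, covered by KOI-8 in the C era)
            char chinese  = '\u4E2D'; // 中 (CJK)

            System.out.printf("%c %c %c%n", latin, cyrillic, chinese);
        }
    }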

德意的啸 2024-10-04 20:49:23

Java was designed with internationalization in mind, so it has to work with many different languages and needs more than one byte per character; that is why char takes 2 bytes of space. Chinese, for example, cannot be handled with one byte per character.

紙鸢 2024-10-04 20:49:23

As we know, C supports ASCII, whereas Java supports Unicode, which contains three things:
1- ASCII
2- Extended ASCII
3- Local language characters
ASCII is a subset of Unicode. ASCII supports only the English language, whereas Unicode supports the languages of many nations. Also, a Java character is encoded in UTF-16, which uses 2 bytes. For all of these reasons, and because Unicode is an extended version of ASCII, it uses 16 bits instead of 8.
