What is the char size in computer architectures?

Posted on 2025-01-10 14:07:15

This Wikipedia article on word sizes provides a table of word sizes in different computer architectures. It has columns such as 'integer size', 'floating point size', etc. I suppose integer size is the size of the ALU's operands, floating point size is the size of the FPU's operands, and unit of address resolution is the number of bits/trits/digits represented by a single address. Word size is given as the natural size of data used by the processor (which I still find somewhat confusing).

But what does the char size column in the table represent? Is it the smallest object size theoretically possible? Is it the smallest possible alignment? What operations are commonly defined on char-sized data? On x86, x86-64, and ARM, char size is 8 bits, the same as the smallest integer size. But on some other architectures, char size is 5, 6, or 7 bits, which is very different from the integer size on those architectures.


Answer by 我不是你的备胎, 2025-01-17 14:07:15

In modern C, a char is guaranteed to be independently modifiable, without disturbing surrounding data. Its width is usually chosen to match the narrowest load/store instruction. So on early Alpha (before the BWX extension added byte stores) or on word-addressable CPUs, a char would have to be the word size, or else every char store would have to compile to an atomic RMW (read-modify-write) on the containing word. (That is, instead of the much cheaper non-atomic RMW that some early compilers actually used, before C11 introduced a thread-aware memory model to the language.) See "Can modern x86 hardware not store a single byte to memory?" (which covers modern ISAs in general) and "C++ memory model and race conditions on char arrays" for the requirements C++11 and C11 place on char.
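To make the RMW point concrete, here is a minimal Python simulation, assuming a hypothetical word-addressable memory with 32-bit words holding four 8-bit chars each. The helper names (`split_addr`, `insert_char`, `compare_and_swap`, and the two demo functions) are illustrative only, not from any real toolchain. It replays, deterministically and in a single thread, two char stores to different chars of the same word: with a plain non-atomic RMW one update is lost, while a compare-and-swap retry loop (standing in for a real atomic instruction such as CAS or LL/SC) keeps both.

```python
WORD_BITS = 32
CHAR_BITS = 8
CHARS_PER_WORD = WORD_BITS // CHAR_BITS
CHAR_MASK = (1 << CHAR_BITS) - 1


def split_addr(char_index):
    """Map a char index to (word index, bit shift of that char within the word)."""
    word, slot = divmod(char_index, CHARS_PER_WORD)
    return word, slot * CHAR_BITS


def insert_char(word_value, shift, ch):
    """Return word_value with the char at `shift` replaced by `ch`."""
    cleared = word_value & ~(CHAR_MASK << shift)
    return cleared | ((ch & CHAR_MASK) << shift)


def lost_update_demo():
    """Two non-atomic RMW char stores, interleaved by hand."""
    memory = [0]  # one word, all four chars zero
    # "Thread" A stores 0x41 into char 0; "thread" B stores 0x42 into char 1.
    word_a, shift_a = split_addr(0)
    word_b, shift_b = split_addr(1)
    a_loaded = memory[word_a]  # A loads the containing word
    b_loaded = memory[word_b]  # B loads the same word before A stores
    memory[word_a] = insert_char(a_loaded, shift_a, 0x41)  # A stores
    memory[word_b] = insert_char(b_loaded, shift_b, 0x42)  # B's stale word overwrites A's char
    return memory[0]


def compare_and_swap(memory, index, expected, new):
    """Simulated CAS: store `new` only if memory[index] still equals `expected`."""
    if memory[index] == expected:
        memory[index] = new
        return True
    return False


def cas_demo():
    """Same interleaving, but each store is a CAS retry loop."""
    memory = [0]
    word_a, shift_a = split_addr(0)
    word_b, shift_b = split_addr(1)
    a_loaded = memory[word_a]
    b_loaded = memory[word_b]
    # A's CAS succeeds: nobody has stored to the word yet.
    compare_and_swap(memory, word_a, a_loaded, insert_char(a_loaded, shift_a, 0x41))
    # B's first CAS fails because the word changed; B reloads and retries.
    while not compare_and_swap(memory, word_b, b_loaded,
                               insert_char(b_loaded, shift_b, 0x42)):
        b_loaded = memory[word_b]
    return memory[0]


print(hex(lost_update_demo()))  # 0x4200: A's 0x41 was lost
print(hex(cas_demo()))          # 0x4241: both chars survive
```

The lost update is exactly what "independently modifiable" rules out, which is why a C implementation without narrow stores must either make char as wide as the smallest store or pay for an atomic RMW on every char store.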

But that Wikipedia table of word and char sizes in historical machines clearly isn't about that, given the sizes (e.g., I'm pretty sure char is smaller than a word on some word-addressable machines).

It's about how software (and character I/O hardware such as terminals) packed multiple characters of the machine's native character encoding (e.g., a subset of ASCII, EBCDIC, or something earlier) into machine words.
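The relationship between char size and word size in that sense is simple arithmetic: the number of whole characters per word is the floor of word bits divided by char bits, and any remainder is spare. The sketch below just does that division for a few illustrative combinations; `chars_per_word` is a hypothetical helper, and it ignores machine-specific packing conventions such as byte ordering.

```python
def chars_per_word(word_bits, char_bits):
    """Return (whole characters that fit in one word, leftover bits)."""
    return divmod(word_bits, char_bits)


examples = [
    (36, 6),  # 6-bit character code in a 36-bit word
    (36, 7),  # 7-bit ASCII in a 36-bit word
    (18, 6),  # 6-bit character code in an 18-bit word
    (32, 8),  # 8-bit bytes in a 32-bit word, for comparison
]

for word_bits, char_bits in examples:
    count, spare = chars_per_word(word_bits, char_bits)
    print(f"{word_bits}-bit word, {char_bits}-bit chars: "
          f"{count} chars, {spare} spare bit(s)")
```

The 36/7 case gives five characters with one bit left over, which matches the common PDP-10 convention of storing five 7-bit ASCII characters per 36-bit word.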

Unicode and variable-length character encodings like UTF-8 and UTF-16 are recent inventions compared to that history: https://en.wikipedia.org/wiki/Character_encoding#History
Many systems used fewer than 8 bits per character. For example, 6 bits (64 unique codes) is enough for the uppercase Latin alphabet, the digits, and some punctuation and special characters, but not for both upper and lower case plus digits (26 + 26 + 10 = 62 leaves only 2 codes).
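As one concrete 6-bit example, DEC's SIXBIT code maps the 64 ASCII characters from space (0x20) through underscore (0x5F) onto codes 0 to 63 by subtracting 32. That range contains the uppercase letters, the digits, and punctuation, but no lowercase letters and no control codes. The function names below are hypothetical; only the subtract-32 mapping is the SIXBIT convention.

```python
SIXBIT_FIRST = 0x20  # ' ' encodes as 0
SIXBIT_LAST = 0x5F   # '_' encodes as 63


def to_sixbit(ch):
    """Encode one character as a 6-bit SIXBIT code."""
    code = ord(ch)
    if not SIXBIT_FIRST <= code <= SIXBIT_LAST:
        raise ValueError(f"{ch!r} has no SIXBIT code")
    return code - SIXBIT_FIRST


def from_sixbit(code):
    """Decode a 6-bit SIXBIT code back to a character."""
    if not 0 <= code < 64:
        raise ValueError("SIXBIT codes are 0..63")
    return chr(code + SIXBIT_FIRST)


# Enumerate the whole 64-code repertoire and count what it contains.
repertoire = "".join(from_sixbit(c) for c in range(64))
print(repertoire)
print(sum(c.isupper() for c in repertoire), "uppercase letters")
print(sum(c.isdigit() for c in repertoire), "digits")
print(sum(c.islower() for c in repertoire), "lowercase letters")
```

Running it prints 26 uppercase letters, 10 digits, and 0 lowercase letters; the remaining 28 codes are space and punctuation.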

These historical character sets motivated some of the choices programming languages made about whether to use certain special characters, because the languages were developed on systems with a particular character set.

Historical machines really did do things like pack 3 characters of text into an 18-bit word.
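Packing three 6-bit characters into an 18-bit word is just shifting and OR-ing. This sketch assumes SIXBIT codes (ASCII minus 32, covering space through underscore) and places the first character in the most significant 6 bits; real machines and programs differed in encoding and ordering, so `pack18` and `unpack18` are hypothetical helpers, not any system's actual routines.

```python
CHAR_BITS = 6
CHARS_PER_WORD = 3  # 3 * 6 = 18 bits
CHAR_MASK = (1 << CHAR_BITS) - 1


def pack18(text):
    """Pack exactly three SIXBIT-representable characters into one 18-bit word."""
    if len(text) != CHARS_PER_WORD:
        raise ValueError("need exactly 3 characters")
    word = 0
    for ch in text:
        code = ord(ch) - 0x20  # SIXBIT: ' ' -> 0 ... '_' -> 63
        if not 0 <= code <= CHAR_MASK:
            raise ValueError(f"{ch!r} has no SIXBIT code")
        word = (word << CHAR_BITS) | code  # first char ends up in the top 6 bits
    return word


def unpack18(word):
    """Recover the three characters from an 18-bit word, most significant first."""
    chars = []
    for i in reversed(range(CHARS_PER_WORD)):
        code = (word >> (i * CHAR_BITS)) & CHAR_MASK
        chars.append(chr(code + 0x20))
    return "".join(chars)


w = pack18("CAT")
print(f"{w:018b}")  # 100011100001110100: three 6-bit fields
print(unpack18(w))  # CAT
```

Each character occupies a fixed 6-bit field with no bits wasted, which is why 6-bit codes paired so naturally with 18- and 36-bit words.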

You might want to search on https://retrocomputing.stackexchange.com/, or even ask a question there after doing some more reading.
