Size of an integer?
This has to do with a question I read yesterday:
How to determine how many bytes an integer needs?
Anyway, the part that I have a question about is this:
I'm looking for the most efficient way to calculate the minimum number of bytes needed to store an integer without losing precision.
e.g.
int: 10 = 1 byte
int: 257 = 2 bytes
My question is, why does 10 require 1 byte, and why does 257 require 2? From what I understand, you can represent 10 as 1010, which is 4 bits, and 257 as 100000001, which is 9 bits. Does it have to do with word size? Is it that you can't have just 4 bits, but you need the whole byte and you can't just have 9 bits, you need the whole 2 bytes?
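Here is a minimal sketch of the calculation the question asks about. It assumes non-negative integers and whole 8-bit bytes: count the significant bits, then round up to a multiple of 8. `min_bytes` is a hypothetical helper name. `int.bit_length()` is Python's built-in.

```python
def min_bytes(n: int) -> int:
    """Smallest number of whole 8-bit bytes that can hold a non-negative n."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    bits = n.bit_length()           # 10 -> 4 bits (1010), 257 -> 9 bits (100000001)
    return max(1, (bits + 7) // 8)  # round up to whole bytes; 0 still occupies 1 byte

for value in (10, 255, 256, 257, 65535, 65536):
    print(value, value.bit_length(), min_bytes(value))
```

Negative numbers stored in two's complement need one extra bit for the sign. This sketch deliberately ignores that case.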
Answers (4)
That's right: bytes are 8 bits each, and you usually can't subdivide them.
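The hardware cannot address half a byte, so any sub-byte storage has to be handled in software. The minimal sketch below packs two 4-bit values, such as 10 (`1010`), into one byte using shifts and masks. `pack_nibbles` and `unpack_nibbles` are hypothetical names used only for illustration.

```python
def pack_nibbles(high: int, low: int) -> int:
    """Store two 4-bit values (0..15) in a single byte."""
    if not (0 <= high <= 15 and 0 <= low <= 15):
        raise ValueError("each value must fit in 4 bits")
    return (high << 4) | low  # high nibble in bits 7..4, low nibble in bits 3..0

def unpack_nibbles(byte: int) -> tuple:
    """Recover the two 4-bit values from a packed byte."""
    return (byte >> 4) & 0x0F, byte & 0x0F

packed = pack_nibbles(10, 3)  # 1010 and 0011 share one byte: 1010 0011
print(bin(packed), unpack_nibbles(packed))
```

The byte is still the smallest unit the machine stores or loads. The packing only changes how your code interprets its bits.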
Heh, yes, each byte has an address and so you can't use less than one.
In fact, it's a bit difficult to use fewer than 4 or 8 bytes. Access to unaligned scalars is slow, so language processors (compilers) tend to align addressable objects to multiples of 4, 8, or even 16 bytes when they are concerned about cache blocks. The actual data bus is likely to be as wide as a register, so if an object isn't aligned that way (generally to 32 or 64 bits), the CPU has to fetch two words and combine them. That's slow, so the compiler guards against it.
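Python's standard `ctypes` module lays out structures the way the platform's C compiler would, so it can illustrate the padding described above. This is a minimal sketch. The structure `Mixed` is hypothetical, and the exact numbers depend on the platform ABI.

```python
import ctypes

class Mixed(ctypes.Structure):
    # A 1-byte field followed by a 4-byte integer. The C-compatible layout
    # inserts padding after `flag` so that `value` starts on an aligned offset.
    _fields_ = [("flag", ctypes.c_char), ("value", ctypes.c_int32)]

print("alignment of int32:", ctypes.alignment(ctypes.c_int32))
print("offset of value:   ", Mixed.value.offset)
print("size of Mixed:     ", ctypes.sizeof(Mixed))
```

On common x86-64 and ARM64 systems, the 5 bytes of actual data occupy 8 bytes: 3 padding bytes follow `flag`.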
Sometimes, even more alignment is added.
Typically, an individual object declaration will get a 4- or 8-byte alignment. A function, module (linker input file), or other large object may get 16- or 32-byte alignment, because using only part of a cache block wastes the unused section of that block, and cache performance is critical these days.
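When a compiler or linker gives an object 4-, 8-, 16- or 32-byte alignment, it rounds an offset up to the next multiple of that alignment. Here is a minimal sketch of that rounding, assuming power-of-two alignments (the usual case). `align_up` is a hypothetical helper, not code from any particular toolchain.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (a positive power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (offset + alignment - 1) & ~(alignment - 1)

# Where the next object may start if the previous one ends at byte offset 9
for a in (4, 8, 16, 32):
    print(a, align_up(9, a))
```

The gap between the old offset and the aligned one is the padding that goes unused.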
Memory is allocated in whole bytes, so a 9-bit value will of course need a second byte to accommodate the 9th bit.
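Python's built-in `int.to_bytes` makes this concrete: 257 fits in two whole bytes but not in one. This minimal sketch uses big-endian byte order only because it is easier to read.

```python
value = 257
print(format(value, "016b"))        # 9 significant bits shown inside a 16-bit field
as_bytes = value.to_bytes(2, "big")  # two whole bytes
print(list(as_bytes))                # high byte holds the 9th bit, low byte holds the lower 8 bits

try:
    value.to_bytes(1, "big")         # a single byte cannot hold 9 bits
except OverflowError as exc:
    print("1 byte is not enough:", exc)
```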
It is not hard to come up with schemes that represent small numbers in fewer bytes or bits. For example, UTF-8 represents Unicode code points (up to 21 bits) as sequences of 1 to 4 bytes, in a way that ensures code points in the range 0 to 127 occupy only 1 byte.
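Python's `str.encode` shows this variable-length behaviour directly. In the minimal sketch below, the sample code points are arbitrary picks, one from each UTF-8 length class. `utf8_length` is a hypothetical helper name.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for a single code point."""
    return len(chr(code_point).encode("utf-8"))

# One sample from each range: 0..0x7F, 0x80..0x7FF, 0x800..0xFFFF, 0x10000..0x10FFFF
for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    print(f"U+{cp:04X} -> {utf8_length(cp)} byte(s): {chr(cp).encode('utf-8').hex(' ')}")
```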
But these schemes tend to have the downside that larger numbers tend to take MORE bits to represent than if you hadn't encoded them. And besides, you are trading off the number of bits needed to represent the numbers against the extra processor time of encoding and decoding the numbers.
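Unsigned LEB128 is one general-purpose scheme of this kind. It is the "varint" style used by, for example, DWARF and WebAssembly, and it is picked here only as an illustration. Each byte carries 7 bits of the number plus a continuation flag. The minimal sketch below shows both sides of the trade-off: small numbers shrink to one byte, while a full 32-bit value grows to 5 bytes instead of 4. The function names are hypothetical.

```python
def encode_uleb128(n: int) -> bytes:
    """Unsigned LEB128: 7 payload bits per byte; high bit set on all but the last byte."""
    if n < 0:
        raise ValueError("unsigned LEB128 needs a non-negative integer")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation flag: more bytes follow
        else:
            out.append(byte)         # final byte has the high bit clear
            return bytes(out)

def decode_uleb128(data: bytes) -> int:
    """Inverse of encode_uleb128 for one complete encoded value."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
    return result

for value in (10, 257, 2**32 - 1):
    print(value, len(encode_uleb128(value)), "byte(s); a fixed-width uint32 uses 4")
```

The encode and decode loops are also the "extra processor time" mentioned above: a fixed-width integer needs no such work.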
Theoretically it doesn't / they don't. But in practice, computers are primarily designed to work on word-sized chunks (traditionally 32 bits, now often 64). Addressing memory at the byte level and doing arithmetic on variable-sized number representations is going to be a LOT slower.
Besides, memory is cheap, so for most applications there is simply not enough payback to justify trying to reduce "wastage" below word granularity.