Define the smallest possible datatype in C++ that can hold six values
I want to define my own datatype that can hold one of six possible values, in order to learn more about memory management in C++. In numbers, I want to be able to hold 0 through 5. In binary, three bits would suffice (101 = 5), although some bit patterns (6 and 7) won't be used. The datatype should also consume as little memory as possible.
I'm not sure how to accomplish this. First, I tried an enum with defined values for all the fields. As far as I know, the values are in hex there, so one "hexbit" should allow me to store 0 through 15. But when I compared it to a char (with sizeof), it turned out to be 4 times the size of a char, and a char holds 0 through 255 if I'm not mistaken.
    #include <iostream>

    enum Foo
    {
        a = 0x0,
        b = 0x1,
        c = 0x2,
        d = 0x3,
        e = 0x4,
        f = 0x5,
    };

    int main()
    {
        Foo myfoo = a;
        char mychar = 'a';
        std::cout << sizeof(myfoo) << '\n';  // prints 4
        std::cout << sizeof(mychar) << '\n'; // prints 1
        return 0;
    }
I've clearly misunderstood something, but fail to see what, so I turn to SO. :)
Also, when writing this post I realised that I clearly lack some parts of the vocabulary. I've made this post a community wiki; please edit it so I can learn the correct words for everything.
A char is the smallest possible type. If you happen to know that you need several such 3-bit values in a single place, you can use a structure with bit-field syntax and hence get two of them within one byte. In theory you could pack 10 such fields into a 32-bit "int" value.
C++0x (the upcoming standard, later published as C++11) will contain strongly typed enumerations, where you can specify the underlying datatype (in your example char), but current C++ does not support this. The standard is not clear about the use of a char here (the examples are with int, short and long), but it mentions the underlying integral type, and that would include char as well.
As of today, Neil Butterworth's answer to create a class for your problem seems the most elegant, as you can even extend it to contain a nested enumeration if you want symbolic names for the values.
You can store values smaller than 8 or 32 bits. You just need to pack them into a struct (or class) and use bit fields.
For example:
In most cases, your compiler will round up the total size of your structure to 32 bits on a 32-bit platform. The other problem is, as you pointed out, that your values may not have a power-of-two range. This makes for wasted space: if you read the entire struct as one number, you will find values that are impossible to set whenever your input ranges aren't all powers of 2.
Another feature you may find interesting is a union. Unions work like structs, but their members share memory, so if you write to one field it overwrites the others.
Now, if you are really tight for space and want to push each bit to the maximum, there is a simple encoding method. Let's say you want to store 3 numbers, each from 0 to 5. Bit fields are wasteful, because if you use 3 bits each, you'll waste some values (i.e. you could never set 6 or 7, even though you have room to store them). So, let's do an example:
To pack them together most efficiently, we should think in base 6 (since each value is from 0-5). So packed into the smallest possible space is:
See how it looks like we're encoding in base six? Each number is multiplied by its place value 6^n, where n is the place (starting at 0).
Then to decode:
These schemes are extremely handy when you have to encode some fields in a bar code or in an alphanumeric sequence for human typing. Saving just those few partial bits can make a huge difference. Also, the fields don't all have to have the same range. If one field runs from 0 through 7, you'd use 8 instead of 6 in the proper place.
C++ does not express units of memory smaller than bytes. If you're producing them one at a time, that's the best you can do, and your own example works well. If you need just a few, you can use bit-fields as Alnitak suggests. If you're planning on allocating them on the heap one at a time, then you're even worse off: most allocators round each request up to a minimum block size, 16 bytes being common.
Another choice might be to wrap std::bitset to do your bidding. If you need many such values, this wastes very little space: only about 1 bit for every 8.
If you think about your problem as a number expressed in base 6, and convert that number to base 2, possibly using an arbitrary-precision integer (for example GMP), you won't waste any bits at all.
This assumes, of course, that your values have a uniform, random distribution. If they follow a different distribution, your best bet will be general compression of the first example, with something like gzip.
The minimal size you can use is 1 byte.
But if you use a group of enum values (writing to a file, storing in a container, ...), you can pack the group at 3 bits per value.
You don't have to enumerate the values of the enum; enumerators without explicit values are numbered from 0 automatically.
With such a declaration, Foo is effectively stored as an int, which on your machine takes 4 bytes.
The smallest type is char, which is defined as the smallest addressable data on the target machine. The CHAR_BIT macro yields the number of bits in a char and is defined in limits.h.
[Edit]
Note that generally speaking you shouldn't ask yourself such questions. Always use [unsigned] int if it's sufficient, except when you allocate quite a lot of memory (e.g. int[100*1024] vs char[100*1024], but consider using std::vector instead).
The size of an enumeration usually defaults to that of an int. But depending on your compiler, you may have the option of creating a smaller enum. For example, GCC lets you mark an enum declaration with the packed attribute, after which sizeof(Foo) == 1.
The best solution is to create your own type implemented using a char. This should have sizeof(MyType) == 1, though this is not guaranteed.
It is likely that packing oddly sized values into bitfields will incur a sizable performance penalty due to the architecture not supporting bit-level operations (thus requiring several processor instructions per operation). Before you implement such a type, ask yourself if it is really necessary to use as little space as possible, or if you are committing the cardinal sin of programming that is premature optimization. At most, I would encapsulate the value in a class whose backing store can be changed transparently if you really do need to squeeze every last byte for some reason.
You can use an unsigned char, probably with a typedef naming it BYTE. It will occupy only one byte.