Defining the smallest possible data type in C++ that can hold six values

Posted on 2024-07-17 02:23:17

I want to define my own datatype that can hold a single one of six possible values, in order to learn more about memory management in C++. In numbers, I want to be able to hold 0 through 5. In binary, three bits would suffice (101 = 5), although some bit patterns (6 and 7) won't be used. The datatype should also consume as little memory as possible.

I'm not sure how to accomplish this. First, I tried an enum with defined values for all the fields. As far as I know, the values are in hex there, so one "hexbit" should allow me to store 0 through 15. But comparing it to a char (with sizeof), it reported that it's 4 times the size of a char, and a char holds 0 through 255 if I'm not mistaken.

#include <iostream>

enum Foo
{
    a = 0x0,
    b = 0x1,
    c = 0x2,
    d = 0x3,
    e = 0x4,
    f = 0x5,
};

int main()
{
    Foo myfoo = a;
    char mychar = 'a';

    std::cout << sizeof(myfoo) << '\n';  // prints 4 on my machine
    std::cout << sizeof(mychar) << '\n'; // prints 1

    return 0;
}

I've clearly misunderstood something, but fail to see what, so I turn to SO. :)

Also, when writing this post I realised that I clearly lack some parts of the vocabulary. I've made this post a community wiki, please edit it so I can learn the correct words for everything.


酒浓于脸红 2024-07-24 02:23:17

A char is the smallest possible type.

If you happen to know that you need several such 3-bit values in a single place, you can use a structure with bit-field syntax:

struct foo {
  unsigned int val1:3;
  unsigned int val2:3;
};

and hence get two of them within one byte's worth of bits. Note that with unsigned int as the declared type, common compilers still size the struct as a whole unsigned int; declaring the fields as unsigned char typically brings sizeof(foo) down to 1. In theory you could pack 10 such fields into a 32-bit "int" value.
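The "10 fields in a 32-bit int" idea can also be done by hand with shifts and masks, which avoids depending on how a particular compiler lays out bit-fields. The following is only a sketch: set_slot and get_slot are hypothetical helper names, not anything from the answer above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: write `value` (0..7) into 3-bit slot `slot` (0..9)
// of `word` and return the updated word.
inline std::uint32_t set_slot(std::uint32_t word, unsigned slot, unsigned value) {
    assert(slot < 10 && value < 8);
    const unsigned shift = slot * 3;
    word &= ~(std::uint32_t{7} << shift);                 // clear the old 3 bits
    word |= static_cast<std::uint32_t>(value) << shift;   // write the new ones
    return word;
}

// Hypothetical helper: read the 3-bit value stored in slot `slot` (0..9).
inline unsigned get_slot(std::uint32_t word, unsigned slot) {
    assert(slot < 10);
    return (word >> (slot * 3)) & 7u;
}
```

Ten slots use 30 of the 32 bits; the top two bits stay unused.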


眼眸印温柔 2024-07-24 02:23:17

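C++0x (standardized as C++11) adds strongly typed enums where you can specify the underlying datatype (char in your example); C++03 does not support this. The draft's examples only use int, short and long, but the wording refers to an underlying integral type, and that includes char.

As of today, Neil Butterworth's answer of creating a class for your problem seems the most elegant, because you can even extend it to contain a nested enumeration if you want symbolic names for the values.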
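For readers with a C++11 compiler, a minimal sketch of such a strongly typed enum with a one-byte underlying type might look like this. The helpers to_int and from_int are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Scoped enum whose underlying type is explicitly one byte wide.
enum class Foo : unsigned char { a, b, c, d, e, f };

static_assert(sizeof(Foo) == 1, "Foo occupies a single char");
static_assert(std::is_same<std::underlying_type<Foo>::type, unsigned char>::value,
              "underlying type is the one we asked for");

// Hypothetical helper: scoped enums do not convert to int implicitly.
inline int to_int(Foo v) { return static_cast<int>(v); }

// Hypothetical helper: convert a checked integer in 0..5 back to Foo.
inline Foo from_int(int n) {
    assert(n >= 0 && n <= 5);
    return static_cast<Foo>(n);
}
```

The object still takes a whole byte; the gain over a plain enum is only that the size is stated in the source rather than left to the compiler.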


岁月蹉跎了容颜 2024-07-24 02:23:17

You can store values smaller than 8 or 32 bits. You just need to pack them into a struct (or class) and use bit fields.

For example:

struct example
{
    unsigned int a : 3;  // Three bits, can be 0 through 7.
            bool b : 1;  // One bit, stores 0 or 1.
    unsigned int c : 10; // Ten bits, can be 0 through 1023.
    unsigned int d : 18; // 18 bits, can be 0 through 262143 (3 + 1 + 10 + 18 = 32 bits in total).
};

In most cases, your compiler will round up the total size of your structure to a multiple of 32 bits on a 32-bit platform. The other problem is, like you pointed out, that your values may not have a power-of-two range. This will make for wasted space. If you read the entire struct as one number, you will find values that are impossible to set if your input ranges aren't all powers of 2.

Another feature you may find interesting is a union. Unions work like a struct, but their members share memory. So if you write to one field, it overwrites the others.

Now, if you are really tight for space, and you want to push each bit to the maximum, there is a simple encoding method. Let's say you want to store 3 numbers, each of which can be from 0 to 5. Bit fields are wasteful, because if you use 3 bits each, you'll waste some values (i.e. you could never set 6 or 7, even though you have room to store them). So, let's do an example:

// Here are three example values, each can be from 0 to 5:
const int one = 3, two = 4, three = 5;

To pack them together most efficiently, we should think in base 6 (since each value is from 0 to 5). So packed into the smallest possible space is:

// This packs all the values into one int, from 0 - 215.
// pack could be any value from 0 - 215. There are no 'wasted' numbers.
int pack = one + (6 * two) + (6 * 6 * three);

See how it looks like we're encoding in base six? Each number is multiplied by its place value, 6^n, where n is the place (starting at 0).

Then to decode:

const int one = pack % 6;
pack /= 6;
const int two = pack % 6;
pack /= 6;
const int three = pack;

These schemes are extremely handy when you have to encode some fields in a bar code or in an alphanumeric sequence for human typing. Just saving those few partial bits can make a huge difference. Also, the fields don't all have to have the same range. If one field is from 0 through 7, you'd use 8 instead of 6 in the proper place.
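The base-6 scheme above can be wrapped into a pair of functions. This is only a sketch under the same assumption as the example (three values, each 0 through 5); pack3 and unpack3 are hypothetical names.

```cpp
#include <array>
#include <cassert>

// Pack three digits, each in 0..5, into one base-6 number in 0..215.
inline int pack3(int one, int two, int three) {
    assert(one >= 0 && one < 6);
    assert(two >= 0 && two < 6);
    assert(three >= 0 && three < 6);
    return one + 6 * two + 6 * 6 * three;
}

// Recover the three digits by repeated remainder and division by 6.
inline std::array<int, 3> unpack3(int packed) {
    assert(packed >= 0 && packed < 216);
    std::array<int, 3> digits;
    digits[0] = packed % 6;
    packed /= 6;
    digits[1] = packed % 6;
    packed /= 6;
    digits[2] = packed;
    return digits;
}
```

Because 6 * 6 * 6 = 216 is no larger than 256, the packed result fits in a single unsigned char, whereas three 3-bit fields would need 9 bits.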


娇纵 2024-07-24 02:23:17

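C++ does not express units of memory smaller than bytes. If you're producing the values one at a time, a byte is the best you can do, and your own example works well. If you only need a few, you can use bit fields as Alnitak suggests. If you're planning on allocating them on the heap one at a time, you're even worse off: most allocators hand out memory in blocks of a minimum size, 16 bytes being common.

Another choice might be to wrap std::bitset to do your bidding. If you need many such values, this wastes very little space, only about 1 bit in every 8.

If you think of your whole data set as a single number expressed in base 6, and convert that number to base 2, possibly using an unlimited-precision integer (for example GMP), you won't waste any bits at all.

This assumes, of course, that your values have a uniform random distribution. If they follow a different distribution, your best bet is general-purpose compression of the first representation, with something like gzip.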
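A wrapper around std::bitset along these lines could look like the sketch below, which stores each value in 3 consecutive bits. PackedSixes is a hypothetical class name, and the layout (least significant bit first) is an arbitrary choice made for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical wrapper: N values in 0..5, each stored in 3 bits of one bitset.
template <std::size_t N>
class PackedSixes {
public:
    // Store `value` (0..5) at index `i`.
    void set(std::size_t i, unsigned value) {
        assert(i < N && value <= 5);
        for (std::size_t b = 0; b < 3; ++b) {
            bits_[3 * i + b] = ((value >> b) & 1u) != 0;
        }
    }

    // Read back the value stored at index `i`.
    unsigned get(std::size_t i) const {
        assert(i < N);
        unsigned value = 0;
        for (std::size_t b = 0; b < 3; ++b) {
            value |= static_cast<unsigned>(bits_[3 * i + b]) << b;
        }
        return value;
    }

private:
    std::bitset<3 * N> bits_;  // all bits start out as zero
};
```

The trade-off is the one raised elsewhere in this thread: every access costs a few shifts and masks instead of a single byte load.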


走过海棠暮 2024-07-24 02:23:17

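The smallest size you can use is 1 byte.

But if you work with groups of enum values (writing them to a file, storing them in a container, ...), you can pack the group at 3 bits per value.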


锦欢 2024-07-24 02:23:17

You don't have to enumerate the values of the enum:

enum Foo
{
    a,
    b,
    c,
    d,
    e,
    f,
};

Foo myfoo = a;

Here Foo is a distinct type, but your compiler represents it with an int, which takes 4 bytes on your machine.

The smallest type is char, which is defined as the smallest addressable unit of data on the target machine. The CHAR_BIT macro yields the number of bits in a char and is defined in limits.h (<climits> in C++).

[Edit]

Note that, generally speaking, you shouldn't ask yourself such questions. Always use [unsigned] int if it's sufficient, except when you allocate quite a lot of memory (e.g. int[100*1024] vs char[100*1024], but consider using std::vector instead).


望她远 2024-07-24 02:23:17

The underlying type of an enumeration is implementation-defined, and most compilers default to int. Depending on your compiler, however, you may have the option of creating a smaller enum. For example, in GCC, you may declare:

enum Foo {
    a, b, c, d, e, f
}
__attribute__((__packed__));

Now, sizeof(Foo) == 1.


白日梦 2024-07-24 02:23:17

The best solution is to create your own type implemented using a char. This should have sizeof(MyType) == 1, though this is not guaranteed.

#include <iostream>
using namespace std;

class MyType {

    public:

        // Validate the argument itself, before it is narrowed to char.
        MyType( int a ) : val( static_cast<char>( a ) ) {
            if ( a < 0 || a > 5 ) {
                throw( "bad value" );
            }
        }

        int Value() const {
            return val;
        }

    private:

        char val;
};

int main() {

    MyType v( 2 );
    cout << sizeof(v) << endl;
    cout << v.Value() << endl;
}
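As another answer in this thread points out, a class like this can be extended with a nested enumeration to give the six values symbolic names. The variant below is only a sketch of that idea: the nested enum Name, the private check helper, and the switch to std::out_of_range are assumptions made for illustration, not part of the answer above.

```cpp
#include <stdexcept>

class MyType {
public:
    // Symbolic names for the six permitted values 0..5.
    enum Name { a, b, c, d, e, f };

    // Accepts an int or a Name (which converts to int implicitly).
    MyType(int v) : val(check(v)) {}

    int Value() const { return val; }

private:
    // Reject anything outside 0..5 before narrowing to char.
    static char check(int v) {
        if (v < 0 || v > 5) {
            throw std::out_of_range("MyType: value must be in 0..5");
        }
        return static_cast<char>(v);
    }

    char val;
};
```

With this, MyType v(MyType::c) stores 2, and MyType bad(6) throws instead of silently keeping an invalid value.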


海拔太高太耀眼 2024-07-24 02:23:17

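Packing oddly sized values into bit fields can incur a sizeable performance penalty, because most architectures do not support bit-level operations, so each access takes several processor instructions. Before you implement such a type, ask yourself whether it is really necessary to use as little space as possible, or whether you are committing the cardinal sin of programming: premature optimization. At most, I would encapsulate the value in a class whose backing store can be changed transparently if you really do need to squeeze out every last byte for some reason.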
