short short int in C?
I'm trying to squeeze as much out of my memory as possible.
I have a matrix of 4.9999995e13 ints, but they only need to be true or false; basically, I only need one bit of storage for each of them.
I understand that there are no single-bit types in C (maybe someone can explain why), and I also know that if a `short short int` existed it would be 1 byte, the same as `char`. However, all of the logical operations in C return ints (as do a few other functions).
So my questions are:
- Is there some way of making a `short short int` exist?
- If I were to use `char` instead, would performance decrease because of all the conversions to `int` that would have to be done?
- Is there another way that I'm missing?
In case it's relevant, I am compiling with GCC for C99.
EDIT: I've just seen on this Wikipedia page that there is a `_Bool` type. Is this actually standard?
The `_Bool` type is standard as of C99, but that's still not what you want, because a `_Bool` still takes up at least one byte (as does a `char`, by definition).
No, if you want that many boolean bits you need to pack them into a bitfield or bit array. There is no standard datatype for bit arrays in C, so you're also going to have to write your own macros or functions for getting the bit at a particular offset. I also hope that you're going to run this on a 64-bit machine with plenty of RAM, otherwise you're going to run out of memory, and fast.
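The "macros or functions" mentioned above could look roughly like the following. This is only a sketch: every name in it (`BIT_SET`, `BIT_TEST`, `bits_alloc`, and so on) is made up for illustration, and it assumes the flags are packed 64 to a `uint64_t` word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper names; each 64-bit word stores 64 flags. */
#define WORD_BITS       64u
#define BIT_WORD(i)     ((i) / WORD_BITS)
#define BIT_MASK(i)     (UINT64_C(1) << ((i) % WORD_BITS))
#define BIT_SET(a, i)   ((a)[BIT_WORD(i)] |= BIT_MASK(i))
#define BIT_CLEAR(a, i) ((a)[BIT_WORD(i)] &= ~BIT_MASK(i))
#define BIT_TEST(a, i)  (((a)[BIT_WORD(i)] & BIT_MASK(i)) != 0)

/* Allocate a zeroed array big enough for nbits flags; NULL on failure. */
static uint64_t *bits_alloc(size_t nbits)
{
    size_t nwords = (nbits + WORD_BITS - 1) / WORD_BITS;
    return calloc(nwords, sizeof(uint64_t));
}

/* Bounds-checked read, handy while debugging the index arithmetic. */
static int bits_test_checked(const uint64_t *a, size_t nbits, size_t i)
{
    assert(i < nbits);
    return BIT_TEST(a, i);
}
```

With these, `BIT_SET(a, i)` replaces `a[i] = 1` and `BIT_TEST(a, i)` replaces reading `a[i]`. The macros evaluate `i` more than once, so arguments with side effects (such as `i++`) should be avoided.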
You have about 50 terabits of data. Do you want to fit them all in RAM at once? It would be totally insane to use more than one bit of RAM in order to keep one bit of information, and even then your computer would have to be about the size of the largest supercomputer on this planet. Forget the performance of bit-packing. You will have to worry about totally different things.
What you want is a bitmap (or bit array, as Wikipedia calls it).
And there is no such thing as a `short short int`; that's just a `char`, which is the smallest integer type in C.
There might be some performance overhead when using this approach, not because of implicit conversions to int, but because manipulating a bitmap is trickier than directly manipulating array members.
A small example might help to illustrate:
Using a normal integer matrix:
With a bitmap:
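A minimal sketch of that contrast, side by side. The dimensions (`ROWS`, `COLS`) and all function names are assumptions chosen for illustration, and the bitmap is packed row-major into `unsigned char`s.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

enum { ROWS = 8, COLS = 100 };  /* hypothetical dimensions */

/* Normal integer matrix: one int per cell, direct indexing. */
static int matrix[ROWS][COLS];

static void matrix_set(size_t r, size_t c, int v) { matrix[r][c] = (v != 0); }
static int  matrix_get(size_t r, size_t c)        { return matrix[r][c]; }

/* Bitmap: one bit per cell, packed row-major into unsigned chars. */
static unsigned char bitmap[(ROWS * COLS + CHAR_BIT - 1) / CHAR_BIT];

static void bitmap_set(size_t r, size_t c, int v)
{
    size_t i = r * COLS + c;  /* flat bit index of cell (r, c) */
    unsigned char mask = (unsigned char)(1u << (i % CHAR_BIT));
    assert(r < ROWS && c < COLS);
    if (v)
        bitmap[i / CHAR_BIT] |= mask;
    else
        bitmap[i / CHAR_BIT] &= (unsigned char)~mask;
}

static int bitmap_get(size_t r, size_t c)
{
    size_t i = r * COLS + c;
    assert(r < ROWS && c < COLS);
    return (bitmap[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}
```

The extra divide, modulo, shift and mask on every access are the overhead referred to above. In exchange, `sizeof(bitmap)` is smaller than `sizeof(matrix)` by a factor of `CHAR_BIT * sizeof(int)`, which is 32 on a typical platform with 8-bit bytes and 4-byte ints.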
5e13 bits is about 5.7 TiB (6.25 × 10^12 bytes) of storage that you would need just to represent your bitfield. There's probably a better way to handle your problem.
Maybe you could build something clever on top of the bit-field structs available in ANSI C.
Something like this:
Then, you could make some fast functions (maybe macros) to get and set elements in this matrix. I haven't ever implemented something like this, though.
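One possible shape for this idea, as a sketch only: the struct `bits8` and the accessor names are hypothetical. Bit-field members cannot be indexed or have their address taken, so the get and set helpers have to dispatch with a `switch`.

```c
#include <assert.h>
#include <stddef.h>

/* Eight one-bit flags. Names are hypothetical. How the compiler packs and
   pads these fields is implementation-defined. */
struct bits8 {
    unsigned b0 : 1, b1 : 1, b2 : 1, b3 : 1;
    unsigned b4 : 1, b5 : 1, b6 : 1, b7 : 1;
};

/* Read flag k (0..7) of one struct. */
static unsigned bits8_get(const struct bits8 *p, unsigned k)
{
    assert(k < 8);
    switch (k) {
    case 0: return p->b0;
    case 1: return p->b1;
    case 2: return p->b2;
    case 3: return p->b3;
    case 4: return p->b4;
    case 5: return p->b5;
    case 6: return p->b6;
    default: return p->b7;
    }
}

/* Write flag k (0..7) of one struct; any nonzero v stores 1. */
static void bits8_set(struct bits8 *p, unsigned k, unsigned v)
{
    assert(k < 8);
    v = (v != 0);
    switch (k) {
    case 0: p->b0 = v; break;
    case 1: p->b1 = v; break;
    case 2: p->b2 = v; break;
    case 3: p->b3 = v; break;
    case 4: p->b4 = v; break;
    case 5: p->b5 = v; break;
    case 6: p->b6 = v; break;
    default: p->b7 = v; break;
    }
}

/* Element i of a flat "matrix" stored as an array of struct bits8. */
static unsigned flags_get(const struct bits8 *m, size_t i)
{
    return bits8_get(&m[i / 8], (unsigned)(i % 8));
}

static void flags_set(struct bits8 *m, size_t i, unsigned v)
{
    bits8_set(&m[i / 8], (unsigned)(i % 8), v);
}
```

It is worth checking `sizeof(struct bits8)` on the target compiler. With `unsigned` fields the struct is often padded to the size of an `unsigned int`, in which case eight flags occupy four bytes and a hand-rolled bit array is more compact.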
C99 `stdbool.h` allows the use of `bool`. However, your problem here is that 4.9999995e13/8 gives roughly 6.25e12 bytes ($10^9$ bytes is a gigabyte, $10^{12}$ bytes is a terabyte), so you need more than 6 TB of real plus virtual memory, if you are lucky. This suggests you are doing something else wrong. You need to break your problem into subproblems that you can handle with less memory.
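Spelled out, assuming one bit per element and 8-bit bytes, the arithmetic is:

$$\frac{4.9999995 \times 10^{13}\ \text{bits}}{8\ \text{bits/byte}} = 6.249999375 \times 10^{12}\ \text{bytes} \approx 6.25\ \text{TB} \approx 5.68\ \text{TiB}$$

The last figure uses $1\ \text{TiB} = 2^{40}$ bytes, which is why some estimates of the same quantity come out below 6.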
As other people have suggested, you should probably use a bitfield.
In addition, though, if you're just using true/false values and one of the values is much less common than the other, consider using an implicit encoding: store only the positions of the rare value and treat everything absent as the common one. You can accomplish this easily with a map data structure. As you're working with graphs, this will save you an enormous amount of memory if your graph is at all sparse. If you combine this with the bit-packing techniques above, you might even fit it all in RAM. You have to be pretty clever about the indexing, though.
The other thing you could do, if you don't care about taking a performance hit during processing (i.e. if you're more worried about storing it than processing it), is run the structure through a compression algorithm in blocks. There's a C library for bzip2 which might save you 90% or more on something like that. The drawback is that this would take a (very!) long time. You might get comparable results out of a bitwise compressor like Dynamic Markov Compression (DMC), and those are much faster.
If this were true, then you would not waste 8 bits to store 1 bit worth of data. You'd use a bitfield.
If you know anything about the sort of contents the matrix has, then you can use other optimizations. For example, if you know that the vast majority of the matrix is usually set to zero, then you can store only the x,y pairs of the elements set to one.
If not, then 4.9999995e13 bits will take about 6 TB of RAM!
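For the mostly-zero case described above, storing only the x,y pairs might look like the sketch below. The names (`struct cell`, `sparse_sort`, `sparse_get`) are hypothetical, and it assumes coordinates fit in 32 bits and that the set of ones is loaded once and then only queried.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Coordinates of one matrix element that is set to 1. */
struct cell { uint32_t x, y; };

/* Order cells by x, then by y, for qsort and bsearch. */
static int cell_cmp(const void *pa, const void *pb)
{
    const struct cell *a = pa, *b = pb;
    if (a->x != b->x) return (a->x < b->x) ? -1 : 1;
    if (a->y != b->y) return (a->y < b->y) ? -1 : 1;
    return 0;
}

/* Sort once after loading so that lookups can use binary search. */
static void sparse_sort(struct cell *cells, size_t n)
{
    qsort(cells, n, sizeof *cells, cell_cmp);
}

/* Returns 1 if (x, y) is stored, i.e. the element is 1; otherwise 0. */
static int sparse_get(const struct cell *cells, size_t n,
                      uint32_t x, uint32_t y)
{
    struct cell key;
    key.x = x;
    key.y = y;
    return bsearch(&key, cells, n, sizeof *cells, cell_cmp) != NULL;
}
```

Each stored pair costs 64 bits here, so this layout only beats a plain bit array when fewer than roughly one element in 64 is set.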