How to store a vector of booleans or a bitset in a file, bit by bit?
How to write bitset data to a file?
The first answer to that question doesn't solve the problem properly, since it takes 8 times more space than it should.
How would you do it? I really need to save a lot of true/false values.
Comments (7)
Simplest approach: take 8 consecutive boolean values, represent them as a single byte, and write that byte to your file. That saves a lot of space.
At the beginning of the file, you can write the number of boolean values stored; that number will help when reading the bytes back from the file and converting them into boolean values.
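The approach above could be sketched roughly as follows. This is only an illustration, not code from the answer: the names write_bits and read_bits, the 8-byte little-endian count header, and the choice to put the first value in the lowest bit of each byte are all assumptions, and error handling is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical helper: write the bit count as an 8-byte little-endian
// header, then the values packed 8 per byte (first value in the lowest bit).
void write_bits(std::ostream& os, const std::vector<bool>& bits) {
    const std::uint64_t count = bits.size();
    for (int i = 0; i < 8; ++i) {
        os.put(static_cast<char>((count >> (8 * i)) & 0xFF));
    }
    unsigned char byte = 0;
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i]) byte |= static_cast<unsigned char>(1u << (i % 8));
        if (i % 8 == 7) {  // byte is full, flush it
            os.put(static_cast<char>(byte));
            byte = 0;
        }
    }
    if (bits.size() % 8 != 0) os.put(static_cast<char>(byte));  // partial last byte
}

// Hypothetical helper: inverse of write_bits. No error checking is done,
// so a truncated or corrupt stream is not detected.
std::vector<bool> read_bits(std::istream& is) {
    std::uint64_t count = 0;
    for (int i = 0; i < 8; ++i) {
        count |= static_cast<std::uint64_t>(static_cast<unsigned char>(is.get())) << (8 * i);
    }
    std::vector<bool> bits(static_cast<std::size_t>(count));
    unsigned char byte = 0;
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (i % 8 == 0) byte = static_cast<unsigned char>(is.get());  // fetch next packed byte
        bits[i] = ((byte >> (i % 8)) & 1u) != 0;
    }
    return bits;
}
```

With this layout, 10 boolean values take 8 header bytes plus 2 data bytes. The stored count is what lets the reader ignore the 6 padding bits in the last byte.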
If you want the bitset class that best supports converting to binary, and your bitset is larger than an unsigned long, then the best option is boost::dynamic_bitset. (I presume it is more than 32 or even 64 bits if you are that concerned about saving space.)
From dynamic_bitset you can use to_block_range to write the bits into the underlying integral type. You can construct the dynamic_bitset back from the blocks by using from_block_range or its constructor from BlockInputIterator or by making append() calls.
Now that you have the bytes in their native format (Block), you still have to write them to a stream and read them back.
You will need to store a bit of "header" information first: the number of blocks you have and potentially the endianness. Or you might use a macro to convert to a standard endianness (e.g. htonl and ntohl, though ideally you would use a macro that is a no-op on your most common platform, so if that is little-endian you probably want to store the data that way and convert only on big-endian systems).
(Note: I am assuming that boost::dynamic_bitset converts to integral blocks the same way regardless of the underlying endianness. Its documentation does not say.)
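As a rough illustration of the header and fixed byte order described above, the sketch below writes a block count followed by each block in little-endian order, one byte at a time, so the file looks the same on any host. It uses only the standard library. The Block alias (a 32-bit unsigned integer) and the helper names are assumptions made for this example, and the block count is assumed to fit in 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

using Block = std::uint32_t;  // assumed block type for this sketch

// Emit one 32-bit value least significant byte first, whatever the host byte order.
void put_u32_le(std::ostream& os, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) os.put(static_cast<char>((v >> (8 * i)) & 0xFFu));
}

// Read back a value written by put_u32_le (no error checking).
std::uint32_t get_u32_le(std::istream& is) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(static_cast<unsigned char>(is.get())) << (8 * i);
    return v;
}

// Header: number of blocks. Body: the blocks, each in little-endian order.
void write_blocks(std::ostream& os, const std::vector<Block>& blocks) {
    put_u32_le(os, static_cast<std::uint32_t>(blocks.size()));
    for (Block b : blocks) put_u32_le(os, b);
}

std::vector<Block> read_blocks(std::istream& is) {
    std::vector<Block> blocks(get_u32_le(is));
    for (Block& b : blocks) b = get_u32_le(is);
    return blocks;
}
```

Shifting bytes out explicitly avoids any conversion macro, at the cost of one put() call per byte. For very large bitsets, a bulk write of the raw blocks on little-endian hosts may be preferable.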
To write the blocks to a stream in binary, use
os.write( reinterpret_cast<const char*>(&data[0]), sizeof(Block) * nBlocks )
and to read them back use
is.read( reinterpret_cast<char*>(&data[0]), sizeof(Block) * nBlocks )
where data is assumed to be a vector<Block>. The casts are needed because write() and read() take char pointers. Before reading you must call data.resize(nBlocks) (not reserve()). (You can also do odd things with istream_iterator or istreambuf_iterator, but resize() is probably better.)
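The raw write and read calls from the answer above might look like this when wrapped in functions. It is a sketch under assumptions: Block is taken to be a 32-bit unsigned integer, the names write_raw and read_raw are made up for the example, and the block count is assumed to reach the reader some other way, for instance through a header. The bytes are written in the host's native order, so the file is not portable between architectures with different endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

using Block = std::uint32_t;  // assumed block type for this sketch

// Dump the blocks in the host's native byte order.
void write_raw(std::ostream& os, const std::vector<Block>& data) {
    os.write(reinterpret_cast<const char*>(data.data()),
             static_cast<std::streamsize>(sizeof(Block) * data.size()));
}

// nBlocks must be known in advance, e.g. from a header.
std::vector<Block> read_raw(std::istream& is, std::size_t nBlocks) {
    std::vector<Block> data;
    data.resize(nBlocks);  // resize() creates the elements; reserve() would leave size() == 0
    is.read(reinterpret_cast<char*>(data.data()),
            static_cast<std::streamsize>(sizeof(Block) * nBlocks));
    return data;
}
```

Using reserve() instead of resize() here would allocate capacity without creating elements, so the vector would still report a size of zero after the read.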
Here is an attempt with two functions that use a minimal number of bytes, without compressing the bitset.
Note that a reinterpret_cast of the memory used by the bitset to an array of chars could probably also work, but it may not be portable across systems because you don't know how the bitset is represented (endianness?).
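The two functions from this answer did not survive on this page. The following is a possible reconstruction of the idea for std::bitset, not the author's code; the names bitset_to_bytes and bitset_from_bytes and the bit ordering (bit i goes to byte i / 8, position i % 8) are assumptions.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Pack a bitset into ceil(N / 8) bytes; bit i goes to byte i / 8, position i % 8.
template <std::size_t N>
std::vector<unsigned char> bitset_to_bytes(const std::bitset<N>& bs) {
    std::vector<unsigned char> bytes((N + 7) / 8, 0);
    for (std::size_t i = 0; i < N; ++i) {
        if (bs[i]) bytes[i / 8] |= static_cast<unsigned char>(1u << (i % 8));
    }
    return bytes;
}

// Inverse of bitset_to_bytes; expects at least ceil(N / 8) bytes.
template <std::size_t N>
std::bitset<N> bitset_from_bytes(const std::vector<unsigned char>& bytes) {
    std::bitset<N> bs;
    for (std::size_t i = 0; i < N; ++i) {
        bs[i] = ((bytes[i / 8] >> (i % 8)) & 1u) != 0;
    }
    return bs;
}
```

Because every bit is read through operator[], the result does not depend on how the bitset is laid out in memory, which sidesteps the portability concern with reinterpret_cast. Since N is a compile-time constant, no length header is needed.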
How about this
One way might be:
Note that this assumes you don't care what the bit layout ends up being in memory, because it makes no adjustments for anything. But as long as you also serialize the number of bits that were actually stored (to cover cases where the bit count isn't a multiple of CHAR_BIT), you can deserialize exactly the same bitset or vector as you originally had, like this.
(I'm not happy with that bucket size computation but it's 1am and I'm having trouble thinking of something more elegant).
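The bucket size computation mentioned here is not preserved on this page. One common way to write it, offered only as a suggestion and not as the author's code, is a ceiling division by CHAR_BIT. The name bucket_count is hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Number of CHAR_BIT-sized buckets needed to hold `bits` bits (ceiling division).
// Note: overflows if `bits` is within CHAR_BIT - 1 of the maximum size_t value.
constexpr std::size_t bucket_count(std::size_t bits) {
    return (bits + CHAR_BIT - 1) / CHAR_BIT;
}

static_assert(bucket_count(0) == 0, "no bits, no buckets");
static_assert(bucket_count(1) == 1, "a single bit still needs a whole bucket");
static_assert(bucket_count(CHAR_BIT) == 1, "exactly one full bucket");
static_assert(bucket_count(CHAR_BIT + 1) == 2, "one bit over spills into a second bucket");
```

Adding CHAR_BIT - 1 before dividing rounds up without a separate remainder check.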
Two options:
Spend the extra pounds (or pence, more likely) for a bigger disk.
Write a routine to extract 8 bits from the bitset at a time, compose them into bytes, and write them to your output stream.