How does one store a vector<bool> or a bitset in a file, bit by bit?

Posted on 2024-10-11 21:44:15

How to write bitset data to a file?

The first answer doesn't answer the question correctly, since it takes 8 times more space than it should.

How would you do it? I really need this to store a large number of true/false values.

Comments (7)

茶色山野 2024-10-18 21:44:15

Simplest approach: take 8 consecutive boolean values, represent them as a single byte, and write that byte to your file. That saves a lot of space.

At the beginning of the file, write the number of boolean values stored in it. The last byte may be only partly used, so the reader needs that number to know how many bits to convert back into boolean values.

一场信仰旅途 2024-10-18 21:44:15

If you want the bitset class that best supports converting to binary, and your bitset is larger than an unsigned long, then the best option is boost::dynamic_bitset. (I presume it is more than 32 or even 64 bits if you are that concerned about saving space.)

From a dynamic_bitset you can use to_block_range to write the bits out as blocks of the underlying integral type. You can construct the dynamic_bitset back from the blocks by using from_block_range, its constructor from BlockInputIterator, or append() calls.

Now that you have the data as blocks in their native format (Block), you still have the issue of writing it to a stream and reading it back.

You will need to store a bit of "header" information first: the number of blocks you have and potentially the endianness. Or you might use a macro to convert to a standard endianness (e.g. ntohl, but ideally a macro that is a no-op on your most common platform, so if that is little-endian you probably want to store the data that way and convert only on big-endian systems).

(Note: I am assuming that boost::dynamic_bitset converts to integral types the same way regardless of the underlying endianness. Their documentation does not say.)

To write the numbers in binary to a stream, use os.write( reinterpret_cast<const char*>(&data[0]), sizeof(Block) * nBlocks ), and to read them use is.read( reinterpret_cast<char*>(&data[0]), sizeof(Block) * nBlocks ), where data is assumed to be a vector<Block>. The casts are needed because write() and read() take char pointers. Before reading you must call data.resize(nBlocks) (not reserve()). (You can also do weird stuff with istream_iterator or istreambuf_iterator, but resize() is probably better.)

红玫瑰 2024-10-18 21:44:15

Here is an attempt with two functions that use the minimal number of bytes, without compressing the bitset.

#include <bitset>
#include <cstddef>
#include <istream>
#include <ostream>

template<std::size_t I>
void bitset_dump(const std::bitset<I> &in, std::ostream &out)
{
    // Export a bitset consisting of I bits to an output stream.
    // Eight bits are stored in a single stream byte.
    std::size_t i = 0;   // the current bit index
    unsigned char c = 0; // the byte being assembled
    short bits = 0;      // number of bits already placed in c
    while(i < in.size())
    {
        c = c << 1;
        if(in.test(i)) ++c; // adding 1 if bit is true (std::bitset has test(), not at())
        ++bits;
        if(bits == 8)
        {
            out.put((char)c);
            c = 0;
            bits = 0;
        }
        ++i;
    }
    // dump remaining
    if(bits != 0) {
        // pad the byte so that the first bits are in the most significant positions
        c = c << (8 - bits);
        out.put((char)c);
    }
}

template<std::size_t I>
void bitset_restore(std::istream &in, std::bitset<I> &out)
{
    // Read bytes from the input stream into a bitset of size I.
    std::size_t i = 0;           // current bit index
    unsigned char mask = 0x80;   // current byte mask
    unsigned char c = 0;         // current byte in stream
    while(i < I)
    {
        if((i%8) == 0)           // retrieve next character
        {
            int next = in.get();
            if(!in) break;       // EOF or read error: the remaining bits stay unchanged
            c = (unsigned char)next;
            mask = 0x80;
        }
        else mask = mask >> 1;   // shift mask
        out.set(i, (c & mask) != 0); // std::bitset has set(), not at()
        ++i;
    }
}

Note that using a reinterpret_cast to treat the memory used by the bitset as an array of chars would probably also work, but it may not be portable across systems, because you don't know how the bitset is represented internally (endianness?).

氛圍 2024-10-18 21:44:15

How about this:

#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <vector>

// Note: std::_S_word_bit, std::_Bit_type and the iterator member _M_p are
// libstdc++ internals, so this only builds against GCC's standard library.
...
{
  std::srand(std::time(nullptr));
  std::vector<bool> vct1, vct2;
  vct1.resize(20000000, false);
  vct2.resize(20000000, false);
  // insert some data
  for (size_t i = 0; i < 1000000; i++) {
    vct1[std::rand() % 20000000] = true;
  }

  // serialize to file (binary mode, so no newline translation touches the data)
  std::ofstream ofs("bitset", std::ios::out | std::ios::trunc | std::ios::binary);
  for (uint32_t i = 0; i < vct1.size(); i += std::_S_word_bit) {
    auto vct1_iter = vct1.begin();
    vct1_iter += i;
    uint32_t block_num = i / std::_S_word_bit;
    std::_Bit_type block_val = *(vct1_iter._M_p);
    if (block_val != 0) {
      // only write non-zero blocks, each prefixed with its block index
      ofs.write(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
      ofs.write(reinterpret_cast<char*>(&block_val), sizeof(std::_Bit_type));
    }
  }
  ofs.close();

  // deserialize
  std::ifstream ifs("bitset", std::ios::in | std::ios::binary);
  ifs.seekg(0, std::ios::end);
  uint64_t file_size = ifs.tellg();
  ifs.seekg(0);
  uint64_t load_size = 0;
  while (load_size < file_size) {
    uint32_t block_num;
    ifs.read(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
    std::_Bit_type block_value;
    ifs.read(reinterpret_cast<char*>(&block_value), sizeof(std::_Bit_type));
    if (!ifs) {
      break;  // truncated file: stop instead of using garbage values
    }
    load_size += sizeof(uint32_t) + sizeof(std::_Bit_type);
    auto offset = block_num * std::_S_word_bit;
    if (offset >= vct2.size()) {
      std::cout << "error! block index is past the end of the vector" << std::endl;
      break;
    }
    auto iter = vct2.begin();
    iter += offset;
    *(iter._M_p) = block_value;
  }
  ifs.close();

  // check result
  int count_true1 = std::count(vct1.begin(), vct1.end(), true);
  int count_true2 = std::count(vct2.begin(), vct2.end(), true);
  std::cout << "count_true1: " << count_true1 << " count_true2: " << count_true2 << std::endl;

}

妞丶爷亲个 2024-10-18 21:44:15

One way might be:

std::vector<bool> data = /* obtain bits somehow */;

// Allocate enough byte-sized buckets, rounding up (CHAR_BIT comes from <climits>).
std::vector<unsigned char> bytes((data.size() + CHAR_BIT - 1) / CHAR_BIT);

for(std::size_t byteIndex = 0; byteIndex < bytes.size(); ++byteIndex) {
   for(std::size_t bitIndex = 0; bitIndex < CHAR_BIT; ++bitIndex) {
       std::size_t index = byteIndex * CHAR_BIT + bitIndex;
       if(index >= data.size()) break; // the last bucket may be only partly filled

       int bit = data[index];
       bytes[byteIndex] |= bit << bitIndex;
   }
}

Note that this assumes you don't care what the bit layout ends up being in memory, because it makes no adjustments for anything. But as long as you also serialize the number of bits that were actually stored (to cover cases where the bit count isn't a multiple of CHAR_BIT), you can deserialize exactly the same bitset or vector as you had originally.

(I'm not happy with that bucket size computation but it's 1am and I'm having trouble thinking of something more elegant).
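
For the reverse direction, a minimal sketch could look like the following. The helper name `unpack_bits` is hypothetical, and it assumes the bytes are held as unsigned char and were packed with the first bit in the lowest position of each byte, which is the layout the loop above produces.

```cpp
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical helper: rebuilds the vector<bool> from packed bytes.
// bit_count is the number of bits that was serialized alongside the bytes.
std::vector<bool> unpack_bits(const std::vector<unsigned char>& bytes,
                              std::size_t bit_count) {
    std::vector<bool> data(bit_count, false);
    for (std::size_t i = 0; i < bit_count && i / CHAR_BIT < bytes.size(); ++i) {
        // Bit i lives in byte i / CHAR_BIT at position i % CHAR_BIT.
        data[i] = ((bytes[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1u) != 0;
    }
    return data;
}
```

Passing the stored bit count rather than `bytes.size() * CHAR_BIT` is what keeps the padding bits of the last byte from turning into extra values.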

撧情箌佬 2024-10-18 21:44:15
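#include <cstdio>
#include <bitset>
...
FILE* pFile;
pFile = fopen("output.dat", "wb");
...
const unsigned int size = 1024;
std::bitset<size> bitbuffer;
...
// Dumps the raw object representation of the bitset. This is compact, but the
// layout is implementation-defined, so only read the file back with a program
// built by the same compiler for the same platform.
fwrite (&bitbuffer, 1, size/8, pFile);
fclose(pFile);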
走过海棠暮 2024-10-18 21:44:15

Two options:

Spend the extra pounds (or pence, more likely) for a bigger disk.

Write a routine to extract 8 bits from the bitset at a time, compose them into bytes, and write them to your output stream.
