
How to store a vector<bool> or a bitset to a file, but bit-wise?

Posted 2024-10-11 21:44:15

How do I write bitset data to a file?

The first answer doesn't answer the question correctly, since it takes 8 times more space than it should.

How would you do it? I really need it to save a lot of true/false values.


茶色山野 2024-10-18 21:44:15

The simplest approach: take 8 consecutive boolean values, represent them as a single byte, and write that byte to the file. That saves a lot of space.

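At the beginning of the file you can write the number of boolean values stored in it; that number tells you how many bits to take from the bytes when you read them back and convert them into booleans!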


一场信仰旅途 2024-10-18 21:44:15

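If you want the bitset class that best supports converting to binary, and your bitset is larger than an unsigned long, then the best option is boost::dynamic_bitset. (I presume it is more than 32, or even 64, bits if you are that concerned about saving space.)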

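From a dynamic_bitset you can use to_block_range to write the bits into the underlying integral type (the Block). You can construct a dynamic_bitset from blocks with from_block_range, with its constructor taking a BlockInputIterator range, or by calling append().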

Now you have the bytes in their raw format (blocks), but you still have the problem of writing them to a stream and reading them back.

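You will need to store some "header" information first: the number of blocks you have and possibly the endianness. Or you can use a macro to convert to a standard endianness (e.g. ntohl), but ideally you would use a macro that is a no-op on your most common platform: if that platform is little-endian, you probably want to store data that way and convert only on big-endian systems.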

(Note: I am assuming that boost::dynamic_bitset converts to integral types the same way regardless of the underlying endianness. Their documentation does not say.)

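To write the numbers in binary to the stream, use os.write(reinterpret_cast<const char*>(&data[0]), sizeof(Block) * nBlocks), and to read them back use is.read(reinterpret_cast<char*>(&data[0]), sizeof(Block) * nBlocks), where data is assumed to be a vector<Block>. Before reading you must call data.resize(nBlocks) (not reserve()). (You could also do odd things with istream_iterator or istreambuf_iterator, but resize() is probably better.)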


红玫瑰 2024-10-18 21:44:15

Here is an attempt with two functions that use the minimal number of bytes, without compressing the bitset.

#include <bitset>
#include <cstddef>
#include <istream>
#include <ostream>

// std::bitset's size parameter is a std::size_t, so the template parameter
// must be std::size_t for deduction from std::bitset<N> to succeed.
template<std::size_t N>
void bitset_dump(const std::bitset<N> &in, std::ostream &out)
{
    // export a bitset consisting of N bits to an output stream.
    // Eight bits are stored to a single stream byte, first bit in the MSB.
    unsigned char c = 0; // the current byte
    int bits = 0;        // number of bits already packed into c
    for(std::size_t i = 0; i < in.size(); ++i)
    {
        c = static_cast<unsigned char>(c << 1);
        if(in.test(i)) c |= 1;   // std::bitset has test()/operator[], not at()
        if(++bits == 8)
        {
            out.put(static_cast<char>(c));
            c = 0;
            bits = 0;
        }
    }
    // dump remaining bits
    if(bits != 0) {
        // pad the byte so that the first bits are in the most significant positions.
        c = static_cast<unsigned char>(c << (8 - bits));
        out.put(static_cast<char>(c));
    }
}

template<std::size_t N>
void bitset_restore(std::istream &in, std::bitset<N> &out)
{
    // read bytes from the input stream into a bitset of size N.
    out.reset();                 // bits missing from the stream stay false
    unsigned char mask = 0x80;   // current byte mask
    unsigned char c = 0;         // current byte in stream
    for(std::size_t i = 0; i < N; ++i)
    {
        if((i % 8) == 0)         // retrieve next character
        {
            int ch = in.get();
            if(ch == std::istream::traits_type::eof()) break; // stream ended early
            c = static_cast<unsigned char>(ch);
            mask = 0x80;
        }
        else mask >>= 1;         // shift mask
        out[i] = (c & mask) != 0;
    }
}

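Note that a reinterpret_cast of the memory used by the bitset to an array of chars might also work, but it is probably not portable across systems, because you don't know what the bitset's representation is (endianness?).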


氛圍 2024-10-18 21:44:15

How about this:

// NOTE: this relies on libstdc++ internals (std::_S_word_bit, std::_Bit_type
// and the iterator member _M_p), so it only builds with GCC's standard library.
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <vector>

...
{
  std::srand(static_cast<unsigned>(std::time(nullptr)));
  std::vector<bool> vct1, vct2;
  vct1.resize(20000000, false);
  vct2.resize(20000000, false);
  // insert some data
  for (size_t i = 0; i < 1000000; i++) {
    vct1[std::rand() % 20000000] = true;
  }

  // serialize to file: only non-zero words are written, each as (index, word)
  std::ofstream ofs("bitset", std::ios::out | std::ios::trunc | std::ios::binary);
  for (uint32_t i = 0; i < vct1.size(); i += std::_S_word_bit) {
    auto vct1_iter = vct1.begin();
    vct1_iter += i;
    uint32_t block_num = i / std::_S_word_bit;
    std::_Bit_type block_val = *(vct1_iter._M_p);
    if (block_val != 0) {
      // only write non-zero blocks
      ofs.write(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
      ofs.write(reinterpret_cast<char*>(&block_val), sizeof(std::_Bit_type));
    }
  }
  ofs.close();

  // deserialize
  std::ifstream ifs("bitset", std::ios::in | std::ios::binary);
  ifs.seekg(0, std::ios::end);
  uint64_t file_size = static_cast<uint64_t>(ifs.tellg());
  ifs.seekg(0);
  uint64_t load_size = 0;
  while (load_size < file_size) {
    uint32_t block_num;
    ifs.read(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
    std::_Bit_type block_value;
    ifs.read(reinterpret_cast<char*>(&block_value), sizeof(std::_Bit_type));
    if (!ifs) {
      std::cout << "error! truncated record" << std::endl;
      break;
    }
    load_size += sizeof(uint32_t) + sizeof(std::_Bit_type);
    auto offset = static_cast<uint64_t>(block_num) * std::_S_word_bit;
    if (offset >= vct2.size()) {
      std::cout << "error! already touch end" << std::endl;
      break;
    }
    auto iter = vct2.begin();
    iter += offset;
    *(iter._M_p) = block_value;
  }
  ifs.close();

  // check result
  auto count_true1 = std::count(vct1.begin(), vct1.end(), true);
  auto count_true2 = std::count(vct2.begin(), vct2.end(), true);
  std::cout << "count_true1: " << count_true1 << " count_true2: " << count_true2 << std::endl;

}


妞丶爷亲个 2024-10-18 21:44:15

One way might be:

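#include <climits>   // CHAR_BIT (there is no CHAR_BITS macro)
#include <cstddef>
#include <vector>

std::vector<bool> data = /* obtain bits somehow */;

// Reserve an appropriate number of byte-sized buckets (integer ceiling division).
std::vector<unsigned char> bytes((data.size() + CHAR_BIT - 1) / CHAR_BIT);

for(std::size_t byteIndex = 0; byteIndex < bytes.size(); ++byteIndex) {
   for(unsigned bitIndex = 0; bitIndex < CHAR_BIT; ++bitIndex) {
       std::size_t pos = byteIndex * CHAR_BIT + bitIndex;
       if(pos >= data.size()) break;   // the last byte may be only partially filled
       unsigned bit = data[pos] ? 1u : 0u;

       bytes[byteIndex] |= static_cast<unsigned char>(bit << bitIndex);
   }
}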

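Note that this assumes you don't care what the bit layout ends up being in memory, because it makes no adjustments for anything. But as long as you also serialize the number of bits that were actually stored (to cover cases where the bit count isn't a multiple of CHAR_BIT), you can deserialize exactly the same bitset or vector you had originally.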

(I'm not happy with that bucket-size computation, but it's 1 a.m. and I'm having trouble thinking of something more elegant.)
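The deserialization step the answer mentions can be sketched as the inverse of that packing loop: bit i lives in byte i / CHAR_BIT at position i % CHAR_BIT (least significant first), and nbits is the bit count assumed to have been stored alongside the bytes. unpack_bits is a hypothetical name.

```cpp
#include <climits>
#include <cstddef>
#include <vector>

// Rebuilds the original vector<bool> from the packed bytes and the stored
// bit count; bits beyond the available bytes are left false.
std::vector<bool> unpack_bits(const std::vector<unsigned char>& bytes, std::size_t nbits)
{
    std::vector<bool> data(nbits);
    for (std::size_t i = 0; i < nbits && i / CHAR_BIT < bytes.size(); ++i)
        data[i] = (bytes[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1u;
    return data;
}
```

Without nbits, the reader could not tell real bits from the padding in the last byte.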


撧情箌佬 2024-10-18 21:44:15

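#include <bitset>
#include <cstdio>
...
// This writes std::bitset's in-memory representation directly. That layout is
// up to the standard library implementation, so the file is only readable by
// the same implementation on the same platform.
std::FILE* pFile = std::fopen("output.dat", "wb");
...
const unsigned int size = 1024;
std::bitset<size> bitbuffer;
...
if (pFile) {
    std::fwrite(&bitbuffer, 1, size / 8, pFile);
    std::fclose(pFile);
}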
