使用指向 igzstreams 的 STL 向量读取 gzip 压缩文件

发布于 2024-12-12 03:24:00 字数 1769 浏览 0 评论 0原文

作为输入,我有一个 gzip 压缩文件列表。如此处所示,我使用 gzstream 来处理它们。出于实际原因,我想打开每个文件并将每个流记录到一个向量中。这看起来很简单,但我无法让它发挥作用。这是最小的代码:

#include <cstdlib>

#include <iostream>
#include <vector>
using namespace std;

#include <gzstream.h>

int main (int argc, char ** argv)
{
  size_t i;
  vector<string> vInFiles;
  vector<igzstream *> vStreams;
  string line;

  // create the dummy input files
  system ("rm -f infile*.gz; for i in {1..2}; do echo \"toto\"${i} | gzip > infile${i}.gz; done");
  vInFiles.push_back ("infile1.gz");
  vInFiles.push_back ("infile2.gz");

  // open each input file
  for (i = 0; i < vInFiles.size(); ++i)
  {
    igzstream inStream;
    inStream.open (vInFiles[i].c_str());
    if (! inStream.good())
    {
      cerr << "ERROR: can't open file " << vInFiles[i] << endl;
      exit (1);
    }
    vStreams.push_back (&inStream);
  }

  // manipulate each input file
  for (i = 0; i < vInFiles.size(); ++i)
  {
    cout << "read first line of file " << vInFiles[i] << endl;
    getline (*(vStreams[i]), line);
    if (line.empty())
    {
      cerr << "empty line" << endl;
      exit (1);
    }
    cout << line << endl;
  }

  // close each input file
  for (i = 0; i < vInFiles.size(); ++i)
  {
    vStreams[i]->close();
  }
  vStreams.clear();

  return 0;
}

该代码可以正确编译:

$ gcc -Wall test.cpp -lstdc++ -lgzstream -lz

尽管它运行顺利,但它无法正确读取文件:

$ ./a.out
read first line of file infile1.gz
empty line                

As input, I have a list of gzipped files. As shown here, I use gzstream to handle them. For practical reasons, I want to open each file and record each stream into a vector. It seems pretty straightforward but I don't manage to make it work. Here is the minimal code:

#include <cstdlib>

#include <iostream>
#include <vector>
using namespace std;

#include <gzstream.h>

int main (int argc, char ** argv)
{
  size_t i;
  vector<string> vInFiles;
  vector<igzstream *> vStreams;
  string line;

  // create the dummy input files
  system ("rm -f infile*.gz; for i in {1..2}; do echo \"toto\"${i} | gzip > infile${i}.gz; done");
  vInFiles.push_back ("infile1.gz");
  vInFiles.push_back ("infile2.gz");

  // open each input file
  for (i = 0; i < vInFiles.size(); ++i)
  {
    igzstream inStream;
    inStream.open (vInFiles[i].c_str());
    if (! inStream.good())
    {
      cerr << "ERROR: can't open file " << vInFiles[i] << endl;
      exit (1);
    }
    vStreams.push_back (&inStream);
  }

  // manipulate each input file
  for (i = 0; i < vInFiles.size(); ++i)
  {
    cout << "read first line of file " << vInFiles[i] << endl;
    getline (*(vStreams[i]), line);
    if (line.empty())
    {
      cerr << "empty line" << endl;
      exit (1);
    }
    cout << line << endl;
  }

  // close each input file
  for (i = 0; i < vInFiles.size(); ++i)
  {
    vStreams[i]->close();
  }
  vStreams.clear();

  return 0;
}

This code compiles properly:

$ gcc -Wall test.cpp -lstdc++ -lgzstream -lz

And although it run smoothly, it doesn't read the files properly:

$ ./a.out
read first line of file infile1.gz
empty line                

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

羁绊已千年 2024-12-19 03:24:00

迭代结束后,您的流指针无效,因为自动流对象随后被销毁。如果您确实需要,则需要在免费商店上分配它们(或使 igzstream 可移动)。

// std::vector<boost::shared_ptr<igzstream>> for C++03 
std::vector<std::unique_ptr<igzstream>> vStreams;

// ...

for (size_t i = 0; i < vInFiles.size(); ++i) {
    // boost::shared_ptr<igzstream> inStream = boost::make_shared<igzstream>();
    auto inStream = std::unique_ptr<igzstream>(new igzstream);
    inStream->open(...);
    // ...
    vStreams.push_back(inStream);
}

// ...

Your stream pointers are invalid after the iteration ends, as the automatic stream object is destroyed then. If you really need that you need to allocate them on the free store (or make igzstream movable).

// std::vector<boost::shared_ptr<igzstream>> for C++03 
std::vector<std::unique_ptr<igzstream>> vStreams;

// ...

for (size_t i = 0; i < vInFiles.size(); ++i) {
    // boost::shared_ptr<igzstream> inStream = boost::make_shared<igzstream>();
    auto inStream = std::unique_ptr<igzstream>(new igzstream);
    inStream->open(...);
    // ...
    vStreams.push_back(inStream);
}

// ...
放手` 2024-12-19 03:24:00

这已破了;您存储指向流的指针向量,但使用指向流的本地作用域自动实例的指针(在 for 循环内)对其进行初始化。一旦循环的每次迭代完成,该实例就超出了范围,并且您有一个指向一些废话的指针。

然后你稍后再使用这些垃圾,然后你就会得到垃圾。

使用智能指针,例如

  std::vector<boost::shared_ptr<igzstream> > vStreams;
  // to initialize
  for (i = 0; i < vInFiles.size(); ++i)
  {
    boost::shared_ptr<igzstream> inStream(new igzstream(vInFiles[i].c_str());
    if (!inStream->good())
    {
      cerr << "ERROR: can't open file " << vInFiles[i] << endl;
      exit (1);
    }
    vStreams.push_back (inStream); // save the smart pointer
  }

This is broken; you store a vector of pointers to streams, but you initialize it with a pointer to a locally scoped automatic instance of the stream (inside the for-loop). Once each iteration of the loop completes, that instance is out of scope, and you have a pointer to some crap.

You then use that crap later, and you get crap out.

Use a smart pointer, e.g.

  std::vector<boost::shared_ptr<igzstream> > vStreams;
  // to initialize
  for (i = 0; i < vInFiles.size(); ++i)
  {
    boost::shared_ptr<igzstream> inStream(new igzstream(vInFiles[i].c_str());
    if (!inStream->good())
    {
      cerr << "ERROR: can't open file " << vInFiles[i] << endl;
      exit (1);
    }
    vStreams.push_back (inStream); // save the smart pointer
  }
那一片橙海, 2024-12-19 03:24:00

正如评论中提到的,我不想使用 Boost 并且我只有 gcc 4.1.2。因此,感谢 Cat Plus Plus 的建议,这是使用免费商店的解决方案:

  // open each input file
  for (i = 0; i < vInFiles.size(); ++i)
  {
    igzstream * pt_inStream = new igzstream;
    pt_inStream->open (vInFiles[i].c_str());
    if (! pt_inStream->good())
    {
      cerr << "ERROR: can't open file " << vInFiles[i] << endl;
      exit (1);
    }
    vStreams.push_back (pt_inStream);
  }

并且:

  // close each input file                                                                                                                                                                                                                                           
  for (i = 0; i < vInFiles.size(); ++i)
  {
    vStreams[i]->close();
    delete vStreams[i];
  }

As mentioned in the comments, I would prefer not to use Boost and I only have gcc 4.1.2. Thus, here is the solution using the free store, thanks to the suggestion of Cat Plus Plus:

  // open each input file
  for (i = 0; i < vInFiles.size(); ++i)
  {
    igzstream * pt_inStream = new igzstream;
    pt_inStream->open (vInFiles[i].c_str());
    if (! pt_inStream->good())
    {
      cerr << "ERROR: can't open file " << vInFiles[i] << endl;
      exit (1);
    }
    vStreams.push_back (pt_inStream);
  }

And:

  // close each input file                                                                                                                                                                                                                                           
  for (i = 0; i < vInFiles.size(); ++i)
  {
    vStreams[i]->close();
    delete vStreams[i];
  }
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文