Exceptions from boost::iostreams::copy()

Posted on 2024-09-07 15:00:00

In the code below, I have a corrupt "hello.bz2" that has stray characters beyond the EOF.

Is there a way to make the boost::iostreams::copy() call throw?

#include <fstream>
#include <iostream>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/bzip2.hpp>

int main() 
{
    using namespace std;
    using namespace boost::iostreams;

    ifstream file("hello.bz2", ios_base::in | ios_base::binary);
    filtering_streambuf<input> in;
    in.push(bzip2_decompressor());
    in.push(file);
    boost::iostreams::copy(in, cout);
}

EDIT:
Please ignore the line that has attracted the most attention so far: the EOF. Please assume I am working with a corrupted bzip2 file.
I used "EOF" because of the error I get when I run bzcat on the file:

bzcat hello.bz2
hello world

bzcat: hello.bz2: trailing garbage after EOF ignored

Comments (2)

故人如初 2024-09-14 15:00:00

Research

std::ios_base::failure is "the base class for the types of all objects thrown as exceptions, by functions in the Iostreams library, to report errors detected during stream buffer operations."

Looking at the boost docs:

class bzip2_error : public std::ios_base::failure {
public:
    bzip2_error(int error);
    int error() const;
};

bzip2_error is a specific exception thrown when using the bzip2 filter, which inherits from std::ios_base::failure. As you can see, it is constructed by passing in an integer representing the error code. It also has a method error() which returns the error code it was constructed with.
The docs list bzip2 error codes as the following:

  • data_error - Indicates that the compressed data stream is corrupted. Equal to BZ_DATA_ERROR.
  • data_error_magic - Indicates that the compressed data stream does not begin with the 'magic' sequence 'B' 'Z' 'h'. Equal to BZ_DATA_ERROR_MAGIC.
  • config_error - Indicates that libbzip2 has been improperly configured for the current platform. Equal to BZ_CONFIG_ERROR.
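Because bzip2_error derives from std::ios_base::failure, a handler for the base class should catch it as well, and the code can still be recovered afterwards. Here is a minimal sketch of that idea using only the standard library. demo_bzip2_error, describe and catch_as_base are hypothetical stand-ins, not Boost names; the numeric values are the ones bzlib.h assigns to BZ_DATA_ERROR, BZ_DATA_ERROR_MAGIC and BZ_CONFIG_ERROR.

```cpp
#include <ios>
#include <string>

// Hypothetical stand-in that mirrors the documented shape of bzip2_error.
class demo_bzip2_error : public std::ios_base::failure {
public:
    explicit demo_bzip2_error(int error)
        : std::ios_base::failure("bzip2 error"), error_(error) {}
    int error() const { return error_; }
private:
    int error_;
};

// Values taken from bzlib.h; the names are made up for this sketch.
const int demo_data_error       = -4;  // BZ_DATA_ERROR
const int demo_data_error_magic = -5;  // BZ_DATA_ERROR_MAGIC
const int demo_config_error     = -9;  // BZ_CONFIG_ERROR

// Map an error code to a short description.
std::string describe(int code) {
    switch (code) {
        case demo_data_error:       return "compressed data stream is corrupted";
        case demo_data_error_magic: return "stream does not begin with 'B' 'Z' 'h'";
        case demo_config_error:     return "libbzip2 is improperly configured";
        default:                    return "other bzip2 error";
    }
}

// Throw the stand-in, catch it through the base class, then recover the code.
std::string catch_as_base(int code) {
    try {
        throw demo_bzip2_error(code);
    } catch (const std::ios_base::failure& e) {
        const demo_bzip2_error* be = dynamic_cast<const demo_bzip2_error*>(&e);
        return be ? describe(be->error()) : "generic stream failure";
    }
}
```

With the real filter you would presumably catch bzip2_error first and keep a std::ios_base::failure handler as a fallback for other stream errors.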

Code

EDIT
I also want to clarify that boost::iostreams::copy() will not be the one throwing the exception here; the bzip2 filter will. Only the iostream or the filters throw exceptions. copy just uses the iostream/filter, which may cause the iostream/filter to throw.
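To illustrate that point without Boost, here is a rough sketch in which a copy loop contains no error handling of its own and the exception comes from the source it reads. failing_buf and plain_copy are hypothetical names made up for this example; failing_buf plays the role of the filter, plain_copy the role of copy().

```cpp
#include <ios>
#include <streambuf>
#include <string>

// Hypothetical source: serves the given bytes, then throws when asked for more,
// the way a decompressing filter might on hitting corrupt data.
class failing_buf : public std::streambuf {
public:
    explicit failing_buf(const std::string& data) : data_(data) {
        char* p = &data_[0];
        setg(p, p, p + data_.size());
    }
protected:
    int_type underflow() override {
        throw std::ios_base::failure("simulated corrupt stream");
    }
private:
    std::string data_;
};

// A plain copy loop: it never throws by itself, it only pulls from src.
void plain_copy(std::streambuf& src, std::string& out) {
    typedef std::streambuf::traits_type traits;
    for (;;) {
        const traits::int_type c = src.sbumpc();
        if (traits::eq_int_type(c, traits::eof()))
            break;
        out.push_back(traits::to_char_type(c));
    }
}
```

Calling plain_copy on a failing_buf built from "hello" appends "hello" to out and then lets the exception from underflow() propagate to the caller, which is similar to seeing decompressed output before the error is reported.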

EDIT 2
It appears the problem is with bzip2_decompressor_impl, as you expected. I have replicated the endless spinning loop when the bz2 file is empty. It took me a little while to figure out how to build Boost and link against the bzip2, zlib, and iostreams libraries to see if I could replicate your results.

g++ test.cpp -lz -lbz2 boostinstall/boost/bin.v2/libs/iostreams/build/darwin-4.2.1/release/link-static/threading-multi/libboost_iostreams.a -Lboostinstall/boost/bin.v2/libs/ -Iboost/include/boost-1_42 -g

test.cpp:

#include <fstream>
#include <iostream>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/bzip2.hpp>

int main()
{
    using namespace std;
    using namespace boost::iostreams;

    try {
        ifstream file("hello.bz2", ios_base::in | ios_base::binary);
        filtering_streambuf<input> in;
        in.push(bzip2_decompressor());
        in.push(file);
        boost::iostreams::copy(in, cout);
    }
    catch(const bzip2_error& exception) {
        int error = exception.error();

        if(error == boost::iostreams::bzip2::data_error) {
            // compressed data stream is corrupted
            cout << "compressed data stream is corrupted";
        }
        else if(error == boost::iostreams::bzip2::data_error_magic)
        {
            // compressed data stream does not begin with the 'magic' sequence 'B' 'Z' 'h'
            cout << "compressed data stream does not begin with the 'magic' sequence 'B' 'Z' 'h'";
        }
        else if(error == boost::iostreams::bzip2::config_error) {
            // libbzip2 has been improperly configured for the current platform
            cout << "libbzip2 has been improperly configured for the current platform";
        }
    }
}

Debugging:

gdb a.out
(gdb) b bzip2.hpp:344

A loop in symmetric.hpp:109 drives the bzip2 decompression:

        while (true)
        {
            // Invoke filter if there are unconsumed characters in buffer or if
            // filter must be flushed.
            bool flush = status == f_eof;
            if (buf.ptr() != buf.eptr() || flush) {
                const char_type* next = buf.ptr();
                bool done =
                    !filter().filter(next, buf.eptr(), next_s, end_s, flush);
                buf.ptr() = buf.data() + (next - buf.data());
                if (done)
                    return detail::check_eof(
                               static_cast<std::streamsize>(next_s - s)
                           );
            }

            // If no more characters are available without blocking, or
            // if read request has been satisfied, return.
            if ( (status == f_would_block && buf.ptr() == buf.eptr()) ||
                 next_s == end_s )
            {
                return static_cast<std::streamsize>(next_s - s);
            }

            // Fill buffer.
            if (status == f_good)
                status = fill(src);
        }

bzip2_decompressor_impl's filter method (bzip2.hpp:344) gets called at symmetric.hpp:117:

template<typename Alloc>
bool bzip2_decompressor_impl<Alloc>::filter
    ( const char*& src_begin, const char* src_end,
      char*& dest_begin, char* dest_end, bool /* flush */ )
{
    if (!ready())
        init();
    if (eof_)
        return false;
    before(src_begin, src_end, dest_begin, dest_end);
    int result = decompress();
    after(src_begin, dest_begin);
    bzip2_error::check BOOST_PREVENT_MACRO_SUBSTITUTION(result);
    return !(eof_ = result == bzip2::stream_end);
}

I think the problem is simple: bzip2_decompressor_impl's eof_ flag never gets set. Unless it's supposed to happen in some magic way I don't understand, it's owned by the bzip2_decompressor_impl class and it's only ever being set to false. So when we do this:

cat /dev/null > hello.bz2

We get a spinning loop that never ends; it doesn't break when EOF is hit. This is certainly a bug, because other programs (like vim) have no problem opening a text file created in a similar manner. However, I am able to get the filter to throw when the bz2 file is "corrupted":

echo "other corrupt" > hello.bz2
./a.out
compressed data stream does not begin with the 'magic' sequence 'B' 'Z' 'h'

Sometimes you have to take open source code with a grain of salt. It is more likely that your bz2 files will be corrupted and properly throw. However, the /dev/null case is a serious bug. We should submit it to the Boost developers so they can fix it.

千仐 2024-09-14 15:00:00

How do you have stray characters beyond the end of the file?

If you mean that the file has garbage data in it, how would the decompression algorithm be able to tell whether or not the data is garbage to be able to make a decision to throw?
