Decoding char UTF8 escapes in Boost Spirit

Posted on 2024-12-14 21:16:19

Question asked on: Spirit-general list

Hello all,

I'm not sure if my subject line is correct, but the test code will probably show
what I want to achieve.

I'm trying to parse things like:

  • '%40' to '@'
  • '%3C' to '<'

I have a minimal test case below. I don't understand why
it doesn't work. I'm probably making a mistake, but I don't see it.

Using:
Compiler: gcc 4.6
Boost: current trunk

I use the following compile line:

g++ -o main -L/usr/src/boost-trunk/stage/lib -I/usr/src/boost-trunk -g -Werror -Wall -std=c++0x -DBOOST_SPIRIT_USE_PHOENIX_V3 main.cpp

#include <iostream>
#include <string>

#define BOOST_SPIRIT_UNICODE

#include <boost/cstdint.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/phoenix/phoenix.hpp>

typedef boost::uint32_t uchar; // Unicode codepoint

namespace qi = boost::spirit::qi;

int main(int argc, char **argv) {

    // Input
    std::string input = "%3C";
    std::string::const_iterator begin = input.begin();
    std::string::const_iterator end = input.end();

    using qi::xdigit;
    using qi::_1;
    using qi::_2;
    using qi::_val;

    qi::rule<std::string::const_iterator, uchar()> pchar =
        ('%' > xdigit > xdigit) [_val = (_1 << 4) + _2];

    std::string result;
    bool r = qi::parse(begin, end, pchar, result);
    if (r && begin == end) {
        std::cout << "Output:   " << result << std::endl;
        std::cout << "Expected: < (LESS-THAN SIGN)" << std::endl;
    } else {
        std::cerr << "Error" << std::endl;
        return 1;
    }

    return 0;
}

Regards,

Matthijs Möhlmann

Comments (1)

寄居人 2024-12-21 21:16:19

qi::xdigit does not do what you think it does: it returns the raw character (i.e. '0', not 0x00).
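To make the difference concrete, here is a minimal sketch in plain C++ (no Boost) of what happens when the two character codes are combined directly, compared with converting each hex digit to its value first. The helper names `hex_value`, `combine_raw` and `combine_digits` are hypothetical and only serve this illustration; an ASCII-compatible character set is assumed.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Boost): map one hex digit character
// to its numeric value 0..15, or return -1 if it is not a hex digit.
// Assumes an ASCII-compatible character set.
inline int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Roughly what (_1 << 4) + _2 evaluates to when _1 and _2 hold the raw
// characters: the character codes themselves are shifted and added.
inline std::uint32_t combine_raw(char hi, char lo) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(hi)) << 4)
         + static_cast<unsigned char>(lo);
}

// The intended computation: convert each digit to its value first.
// Assumes both arguments are valid hex digits.
inline std::uint32_t combine_digits(char hi, char lo) {
    return (static_cast<std::uint32_t>(hex_value(hi)) << 4)
         + static_cast<std::uint32_t>(hex_value(lo));
}
```

With ASCII, `combine_raw('3', 'C')` works on the codes 0x33 and 0x43 and yields 883, while `combine_digits('3', 'C')` yields 60, the code of '<'.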

You could leverage qi::uint_parser to your advantage, which also makes the parser much simpler:

typedef qi::uint_parser<uchar, 16, 2, 2> xuchar;

This has two benefits:

  • no need to rely on Phoenix (so it also works on older versions of Boost)
  • both hex digits are parsed in one go (otherwise, you might have needed to add copious casting to prevent integer sign extension)
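The sign-extension concern mentioned in the last bullet can be sketched in plain C++. Whether plain `char` is signed is implementation-defined, so this sketch uses `signed char` to make the effect reproducible; the function names `widen_naive` and `widen_safe` are hypothetical, and this is only one plausible reading of the casting the answer alludes to.

```cpp
#include <cstdint>

// Converting a negative character value straight to a 32-bit unsigned
// integer sign-extends it: the upper 24 bits are all set.
inline std::uint32_t widen_naive(signed char c) {
    return static_cast<std::uint32_t>(c);
}

// Going through unsigned char first keeps only the low 8 bits.
inline std::uint32_t widen_safe(signed char c) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(c));
}
```

For a byte with the bit pattern 0xC3 (the value -61 as a `signed char`), `widen_naive` gives 0xFFFFFFC3 while `widen_safe` gives 0xC3. Parsing both digits directly into a `uchar` avoids handling `char` values at all.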

Here is a fixed-up sample:

#include <iostream>
#include <string>

#define BOOST_SPIRIT_UNICODE

#include <boost/cstdint.hpp>
#include <boost/spirit/include/qi.hpp>

typedef boost::uint32_t uchar; // Unicode codepoint

namespace qi = boost::spirit::qi;

typedef qi::uint_parser<uchar, 16, 2, 2> xuchar;
const static xuchar xuchar_ = xuchar();


int main(int argc, char **argv) {

    // Input
    std::string input = "%3C";
    std::string::const_iterator begin = input.begin();
    std::string::const_iterator end = input.end();

    qi::rule<std::string::const_iterator, uchar()> pchar = '%' > xuchar_;

    uchar result;
    bool r = qi::parse(begin, end, pchar, result);

    if (r && begin == end) {
        std::cout << "Output:   " << result << std::endl;
        std::cout << "Expected: < (LESS-THAN SIGN)" << std::endl;
    } else {
        std::cerr << "Error" << std::endl;
        return 1;
    }

    return 0;
}

Output:

Output:   60
Expected: < (LESS-THAN SIGN)

'<' is indeed ASCII 60 (0x3C). The value is printed as a number because result is an integer (uchar), not a character.
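If the goal is to print the character rather than its number, the parsed value has to be turned back into text. A minimal sketch, assuming the value is treated as a Unicode code point (following the `uchar` typedef) and the output should be UTF-8; `to_utf8` is a hypothetical helper, not something Boost Spirit provides under that name.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Returns an empty string for surrogates and values above U+10FFFF.
inline std::string to_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {                      // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {              // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {            // 3 bytes
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogate
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {          // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

In the fixed-up sample, writing `to_utf8(result)` to `std::cout` instead of `result` would print `<` for the input "%3C". For values below 0x80 a plain `static_cast<char>(result)` would do the same.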
