
Decoding char UTF8 escapes in Boost Spirit

Posted 2024-12-14 21:16:19

Question asked on the Spirit-general list:

Hello all,

I'm not sure if my subject is correct, but the testcode will probably show
what I want to achieve.

I'm trying to parse things like:

'%40' to '@'
'%3C' to '<'

I have a minimal testcase below. I don't understand why
this doesn't work. It's probably me making a mistake but I don't see it.

Using:
Compiler: gcc 4.6
Boost: current trunk

I use the following compile line:

g++ -o main -L/usr/src/boost-trunk/stage/lib -I/usr/src/boost-trunk -g -Werror -Wall -std=c++0x -DBOOST_SPIRIT_USE_PHOENIX_V3 main.cpp

#include <iostream>
#include <string>

#define BOOST_SPIRIT_UNICODE

#include <boost/cstdint.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/phoenix/phoenix.hpp>

typedef boost::uint32_t uchar; // Unicode codepoint

namespace qi = boost::spirit::qi;

int main(int argc, char **argv) {

    // Input
    std::string input = "%3C";
    std::string::const_iterator begin = input.begin();
    std::string::const_iterator end = input.end();

    using qi::xdigit;
    using qi::_1;
    using qi::_2;
    using qi::_val;

    qi::rule<std::string::const_iterator, uchar()> pchar =
        ('%' > xdigit > xdigit) [_val = (_1 << 4) + _2];

    std::string result;
    bool r = qi::parse(begin, end, pchar, result);
    if (r && begin == end) {
        std::cout << "Output:   " << result << std::endl;
        std::cout << "Expected: < (LESS-THAN SIGN)" << std::endl;
    } else {
        std::cerr << "Error" << std::endl;
        return 1;
    }

    return 0;
}

Regards,

Matthijs Möhlmann

Reply from 寄居人, 2024-12-21 21:16:19

qi::xdigit does not do what you think it does: it returns the raw character (i.e. '0', not 0x00).
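
To see why the original semantic action goes wrong, here is a small sketch in plain C++ (no Spirit involved). The helper names hex_value, combine_raw and combine_digits are made up for this illustration, and ASCII input is assumed. It contrasts shifting the character codes, which is what (_1 << 4) + _2 ends up doing when xdigit yields '3' and 'C', with shifting the digit values.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Boost.Spirit): numeric value of one
// hex digit, or -1 if c is not a hex digit. Assumes an ASCII-compatible charset.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Combines the raw character codes, as happens when each digit parser
// hands back the character itself ('3' is 0x33, 'C' is 0x43).
std::uint32_t combine_raw(char hi, char lo) {
    return (static_cast<std::uint32_t>(hi) << 4) + static_cast<std::uint32_t>(lo);
}

// Combines the digit values instead. Assumes both arguments are valid hex digits.
std::uint32_t combine_digits(char hi, char lo) {
    return (static_cast<std::uint32_t>(hex_value(hi)) << 4)
         + static_cast<std::uint32_t>(hex_value(lo));
}
```

With the input from the question, combine_raw('3', 'C') gives 883 (0x373), while combine_digits('3', 'C') gives 60, the code for '<'.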

You could leverage qi::uint_parser to your advantage, making your parse much simpler as a bonus:

typedef qi::uint_parser<uchar, 16, 2, 2> xuchar;

This has two benefits:

- no need to rely on Phoenix (so it also works on older versions of Boost)
- both hex digits are parsed in one go (otherwise, you might have needed to add copious casts to prevent integer sign extension)

Here is a fixed up sample:

#include <iostream>
#include <string>

#define BOOST_SPIRIT_UNICODE

#include <boost/cstdint.hpp>
#include <boost/spirit/include/qi.hpp>

typedef boost::uint32_t uchar; // Unicode codepoint

namespace qi = boost::spirit::qi;

typedef qi::uint_parser<uchar, 16, 2, 2> xuchar;
const static xuchar xuchar_ = xuchar();


int main(int argc, char **argv) {

    // Input
    std::string input = "%3C";
    std::string::const_iterator begin = input.begin();
    std::string::const_iterator end = input.end();

    qi::rule<std::string::const_iterator, uchar()> pchar = '%' > xuchar_;

    uchar result;
    bool r = qi::parse(begin, end, pchar, result);

    if (r && begin == end) {
        std::cout << "Output:   " << result << std::endl;
        std::cout << "Expected: < (LESS-THAN SIGN)" << std::endl;
    } else {
        std::cerr << "Error" << std::endl;
        return 1;
    }

    return 0;
}

Output:

Output:   60
Expected: < (LESS-THAN SIGN)

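'<' is indeed ASCII 60 (0x3C). The value prints as a number because result is a 32-bit unsigned integer rather than a char.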
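
For comparison, the same decoding can be sketched by hand for a whole string, without Spirit. This is only an illustration under stated assumptions: percent_decode and hex_digit_value are hypothetical names, and each %XX escape is taken to stand for a single byte, so a non-ASCII character in UTF-8 arrives as several consecutive escapes.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: value of a hex digit that std::isxdigit has already accepted.
int hex_digit_value(unsigned char c) {
    if (c >= '0' && c <= '9') return c - '0';
    return std::tolower(c) - 'a' + 10;
}

// Replaces every "%XX" escape in `in` with the byte it names and copies
// all other characters unchanged. Throws std::invalid_argument when a '%'
// is not followed by exactly two hex digits.
std::string percent_decode(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '%') {
            out += in[i];
            continue;
        }
        if (i + 2 >= in.size()) {
            throw std::invalid_argument("truncated escape");
        }
        const unsigned char hi = static_cast<unsigned char>(in[i + 1]);
        const unsigned char lo = static_cast<unsigned char>(in[i + 2]);
        if (!std::isxdigit(hi) || !std::isxdigit(lo)) {
            throw std::invalid_argument("invalid hex digit in escape");
        }
        out += static_cast<char>((hex_digit_value(hi) << 4) | hex_digit_value(lo));
        i += 2;  // skip the two digits just consumed
    }
    return out;
}
```

Called as percent_decode("%3C") this yields the one-character string "<", and percent_decode("a%40b") yields "a@b". The result is a byte string, so printing it shows the character directly instead of its numeric code.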
