Decode char UTF8 escapes in Boost Spirit
Question asked on: Spirit-general list
Hello all,
I'm not sure if my subject is correct, but the test code will probably show what I want to achieve.
I'm trying to parse things like:
- '%40' to '@'
- '%3C' to '<'
I have a minimal test case below. I don't understand why it doesn't work. It's probably me making a mistake, but I don't see it.
Using:
Compiler: gcc 4.6
Boost: current trunk
I use the following compile line:
g++ -o main -L/usr/src/boost-trunk/stage/lib -I/usr/src/boost-trunk -g -Werror -Wall -std=c++0x -DBOOST_SPIRIT_USE_PHOENIX_V3 main.cpp
#include <iostream>
#include <string>

#define BOOST_SPIRIT_UNICODE
#include <boost/cstdint.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/phoenix/phoenix.hpp>

typedef boost::uint32_t uchar; // Unicode codepoint

namespace qi = boost::spirit::qi;

int main(int argc, char **argv) {
    // Input
    std::string input = "%3C";
    std::string::const_iterator begin = input.begin();
    std::string::const_iterator end = input.end();

    using qi::xdigit;
    using qi::_1;
    using qi::_2;
    using qi::_val;

    qi::rule<std::string::const_iterator, uchar()> pchar =
        ('%' > xdigit > xdigit) [_val = (_1 << 4) + _2];

    std::string result;
    bool r = qi::parse(begin, end, pchar, result);
    if (r && begin == end) {
        std::cout << "Output: " << result << std::endl;
        std::cout << "Expected: < (LESS-THAN SIGN)" << std::endl;
    } else {
        std::cerr << "Error" << std::endl;
        return 1;
    }
    return 0;
}
Regards,
Matthijs Möhlmann
Comments (1)
qi::xdigit does not do what you think it does: it returns the raw character (i.e. '0', not 0x00).
You could leverage qi::uint_parser to your advantage, making your parse much simpler as a bonus.
Here is a fixed up sample:
Output:
'<' is indeed ASCII 60