Boost.Spirit alternative operator '|' fails when there are two possible rules
I am working on an HTTP parser and ran into a problem when parsing with the alternative operator. It is not about the attribute values; those I can fix using hold[]. The problem occurs when two rules start the same way. Here are some simple rules that demonstrate it:
qi::rule<string_iterator> some_rule(
(char_('/') >> *char_("0-9")) /*first rule accept /123..*/
| (char_('/') >> *char_("a-z")) /*second rule accept /abc..*/
);
Then I parse with this rule using qi::parse. It fails if the input string is something like "/abcd".
However, when I put the second rule before the first one, the parser returns true.
I think the problem is that the parser consumes input with the first rule and then finds that the first rule fails. It won't go back to the second rule, which is the alternative to the first rule.
I tried putting hold[] on the first rule, but it only helps with generating the attribute; it doesn't fix this problem. I have no idea how to fix it, since HTTP has a lot of rules that begin the same way as other rules.
===========more info about my code============================
Here is my function for parsing a string:
typedef std::string::const_iterator string_iterator;
typedef qi::rule<string_iterator, std::string()> rules_t;

void parse_to_string(const std::string& s, rules_t& r, std::string& result)
{
    using namespace rule;
    using qi::parse;
    std::string::const_iterator iter = s.begin();
    std::string::const_iterator end = s.end();
    // Despite its name, 'err' is true when the rule matched.
    bool err = parse(iter, end, r, result);
    // Report success only if the rule matched AND all input was consumed.
    if ( err && (iter==end) )
    {
        std::cout << "[correct]" << result << std::endl;
    }
    else
    {
        std::cout << "[incorrect]" << s << std::endl;
        std::cout << "[dead with]" << result << std::endl;
    }
}
In main I have this code:
std::string str, result;
result = "";
str = "/htmlquery?";
qi::rule<string_iterator, std::string()> rule_wo_question( char_('/') >> *char_("a-z"));
qi::rule<string_iterator, std::string()> rule_w_question( char_('/') >> *char_("a-z") >> char_('?'));
qi::rule<string_iterator, std::string()> whatever_rule( rule_wo_question
                                                      | rule_w_question
                                                      );
parse_to_string(str, whatever_rule, result);
I get this result:
[incorrect]/htmlquery?
[dead with]/htmlquery <= you can see it cannot consume '?'
However, when I switch the rules like this (I put "rule_w_question" before "rule_wo_question"):
std::string str, result;
result = "";
str = "/htmlquery?";
qi::rule<string_iterator, std::string()> rule_wo_question( char_('/') >> *char_("a-z"));
qi::rule<string_iterator, std::string()> rule_w_question( char_('/') >> *char_("a-z") >> char_('?'));
qi::rule<string_iterator, std::string()> whatever_rule( rule_w_question
                                                      | rule_wo_question
                                                      );
parse_to_string(str, whatever_rule, result);
The output will be:
[correct]/htmlquery?
In the first version (the wrong one), it seems the parser consumes "/htmlquery" ("rule_wo_question") and then finds that it cannot consume '?', which makes this rule fail.
Then it cannot move on to the alternative rule ("rule_w_question"). Finally the program returns "[incorrect]".
In the second version I put "rule_w_question" before "rule_wo_question". This is why the parser returns "[correct]" as the result.
==============================================================
My whole code, built with Boost 1.47 and linked with pthread and boost_filesystem.
Here is my main .c:
#include <iostream>   // std::cout, std::endl
#include <string>     // std::string
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/network/protocol.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
#include <boost/spirit/include/phoenix_fusion.hpp>
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/bind.hpp>
#include <boost/spirit/include/qi_uint.hpp>

using namespace boost::spirit::qi;
namespace qi = boost::spirit::qi;

typedef std::string::const_iterator string_iterator;
typedef qi::rule<string_iterator, std::string()> rules_t;

void parse_to_string(const std::string& s, rules_t& r, std::string& result)
{
    using qi::parse;
    std::string::const_iterator iter = s.begin();
    std::string::const_iterator end = s.end();
    // Despite its name, 'err' is true when the rule matched.
    bool err = parse(iter, end, r, result);
    // Report success only if the rule matched AND all input was consumed.
    if ( err && (iter==end) )
    {
        std::cout << "[correct]" << result << std::endl;
    }
    else
    {
        std::cout << "[incorrect]" << s << std::endl;
        std::cout << "[dead with]" << result << std::endl;
    }
}

int main()
{
    std::string str, result;
    result = "";
    str = "/htmlquery?";
    qi::rule<string_iterator, std::string()> rule_wo_question( char_('/') >> *char_("a-z"));
    qi::rule<string_iterator, std::string()> rule_w_question( char_('/') >> *char_("a-z") >> char_('?'));
    qi::rule<string_iterator, std::string()> whatever_rule( rule_wo_question
                                                          | rule_w_question
                                                          );
    parse_to_string(str, whatever_rule, result);
    return 0;
}
The result is:
[incorrect]/htmlquery?
[dead with]/htmlquery
Spirit tries given alternatives in the sequence they are specified and stops parsing after it matched the first one. No exhaustive matching is performed. If one alternative matches it stops looking. IOW, the sequence of alternatives is important. You should always list the 'longest' alternatives first.
Any reason why you don't do this instead?
Edit: Actually that would match '/' followed by nothing ("0-9" zero times) and won't bother looking for "a-z". Change '*' to '+'.

Instead of eol you could use ',' or some other terminator. The problem is that (char_('/') >> *char_("0-9")) matches '/' followed by 0 or more digits. So "/abcd" matches the "/" and then stops parsing. K-ballo's solution is the way I would handle this case, but this solution is provided as an alternative in case (for some reason) his is not acceptable.
It's because there is a match for your first rule, and Spirit is greedy.
Feeding "/abcd" into this rule will result in the following logic:
You might consider changing the '*', which means "0 or more", to a '+', which means "1 or more".