Boost Spirit fails when using the alternative operator '|' with two possible rules

Posted on 2024-12-05 01:56:06


I am working on an HTTP parser. I ran into a problem when parsing with the alternative operator. It is not about the attribute values, which I can fix using hold[]. The problem occurs when two rules begin the same way. Here are some simple rules that demonstrate it:

qi::rule<string_iterator> some_rule(
        (char_('/') >> *char_("0-9")) /*first rule accept  /123..*/
      | (char_('/') >> *char_("a-z")) /*second rule accept /abc..*/
    );

Then I parse this rule using qi::parse. It fails if the input string looks like "/abcd".

However, when I put the second rule before the first one, the parser returns true. I think the problem is that the parser consumes input with the first rule, then finds that the first rule fails, and does not go back to the second rule, which is the alternative to the first.

I tried putting hold[] on the first rule, but it only helps with the generated attribute. It doesn't fix this problem. I have no idea how to fix it, since HTTP has a lot of rules that begin the same way as other rules.

===========more info about my code============================
Here is my function for parsing a string:

typedef std::string::const_iterator string_iterator;
typedef qi::rule<string_iterator, std::string()> rules_t;
void parse_to_string(const std::string& s, rules_t& r, std::string& result)
{
    using namespace rule;
    using qi::parse;

    std::string::const_iterator iter = s.begin();
    std::string::const_iterator end = s.end();

    bool err = parse(iter, end, r, result);

    if ( err && (iter==end) )
    {
           std::cout << "[correct]" << result << std::endl;
    }
    else
    {
          std::cout << "[incorrect]" << s << std::endl;
          std::cout << "[dead with]" << result << std::endl;
    }
}

In main I have this code:

std::string str, result;
result = "";
str = "/htmlquery?";
qi::rule<string_iterator, std::string()> rule_wo_question( char_('/') >> *char_("a-z"));
qi::rule<string_iterator, std::string()> rule_w_question( char_('/') >> *char_("a-z") >> char_('?'));
qi::rule<string_iterator, std::string()> whatever_rule( rule_wo_question
                                                        | rule_w_question
                                                       );
parse_to_string(str, whatever_rule, result);

I get this result;

[incorrect]/htmlquery?
[dead with]/htmlquery <= you can see it cannot consume '?'

However, when I switch the rules like this (putting "rule_w_question" before "rule_wo_question"):

std::string str, result;
result = "";
str = "/htmlquery?";
qi::rule<string_iterator, std::string()> rule_wo_question( char_('/') >> *char_("a-z"));
qi::rule<string_iterator, std::string()> rule_w_question( char_('/') >> *char_("a-z") >> char_('?'));
qi::rule<string_iterator, std::string()> whatever_rule( rule_w_question
                                                        | rule_wo_question
                                                       );
parse_to_string(str, whatever_rule, result);

The output will be:
[correct]/htmlquery?

In the first version (the wrong one), the parser seems to consume '/htmlquery' with "rule_wo_question" and then finds that it cannot consume '?', which makes the parse fail. It then cannot move on to the alternative rule ("rule_w_question"), so the program prints "[incorrect]".

In the second version I put "rule_w_question" before "rule_wo_question". This is why the parser returns "[correct]".

==============================================================
My whole code, built with Boost 1.47 and linked with pthread and boost_filesystem.
Here is my main .c file:

#include <iostream>   // std::cout, std::endl
#include <string>     // std::string

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/network/protocol.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
#include <boost/spirit/include/phoenix_fusion.hpp>
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/bind.hpp>
#include <boost/spirit/include/qi_uint.hpp>

using namespace boost::spirit::qi;
namespace qi = boost::spirit::qi;

typedef std::string::const_iterator string_iterator;
typedef qi::rule<string_iterator, std::string()> rules_t;
void parse_to_string(const std::string& s, rules_t& r, std::string& result)
{
    using qi::parse;

    std::string::const_iterator iter = s.begin();
    std::string::const_iterator end = s.end();

    bool err = parse(iter, end, r, result);

    if ( err && (iter==end) )
    {
           std::cout << "[correct]" << result << std::endl;
    }
    else
    {
          std::cout << "[incorrect]" << s << std::endl;
          std::cout << "[dead with]" << result << std::endl;
    }
}





int main()
{
    std::string str, result;
    result = "";
    str = "/htmlquery?";
    qi::rule<string_iterator, std::string()> rule_wo_question( char_('/') >> *char_("a-z"));
    qi::rule<string_iterator, std::string()> rule_w_question( char_('/') >> *char_("a-z") >> char_('?'));
    qi::rule<string_iterator, std::string()> whatever_rule( rule_wo_question
                                                           | rule_w_question
                                                           );
    parse_to_string(str, whatever_rule, result);
    return 0;
}

The result is:

[incorrect]/htmlquery?

[dead with]/htmlquery


冰葑 2024-12-12 01:56:06

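Spirit tries the given alternatives in the order they are specified and stops parsing after the first one that matches. No exhaustive matching is performed: once one alternative matches, it stops looking. In other words, the order of the alternatives matters. You should always list the "longest" alternative first.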


白馒头 2024-12-12 01:56:06

Any reason why you don't do this instead?

some_rule(
     char_('/')
     >> (
         *char_("0-9")  /*first rule accept /123..*/
       | *char_("a-z")  /*second rule accept /abc..*/
     )
);

Edit: Actually, that would match '/' followed by nothing ("0-9" zero times) and would never try "a-z". Change * to +.


最美的太阳 2024-12-12 01:56:06

qi::rule<string_iterator> some_rule(
    (char_('/') >> *char_("0-9")) >> qi::eol /*first rule accept  /123..*/
  | (char_('/') >> *char_("a-z")) >> qi::eol /*second rule accept /abc..*/
);

Instead of eol you could use ',' or some other terminator. The problem is that (char_('/') >> *char_("0-9")) matches '/' followed by 0 or more digits. So "/abcd" matches the "/" and then stops parsing. K-ballo's solution is the way I would handle this case, but this one is offered as an alternative in case (for some reason) his is not acceptable.


暗恋未遂 2024-12-12 01:56:06

It's because there is a match for your first rule, and Spirit is greedy.

(char_('/') >> *char_("0-9"))

Feeding "/abcd" into this rule will result in the following logic:

"/abcd" -> Is '/' the next character? Yes. Subrule matches. -> "abcd" remains.
"abcd" -> Are there 0 or more digits? Yes. There are 0 digits. Subrule matches. -> "abcd" remains.
First clause of alternative ('|') statement matches; skip remaining alternative clauses. -> "abcd" remains.
The rule matches with "abcd" remaining, which probably then doesn't parse and causes your failure.

You might consider changing the '*', which means "0 or more", to a '+', which means "1 or more".
