Update parser to allow parentheses within quoted strings
I need to update a parser to support the following new features, but I cannot get all of them working at the same time:
- Commands must accept a variable number of parameters (> 0).
- Parameters may be numbers, unquoted strings or quoted strings.
- Parameters are separated by commas.
- Within quoted strings, opening/closing parentheses must be permitted.
(These requirements are easier to understand by looking at the source code example.)
My current code, including checks, is as follows:
Godbolt link: https://godbolt.org/z/5d6o53n9h
#include <boost/fusion/adapted/struct/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <algorithm>
#include <array>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

namespace script
{
    struct Command
    {
        enum Type { NONE, WRITE_LOG, INSERT_LABEL, START_PROCESS, END_PROCESS, COMMENT, FAIL };
        Type type{ Type::NONE };
        std::vector<std::string> args;
    };
    using Commands = std::vector<Command>;
}//namespace script

BOOST_FUSION_ADAPT_STRUCT(script::Command, type, args)

namespace script
{
    namespace qi = boost::spirit::qi;

    template <typename It>
    class Parser : public qi::grammar<It, Commands()>
    {
    private:
        qi::symbols<char, Command::Type> type;
        // By its very nature "fail" must be the last alternative to be checked
        qi::rule<It, Command(), qi::blank_type> none, command, comment, fail;
        qi::rule<It, Commands()> start;
    public:
        Parser() : Parser::base_type(start)
        {
            using namespace qi;//NOTE: "as_string" is necessary in all args due to std::vector<std::string>
            auto empty_args = copy(attr(std::vector<std::string>{}));
            type.add
                ("WriteLog", Command::WRITE_LOG)
                ("InsertLabel", Command::INSERT_LABEL)
                ("StartProcess", Command::START_PROCESS)
                ("EndProcess", Command::END_PROCESS);
            none = omit[*blank] >> &(eol | eoi)
                >> attr(Command::NONE)
                >> empty_args;//ignore args
            command = type >> '('
                >> as_string[lexeme[+~char_("(),\r\n")]] % ',' >> ')';
            comment = lit("//")
                >> attr(Command::COMMENT)
                >> as_string[lexeme[*~char_("\r\n")]];
            fail = omit[*~char_("\r\n")]
                >> attr(Command::FAIL)
                >> empty_args;//ignore args
            start = skip(blank)[(none | command | comment | fail) % eol] >> eoi;
        }
    };

    Commands parse(std::istream& in)
    {
        using It = boost::spirit::istream_iterator;
        static const Parser<It> parser;
        Commands commands;
        It first(in >> std::noskipws), last;//No white space skipping
        if (!qi::parse(first, last, parser, commands))
            throw std::runtime_error("command parse error");
        return commands;
    }
}//namespace script

// Blank lines inside the raw string are significant: each one yields a NONE command
std::stringstream ss{
R"(// just a comment

WriteLog("this is a log")
WriteLog("this is also (in another way) a log")
WriteLog("but this is just a fail)

StartProcess(17, "program.exe", True)
StartProcess(17, "this_is_a_fail.exe, True)
)"};

int main()
{
    using namespace script;
    try
    {
        auto commands = script::parse(ss);
        // FAIL entries may carry any number of arguments; -1 is a convenience flag meaning "don't care"
        std::array args{ 0, 0, 1, 1, -1, 0, 3, -1, 0 };
        std::array types{ Command::COMMENT, Command::NONE, Command::WRITE_LOG, Command::WRITE_LOG, Command::FAIL, Command::NONE, Command::START_PROCESS, Command::FAIL, Command::NONE };
        std::cout << std::boolalpha << "size correct? " << (commands.size() == 9) << std::endl;
        std::cout << "types correct? " << std::equal(commands.begin(), commands.end(), types.begin(), types.end(), [](auto& cmd, auto& type) { return cmd.type == type; }) << std::endl;
        std::cout << "arguments correct? " << std::equal(commands.begin(), commands.end(), args.begin(), args.end(), [](auto& cmd, auto arg) { return cmd.args.size() == arg || arg == -1; }) << std::endl;
    }
    catch (std::exception const& e)
    {
        std::cout << e.what() << "\n";
    }
}
Any help with this will be appreciated.
Comments (1)
You say you want to allow parentheses within quoted strings. But you don't even support quoted strings!
So the problem is your argument rule, which doesn't even exist yet. It would be roughly the as_string[lexeme[+~char_("(),\r\n")]] part of command, factored out into a rule named argument that is declared on its own.
In fact, I rewrote the tests in an organized fashion (Live On Compiler Explorer) to see what we get right now.
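To make the first point concrete, here is a minimal library-free sketch of what the existing argument expression accepts. The helper name match_current_argument is hypothetical; it only mimics the character class +~char_("(),\r\n") by hand and is not Spirit code.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: mimics +~char_("(),\r\n") by taking the longest prefix
// that contains none of the excluded characters.
// An empty result means "no match" (the rule needs at least one character).
std::string match_current_argument(std::string_view input)
{
    auto const end = input.find_first_of("(),\r\n");
    return std::string(input.substr(0, end));
}
```

Fed with the text that follows the opening parenthesis of a command, it keeps the double quotes as ordinary characters and stops at the first parenthesis, which is why the line containing (in another way) cannot match.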
As it turns out, the current parser fails on quoted strings too, at least by my expectation. That's because quoting is a language construct: in the AST (the parsed result) you do not care how exactly a value was written in code. E.g. "hello\ world\041" might be equivalent to "hello world!", so both should result in the argument value hello world!.
So, let's do as we say.
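The difference between the source spelling and the parsed value can be sketched without any parser library. The function unquote and its escape rules are assumptions made for illustration (octal escapes of up to three digits, and a backslash before any other character standing for that character); the original post does not spell out the escape syntax.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: turns the source spelling of a quoted string into the
// value that would be stored in the AST. Assumed escape rules: "\ooo" with
// 1-3 octal digits is a character code, and a backslash followed by any
// other character stands for that character itself.
// Returns std::nullopt when the text is not a properly quoted literal.
std::optional<std::string> unquote(std::string_view text)
{
    if (text.size() < 2 || text.front() != '"' || text.back() != '"')
        return std::nullopt;
    text = text.substr(1, text.size() - 2); // drop the surrounding quotes

    std::string value;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char const c = text[i];
        if (c == '"')
            return std::nullopt; // unescaped quote inside the literal
        if (c != '\\') {
            value += c;
            continue;
        }
        if (++i == text.size())
            return std::nullopt; // dangling backslash
        if (text[i] >= '0' && text[i] <= '7') {
            int code = 0;
            std::size_t digits = 0;
            while (digits < 3 && i < text.size() && text[i] >= '0' && text[i] <= '7') {
                code = code * 8 + (text[i] - '0');
                ++i;
                ++digits;
            }
            --i; // the for loop advances past the last digit
            value += static_cast<char>(code);
        } else {
            value += text[i];
        }
    }
    return value;
}
```

Under these assumed rules, both spellings from the paragraph above yield the same value, hello world!, and the quotes never reach the result.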
We can add a few rules (an argument that is either a quoted string or a raw string) and define them accordingly.
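The shape of such rules can be sketched by hand. The code below is a hypothetical hand-written counterpart of "argument = quoted | raw", not the rules from the original answer; in Qi they might be spelled roughly as '"' >> *('\\' >> char_ | ~char_('"')) >> '"' for quoted and +~char_("(),\r\n") for raw, but that spelling is an assumption too.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// On success each parser advances `pos` and returns the argument value.

// quoted: '"', then any characters (parentheses and commas included), with a
// backslash escaping the next character, then '"'. The quotes are dropped.
std::optional<std::string> parse_quoted(std::string_view in, std::size_t& pos)
{
    std::size_t i = pos;
    if (i >= in.size() || in[i] != '"')
        return std::nullopt;
    std::string value;
    for (++i; i < in.size(); ++i) {
        if (in[i] == '"') {
            pos = i + 1; // consume the closing quote
            return value;
        }
        if (in[i] == '\r' || in[i] == '\n')
            return std::nullopt; // assumption: strings do not span lines
        if (in[i] == '\\' && i + 1 < in.size())
            ++i; // take the escaped character literally
        value += in[i];
    }
    return std::nullopt; // closing quote missing
}

// raw: one or more characters other than parentheses, comma and line breaks.
std::optional<std::string> parse_raw(std::string_view in, std::size_t& pos)
{
    if (pos >= in.size())
        return std::nullopt;
    std::size_t end = in.find_first_of("(),\r\n", pos);
    if (end == std::string_view::npos)
        end = in.size();
    if (end == pos)
        return std::nullopt;
    std::string value(in.substr(pos, end - pos));
    pos = end;
    return value;
}

// argument: try quoted first, fall back to raw.
std::optional<std::string> parse_argument(std::string_view in, std::size_t& pos)
{
    if (auto quoted = parse_quoted(in, pos))
        return quoted;
    return parse_raw(in, pos);
}
```

With this sketch a quoted argument may contain parentheses, and its value no longer includes the quotes. Note that an unterminated quoted string still falls through to the raw branch here, because raw does not exclude the double quote.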
Now, I'd say you probably want Argument to be something like variant<double, std::string, bool> instead of just std::string.
With only this change, all the problems have practically vanished (Live On Compiler Explorer).
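As a rough illustration of the variant idea, the sketch below uses std::variant and a hypothetical classify function that decides the type of an argument after it has been split off. The conventions (True/False as boolean keywords, quoted text always being a string, strtod deciding what counts as a number) are assumptions, not something stated in the original post.

```cpp
#include <cstdlib>
#include <string>
#include <variant>

// Assumption: the alias mirrors the variant suggested in the text.
using Argument = std::variant<double, std::string, bool>;

// Hypothetical classifier for an argument that has already been extracted.
// `was_quoted` tells whether it was written in quotes in the script.
Argument classify(std::string const& text, bool was_quoted)
{
    if (was_quoted)
        return text; // quoted text is always a string
    if (text == "True")
        return true;
    if (text == "False")
        return false;
    char* end = nullptr;
    double const number = std::strtod(text.c_str(), &end);
    if (!text.empty() && end == text.c_str() + text.size())
        return number; // strtod consumed the whole text
    return text; // anything else is an unquoted string
}
```

For the StartProcess(17, "program.exe", True) line from the question this would give a double, a string and a bool, so later code no longer has to re-parse strings to find out what an argument meant.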
Now, index #7 looks very funky, but it's actually a well-known phenomenon in Spirit¹, and enabling BOOST_SPIRIT_DEBUG demonstrates it.
So, the string gets accepted as a raw string, even though it started with a double quote ("). That's easily fixed, but we don't even need to: we could just apply qi::hold to avoid the duplication.
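The duplication and the effect of holding the attribute can be imitated in plain C++. This is a hand-rolled analogy rather than Spirit code, and all function names are hypothetical: both alternatives append straight into the caller's string, and the held variant parses into a copy that is swapped in only on success, which is conceptually what qi::hold does.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// quoted: fails when the closing quote is missing, but only after it has
// already appended the characters it saw to the shared attribute.
bool quoted(std::string_view in, std::string& attr)
{
    if (in.empty() || in.front() != '"')
        return false;
    for (std::size_t i = 1; i < in.size(); ++i) {
        if (in[i] == '"')
            return true;
        if (in[i] == '\r' || in[i] == '\n')
            break;
        attr += in[i];
    }
    return false;
}

// raw: one or more characters other than parentheses, comma and line breaks.
bool raw(std::string_view in, std::string& attr)
{
    auto const end = in.find_first_of("(),\r\n");
    auto const match = in.substr(0, end);
    attr += match;
    return !match.empty();
}

// Plain alternative: the failed branch leaves its partial output behind.
bool argument(std::string_view in, std::string& attr)
{
    return quoted(in, attr) || raw(in, attr);
}

// Held alternative: parse into a copy and commit it only on success.
bool held_quoted(std::string_view in, std::string& attr)
{
    std::string copy = attr;
    if (!quoted(in, copy))
        return false;
    attr.swap(copy);
    return true;
}

bool argument_with_hold(std::string_view in, std::string& attr)
{
    return held_quoted(in, attr) || raw(in, attr);
}
```

For an unterminated input such as "this_is_a_fail.exe, True) the plain version ends up with the leftovers of the failed quoted attempt followed by the raw match, while the held version contains only the raw match.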
However, if you expect that input to fail, fix the other problem instead, so that a raw string can no longer start with a double quote.
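A library-free sketch of that stricter behaviour follows. The function parse_call_args is hypothetical and handles only the parenthesised argument list of a single line; it assumes quoted strings without escapes and returns std::nullopt for what the tests in the question call FAIL. The change that matters is that raw arguments exclude the double quote as well.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

std::optional<std::vector<std::string>> parse_call_args(std::string_view in)
{
    std::size_t pos = 0;
    auto skip_blanks = [&] {
        while (pos < in.size() && (in[pos] == ' ' || in[pos] == '\t'))
            ++pos;
    };
    auto expect = [&](char c) {
        skip_blanks();
        if (pos < in.size() && in[pos] == c) {
            ++pos;
            return true;
        }
        return false;
    };

    std::vector<std::string> args;
    if (!expect('('))
        return std::nullopt;
    do {
        skip_blanks();
        std::string value;
        if (pos < in.size() && in[pos] == '"') {
            // quoted: anything up to the closing quote, parentheses included
            std::size_t const close = in.find('"', pos + 1);
            if (close == std::string_view::npos)
                return std::nullopt; // unterminated: fail, no fallback to raw
            value = std::string(in.substr(pos + 1, close - pos - 1));
            pos = close + 1;
        } else {
            // raw: one or more characters, now excluding '"' as well
            std::size_t end = in.find_first_of("\"(),\r\n", pos);
            if (end == std::string_view::npos)
                end = in.size();
            if (end == pos)
                return std::nullopt;
            value = std::string(in.substr(pos, end - pos));
            pos = end;
        }
        args.push_back(std::move(value));
    } while (expect(','));
    if (!expect(')'))
        return std::nullopt;
    skip_blanks();
    if (pos != in.size())
        return std::nullopt; // trailing characters after the closing parenthesis
    return args;
}
```

With this behaviour the two lines with unterminated quotes from the question are rejected, while the well-formed WriteLog and StartProcess lines yield one and three arguments respectively.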
Summary / Listing
This is what we end up with (Live On Compiler Explorer).
¹ See e.g. "boost::spirit alternative parsers return duplicates" (which links to three more questions of the same kind).