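Making a Boost::Spirit symbols parser non-greedy
I want to make a keyword parser that matches e.g. int, but does not match the int in integer and leave eger as leftover. I use x3::symbols to automatically get the parsed keyword represented as an enum value.
Minimal example: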
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/utility/error_reporting.hpp>
#include <stdexcept>
#include <string>
#include <string_view>

namespace x3 = boost::spirit::x3;

enum class TypeKeyword { Int, Float, Bool };

struct TypeKeywordSymbolTable : x3::symbols<TypeKeyword> {
    TypeKeywordSymbolTable()
    {
        add("float", TypeKeyword::Float)
           ("int", TypeKeyword::Int)
           ("bool", TypeKeyword::Bool);
    }
};
const TypeKeywordSymbolTable type_keyword_symbol_table;

struct TypeKeywordRID {};
using TypeKeywordRule = x3::rule<TypeKeywordRID, TypeKeyword>;
const TypeKeywordRule type_keyword = "type_keyword";
const auto type_keyword_def = type_keyword_symbol_table;
BOOST_SPIRIT_DEFINE(type_keyword);

using Iterator = std::string_view::const_iterator;

/* Thrown when the parser has failed to parse the whole input stream. Contains
 * the part of the input stream that has not been parsed. */
class LeftoverError : public std::runtime_error {
public:
    LeftoverError(Iterator begin, Iterator end)
        : std::runtime_error(std::string(begin, end))
    {}
    std::string_view get_leftover_data() const noexcept { return what(); }
};

template<typename Rule>
typename Rule::attribute_type parse(std::string_view input, const Rule& rule)
{
    Iterator begin = input.begin();
    Iterator end = input.end();
    using ExpectationFailure = boost::spirit::x3::expectation_failure<Iterator>;
    typename Rule::attribute_type result;
    try {
        bool r = x3::phrase_parse(begin, end, rule, x3::space, result);
        if (r && begin == end) {
            return result;
        } else { // Occurs when the whole input stream has not been consumed.
            throw LeftoverError(begin, end);
        }
    } catch (const ExpectationFailure& exc) {
        throw LeftoverError(exc.where(), end);
    }
}

int main()
{
    // TypeKeyword::Bool is parsed and "ean" is leftover, but failed parse with
    // "boolean" leftover is desired.
    parse("boolean", type_keyword);
    // TypeKeyword::Int is parsed and "eger" is leftover, but failed parse with
    // "integer" leftover is desired.
    parse("integer", type_keyword);
    // TypeKeyword::Int is parsed successfully and this is the desired behavior.
    parse("int", type_keyword);
}
Basically, I want integer not to be recognized as a keyword with eger left over.
Comments (1)
I morphed the test cases into self-describing expectations:
Live On Compiler Explorer
Prints:
Now, the simplest, naive approach would be to make sure you parse until eoi (end of input), by simply requiring x3::eoi after the keyword.
And indeed the tests pass: Live
However, this fits the tests, but not the goal. Let's imagine a more involved grammar, where type identifier; is to be parsed. I'll leave the details for Compiler Explorer:
Looks good. But what if we add some interesting tests:
It prints (Live)
So, the test cases were lacking. Your prose description is actually closer:
That correctly implies you want to check the lexeme inside the type_keyword rule. A naive try might be checking that no identifier character follows the type keyword, with identchar factored out of identifier.
However, this doesn't work. Can you see why (peeking allowed: https://godbolt.org/z/jb4zfhfWb)?
Our latest devious test case now passes (yay), but int j; is now rejected! If you think about it, it only makes sense, because you have spaces skipped: the whitespace after int is skipped before the negative check runs, so the check sees j.
The essential word I used a moment ago was lexeme: you want to treat some units as lexemes (and whitespace stops the lexeme; or rather, whitespace isn't automatically skipped inside the lexeme¹). So, a fix would be to wrap the keyword and its negative check in x3::lexeme[].
Lo and behold (Live):
Summarizing
This topic is a frequently recurring one, and it requires a solid understanding of skippers and lexemes first and foremost. Here are some other posts for inspiration:
Stop X3 symbols from matching substrings
parsing identifiers except keywords
Boost Spirit x3: parse delimited string, where I introduce a more general helper you might find useful
Stop X3 symbols from matching substrings
Dynamically switching symbol tables in x3
Good luck!
Complete Listing
Anti-Bitrot, the final listing:
¹ see Boost spirit skipper issues