Boost::Spirit::Qi 自动规则——避免重复复制 AST 数据结构
我一直在使用 Qi 和 Karma 对几种小语言进行一些处理。大多数语法都非常小(20-40 条规则)。我几乎能够完全使用自动规则,因此我的解析树完全由变体、结构和 std::vector 组成。
此设置非常适合常见情况:
1)解析某些东西(Qi),
2)对解析树(访问者)进行较小的操作,并且
3)输出一些东西(Karma)。
但是,我担心如果我想对语法树进行复杂的结构更改(例如移动大子树)会发生什么。考虑以下玩具示例:
使用自动规则的 s-expr 样式逻辑表达式的语法......
// Inside grammar class; rule names match struct names...
pexpr %= pand | por | var | bconst;
pand %= lit("(and ") >> (pexpr % lit(" ")) >> ")";
por %= lit("(or ") >> (pexpr % lit(" ")) >> ")";
pnot %= lit("(not ") >> pexpr >> ")";
这导致解析树表示如下所示...
struct var {
std::string name;
};
struct bconst {
bool val;
};
struct pand;
struct por;
struct pnot;
typedef boost::variant<bconst,
var,
boost::recursive_wrapper<pand>,
boost::recursive_wrapper<por>,
boost::recursive_wrapper<pnot> > pexpr;
struct pand {
std::vector<pexpr> operands;
};
struct por {
std::vector<pexpr> operands;
};
struct pnot {
pexpr victim;
};
// Many Fusion Macros here
假设我有一个如下所示的解析树:
pand
/ ... \
por por
/ \ / \
var var var var
(省略号表示“pand
还有更多形状相似的子节点。”)
现在,假设我要对每个 por
节点求反,因此最终结果为:
pand
/ ... \
pnot pnot
| |
por por
/ \ / \
var var var var
对于每个 por
子树,直接的方法是:
- 创建pnot
节点
(在构造中复制 por
);
- 在pand
节点中重新分配适当的向量槽
(复制 pnot
节点及其 por
子树)。
或者,我可以构建一个单独的向量,然后批量替换(交换)pand
向量,从而消除第二轮复制。
与基于指针的树表示相比,所有这些看起来都很麻烦,基于指针的树表示允许插入 pnot 节点,而无需复制现有节点。我的问题:
有没有办法避免使用符合自动规则的数据结构进行大量复制的树操作?我是否应该硬着头皮使用非自动规则来构建基于指针的 AST(例如 http://boost-spirit.com/home/2010/03/11/s-expressions-and-variants/)?
I've been using Qi and Karma to do some processing on several small languages. Most of the grammars are pretty small (20-40 rules). I've been able to use autorules almost exclusively, so my parse trees consist entirely of variants, structs, and std::vectors.
This setup works great for the common case:
1) parse something (Qi),
2) make minor manipulations to the parse tree (visitor), and
3) output something (Karma).
However, I'm concerned about what will happen if I want to make complex structural changes to a syntax tree, like moving big subtrees around. Consider the following toy example:
A grammar for s-expr-style logical expressions that uses autorules...
// Inside grammar class; rule names match struct names...
pexpr %= pand | por | var | bconst;
pand %= lit("(and ") >> (pexpr % lit(" ")) >> ")";
por %= lit("(or ") >> (pexpr % lit(" ")) >> ")";
pnot %= lit("(not ") >> pexpr >> ")";
... which leads to parse tree representation that looks like this...
struct var {
std::string name;
};
struct bconst {
bool val;
};
struct pand;
struct por;
struct pnot;
typedef boost::variant<bconst,
var,
boost::recursive_wrapper<pand>,
boost::recursive_wrapper<por>,
boost::recursive_wrapper<pnot> > pexpr;
struct pand {
std::vector<pexpr> operands;
};
struct por {
std::vector<pexpr> operands;
};
struct pnot {
pexpr victim;
};
// Many Fusion Macros here
Suppose I have a parse tree that looks something like this:
pand
/ ... \
por por
/ \ / \
var var var var
(The ellipsis means 'many more children of similar shape for pand
.')
Now, suppose that I want negate each of the por
nodes, so that the end result is:
pand
/ ... \
pnot pnot
| |
por por
/ \ / \
var var var var
The direct approach would be, for each por
subtree:
- create pnot
node
(copies por
in construction);
- re-assign the appropriate vector slot in the pand
node
(copies pnot
node and its por
subtree).
Alternatively, I could construct a separate vector, and then replace (swap) the pand
vector wholesale, eliminating a second round of copying.
All of this seems cumbersome compared to a pointer-based tree representation, which would allow for the pnot
nodes to be inserted without any copying of existing nodes. My question:
Is there a way to avoid copy-heavy tree manipulations with autorule-compliant data structures? Should I bite the bullet and just use non-autorules to build a pointer-based AST (e.g., http://boost-spirit.com/home/2010/03/11/s-expressions-and-variants/)?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
复制子树不应该像您想象的那么昂贵,因为 recursive_variant 本质上是 share_ptr 的包装器。我相信,这不仅是最好的,也是最简单的解决方案。
Copying the subtrees shouldn't be as expensive as you assume as the recursive_variant is essentially a wrapper around a shared_ptr. I believe, it's not only the best, but also the easiest solution.