Boost::Spirit::Qi 自动规则——避免重复复制 AST 数据结构

发布于 2024-10-11 07:45:50 字数 2168 浏览 17 评论 0原文

我一直在使用 Qi 和 Karma 对几种小语言进行一些处理。大多数语法都非常小（20-40 条规则）。我几乎能够完全使用自动规则，因此我的解析树完全由变体、结构和 std::vector 组成。

此设置非常适合常见情况：
1）解析某些东西（Qi），
2）对解析树（访问者）进行较小的操作，并且
3）输出一些东西（Karma）。

但是，我担心如果我想对语法树进行复杂的结构更改（例如移动大子树）会发生什么。考虑以下玩具示例：

使用自动规则的 s-expr 样式逻辑表达式的语法......

// Inside grammar class; rule names match struct names...
pexpr %= pand | por | var | bconst;
pand  %= lit("(and ") >> (pexpr % lit(" ")) >> ")";
por   %= lit("(or ") >>  (pexpr % lit(" ")) >> ")";
pnot  %= lit("(not ") >> pexpr >> ")";

这导致解析树表示如下所示...

struct var {
   std::string name;
};
struct bconst {
   bool val;
};

struct pand;
struct por;
struct pnot;                               

typedef boost::variant<bconst,
                       var,
                       boost::recursive_wrapper<pand>,
                       boost::recursive_wrapper<por>,
                       boost::recursive_wrapper<pnot> > pexpr;
struct pand {
   std::vector<pexpr> operands;                    
};

struct por {
   std::vector<pexpr> operands;                    
};

struct pnot {
   pexpr victim;
};

// Many Fusion Macros here

假设我有一个如下所示的解析树：

       pand
      / ... \
   por      por
   / \      / \
 var var   var var

（省略号表示“pand 还有更多形状相似的子节点。”）

现在，假设我要对每个 por 节点求反，因此最终结果为：

       pand
      / ... \
   pnot     pnot
    |        |
   por      por
   / \      / \
 var var   var var

对于每个 por 子树，直接的方法是：
- 创建pnot节点
（在构造中复制 por）；
- 在pand节点中重新分配适当的向量槽
（复制 pnot 节点及其 por 子树）。

或者，我可以构建一个单独的向量，然后批量替换（交换）pand 向量，从而消除第二轮复制。

与基于指针的树表示相比，所有这些看起来都很麻烦，基于指针的树表示允许插入 pnot 节点，而无需复制现有节点。我的问题：

有没有办法避免使用符合自动规则的数据结构进行大量复制的树操作？我是否应该硬着头皮使用非自动规则来构建基于指针的 AST（例如 http://boost-spirit.com/home/2010/03/11/s-expressions-and-variants/)？

原文

I've been using Qi and Karma to do some processing on several small languages. Most of the grammars are pretty small (20-40 rules). I've been able to use autorules almost exclusively, so my parse trees consist entirely of variants, structs, and std::vectors.

This setup works great for the common case:
1) parse something (Qi),
2) make minor manipulations to the parse tree (visitor), and
3) output something (Karma).

However, I'm concerned about what will happen if I want to make complex structural changes to a syntax tree, like moving big subtrees around. Consider the following toy example:

A grammar for s-expr-style logical expressions that uses autorules...

// Inside grammar class; rule names match struct names...
pexpr %= pand | por | var | bconst;
pand  %= lit("(and ") >> (pexpr % lit(" ")) >> ")";
por   %= lit("(or ") >>  (pexpr % lit(" ")) >> ")";
pnot  %= lit("(not ") >> pexpr >> ")";

... which leads to parse tree representation that looks like this...

struct var {
   std::string name;
};
struct bconst {
   bool val;
};

struct pand;
struct por;
struct pnot;                               

typedef boost::variant<bconst,
                       var,
                       boost::recursive_wrapper<pand>,
                       boost::recursive_wrapper<por>,
                       boost::recursive_wrapper<pnot> > pexpr;
struct pand {
   std::vector<pexpr> operands;                    
};

struct por {
   std::vector<pexpr> operands;                    
};

struct pnot {
   pexpr victim;
};

// Many Fusion Macros here

Suppose I have a parse tree that looks something like this:

       pand
      / ... \
   por      por
   / \      / \
 var var   var var

(The ellipsis means 'many more children of similar shape for pand.')

Now, suppose that I want negate each of the por nodes, so that the end result is:

       pand
      / ... \
   pnot     pnot
    |        |
   por      por
   / \      / \
 var var   var var

The direct approach would be, for each por subtree:
- create pnot node
(copies por in construction);
- re-assign the appropriate vector slot in the pand node
(copies pnot node and its por subtree).

Alternatively, I could construct a separate vector, and then replace (swap) the pand vector wholesale, eliminating a second round of copying.

All of this seems cumbersome compared to a pointer-based tree representation, which would allow for the pnot nodes to be inserted without any copying of existing nodes. My question:

Is there a way to avoid copy-heavy tree manipulations with autorule-compliant data structures? Should I bite the bullet and just use non-autorules to build a pointer-based AST (e.g., http://boost-spirit.com/home/2010/03/11/s-expressions-and-variants/)?

分享到QQ

分享到微博