YACC|BISON:如何操作解析树?
我的应用程序的目标是验证 SQL 代码,同时从该代码生成经过一些修改的格式化代码。
例如这个 where 子句:
其中
e.student_name= c.contact_name and ( c.address = " nefta" 或 c.address=" tozeur ") 和
e.age <18
我们将得到类似这样的格式化输出:
其中 e.student_name= c.contact_name
and (c.address=trim("nefta")
或 c.address=trim("tozeur") )
和 e.age <18
我希望我已经很好地解释了我的目标
问题是语法可能包含递归规则,这使得重写任务成为可能不可靠;例如,在我的 sql 语法中,我有这样的:
search_condition : search_condition OR search_condition{clbck_or} | search_condition AND search_condition{clbck_and} | NOT search_condition {clbck_not} | '(' search_condition ')'{clbck__} | predicate {clbck_pre} ;
知道我指定了优先级来解决移位减少问题
%左或
%左和
%左不是
那么回到上一个例子;我的子句将这样消耗:
c.address="nefta"或 c.address="tozeur" ->搜索条件
(c.address="nefta"或 c.address="tozeur")->search_condition
e.student_name= c.contact_name and (c.address="nefta"or c.address="tozeur")->搜索条件
...以及e.age<18->搜索条件
顺便说一句,您可以理解,由于顺序不同,因此很难通过每次归约触发的回调来重建输入流。
对于这个问题有什么帮助吗?
The goal of my application is to validate an sql code and generate,in the mean time, from that code a formatted one with some modification.
For example this where clause :
where
e.student_name= c.contact_name and ( c.address = " nefta"
or c.address=" tozeur ") and
e.age <18
we will have as formatted output something like that :
where e.student_name= c.contact_name
and (c.address=trim("nefta")
or c.address=trim("tozeur") )
and e.age <18
I hope I've explained my aim well
The problem is grammars may contain recursive rules which make the rewrite task unreliable ; for instance in my sql grammar i have this :
search_condition : search_condition OR search_condition{clbck_or} | search_condition AND search_condition{clbck_and} | NOT search_condition {clbck_not} | '(' search_condition ')'{clbck__} | predicate {clbck_pre} ;
Knowing that I specified a precedence priority to solve shift reduce problems
%left OR
%left AND
%left NOT
So back on the last example ; my clause where will be consumed this way:
c.address="nefta"or c.address="tozeur" -> search_condition
(c.address="nefta"or c.address="tozeur")->search_condition
e.student_name= c.contact_name and (c.address="nefta"or c.address="tozeur")-> search_condition
... and e.age<18-> search_condition
You can by the way understand that it's tough to rebuild the input stream referring to callbacks triggered by each reduction cause the order is not the same.
Any help for this problem ?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
你的问题有点模糊,所以我猜你实际上在你的 clbck_or () 等中打印。 Wildplasser 提到的“常见”方式是使用“语义值”,即(未经测试):
如果您使用 Bison,手册中的“中缀表示法计算器:`calc'”部分有一个很好的示例。对于字符串和 C,您将必须添加一些内存处理。
Your question is a bit vague, so I'm guessing that you actually print in your clbck_or (), etc. The "common" way to which wildplasser has alluded is to use "semantic values", i. e. (untested):
If you're using Bison, the manual has a fine example in the section "Infix Notation Calculator: `calc'". With strings and C, you will have to add some memory handling.
Bison 擅长解析,并且在一些手动帮助下,擅长构建自定义语法树。之后,你就可以对这棵树做你想做的事情了。好消息是您可以做任何您想做的事情。坏消息是你仍然需要建造大量机器来完成你想做的事情。重新生成源代码的基本问题称为“漂亮打印”;请参阅我的关于如何漂亮打印的答案来了解如何做到这一点,包括词法语法的所有小错误(你不要丢失文字字符串中的转义符,对吧?)。您根本没有解决如何在树中找到您想要更改的构造,或者如何粉碎树来更改它。
如果您不想做所有这些,那么您真正想要的是一个 程序转换系统,它擅长解析,为你构建语法树(所以你不必考虑它,SQL是相当大的语法),会让你在SQL方面找到树中的模式你使用的语法到,在不了解树的形状的情况下进行树更改,并且最终可以通过 beautifulprint 重新生成有效的源文本,正如我在上面的答案链接中所描述的那样。 (程序转换系统本质上包括作为子例程的解析器)。
我们的DMS软件再工程工具包就是这样一个程序转换系统。它具有一组预定义的语言定义,包括 SQL2011 以及用于配置特定方言的方法。
使用 DMS 源到源语法规则,您可以使用以下规则在示例中执行更改:
这是 DMS 规则语言(元)语法,用于描述对(“域”)SQL 代码的重写。
该规则有一个名称(因为在复杂的应用程序中有很多规则)并且它
作为句法占位符“f”和“s”;它仅重写代码中的条件。
引号是 RSL 元引号;里面的内容是带有 RSL 元变量“\f”的 SQL
和“\s”;外面的东西是 RSL 规则语法。规则说的是,
“对于显式命名为“c”的变量的任何条件,以及任何字段 f,
如果该字段与某个文字字符串进行相等比较,则替换
将“trim”的文字字符串应用于文字字符串”。
我省略了一些代码,基本上是说,“将此规则应用于整个树,并且不要在同一位置应用两次”。该“策略”是DMS 中内置的许多
规则之一是通过 DMS 将 SQL 解析器应用于元引用字符串来完成的,以生成带有元变量写入位置的“模式”语法树。然后,手边模式树与目标树进行匹配,占位符引用子树;右手树被拼接到左树匹配的位置,并且占位符子树被转移,因此,程序员看到了您熟悉和喜欢的表面语法;该工具适用于树木,因此它不会被文本混淆。
现在,我认为我的规则与您的意图并不完全匹配,但这部分是因为我无法猜测您的实际意图,如果不是的话,您可以编写其他规则。不是你什么 ;
这个规则纯粹是由语法驱动的 如果您希望将更复杂的条件应用于规则(例如,变量必须仅在您定义的某些范围内),则可以添加语义谓词(未显示),这会变得更混乱。但它比爬过 AST 的 C 代码要简单得多,也更容易阅读(注意你在这里没有看到 AST?)并试图弄清楚这一切。
解析和漂亮打印发生在规则应用之前和之后;实现所有这些需要很多机器,但是这些机器内置于 DMS 中(例如,它内置了比 Bison [但更强大] 的东西),并且对于 SQL 等预定义域,所有漂亮的打印工作都具有也已预先配置。
如果您想更好地了解使用 DMS 进行完整周期需要什么(定义您自己的语言解析器、定义漂亮的打印机、定义复杂的规则),这里有一个很好且完整的 使用 DMS 定义和符号简化微积分的示例。
Bison is good at parsing, and with some manual help, good at building a custom syntax tree. After that, its up to you to do what you want with the tree. The good news is you can do whatever you want. The bad news is you still have to build a lot of machinery to do what you want. Your basic problem of regenerating source code is called "prettyprinting"; see my SO answer on how to prettyprint to understand what it takes to do this, including all the peccadillos of lexical syntax (you don't to lose the escapes in your literal strings, right?). You didn't at all address how to find the construct you wanted to change in the tree, or how you'd smash the tree to change it.
If you don't want to do all of that, then what you really want is a program transformation system, which is good at parsing, building a syntax tree for you (so you don't have to think about it, SQL is pretty big grammar), will let you find patterns in the tree in terms of SQL syntax you are used to, make tree changes without knowing much about the shape of the tree, and can finally regenerate valid source text by prettyprint as I describe in my answer link above. (A program transformation systems essentially includes a parser as a subroutine).
Our DMS Software Reengineering Toolkit is such a program transformation system. It has a set of predefined language definitions including SQL2011 and means for configuring for a particular dialect.
Using DMS source-to-source syntax rules, you could carry out the change in your example with the following rule:
This is DMS Rule language (meta) syntax to describe a rewrite on ("domain") SQL code.
The rule has a name (because in complex application there's lot of rules) and it
as syntactic place holders "f" and "s"; it rewrites only conditions in the code.
The quotes are RSL meta-quotes; stuff inside is SQL with RSL metavariables "\f"
and "\s"; stuff outside is RSL rule syntax. What the rule says is,
"for any condition on a variable explicitly named 'c', with any field f,
if that field is compared by equality to some literal string, then replace
the literal string by 'trim' applied to the literal string".
I left out some code that basically says, "apply this rule to the entire tree, and don't apply it twice in the same place". That "strategy" is one of many built into DMS.
There's the question of how does the rule work. that is accomplished by DMS applying the SQL parser to the meta-quoted strings, to produce "pattern" syntax trees with placeholders where the metavariables are written. The left hand side pattern tree is then matched against the target tree with placeholder referring to subtrees; the right hand tree is spliced in where the left tree matched, and the placeholder subtrees transferred. So, you the programmer see surface sytax that you know and love; the tool works with trees and so it isn't confused by text.
Now, I don't think my rule matches exactly your intent, but that's partly because I can't guess your actual intent. You can write other rules if this isn't what you wanted.
This rule is purely driven by syntax; one can add a semantic predicate (not shown) if you want more complicated conditions to apply to the rule (e.g, the variable has to be ones only in certain scopes you define), and that gets messier to say. But it is much simpler and far easier to read than C code that climbs over the AST (notice you didn't see the AST here?) and tries to figure all this out.
The parsing and prettyprinting happens before and after rule application; there's a lot of machinery required to implement all that, but that machinery is built into DMS (e.g., it has something like [but more powerful] than Bison built in), and for predefined domains such as SQL, all the pretty printing works has been preconfigured, too.
If you want to get a better sense of what it takes to go full cycle with DMS (define your own language parser, define a pretty printer, define complicated rules), here's a nice and complete example of defining and symbolically simplifying calculus using DMS.