How can I implement function calls with ANTLR so that a function can be called before it is defined?

Posted on 2024-09-30 16:33:37

Once the AST is built, what is the best way to implement the tree walker so that functions can be defined and called in any order?

For example, this is valid in PHP:

<?php
f(); // function called before it’s defined
function f() {
  print 3;
}
?>

I’m guessing that somehow there must be a second pass, or a tree transformation, but I can’t find anything interesting on this subject. The problem is probably not an Antlr-specific one, but if you could point me to an Antlr example of how this is done, even better!

人间不值得 2024-10-07 16:33:37

Yes, you are right: this is done in more than one pass over the AST.

You first create a grammar that builds an AST of the source, then you create a tree grammar that is used to iterate over the tree and discover all defined functions. You could then evaluate the script using another tree grammar that takes the discovered functions from the previous tree grammar.
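
To see why one pass is not enough, here is a minimal sketch in plain Java, independent of ANTLR. The `Node` class and all method names are hypothetical stand-ins I made up for illustration, not part of the ANTLR runtime. It walks a hand-built tree for the demo source below once (checking calls against the declarations seen so far) and then in two passes (collecting all declarations first):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PassOrderSketch {

    /** Hypothetical stand-in for an AST node; not an ANTLR class. */
    static final class Node {
        final String type;          // "SCRIPT", "F_CALL", "F_DECL" or "F_BODY"
        final String name;          // function name for F_CALL / F_DECL, else null
        final List<Node> children;

        Node(String type, String name, Node... children) {
            this.type = type;
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    /** Single pre-order pass: a call is checked against the declarations seen so far. */
    static List<String> singlePassUnresolved(Node node, Set<String> seen, List<String> out) {
        if (node.type.equals("F_DECL")) seen.add(node.name);
        if (node.type.equals("F_CALL") && !seen.contains(node.name)) out.add(node.name);
        for (Node child : node.children) singlePassUnresolved(child, seen, out);
        return out;
    }

    /** Pass 1 of 2: collect every declared name before looking at any call. */
    static Set<String> discover(Node node, Set<String> declared) {
        if (node.type.equals("F_DECL")) declared.add(node.name);
        for (Node child : node.children) discover(child, declared);
        return declared;
    }

    /** Pass 2 of 2: check each call against the complete set from pass 1. */
    static List<String> twoPassUnresolved(Node root) {
        return collectUnresolved(root, discover(root, new HashSet<String>()), new ArrayList<String>());
    }

    private static List<String> collectUnresolved(Node node, Set<String> declared, List<String> out) {
        if (node.type.equals("F_CALL") && !declared.contains(node.name)) out.add(node.name);
        for (Node child : node.children) collectUnresolved(child, declared, out);
        return out;
    }

    /** Hand-built tree for: f(); function f() { g(); } function g() {} */
    static Node sampleTree() {
        return new Node("SCRIPT", null,
                new Node("F_CALL", "f"),
                new Node("F_DECL", "f", new Node("F_BODY", null, new Node("F_CALL", "g"))),
                new Node("F_DECL", "g", new Node("F_BODY", null)));
    }

    public static void main(String[] args) {
        Node tree = sampleTree();
        System.out.println("Single pass, unresolved: "
                + singlePassUnresolved(tree, new HashSet<String>(), new ArrayList<String>()));
        System.out.println("Two passes, unresolved:  " + twoPassUnresolved(tree));
    }
}
```

The single pass reports both f and g as unresolved, because each call is reached before its declaration; the two-pass version reports nothing.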

A demo.

Take the source:

<?php
f(); // function called before it’s defined
function f() {
  g();
}
function g() {}
?>

which is parsed into the following AST:

[figure: AST of the source above]

using the (combined) grammar:

grammar PHPMin;

options { 
  output=AST; 
}

tokens {
  SCRIPT; F_CALL; F_DECL; F_BODY;
}

parse
  :  script EOF -> script
  ;

script
  :  '<?php' atom* '?>' -> ^(SCRIPT atom*)
  ;

atom
  :  functionCall
  |  functionDecl
  ;

functionCall
  :  Identifier '(' ')' ';' -> ^(F_CALL Identifier)
  ;

functionDecl
  :  'function' Identifier '(' ')' '{' functionBody '}' -> ^(F_DECL Identifier functionBody)
  ;

functionBody
  :  functionCall* -> ^(F_BODY functionCall*)
  ;

Identifier  : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')* ;
LineComment : '//' ~('\r' | '\n')* ('\r'? '\n' | EOF){skip();} ;
Space       : (' ' | '\t' | '\r' | '\n'){skip();} ;

Then discover the declared functions using a "tree-walker" generated from the following tree grammar:

tree grammar PHPMinFunctionWalker;

options {
    tokenVocab=PHPMin;
    ASTLabelType=CommonTree;
}

@members {
    java.util.Set<String> declared = new java.util.HashSet<String>();
}

discover
  :  script
  ;

script
  :  ^(SCRIPT atom*)
  ;

atom
  :  functionCall
  |  functionDecl
  ;

functionCall
  :  ^(F_CALL Identifier)
  ;

functionDecl
  :  ^(F_DECL Identifier functionBody) {declared.add($Identifier.text);}
  ;

functionBody
  :  ^(F_BODY functionCall*)
  ;

To test it all, create a lexer and parser (A), generate the "tree-walker" (B), compile all source files (C):

// A
java -cp antlr-3.2.jar org.antlr.Tool PHPMin.g

// B 
java -cp antlr-3.2.jar org.antlr.Tool PHPMinFunctionWalker.g

// C
javac -cp antlr-3.2.jar *.java

// D     
java -cp .:antlr-3.2.jar Main    // *nix 
java -cp .;antlr-3.2.jar Main    // Windows

and run the following main class (D):

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class Main {

    public static void main(String[] args) throws Exception {

        String source = "<?php                                          \n" + 
                        "f(); // function called before it’s defined    \n" + 
                        "function f() {                                 \n" + 
                        "  g();                                         \n" + 
                        "}                                              \n" + 
                        "function g() {}                                \n" + 
                        "?>                                             \n";

        // create a lexer and parser for the source
        ANTLRStringStream in = new ANTLRStringStream(source);
        PHPMinLexer lexer = new PHPMinLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        PHPMinParser parser = new PHPMinParser(tokens);
        PHPMinParser.parse_return returnValue = parser.parse();
        CommonTree tree = (CommonTree)returnValue.getTree();

        // create a tree walker to discover all declared functions
        CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
        nodes.setTokenStream(tokens);
        PHPMinFunctionWalker functions = new PHPMinFunctionWalker(nodes);
        functions.discover();
        System.out.println("Declared functions: "+functions.declared);
    }
}

which produces the following output:

Declared functions: [f, g]

Of course, this is just an example of how to approach it, not of how it is best done. I can imagine (when using Java to interpret the script), you wouldn't store the declared functions as simple Strings in a Set<String>, but rather as a Map<String, CommonTree> to easily get the root of a function and evaluate it when called.
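
As a rough sketch of that idea, again in plain Java without ANTLR: the first pass fills a Map from function name to the root of its declaration, and the second pass looks each call up in that Map and walks the body. The `Node` class is a hypothetical stand-in for `CommonTree` (it stores the name on the node itself, so the body is child 0 of an F_DECL), and `collect`, `run` and the call trace are names I made up for the illustration. It has no guard against unbounded recursion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FunctionTableSketch {

    /** Hypothetical stand-in for CommonTree; not an ANTLR class. */
    static final class Node {
        final String type;          // "SCRIPT", "F_CALL", "F_DECL" or "F_BODY"
        final String name;          // function name for F_CALL / F_DECL, else null
        final List<Node> children;

        Node(String type, String name, Node... children) {
            this.type = type;
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    /** Pass 1: map each declared name to the root of its F_DECL subtree. */
    static Map<String, Node> collect(Node node, Map<String, Node> functions) {
        if (node.type.equals("F_DECL")) functions.put(node.name, node);
        for (Node child : node.children) collect(child, functions);
        return functions;
    }

    /** Pass 2: execute the calls directly under 'node', appending each callee to 'trace'. */
    static void run(Node node, Map<String, Node> functions, List<String> trace) {
        for (Node child : node.children) {
            if (!child.type.equals("F_CALL")) continue;   // declarations do nothing at run time
            Node decl = functions.get(child.name);
            if (decl == null) throw new RuntimeException("no such function: " + child.name);
            trace.add(child.name);
            run(decl.children.get(0), functions, trace);  // in this sketch, child 0 of F_DECL is its F_BODY
        }
    }

    /** Hand-built tree for: f(); function f() { g(); } function g() {} */
    static Node sampleTree() {
        return new Node("SCRIPT", null,
                new Node("F_CALL", "f"),
                new Node("F_DECL", "f", new Node("F_BODY", null, new Node("F_CALL", "g"))),
                new Node("F_DECL", "g", new Node("F_BODY", null)));
    }

    public static void main(String[] args) {
        Node tree = sampleTree();
        Map<String, Node> functions = collect(tree, new HashMap<String, Node>());
        List<String> trace = new ArrayList<String>();
        run(tree, functions, trace);
        System.out.println("Call trace: " + trace);
    }
}
```

With a real ANTLR 3 tree the lookup would work the same way, but the evaluation would more likely live in a second tree grammar than in a hand-written recursive method.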

Further reading: http://www.antlr.org/wiki/display/ANTLR3/Simple+tree-based+interpeter

Good luck!

EDIT

The second pass could then check that every called function is defined somewhere in the script, using the set of names collected by the previous tree walker:

tree grammar PHPMinValidateWalker;

options {
    tokenVocab=PHPMin;
    ASTLabelType=CommonTree;
}

@members {
    java.util.Set<String> declared = new java.util.HashSet<String>();
}

validate
  :  script
  ;

script
  :  ^(SCRIPT atom*)
  ;

atom
  :  functionCall
  |  functionDecl
  ;

functionCall
  :  ^(F_CALL Identifier) 
     {
       if(!declared.contains($Identifier.text)) {
         throw new RuntimeException("no such function: " +  $Identifier.text);
       }
     }
  ;

functionDecl
  :  ^(F_DECL Identifier functionBody)
  ;

functionBody
  :  ^(F_BODY functionCall*)
  ;

Using the test:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class Main {

    public static void main(String[] args) throws Exception {

        String source = "<?php                                          \n" + 
                        "f(); // function called before it’s defined    \n" + 
                        "function f() {                                 \n" + 
                        "  g();                                         \n" + 
                        "  x();                                         \n" + 
                        "}                                              \n" + 
                        "function g() {}                                \n" + 
                        "?>                                             \n";

        // create a lexer and parser for the source
        ANTLRStringStream in = new ANTLRStringStream(source);
        PHPMinLexer lexer = new PHPMinLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        PHPMinParser parser = new PHPMinParser(tokens);
        PHPMinParser.parse_return returnValue = parser.parse();
        CommonTree tree = (CommonTree)returnValue.getTree();

        // create a tree walker to discover all declared functions
        CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
        nodes.setTokenStream(tokens);
        PHPMinFunctionWalker functions = new PHPMinFunctionWalker(nodes);
        functions.discover();
        System.out.println("Declared functions: "+functions.declared);

        // create a second tree walker that checks every call against the declared functions
        nodes = new CommonTreeNodeStream(tree);
        nodes.setTokenStream(tokens);
        PHPMinValidateWalker validator = new PHPMinValidateWalker(nodes);
        validator.declared = functions.declared;
        validator.validate();
    }
}

This produces an exception, since x() is not defined anywhere. Removing the x() call from the source lets the tree walker finish without an exception.
