使用 ANTLR 识别 JavaScript 文件中的全局变量声明

发布于 2024-09-24 17:33:06 字数 554 浏览 1 评论 0原文

我一直在使用 ANTLR 提供的 ECMAScript 语法,目的是识别 JavaScript 全局变量。生成了 AST,我现在想知道过滤全局变量声明的基本方法是什么。

我有兴趣在 AST 中查找所有最外层的“variableDeclaration”标记;但实际的操作方法却让我无法理解。这是我到目前为止的设置代码:

String input = "var a, b; var c;";
CharStream cs = new ANTLRStringStream(input);

JavaScriptLexer lexer = new JavaScriptLexer(cs);

CommonTokenStream tokens = new CommonTokenStream();
tokens.setTokenSource(lexer);

JavaScriptParser parser = new JavaScriptParser(tokens);

program_return programReturn = parser.program();

作为 ANTLR 的新手,有人可以提供任何指示吗?

I've been using the ANTLR supplied ECMAScript grammar with the objective of identifying JavaScript global variables. An AST is produced and I'm now wondering what the based way of filtering out the global variable declarations is.

I'm interested in looking for all of the outermost "variableDeclaration" tokens in my AST; the actual how-to-do-this is eluding me though. Here's my set up code so far:

String input = "var a, b; var c;";
CharStream cs = new ANTLRStringStream(input);

JavaScriptLexer lexer = new JavaScriptLexer(cs);

CommonTokenStream tokens = new CommonTokenStream();
tokens.setTokenSource(lexer);

JavaScriptParser parser = new JavaScriptParser(tokens);

program_return programReturn = parser.program();

Being new to ANTLR can anyone offer any pointers?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

盛夏尉蓝 2024-10-01 17:33:06

我猜您正在使用此语法

尽管该语法表明创建了正确的 AST,但事实并非如此。它使用一些内联运算符从解析树中排除某些标记,但它从不为树创建任何根,从而产生完全平坦的解析树。由此,您无法以合理的方式获取所有全局变量。

您需要稍微调整语法:

在语法文件顶部的 options { ... } 下添加以下内容:

tokens
{
  VARIABLE;
  FUNCTION;
}

现在替换以下规则:functionDeclarationfunctionExpressionvariableDeclaration 以及这些:

functionDeclaration
  :  'function' LT* Identifier LT* formalParameterList LT* functionBody 
     -> ^(FUNCTION Identifier formalParameterList functionBody)
  ;

functionExpression
  :  'function' LT* Identifier? LT* formalParameterList LT* functionBody 
     -> ^(FUNCTION Identifier? formalParameterList functionBody)
  ;

variableDeclaration
  :  Identifier LT* initialiser? 
     -> ^(VARIABLE Identifier initialiser?)
  ;

现在生成了更合适的树。如果您现在解析源代码:

var a = 1; function foo() { var b = 2; } var c = 3;

将生成以下树:

alt text

您现在所要做的就是迭代子级当您偶然发现 VARIABLE 标记时,您就知道它是“全局”的,因为所有其他变量都将位于 FUNCTION 节点下。

下面是如何做到这一点:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class Main {
    public static void main(String[] args) throws Exception {
        String source = "var a = 1; function foo() { var b = 2; } var c = 3;";
        ANTLRStringStream in = new ANTLRStringStream(source);
        JavaScriptLexer lexer = new JavaScriptLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        JavaScriptParser parser = new JavaScriptParser(tokens);
        JavaScriptParser.program_return returnValue = parser.program();
        CommonTree tree = (CommonTree)returnValue.getTree();
        for(Object o : tree.getChildren()) {
            CommonTree child = (CommonTree)o;
            if(child.getType() == JavaScriptParser.VARIABLE) {
                System.out.println("Found a global var: "+child.getChild(0));
            }
        }
    }
}

它会产生以下输出:

Found a global var: a
Found a global var: c

I guess you're using this grammar.

Although that grammar suggests a proper AST is created, this is not the case. It uses some inline operators to exclude certain tokens from the parse-tree, but it never creates any roots for the tree, resulting in a completely flat parse tree. From this, you can't get all global vars in a reasonable way.

You'll need to adjust the grammar slightly:

Add the following under the options { ... } at the top of the grammar file:

tokens
{
  VARIABLE;
  FUNCTION;
}

Now replace the following rules: functionDeclaration, functionExpression and variableDeclaration with these:

functionDeclaration
  :  'function' LT* Identifier LT* formalParameterList LT* functionBody 
     -> ^(FUNCTION Identifier formalParameterList functionBody)
  ;

functionExpression
  :  'function' LT* Identifier? LT* formalParameterList LT* functionBody 
     -> ^(FUNCTION Identifier? formalParameterList functionBody)
  ;

variableDeclaration
  :  Identifier LT* initialiser? 
     -> ^(VARIABLE Identifier initialiser?)
  ;

Now a more suitable tree is generated. If you now parse the source:

var a = 1; function foo() { var b = 2; } var c = 3;

the following tree is generated:

alt text

All you now have to do is iterate over the children of the root of your tree and when you stumble upon a VARIABLE token, you know it's a "global" since all other variables will be under FUNCTION nodes.

Here's how to do that:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class Main {
    public static void main(String[] args) throws Exception {
        String source = "var a = 1; function foo() { var b = 2; } var c = 3;";
        ANTLRStringStream in = new ANTLRStringStream(source);
        JavaScriptLexer lexer = new JavaScriptLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        JavaScriptParser parser = new JavaScriptParser(tokens);
        JavaScriptParser.program_return returnValue = parser.program();
        CommonTree tree = (CommonTree)returnValue.getTree();
        for(Object o : tree.getChildren()) {
            CommonTree child = (CommonTree)o;
            if(child.getType() == JavaScriptParser.VARIABLE) {
                System.out.println("Found a global var: "+child.getChild(0));
            }
        }
    }
}

which produces the following output:

Found a global var: a
Found a global var: c
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文