ANTLR：如何使用词法分析器解析匹配括号内的区域

发布于 2024-11-27 17:22:35 字数 440 浏览 3 评论 0原文

我想在我的词法分析器中解析类似的内容：

( begin expression )

其中表达式也被括号包围。表达式中的内容并不重要，我只想将 (begin 和匹配的 ) 之间的所有内容作为令牌。一个例子是：

(begin 
    (define x (+ 1 2)))

所以令牌的文本应该是

(define x (+ 1 2)))

类似的东西

PROGRAM : LPAREN BEGIN .* RPAREN;

（显然）不起作用，因为一旦他看到“）”，他就认为规则已经结束，但我需要匹配的括号。

我怎样才能做到这一点？

原文

i want to parse something like this in my lexer:

( begin expression )

where expressions are also surrounded by brackets. it isn't important what is in the expression, i just want to have all what's between the (begin and the matching ) as a token. an example would be:

(begin 
    (define x (+ 1 2)))

so the text of the token should be

(define x (+ 1 2)))

something like

PROGRAM : LPAREN BEGIN .* RPAREN;

does (obviously) not work because as soon as he sees a ")", he thinks the rule is over, but i need the matching bracket for this.

how can i do that?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

ˉ厌 2024-12-04 17:22:35

在词法分析器规则内，您可以递归地调用规则。所以，这是解决这个问题的一种方法。另一种方法是跟踪左括号和右括号的数量，并让 门控语义谓词 只要您的计数器大于零，就会循环。

演示：

Tg

grammar T;

parse
  :  BeginToken {System.out.println("parsed :: " + $BeginToken.text);} EOF
  ;

BeginToken 
@init{int open = 1;}
  :  '(' 'begin' ( {open > 0}?=>              // keep reapeating `( ... )*` as long as open > 0
                     ( ~('(' | ')')           // match anything other than parenthesis
                     | '('          {open++;} // match a '(' in increase the var `open`
                     | ')'          {open--;} // match a ')' in decrease the var `open`
                     )
                 )*
  ;

Main.java

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String input = "(begin (define x (+ (- 1 3) 2)))";
    TLexer lexer = new TLexer(new ANTLRStringStream(input));
    TParser parser = new TParser(new CommonTokenStream(lexer));
    parser.parse();
  }
}

java -cp antlr-3.3-complete.jar org.antlr.Tool T.g
javac -cp antlr-3.3-complete.jar *.java
java -cp .:antlr-3.3-complete.jar Main

parsed :: (begin (define x (+ (- 1 3) 2)))

请注意，您需要注意源代码中可能包含括号的字符串文字：

BeginToken
@init{int open = 1;}
  :  '(' 'begin' ( {open > 0}?=>              // ...
                     ( ~('(' | ')' | '"')     // ...
                     | '('          {open++;} // ...
                     | ')'          {open--;} // ...
                     |  '"' ...               // TODO: define a string literal here
                     )
                 )*
  ;

或可能包含括号的注释。

带有谓词的建议使用一些特定于语言的代码（在本例中为 Java）。递归调用词法分析器规则的优点是您的词法分析器中没有自定义代码：

BeginToken 
  :  '(' Spaces? 'begin' Spaces? NestedParens Spaces? ')'
  ;

fragment NestedParens
  :  '(' ( ~('(' | ')') | NestedParens )* ')'
  ;

fragment Spaces
  :  (' ' | '\t')+
  ;

Inside lexer rules, you can invoke rules recursively. So, that's one way to solve this. Another approach would be to keep track of the number of open- and close parenthesis and let a gated semantic predicate loop as long as your counter is more than zero.

A demo:

T.g

grammar T;

parse
  :  BeginToken {System.out.println("parsed :: " + $BeginToken.text);} EOF
  ;

BeginToken 
@init{int open = 1;}
  :  '(' 'begin' ( {open > 0}?=>              // keep reapeating `( ... )*` as long as open > 0
                     ( ~('(' | ')')           // match anything other than parenthesis
                     | '('          {open++;} // match a '(' in increase the var `open`
                     | ')'          {open--;} // match a ')' in decrease the var `open`
                     )
                 )*
  ;

Main.java

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String input = "(begin (define x (+ (- 1 3) 2)))";
    TLexer lexer = new TLexer(new ANTLRStringStream(input));
    TParser parser = new TParser(new CommonTokenStream(lexer));
    parser.parse();
  }
}

java -cp antlr-3.3-complete.jar org.antlr.Tool T.g
javac -cp antlr-3.3-complete.jar *.java
java -cp .:antlr-3.3-complete.jar Main

parsed :: (begin (define x (+ (- 1 3) 2)))

Note that you'll need to beware of string literals inside your source that might include parenthesis:

BeginToken
@init{int open = 1;}
  :  '(' 'begin' ( {open > 0}?=>              // ...
                     ( ~('(' | ')' | '"')     // ...
                     | '('          {open++;} // ...
                     | ')'          {open--;} // ...
                     |  '"' ...               // TODO: define a string literal here
                     )
                 )*
  ;

or comments that may contain parenthesis.

The suggestion with the predicate uses some language specific code (Java, in this case). An advantage of calling a lexer rule recursively is that you don't have custom code in your lexer:

BeginToken 
  :  '(' Spaces? 'begin' Spaces? NestedParens Spaces? ')'
  ;

fragment NestedParens
  :  '(' ( ~('(' | ')') | NestedParens )* ')'
  ;

fragment Spaces
  :  (' ' | '\t')+
  ;

回复收藏 0 原文

~没有更多了~