What is a "semantic predicate" in ANTLR?

Posted on 2024-09-05 18:52:08

What is a semantic predicate in ANTLR?

Comments (2)

西瑶 2024-09-12 18:52:08

ANTLR 4

For predicates in ANTLR 4, check out these Stack Overflow Q&As:


ANTLR 3

A semantic predicate is a way to enforce extra (semantic) rules upon grammar
actions using plain code.

There are 3 types of semantic predicates:

  • validating semantic predicates;
  • gated semantic predicates;
  • disambiguating semantic predicates.

Example grammar

Let's say you have a block of text consisting of only numbers separated by
commas, ignoring any white space. You would like to parse this input making
sure that the numbers are at most 3 digits "long" (at most 999). The following
grammar (Numbers.g) would do such a thing:

grammar Numbers;

// entry point of this parser: it parses an input string consisting of at least 
// one number, optionally followed by zero or more commas and numbers
parse
  :  number (',' number)* EOF
  ;

// matches a number that is between 1 and 3 digits long
number
  :  Digit Digit Digit
  |  Digit Digit
  |  Digit
  ;

// matches a single digit
Digit
  :  '0'..'9'
  ;

// ignore spaces
WhiteSpace
  :  (' ' | '\t' | '\r' | '\n') {skip();}
  ;

Testing

The grammar can be tested with the following class:

import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        ANTLRStringStream in = new ANTLRStringStream("123, 456, 7   , 89");
        NumbersLexer lexer = new NumbersLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        NumbersParser parser = new NumbersParser(tokens);
        parser.parse();
    }
}

Test it by generating the lexer and parser, compiling all .java files and
running the Main class:

java -cp antlr-3.2.jar org.antlr.Tool Numbers.g
javac -cp antlr-3.2.jar *.java
java -cp .:antlr-3.2.jar Main

When doing so, nothing is printed to the console, which indicates that nothing
went wrong. Try changing:

ANTLRStringStream in = new ANTLRStringStream("123, 456, 7   , 89");

into:

ANTLRStringStream in = new ANTLRStringStream("123, 456, 7777   , 89");

and do the test again: you will see an error appearing on the console right after the string 777.


Semantic Predicates

This brings us to the semantic predicates. Let's say you want to parse
numbers between 1 and 10 digits long. A rule like:

number
  :  Digit Digit Digit Digit Digit Digit Digit Digit Digit Digit
  |  Digit Digit Digit Digit Digit Digit Digit Digit Digit
     /* ... */
  |  Digit Digit Digit
  |  Digit Digit
  |  Digit
  ;

would become cumbersome. Semantic predicates can help simplify this type of rule.


1. Validating Semantic Predicates

A validating semantic predicate is nothing
more than a block of code followed by a question mark:

RULE { /* a boolean expression in here */ }?

To solve the problem above using a validating
semantic predicate, change the number rule in the grammar into:

number
@init { int N = 0; }
  :  (Digit { N++; } )+ { N <= 10 }?
  ;

The parts { int N = 0; } and { N++; } are plain Java statements, of which
the first is executed when the parser "enters" the number rule. The actual
predicate is { N <= 10 }?, which causes the parser to throw a
FailedPredicateException
whenever a number is more than 10 digits long.
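
To make the control flow concrete, here is a minimal hand-written sketch of what the rule above does. It is not the code ANTLR generates: the class name, the matchNumber helper and the PredicateFailed exception (a stand-in for FailedPredicateException) are all hypothetical, and it works on raw characters instead of Digit tokens.

```java
// Hand-written analogue of:  (Digit { N++; })+ { N <= 10 }?
// NOT ANTLR-generated code; every name in this class is hypothetical.
final class ValidatingPredicateSketch {

    /** Hypothetical stand-in for ANTLR's FailedPredicateException. */
    static final class PredicateFailed extends RuntimeException {
        PredicateFailed(String message) {
            super(message);
        }
    }

    /**
     * Consumes every digit starting at index start, then validates the count.
     * Returns the number of digits consumed.
     */
    static int matchNumber(String input, int start) {
        int n = 0;                                   // @init { int N = 0; }
        int pos = start;
        while (pos < input.length()
                && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') {
            pos++;                                   // match one Digit
            n++;                                     // { N++; }
        }
        if (n == 0) {
            throw new IllegalArgumentException("expected a digit at index " + start);
        }
        if (!(n <= 10)) {                            // { N <= 10 }?
            throw new PredicateFailed("N <= 10 failed, N = " + n);
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(matchNumber("1234567890", 0));
        try {
            matchNumber("12345678901", 0);
        } catch (PredicateFailed e) {
            System.out.println("predicate failed: " + e.getMessage());
        }
    }
}
```

Note that the check only runs after all digits have been consumed, which is why the failure is reported as a failed predicate and not as an unexpected token.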

Test it by using the following ANTLRStringStream:

// all equal or less than 10 digits
ANTLRStringStream in = new ANTLRStringStream("1,23,1234567890"); 

which produces no exception, while the following does throw an exception:

// '12345678901' is more than 10 digits
ANTLRStringStream in = new ANTLRStringStream("1,23,12345678901");

2. Gated Semantic Predicates

A gated semantic predicate is similar to a validating semantic predicate,
only the gated version produces a syntax error instead of a FailedPredicateException.

The syntax of a gated semantic predicate is:

{ /* a boolean expression in here */ }?=> RULE

To solve the same problem with a gated predicate instead, matching numbers up to 10 digits long, you would write:

number
@init { int N = 1; }
  :  ( { N <= 10 }?=> Digit { N++; } )+
  ;

Test it again with both:

// all equal or less than 10 digits
ANTLRStringStream in = new ANTLRStringStream("1,23,1234567890"); 

and:

// '12345678901' is more than 10 digits
ANTLRStringStream in = new ANTLRStringStream("1,23,12345678901");

and you will see that the last one produces an error.
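
The difference from the validating version can be sketched in plain Java. This is again a hand-written approximation, not ANTLR output: the class and method names are hypothetical, whitespace skipping is left out, and the input is scanned as raw characters. The gate is tested before each digit is consumed, so an 11th digit is simply left in the input, and the enclosing parse rule then fails because it expects ',' or EOF.

```java
// Hand-written analogue of:  ( { N <= 10 }?=> Digit { N++; } )+
// NOT ANTLR-generated code; every name in this class is hypothetical.
final class GatedPredicateSketch {

    /**
     * Consumes digits starting at index start while the gate N <= 10 holds.
     * Returns the index just past the last consumed digit.
     */
    static int matchNumber(String input, int start) {
        int n = 1;                                   // @init { int N = 1; }
        int pos = start;
        while (n <= 10                               // gate: { N <= 10 }?=>
                && pos < input.length()
                && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') {
            pos++;                                   // match one Digit
            n++;                                     // { N++; }
        }
        if (pos == start) {
            throw new IllegalArgumentException("expected a digit at index " + start);
        }
        return pos;
    }

    /** Analogue of: number (',' number)* EOF. Returns true if all input matched. */
    static boolean parse(String input) {
        int pos = matchNumber(input, 0);
        while (pos < input.length() && input.charAt(pos) == ',') {
            pos = matchNumber(input, pos + 1);
        }
        return pos == input.length();                // EOF expected here
    }

    public static void main(String[] args) {
        System.out.println(parse("1,23,1234567890"));
        System.out.println(parse("1,23,12345678901"));
    }
}
```

Because N starts at 1 and is incremented after each digit, the gate lets exactly ten digits through before it closes.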


3. Disambiguating Semantic Predicates

The final type of predicate is a disambiguating semantic predicate, which looks a bit like a validating predicate ({boolean-expression}?), but acts more like a gated semantic predicate (no exception is thrown when the boolean expression evaluates to false). You can use it at the start of a rule to check some property of a rule and let the parser match said rule or not.

Let's say the example grammar creates Number tokens (a lexer rule instead of a parser rule) that will match numbers in the range of 0..999. Now in the parser, you'd like to make a distinction between low and high numbers (low: 0..500, high: 501..999). This could be done using a disambiguating semantic predicate where you inspect the next token in the stream (input.LT(1)) to check whether it's low or high.

A demo:

grammar Numbers;

parse
  :  atom (',' atom)* EOF
  ;

atom
  :  low  {System.out.println("low  = " + $low.text);}
  |  high {System.out.println("high = " + $high.text);}
  ;

low
  :  {Integer.valueOf(input.LT(1).getText()) <= 500}? Number
  ;

high
  :  Number
  ;

Number
  :  Digit Digit Digit
  |  Digit Digit
  |  Digit
  ;

fragment Digit
  :  '0'..'9'
  ;

WhiteSpace
  :  (' ' | '\t' | '\r' | '\n') {skip();}
  ;

If you now parse the string "123, 999, 456, 700, 89, 0", you'd see the following output:

low  = 123
high = 999
low  = 456
high = 700
low  = 89
low  = 0
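
The decision made by the atom rule can be approximated in plain Java. This is only a sketch under stated assumptions: it does not use the ANTLR runtime, the class and method names are hypothetical, and tokenizing is reduced to splitting on commas and trimming whitespace. The predicate is evaluated on the upcoming token before an alternative is chosen, and a false result selects the other alternative without raising an error.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written analogue of choosing between the low and high alternatives.
// NOT ANTLR-generated code; every name in this class is hypothetical.
final class DisambiguatingPredicateSketch {

    /** Classifies each comma-separated Number token as low (0..500) or high (501..999). */
    static List<String> classify(String input) {
        List<String> lines = new ArrayList<>();
        for (String raw : input.split(",")) {
            String token = raw.trim();               // WhiteSpace is skipped
            if (!token.matches("[0-9]{1,3}")) {      // Number: 1 to 3 digits
                throw new IllegalArgumentException("not a Number token: '" + token + "'");
            }
            // {Integer.valueOf(input.LT(1).getText()) <= 500}?
            boolean isLow = Integer.parseInt(token) <= 500;
            lines.add((isLow ? "low  = " : "high = ") + token);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : classify("123, 999, 456, 700, 89, 0")) {
            System.out.println(line);
        }
    }
}
```
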
云雾 2024-09-12 18:52:08

I've always used the terse reference to ANTLR predicates on wincent.com as my guide.
