ANTLR parsing Java Properties

Posted on 2024-11-09 14:59:40

I'm trying to learn ANTLR by writing a grammar for Java Properties files. I've hit a problem and would appreciate some help.

Java Properties files handle escapes in a slightly odd way. For example,

key1=1=Key1
key\=2==

results in these key-value pairs in the Java runtime:

KEY     VALUE
===     =====
key1    1=Key1
key=2   =
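
These pairs can be checked against the JDK itself. Below is a minimal sketch, assuming the two lines are fed to java.util.Properties through a StringReader; the class name EscapeCheck and its load helper are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class EscapeCheck {
    // Parse properties text with the JDK's own loader (hypothetical helper).
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // In Java source, "\\=" stands for the backslash-equals pair in the file.
        Properties props = load("key1=1=Key1\nkey\\=2==\n");
        System.out.println("key1  -> " + props.getProperty("key1"));
        System.out.println("key=2 -> " + props.getProperty("key=2"));
    }
}
```

The first unescaped = ends the key, and every later = belongs to the value.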

So far, this is the best I can do to mimic that: folding the '=' and the value into a single token.

grammar Prop;
file : (pair | LINE_COMMENT)* ;
pair : ID VALUE ;
ID  :   (~('='|'\r'|'\n') | '\\=')* ;
VALUE   :   '=' (~('\r'|'\n'))*;
CARRIAGE_RETURN
    :       ('\r'|'\n') + {$channel=HIDDEN;}
    ;
LINE_COMMENT
    : '#' ~('\r'|'\n')* ('\r'|'\n'|EOF)
    ;

Is there a better way to implement this? Any suggestions are welcome. Thanks a lot.

噩梦成真你也成魔 2024-11-16 14:59:40

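It's not as easy as that. You can't handle much at the lexing level because many things depend on context. So at the lexing level you can only match single characters, and you construct keys and values in parser rules. Also, = and : are possible key-value separators, yet these characters can also be the start of a value, which makes them a pain to translate into a grammar. The easiest approach is to include these (possible) separator characters in your value rule and, after matching the separator and value together, strip the separator characters from the result.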
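
Outside of ANTLR, the "match separator and value together, then strip" idea can be sketched in plain Java. This is only an illustration under simplifying assumptions: it works on one logical line, ignores line continuations, and handles only the \= and \: escapes. The class SeparatorSketch and its split method are hypothetical names, not part of any library.

```java
class SeparatorSketch {
    private static boolean isSpace(char c) {
        return c == ' ' || c == '\t' || c == '\f';
    }

    // Splits one logical line into {key, value}. Line continuations and
    // escapes other than \= and \: are deliberately not handled.
    static String[] split(String line) {
        int i = 0;
        while (i < line.length() && isSpace(line.charAt(i))) {
            i++; // skip leading whitespace
        }
        int keyStart = i;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                i += 2; // an escaped character stays part of the key
                continue;
            }
            if (c == '=' || c == ':' || isSpace(c)) {
                break; // first unescaped separator ends the key
            }
            i++;
        }
        String key = line.substring(keyStart, i)
                .replace("\\:", ":")
                .replace("\\=", "=");
        // The rest is "separator and value" matched together; strip at most
        // one '=' or ':' plus the whitespace around it.
        String value = line.substring(i)
                .replaceAll("^[ \\t\\f]*[:=]?[ \\t\\f]*", "");
        return new String[] { key, value };
    }

    public static void main(String[] args) {
        for (String line : new String[] { "key1=1=Key1", "key\\=2==" }) {
            String[] kv = split(line);
            System.out.println(kv[0] + " -> " + kv[1]);
        }
    }
}
```

Because the separator is optional in the strip pattern, a line such as key6 value6, which is separated by whitespace only, goes through the same code path.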

A small demo:

JavaProperties.g

grammar JavaProperties;

parse
  :  line* EOF
  ;

line
  :  Space* keyValue
  |  Space* Comment eol
  |  Space* LineBreak
  ;

keyValue
  :  key separatorAndValue eol
     {
       // Replace all escaped `=` and `:`
       String k = $key.text.replace("\\:", ":").replace("\\=", "=");

       // Remove the separator, if it exists
       String v = $separatorAndValue.text.replaceAll("^\\s*[:=]\\s*", "");

       // Remove escaped line breaks plus the leading whitespace of the continuation line
       v = v.replaceAll("\\\\(\r?\n|\r)[ \t\f]*", "").trim();

       System.out.println("\nkey   : `" + k + "`");
       System.out.println("value : `" + v + "`");
     }
  ;

key
  :  keyChar+
  ;

keyChar
  :  AlphaNum 
  |  Backslash (Colon | Equals)
  ;

separatorAndValue
  :  (Space | Colon | Equals) valueChar+
  ;

valueChar
  :  AlphaNum 
  |  Space 
  |  Backslash LineBreak
  |  Equals
  |  Colon
  ;

eol
  :  LineBreak
  |  EOF
  ;

Backslash : '\\';
Colon     : ':';
Equals    : '=';

Comment
  :  ('!' | '#') ~('\r' | '\n')*
  ;

LineBreak
  :  '\r'? '\n'
  |  '\r'
  ;

Space
  :  ' ' 
  |  '\t' 
  |  '\f'
  ;

AlphaNum
  :  'a'..'z'
  |  'A'..'Z'
  |  '0'..'9'
  ;

The grammar above can be tested with the class:

Main.java

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    ANTLRStringStream in = new ANTLRFileStream("test.properties");
    JavaPropertiesLexer lexer = new JavaPropertiesLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    JavaPropertiesParser parser = new JavaPropertiesParser(tokens);
    parser.parse();
  }
}

and the input file:

test.properties

key1 = value 1
        key2:value 2
 key3                  :value3
ke\:\=y4=v\
    a\
    l\
    u\
    e    4
key\=5==
key6           value6

to produce the following output:

key   : `key1`
value : `value 1`

key   : `key2`
value : `value 2`

key   : `key3`
value : `value3`

key   : `ke:=y4`
value : `value    4`

key   : `key=5`
value : `=`

key   : `key6`
value : `value6`
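
As a cross-check, the same input can be run through java.util.Properties to compare against the pairs printed by the grammar. This is a sketch only: the file contents are embedded as a string so the example is self-contained, and the class name CrossCheck is an assumption made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.TreeSet;

class CrossCheck {
    // Contents of test.properties; "\\" in Java source is one backslash in the file.
    static final String INPUT =
            "key1 = value 1\n"
          + "        key2:value 2\n"
          + " key3                  :value3\n"
          + "ke\\:\\=y4=v\\\n"
          + "    a\\\n"
          + "    l\\\n"
          + "    u\\\n"
          + "    e    4\n"
          + "key\\=5==\n"
          + "key6           value6\n";

    // Parse properties text with the JDK's own loader (hypothetical helper).
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(INPUT);
        // Properties has no defined iteration order, so sort the keys for stable output.
        for (String name : new TreeSet<>(props.stringPropertyNames())) {
            System.out.println("key   : `" + name + "`");
            System.out.println("value : `" + props.getProperty(name) + "`");
        }
    }
}
```

The keys print in sorted order rather than file order, since Properties does not preserve the order of the input.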

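Note that my grammar is just an example: it does not account for all valid properties files (backslashes should sometimes be ignored, there are no Unicode escapes, and many characters are missing from the key and value rules). For a complete specification of properties files, see:
http://download.oracle.com/javase/6/docs/api/java/util/Properties.html#load%28java.io.Reader%29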
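
To make those gaps concrete, here is a small sketch of inputs that java.util.Properties accepts but the demo grammar would not: a Unicode escape, a backslash before an ordinary character (which the loader silently drops), and a key containing characters outside a-z, A-Z and 0-9. The class name LimitsCheck and the sample keys are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class LimitsCheck {
    // Parse properties text with the JDK's own loader (hypothetical helper).
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(
                "greeting=\\u0048i\n"   // Unicode escape for the letter H
              + "path=c:\\zone\n"       // backslash before 'z' is not a valid escape
              + "my.key-1=ok\n");       // '.' and '-' are legal in keys
        System.out.println(props.getProperty("greeting")); // Hi
        System.out.println(props.getProperty("path"));     // c:zone
        System.out.println(props.getProperty("my.key-1")); // ok
    }
}
```

Handling these cases in the grammar would mean widening the character tokens and adding escape processing to the key and value actions.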
