ANTLR4 - How to turn off "longest match wins" and use the first matching rule?

Posted on 2025-01-13 20:12:26

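Original question:

The code I want to parse: N100G1M4. What I expect: N100 G1 M4. But ANTLR cannot recognize this, because ANTLR always matches the longest substring. How should I handle this case?

Update

What I am trying to do:

I am trying to parse CNC G-code text files and extract the keywords from the file stream. G-code is typically used to control a machine and drive its motors.

The G-code grammar is: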

// Grammar for CNC G-code
grammar GCode;

script  : blocks+ EOF;

blocks: 
      assign_stat
    | ncblock 
    | NEWLINE
    ;

ncblock : 
     ncelements  NEWLINE  // 
    ;
ncelements :
        ncelement+
    ;

ncelement 
    :   
        LINENUMEXPR    // linenumber N100 
    |   GCODEEXPR   // G10 G54.1
    |   MCODEEXPR   // M30
    |   coordexpr   // X100 Y100 Z[A+b*c]
    |   FeedExpr    // F10.12
    |   AccExpr     // E2.0
    // |   callSubroutine 
    ;

assign_stat: 
        VARNAME '=' expression NEWLINE
    ;

expression: 
       multiplyingExpression (('+' | '-') multiplyingExpression)*
    ;

multiplyingExpression
   : powExpression (('*' | '/') powExpression)*
   ;

powExpression
   : signedAtom ('^' signedAtom)*
   ;

signedAtom
   : '+' signedAtom
   | '-' signedAtom
   | atom
   ;

atom
   : scientific
   | variable
   | '(' expression ')'
   ;

LINENUMEXPR: 'N' Digit+ ;
GCODEEXPR : 'G' GPOSTFIX;
MCODEEXPR : 'M' INT;
coordexpr: 
        CoordExpr
    |   ParameterKeyword getValueExpr
    ;

getValueExpr: 
        '[' expression ']'
    ;

CoordExpr 
        : 
         ParameterKeyword SCIENTIFIC_NUMBER
        ;
ParameterKeyword: [XYZABCUVWIJKR];
FeedExpr: 'F' SCIENTIFIC_NUMBER;
AccExpr: 'E' SCIENTIFIC_NUMBER;



fragment
GPOSTFIX
    : Digit+ ('.' Digit+)*
    ;

variable
   : VARNAME
   ;

scientific
   : SCIENTIFIC_NUMBER
   ;

SCIENTIFIC_NUMBER
   : SIGN? NUMBER (('E' | 'e') SIGN? NUMBER)?
   ;

fragment NUMBER
   : ('0' .. '9') + ('.' ('0' .. '9') +)?
   ;

HEX_INTEGER
 : '0' [xX] HEX_DIGIT+
 ;

fragment HEX_DIGIT
 : [0-9a-fA-F]
 ;
 
INT : Digit+;

fragment
Digit : [0-9];

fragment 
SIGN
   : ('+' | '-')
   ;

VARNAME
    : [a-zA-Z_][a-zA-Z_0-9]*
    ;

NEWLINE 
    : '\r'? '\n'
    ;

WS : [ \t]+ -> skip ; // skip spaces and tabs (newlines are NEWLINE tokens)

Sample program (it works well except for the last line):

N200 G54.1
a = 100
b = 10
c = a + b 
Z[a + b*c]
N002 G2 X30.1 Y20.1 I20.1 J0.1 K0.2 R20

N100 G1X100.5Z[VAR1+100]M3H3 // it works well except the last line

I want to parse N100G1X100.5YE5Z[VAR1+100]M3H3 into

-> N100 G1 X100 Z[VAR1+100]
-> or, better, split the node X100 into two child nodes, X and 100
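
To make the target concrete, the desired split can be sketched outside ANTLR. The Python snippet below is only an illustration and not part of the grammar: the helper name `split_words` and the regular expression are assumptions, and it treats each word as one address letter followed by either a plain decimal number or a bracketed expression.

```python
import re

# Hypothetical helper, for illustration only: one address letter followed by
# either a bracketed expression or a plain (optionally signed) decimal number.
WORD = re.compile(r"([A-Z])(\[[^\]]*\]|[+-]?\d+(?:\.\d+)?)")

def split_words(block):
    """Split an NC block such as 'N100G1M4' into (letter, value) pairs."""
    return WORD.findall(block)

print(split_words("N100G1M4"))
# [('N', '100'), ('G', '1'), ('M', '4')]
print(split_words("N100G1X100.5Z[VAR1+100]M3H3"))
# [('N', '100'), ('G', '1'), ('X', '100.5'), ('Z', '[VAR1+100]'), ('M', '3'), ('H', '3')]
```

Each pair keeps the address letter separate from its value, which corresponds to the "X and 100 as two child nodes" shape described above.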

I am trying to use ANTLR, but ANTLR always follows the "longest match wins" rule, so N100G1X100 is recognized as a single word.
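
The behavior described here can be reproduced with a small simulation. The sketch below is not ANTLR itself: it is a simplified Python model of three of the lexer rules above (the `RULES` table and the `longest_match` function are illustrative assumptions). It picks the longest lexeme and breaks ties in favor of the rule listed first, which is how an ANTLR lexer chooses between competing rules.

```python
import re

# Simplified copies of three lexer rules from the grammar, in definition order.
RULES = [
    ("LINENUMEXPR", re.compile(r"N[0-9]+")),
    ("GCODEEXPR", re.compile(r"G[0-9]+(?:\.[0-9]+)*")),
    ("VARNAME", re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")),
]

def longest_match(text, pos=0):
    """Return (rule_name, lexeme) chosen by longest match, earliest rule on ties."""
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # Only a strictly longer lexeme replaces the current best,
        # so on equal length the earlier rule is kept.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

print(longest_match("N100G1X100"))  # ('VARNAME', 'N100G1X100')
print(longest_match("N100 G1"))     # ('LINENUMEXPR', 'N100')
```

With a space after N100, LINENUMEXPR and VARNAME both match four characters and the earlier rule wins. Without the space, VARNAME can keep consuming letters and digits, so it produces the longer lexeme and swallows the whole run.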

Follow-up question: what is the best tool for this task?

Original text

Original question:

My code to parse:
N100G1M4
What I expected: N100 G1 M4
But ANTLR cannot identify this, because ANTLR always matches the longest substring.
How can I handle this case?

Update

What I am going to do:

I am trying to parse CNC G-code text files and extract the keywords from the file stream. G-code is typically used to control a machine and drive its motors.

The G-code grammar is:

// Grammar for CNC G-code
grammar GCode;

script  : blocks+ EOF;

blocks: 
      assign_stat
    | ncblock 
    | NEWLINE
    ;

ncblock : 
     ncelements  NEWLINE  // 
    ;
ncelements :
        ncelement+
    ;

ncelement 
    :   
        LINENUMEXPR    // linenumber N100 
    |   GCODEEXPR   // G10 G54.1
    |   MCODEEXPR   // M30
    |   coordexpr   // X100 Y100 Z[A+b*c]
    |   FeedExpr    // F10.12
    |   AccExpr     // E2.0
    // |   callSubroutine 
    ;

assign_stat: 
        VARNAME '=' expression NEWLINE
    ;

expression: 
       multiplyingExpression (('+' | '-') multiplyingExpression)*
    ;

multiplyingExpression
   : powExpression (('*' | '/') powExpression)*
   ;

powExpression
   : signedAtom ('^' signedAtom)*
   ;

signedAtom
   : '+' signedAtom
   | '-' signedAtom
   | atom
   ;

atom
   : scientific
   | variable
   | '(' expression ')'
   ;

LINENUMEXPR: 'N' Digit+ ;
GCODEEXPR : 'G' GPOSTFIX;
MCODEEXPR : 'M' INT;
coordexpr: 
        CoordExpr
    |   ParameterKeyword getValueExpr
    ;

getValueExpr: 
        '[' expression ']'
    ;

CoordExpr 
        : 
         ParameterKeyword SCIENTIFIC_NUMBER
        ;
ParameterKeyword: [XYZABCUVWIJKR];
FeedExpr: 'F' SCIENTIFIC_NUMBER;
AccExpr: 'E' SCIENTIFIC_NUMBER;



fragment
GPOSTFIX
    : Digit+ ('.' Digit+)*
    ;

variable
   : VARNAME
   ;

scientific
   : SCIENTIFIC_NUMBER
   ;

SCIENTIFIC_NUMBER
   : SIGN? NUMBER (('E' | 'e') SIGN? NUMBER)?
   ;

fragment NUMBER
   : ('0' .. '9') + ('.' ('0' .. '9') +)?
   ;

HEX_INTEGER
 : '0' [xX] HEX_DIGIT+
 ;

fragment HEX_DIGIT
 : [0-9a-fA-F]
 ;
 
INT : Digit+;

fragment
Digit : [0-9];

fragment 
SIGN
   : ('+' | '-')
   ;

VARNAME
    : [a-zA-Z_][a-zA-Z_0-9]*
    ;

NEWLINE 
    : '\r'? '\n'
    ;

WS : [ \t]+ -> skip ; // skip spaces and tabs (newlines are NEWLINE tokens)

Sample program (it works well except for the last line):

N200 G54.1
a = 100
b = 10
c = a + b 
Z[a + b*c]
N002 G2 X30.1 Y20.1 I20.1 J0.1 K0.2 R20

N100 G1X100.5Z[VAR1+100]M3H3 // it works well except the last line

I want to parse N100G1X100.5YE5Z[VAR1+100]M3H3 to