ANTLR: lexer rules that strictly accept single-letter tokens and whole multi-character tokens, rather than only one character (Java)

Posted on 2024-10-11 01:10:03

I've written the grammar below for an ANTLR parser and lexer that build trees for logical formulae, and I have a couple of questions:

class AntlrFormulaParser extends Parser;

options {
    buildAST = true;
}

biconexpr : impexpr (BICONDITIONAL^ impexpr)*;

impexpr : orexpr (IMPLICATION^ orexpr)*;

orexpr : andexpr (DISJUNCTION^ andexpr)*;

andexpr : notexpr (CONJUNCTION^ notexpr)*;

notexpr : (NEGATION^)? formula;

formula 
    : atom
    | LEFT_PAREN! biconexpr RIGHT_PAREN!
    ;

atom
    : CHAR
    | TRUTH
    | FALSITY
    ;


class AntlrFormulaLexer extends Lexer;

// Atoms
CHAR: 'a'..'z';
TRUTH: ('\u22A4' | 'T');
FALSITY: ('\u22A5' | 'F');

// Grouping
LEFT_PAREN: '(';
RIGHT_PAREN: ')';
NEGATION: ('\u00AC' | '~' | '!');
CONJUNCTION: ('\u2227' | '&' | '^');
DISJUNCTION: ('\u2228' | '|' | 'V');
IMPLICATION: ('\u2192' | "->");
BICONDITIONAL: ('\u2194' | "<->");

WHITESPACE : (' ' | '\t' | '\r' | '\n') { $setType(Token.SKIP); };

The tree grammar:

tree grammar AntlrFormulaTreeParser;

options {
    tokenVocab=AntlrFormula;
    ASTLabelType=CommonTree;
}

expr returns [Formula f]
    : ^(BICONDITIONAL f1=expr f2=expr) {
        $f = new Biconditional(f1, f2);
    }
    | ^(IMPLICATION f1=expr f2=expr) {
        $f = new Implication(f1, f2);
    }
    | ^(DISJUNCTION f1=expr f2=expr) {
        $f = new Disjunction(f1, f2);
    }
    | ^(CONJUNCTION f1=expr f2=expr) {
        $f = new Conjunction(f1, f2);
    }
    | ^(NEGATION f1=expr) {
        $f = new Negation(f1);
    }
    | CHAR {
        $f = new Atom($CHAR.getText());
    }
    | TRUTH {
        $f = Atom.TRUTH;
    }
    | FALSITY {
        $f = Atom.FALSITY;
    }
    ;

The problems I'm having with the above grammar are these:

1. In the generated Java code for AntlrFormulaLexer, the IMPLICATION and BICONDITIONAL tokens only seem to check for their respective first characters (i.e. '-' and '<') to match the token, instead of the whole string specified in the grammar.
2. When testing the generated Java code for AntlrFormulaParser, if I pass a string such as "~ab", it returns a tree of "(~ a)" (and the string "ab&c" returns just "a"). It should really report an error or throw an exception, since an atom can only have one letter according to the above grammar. It doesn't give any error or exception at all with these sample strings.

I'd really appreciate it if someone could help me solve these two problems. Thank you :)

Answer by 蓦然回首, 2024-10-18 01:10:03

I would change the following definitions to:

IMPLICATION: ('\u2192' | '->');
BICONDITIONAL: ('\u2194' | '<->');

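Note the double-quoted "->" versus the single-quoted '->': ANTLR 3 writes all literals, including multi-character ones, in single quotes.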
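To illustrate what "matching the whole string" means for IMPLICATION and BICONDITIONAL, here is a minimal hand-written sketch in plain Java. It is not the code ANTLR generates; the class name ArrowLexerSketch and the method tokenize are made up for this example, and only a few of the grammar's tokens are covered. It accepts "->" and "<->" only when every character is present, and rejects a lone '-' or '<'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not ANTLR output.
public class ArrowLexerSketch {

    // Turns the input into a list of token names, skipping whitespace.
    // Throws if a character does not start a complete token.
    public static List<String> tokenize(String in) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (in.startsWith("<->", i)) {
                // All three characters must be present, not just '<'.
                out.add("BICONDITIONAL");
                i += 3;
            } else if (in.startsWith("->", i)) {
                // Both characters must be present, not just '-'.
                out.add("IMPLICATION");
                i += 2;
            } else if (c == '\u2194') {
                out.add("BICONDITIONAL");
                i++;
            } else if (c == '\u2192') {
                out.add("IMPLICATION");
                i++;
            } else if (c >= 'a' && c <= 'z') {
                out.add("CHAR");
                i++;
            } else {
                throw new IllegalArgumentException(
                        "unexpected character '" + c + "' at index " + i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a -> b"));  // [CHAR, IMPLICATION, CHAR]
        System.out.println(tokenize("a <-> b")); // [CHAR, BICONDITIONAL, CHAR]
    }
}
```

Calling tokenize("a - b") throws an IllegalArgumentException, because '-' on its own does not form a complete token.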

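And to solve the error issue, require EOF at the end of the rule you start parsing from, so that leftover input is reported instead of silently ignored: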

formula 
    : (
         atom
       | LEFT_PAREN! biconexpr RIGHT_PAREN! 
      ) EOF
    ;

from here:
http://www.antlr.org/wiki/pages/viewpage.action?pageId=4554943
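Why the trailing EOF matters can be shown with a small hand-written recursive-descent parser in plain Java. This is only a sketch under assumptions: the class EofCheckSketch and its methods are invented for illustration, it supports just the ASCII operators ~, &, |, -> and <->, and it is not how ANTLR implements parsing. The parsePrefix method stops after the first complete formula, which reproduces the "(~ a)" result from the question, while parseAll additionally insists that all input was consumed.

```java
// Hypothetical illustration of an end-of-input check; not ANTLR output.
public class EofCheckSketch {
    private final String in; // input with whitespace removed
    private int pos = 0;     // index into the whitespace-free input

    private EofCheckSketch(String in) {
        this.in = in.replaceAll("\\s+", "");
    }

    // No end-of-input check: returns as soon as one formula has been read.
    public static String parsePrefix(String s) {
        return new EofCheckSketch(s).bicon();
    }

    // With the check, mirroring a trailing EOF on the start rule.
    public static String parseAll(String s) {
        EofCheckSketch p = new EofCheckSketch(s);
        String tree = p.bicon();
        if (p.pos != p.in.length()) {
            throw new IllegalArgumentException(
                    "unexpected input at index " + p.pos + ": " + p.in.substring(p.pos));
        }
        return tree;
    }

    // Consumes tok if the remaining input starts with it.
    private boolean eat(String tok) {
        if (in.startsWith(tok, pos)) {
            pos += tok.length();
            return true;
        }
        return false;
    }

    private String bicon() {
        String left = imp();
        while (eat("<->")) left = "(<-> " + left + " " + imp() + ")";
        return left;
    }

    private String imp() {
        String left = or();
        while (eat("->")) left = "(-> " + left + " " + or() + ")";
        return left;
    }

    private String or() {
        String left = and();
        while (eat("|")) left = "(| " + left + " " + and() + ")";
        return left;
    }

    private String and() {
        String left = not();
        while (eat("&")) left = "(& " + left + " " + not() + ")";
        return left;
    }

    private String not() {
        return eat("~") ? "(~ " + formula() + ")" : formula();
    }

    private String formula() {
        if (eat("(")) {
            String inner = bicon();
            if (!eat(")")) throw new IllegalArgumentException("expected ')' at index " + pos);
            return inner;
        }
        if (pos < in.length() && in.charAt(pos) >= 'a' && in.charAt(pos) <= 'z') {
            return String.valueOf(in.charAt(pos++)); // a single-letter atom
        }
        throw new IllegalArgumentException("expected atom or '(' at index " + pos);
    }

    public static void main(String[] args) {
        System.out.println(parsePrefix("~ab")); // (~ a)
        System.out.println(parseAll("~a&b"));   // (& (~ a) b)
    }
}
```

With these definitions, parseAll("~ab") throws an IllegalArgumentException because the 'b' is left over, whereas parsePrefix("~ab") quietly returns "(~ a)".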

Fixed grammar that compiles against ANTLR 3.3 (save as AntlrFormula.g):

grammar AntlrFormula;

options {
    output = AST; 
}


program : formula EOF! ;

formula : atom | LEFT_PAREN! biconexpr RIGHT_PAREN! ;

biconexpr : impexpr (BICONDITIONAL^ impexpr)*;

impexpr : orexpr (IMPLICATION^ orexpr)*;

orexpr : andexpr (DISJUNCTION^ andexpr)*;

andexpr : notexpr (CONJUNCTION^ notexpr)*;

notexpr : (NEGATION^)? formula;


atom
    : CHAR
    | TRUTH
    | FALSITY
    ;


// Atoms
CHAR: 'a'..'z';
TRUTH: ('\u22A4' | 'T');
FALSITY: ('\u22A5' | 'F');

// Grouping
LEFT_PAREN: '(';
RIGHT_PAREN: ')';
NEGATION: ('\u00AC' | '~' | '!');
CONJUNCTION: ('\u2227' | '&' | '^');
DISJUNCTION: ('\u2228' | '|' | 'V');
IMPLICATION: ('\u2192' | '->');
BICONDITIONAL: ('\u2194' | '<->');

WHITESPACE : (' ' | '\t' | '\r' | '\n') { $channel = HIDDEN; };

Link to the ANTLR 3.3 binary: http://www.antlr.org/download/antlr-3.3-complete.jar

You will need to invoke the program rule as the entry point in order to match the complete input.

It is testable with this class:

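import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) {
        // "(~ab)" is deliberately invalid: an atom may only be a single letter.
        AntlrFormulaLexer lexer = new AntlrFormulaLexer(new ANTLRStringStream("(~ab)"));
        AntlrFormulaParser p = new AntlrFormulaParser(new CommonTokenStream(lexer));

        try {
            // Start parsing at the program rule.
            p.program();
            // ANTLR 3 reports and recovers from syntax errors by default,
            // so check the error count rather than relying on an exception.
            if (p.failed() || p.getNumberOfSyntaxErrors() != 0) {
                System.out.println("failed");
            }
        } catch (RecognitionException e) {
            e.printStackTrace();
        }
    }
}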
