这个antlr语法与这个输入不匹配的原因是什么?

发布于 2024-10-18 17:45:58 字数 4861 浏览 2 评论 0原文

对于提出如此类似的问题,我提前表示歉意,但我感到相当沮丧,而且我可能能够更好地解释新问题。

我正在尝试重写结构化文件的部分内容,并考虑使用antlr。 这些文件是 {X} 标记行。 我正在寻找一个字符,这样如果我找到它,我就可以重写文件的不同部分。 但是这个字符('#')可以出现在文件的许多部分。但是,如果它出现在第四个 {#} 上,它会确定我是否需要以某种方式或以另一种方式重写下一个 {X} 的一部分,或者根本不需要(如果那里没有任何内容)。

典型输入:

{ 1 }{ 去哪里? # }{ 去哪里? }{ # }{ }{ G.Cabbie_Line = 1 }{ }{ }{ }{ }{ }{ }

{ 2 }{ 开车即可。 }{ 开车就行。 }{ 0 }{ }{ npc.WorldMap( G.WorldMap_State ) }{ }{ }{ }{ }{ }{ }{ 不在这里。 }

(我添加了第一个 #,所以你会看到它可以在那里。 我的语法,antlr 3.3 - 它警告“第 1:31 行在输入 '{ # }' 处没有可行的替代方案”和“第 2:35 行在输入 '{ 0 }' 处没有可行的替代方案”

grammar VampireDialog;

options
{
output=AST;
ASTLabelType=CommonTree;
language=Java;
} 
tokens
{
REWRITE;
}

@parser::header {
import java.util.LinkedList;
import java.io.File;
}

@members {
//the lookahead type i'm using ( ()=> ) wraps everything after in a if, including the actions ( {} )
//that i need to use to prepare the arguments for the replace rules. Declare them global.
    String condition, command, wrappedCommand; boolean isEmpty, alreadyProcessed;

    public static void main(String[] args) throws Exception {
        File vampireDir = new File(System.getProperty("user.home"), "Desktop/Vampire the Masquerade - Bloodlines/Vampire the Masquerade - Bloodlines/Vampire/dlg/dummy");
        
        List<File> files = new LinkedList<File>();
        getFiles(256, new File[]{vampireDir}, files, new LinkedList<File>());
        for (File f : files) {
            if (f.getName().endsWith(".dlg")) {
                System.out.println(f.getName());
                VampireDialogLexer lex = new VampireDialogLexer(new ANTLRFileStream(f.getAbsolutePath(), "Windows-1252"));
                TokenRewriteStream tokens = new TokenRewriteStream(lex);
                VampireDialogParser parser = new VampireDialogParser(tokens);
                    Tree t = (Tree) parser.dialog().getTree();
                    System.out.println(t.toStringTree());
            }
        }
    }

    public static void getFiles(int levels, File[] search, List<File> files, List<File> directories) {
        for (File f : search) {
            if (!f.exists()) {
                throw new AssertionError("Search file array has non-existing files");
            }
        }
        getFilesAux(levels, search, files, directories);
    }

    private static void getFilesAux(int levels, File[] startFiles, List<File> files, List<File> directories) {
        List<File[]> subFilesList = new ArrayList<File[]>(50);
        for (File f : startFiles) {
            File[] subFiles = f.listFiles();
            if (subFiles == null) {
                files.add(f);
            } else {
                directories.add(f);
                subFilesList.add(subFiles);
            }
        }

        if (levels > 0) {
            for (File[] subFiles : subFilesList) {
                getFilesAux(levels - 1, subFiles, files, directories);
            }
        }
    }
}




/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/
dialog : (ANY ANY ANY npc_or_pc ANY* NL*)*;

npc_or_pc : (ANY ANY) =>
         pc_marker  pc_condition
|        npc_marker npc_condition;


pc_marker  :  t=ANY {!t.getText().trim().isEmpty() && !t.getText().contains("#")}?;
npc_marker :  t=ANY {!t.getText().trim().isEmpty() &&  t.getText().contains("#")}?;

pc_condition : '{' condition_text '}'
   { 
     condition = $condition_text.tree.toStringTree();
     isEmpty = condition.trim().isEmpty();
     command = "npc.Count()";
     wrappedCommand  =  "("+condition+") and "+ command;
     alreadyProcessed = condition.endsWith(command);
   }
   -> {alreadyProcessed}?   '{' condition_text '}'
   -> {isEmpty}?            '{' REWRITE[command] '}'
   ->                       '{' REWRITE[wrappedCommand] '}';

npc_condition : '{' condition_text '}'
   { 
     condition = $condition_text.tree.toStringTree();
     isEmpty = condition.trim().isEmpty();
     command = "npc.Reset()";
     wrappedCommand  =  "("+condition+") and "+ command;
     alreadyProcessed = condition.endsWith(command);
   }
   -> {alreadyProcessed}?   '{' condition_text '}'
   -> {isEmpty}?            '{' REWRITE[command] '}'
   ->                       '{' REWRITE[wrappedCommand] '}';

marker_text :    TEXT;
condition_text : TEXT;


/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/
//in the parser ~('#') means: "match any token except the token that matches '#'" 
//and in lexer rules ~('#') means: "match any character except '#'"


TEXT : ~('{'|NL|'}')*;
ANY : '{' TEXT '}';
NL : ( '\r' | '\n'| '\u000C');

I apologize in advance about asking a so similar question, but i'm rather frustrated, and i will probably be able to explain better on a new question.

I'm trying to rewrite parts of a structured file, and thought of using antlr.
These files are lines of {X} tokens.
There is a character i'm looking ahead for, so that i can rewrite parts of the file different if i find it.
But this character ( '#' ), can occur in many parts of the file. However, if it appears on 4th {#} it determines if i need to rewrite part of the next {X} in a way, or in another way, or not at all (if there is nothing there).

Typical input:

{ 1 }{ Where to? # }{ Where to? }{ # }{ }{ G.Cabbie_Line = 1 }{ }{ }{ }{ }{ }{ }{ }

{ 2 }{ Just drive. }{ Just drive. }{ 0 }{ }{ npc.WorldMap( G.WorldMap_State ) }{ }{ }{ }{ }{ }{ }{ Not here. }

(I added the first # so you see it can be there.)
My grammar, antlr 3.3 - it is warns "line 1:31 no viable alternative at input '{ # }'" and "line 2:35 no viable alternative at input '{ 0 }'"

grammar VampireDialog;

options
{
output=AST;
ASTLabelType=CommonTree;
language=Java;
} 
tokens
{
REWRITE;
}

@parser::header {
import java.util.LinkedList;
import java.io.File;
}

@members {
//the lookahead type i'm using ( ()=> ) wraps everything after in a if, including the actions ( {} )
//that i need to use to prepare the arguments for the replace rules. Declare them global.
    String condition, command, wrappedCommand; boolean isEmpty, alreadyProcessed;

    public static void main(String[] args) throws Exception {
        File vampireDir = new File(System.getProperty("user.home"), "Desktop/Vampire the Masquerade - Bloodlines/Vampire the Masquerade - Bloodlines/Vampire/dlg/dummy");
        
        List<File> files = new LinkedList<File>();
        getFiles(256, new File[]{vampireDir}, files, new LinkedList<File>());
        for (File f : files) {
            if (f.getName().endsWith(".dlg")) {
                System.out.println(f.getName());
                VampireDialogLexer lex = new VampireDialogLexer(new ANTLRFileStream(f.getAbsolutePath(), "Windows-1252"));
                TokenRewriteStream tokens = new TokenRewriteStream(lex);
                VampireDialogParser parser = new VampireDialogParser(tokens);
                    Tree t = (Tree) parser.dialog().getTree();
                    System.out.println(t.toStringTree());
            }
        }
    }

    public static void getFiles(int levels, File[] search, List<File> files, List<File> directories) {
        for (File f : search) {
            if (!f.exists()) {
                throw new AssertionError("Search file array has non-existing files");
            }
        }
        getFilesAux(levels, search, files, directories);
    }

    private static void getFilesAux(int levels, File[] startFiles, List<File> files, List<File> directories) {
        List<File[]> subFilesList = new ArrayList<File[]>(50);
        for (File f : startFiles) {
            File[] subFiles = f.listFiles();
            if (subFiles == null) {
                files.add(f);
            } else {
                directories.add(f);
                subFilesList.add(subFiles);
            }
        }

        if (levels > 0) {
            for (File[] subFiles : subFilesList) {
                getFilesAux(levels - 1, subFiles, files, directories);
            }
        }
    }
}




/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/
dialog : (ANY ANY ANY npc_or_pc ANY* NL*)*;

npc_or_pc : (ANY ANY) =>
         pc_marker  pc_condition
|        npc_marker npc_condition;


pc_marker  :  t=ANY {!t.getText().trim().isEmpty() && !t.getText().contains("#")}?;
npc_marker :  t=ANY {!t.getText().trim().isEmpty() &&  t.getText().contains("#")}?;

pc_condition : '{' condition_text '}'
   { 
     condition = $condition_text.tree.toStringTree();
     isEmpty = condition.trim().isEmpty();
     command = "npc.Count()";
     wrappedCommand  =  "("+condition+") and "+ command;
     alreadyProcessed = condition.endsWith(command);
   }
   -> {alreadyProcessed}?   '{' condition_text '}'
   -> {isEmpty}?            '{' REWRITE[command] '}'
   ->                       '{' REWRITE[wrappedCommand] '}';

npc_condition : '{' condition_text '}'
   { 
     condition = $condition_text.tree.toStringTree();
     isEmpty = condition.trim().isEmpty();
     command = "npc.Reset()";
     wrappedCommand  =  "("+condition+") and "+ command;
     alreadyProcessed = condition.endsWith(command);
   }
   -> {alreadyProcessed}?   '{' condition_text '}'
   -> {isEmpty}?            '{' REWRITE[command] '}'
   ->                       '{' REWRITE[wrappedCommand] '}';

marker_text :    TEXT;
condition_text : TEXT;


/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/
//in the parser ~('#') means: "match any token except the token that matches '#'" 
//and in lexer rules ~('#') means: "match any character except '#'"


TEXT : ~('{'|NL|'}')*;
ANY : '{' TEXT '}';
NL : ( '\r' | '\n'| '\u000C');

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

我恋#小黄人 2024-10-25 17:45:58

您的语法中有 3 处错误:


问题

#1

在您的 npc_or_pc 规则中:

npc_or_pc 
  :  (ANY ANY)=> pc_marker  pc_condition 
  |              npc_marker npc_condition
  ;

您不应该向前查找 ANY ANY,因为这会同时满足 >pc_marker npc_marker。您应该先查找 pc_marker,然后查找 ANY(或 pc_condition)。

#2

在您的 pc_conditionnpc_condition 规则中:

pc_condition 
  :  '{' condition_text '}'
  ;

npc_condition 
  :  '{' condition_text '}'
  ;

您使用的是标记 {}但词法分析器永远不会创建这样的标记。一旦词法分析器看到一个{,它后面总是跟着TEXT '}',所以词法分析器生成的唯一标记将是ANYNL 类型:这些是解析器可用的唯一标记,这给我们带来了问题 3:

3

在您的规则中 marker_textcondition_text

marker_text    : TEXT;
condition_text : TEXT;

您使用的是令牌TEXT,它永远不会成为令牌流的一部分(请参阅#2)。


解决方案

#1

更改前向查找 pc_marker

npc_or_pc 
  :  (pc_marker ... )=> pc_marker  ...
  |                     npc_marker ...
  ;

#2

删除 pc_conditionnpc_condition 规则并将其替换为 ANY 标记:

npc_or_pc 
  :  (pc_marker ANY)=> pc_marker  ANY
  |                    npc_marker ANY
  ;

#3

删除 marker_textcondition_text 规则,自从删除了 pc_condition 后,您就不再需要它们了npc_condition 已经。


演示

这是您修改后的语法:

grammar VampireDialog;

dialog 
  :  (line {System.out.print($line.text);})* EOF
  ;

line
  :  ANY ANY ANY npc_or_pc ANY* NL+
  ;

npc_or_pc 
  :  (pc_marker ANY)=> pc_marker  ANY {System.out.print("PC  :: ");}
  |                    npc_marker ANY {System.out.print("NPC :: ");}
  ;


pc_marker  
  :  t=ANY {!t.getText().trim().isEmpty() && !t.getText().contains("#")}?
  ;

npc_marker 
  :  t=ANY {!t.getText().trim().isEmpty() &&  t.getText().contains("#")}?
  ;

TEXT : ~('{'|NL|'}')*;
ANY  : '{' TEXT '}';
NL   : ( '\r' | '\n'| '\u000C');

甚至是稍短的等效语法:

grammar VampireDialog;

dialog 
  :  (line {System.out.print($line.text);})* EOF
  ;

line
  :  ANY ANY ANY npc_or_pc ANY+ NL+
  ;

npc_or_pc 
  :  (pc_marker ANY)=> pc_marker {System.out.print("PC  :: ");}
  |                    ANY       {System.out.print("NPC :: ");}
  ;

pc_marker  
  :  t=ANY {!t.getText().trim().isEmpty() && !t.getText().contains("#")}?
  ;

ANY  : '{' ~('{'|NL|'}')* '}';
NL   : ( '\r' | '\n'| '\u000C');

可以使用以下命令进行测试:

import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        String source = 
                "{ 1 }{ Where to? }{ Where to? }{ # }{ }{ G.Cabbie_Line = 1 }{ }{ }{ }{ }{ }{ }{ }\n" + 
                "{ 2 }{ Just drive. }{ Just drive. }{ 0 }{ }{ npc.WorldMap( G.WorldMap_State ) }{ }{ }{ }{ }{ }{ }{ Not here. }\n";
        ANTLRStringStream in = new ANTLRStringStream(source);
        VampireDialogLexer lexer = new VampireDialogLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        VampireDialogParser parser = new VampireDialogParser(tokens);
        parser.dialog();
    }
}

它将在控制台上打印以下内容:

NPC :: { 1 }{ Where to? }{ Where to? }{ # }{ }{ G.Cabbie_Line = 1 }{ }{ }{ }{ }{ }{ }{ }
PC  :: { 2 }{ Just drive. }{ Just drive. }{ 0 }{ }{ npc.WorldMap( G.WorldMap_State ) }{ }{ }{ }{ }{ }{ }{ Not here. }

There are 3 things going wrong in your grammar:


Problems

#1

In your npc_or_pc rule:

npc_or_pc 
  :  (ANY ANY)=> pc_marker  pc_condition 
  |              npc_marker npc_condition
  ;

you shouldn't be looking ahead for ANY ANY, because that would satisfy both pc_marker and npc_marker. You should look ahead for pc_marker followed by ANY (or pc_condition).

#2

In both your pc_condition and npc_condition rules:

pc_condition 
  :  '{' condition_text '}'
  ;

npc_condition 
  :  '{' condition_text '}'
  ;

you're using the tokens { and } but the lexer will never create such tokens. As soon as the lexer sees a { it will always be followed by TEXT '}', so the only tokens that the lexer produces will be of type ANY and NL: those are the only tokens available for the parser, which brings us to problem 3:

3

In your rules marker_text and condition_text:

marker_text    : TEXT;
condition_text : TEXT;

you're using the token TEXT, which will never be a part of the token stream (see #2).


Solutions

#1

Change the look ahead to look for pc_marker instead:

npc_or_pc 
  :  (pc_marker ... )=> pc_marker  ...
  |                     npc_marker ...
  ;

#2

Remove both the pc_condition and npc_condition rules and replace them by ANY tokens:

npc_or_pc 
  :  (pc_marker ANY)=> pc_marker  ANY
  |                    npc_marker ANY
  ;

#3

Remove both the marker_text and condition_text rules, you don't need them anymore since you removed pc_condition and npc_condition already.


Demo

Here's your modified grammar:

grammar VampireDialog;

dialog 
  :  (line {System.out.print($line.text);})* EOF
  ;

line
  :  ANY ANY ANY npc_or_pc ANY* NL+
  ;

npc_or_pc 
  :  (pc_marker ANY)=> pc_marker  ANY {System.out.print("PC  :: ");}
  |                    npc_marker ANY {System.out.print("NPC :: ");}
  ;


pc_marker  
  :  t=ANY {!t.getText().trim().isEmpty() && !t.getText().contains("#")}?
  ;

npc_marker 
  :  t=ANY {!t.getText().trim().isEmpty() &&  t.getText().contains("#")}?
  ;

TEXT : ~('{'|NL|'}')*;
ANY  : '{' TEXT '}';
NL   : ( '\r' | '\n'| '\u000C');

or even the slightly shorter equivalent:

grammar VampireDialog;

dialog 
  :  (line {System.out.print($line.text);})* EOF
  ;

line
  :  ANY ANY ANY npc_or_pc ANY+ NL+
  ;

npc_or_pc 
  :  (pc_marker ANY)=> pc_marker {System.out.print("PC  :: ");}
  |                    ANY       {System.out.print("NPC :: ");}
  ;

pc_marker  
  :  t=ANY {!t.getText().trim().isEmpty() && !t.getText().contains("#")}?
  ;

ANY  : '{' ~('{'|NL|'}')* '}';
NL   : ( '\r' | '\n'| '\u000C');

which can be tested with:

import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        String source = 
                "{ 1 }{ Where to? }{ Where to? }{ # }{ }{ G.Cabbie_Line = 1 }{ }{ }{ }{ }{ }{ }{ }\n" + 
                "{ 2 }{ Just drive. }{ Just drive. }{ 0 }{ }{ npc.WorldMap( G.WorldMap_State ) }{ }{ }{ }{ }{ }{ }{ Not here. }\n";
        ANTLRStringStream in = new ANTLRStringStream(source);
        VampireDialogLexer lexer = new VampireDialogLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        VampireDialogParser parser = new VampireDialogParser(tokens);
        parser.dialog();
    }
}

which will print the following to the console:

NPC :: { 1 }{ Where to? }{ Where to? }{ # }{ }{ G.Cabbie_Line = 1 }{ }{ }{ }{ }{ }{ }{ }
PC  :: { 2 }{ Just drive. }{ Just drive. }{ 0 }{ }{ npc.WorldMap( G.WorldMap_State ) }{ }{ }{ }{ }{ }{ }{ Not here. }
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文