ANTLR: How to replace all characters defined as whitespace with an actual space

Posted on 2024-10-08 18:53:44

My ANTLR code is as follows:

LPARENTHESIS : ('('); 
RPARENTHESIS : (')'); 

fragment CHARACTER : ('a'..'z'|'0'..'9'|); 
fragment QUOTE     : ('"'); 
fragment WILDCARD  : ('*'); 
fragment SPACE     : (' '|'\n'|'\r'|'\t'|'\u000C'|';'|':'|','); 

WILD_STRING 
   : (CHARACTER)* 
     ( 
       ('?') 
       (CHARACTER)* 
     )+ 
   ; 
PREFIX_STRING 
   : (CHARACTER)+
     ( 
       ('*')  
     )+ 
   ; 
WS     : (SPACE) { $channel=HIDDEN; }; 
PHRASE : (QUOTE)(LPARENTHESIS)?(WORD)(WILDCARD)?(RPARENTHESIS)?((SPACE)+(LPARENTHESIS)?(WORD)(WILDCARD)?(RPARENTHESIS)?)*(SPACE)+(QUOTE); 
WORD   : (CHARACTER)+; 

What I would like to do is replace every character matched by SPACE with an actual space character in the PHRASE token. Also, if possible, I would like each run of consecutive spaces to be collapsed into a single space.

Any help would be most appreciated. For some reason, I am finding it hard to understand ANTLR. Any good tutorials out there?

花间憩 2024-10-15 18:53:44

Java

Invoke your lexer's setText(...) method:

grammar T;

parse
  :  words EOF {System.out.println($words.text);}
  ;

words    
  :  Word (Spaces Word)* 
  ;

Word  
  :  ('a'..'z'|'A'..'Z')+
  ;

Spaces
  :  (' ' | '\t' | '\r' | '\n')+ {setText(" ");}
  ;

Which can be tested with the class:

import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        String source = "This         is     \n    just \t\t\t\t\t\t a \n\t\t test";
        ANTLRStringStream in = new ANTLRStringStream(source);
        TLexer lexer = new TLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TParser parser = new TParser(tokens);
        System.out.println("------------------------------\nSource:\n" + source +
                "\n------------------------------\nAfter parsing:");
        parser.parse();
    }
}

which produces the following output:

------------------------------
Source:
This         is     
    just                         a 
         test
------------------------------
After parsing:
This is just a test
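
As a rough illustration of why the printed text comes out collapsed, here is a plain-Java sketch that does not use ANTLR at all. MiniLexer and tokenTexts are hypothetical names invented for this example. It mimics the Word and Spaces rules by hand: letter runs keep their matched text, while each whitespace run is emitted with its text overridden to a single space, which is the role setText(" ") plays in the grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; MiniLexer is a hypothetical name and is not part of ANTLR.
final class MiniLexer {

    // Splits the input into token texts: letter runs are kept verbatim,
    // whitespace runs are emitted with their text overridden to " ".
    static List<String> tokenTexts(String input) {
        List<String> texts = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            if (isSpace(input.charAt(i))) {
                while (i < input.length() && isSpace(input.charAt(i))) {
                    i++;
                }
                texts.add(" "); // the analogue of setText(" ") in the Spaces rule
            } else if (isLetter(input.charAt(i))) {
                while (i < input.length() && isLetter(input.charAt(i))) {
                    i++;
                }
                texts.add(input.substring(start, i));
            } else {
                throw new IllegalArgumentException("unexpected character at index " + i);
            }
        }
        return texts;
    }

    // Same characters as the Spaces rule: ' ', '\t', '\r', '\n'.
    private static boolean isSpace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // Same characters as the Word rule: 'a'..'z' and 'A'..'Z'.
    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        String source = "This         is     \n    just \t\t\t\t\t\t a \n\t\t test";
        // Joining the token texts gives the collapsed sentence.
        System.out.println(String.join("", tokenTexts(source)));
    }
}
```

Joining the token texts is only an approximation of what $words.text does in the generated parser, but it shows the idea: once a token's text has been replaced, anything that reads the text back sees the replacement rather than the original characters.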

Puneet Pawaia wrote:

Any help would be most appreciated. For some reason, I am finding it hard to understand ANTLR. Any good tutorials out there?

The ANTLR Wiki has a lot of useful information, albeit a bit unstructured (but that could just be me!).

The best ANTLR tutorial is the book: The Definitive ANTLR Reference: Building Domain-Specific Languages.

C#

For the C# target, try this:

grammar T;

options {
  language=CSharp2;
}

@parser::namespace { Demo }
@lexer::namespace { Demo }

parse
  :  words EOF {Console.WriteLine($words.text);}
  ;

words    
  :  Word (Spaces Word)* 
  ;

Word  
  :  ('a'..'z'|'A'..'Z')+
  ;

Spaces
  :  (' ' | '\t' | '\r' | '\n')+ {Text = " ";}
  ;

with the test class:

using System;
using Antlr.Runtime;

namespace Demo
{
    class MainClass
    {
        public static void Main (string[] args)
        {
            ANTLRStringStream Input = new ANTLRStringStream("This         is     \n    just \t\t\t\t\t\t a \n\t\t test"); 
            TLexer Lexer = new TLexer(Input);
            CommonTokenStream Tokens = new CommonTokenStream(Lexer);
            TParser Parser = new TParser(Tokens);
            Parser.parse();
        }
    }
}

which also prints This is just a test to the console. I tried SetText(...) in place of setText(...), but that didn't work either, and the C# API docs are currently offline, so I arrived at {Text = " ";} by trial and error. I tested it with the C# 3.1.1 runtime DLLs.
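
Neither grammar above covers the PHRASE rule from the question, where the SPACE fragment also contains ';', ':' and ',' and sits inside a single token. One possible way to handle that case, sketched here in Java, is to normalize the whole matched text with a regular expression. PhraseNormalizer and normalize are hypothetical names introduced for this example and are not part of ANTLR.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR: normalizes the text of a matched PHRASE.
final class PhraseNormalizer {

    // Same character set as the SPACE fragment in the question:
    // ' ', '\n', '\r', '\t', '\u000C' (form feed), ';', ':' and ','.
    private static final Pattern SPACE_RUN = Pattern.compile("[ \\n\\r\\t\\f;:,]+");

    // Replaces every run of SPACE characters with exactly one ' '.
    static String normalize(String phrase) {
        return SPACE_RUN.matcher(phrase).replaceAll(" ");
    }

    public static void main(String[] args) {
        String raw = "\"foo;;  bar*,\t\n(baz) \"";
        System.out.println(normalize(raw)); // prints "foo bar* (baz) " including the quotes
    }
}
```

Assuming the ANTLR 3 Java target, the PHRASE rule could then end with an action along the lines of { setText(PhraseNormalizer.normalize(getText())); }. That wiring is an untested assumption, so treat it as a starting point rather than a verified fix.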

Good luck!
