Selectively skipping newlines depending on context

Posted 2024-12-22 11:01:12


I must parse files made of two parts. In the first one, new lines must be skipped. In the second one, they are important and used as a delimiter.

I want to avoid solutions like http://www.antlr.org/wiki/pages/viewpage.action?pageId=1734 and use a predicate instead.

For the moment, I have something like:

WS:     ( ' ' | '\t' | NEWLINE) {SKIP();};
fragment NEWLINE : '\r'|'\n'|'\r\n';

I tried to add a dynamically scoped variable keepNewline that is set to true when "entering" the second part of the file.

However, I am not able to create the correct predicate to switch off the "skipping" of newlines.

Any help would be greatly appreciated.

Best regards.


柳絮泡泡 2024-12-29 11:01:12


It's easier than you might think: you don't even need a predicate.

Let's say you want to preserve line breaks only inside <pre>...</pre> tags. The following dummy grammar does just that:

grammar Pre;

// Lexer-only state: true while the lexer is between <pre> and </pre>
@lexer::members {
  private boolean keepNewLine = false;
}

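// Prints every token the lexer emits; \% escapes %, which ANTLR 3 treats as template syntax inside actions
parse
 : (t=. 
    {
     System.out.printf("\%-10s '\%s'\n", tokenNames[$t.type], $t.text.replace("\n", "\\n"));
    }
   )* 
   EOF
 ;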

Word
 : ('a'..'z' | 'A'..'Z')+
 ;

OPr
 : '<pre>' {keepNewLine = true;}
 ;

CPr
 : '</pre>' {keepNewLine = false;}
 ;

// Emitted as a token inside <pre>...</pre>, skipped everywhere else
NewLine
 : ('\r'? '\n' | '\r') {if(!keepNewLine) skip();}
 ;

Space
 : (' ' | '\t') {skip();}
 ;

which you can test with the class:

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    PreLexer lexer = new PreLexer(new ANTLRFileStream("in.txt"));
    PreParser parser = new PreParser(new CommonTokenStream(lexer));
    parser.parse();
  }
}

If in.txt contains:

foo  bar
<pre>
a

b
</pre>


baz

the output of running the Main class would be:

Word       'foo'
Word       'bar'
OPr        '<pre>'
NewLine    '\n'
Word       'a'
NewLine    '\n'
NewLine    '\n'
Word       'b'
NewLine    '\n'
CPr        '</pre>'
Word       'baz'
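To see the mechanism without the ANTLR 3 runtime, the same idea can be mimicked in plain Java. The class below is a hypothetical, hand-written sketch, not ANTLR-generated code, and the name PreLexerSketch is an assumption. A boolean field plays the role of keepNewLine, the <pre> and </pre> branches flip it exactly like the OPr and CPr actions, and a newline is only emitted while the flag is set. Unlike an ANTLR lexer, it throws on an unexpected character instead of reporting an error and recovering.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written sketch of the keepNewLine technique; not ANTLR output.
public class PreLexerSketch {
    private final String input;
    private int pos = 0;
    // Plays the role of keepNewLine in the Pre grammar: true only inside <pre>...</pre>.
    private boolean keepNewLine = false;

    public PreLexerSketch(String input) {
        this.input = input;
    }

    // Same character set as the Word rule: 'a'..'z' | 'A'..'Z'.
    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Formats a token like the parse rule does: type padded to 10 chars, '\n' escaped.
    private static String format(String type, String text) {
        return String.format("%-10s '%s'", type, text.replace("\n", "\\n"));
    }

    public List<String> tokenize() {
        List<String> tokens = new ArrayList<>();
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (input.startsWith("<pre>", pos)) {
                keepNewLine = true;                 // same effect as the OPr action
                tokens.add(format("OPr", "<pre>"));
                pos += "<pre>".length();
            } else if (input.startsWith("</pre>", pos)) {
                keepNewLine = false;                // same effect as the CPr action
                tokens.add(format("CPr", "</pre>"));
                pos += "</pre>".length();
            } else if (isAsciiLetter(c)) {
                int start = pos;
                while (pos < input.length() && isAsciiLetter(input.charAt(pos))) {
                    pos++;
                }
                tokens.add(format("Word", input.substring(start, pos)));
            } else if (c == '\r' || c == '\n') {
                // Mirrors ('\r'? '\n' | '\r'): \r\n counts as one newline.
                boolean crlf = c == '\r' && pos + 1 < input.length() && input.charAt(pos + 1) == '\n';
                int len = crlf ? 2 : 1;
                if (keepNewLine) {
                    tokens.add(format("NewLine", input.substring(pos, pos + len)));
                }
                pos += len;                         // when not kept, this acts like skip()
            } else if (c == ' ' || c == '\t') {
                pos++;                              // Space is always skipped
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "foo  bar\n<pre>\na\n\nb\n</pre>\n\n\nbaz\n";
        for (String token : new PreLexerSketch(text).tokenize()) {
            System.out.println(token);
        }
    }
}
```

Running main on the same text as in.txt should print the same eleven lines as the Main class above. As in the grammar, the flag lives in the lexer itself, so whether a newline is kept or dropped is decided at the moment it is scanned.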
