How can I match a string without regard to case?

Posted on 2024-08-13 01:21:42

Let's say that I want to match "beer", but don't care about case sensitivity.

Currently I am defining a token to be ('b'|'B') ('e'|'E') ('e'|'E') ('r'|'R'), but I have a lot of such tokens and don't really want to spell out 'verilythisisaverylongtokenindeedomyyesitis' this way.

The ANTLR wiki seems to suggest that it can't be done (in ANTLR), but I just wondered if anyone had some clever tricks.

Comments (7)

对你再特殊 2024-08-20 01:21:42

I would like to add to the accepted answer: a ready-made set can be found at case insensitive antlr building blocks, and the relevant portion is included below for convenience.

fragment A:[aA];
fragment B:[bB];
fragment C:[cC];
fragment D:[dD];
fragment E:[eE];
fragment F:[fF];
fragment G:[gG];
fragment H:[hH];
fragment I:[iI];
fragment J:[jJ];
fragment K:[kK];
fragment L:[lL];
fragment M:[mM];
fragment N:[nN];
fragment O:[oO];
fragment P:[pP];
fragment Q:[qQ];
fragment R:[rR];
fragment S:[sS];
fragment T:[tT];
fragment U:[uU];
fragment V:[vV];
fragment W:[wW];
fragment X:[xX];
fragment Y:[yY];
fragment Z:[zZ];

So an example is

   HELLOWORLD : H E L L O W O R L D;
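Writing out the letter sequence by hand gets tedious for long keywords. As a rough sketch only, a small Java helper could generate the rule text from a keyword. The class and method names below are hypothetical and not part of ANTLR. The sketch assumes the keyword contains only ASCII letters and that the A..Z fragments above are present in the grammar.

```java
import java.util.Locale;

// Hypothetical helper (not part of ANTLR): builds a lexer rule that spells a
// keyword with the single-letter fragments A..Z shown above.
final class FragmentRuleGenerator {

    /** Turns "HelloWorld" into "HELLOWORLD : H E L L O W O R L D;". */
    static String fragmentRule(String keyword) {
        if (keyword.isEmpty()) {
            throw new IllegalArgumentException("keyword must not be empty");
        }
        String upper = keyword.toUpperCase(Locale.ROOT);
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < upper.length(); i++) {
            char c = upper.charAt(i);
            // Assumption: only ASCII letters have a matching fragment.
            if (c < 'A' || c > 'Z') {
                throw new IllegalArgumentException("no fragment for character: " + c);
            }
            if (i > 0) {
                body.append(' ');
            }
            body.append(c);
        }
        return upper + " : " + body + ";";
    }

    public static void main(String[] args) {
        System.out.println(fragmentRule("HelloWorld"));
    }
}
```

The generated line can then be pasted into the grammar next to the fragment definitions.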

旧情勿念 2024-08-20 01:21:42

How about defining a lexer token for each permissible identifier character, then constructing the parser rule as a series of those?

beer: B E E R;

A : 'A'|'a';
B: 'B'|'b';

etc.

灰色世界里的红玫瑰 2024-08-20 01:21:42

A case-insensitive option was just added to ANTLR

options { caseInsensitive = true; }

https://github.com/antlr/antlr4/blob/master/doc/options.md#caseinsensitive

The old links are now broken; the one above should continue to work.

孤凫 2024-08-20 01:21:42

Define case-insensitive tokens with

BEER: [Bb] [Ee] [Ee] [Rr];
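
For a grammar with many keywords, rules in this character-set style could also be generated rather than typed. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical (not an ANTLR API), ASCII letters become two-character sets, and any other character is emitted as a quoted literal.

```java
// Hypothetical helper (not part of ANTLR): emits a lexer rule in the
// "[Bb] [Ee] [Ee] [Rr]" style for the given text.
final class CharSetRuleGenerator {

    /** charSetRule("BEER", "beer") returns "BEER: [Bb] [Ee] [Ee] [Rr];". */
    static String charSetRule(String ruleName, String text) {
        if (text.isEmpty()) {
            throw new IllegalArgumentException("text must not be empty");
        }
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (i > 0) {
                body.append(' ');
            }
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
                // ASCII letter: match either case.
                body.append('[')
                    .append(Character.toUpperCase(c))
                    .append(Character.toLowerCase(c))
                    .append(']');
            } else if (c == '\'' || c == '\\') {
                // Quote and backslash must be escaped inside a literal.
                body.append("'\\").append(c).append('\'');
            } else {
                // Anything else (digits, underscores, ...) is matched literally.
                body.append('\'').append(c).append('\'');
            }
        }
        return ruleName + ": " + body + ";";
    }

    public static void main(String[] args) {
        System.out.println(charSetRule("BEER", "beer"));
        System.out.println(charSetRule("GROUP_BY", "group_by"));
    }
}
```

The rule name is passed separately because the matched text may contain characters that are not valid in a rule name.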

聚集的泪 2024-08-20 01:21:42

A new documentation page has appeared in the ANTLR GitHub repo: Case-Insensitive Lexing. You can use two approaches:

  1. The one described in @javadba's answer.
  2. Or add a character stream to your code that transforms the input stream to lower or upper case. You can find examples for the main languages on the same doc page.

In my opinion, it's better to use the first approach and have a grammar that describes all the rules. But if you use a well-known grammar, for example one from Grammars written for ANTLR v4, then the second approach may be more appropriate.

女皇必胜 2024-08-20 01:21:42

I used antlr-4.7.1-complete.jar to generate the TrinoSqlParser Java code, but got "warning(83): TrinoLexer.g4:22:4: unsupported option caseInsensitive".

So I tried wrapping the char stream in com.facebook.presto.sql.parser.CaseInsensitiveStream, and then it worked. It seems that CaseInsensitiveStream just converts lowercase to uppercase.

My code is below:

@Test
public void testSqlRewriterL() {
    // Wrap the input so the lexer sees upper-cased characters during lookahead.
    CaseInsensitiveStream upper = new CaseInsensitiveStream(CharStreams.fromString(scripts.get(1)));
    TrinoLexer lexer = new TrinoLexer(upper);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    TrinoParser sqlBaseParser = new TrinoParser(tokens);
    // SqlRewriterL is passed to walk() below, so it acts as the parse-tree listener.
    SqlRewriterL sqlRewriterL = new SqlRewriterL(tokens);
    ParseTreeWalker walker = new ParseTreeWalker();
    walker.walk(sqlRewriterL, sqlBaseParser.statement());
    System.out.println(sqlRewriterL.getResult());
}
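
To illustrate why such a wrapper works, here is a simplified stand-in written without the ANTLR runtime. It is not the ANTLR CharStream interface and not Presto's CaseInsensitiveStream; the class and method names are hypothetical. It only sketches the idea: lookahead returns upper-cased characters, while the original text stays available unchanged for token text.

```java
// Simplified stand-in (hypothetical, no ANTLR dependency): lookahead is
// upper-cased, but the underlying text is left untouched.
final class UpperCasingLookahead {
    static final int EOF = -1;

    private final String data;
    private int position; // index of the next unread character

    UpperCasingLookahead(String data) {
        this.data = data;
    }

    /** 1-based lookahead: la(1) is the next character, upper-cased; EOF past the end. */
    int la(int offset) {
        if (offset <= 0) {
            throw new IllegalArgumentException("this sketch supports forward lookahead only");
        }
        int index = position + offset - 1;
        if (index >= data.length()) {
            return EOF;
        }
        return Character.toUpperCase(data.charAt(index));
    }

    /** Advances past one character, if any remain. */
    void consume() {
        if (position < data.length()) {
            position++;
        }
    }

    /** Returns the original, case-preserved text for an inclusive index range. */
    String getText(int start, int stopInclusive) {
        return data.substring(start, stopInclusive + 1);
    }
}
```

Because only the lookahead is folded, a grammar used with this kind of wrapper would need its keyword literals written in upper case, and the tokens would still carry the text as the user typed it.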

萌酱 2024-08-20 01:21:42

A solution I used in C#: use ASCII codes to shift characters to lower case.

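class CaseInsensitiveStream : Antlr4.Runtime.AntlrInputStream {
  public CaseInsensitiveStream(string sExpr)
     : base(sExpr) {
  }
  // Lookahead as seen by the lexer: ASCII upper-case letters are folded to lower case.
  public override int La(int index) {
     if(index == 0) return 0;
     if(index < 0) index++;   // negative offsets look backwards
     int pdx = p + index - 1;
     if(pdx < 0 || pdx >= n) return TokenConstants.Eof;
     var x1 = data[pdx];
     // 'A'..'Z' (65..90) maps to 'a'..'z' (97..122); everything else is unchanged.
     return (x1 >= 65 && x1 <= 90) ? (97 + x1 - 65) : x1;
  }
}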