Is there a JFlex specification for Java string literals somewhere?

Posted on 2024-08-19 22:57:21

By string literals I mean ones that may also contain escapes such as \123.
I've written something, but I don't know whether it's complete:

<STRING> {
  \"                             { yybegin(YYINITIAL); 
                                   return new Token(TokenType.STRING,string.toString()); }
  \\[0-3][0-7][0-7]              { string.append( yytext() ); }
  \\[0-3][0-7]                   { string.append( yytext() ); }
  \\[0-7]                        { string.append( yytext() ); }
  [^\n\r\"\\]+                   { string.append( yytext() ); }
  \\t                            { string.append('\t'); }
  \\n                            { string.append('\n'); }
  \\r                            { string.append('\r'); }
  \\\"                           { string.append('\"'); }
  \\                             { string.append('\\'); }
}

In fact, I know this isn't perfect: for the three rules that match \ddd-style escapes, I append the escape's textual representation to the string rather than the character it denotes.
I could convert it using Character methods, but I still might not be exhaustive, since there may be other escape sequences I haven't handled. A canonical JFlex file for this would be ideal.
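
For the \ddd rules specifically, one possible way to store the character rather than its representation is to parse the digits after the backslash in base 8. The sketch below is only an illustration: `OctalEscapes` and `decode` are hypothetical names, not part of JFlex, and the input is assumed to be a text already matched by one of the three octal rules.

```java
// Hypothetical helper (not part of JFlex): turns the matched text of an
// octal escape, e.g. the four characters \101, into the character it denotes.
final class OctalEscapes {
    private OctalEscapes() {}

    /** Assumes the argument is a backslash followed by 1 to 3 octal digits. */
    static char decode(String escape) {
        // Skip the leading backslash and parse the remaining digits in base 8.
        return (char) Integer.parseInt(escape.substring(1), 8);
    }

    public static void main(String[] args) {
        System.out.println(decode("\\101"));       // prints A (octal 101 = 65)
        System.out.println((int) decode("\\377")); // prints 255
    }
}
```

Assuming `string` is a `StringBuilder` or `StringBuffer`, the three octal actions could then call something like `string.append(OctalEscapes.decode(yytext()));` in place of `string.append(yytext());`.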

梦巷 2024-08-26 22:57:21

The JLS, section 3.10.5 (String Literals), defines string literals as follows:

    StringLiteral:
      " StringCharacters* "

    StringCharacters:
      StringCharacter
      StringCharacters StringCharacter

    StringCharacter:
      InputCharacter but not " or \
      EscapeSequence

where an EscapeSequence is defined in 3.10.6:

    EscapeSequence:
      \ b            /* \u0008: backspace BS */
      \ t            /* \u0009: horizontal tab HT */
      \ n            /* \u000a: linefeed LF */
      \ f            /* \u000c: form feed FF */
      \ r            /* \u000d: carriage return CR */
      \ "            /* \u0022: double quote " */
      \ '            /* \u0027: single quote ' */
      \ \            /* \u005c: backslash \ */
      OctalEscape    /* \u0000 to \u00ff: from octal value */

    OctalEscape:
      \ OctalDigit
      \ OctalDigit OctalDigit
      \ ZeroToThree OctalDigit OctalDigit

    OctalDigit: one of
      0 1 2 3 4 5 6 7

    ZeroToThree: one of
      0 1 2 3

Note that \' is also a valid escape sequence in a string literal, and your rules currently miss a few escape sequences: \b, \f and \'. You may also want to account for Unicode escapes, which can appear anywhere in a Java source file (and thus in string literals as well): \u HEX HEX HEX HEX, where HEX is a hexadecimal digit (0-9, a-f or A-F).
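
As a rough cross-check of a lexer specification, the grammar quoted above can be written as a single regular expression and tried against sample literals in plain Java. This is a sketch under stated assumptions: `StringLiteralPattern` and `isStringLiteral` are hypothetical names, the input is the literal's source text including both quotes, and Unicode escapes (\uXXXX) are deliberately not handled here.

```java
import java.util.regex.Pattern;

// Hypothetical checker that mirrors the quoted StringLiteral grammar.
final class StringLiteralPattern {
    private StringLiteralPattern() {}

    static final Pattern STRING_LITERAL = Pattern.compile(
        "\""                        // opening double quote
        + "(?:"
        + "[^\"\\\\\\r\\n]"         // InputCharacter but not " or \ (no CR/LF)
        + "|\\\\[btnfr\"'\\\\]"     // \b \t \n \f \r \" \' \\
        + "|\\\\[0-3][0-7][0-7]"    // \ ZeroToThree OctalDigit OctalDigit
        + "|\\\\[0-7][0-7]?"        // \ OctalDigit, optionally one more digit
        + ")*"
        + "\"");                    // closing double quote

    /** Returns true if the whole input is one string literal per the grammar. */
    static boolean isStringLiteral(String source) {
        return STRING_LITERAL.matcher(source).matches();
    }

    public static void main(String[] args) {
        System.out.println(isStringLiteral("\"a\\123b\"")); // true: octal escape
        System.out.println(isStringLiteral("\"a\\qb\""));   // false: \q is not an escape
    }
}
```

The alternatives inside the group correspond one-to-one to the StringCharacter and EscapeSequence productions, so each could be carried over as a separate rule in a JFlex `<STRING>` state.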

浪推晚风 2024-08-26 22:57:21

Yes. Download JFlex and see the file examples/java/java.flex. It contains definitions in JFlex syntax for all of the lexical components of the Java language.

Cheers.
