Lua long strings in fslex
I've been working on a Lua fslex lexer in my spare time, using the ocamllex manual as a reference.
I hit a few snags while trying to tokenize long strings correctly. "Long strings" are delimited by '[' ('=')* '[' and ']' ('=')* ']' tokens; the number of = signs in the opening and closing brackets must be the same. For example, a level-2 long string opened with [==[ is closed only by ]==], not by ]] or ]=].
In the first implementation, the lexer seemed not to recognize the [[ pattern, producing two LBRACKET tokens despite the longest-match rule, whereas [=[ and its variations were recognized correctly. In addition, the regular expression failed to ensure that the matching closing token is used, stopping at the first ']' ('=')* ']' occurrence regardless of the long string's actual "level". Also, fslex does not seem to support the "as" construct in regular expressions.
let lualongstring = '[' ('=')* '[' ( escapeseq | [^ '\\' '[' ] )* ']' ('=')* ']'
(* ... *)
| lualongstring { (* ... *) }
| '[' { LBRACKET }
| ']' { RBRACKET }
(* ... *)
I've been trying to solve the issue with another rule in the lexer:
rule tokenize = parse
(* ... *)
| '[' ('=')* '[' { longstring (getLongStringLevel(lexeme lexbuf)) lexbuf }
(* ... *)
and longstring level = parse
| ']' ('=')* ']' { (* check level, do something *) }
| _ { (* aggregate other chars *) }
(* or *)
| _ {
    let c = lexbuf.LexerChar(0)
    (* ... *)
  }
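The getLongStringLevel helper above is assumed to just count the '=' signs in the matched opening bracket; a rough sketch, not necessarily the actual implementation:

let getLongStringLevel (s : string) =
    (* Level of a long bracket = number of '=' between the two square
       brackets, e.g. "[[" is level 0 and "[==[" is level 2. *)
    s |> Seq.filter (fun c -> c = '=') |> Seq.length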
But I'm stuck, for two reasons: first, I don't think I can "push", so to speak, a token to the next rule once I'm done reading the long string; second, I don't like the idea of reading char by char until the right closing token is found, which makes the current design useless.
How can I tokenize Lua long strings in fslex? Thanks for reading.
Apologies if I answer my own question, but I'd like to contribute my own solution to the problem for future reference.
I am keeping state across lexer function calls with the LexBuffer<_>.BufferLocalStore property, which is simply a writeable IDictionary instance.
Note: long brackets are used both by long strings and by multiline comments. This is an often-overlooked part of the Lua grammar.
Here are the functions I use to simplify storing data into the BufferLocalStore:
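(A minimal sketch of what such helpers might look like; the function names, key convention, and the PowerPack-era Lexing namespace below are assumptions rather than the original listing. BufferLocalStore just maps string keys to boxed obj values, so the wrappers box on the way in and unbox on the way out.)

open Microsoft.FSharp.Text.Lexing

(* Sketch only: thin wrappers over the string -> obj dictionary. *)
let setLexbufData (key : string) (value : 'a) (lexbuf : LexBuffer<char>) =
    lexbuf.BufferLocalStore.[key] <- box value

let getLexbufData<'a> (key : string) (lexbuf : LexBuffer<char>) : 'a =
    unbox<'a> lexbuf.BufferLocalStore.[key]

let hasLexbufData (key : string) (lexbuf : LexBuffer<char>) =
    lexbuf.BufferLocalStore.ContainsKey(key)

let removeLexbufData (key : string) (lexbuf : LexBuffer<char>) =
    lexbuf.BufferLocalStore.Remove(key) |> ignore

With helpers along these lines, the tokenize rule can stash, say, the opening bracket's level (and whether it opened a string or a comment) before handing control to the longstring rule, which can then compare the stored level against each candidate closing bracket it meets.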
Perhaps it's not very functional, but it seems to be getting the job done.
Edit: You can find the project at http://ironlua.codeplex.com. Lexing and parsing should be okay. I am planning on using the DLR. Comments and constructive criticism welcome.