Lexing strings with Ocamllex (The Tiger Compiler)
I'm trying to follow Appel's "Modern Compiler Implementation in ML" and am writing the lexer using Ocamllex.
The specification asks for the lexer to return strings after translating escape sequences.
The following code is an excerpt from the ocamllex input file:
```ocaml
rule tiger = parse
  ...
  | '"'
      { let buffer = Buffer.create 1 in
        STRING (stringl buffer lexbuf)
      }
and stringl buffer = parse
  | '"'       { Buffer.contents buffer }
  | "\\t"     { Buffer.add_char buffer '\t'; stringl buffer lexbuf }
  | "\\n"     { Buffer.add_char buffer '\n'; stringl buffer lexbuf }
  | "\\n"     { Buffer.add_char buffer '\n'; stringl buffer lexbuf }
  | '\\' '"'  { Buffer.add_char buffer '"'; stringl buffer lexbuf }
  | '\\' '\\' { Buffer.add_char buffer '\\'; stringl buffer lexbuf }
  | eof       { raise End_of_file }
  | _ as char { Buffer.add_char buffer char; stringl buffer lexbuf }
```
Is there a better way?
You may be interested in looking at how the OCaml lexer does this (search for `and string`). In essence, it's the same method as yours, without the nice local buffer (I find your code nicer on this point, though allocating a fresh buffer for every string is a bit less efficient). It is a bit more complex because more escapes are supported, and it uses an escape table (`char_for_backslash`) to factorize similar rules.
Also, you have the rule `"\\n"` repeated twice, and I think `1` is a very pessimistic estimate of your string length; I would rather use `20` here (to avoid needless resizing).
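To make the escape-table idea concrete, here is a minimal sketch in plain OCaml rather than ocamllex, so it can be run directly. It is an assumption-laden illustration, not the OCaml compiler's code: `lex_string_body` and `is_simple_escape` are hypothetical names, and `char_for_backslash` is a simplified stand-in covering only the four escapes from the question. `pos` is assumed to be the index just after the opening quote. All simple escapes go through one case instead of one rule each, and the buffer starts at 20 characters as suggested above.

```ocaml
(* Hypothetical, simplified escape table in the spirit of char_for_backslash:
   maps the character after a backslash to the character it denotes.
   Only the escapes from the question are covered. *)
let char_for_backslash = function
  | 'n' -> '\n'
  | 't' -> '\t'
  | c -> c (* backslash and double quote stand for themselves *)

(* Hypothetical helper: characters accepted after a backslash. *)
let is_simple_escape = function
  | 'n' | 't' | '\\' | '"' -> true
  | _ -> false

(* Hand-written stand-in for the ocamllex stringl rule. [pos] is the index
   just after the opening quote in [src]. Returns the translated string and
   the index just after the closing quote. Raises End_of_file, like the
   lexer in the question, if the string is unterminated. *)
let lex_string_body (src : string) (pos : int) : string * int =
  (* Initial capacity 20 instead of 1, to avoid early resizing. *)
  let buffer = Buffer.create 20 in
  let len = String.length src in
  let rec loop i =
    if i >= len then raise End_of_file
    else
      match src.[i] with
      | '"' -> (Buffer.contents buffer, i + 1)
      | '\\' when i + 1 < len && is_simple_escape src.[i + 1] ->
          (* One case for every simple escape, via the table. *)
          Buffer.add_char buffer (char_for_backslash src.[i + 1]);
          loop (i + 2)
      | c ->
          (* Any other character, including an unrecognised backslash,
             is copied as-is, like the catch-all rule in the question. *)
          Buffer.add_char buffer c;
          loop (i + 1)
  in
  loop pos

let () =
  (* The literal below denotes: hello, backslash, t, world, closing quote. *)
  let s, _ = lex_string_body "hello\\tworld\" ;" 0 in
  print_endline s
```

Running it prints `hello` and `world` separated by a tab. In the `.mll` file itself, the analogous change would be to replace the separate escape rules with a single rule matching a backslash followed by a character class bound with `as`, whose action calls `char_for_backslash`.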