How does code coloring work?
How do code coloring engines work, exactly? Do they just generate a parse tree that preserves whitespace, color the leaves, and reconstruct the original program? How does live code coloring manage to be efficient enough to do it on the fly?
Comments (2)
Most syntax highlighters I know of do not react to the syntax tree, but just tokenize the source and color the text according to which kinds of tokens it forms. The most difficult task such a highlighter has to do is recognizing multi-line comments (and/or strings, if the language allows them); everything else can be handled within a single source line.
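To make the per-line idea concrete, here is a minimal sketch in Python. It assumes a toy C-like language; the token classes, the `KEYWORDS` set, and the function names are made up for illustration and are not taken from any real editor. The only thing carried from one line to the next is a single flag saying whether the line starts inside a block comment.

```python
import re

# Hypothetical token classes for a toy C-like language (illustration only).
TOKEN_RE = re.compile(r"""
    (?P<comment_open>/\*)
  | (?P<line_comment>//.*)
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<number>\d+)
  | (?P<word>[A-Za-z_]\w*)
""", re.VERBOSE)

KEYWORDS = {"if", "else", "while", "return", "int"}


def highlight_line(line, in_comment):
    """Color one line. Returns (spans, in_comment_at_end_of_line).

    spans is a list of (start, end, kind) tuples; in_comment says whether
    this line begins inside an unterminated /* ... */ comment.
    """
    spans = []
    pos = 0
    comment_start = 0 if in_comment else None
    while pos < len(line):
        if comment_start is not None:
            close = line.find("*/", pos)
            if close == -1:
                break  # comment runs past the end of this line
            spans.append((comment_start, close + 2, "comment"))
            pos = close + 2
            comment_start = None
            continue
        m = TOKEN_RE.search(line, pos)
        if m is None:
            break  # nothing else worth coloring on this line
        kind = m.lastgroup
        if kind == "comment_open":
            comment_start = m.start()
            pos = m.end()
            continue
        if kind == "word":
            kind = "keyword" if m.group() in KEYWORDS else "identifier"
        elif kind == "line_comment":
            kind = "comment"
        spans.append((m.start(), m.end(), kind))
        pos = m.end()
    if comment_start is not None and comment_start < len(line):
        spans.append((comment_start, len(line), "comment"))
    return spans, comment_start is not None


def highlight_document(lines):
    """Color every line, threading the comment flag from line to line."""
    in_comment = False
    result = []
    for line in lines:
        spans, in_comment = highlight_line(line, in_comment)
        result.append(spans)
    return result
```

An editor built this way could cache the flag at the start of each line. After an edit it would re-run `highlight_line` from the changed line downward and stop as soon as a line's end-of-line flag matches the cached value, which is one plausible way to keep live highlighting cheap.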
Automatic indentation engines are more involved. In theory the best results would come from reconstructing a full syntax tree, but that is slow and raises problems of error handling (because most programs are not even well-formed while they're being edited). Instead they use various kinds of simplified scanning and heuristics, which don't always manage to match the true syntax of the language.
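As a rough illustration of such a heuristic, the sketch below guesses the indentation of the next line by counting unmatched brackets in the preceding lines. The function name `guess_indent` and the `//` comment convention are assumptions made for the example, not any particular editor's algorithm. It deliberately does no real parsing, so it also shows how the heuristic can disagree with the language's actual syntax.

```python
def guess_indent(previous_lines, indent_width=4):
    """Guess the indent of the next line from unmatched brackets so far.

    Purely textual: it drops everything after '//' and counts brackets,
    without knowing about strings or block comments.
    """
    depth = 0
    for line in previous_lines:
        code = line.split("//", 1)[0]  # crude line-comment removal
        depth += sum(code.count(c) for c in "([{")
        depth -= sum(code.count(c) for c in ")]}")
    return max(depth, 0) * indent_width
```

For `["if (x) {", "    foo(1);"]` this returns 4, which is what one would want. For `['s = "{";']` it also returns 4, because the brace inside the string literal is counted as if it opened a block. That is the kind of mismatch with the true syntax described above.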
(edit: on further thought, this is not completely true. For example, Eclipse's Java editor will also change the color of identifiers according to whether they name local variables, instance fields or static variables/methods. This happens in a separate pass from the basic lexical highlighting, after the editor has parsed and typechecked the code for live cross-referencing.)
Syntax highlighting usually works at the lexer level, not the parser level.
It's essentially a finite state machine derived from a set of regular expressions, so it's very quick to run.
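For a sense of what that state machine looks like, here is a small hand-written sketch in Python. It corresponds roughly to the regular expressions `[A-Za-z_][A-Za-z0-9_]*` for identifiers and `[0-9]+` for numbers, with every other non-space character treated as one-character punctuation. The state names and the transition table are made up for the example; a real lexer generator would derive a table like this automatically from the regexes.

```python
def char_class(ch):
    """Map a character to the input class the state machine switches on."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    if ch.isspace():
        return "space"
    return "other"


# TRANSITIONS[state][char_class] -> next state.
# A missing entry means the current token ends before this character.
TRANSITIONS = {
    "start":  {"letter": "ident", "digit": "number",
               "space": "space", "other": "punct"},
    "ident":  {"letter": "ident", "digit": "ident"},
    "number": {"digit": "number"},
    "space":  {"space": "space"},
    "punct":  {},
}


def tokenize(text):
    """Split text into (kind, lexeme) pairs in one left-to-right pass."""
    tokens = []
    pos = 0
    while pos < len(text):
        state = "start"
        start = pos
        while pos < len(text):
            nxt = TRANSITIONS[state].get(char_class(text[pos]))
            if nxt is None:
                break
            state = nxt
            pos += 1
        if state != "space":  # whitespace is consumed but not reported
            tokens.append((state, text[start:pos]))
    return tokens
```

Each character costs one table lookup and no character is looked at more than twice, so the running time is linear in the length of the text. A highlighter would then map each token kind to a color.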