State machine for syntax coloring

Posted 2024-07-19 03:09:50

I'm currently learning how lexers and parsers work, and I have the following question about state machines. For example, I need to colorize text by the following rule: everything between a '$' and the end of the line is colored.

For this rule, a simple state transition table looks like this:

current event next  action
IDLE    $     COLOR -
COLOR   any   -     OnColor()
COLOR   \n    IDLE  -
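
A minimal Python sketch of the table above, under two assumptions that the table itself does not state: a transition on a specific character takes priority over `any`, and a missing entry means "stay in the current state and do nothing". The names `run`, `TABLE`, and `ANY` are illustrative only, and actions are recorded by name instead of being called.

```python
ANY = object()  # wildcard event: matches any character (assumed convention)

# (current state, event) -> (next state or None to stay, action name or None)
TABLE = {
    ("IDLE", "$"): ("COLOR", None),
    ("COLOR", "\n"): ("IDLE", None),
    ("COLOR", ANY): (None, "OnColor"),
}


def run(text, table, start="IDLE"):
    """Feed text through the table; return (action, position) pairs that fired."""
    state = start
    fired = []
    for pos, ch in enumerate(text):
        # Assumption: an exact character match wins over the ANY wildcard.
        entry = table.get((state, ch)) or table.get((state, ANY))
        if entry is None:
            continue  # no transition listed: stay put, no action
        next_state, action = entry
        if action is not None:
            fired.append((action, pos))
        if next_state is not None:
            state = next_state
    return fired


if __name__ == "__main__":
    print(run("ab$cd\nef", TABLE))
```

Keeping the transitions in a dictionary means the driver loop never changes; only the table does.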

This state machine calls the OnColor() action for every character between '$' and the end of the line, so I can colorize it. Of course, the same thing can be generated automatically from a regexp, but I really want to know how it works before relying on heavy magic :). Now comes the problem: if I have a rule that colors any line of text that ends with a dollar sign, the state transition table is not very clear:

current      event next             action
IDLE         any   -                -
IDLE         $     FOUND_DOLLAR     -
FOUND_DOLLAR \n    IDLE             OnDollar()
FOUND_DOLLAR any   IDLE             -

I can teach my state machine to call OnDollar() if it finds a '$' sign at the end of a line, but what can I do to colorize the text that came BEFORE the dollar sign? What are the common patterns for solving such problems? Of course, it would be a one-line regexp, but I'm really interested in how such a parser can be implemented via a state machine, and whether it is possible at all.

Comments (3)

铁憨憨 2024-07-26 03:09:50

If you are constrained to color one character at a time (i.e. you have no buffering, lookahead, recoloring or marking capability), then it is impossible.

Otherwise, if you have such capabilities, it can be done; the technique depends on what's available.

  • Recoloring - have an action that can go back and recolor the previous n characters. Obviously, this is a trivial solution.

  • Buffering / marking - have an action that places the character onto the end of a buffer, or sets a named mark in the source, rather than letting the character through. Then, when you find out later what to do, have an action that commits the buffer one way or another, or flushes from a named mark. Recoloring more than 1 character with this gets somewhat complicated, though.

  • Lookahead - have speculative transitions, i.e. use an NFA instead of a DFA.
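
The marking variant can be sketched in Python for the "line ends with a dollar sign" rule. This is only an illustration under stated assumptions: the function name `spans_before_trailing_dollar` is made up, the mark is simply the index where the current line starts, and "committing" means recording a half-open `(start, end)` span instead of calling a real coloring action. It deviates from the question's table in one way: a '$' seen while already in FOUND_DOLLAR stays in FOUND_DOLLAR.

```python
def spans_before_trailing_dollar(text):
    """Return half-open (start, end) spans of the text before a line-ending '$'."""
    spans = []
    state = "IDLE"
    mark = 0  # the "named mark": index where the current line begins
    for pos, ch in enumerate(text):
        if ch == "\n":
            if state == "FOUND_DOLLAR":
                # Commit: the '$' sits at pos - 1, so color text[mark:pos - 1].
                spans.append((mark, pos - 1))
            # Either way the line is over: reset the state and move the mark.
            state = "IDLE"
            mark = pos + 1
        elif ch == "$":
            state = "FOUND_DOLLAR"
        else:
            state = "IDLE"  # the '$' seen earlier was not the last character
    return spans


if __name__ == "__main__":
    sample = "abc$\nx$y\n"
    for start, end in spans_before_trailing_dollar(sample):
        print(repr(sample[start:end]))
```

Nothing is colored until the newline arrives, which is the point of the technique: the decision is deferred until the machine knows how the line ended. As in the question's table, a final line without a trailing newline is never committed.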

段念尘 2024-07-26 03:09:50

Most colorizers always work on a larger block, say a whole line (which is sufficient in most cases) plus a "leak" flag for, say, multi-line comments. See the Qt Syntax Highlighter example for such an API.
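
A rough Python sketch of that line-at-a-time idea, with the "leak" flag carried from one line to the next. It is not the Qt API; the names `highlight_line`, `highlight_all`, `NORMAL`, and `IN_COMMENT` are hypothetical, and C-style `/* ... */` comments are assumed purely as an example of a multi-line construct.

```python
NORMAL, IN_COMMENT = 0, 1  # hypothetical per-line "leak" states


def highlight_line(line, prev_state):
    """Return (comment spans in this line, state leaking into the next line)."""
    spans = []
    state = prev_state
    pos = 0
    span_start = 0  # only meaningful while state == IN_COMMENT
    while True:
        if state == NORMAL:
            start = line.find("/*", pos)
            if start == -1:
                return spans, NORMAL
            span_start, pos, state = start, start + 2, IN_COMMENT
        else:
            end = line.find("*/", pos)
            if end == -1:
                # Comment is still open: color to end of line and leak the state.
                spans.append((span_start, len(line)))
                return spans, IN_COMMENT
            spans.append((span_start, end + 2))
            pos, state = end + 2, NORMAL


def highlight_all(lines):
    """Highlight each line in order, threading the leak state through."""
    state = NORMAL
    result = []
    for line in lines:
        spans, state = highlight_line(line, state)
        result.append(spans)
    return result


if __name__ == "__main__":
    print(highlight_all(["a /* b", "c", "d */ e /* f */ g"]))
```

Because each call sees a whole line, rules such as "color the line if it ends with '$'" need no buffering inside the state machine; the only information that crosses line boundaries is the single state value.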

为你鎻心 2024-07-26 03:09:50

From reading the "Purple Dragon Book" (sic), it seems that modern compilers and interpreters actively use a "look ahead" buffer and accumulate recent text, so they can easily check the next few symbols and the previous few symbols in order to determine the exact lexeme type.

So, in my example, event() needs to look at the next and previous symbols in order to decide the type of the lexeme being accumulated.
