What syntactic sugar or language features make a language hard to parse?

Posted 2024-09-01 01:52:44

I did some searching and didn't find an existing question that directly addresses this.

Anyway, the gist is this: which language features or kinds of syntax make a language a major pain to build a parser, syntax highlighter, and similar tools for?

This might be subjective, but I'm thinking of, for example, the difference between parsing a language like Lisp, with its (func params etc.) structure, and parsing something like C++, with all of its templates, brackets, and so forth.

Comments (2)

莫相离 2024-09-08 01:52:44

Languages that support syntax extension through macros or other means cannot be fully parsed unless you can properly expand the macros. For languages with full procedural macros, such as Lisp or Curl, you can't fully parse a program without implementing the language itself!

Typically, for syntax highlighting in such languages, you don't try to expand macros; you assume instead that macros follow conventional language idioms.
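To make the "assume conventional idioms" approach concrete, here is a minimal Python sketch of a highlighter for a Lisp-like syntax that never expands anything. It only guesses a form's role from the name in head position. The `highlight` function, the label names, and the rule that names starting with "def" introduce definitions are all assumptions made for illustration, not taken from any particular editor.

```python
import re

# One token is either a single parenthesis or a run of non-space, non-paren characters.
TOKEN = re.compile(r"[()]|[^\s()]+")


def highlight(source):
    """Label tokens without expanding macros, guessing roles from naming conventions."""
    labeled = []
    prev = None
    for tok in TOKEN.findall(source):
        if tok in ("(", ")"):
            kind = "paren"
        elif prev == "(":
            # Head position. Assumed convention: "def..." names define things,
            # whether they are built in or user-written macros.
            kind = "definition" if tok.startswith("def") else "call"
        else:
            kind = "atom"
        labeled.append((tok, kind))
        prev = tok
    return labeled


for token, kind in highlight("(defwidget button (label) (draw label))"):
    print(f"{token:10} {kind}")
```

Note that `label` inside the parameter list gets labeled as a call, because the sketch has no idea what the hypothetical `defwidget` macro does with its arguments. That is exactly the kind of mistake you accept when you highlight without expanding macros.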

夏九 2024-09-08 01:52:44

From the point of view of formal languages and grammars there are two main aspects, IMHO. First, the grammar for your language should belong to some class that is easy to process. Even a context-free grammar already has a cost: if the language has two elements whose counts depend on each other, such as opening and closing brackets, parsing it might need a potentially unbounded amount of memory. C++ has a context-sensitive grammar, which is even worse; a textbook example of context sensitivity is a grammar with three elements whose counts depend on each other. The second aspect is ambiguity. With an ambiguous grammar the same text can be parsed in different ways, which means your parsing algorithm has to find the right one, and most algorithms do not allow ambiguity at all.
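As a rough illustration of the ambiguity point, the Python sketch below enumerates every parse of a token list under the deliberately ambiguous toy grammar E -> E '-' E | NUMBER and returns the value each parse tree would compute. The grammar and the `parses` function are made up for this example; a real language would remove the ambiguity with precedence and associativity rules.

```python
from functools import lru_cache


def parses(tokens):
    """Return one value per parse tree of tokens under E -> E '-' E | NUMBER."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def values(lo, hi):
        # Values of all parse trees covering tokens[lo:hi].
        if hi - lo == 1:
            return (int(tokens[lo]),)
        results = []
        # Try every '-' in the span as the top-level operator.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] == "-":
                for left in values(lo, mid):
                    for right in values(mid + 1, hi):
                        results.append(left - right)
        return tuple(results)

    return list(values(0, len(tokens)))


print(parses(["1", "-", "2", "-", "3"]))  # [2, -4]
```

The same text yields 1 - (2 - 3) = 2 and (1 - 2) - 3 = -4, so a parser for this grammar has to be told which tree is the right one.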

I am not entirely sure, but I would say that parsing brackets and parsing whitespace (when reasonably defined) are equally complex. In both cases you need a counter to track the level of block nesting. With whitespace, however, you can identify the level locally (by counting the whitespace), and you can be sure the counter will not go below zero, which can happen with brackets when there are more closing brackets than opening ones.
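A small Python sketch of the two counters, under some assumptions of mine: brackets are only `(` and `)`, indentation uses spaces only, and one nesting level is four spaces wide. The function names `max_bracket_depth` and `indent_levels` are hypothetical.

```python
def max_bracket_depth(text):
    """Track nesting with a running counter; fail if it drops below zero or ends above zero."""
    depth = 0
    deepest = 0
    for ch in text:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("closing bracket without a matching opening bracket")
    if depth != 0:
        raise ValueError("unclosed opening bracket")
    return deepest


def indent_levels(lines, width=4):
    """Compute each line's nesting level locally from its own leading spaces."""
    levels = []
    for line in lines:
        spaces = len(line) - len(line.lstrip(" "))
        levels.append(spaces // width)  # a count of spaces can never be negative
    return levels


print(max_bracket_depth("(f (g x) (h y))"))  # 2
print(indent_levels(["if a:", "    if b:", "        c()", "d()"]))  # [0, 1, 2, 0]
```

The bracket counter depends on everything read so far and needs an explicit check for going negative, while each indentation level comes from a single line on its own.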
