当前位置：文江博客话题详情

如何为专有语言构建 AST？

发布于 2024-10-08 09:27:37 字数 148 浏览 18 评论 0原文

我试图了解如何为专有语言构建 AST。我需要构建一个 AST，以便我可以输入我的规则和指南来检查源代码中可能存在的错误。

如何构建 AST？有没有任何书籍、文章可以帮助我入门。龙的编译器书有帮助吗？

请注意，我没有 CS 背景。

谢谢

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

自我难过 2024-10-15 09:27:37

这是一个相当大的问题。我确实感受到你的痛苦，因为我也是从非计算机背景背景解决这个问题的。这让我更加欣赏CS了。

您在研究中可能会经常看到的一件事是扩展巴科斯-诺尔范式 (EBNF)。它基本上是描述您的语言的一种方式。为你的语言创建一个 EBNF 将帮助你了解计算机需要做什么来解析它，这会有所帮助。

回到手头的问题：您可能会使用词法分析器/解析器来构建树。

用于执行此操作的传统工具是 lex 和 yacc，或者它们更现代的近亲 flex 和 bison。

一种较新的方法是 Antlr。强烈推荐它，但超出了我的理解范围。

我发现的第三种方法是 Python 的 pyparsing 库。由于我对 Python 的熟悉以及它描述你需要它解析的内容的可读方式，我最终选择了它。

有大量示例可用于 pyparsing，这很有帮助。我发现对构建我的解析器最有帮助的是 SimpleCalc。然而，它基于相当旧的 pyparsing 版本，并且比 pyparsing 后来实现的一些强大操作所需的更复杂。 SimpleArith 是一个类似的但较新的版本。

我实际上尚未使用 pyparsing 处理的一件事是正确分析语法错误。然而，它似乎为您提供了这样做所需的工具。

无论如何，这并不是您问题的完整答案，但我希望它至少能为您指出一些起点。为复杂语言构建解析器并不容易！

回复收藏 0 原文

A君 2024-10-15 09:27:37

代码分析引擎通常需要相当多的复杂性，而不仅仅是构建 AST。

要进行任何认真的代码分析，您需要了解代码中标识符的含义以及它们在何处/如何定义（“符号表”），并且您通常需要了解信息如何在程序中传播（控制和数据流分析）。您需要机器来支持所有这些，然后您需要将该机器与您的专有语言联系起来。

我认为攀登珠穆朗玛峰是一个类比。获得 AST 就像到达 10,000 英尺大本营。任何土人都可以通过使用基本技术（登山靴）步行上山来做到这一点。攀登最后 17,000 英尺需要完全不同的技术、承诺和计划，而大多数人在爬完前 10,000 英尺后，根本就没有为接下来的旅程做好准备。（我在这里有一些经验，请查看我的简历）。

这些都是非常详细的主题，如果您缺乏计算机科学背景，您的道路可能会非常艰难。（然而，我们都从某个地方开始，所以这实际上是一个野心问题）。《龙》一书是一本极好的资源，它将帮助您了解所有这些机器的用途以及为什么需要它；还有许多其他优秀的编译器书籍，并且通常也能起到同样的作用。但你需要做好认真阅读的准备。

提升曲线的一种方法是使用一种工具，其中的大部分机制已经由一群在构建此类工具方面经验丰富的计算机科学家为您考虑和实施。那么你的问题就大大减少了：你只需要学习如何使用他们提供的东西，而不是试图弄清楚你需要什么（大多数人从未通过 AST 阶段）并实现所有必要的支持机制。

ANTLR（已经提到过，由一位相当优秀的 CS 教授完成）就是其中之一，因为它提供了解析功能，使您能够定义 AST 的构建方式，以及如何以程序方式爬取生成的 AST。但它并没有提供您完成任务所需的太多其他内容。

我们的DMS 软件重组工具包提供了我在第一段中提到的所有设施，并且然后一些。使用 DMS 时您会注意到的第一个区别是您只需要提供语法；它将构建 AST，无需您的任何进一步帮助。

您可以通过这个 DMS 应用于高学校代数和微积分。特别是，它展示了如何仅使用简单的代数/微积分语法就可以轻松定义，然后可以操纵该语言的“程序”。该应用程序是“转换”代码而不是分析代码的应用程序，但基本原理是相同的。

分析您的专有语言的“真实”DMS 应用程序将复杂得多。

Code analysis engines generally require quite a lot of sophistication above and beyond just building ASTs.

To do any serious code analysis, you need to know the meaning of identifiers in code and where/how they are defined ("symbol tables"), and you often need to know how information travels around the program (control and data flow analysis). You need machinery to support all of these, and then you need to tie that machinery to your proprietary language.

I think of climbing Everest as an analogy. Getting ASTs is like getting to the 10,000 foot base camp. Any clod can do that by just walking up the hill using basic technology (hiking boots). Climbing the last 17,000 feet requires a whole different kind of technology, commitment, and plan, and most folks, having walked up the first 10,000 feet, are simply unprepared for the rest of the trip. (I have some experience here, check my bio).

These are all pretty detailed topics and your absence of CS background is going to make the road for you likely pretty rough. (We all start somewhere, however, so this is really a matter of ambition). The Dragon book is an excellent resource that will help you understand what all this machinery does and why you need it; many other fine compiler books exist and will generally serve just as well. But you need to be prepared to do some serious reading.

One way to get up the curve is to use a tool in which much of this machinery has been already thought out and implemented for you, by a bunch of computer scientists experienced at building such tools. Then your problem is considerably reduced: you only need to learn how to use what they provided, rather than trying to figure out what you need (most folks never get past the AST stage) and implement all the necessary support machinery.

ANTLR (already mentioned, done by a pretty good CS professor) is sort of one, in that it provides parsing capabilities, enables you to define how ASTs are built, and how to climb over the resulting ASTs procedurally. But it doesn't provide much else you need for your task.

Our DMS Software Reengineering Toolkit provides all the facilities I mentioned in the first paragraph, and then some. One of the first differences you will notice working with DMS is that you only need to provide it grammar; it will builds ASTs without any further help from you.

You can get a flavor of what working with DMS is like, at this example of DMS applied to high school algebra and calculus. In particular, it shows how using just a simple grammar for algebra/calculus can be easily defined, and then "programs" in that language can be manipulated. This application is one that "transforms" code rather than analyzes it, but the basics are the same.

A "real" DMS application that analyzed your proprietary langauge will be considerably more complex.

回复收藏 0 原文

~没有更多了~