Testing a compiler

Posted on 2024-11-27 00:45:26

I'm currently working on a compiler of sorts that was built using SableCC.

Long story short, the compiler takes as input both specification files (this is what we're parsing) and .class files, and instruments the .class files' bytecode to make sure that none of the specifications are violated when the .class files are run (this is a bit like JML/Code Contracts, but way more powerful).

We have a few dozen system tests that cover a large part of the analysis phase (related to making sure the specifications make sense, and that they are consistent with the .class files they are supposed to specify).

We divided them into two sets: the valid tests and the invalid tests.

  • The valid tests are comprised of source code files that, when compiled by our compiler, should produce no compiler errors or warnings.

  • The invalid tests are comprised of source code files that, when compiled by our compiler, should produce at least one compiler error or warning. (See the sketch after this list.)
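
For concreteness, a parameterized JUnit 5 harness for these two sets could look roughly like the sketch below. SpecCompiler, Diagnostic and the specs/ directory layout are made-up placeholders for whatever entry point and test-data layout the compiler actually exposes.

```java
// Hypothetical JUnit 5 harness for the valid/invalid test sets.
// SpecCompiler, Diagnostic and the specs/ directories are placeholders.
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Stream;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.MethodSource;

class SpecAnalysisSystemTest {

    static Stream<Path> validSpecs() throws IOException {
        return Files.list(Paths.get("src/test/resources/specs/valid"));
    }

    static Stream<Path> invalidSpecs() throws IOException {
        return Files.list(Paths.get("src/test/resources/specs/invalid"));
    }

    @ParameterizedTest
    @MethodSource("validSpecs")
    void validSpecCompilesCleanly(Path spec) {
        List<Diagnostic> diagnostics = SpecCompiler.compile(spec); // hypothetical API
        assertTrue(diagnostics.isEmpty(), "expected no errors/warnings for " + spec);
    }

    @ParameterizedTest
    @MethodSource("invalidSpecs")
    void invalidSpecReportsAtLeastOneDiagnostic(Path spec) {
        List<Diagnostic> diagnostics = SpecCompiler.compile(spec); // hypothetical API
        assertFalse(diagnostics.isEmpty(), "expected at least one error/warning for " + spec);
    }
}
```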

This has served us well while we were in the analysis phase. The question now is how to test the code generation phase. In the past, I've done system tests for a little compiler I developed in a compilers course. Each test consisted of a couple of source files in that language and an output.txt. When running the test, I'd compile the source files and then run the main method, checking that the output was equal to output.txt. All of this was automated, of course.
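
That kind of harness is simple to sketch; something like the following, where MiniCompiler and JvmRunner are made-up placeholders for the real compile-and-run steps, and each test directory is assumed to hold the sources plus an output.txt:

```java
// Rough shape of an automated output.txt system test.
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.nio.file.Files;
import java.nio.file.Path;

class OutputComparisonSystemTest {

    void runSystemTest(Path testDir) throws Exception {
        Path classesDir = MiniCompiler.compileAll(testDir);            // hypothetical: compile every source file in the test dir
        String actual = JvmRunner.runMainAndCaptureStdout(classesDir); // hypothetical: run the main method, capture stdout
        String expected = Files.readString(testDir.resolve("output.txt"));
        assertEquals(expected, actual);
    }
}
```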

Now, dealing with this bigger compiler/bytecode instrumenter, things are not so easy. It's no easy task to replicate what I did with my simple compiler. I guess the way to go is to step back from system tests at this stage and focus on unit tests.


As any compiler developer knows, a compiler consists of lots of visitors. I'm not too sure how to proceed with unit-testing them. From what I've seen, most of the visitors call a counterpart class that has methods related to that visitor (I guess the idea was to keep the SRP for the visitors).

There are a couple of techniques I could use to unit-test my compiler:

  1. Unit-test each of the visitor's methods separately. This seems like a good idea for a stackless visitor, but a terrible one for visitors that use one (or more) stacks. I'd then also unit-test each of the methods of the standard (read: non-visitor) classes in the traditional way.

  2. Unit-test the whole visitor in one go. That is, I create a tree that I then visit. At the end, I verify that the symbol table was updated correctly. I do not care about mocking its dependencies. (See the sketch after this list.)

  3. The same as 2), but now mocking the visitor's dependencies.

  4. What else?
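
A rough sketch of what option 2 could look like in a SableCC-based compiler is shown below. The node classes (ASpecDecl, TIdentifier), SymbolTable and DeclarationVisitor are illustrative names, not the project's real ones; the shape of the test is the point.

```java
// Option 2 sketch: hand-build a small AST fragment, run one visitor over it with
// real collaborators (no mocks), and assert on the symbol table afterwards.
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

class DeclarationVisitorTest {

    @Test
    void visitingADeclarationRegistersItInTheSymbolTable() {
        // Arrange: a hand-built tree fragment (constructor shape is illustrative only).
        ASpecDecl decl = new ASpecDecl(new TIdentifier("Account"));

        SymbolTable symbols = new SymbolTable();
        DeclarationVisitor visitor = new DeclarationVisitor(symbols);

        // Act: SableCC nodes are visited via node.apply(switch).
        decl.apply(visitor);

        // Assert: only the slice of the table this visitor is responsible for.
        assertTrue(symbols.contains("Account"));
    }
}
```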

I still have the problem that the unit tests will be very tightly coupled to SableCC's AST (which, to be honest, is really ugly).


We are currently not writing any new tests, but I'd like to get the train back on track, as I'm sure that not testing the system is the same as feeding a monster that, sooner or later, will come back to bite us in the butt when we least expect it ;-(

Has anyone had any experience with compiler testing who could give some awesome advice on how to proceed now? I'm kinda lost here!

Comments (1)

蓝天 2024-12-04 00:45:26

I am involved in a project where a Java AST is translated into another language, OpenCL, using the Eclipse compiler, and I have run into similar issues.

I have no magic solutions for you, but I'll share my experience in case it helps.

Your technique of testing with expected output (with output.txt) is how I started out as well, but it became an absolute maintenance nightmare for the tests. When I had to change the generator or the output for some reason (which happened a few times) I had to rewrite all the expected output files - and there were huge amounts of them. I started to not want to change output at all for fear of breaking all the tests (which was bad), but in the end I scrapped them and instead did testing on the resulting AST. This meant I could 'loosely' test the output. For example, if I wanted to test generation of if statements I could just find the one-and-only if statement in the generated class (I wrote helper methods to do all this common AST stuff), verify a few things about it, and be done. That test wouldn't care how the class was named or whether there were extra annotations or comments. This ended up working quite well as the tests were more focused. The disadvantage is that the tests were more tightly coupled to the code, so if I ever wanted to rip out the Eclipse compiler/AST library and use something else I'd need to rewrite all my tests. In the end because the code generation would change over time I was willing to pay that price.
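
As a sketch, such a 'loose' test could look like the following, assuming the generator exposes the generated code as some AST/model and that findOnly is one of those helper methods. GeneratedClass, IfNode, Generators and AstTestUtils are all made-up names.

```java
// 'Loose' AST assertion sketch: find the single node of interest and assert only
// on what the test cares about, nothing about class names, comments or annotations.
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;

import org.junit.jupiter.api.Test;

class IfGenerationTest {

    @Test
    void generatesAnIfWithTheTranslatedCondition() {
        GeneratedClass generated = Generators.generateFrom("testdata/SingleIf.java"); // hypothetical

        // Helper fails the test unless exactly one IfNode exists in the generated class.
        IfNode ifNode = AstTestUtils.findOnly(generated, IfNode.class);

        assertEquals("x > 0", ifNode.conditionText());
        assertFalse(ifNode.body().isEmpty());
    }
}
```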

I also heavily rely on integration tests - tests that actually compile and run the generated code in the target language. I had way more of these types of tests than unit tests purely because they seemed to be more useful and catch more problems.

As for visitor testing, again I do more integration-style testing with them - get a really small/specific Java source file, load it up with the Eclipse compiler, run one of my visitors on it and check the results. The only other way to test without invoking the Eclipse compiler would be to mock out an entire AST, which was just not feasible - most of the visitors were non-trivial and required a fully constructed/valid Java AST, as they would read annotations from the main class. Most of the visitors were testable in this way because they either generated small OpenCL code fragments or built up a data structure which the unit tests could verify.
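
A minimal sketch of such a visitor test, using the standard Eclipse JDT ASTParser API to turn a source string into a CompilationUnit; the visitor itself (BufferAccessVisitor and its bufferAccesses() result) is a made-up example, not one of the project's real visitors.

```java
// Integration-style visitor test: parse a tiny source string with the JDT,
// run one visitor over the resulting CompilationUnit, check what it collected.
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.eclipse.jdt.core.dom.AST;
import org.eclipse.jdt.core.dom.ASTParser;
import org.eclipse.jdt.core.dom.CompilationUnit;
import org.junit.jupiter.api.Test;

class BufferAccessVisitorTest {

    @Test
    void findsTheSingleBufferAccess() {
        String source = "class Kernel { void run(float[] a) { float x = a[0]; } }";

        ASTParser parser = ASTParser.newParser(AST.JLS8);
        parser.setKind(ASTParser.K_COMPILATION_UNIT);
        parser.setSource(source.toCharArray());
        CompilationUnit unit = (CompilationUnit) parser.createAST(null);

        BufferAccessVisitor visitor = new BufferAccessVisitor(); // hypothetical visitor under test
        unit.accept(visitor);

        assertEquals(1, visitor.bufferAccesses().size());
    }
}
```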

Yes, all my tests are very tightly coupled to the Eclipse compiler. But so is the actual software we are writing. Using anything else would mean we'd have to rewrite the whole program anyway so it's a price we're pretty happy to pay. I guess there is no one solution - you need to weigh up cost of tight coupling versus test maintainability/simplicity.

We also have a fair amount of testing utility code, such as setting up the Eclipse compiler with default settings, code to pull out the body nodes of method trees, etc. We try to keep the tests as small as possible (I know this is probably common sense but possibly worth mentioning).


(Edits/Additions below in responses to comments - easier to read/format than comment responses)

"I also heavily rely on integration tests - tests that actually compile and run the generated code in the target language" What did these tests actually do? How are they different than the output.txt tests?

(Edit again: After re-reading the question I realize our approaches are the same so ignore this)

Rather than just generate source code and compare that to expected output which I did initially, the integration tests generate OpenCL code, compile it and run it. All of the generated code produces output and that output is then compared.

For example, I have a Java class that, if the generator works properly, should generate OpenCL code that sums up values in two buffers and puts the value in a third buffer. Initially I would have written a text file with the expected OpenCL code and compared that in my test. Now, the integration test generates the code, runs it through the OpenCL compiler, runs it and the test then checks the values.
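
Sketched as a test, that could look roughly like this; Generators, SumKernel and OpenClTestHarness are made-up placeholders, and the real test would go through whatever OpenCL binding the project uses to compile and launch the kernel.

```java
// Buffer-sum integration test sketch: generate the kernel, compile and run it,
// then compare the output buffer values.
import static org.junit.jupiter.api.Assertions.assertArrayEquals;

import org.junit.jupiter.api.Test;

class SumKernelIntegrationTest {

    @Test
    void generatedKernelSumsTwoBuffers() {
        String openClSource = Generators.generateKernelFor(SumKernel.class); // hypothetical

        float[] a = {1f, 2f, 3f};
        float[] b = {10f, 20f, 30f};

        // Hypothetical harness: compiles the generated kernel with the OpenCL
        // compiler, runs it on the two input buffers, returns the output buffer.
        float[] result = OpenClTestHarness.compileAndRun(openClSource, a, b);

        assertArrayEquals(new float[] {11f, 22f, 33f}, result, 1e-6f);
    }
}
```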

"As for visitor testing, again I do more integration-style testing with them - get a really small/specific Java source file, load it up with Eclipse compiler, run one of my visitors with it and check results. " Do you mean run with one of your visitors, or run all the visitors up to the visitor you wanna test?

Most of the visitors could be run independently of each other. Where possible I would run with only the visitor I am testing, or if there is a dependency on others, the minimal set of visitors required (usually just one other one was required). The visitors don't talk directly to each other, but use context objects that are passed around. These can be constructed artificially in the tests to get things into a known state.
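
For illustration, setting such a context up artificially might look like the small sketch below; GenerationContext, ParamKind and BodyGeneratorVisitor are invented names for that kind of context/visitor pair, and 'unit' would be a CompilationUnit parsed as in the earlier sketch.

```java
// Put a passed-around context object into a known state so one visitor can run in isolation.
import org.eclipse.jdt.core.dom.CompilationUnit;

class VisitorContextFixture {

    static BodyGeneratorVisitor runBodyGenerator(CompilationUnit unit) {
        GenerationContext context = new GenerationContext();
        // Pretend the earlier visitors already ran and recorded the kernel parameters.
        context.registerParameter("a", ParamKind.FLOAT_BUFFER);
        context.registerParameter("b", ParamKind.FLOAT_BUFFER);

        BodyGeneratorVisitor visitor = new BodyGeneratorVisitor(context);
        unit.accept(visitor);
        return visitor;
    }
}
```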

Other question, do you use mocks -- at all, in this project? Moreover, do you regularly use mocks in other projects? I'm just trying to get a clear picture about the person I'm talking with :P

In this project we use mocks in about 5% of tests, probably even less. And I don't mock out any Eclipse compiler stuff.

The thing with mocks is that I'd need to understand what I'm mocking out well, and that is not the case with the Eclipse compiler. There are a whole lot of visitor methods that are called, and sometimes I'm not sure which one should be called (e.g. is visit ExtendedStringLiteral or visit StringLiteral called for string literals?) If I did mock this out and assumed one or the other, this might not correspond to reality and the program would fail even if the tests would pass - not desired. The only mocks we do are a couple for the annotation processor API, a couple of Eclipse compiler adapters, and some of our own core classes.

In other projects, such as Java EE stuff, more mocks were used, but I'm still not an avid user of them. The more defined, understood and predictable an API is, the more likely I am to consider using mocks.

The first phases of our program are just like those of a regular compiler. We extract info from the source files and fill up a (big and complex!) symbol table. How would you go about system-testing this? In theory, I could create a test with the source files and also a symbolTable.txt (or .xml or whatever) that contains all the info about the symbol table, but that would, I think, be a bit complex to do. Each one of those integration tests would be a complex thing to accomplish!

I'd try to take the approach of testing small bits of the symbol table rather than the whole lot in one go. If I were testing whether a Java tree was built correctly, I'd have something like:

  • one test just for if statements:

    • have source code with one method containing one if statement
    • build the symbol table / tree from this source
    • pull out the statement tree from the only method body in the main class (fail the test if there is more than one, or none, of: method bodies, classes found, or top-level statement nodes in the method body)
    • compare the if statement's node attributes (condition, body) programmatically (see the sketch after this list)
  • at least one test for each other kind of statement in a similar style.

  • other tests, maybe for multiple statements, etc. or whatever is needed
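
Put into code, the first bullet could look roughly like the sketch below, with Frontend, SymbolTable, MethodInfo and the node types as hypothetical stand-ins for the asker's own classes (assertInstanceOf assumes JUnit 5.8+).

```java
// Single-if-statement test sketch against a hypothetical front end and symbol table.
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertInstanceOf;
import static org.junit.jupiter.api.Assertions.assertNotNull;

import java.util.List;
import org.junit.jupiter.api.Test;

class IfStatementTreeTest {

    @Test
    void buildsASingleIfStatementWithConditionAndBody() {
        SymbolTable table = Frontend.analyze("testdata/SingleIf.spec"); // hypothetical entry point

        // Fail fast if the structural assumptions do not hold.
        List<MethodInfo> methods = table.mainClass().methods();
        assertEquals(1, methods.size(), "expected exactly one method");

        List<StatementNode> statements = methods.get(0).body().topLevelStatements();
        assertEquals(1, statements.size(), "expected exactly one top-level statement");

        // Compare the if statement's attributes programmatically.
        IfStatementNode ifNode = assertInstanceOf(IfStatementNode.class, statements.get(0));
        assertNotNull(ifNode.condition());
        assertFalse(ifNode.body().isEmpty());
    }
}
```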

This approach is integration-style testing, but each integration test only tests a small part of the system.

Essentially I'd try to keep the tests as small as possible. A lot of the testing code for pulling out bits of the tree can be moved into utility methods to keep the test classes small.

I thought that maybe I could create a pretty printer that would take the symbol table and output the corresponding source files (which, if everything was OK, would be just like the original source files). The problem is that the original files can have things in a different order than what my pretty printer prints. I'm afraid that with this approach I might just be opening another can of worms. I've been relentlessly refactoring parts of the code and the bugs are starting to show up. I really need some integration tests to keep me on track.

That's exactly the approach I've taken. However in my system the order of stuff doesn't change much. I have generators that essentially output code in response to Java AST nodes, but there is a bit of freedom in that generators can call themselves recursively. For example, the 'if' generator that gets fired off in response to a Java If statement AST node can write out 'if (', then ask other generators to render the condition, then write ') {', ask other generators to write out the body, then write '}'.
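
A bare-bones sketch of that recursive delegation, with Generator and GeneratorRegistry as invented interfaces and Eclipse JDT DOM node types for the input:

```java
// Each generator renders its own syntax and delegates child nodes back to the registry.
import org.eclipse.jdt.core.dom.ASTNode;
import org.eclipse.jdt.core.dom.IfStatement;

interface Generator {
    void generate(ASTNode node, StringBuilder out);
}

interface GeneratorRegistry {
    Generator forNode(ASTNode node); // picks the generator responsible for a node type
}

class IfGenerator implements Generator {

    private final GeneratorRegistry registry;

    IfGenerator(GeneratorRegistry registry) {
        this.registry = registry;
    }

    @Override
    public void generate(ASTNode node, StringBuilder out) {
        IfStatement ifStmt = (IfStatement) node;

        out.append("if (");
        ASTNode condition = ifStmt.getExpression();
        registry.forNode(condition).generate(condition, out); // delegate: render the condition

        out.append(") {\n");
        ASTNode body = ifStmt.getThenStatement();
        registry.forNode(body).generate(body, out);           // delegate: render the body

        out.append("}\n");
    }
}
```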
