Speeding up runhaskell
I have a small test framework. It executes a loop which does the following:
1. Generate a small Haskell source file.
2. Execute this with runhaskell. The program generates various disk files.
3. Process the disk files just generated.
This happens a few dozen times. It turns out that runhaskell is taking up the vast majority of the program's execution time.
On one hand, the fact that runhaskell manages to load a file from disk, tokenise it, parse it, do dependency analysis, load 20KB more text from disk, tokenise and parse all of this, perform complete type inference, check types, desugar to Core, link against compiled machine code, and execute the thing in an interpreter, all inside of 2 seconds of wall time, is actually pretty damned impressive when you think about it. On the other hand, I still want to make it go faster. ;-)
Compiling the tester (the program that runs the above loop) produced a tiny performance difference. Compiling the 20KB of library code that the scripts link against produced a rather more noticeable improvement. But it's still taking about 1 second per invocation of runhaskell.
The generated Haskell files are just over 1KB each, but only one part of the file actually changes. Perhaps compiling the file and using GHC's -e switch would be faster?
Alternatively, maybe it's the overhead of repeatedly creating and destroying many OS processes which is slowing this down? Every invocation of runhaskell presumably causes the OS to explore the system search path, locate the necessary binary file, load it into memory (surely this is already in the disk cache?), link it against whatever DLLs, and fire it up. Is there some way I can (easily) keep one instance of GHC running, rather than having to constantly create and destroy the OS process?
Ultimately, I suppose there's always the GHC API. But as I understand it, that's nightmarishly difficult to use, highly undocumented, and prone to radical changes at every minor point release of GHC. The task I'm trying to perform is only very simple, so I don't really want to make things more complex than necessary.
Suggestions?
Update: Switching to GHC -e (i.e., now everything is compiled except the one expression being executed) made no measurable performance difference. It seems pretty clear at this point that it's all OS overhead. I'm wondering if I could maybe create a pipe from the tester to GHCi and thus make use of just one OS process...
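For anyone wanting to check the same hypothesis on their own setup, here is a rough sketch of how the fixed per-invocation cost could be measured from a Haskell tester. The helper names `timed` and `timeRunhaskell` are made up for illustration, and it assumes `runhaskell` is on the search path and that the `time` and `process` packages are available.

```haskell
import Data.Time.Clock (NominalDiffTime, diffUTCTime, getCurrentTime)
import System.Process (callProcess)

-- Run an action and report how long it took in wall-clock time.
timed :: IO a -> IO (a, NominalDiffTime)
timed act = do
  start <- getCurrentTime
  result <- act
  end <- getCurrentTime
  return (result, diffUTCTime end start)

-- Hypothetical helper: wall-clock time of one runhaskell invocation.
-- callProcess throws an exception if the child exits with a failure code.
timeRunhaskell :: FilePath -> IO NominalDiffTime
timeRunhaskell script = snd <$> timed (callProcess "runhaskell" [script])
```

Timing a script that contains nothing but `main = return ()` and comparing it with a real generated script should give a rough idea of how much of the time is startup cost rather than actual work.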
Comments (5)
Alright, I have a solution: I created a single GHCi process and connected its stdin to a pipe, so that I can send it expressions to interactively evaluate.
Several fairly large program refactorings later, the entire test suite now takes roughly 8 seconds to execute, rather than 48 seconds. That'll do for me! :-D
(To anyone else trying to do this: For the love of God, remember to pass the -v0 switch to GHCi, or you'll get a GHCi welcome banner! Weirdly, if you run GHCi interactively, even with -v0 the command prompt still appears, but when connected to a pipe the command prompt vanishes; I'm presuming this is a helpful design feature rather than a random accident.)
Of course, half the reason I'm going down this strange route is that I want to capture stdout and stderr to a file. Using runhaskell, that's quite easy; just pass the appropriate options when creating the child process. But now all of the test cases are being run by a single OS process, so there's no obvious way to redirect stdin and stdout.
The solution I came up with was to direct all test output to a single file, and between tests have GHCi print out a magic string which (I hope!) won't appear in test output. Then quit GHCi, slurp up the file, and look for the magic strings so I can snip the file into suitable chunks.
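The approach described above might look roughly like the following. This is only a sketch, not the poster's actual code: the marker text and the names `marker`, `splitOnMarker` and `runInGhci` are invented for illustration, and it assumes `ghci` is on the search path and that each test's output ends with a newline, so the marker lands on a line of its own.

```haskell
import System.IO
import System.Process

-- Hypothetical magic string; it must never occur in real test output.
marker :: String
marker = "@@@-END-OF-TEST-@@@"

-- Cut the captured output into one chunk per test. A line equal to the
-- marker ends the current chunk; output after the last marker (if any)
-- becomes a final chunk.
splitOnMarker :: String -> [String]
splitOnMarker = go [] . lines
  where
    go acc [] = [unlines (reverse acc) | not (null acc)]
    go acc (l : ls)
      | l == marker = unlines (reverse acc) : go [] ls
      | otherwise = go (l : acc) ls

-- Feed every expression to one GHCi process through a pipe, sending
-- stdout and stderr to a single log file, then split the log afterwards.
runInGhci :: FilePath -> [String] -> IO [String]
runInGhci logFile exprs = do
  out <- openFile logFile WriteMode
  (Just hin, _, _, ph) <-
    createProcess
      (proc "ghci" ["-v0"]) -- -v0 suppresses the welcome banner
        { std_in = CreatePipe
        , std_out = UseHandle out
        , std_err = UseHandle out
        }
  mapM_
    (\e -> hPutStrLn hin e >> hPutStrLn hin ("putStrLn " ++ show marker))
    exprs
  hPutStrLn hin ":quit"
  hClose hin -- flushes the pipe so GHCi sees all the input
  _ <- waitForProcess ph
  splitOnMarker <$> readFile logFile
```

Something like `runInGhci "tests.log" ["print (1 + 1)", "putStrLn \"hello\""]` should then give one chunk of output per expression. Since stdout and stderr share one file, the two streams end up interleaved within each chunk.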
You might find some useful code in TBC. It has different ambitions - in particular to scrap test boilerplate and test projects that may not compile completely - but it could be extended with a watch-directory feature. The tests are run in GHCi but objects successfully built by cabal ("runghc Setup build") are used.
I developed it to test EDSLs with complicated type hackery, i.e. where the heavy computational lifting is done by other libraries.
I am presently updating it to the latest Haskell Platform and welcome any comments or patches.
If the majority of the source files remain unchanged, you can possibly use GHC's -fobject-code flag (possibly in conjunction with -outputdir) to compile some of the library files.
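As a minimal sketch of how those flags could be passed when launching GHCi from a Haskell tester: the function name `ghciObjectCode` and the idea of combining the flags with -v0 and a piped GHCi are assumptions for illustration, not something the answer spells out. Whether it actually helps will depend on how much of the loaded code stays unchanged between runs.

```haskell
import System.Process

-- Build a GHCi invocation that compiles loaded modules to object code
-- and keeps the generated .o/.hi files in a separate directory, so
-- unchanged modules need not be recompiled on the next start.
-- The output directory and module list are chosen by the caller.
ghciObjectCode :: FilePath -> [FilePath] -> CreateProcess
ghciObjectCode outDir modules =
  proc "ghci" (["-v0", "-fobject-code", "-outputdir", outDir] ++ modules)
```

The resulting value can be handed to `createProcess` like any other process description.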
If the tests are well isolated from one another, you can put all the test code into a single program and invoke runhaskell once. This may not work if some tests are created based on the results of others, or if some tests call unsafeCrash.
I presume your generated code looks like this
You can put the code of all the tests into one file. Each test code generator is responsible for writing do_something_for_test_N.
Now you only incur the overhead of a single call to runhaskell.
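To make the single-file idea concrete, here is one possible sketch of a generator that glues the per-test definitions into one module. Only the name do_something_for_test_N comes from the answer. The helpers `testName` and `combinedModule`, the module layout, and the assumption that every test body is a non-empty block of IO statements are all hypothetical.

```haskell
import Data.List (intercalate)

-- Entry point name for test number n, following the naming used above.
testName :: Int -> String
testName n = "do_something_for_test_" ++ show n

-- Assemble one module from shared boilerplate plus one definition per
-- test body, with a main that runs the tests in order.
combinedModule :: String -> [String] -> String
combinedModule boilerplate bodies =
  unlines $
    ["module Main where", "", boilerplate, ""]
      ++ concat
        [ [testName n ++ " :: IO ()", testName n ++ " = do"]
            ++ map ("  " ++) (lines body) -- indent the body under "do"
            ++ [""]
        | (n, body) <- zip [1 ..] bodies
        ]
      ++ [ "main :: IO ()"
         , "main = sequence_ ["
             ++ intercalate ", " (map testName [1 .. length bodies])
             ++ "]"
         ]
```

Writing the result of `combinedModule` to a file and running it with runhaskell once would replace the per-test invocations. The isolation caveat still applies: a test that crashes takes the remaining tests down with it.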
If calling runhaskell takes so much time then perhaps you should eliminate it completely?
If you really need to work with changing Haskell code then you can try the following.
Example module: