Speeding up runhaskell

Posted 2025-01-06 16:58:40

I have a small test framework. It executes a loop which does the following:

1. Generate a small Haskell source file.
2. Execute this with runhaskell. The program generates various disk files.
3. Process the disk files just generated.

This happens a few dozen times. It turns out that runhaskell is taking up the vast majority of the program's execution time.

On one hand, the fact that runhaskell manages to load a file from disk, tokenise it, parse it, do dependency analysis, load 20KB more text from disk, tokenise and parse all of this, perform complete type inference, check types, desugar to Core, link against compiled machine code, and execute the thing in an interpreter, all inside of 2 seconds of wall time, is actually pretty damned impressive when you think about it. On the other hand, I still want to make it go faster. ;-)

Compiling the tester (the program that runs the above loop) produced a tiny performance difference. Compiling the 20KB of library code that the scripts link against produced a rather more noticeable improvement. But it's still taking about 1 second per invocation of runhaskell.

The generated Haskell files are just over 1KB each, but only one part of the file actually changes. Perhaps compiling the file and using GHC's -e switch would be faster?

Alternatively, maybe it's the overhead of repeatedly creating and destroying many OS processes which is slowing this down? Every invocation of runhaskell presumably causes the OS to explore the system search path, locate the necessary binary file, load it into memory (surely this is already in the disk cache?), link it against whatever DLLs, and fire it up. Is there some way I can (easily) keep one instance of GHC running, rather than having to constantly create and destroy the OS process?

Ultimately, I suppose there's always the GHC API. But as I understand it, that's nightmarishly difficult to use, highly undocumented, and prone to radical changes at every minor point release of GHC. The task I'm trying to perform is only very simple, so I don't really want to make things more complex than necessary.

Suggestions?

Update: Switching to GHC -e (i.e., now everything is compiled except the one expression being executed) made no measurable performance difference. It seems pretty clear at this point that it's all OS overhead. I'm wondering if I could maybe create a pipe from the tester to GHCi and thus make use of just one OS process...

花伊自在美 2025-01-13 16:58:40

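OK, I have a solution: I created a single GHCi process and connected its stdin to a pipe, so that I can send it expressions to evaluate interactively.

After several fairly large program refactorings, the entire test suite now takes roughly 8 seconds to execute rather than 48 seconds. That'll do for me! :-D

(For anyone else trying to do this: for God's sake, remember to pass the -v0 switch to GHCi, or you'll be greeted by GHCi's welcome banner! Oddly, if you run GHCi interactively the command prompt still appears even with -v0, but when connected to a pipe the prompt vanishes; I assume this is a helpful design feature rather than a random accident.)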
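The single-GHCi-process approach described above could look roughly like the following. This is only a sketch, not the poster's actual code: it assumes `ghci` is on the search path, that the `process` package is available, and that every expression sent prints exactly one line. The helper names `startGhci` and `evalLine` are made up for illustration.

```haskell
import System.IO
import System.Process

-- Start one long-lived GHCi. -v0 suppresses the welcome banner
-- (and, according to the post, the prompt when stdin is a pipe).
startGhci :: IO (Handle, Handle, ProcessHandle)
startGhci = do
  (Just hin, Just hout, _, ph) <-
    createProcess (proc "ghci" ["-v0"])
      { std_in = CreatePipe, std_out = CreatePipe }
  hSetBuffering hin LineBuffering
  return (hin, hout, ph)

-- Send one expression and read back a single line of output.
-- Assumption: the expression prints exactly one line; an expression
-- that prints nothing would make hGetLine block forever.
evalLine :: Handle -> Handle -> String -> IO String
evalLine hin hout expr = do
  hPutStrLn hin expr
  hFlush hin
  hGetLine hout

main :: IO ()
main = do
  (hin, hout, ph) <- startGhci
  r1 <- evalLine hin hout "1 + 2"
  r2 <- evalLine hin hout "length \"hello\""
  putStrLn r1
  putStrLn r2
  -- Shut GHCi down cleanly and wait for the process to exit.
  hPutStrLn hin ":quit"
  hFlush hin
  _ <- waitForProcess ph
  return ()
```

The process is created once and reused for every expression, so the per-test cost is a line written to a pipe rather than a full process start-up.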

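Of course, half the reason I went down this strange route is that I want to capture stdout and stderr to a file. With runhaskell that's quite easy; just pass the appropriate options when creating the child process. But now all of the test cases are run by a single OS process, so there's no obvious way to redirect stdin and stdout.

The solution I came up with is to direct all test output to a single file and, between tests, have GHCi print a magic string which (I hope!) won't appear in the test output. Then I quit GHCi, read the file, and look for the magic strings so that I can cut the file into suitable chunks.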
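The magic-string idea can be sketched as a small pure function. The marker text and the name `splitOnMarker` are assumptions chosen for illustration; the post does not say which string was used. The tester would send something like `putStrLn "<<<END-OF-TEST>>>"` to GHCi after each test, then split the captured file afterwards.

```haskell
-- Hypothetical marker; any line unlikely to occur in real test output will do.
marker :: String
marker = "<<<END-OF-TEST>>>"

-- Split captured output into per-test chunks. Each marker line closes
-- one chunk; the marker lines themselves are dropped.
splitOnMarker :: String -> [[String]]
splitOnMarker = go . lines
  where
    go [] = []
    go ls = case break (== marker) ls of
      (chunk, [])       -> [chunk]          -- trailing output, no closing marker
      (chunk, _ : rest) -> chunk : go rest

main :: IO ()
main =
  mapM_ print
    (splitOnMarker "a\nb\n<<<END-OF-TEST>>>\nc\n<<<END-OF-TEST>>>\n")
```

Matching on whole lines rather than substrings keeps the split simple, at the cost of requiring each test's output to end with a newline before the marker is printed.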

一曲琵琶半遮面シ 2025-01-13 16:58:40

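You might find some useful code in TBC. It has different goals (in particular, scrapping test boilerplate and testing projects that may not compile completely), but it could be extended with a watch-directory feature. The tests are run in GHCi, but objects successfully built by cabal ("runghc Setup build") are used.

I developed it to test EDSLs that use complicated type hackery, i.e. where the heavy computational lifting is done by other libraries.

I am currently updating it to the latest Haskell Platform and welcome any comments or patches.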

终弃我 2025-01-13 16:58:40

If most of the source files remain unchanged, you can use GHC's -fobject-code flag (possibly in conjunction with -outputdir) to compile some of the library files.

梅窗月明清似水 2025-01-13 16:58:40

If the tests are well isolated from one another, you can put all the test code into a single program and invoke runhaskell once. This may not work if some tests are created based on the results of others, or if some tests call unsafeCrash.

I presume your generated code looks like this

module Main where
boilerplate code
main = do_something_for_test_3

You can put the code of all the tests into one file. Each test code generator is responsible for writing do_something_for_test_N.

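module Main where

import Control.Monad (zipWithM_)
import System.Directory (createDirectory, getCurrentDirectory, setCurrentDirectory)
-- boilerplate code

-- Run each test in its own directory, then return to the starting directory
withTestDir :: FilePath -> IO () -> IO ()
withTestDir d m = do
  cwd <- getCurrentDirectory
  createDirectory d
  setCurrentDirectory d
  m
  setCurrentDirectory cwd

-- ["test1", "test2", ...]
dirNames :: [FilePath]
dirNames = map (("test" ++) . show) [1 :: Int ..]

-- zipWithM_ discards the results, so main has type IO ()
main :: IO ()
main = zipWithM_ withTestDir dirNames tests

-- Put tests here
tests :: [IO ()]
tests =
  [ do_something_for_test_1
  , do_something_for_test_2
  , ...
  ]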

Now you only incur the overhead of a single call to runhaskell.
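Since a crashing test would take the whole combined program down with it, one possible refinement is to catch exceptions per test. The sketch below is an assumption layered on top of the answer, not part of it: `runIsolated` is a made-up name, and it relies on `withCurrentDirectory` from the `directory` package to restore the working directory even when a test throws.

```haskell
import Control.Exception (SomeException, try)
import Control.Monad (zipWithM)
import System.Directory (createDirectoryIfMissing, withCurrentDirectory)

-- Run one test inside its own directory. A failing test is reported
-- and turned into False instead of aborting the whole run.
runIsolated :: FilePath -> IO () -> IO Bool
runIsolated dir test = do
  createDirectoryIfMissing True dir
  r <- try (withCurrentDirectory dir test) :: IO (Either SomeException ())
  case r of
    Right () -> return True
    Left e -> do
      putStrLn (dir ++ " failed: " ++ show e)
      return False

main :: IO ()
main = do
  results <- zipWithM runIsolated ["test1", "test2"]
    [ writeFile "out.txt" "ok"       -- succeeds, writes test1/out.txt
    , ioError (userError "boom")     -- throws, but the run continues
    ]
  print results
```

This only guards against Haskell exceptions; a test that kills the process outright (or loops forever) would still need a separate process.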

∝单色的世界 2025-01-13 16:58:40

If calling runhaskell takes so much time then perhaps you should eliminate it completely?

If you really need to work with changing Haskell code then you can try the following.

Create a set of modules with varying contents as needed.
Each module should export its main function.
An additional wrapper module should execute the right module from the set based on the input arguments. Each time you want to execute a single test, you pass a different argument.
The whole program is compiled statically.

Example module:

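{-# LANGUAGE QuasiQuotes #-}
module Tester where

import Data.String.Interpolation -- package Interpolation

-- Source text of one generated module. In the str quasi-quoter,
-- $name$ splices a String and $:name$ splices `show name`.
submodule nameSuffix var1 var2 = [str|
module Sub$nameSuffix$ where

someFunction x = $:var1$ * x
anotherFunction v | v == $:var2$ = v
                  | otherwise = error ("anotherFunction: argument is not " ++ show ($:var2$))

|]

-- One (suffix, source) pair per combination of the two parameters
modules = [ let suf = show var1 ++ "_" ++ show var2 in (suf, submodule suf var1 var2) | var1 <- [1..10], var2 <- [1..10]]

-- Write each module to Sub<suffix>.hs so the file name matches the module name
writeModules = mapM_ (\ (suf, what) -> writeFile ("Sub" ++ suf ++ ".hs") what) modules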
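The wrapper module mentioned in the list above is not shown in the answer. A minimal sketch of what it might look like follows; the registry entries are hypothetical stand-ins for the main functions that the generated `Sub...` modules would export, and the names `registry` and `selectTest` are invented for illustration.

```haskell
import System.Environment (getArgs)
import System.Exit (exitFailure)

-- Hypothetical stand-ins for the main functions of the generated modules.
test1_1, test1_2 :: IO ()
test1_1 = putStrLn "running Sub1_1"
test1_2 = putStrLn "running Sub1_2"

-- Maps the command-line argument to the test it selects.
registry :: [(String, IO ())]
registry = [("1_1", test1_1), ("1_2", test1_2)]

-- Look a test up by name; Nothing if no such test is registered.
selectTest :: String -> Maybe (IO ())
selectTest name = lookup name registry

main :: IO ()
main = do
  args <- getArgs
  case args of
    [name] | Just act <- selectTest name -> act
    _ -> do
      putStrLn ("usage: tester <" ++ unwords (map fst registry) ++ ">")
      exitFailure
```

Because everything is compiled ahead of time, each test run pays only for starting a small native executable rather than for loading and type-checking source in the interpreter.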
