Persistent code memoization in compilers and run-time environments

Posted 2024-12-06 02:46:48

I believe the concept of a code cache (ccache, for example) should be extended to a more fine-grained memoization of both intermediate code (IC) and target code (TC) in compilers such as GCC or LLVM+Clang.

This could then enable a whole range of optimizations benefiting programmer productivity, compile-time and run-time performance, and run-time memory usage.

More specifically, this repository (or database) should automatically cache the IC and TC of functions. These can then be looked up and reused across different builds (compile once, link many) and across sets of programs and libraries, not just across object boundaries during linking (LTO).

This would especially benefit C++ STL container-algorithm instantiations. For example, how many times have algorithms such as std::sort applied to std::vector<T> been instantiated, optimized, and compiled in different programs using the same type T, typically int, float, or double?

In an implementation, IC modules should be indexed by keys constructed from a hash chain (SHA-1 should suffice) over the compiler configuration and the IC code tree (including the sub-tree code hashes of the functions it calls), and stored in, for example, an std::unordered_map, which provides very cheap lookups. To promote code reuse even further, the IC repository could be put online as a network service.
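
To make the keying scheme concrete, here is a minimal sketch in Python rather than inside a real compiler. Everything in it is hypothetical: the toy IR strings, the FUNCTIONS table, and the names memo_key and compile_cached are invented for illustration, a plain dict stands in for std::unordered_map, and the call graph is assumed to be acyclic (recursive functions would need extra handling).

```python
import hashlib

# Hypothetical toy IR: function name -> (IR text, names of the functions it calls).
FUNCTIONS = {
    "less_int": ("icmp slt i32 %a, %b", []),
    "swap_int": ("load; load; store; store", []),
    "sort_int": ("loop { call less_int; call swap_int }", ["less_int", "swap_int"]),
}


def memo_key(name, functions, compiler_config):
    """Return a SHA-1 hex key chaining the compiler configuration,
    the function's own IR, and the keys of every function it calls.

    The function's name is deliberately left out of the hash, so identical
    code in different programs maps to the same key.
    Assumes the call graph has no cycles.
    """
    ir_text, callees = functions[name]
    h = hashlib.sha1()
    h.update(compiler_config.encode())
    h.update(b"\0")
    h.update(ir_text.encode())
    for callee in callees:
        # Chain in the callee's key, so a change anywhere below this
        # function changes this function's key as well.
        h.update(b"\0")
        h.update(memo_key(callee, functions, compiler_config).encode())
    return h.hexdigest()


def compile_cached(name, functions, compiler_config, compile_fn, cache):
    """Look up target code by key; call compile_fn only on a miss."""
    key = memo_key(name, functions, compiler_config)
    if key not in cache:
        cache[key] = compile_fn(functions[name][0])
    return cache[key]


if __name__ == "__main__":
    cache = {}
    fake_backend = lambda ir: "TC<" + ir + ">"  # stand-in for real code generation
    print(compile_cached("sort_int", FUNCTIONS, "-O2", fake_backend, cache))
    print(len(cache), "entry cached")
```

Because callee keys are folded into the caller's key, editing less_int invalidates the cached sort_int as well, and changing the configuration string (for example from -O2 to -O3) produces a different key for the same IR.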

Of course, memoizations should only be cached when they are needed for good performance, and the caching itself should have very small overhead. Since most hash-key lookups should be misses, the keys should be kept in memory, but not necessarily the code snippets.
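
The split between in-memory keys and out-of-memory code snippets could look roughly like the following sketch. The class name SnippetStore and its one-file-per-key layout are assumptions made for illustration, not part of any existing tool: all keys are held in a Python set, so a miss is answered without touching the disk, and the code bytes are read only on a hit.

```python
import hashlib
import os
import tempfile


class SnippetStore:
    """Hypothetical store: every key is held in memory, code lives in files."""

    def __init__(self, directory):
        self.directory = directory
        # Load the known keys once, so that misses never touch the disk.
        self.keys = set(os.listdir(directory))

    def get(self, key):
        if key not in self.keys:  # expected common case: cheap in-memory miss
            return None
        with open(os.path.join(self.directory, key), "rb") as f:
            return f.read()

    def put(self, key, code):
        with open(os.path.join(self.directory, key), "wb") as f:
            f.write(code)
        self.keys.add(key)


if __name__ == "__main__":
    store = SnippetStore(tempfile.mkdtemp())
    key = hashlib.sha1(b"-O2\0sort<int> IR").hexdigest()
    print(store.get(key))        # miss: answered from the in-memory key set
    store.put(key, b"\x90\x90")  # placeholder bytes standing in for target code
    print(store.get(key))        # hit: bytes read back from disk
```

A production design would more likely use a single indexed file or a database than one file per key; the one-file layout is only the simplest way to show the idea.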

This project has already demonstrated the usefulness of this idea applied to the Python language. I believe Haskell (GHC) may be the ideal language for experimenting with these ideas because of its default function purity and flexible control over function side effects.
