确定哪些变量在什么情况下保持不变

发布于 2024-09-01 13:25:17 字数 1594 浏览 3 评论 0原文

这个想法有点类似于苹果在 OpenGL 堆栈中所做的事情。我想要更笼统一些。

基本上，我想为某些特定情况提供一些代码的专门和优化变体。

换句话说：我已经给出了函数的算法/代码（令 B = {0,1}）

f : B^n -> B^m

现在，我通过函数特殊了一个特定情况（它预定义了 f 的输入的一部分）

preset : {1..n} -> {0,1,unset}

预定义的数量（ε {0..n}）然后由

pn := |preset⁻¹({0,1})|

规范给出，我们现在得到一个专门的函数

f_preset : B^(n-pn) -> B^m

同样规范地，我们得到这个专门函数的代码/算法。当然，f_preset 的代码会比 pn > 的 f 快一些。 0.然后，您还可以进一步优化此代码（现在可能有一些死代码，现在可以解压一些循环，可以预先计算一些计算等）。在某些情况下，它可以有显着的改进。

Apple 对其 OpenGL 堆栈大致是这样做的（根据我读过/知道的）：在为不再改变的变量设置所有内容后，他们尝试在运行时找到一个好的预设，然后制作专用函数的优化版本，并且仅使用该函数代替原来的函数。

最初，我想到了一种方法来优化自己的一些游戏的物理模拟。我有很多粒子对象和一组粒子类型（在编译时未知）。粒子类型是一组属性。一旦加载，粒子类型就固定不变。每个粒子对象都属于其中一种粒子类型。粒子对象的物理模拟是一些非常繁重的代码，具有许多分支，并且在很大程度上取决于粒子类型。我现在的想法是为每种粒子类型提供一个优化的物理模拟函数。

经过一番思考后，我想更进一步：

我想在运行时自动计算一组这样的预设，并为每个预设维护优化的代码。我想在情况发生变化时自动添加或删除预设。

现在有几个问题：

有没有一种简单的方法来计算一个好的预设？我如何知道在给定情况下哪些变量是恒定的？
有没有一种简单的方法来检查预设的好坏？ “良好”是指生成的优化代码的性能。
如何比较两种算法/代码的性能？通过一些启发式？或者通过随机输入进行测试？
一个函数应该有多少个预设（和优化的代码变体）？所有功能都有固定限制？或者每个功能都不同？它甚至可能取决于当前的计算机状态吗？
如何维护不同的优化代码变体？自动选择最佳优化变体的围绕 f 的包装函数似乎不太好，因为每次调用都需要进行可能不那么容易的检查。该问题的解决方案也可能与如何找到良好预设的集合/数量的问题密切相关。（在粒子类型的情况下，优化的代码将附加到粒子类型/与粒子类型一起保存。粒子类型的数量也定义预设的数量。）

对于我最初的情况，这些问题中的大多数都有点过时了，但我现在真的对如何以更通用的方式做到这一点感兴趣。当然，大多数/所有这些问题也是无法计算的，但我想知道在什么程度上你仍然可以获得好的结果。

整个主题对于 JIT 编译器的优化也非常重要。他们已经在进行此类优化了吗？到什么程度呢？

最近是否有好的研究成果可以回答我的一些问题？或者也许还有一些结果表明以如此通用的方式做到这一点太难了？

原文

The idea is somewhat similar to what Apple has done in the OpenGL stack. I want to have that a bit more general.

Basically, I want to have specialised and optimised variants of some code for some specific cases.

In other words: I have given an algorithm/code for a function (let B = {0,1})

f : B^n -> B^m

Now, I special a specific case by a function (which predefines part of the input of f)

preset : {1..n} -> {0,1,unset}

The amount of predefinitions (∈ {0..n}) is then given by

pn := |preset⁻¹({0,1})|

Canonically, we now get a specialised function

f_preset : B^(n-pn) -> B^m

Also canonically, we get the code/algorithm for this specialised function. Naturally, the code for f_preset will be somewhat more fast than f with pn > 0. Then, you also can optimise this code further (there might be some dead code now, some loops can be unpacked now, some calculations can be precalculated, etc). In some cases, it can have noteable improvements.

Apple does roughly this for their OpenGL stack (from what I have read / know): They try to find a good preset at runtime after everything is setup for variables which will not change anymore, then make an optimised version of the specialised function and only use that one instead of the original function.

Initially, I thought about a way to optimise the physics simulation of some own game. There I have a lot of particle objects and a set of particle types (which is unknown at compile time). A particle type is a set of attributes. The particle types are fixed and constant once they are loaded. Each particle object is of one of theye particle types. The physic simulation for a particle object is some very heavy peace of code with many many branches and very heavily depends on the particle type. My idea was now to have an optimised physics simulation function for each particle type.

After thinking a bit about this, I wanted to go a bit further:

I want to automatically calculate a set of such presets at runtime and maintain the optimised code for each. And I want to automatically add or remove presets when the circumstances change.

There are several questions now:

Is there an easy way to calculate a good preset? How do I know what variables are constant for a given situation?
Is there an easy way to check how good a preset is? 'Good' refers to the performance of the resulting optimised code.
How to compare two algorithms/codes for performance? Via some heuristic? Or by testing with random input?
How many presets (and optimised code variants) should there be for a function? A fixed limit for all functions? Or is this different for every function? Is it maybe even depending on the current computer state?
How to maintain the different optimised code variants? A wrapper function around f which chooses automatically the best optimised variant doesn't seem to be very nice as this maybe not so easy check would be needed for every single call. A solution to this problem might also be deeply related to the question about how to find the set/amount of good presets. (In the particle type case, the optimised code would be attached to / saved together with the particle type. The amount of particle types also define the amount of presets.)

For my initial case, most of these questions are kind of obsolete but am really interested now in how to do this in a more general way. Of course, most/all of these questions are also uncalculateable but I wonder to what degree you may still get good results.

This whole topic is also very important for optimisations in JIT compilers. Are they doing these kind of optimisations already? To what degree?

Are there good recent research works which answers some of my questions? Or maybe also some results which say that it is just too hard to do this in such a general way?

分享到QQ

分享到微博