确定哪些变量在什么情况下保持不变
这个想法有点类似于苹果在 OpenGL 堆栈中所做的事情。我想要更笼统一些。
基本上,我想为某些特定情况提供一些代码的专门和优化变体。
换句话说:我已经给出了函数的算法/代码(令 B = {0,1})
f : B^n -> B^m
现在,我通过函数特殊了一个特定情况(它预定义了 f 的输入的一部分)
preset : {1..n} -> {0,1,unset}
预定义的数量(ε {0..n})然后由
pn := |preset⁻¹({0,1})|
规范给出,我们现在得到一个专门的函数
f_preset : B^(n-pn) -> B^m
同样规范地,我们得到这个专门函数的代码/算法。当然,f_preset 的代码会比 pn > 的 f 快一些。 0.然后,您还可以进一步优化此代码(现在可能有一些死代码,现在可以解压一些循环,可以预先计算一些计算等)。在某些情况下,它可以有显着的改进。
Apple 对其 OpenGL 堆栈大致是这样做的(根据我读过/知道的):在为不再改变的变量设置所有内容后,他们尝试在运行时找到一个好的预设,然后制作专用函数的优化版本,并且仅使用该函数代替原来的函数。
最初,我想到了一种方法来优化自己的一些游戏的物理模拟。我有很多粒子对象和一组粒子类型(在编译时未知)。粒子类型是一组属性。一旦加载,粒子类型就固定不变。每个粒子对象都属于其中一种粒子类型。粒子对象的物理模拟是一些非常繁重的代码,具有许多分支,并且在很大程度上取决于粒子类型。我现在的想法是为每种粒子类型提供一个优化的物理模拟函数。
经过一番思考后,我想更进一步:
我想在运行时自动计算一组这样的预设,并为每个预设维护优化的代码。我想在情况发生变化时自动添加或删除预设。
现在有几个问题:
- 有没有一种简单的方法来计算一个好的预设?我如何知道在给定情况下哪些变量是恒定的?
- 有没有一种简单的方法来检查预设的好坏? “良好”是指生成的优化代码的性能。
- 如何比较两种算法/代码的性能?通过一些启发式?或者通过随机输入进行测试?
- 一个函数应该有多少个预设(和优化的代码变体)?所有功能都有固定限制?或者每个功能都不同?它甚至可能取决于当前的计算机状态吗?
- 如何维护不同的优化代码变体?自动选择最佳优化变体的围绕 f 的包装函数似乎不太好,因为每次调用都需要进行可能不那么容易的检查。该问题的解决方案也可能与如何找到良好预设的集合/数量的问题密切相关。 (在粒子类型的情况下,优化的代码将附加到粒子类型/与粒子类型一起保存。粒子类型的数量也定义预设的数量。)
对于我最初的情况,这些问题中的大多数都有点过时了,但我现在真的对如何以更通用的方式做到这一点感兴趣。当然,大多数/所有这些问题也是无法计算的,但我想知道在什么程度上你仍然可以获得好的结果。
整个主题对于 JIT 编译器的优化也非常重要。他们已经在进行此类优化了吗?到什么程度呢?
最近是否有好的研究成果可以回答我的一些问题?或者也许还有一些结果表明以如此通用的方式做到这一点太难了?
The idea is somewhat similar to what Apple has done in the OpenGL stack. I want to have that a bit more general.
Basically, I want to have specialised and optimised variants of some code for some specific cases.
In other words: I have given an algorithm/code for a function (let B = {0,1})
f : B^n -> B^m
Now, I special a specific case by a function (which predefines part of the input of f)
preset : {1..n} -> {0,1,unset}
The amount of predefinitions (∈ {0..n}) is then given by
pn := |preset⁻¹({0,1})|
Canonically, we now get a specialised function
f_preset : B^(n-pn) -> B^m
Also canonically, we get the code/algorithm for this specialised function. Naturally, the code for f_preset will be somewhat more fast than f with pn > 0. Then, you also can optimise this code further (there might be some dead code now, some loops can be unpacked now, some calculations can be precalculated, etc). In some cases, it can have noteable improvements.
Apple does roughly this for their OpenGL stack (from what I have read / know): They try to find a good preset at runtime after everything is setup for variables which will not change anymore, then make an optimised version of the specialised function and only use that one instead of the original function.
Initially, I thought about a way to optimise the physics simulation of some own game. There I have a lot of particle objects and a set of particle types (which is unknown at compile time). A particle type is a set of attributes. The particle types are fixed and constant once they are loaded. Each particle object is of one of theye particle types. The physic simulation for a particle object is some very heavy peace of code with many many branches and very heavily depends on the particle type. My idea was now to have an optimised physics simulation function for each particle type.
After thinking a bit about this, I wanted to go a bit further:
I want to automatically calculate a set of such presets at runtime and maintain the optimised code for each. And I want to automatically add or remove presets when the circumstances change.
There are several questions now:
- Is there an easy way to calculate a good preset? How do I know what variables are constant for a given situation?
- Is there an easy way to check how good a preset is? 'Good' refers to the performance of the resulting optimised code.
- How to compare two algorithms/codes for performance? Via some heuristic? Or by testing with random input?
- How many presets (and optimised code variants) should there be for a function? A fixed limit for all functions? Or is this different for every function? Is it maybe even depending on the current computer state?
- How to maintain the different optimised code variants? A wrapper function around f which chooses automatically the best optimised variant doesn't seem to be very nice as this maybe not so easy check would be needed for every single call. A solution to this problem might also be deeply related to the question about how to find the set/amount of good presets. (In the particle type case, the optimised code would be attached to / saved together with the particle type. The amount of particle types also define the amount of presets.)
For my initial case, most of these questions are kind of obsolete but am really interested now in how to do this in a more general way. Of course, most/all of these questions are also uncalculateable but I wonder to what degree you may still get good results.
This whole topic is also very important for optimisations in JIT compilers. Are they doing these kind of optimisations already? To what degree?
Are there good recent research works which answers some of my questions? Or maybe also some results which say that it is just too hard to do this in such a general way?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
在我看来,您似乎在询问部分评估。
事实上,我对这个概念有一点疑问,因为它通常用过于学术和过于困难的术语来表达。
通常的表达方式是,您有一些通用函数 F(Islow, Ifast),其参数可以在不同时间采用不同的值。
Islow
参数很少更改,而Ifast
参数每次调用时都可能不同。那么问题是编写某种部分求值函数
G(F, Islow) -> F1(Ifast)
接受函数F
和Islow
参数,并生成一个新的(更简单)函数F1
,该函数仅接受Ifast
参数。问题是 1) 有人必须编写通用函数
F
,2) 有人必须编写通用部分求值器G
。对我来说更有意义的是从头开始编写一个函数
H(Islow) -> F1(Ifast)
,即专门为F1
编写一个代码生成器,而不是编写两个函数F
和G
,特别是在G
很难写的地方。H
通常比F
容易写,而G
根本不需要写!结果函数F1
通常比F
更小并且性能更高,因此这是一个双赢的局面。当人们编写代码生成器时,这就是他们正在做的事情,这是一种非常有效的编程技术。
It seems to me you are asking about partial evaluation.
I actually have a bit of a problem with that concept, because it is usually couched in terms that are over-academic and over-difficult.
The way it is usually expressed is that you have some general function
F(Islow, Ifast)
having arguments that can take different values at different times. TheIslow
arguments change seldom, and theIfast
arguments can be different every time it is called.Then the problem is to write some kind of partial-evaluator function
G(F, Islow) -> F1(Ifast)
that takes functionF
and theIslow
arguments, and generates a new (simpler) functionF1
that only takes theIfast
arguments.The problem with this is 1) somebody has to write the general function
F
, and 2) somebody has to write the general partial evaluatorG
.What makes more sense to me is to write from scratch a function
H(Islow) -> F1(Ifast)
, that is, write a code-generator specifically forF1
, rather than writing two functionsF
andG
, especially whereG
is very difficult to write.H
is usually much easier to write thanF
, andG
need not be written at all! The result functionF1
usually is smaller and has much higher performance thanF
, so it's a win-win situation.When people write code generators, that is what they are doing, and it is a very effective programming technique.