Can I always convert mutable-only algorithms into single-assignment ones and still be efficient?
The Context
The context of this question is that I want to play around with Gene Expression Programming (GEP), a form of evolutionary algorithm, using Erlang. GEP makes use of a string-based DSL called 'Karva notation'. Karva notation is easily translated into expression parse trees, but the translation algorithm assumes an implementation with mutable objects: incomplete sub-expressions are created early on in the translation process, and their own sub-expressions are filled in later with values that were not known at the time they were created.
The purpose of Karva notation is that it guarantees syntactically correct expressions are created without any expensive encoding techniques or corrections of genetic code. The problem is that with a single-assignment programming language like Erlang, I have to recreate the expression tree continually as each sub-expression gets filled in. This takes an inexpensive - O(n)? - update operation and converts it into one that would complete in exponential time (unless I'm mistaken). If I can't find an efficient functional algorithm to convert K-expressions into expression trees, then one of the compelling features of GEP is lost.
The Question
I appreciate that the K-expression translation problem is pretty obscure, so what I want is advice on how to convert an inherently non-functional algorithm (one that exploits mutable data structures) into one that does not. How do pure functional programming languages adapt the many algorithms and data structures produced in the early days of computer science that depend on mutability to get the performance characteristics they need?
Carefully designed immutability avoids unnecessary updating
Immutable data structures are only an efficiency problem if they're constantly changing, or you build them up the wrong way. For example, continually appending more to the end of a growing list is quadratic, whereas concatenating a list of lists is linear. If you think carefully, you can usually build up your structure in a sensible way, and lazy evaluation is your friend - hand out a promise to work it out and stop worrying.
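As a small illustration of that difference (hypothetical helpers, not code from this answer):

```haskell
-- Quadratic: each (++) re-walks the whole accumulated prefix.
buildSlow :: [[a]] -> [a]
buildSlow = foldl (++) []

-- Linear: build the pieces independently and concatenate once.
buildFast :: [[a]] -> [a]
buildFast = concat
```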
Blindly trying to replicate an imperative algorithm can be inefficient, but you're mistaken in your assertion that functional programming has to be asymptotically bad here.
Case study: pure functional GEP: Karva notation in linear time
I'll stick with your case study of parsing Karva notation for GEP. (I've played with this solution more fully in this answer.)
Here's a fairly clean pure functional solution to the problem. I'll take the opportunity to name drop some good general recursion schemes along the way.
Code
(Importing Data.Tree supplies data Tree a = Node {rootLabel :: a, subForest :: Forest a}, where type Forest a = [Tree a].)
A hylomorphism is the composition of an anamorphism (build up, unfoldr) and a catamorphism (combine, foldr).
These terms were introduced to the FP community in the seminal paper Functional Programming with Bananas, Lenses and Barbed Wire.
We're going to pull the levels out (ana/unfold) and combine them back together (cata/fold).
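A minimal sketch of such a driver (the exact definition is an assumption, chosen to fit the hylomorphism [] combineLevel pullLevel zero (1,cs) call analysed below):

```haskell
-- Unfold the seed one step at a time (pullout) until stop holds,
-- combining the pieces back together as the recursion returns.
hylomorphism :: b -> (a -> b -> b) -> (c -> (a, c)) -> (c -> Bool) -> c -> b
hylomorphism base combine pullout stop = hylo
  where
    hylo seed
      | stop seed = base
      | otherwise = let (piece, seed') = pullout seed
                    in combine piece (hylo seed')
```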
To pull out a level, we use the total arity from the previous level to find where to split off this new level, and pass on the total arity for this one ready for next time:
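For instance (a sketch; the arity table is an illustrative assumption about the symbol alphabet):

```haskell
-- Arity of each symbol: an illustrative assumption, not fixed by GEP.
arity :: Char -> Int
arity c
  | c `elem` "+-*/" = 2
  | c == 'Q'        = 1   -- e.g. square root
  | otherwise       = 0   -- terminals such as a, b, c, d

-- Split off the next level: its width is the previous level's total
-- arity; pass this level's total arity along for the next step.
pullLevel :: (Int, String) -> (String, (Int, String))
pullLevel (part, cs) = (level, (sum (map arity level), rest))
  where (level, rest) = splitAt part cs
```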
To combine a level (as a String) with the level below (that's already a Forest), we just pull off the number of trees that each character needs.
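A sketch of that step (building on arity above; Node and Forest come from Data.Tree):

```haskell
import Data.Tree (Tree(..), Forest)

-- Hand each character of this level the subtrees it needs from the
-- forest already built for the level below.
combineLevel :: String -> Forest Char -> Forest Char
combineLevel []     _     = []
combineLevel (c:cs) below = Node c kids : combineLevel cs rest
  where (kids, rest) = splitAt (arity c) below
```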
Now we can parse the Karva using a hylomorphism. Note that we seed it with a total arity of 1 from outside the string, since there's only one node at the root level. Correspondingly, we apply head to the result to get this singleton back out after the hylomorphism.
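Putting the pieces together (again a sketch, with zero as the stopping test on the arity count):

```haskell
-- Stop once a level would contain zero nodes.
zero :: (Int, String) -> Bool
zero (part, _) = part == 0

-- Seed with arity 1 (one root node), then recover that single tree.
karvaToTree :: String -> Tree Char
karvaToTree cs = head (hylomorphism [] combineLevel pullLevel zero (1, cs))
```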
Linear Time
There's no exponential blowup, nor repeated O(log(n)) lookups or expensive modifications, so we shouldn't be in too much trouble.
arity is O(1).
splitAt part is O(part).
pullLevel (part,cs) is O(part) for the grab using splitAt to get level, plus O(part) for the map arity level, so O(part) overall.
combineLevel (c:cs) is O(arity c) for the splitAt, and O(sum $ map arity cs) for the recursive call.
hylomorphism [] combineLevel pullLevel zero (1,cs) makes one pullLevel call for each level, so the total pullLevel cost is O(sum parts) = O(n); one combineLevel call for each level, so the total combineLevel cost is O(sum $ map arity levels) = O(n), since the total arity of the entire input is bound by n for valid strings; and O(#levels) calls to zero (which is O(1)), with #levels bound by n, so that's below O(n) too.
Hence karvaToTree is linear in the length of the input.
I think that puts to rest the assertion that you needed to use mutability to get a linear algorithm here.
Demo
Let's have a draw of the results (because Tree is so full of syntax it's hard to read the output!). You have to cabal install pretty-tree to get Data.Tree.Pretty; the drawn output matches what's expected from the tutorial where I found the example.
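A sketch of such a demo (drawVerticalTree comes from the pretty-tree package; the example string is illustrative):

```haskell
import Data.Tree (Tree)
import Data.Tree.Pretty (drawVerticalTree)

-- Render a Tree Char vertically; fmap (:[]) turns each Char label
-- into the String label that drawVerticalTree expects.
see :: Tree Char -> IO ()
see = putStrLn . drawVerticalTree . fmap (: [])

-- e.g.  see (karvaToTree "Q*+-abcd")   -- sqrt ((a + b) * (c - d))
```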
There isn't a single way to do this; it really has to be attempted case by case. I typically try to break the algorithm down into simpler operations using fold and unfold and then optimize from there. The Karva decoding case is a breadth-first tree unfold, as others have noted, so I started with unfoldTreeM_BF. Perhaps there are similar functions in Erlang.
If the decoding operation is unreasonably expensive, you could memoize the decoding and share/reuse subtrees... though it probably wouldn't fit into a generic tree unfolder and you'd need to write a specialized function to do so. If the fitness function is slow enough, it may be fine to use a naive decoder like the one listed below. It will fully rebuild the tree on each invocation.
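A minimal sketch of such a naive breadth-first decoder (an illustration rather than the original listing), using Data.Tree's unfoldTreeM_BF with a State monad over the unread genome; the arity table is an assumption:

```haskell
import Control.Monad.State (State, evalState, gets, put)
import Data.Tree (Tree, unfoldTreeM_BF)

-- Arity of each symbol: an illustrative assumption.
arity :: Char -> Int
arity c
  | c `elem` "+-*/" = 2
  | c == 'Q'        = 1
  | otherwise       = 0

-- Expand one node: its children are the next `arity c` symbols of the
-- genome. unfoldTreeM_BF sequences these actions breadth-first, which
-- is exactly the order in which Karva notation lists the symbols.
step :: Char -> State String (Char, [Char])
step c = do
  (kids, rest) <- gets (splitAt (arity c))
  put rest
  return (c, kids)

decode :: String -> Tree Char
decode (c:cs) = evalState (unfoldTreeM_BF step c) cs
decode []     = error "empty K-expression"
```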
There are a couple of solutions when mutable state is required in functional programming.
Use a different algorithm that solves the same problem. E.g. quicksort is generally regarded as mutable and may therefore be less useful in a functional setting, but mergesort is generally better suited for a functional setting. I can't tell if this option is possible or makes sense in your case.
Even functional programming languages usually provide some way to mutate state. (This blog post seems to show how to do it in Erlang.) For some algorithms and data structures this is indeed the only available option (there's active research on the topic, I think); for example hash tables in functional programming languages are generally implemented with mutable state.
In your case, I'm not so sure immutability really leads to a performance bottleneck. You are right, the (sub)tree will be recreated on update, but the Erlang implementation will probably reuse all the subtrees that haven't changed, leading to O(log n) complexity per update instead of O(1) with mutable state. Also, the nodes of the trees won't be copied but instead the references to the nodes, which should be relatively efficient. You can read about tree updates in a functional setting in e.g. the thesis from Okasaki or in his book "Purely Functional Data Structures" based on the thesis. I'd try implementing the algorithm with an immutable data structure and switch to a mutable one if you have a performance problem.
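As a rough illustration of that sharing (a hypothetical tree type, not code from the question): updating one node only rebuilds the nodes on the path down to it, and every untouched subtree is reused.

```haskell
-- Hypothetical expression-tree type, for illustration only.
data Expr = Node Char [Expr] deriving Show

-- Replace the subtree reached by following a list of child indices.
-- Only the nodes along the path (and their child-list spines) are
-- rebuilt; every sibling subtree is shared with the original tree.
replaceAt :: [Int] -> Expr -> Expr -> Expr
replaceAt []     new _             = new
replaceAt (i:is) new (Node c kids) =
  Node c (before ++ replaceAt is new child : after)
  where (before, child:after) = splitAt i kids
```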
Also see some relevant SO questions here and here.
I think I figured out how to solve your particular problem with the K trees (the general problem is too hard :P). My solution is presented in some horrible sort of hybrid Python-like pseudocode (I am very slow with my FP today), but it doesn't change a node after you create one (the trick is building the tree bottom-up).
First, we need to find which nodes belong to which level:
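A hedged Haskell sketch of that step (the arity table is an illustrative assumption about the symbol alphabet):

```haskell
-- Arity of each symbol: an illustrative assumption.
arity :: Char -> Int
arity c
  | c `elem` "+-*/" = 2
  | otherwise       = 0

-- Slice the genome into levels: each level is as wide as the total
-- arity of the level above it; the root level has width 1.
levels :: String -> [String]
levels = go 1
  where
    go 0 _  = []      -- no more nodes needed; ignore any unused tail
    go _ [] = []
    go n cs = let (level, rest) = splitAt n cs
              in level : go (sum (map arity level)) rest
```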
So in the +/*abcd example, this should give you [+, /*, abcd].
Now you can convert this into a tree bottom-up, as in the sketch below. I am pretty sure we can convert this into single-assignment Erlang/Haskell very easily now.
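A hedged Haskell sketch of that bottom-up construction (building on levels and arity above; Node and Forest come from Data.Tree):

```haskell
import Data.Tree (Tree(..), Forest)

-- Give each symbol of a level the subtrees it needs from the forest
-- already built for the level below it.
combine :: String -> Forest Char -> Forest Char
combine []     _     = []
combine (c:cs) below = Node c kids : combine cs rest
  where (kids, rest) = splitAt (arity c) below

-- Fold the levels from the deepest one up towards the root; no node
-- is ever modified after it has been created.
karvaToTree :: String -> Tree Char
karvaToTree = head . foldr combine [] . levels
```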