Problems with a lazy convolution fn in Clojure
I am writing some signal processing software, and I am starting off by writing out a discrete convolution function.
This works fine for lists of up to ten thousand or so values, but as they get larger (say, 100k), I begin to get StackOverflow errors, of course.
Unfortunately, I am having a lot of trouble converting the imperative convolution algorithm I have to a recursive & lazy version that is actually fast enough to use (having at least a modicum of elegance would be nice as well).
I am also not 100% sure I have this function completely right yet, so please let me know if I'm missing something or doing something wrong. I think it's correct.
(defn convolve
  "
  Convolves xs with is.
  This is a discrete convolution.
  'xs :: list of numbers
  'is :: list of numbers
  "
  [xs is]
  (loop [xs xs finalacc () acc ()]
    (if (empty? xs)
      (concat finalacc acc)
      (recur (rest xs)
             (if (empty? acc)
               ()
               (concat finalacc [(first acc)]))
             (if (empty? acc)
               (map #(* (first xs) %) is)
               (vec-add
                 (map #(* (first xs) %) is)
                 (rest acc)))))))
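As a quick sanity check (this relies on the vec-add defined further down, under EDIT 3), the result on a short input does match the convolution worked out by hand:

;; by hand: [1 2 3] convolved with [0 1 0.5] is [0 1 2.5 4 1.5]
(convolve [1 2 3] [0 1 0.5])
;; => (0 1 2.5 4 1.5)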
I'd be much obliged for any sort of help: I'm still getting my bearings in Clojure, and making this elegant and lazy and/or recursive would be wonderful.
I'm a little surprised at how difficult it is to express in a Lisp an algorithm that is quite easy to express in an imperative language. But perhaps I'm doing it wrong!
EDIT:
Just to show how easy it is to express in an imperative language, and to give people the algorithm that works nicely and is easy to read, here is the Python version. Aside from being shorter, more concise and far easier to reason about, it executes orders of magnitude faster than the Clojure code: even my imperative Clojure code using Java arrays.
from itertools import repeat

def convolve(ns, ms):
    y = [i for i in repeat(0, len(ns)+len(ms)-1)]
    for n in range(len(ns)):
        for m in range(len(ms)):
            y[n+m] = y[n+m] + ns[n]*ms[m]
    return y
Here, on the other hand, is the imperative Clojure code. It also drops the last, non-fully-immersed values from the convolution. So aside from being slow and ugly, it's not 100% functional either.
(defn imp-convolve-1
  [xs is]
  (let [ys (into-array Double (repeat (dec (+ (count xs) (count is))) 0.0))
        xs (vec xs)
        is (vec is)]
    (map #(first %)
         (for [i (range (count xs))]
           (for [j (range (count is))]
             (aset ys (+ i j)
                   (+ (* (nth xs i) (nth is j))
                      (nth ys (+ i j)))))))))
This is so disheartening. Please, someone show me I've just missed something obvious.
EDIT 3:
Here's another version I thought up yesterday, showing how I'd like to be able to express it (though other solutions are quite elegant; I'm just putting another one out there!)
(defn convolve-2
  [xs is]
  (reduce #(vec-add %1 (pad-l %2 (inc (count %1))))
          (for [x xs]
            (for [i is]
              (* x i)))))
It uses this utility function, vec-add:
(defn vec-add
  ([xs] (vec-add xs []))
  ([xs ys]
   (let [lxs (count xs)
         lys (count ys)
         xs (pad-r xs lys)
         ys (pad-r ys lxs)]
     (vec (map #(+ %1 %2) xs ys))))
  ([xs ys & more]
   (vec (reduce vec-add (vec-add xs ys) more))))
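Both convolve-2 and vec-add also lean on pad-l and pad-r, which aren't shown above. They are just zero-padding helpers; plausible definitions (my reconstruction, so take the exact shape as an assumption) would be:

;; Assumed helpers: pad a sequence with zeros on the left/right up to
;; length n (a no-op when the sequence is already at least that long).
(defn pad-l [xs n]
  (concat (repeat (- n (count xs)) 0) xs))

(defn pad-r [xs n]
  (concat xs (repeat (- n (count xs)) 0)))

With these definitions, (convolve-2 [1 2 3] [0 1 0.5]) evaluates to [0 1 2.5 4 1.5], the same result as the hand-worked example above.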
Riffing on j-g-faustus's version, this is how I'd do it in the Clojure equiv branch. Works for me: ~400ms for 1,000,000 points, ~25ms for 100,000, on an i7 MacBook Pro.
The likely cause of the stack overflow errors is that the lazy thunks are getting too deep (concat and map are lazy). Try wrapping those calls in doall to force evaluation of their return values.
As for a more functional solution, try something like this:
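For instance, something in this shape (a sketch of the idea only; the name convolve-r and the details are mine): each output element is an independent reduce over the overlapping products, and the outer map keeps the output sequence lazy.

(defn convolve-r
  "Sketch: y[k] is a sum (reduce +) over the overlap of xs and is at offset k."
  [xs is]
  (let [xv (vec xs)
        iv (vec is)
        n  (count xv)
        m  (count iv)]
    (map (fn [k]
           (reduce + (for [j (range (max 0 (- k (dec m)))  ; valid j's for this k
                                    (min n (inc k)))]
                       (* (xv j) (iv (- k j))))))
         (range (dec (+ n m))))))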
You can use reduce to perform summation easily, and since map produces a lazy sequence, this function is also lazy.
Can't help with a high-performance functional version, but you can get a 100-fold speedup for the imperative version by foregoing laziness and adding type hints:
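Roughly, an imp-convolve-2 in that spirit (this body is my sketch of the arrays-plus-type-hints approach, not necessarily the literal code being timed) looks like:

(defn imp-convolve-2
  "Sketch: primitive double arrays with type hints, no laziness."
  [xs is]
  (let [^doubles xa (double-array xs)
        ^doubles ia (double-array is)
        ^doubles ys (double-array (dec (+ (alength xa) (alength ia))))]
    (dotimes [i (alength xa)]
      (dotimes [j (alength ia)]
        (aset ys (+ i j)
              (+ (aget ys (+ i j))
                 (* (aget xa i) (aget ia j))))))
    (vec ys)))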
With xs size 100k and is size 2, your imp-convolve-1 takes ~6,000ms on my machine when wrapped in a doall, while this one takes ~35ms.
Update
Here is a lazy functional version:
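Again only as a sketch (the name convolve-lazy and the exact shape are my illustration, not the version actually timed below): reverse the kernel once, zero-pad the input, and map a dot product over a sliding window, which keeps the whole result lazy.

(defn convolve-lazy
  "Sketch: lazy convolution as a dot product over a sliding window."
  [xs is]
  (let [pad (repeat (dec (count is)) 0)
        ris (reverse is)]                          ; reversed kernel
    (map (fn [window] (reduce + (map * ris window)))
         (partition (count is) 1 (concat pad xs pad)))))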
On sizes 100k and 2, it clocks in at ~600ms (varying 450-750ms) vs ~6,000ms for imp-convolve-1 and ~35ms for imp-convolve-2.
So it's functional, lazy and has tolerable performance. Still, it's twice as much code as the imperative version and took me 1-2 additional hours to find, so I'm not sure that I see the point.
I'm all for pure functions when they make the code shorter or simpler, or have some other benefit over an imperative version. When they don't, I have no objection to switching to imperative mode.
Which is one of the reasons I think Clojure is great, since you can use either approach as you see fit.
Update 2:
I'll amend my "what's the point of doing this functionally" by saying that I like this functional implementation (the second one, further down the page) by David Cabana.
It's brief, readable and times to ~140ms with the same input sizes as above (100,000 and 2), making it by far the best-performing functional implementation of those I tried.
Considering that it is functional (but not lazy), uses no type hints and works for all numeric types (not just doubles), that's quite impressive.
It is possible to express a lazy, functional solution in concise terms. Alas, the performance for > 2k is impractical. I'm interested to see if there are ways to speed it up without sacrificing readability.
Edit:
After reading drcabana's informative post on the topic (http://erl.nfshost.com/2010/07/17/discrete-convolution-of-finite-vectors/), I've updated my code to accept different-sized vectors. His implementation performs better:
for xs size 3, is size 1000000, ~2s vs ~3s
Edit 2:
Taking drcabana's ideas of simply reversing xs and padding is, I arrived at:
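In sketch form (my rendering of that idea; the name convolve-3 is a placeholder, not the exact code): reverse xs once, left-pad is, and walk the successive tails of the padded sequence until they run out.

(defn convolve-3
  "Sketch: reversed xs slid along a left-padded is; take-while ends the walk."
  [xs is]
  (let [rxs    (reverse xs)
        padded (concat (repeat (dec (count xs)) 0) is)]
    (->> (iterate rest padded)
         (take-while seq)                ; stop once we have slid past the data
         (map (fn [tail] (reduce + (map * rxs tail)))))))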
This is probably as concise as it's going to get, but it is still slower overall, likely due to take-while. Kudos to the blog author for a well considered approach. The only advantage here is that the above is truly lazy in that if I ask (nth res 10000), it will only need the first 10k calculations to arrive at a result.
Not really an answer to any of the many questions you asked, but I have several comments on the ones you didn't ask.
You probably shouldn't use nth on vectors. Yes, it's O(1), but because nth works on other sequences in O(n), it (a) doesn't make it clear that you expect the input to be a vector, and (b) means if you make a mistake, your code will mysteriously get really slow instead of failing immediately.
for and map are both lazy, and aset is side-effects-only. This combination is a recipe for disaster: for side-effecting for-like behavior, use doseq.
for and doseq allow multiple bindings, so you don't need to pile up loads of them like you (apparently) do in Python. Something shaped like the sketch below will do what you want.
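For instance (my illustration of that doseq shape; the array details are assumed rather than quoted):

(defn imp-convolve-doseq
  "Sketch: the same imperative update as imp-convolve-1, written with doseq."
  [xs is]
  (let [xv (vec xs)
        iv (vec is)
        ^doubles ys (double-array (dec (+ (count xv) (count iv))))]
    (doseq [i (range (count xv))
            j (range (count iv))]
      (aset ys (+ i j)
            (+ (aget ys (+ i j))
               (* (double (xv i)) (double (iv j))))))
    (vec ys)))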
#(first %) is more concisely written as first; likewise #(+ %1 %2) is +.
Calling vec on a bunch of intermediate results that don't need to be vectors will slow you down. Specifically, in vec-add it's sufficient to only call vec when you make the final return value: in (vec (reduce foo bar)) there's no reason for foo to turn its intermediate results into vectors if it never uses them for random access.