How to improve the performance of functions that operate on two arrays in Clojure
I have a small set of functions. Two of them perform a mathematical overlay operation in different ways (the operation is defined at http://docs.gimp.org/en/gimp-concepts-layer-modes.html, a little way down the page; search for "overlay" to find the math). Gimp does this operation very quickly, in under a second, but I can't seem to optimize my code to get anywhere near that.
(My application is a GUI application to help me see and compare various overlay combinations of a large number of files. The Gimp layer interface actually makes it rather difficult to just pick two images to overlay, then pick a different two, etc.)
Here is the code:
    (set! *warn-on-reflection* true)

    (defn to-8-bit [v]
      (short (* (/ v 65536) 256)))

    (defn overlay-sample [base-p over-p]
      (to-8-bit
        (* (/ base-p 65536)
           (+ base-p
              (* (/ (* 2 over-p) 65536)
                 (- 65536 base-p))))))

    (defn overlay-map [^shorts base ^shorts over]
      (let [ovl (time (doall (map overlay-sample ^shorts base ^shorts over)))]
        (time (into-array Short/TYPE ovl))))

    (defn overlay-array [base over]
      (let [ovl (time (amap base
                            i
                            r
                            (int (overlay-sample (aget r i)
                                                 (aget over i)))))]
        ovl))
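Written out as a formula, the per-sample computation in overlay-sample and to-8-bit looks like this. The symbols are introduced here for illustration only: b stands for base-p, o for over-p, and the floor assumes non-negative samples (the short cast truncates toward zero).

```latex
\[
E_{16} = \frac{b}{65536}\left(b + \frac{2o}{65536}\,(65536 - b)\right),
\qquad
E_{8} = \left\lfloor \frac{E_{16}}{65536}\cdot 256 \right\rfloor
      = \left\lfloor \frac{E_{16}}{256} \right\rfloor
\]
```

For example, b = o = 32768 gives E_16 = (1/2)(32768 + 1 * 32768) = 32768, and E_8 = 32768 / 256 = 128.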
overlay-map and overlay-array do the same operation in different ways. I've written other versions of this operation, too. However, overlay-map is, by far, the fastest I have.
In both functions, base and over are 16-bit integer arrays. Each holds 1,276,800 samples (an 800 x 532 image with 3 samples per pixel). The end result should be a single array of the same size, with the samples scaled down to 8 bits.
My results from the (time) operation are pretty consistent. overlay-map runs the actual mathematical operation in about 16 or 17 seconds, then spends another 5 seconds copying the resulting sequence back into an integer array.
overlay-array takes about 111 seconds.
I've done a lot of reading about using arrays, type hints, etc., but my Java-array-only operation is amazingly slow! amap, aget, etc. were all supposed to be fast, but I have read their source and nothing there looks like a speed optimization, and my results are consistent. I've even tried other computers and seen roughly the same difference.
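For comparison, here is a minimal sketch of what a fully type-hinted, primitive-only variant could look like. The names overlay-sample-long and overlay-shorts are hypothetical and not part of the original script, and no timings were taken for it, so treat it as an illustration of the technique rather than a measured fix. It folds the whole formula into one integer division, because / on integers in Clojure produces Ratio values; the sketch assumes non-negative samples.

```clojure
(set! *warn-on-reflection* true)

;; Hypothetical integer-only variant of overlay-sample + to-8-bit.
;; The three divisions (65536, 65536, then the 8-bit scale of 256) are
;; folded into a single truncating division by 65536 * 65536 * 256.
(defn overlay-sample-long
  ^long [^long base-p ^long over-p]
  (quot (* base-p
           (+ (* base-p 65536)
              (* 2 over-p (- 65536 base-p))))
        1099511627776)) ; 65536 * 65536 * 256 = 2^40

;; Hypothetical array version: both inputs are hinted as short[] so that
;; alength, aget and aset compile to direct array access, not reflection.
(defn overlay-shorts
  ^shorts [^shorts base ^shorts over]
  (let [n           (alength base)
        ^shorts out (short-array n)]
    (dotimes [i n]
      (aset out i
            (short (overlay-sample-long (long (aget base i))
                                        (long (aget over i))))))
    out))

;; Small usage example with made-up sample values.
(def result
  (overlay-shorts (short-array [0 16384])
                  (short-array [0 16384])))

(vec result) ; => [0 40]
```

With *warn-on-reflection* enabled, any reflection warnings printed while compiling a version like this point at call sites that still lack type information.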
Now, 16-17 seconds is actually rather painful for a data set this size, but I've been caching the results so that I can easily switch back and forth. The same operation would take an atrociously long time if I increased the dataset to anything like a full-size image (4770x3177). And there are other operations I want to do, too.
So, any suggestions on how to speed this up? What am I missing here?
UPDATE: I just made the entire project pertaining to this code public, so you can see the current version of the entire script I am using for speed tests at https://bitbucket.org/savannidgerinel/hdr-darkroom/src/62a42fcf6a4b/scripts/speed_test.clj . Feel free to download it and try it on your own gear, but obviously change the image file paths before running it.
Comments (1)
Since your functions are purely mathematical, you might want to check out memoize.
memoize caches each argument list as a key, with the corresponding return value as its value. When a value has already been computed for a given set of arguments, the cached value is returned instead of the function being executed again.
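A minimal sketch of how that could look with the functions from the question. call-count and overlay-sample-memo are hypothetical names added here only to make the caching visible; to-8-bit and overlay-sample follow the question's code.

```clojure
;; Hypothetical counter, used only to show how often the real body runs.
(def call-count (atom 0))

(defn to-8-bit [v]
  (short (* (/ v 65536) 256)))

(defn overlay-sample [base-p over-p]
  (swap! call-count inc) ; instrumentation for this sketch only
  (to-8-bit
    (* (/ base-p 65536)
       (+ base-p
          (* (/ (* 2 over-p) 65536)
             (- 65536 base-p))))))

;; clojure.core/memoize wraps a function with a cache keyed on its arguments.
(def overlay-sample-memo (memoize overlay-sample))

(overlay-sample-memo 32768 32768) ; computed: the body runs
(overlay-sample-memo 32768 32768) ; same arguments: served from the cache
(overlay-sample-memo 16384 16384) ; new arguments: computed again

@call-count ; => 2
```

Whether this helps depends on how often the same (base, over) pair repeats in the images. With two 16-bit inputs there are up to 65536 x 65536 distinct pairs, so the cache can grow very large; that trade-off would need to be measured on real data.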