Fast vector math in Clojure / Incanter

Posted on 2024-09-24 22:38:41

I'm currently looking into Clojure and Incanter as an alternative to R. (Not that I dislike R, but it's just interesting to try out new languages.) I like Incanter and find the syntax appealing, but vectorized operations are quite slow compared to, e.g., R or Python.

As an example, I wanted to get the first-order difference of a vector using Incanter vector operations, Clojure map, and R. Below are the code and timings for all versions. As you can see, R is clearly faster.

Incanter and Clojure:

(use '(incanter core stats)) 
(def x (doall (sample-normal 1e7))) 
(time (def y (doall (minus (rest x) (butlast x))))) 
"Elapsed time: 16481.337 msecs" 
(time (def y (doall (map - (rest x) (butlast x))))) 
"Elapsed time: 16457.850 msecs"

R:

rdiff <- function(x){ 
   n = length(x) 
   x[2:n] - x[1:(n-1)]} 
x = rnorm(1e7) 
system.time(rdiff(x)) 
   user  system elapsed 
  1.504   0.900   2.561

So I was wondering whether there is a way to speed up vector operations in Incanter/Clojure. Solutions involving loops, Java arrays, and/or Clojure libraries are also welcome.

I have also posted this question to the Incanter Google group, with no responses so far.

UPDATE: I have marked Jouni's answer as accepted. See below for my own answer, where I have cleaned up his code a bit and added some benchmarks.

那伤。 2024-10-01 22:38:41

My final solutions

After all the testing I found two slightly different ways to do the calculation with sufficient speed.

First, I used the function diff with different types of return values. The code below returns a vector, but I have also timed versions returning a double array (replace (vec y) with y) and an Incanter matrix (replace (vec y) with (matrix y)). This function relies only on Java arrays. It is based on Jouni's code with some extra type hints removed.

Another approach is to do the calculations with Java arrays and store the values in a transient vector. As the timings show, this is slightly faster than approach 1 if you don't want the function to return an array. This is implemented in the function difft.

So the choice really depends on what you want to do with the data. I guess a good option would be to overload the function so that it returns the same type that was used in the call. In fact, passing a Java array to diff instead of a vector makes it about 1 s faster.
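
The overloading idea can be sketched as one function that dispatches on the input type. This is a minimal sketch, not the code that was benchmarked: diff-array and diff-same-type are hypothetical names, and only double arrays and seqable collections of numbers are handled.

```clojure
;; Sketch only: diff-array and diff-same-type are hypothetical names.
(defn diff-array
  "First-order differences of a double array, returned as a new double array."
  ^doubles [^doubles xs]
  (let [n  (max 0 (dec (alength xs)))   ; empty input yields an empty result
        ys (double-array n)]
    (dotimes [i n]
      (aset ys i (- (aget xs (inc i)) (aget xs i))))
    ys))

(defn diff-same-type
  "Returns a double array for double-array input and a vector otherwise."
  [x]
  (if (instance? (Class/forName "[D") x)
    (diff-array x)
    (vec (diff-array (double-array x)))))
```

Calling (diff-same-type [1.0 4.0 9.0]) returns the vector [3.0 5.0], while passing (double-array [1.0 4.0 9.0]) returns a double array and skips both conversions.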

Timings for the different functions:

diff returning vector:

(time (def y (diff x)))
"Elapsed time: 4733.259 msecs"

diff returning Incanter.matrix:

(time (def y (diff x)))
"Elapsed time: 2599.728 msecs"

diff returning double-array:

(time (def y (diff x)))
"Elapsed time: 1638.548 msecs"

difft:

(time (def y (difft x)))
"Elapsed time: 3683.237 msecs"

The functions

(use 'incanter.stats)
(def x (vec (sample-normal 1e7)))

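(defn diff [x]
  ;; Copy the input into a primitive double array so aget/aset avoid boxing.
  (let [x (double-array x)
        n (dec (alength x))          ; number of differences
        y (double-array n)]
    (dotimes [i n]
      (aset y i (- (aget x (inc i)) (aget x i))))
    ;; Return a vector; return y itself for a double array, or (matrix y)
    ;; for an Incanter matrix.
    (vec y)))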


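(defn difft [x]
  ;; Read from a primitive double array, accumulate into a transient vector.
  (let [x (double-array x)
        n (dec (alength x))]         ; number of differences
    (loop [i 0
           y (transient [])]
      (if (< i n)
        ;; Thread the value returned by conj! through the loop.
        (recur (inc i) (conj! y (- (aget x (inc i)) (aget x i))))
        (persistent! y)))))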
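
A single (time ...) call, as used for the timings above, may include JIT warm-up, so repeated runs can give a steadier number. The following is a minimal sketch under that assumption: bench-ms is a hypothetical helper, not part of Clojure or Incanter, and the array size and run counts are arbitrary.

```clojure
;; Sketch only: bench-ms is a hypothetical helper.
(defn bench-ms
  "Calls (f) `warmup` times untimed, then returns the mean elapsed
  milliseconds over `runs` timed calls."
  [f warmup runs]
  (dotimes [_ warmup] (f))
  (let [start (System/nanoTime)]
    (dotimes [_ runs] (f))
    (/ (- (System/nanoTime) start) 1e6 runs)))

;; Example: time a first-order difference over a small random double array.
(let [xs       (double-array (repeatedly 100000 rand))
      diff-arr (fn [^doubles a]
                 (let [n   (dec (alength a))
                       out (double-array n)]
                   (dotimes [i n]
                     (aset out i (- (aget a (inc i)) (aget a i))))
                   out))]
  (println "mean ms per call:" (bench-ms #(diff-arr xs) 5 20)))
```

The helper only averages wall-clock time; it does not control for garbage collection, so the numbers are indicative rather than exact.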

故人如初 2024-10-01 22:38:41

Here's a Java array implementation that, on my system, is faster than your R code (YMMV). Note three things: reflection warnings are enabled, which is essential when optimizing for performance; the type hint on y is repeated (the one on the def didn't seem to help for the aset); and everything is cast to primitive double values (dotimes makes sure that i is a primitive int, or a primitive long from Clojure 1.3 on).

(set! *warn-on-reflection* true)
(use 'incanter.stats)
(def ^"[D" x (double-array (sample-normal 1e7)))

(time
 (do
   (def ^"[D" y (double-array (dec (count x))))
   (dotimes [i (dec (count x))]
     (aset ^"[D" y
       i
       (double (- (double (aget x (inc i)))
                  (double (aget x i))))))))

×眷恋的温暖 2024-10-01 22:38:41

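Bradford Cross's blog has a bunch of posts on this (he uses this stuff at the startup he works for). In general, paying attention to inner loops, type hinting (via *warn-on-reflection*), and so on all help with speed. The Joy of Clojure has a great section on performance tuning, which you should read.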
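
As a small illustration of type hinting guided by *warn-on-reflection*, the sketch below compares an unhinted and a hinted Java method call. The function names len-reflective and len-hinted are hypothetical, and the example assumes it is run from a REPL or a loaded file, where *warn-on-reflection* can be set.

```clojure
(set! *warn-on-reflection* true)

;; Unhinted: the compiler cannot resolve .length at compile time, so it
;; prints a reflection warning and looks the method up at run time.
(defn len-reflective [s]
  (.length s))

;; Hinted: ^String lets the compiler emit a direct method call.
(defn len-hinted [^String s]
  (.length s))

(println (len-reflective "abc") (len-hinted "abc"))
```

Both functions return the same value; only the first one triggers a warning when it is compiled, which is the signal to add a hint in performance-sensitive code.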

红尘作伴 2024-10-01 22:38:41

Here's a solution with transients - appealing but slow.

(use 'incanter.stats)
(set! *warn-on-reflection* true)
(def x (doall (sample-normal 1e7)))

(time
 (def y
      (loop [xs x
             xs+ (rest x)
             result (transient [])]
        (if (empty? xs+)
          (persistent! result)
          (recur (rest xs) (rest xs+)
                 (conj! result (- (double (first xs+))
                                  (double (first xs)))))))))

墟烟 2024-10-01 22:38:41

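All of the comments so far are from people who don't seem to have much experience speeding up Clojure code. If you want Clojure code to perform the same as Java, the facilities to do so are available. For vector math, however, it may make more sense to defer to mature Java libraries such as Colt or Parallel Colt. It may make sense to use Java arrays for the absolute highest-performance iteration.

@Shane's link is so full of outdated information that it is hardly worth looking at. Also, @Shane's comment that the code is slower by a factor of 10 is simply inaccurate (and unsupported: http://shootout.alioth.debian.org/u32q/compare.php?lang=clojure, and these benchmarks don't account for the kinds of optimization possible in 1.2.0 or 1.3.0-alpha1). With a little work, it is usually easy to get Clojure code within 4X-5X. Going beyond that usually requires deeper knowledge of Clojure's fast paths, something that isn't widely disseminated because Clojure is a fairly young language.

Clojure is plenty fast. But learning how to make it fast takes some work and research, because Clojure discourages mutable operations and mutable data structures.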
