Why is Clojure 10 times slower than Python for the equivalent solution of Euler 50?

Posted on 2024-12-14 08:40:57

I recently started to learn Clojure and decided to practice on Euler problems to get the hang of the available data structures and to practice recursion and looping.

I tried various approaches to Problem 50, but no matter what I did, finding the solution for 1000000 never finished. After I looked up what others did, I guessed that what I was doing should not take forever either, so I typed in the equivalent algorithm in Python to see whether the problem lay in my lack of understanding of some Clojure feature or of some Java setting. Python finished in 10 seconds. For primes under 100000, the Python version finished in 0.5 seconds and the Clojure version in 5.

I'm posting the Clojure version which was created specifically to match the Python code. Can you help me understand why there is such a difference in performance? Should I use unchecked-add, type hints, primitives (but where?) or what?

So here's Clojure:

(defn prime? [n]
  (let [r (int (Math/sqrt n))]
    (loop [d 2]
      (cond
        (= n 1) false
        (> d r) true
        (zero? (rem n d)) false
        :other (recur (inc d))))))

(defn primes []
  (filter prime? (iterate inc 2)))


(defn cumulative-sum [s]
  (reduce 
    (fn [v, x] (conj v (+ (last v) x))) 
    [(first s)] 
    (rest s)))


(defn longest-seq-under [n]
  "Longest prime seq with sum under n"
  (let [ps (vec (take-while #(< % n) (primes))) ; prime numbers up to n
        prime-set (set ps)  ; set for testing of inclusion
        cs (cumulative-sum ps)
        cnt (count ps)
        max-len (count (take-while #(< % n) cs)) ; cannot have longer sequences
        sub-sum (fn [i j] ; sum of primes between the i-th and j-th      
                  (- (cs j) (get cs (dec i) 0)))
        seq-with-len (fn [m] ; try m length prime sequences and return the first where the sum is prime
                       (loop [i 0] ; try with the lowest sum
                         (if (> i (- cnt m)) ; there are no more elements for an m-length sequence
                           nil ; could not find any
                           (let [j (+ i (dec m)) ; fix length
                                 s (sub-sum i j)]
                             (if (>= s n) ; overshoot
                               nil
                               (if (prime-set s) ; sum is prime
                                 [i (inc j)] ; we just looked for the first
                                 (recur (inc i))))))))] ; shift window
        (loop [m max-len] ; try with the longest sequence
          (if (not (zero? m))
            (let [[i j] (seq-with-len m) ]
              (if j 
                (subvec ps i j)
                (recur (dec m))))))))                    



(assert (= [2 3 5 7 11 13] (longest-seq-under 100)))

(let [s1000  (longest-seq-under 1000)]
  (assert (= 21 (count s1000)))
  (assert (= 953 (reduce + s1000))))

; (time (reduce + (longest-seq-under 100000))) ; "Elapsed time: 5707.784369 msecs"

And here's the same in Python:

from math import sqrt
from itertools import takewhile

def is_prime(n) :
    for i in xrange(2, int(sqrt(n))+1) :
        if n % i == 0 :
            return False
    return True

def next_prime(n):
    while not is_prime(n) :
        n += 1
    return n

def primes() :
    i = 1
    while True :
        i = next_prime(i+1)
        yield i

def cumulative_sum(s):
    cs = []
    css = 0
    for si in s :
        css += si
        cs.append( css )
    return cs


def longest_seq_under(n) :
    ps = list(takewhile( lambda p : p < n, primes()))
    pss = set(ps)
    cs = cumulative_sum(ps)
    cnt = len(ps)
    max_len = len(list(takewhile(lambda s : s < n, cs)))

    def subsum(i, j):
        return cs[j] - (cs[i-1] if i > 0 else 0)

    def interval_with_length(m) :
        for i in xrange(0, cnt-m+1) :
            j = i + m - 1            
            sij = subsum(i,j)
            if sij >= n :
                return None, None
            if sij in pss : # prime
                return i, j+1
        return None, None

    for m in xrange(max_len, 0, -1) :
        f, t = interval_with_length(m)
        if t :
            return ps[f:t]


assert longest_seq_under(100) == [2, 3, 5, 7, 11, 13]
assert sum(longest_seq_under(1000)) == 953

# import timeit
# timeit.Timer("sum(longest_seq_under(100000))", "from __main__ import longest_seq_under").timeit(1) # 0.51235757617223499

Thanks

失与倦＂ 2024-12-21 08:40:57

I think the slowdown comes from the number of times you iterate through the sequences in longest-seq-under; each of those iterations takes its toll. Here's a smoking fast version, based on a combination of your code and the answer posted here. Note that primes is lazy, so we can bind it with def instead of defn:

(defn prime? [n]
  (let [r (int (Math/sqrt n))]
    (loop [d 2]
      (cond (= n 1) false
            (> d r) true
            (zero? (rem n d)) false
            :else (recur (inc d))))))

(def primes (filter prime? (iterate inc 2)))

(defn make-seq-accumulator
  [[x & xs]]
  (map first (iterate
              (fn [[sum [s & more]]]
                [(+ sum s) more])
              [x xs])))

(def prime-sums
  (conj (make-seq-accumulator primes) 0))

(defn euler-50 [goal]
  (loop [c 1]
    (let [bots (reverse (take c prime-sums))
          tops (take c (reverse (take-while #(> goal (- % (last bots)))
                                            (rest prime-sums))))]
      (or (some #(when (prime? %) %)
                (map - tops bots))
          (recur (inc c))))))

This finished in about 6 ms on my machine:

user> (time (euler-50 1000000))
"Elapsed time: 6.29 msecs"
997651
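
For readers following the Python side of the thread, the lazy running total built by make-seq-accumulator and prime-sums above has a rough analogue in itertools.accumulate. The sketch below is only an illustration in Python 3 (the question's own code is Python 2); the names is_prime, primes, and prime_sums are chosen here to mirror the Clojure ones and are not part of either poster's code.

```python
from itertools import accumulate, chain, count, islice


def is_prime(n):
    """Trial division, mirroring prime? from the answer."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def primes():
    """Lazily yield 2, 3, 5, 7, ... with no upper bound."""
    return (n for n in count(2) if is_prime(n))


def prime_sums():
    """Lazily yield 0, 2, 2+3, 2+3+5, ... like prime-sums."""
    return chain([0], accumulate(primes()))


# Only the requested prefix is ever computed.
first_six = list(islice(prime_sums(), 6))
print(first_six)  # [0, 2, 5, 10, 17, 28]
```

One difference worth keeping in mind: a Clojure lazy seq bound with def caches the elements it has already realized, whereas each call to prime_sums() here starts a fresh generator and recomputes from 2.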

羁绊已千年 2024-12-21 08:40:57

I will accept my own comment as the answer to the question of why Python worked and Clojure did not: calling last on a vector is a linear operation, which prevented the cumulative sum from being computed the way I intended.
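
To make the cost concrete, here is a minimal Python 3 sketch that counts steps rather than measuring time. It does not reproduce Clojure's internals: the inner loop merely stands in for a linear walk to the last element, and the function names cumulative_sum_rescanning and cumulative_sum_running are made up for this illustration.

```python
def cumulative_sum_rescanning(values):
    """Mimic the slow version: walk the whole result to find its last element."""
    result = []
    steps = 0
    for x in values:
        last = 0
        for item in result:  # linear walk, standing in for (last v)
            last = item
            steps += 1
        result.append(last + x)
    return result, steps


def cumulative_sum_running(values):
    """Carry the running total along instead of looking it up again."""
    result = []
    total = 0
    steps = 0
    for x in values:
        total += x
        steps += 1
        result.append(total)
    return result, steps


primes = [2, 3, 5, 7, 11, 13]
slow, slow_steps = cumulative_sum_rescanning(primes)
fast, fast_steps = cumulative_sum_running(primes)
print(slow == fast, slow_steps, fast_steps)  # True 15 6
```

For n inputs the rescanning version takes n(n-1)/2 walking steps against n for the running total, so the gap widens quickly as the list of primes grows. In Clojure itself, peek returns the last element of a vector without walking it.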

Updating the function to use a transient vector like this:

(defn cumulative-sum-2 [s]
  (loop [[x & xs] s
         ss 0
         acc (transient [])]
    (if x      
      (let [ssx (+ ss x)]
        (recur xs ssx (conj! acc ssx)))
      (persistent! acc))))

results in the Clojure version consistently taking only twice as long as Python. I had hoped Clojure would be faster than Python for the same operations, so I wonder whether I am still missing something. I'm using Clojure 1.2, by the way.

Thanks
