How many threads does Clojure's pmap function spawn for URL-fetching operations?

Posted on 2024-10-17 15:18:36

The documentation on the pmap function leaves me wondering how efficient it would be for something like fetching a collection of XML feeds over the web. I have no idea how many concurrent fetch operations pmap would spawn and what the maximum would be.

Comments (4)

潇烟暮雨 2024-10-24 15:18:36

If you check the source you see:

> (use 'clojure.repl)
> (source pmap)
(defn pmap
  "Like map, except f is applied in parallel. Semi-lazy in that the
  parallel computation stays ahead of the consumption, but doesn't
  realize the entire result unless required. Only useful for
  computationally intensive functions where the time of f dominates
  the coordination overhead."
  {:added "1.0"}
  ([f coll]
   (let [n (+ 2 (.. Runtime getRuntime availableProcessors))
         rets (map #(future (f %)) coll)
         step (fn step [[x & xs :as vs] fs]
                (lazy-seq
                 (if-let [s (seq fs)]
                   (cons (deref x) (step xs (rest s)))
                   (map deref vs))))]
     (step rets (drop n rets))))
  ([f coll & colls]
   (let [step (fn step [cs]
                (lazy-seq
                 (let [ss (map seq cs)]
                   (when (every? identity ss)
                     (cons (map first ss) (step (map rest ss)))))))]
     (pmap #(apply f %) (step (cons coll colls))))))

The (+ 2 (.. Runtime getRuntime availableProcessors)) is a big clue there. pmap will grab the first (+ 2 processors) pieces of work and run them asynchronously via future. So if you have 2 cores, it's going to launch 4 pieces of work at a time, trying to keep a bit ahead of you, but the max should be n, that is, the number of processors plus 2.

future ultimately uses the agent I/O thread pool which supports an unbounded number of threads. It will grow as work is thrown at it and shrink if threads are unused.
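
As a rough illustration of the two points above, the sketch below recomputes the look-ahead value from the pmap source and checks that the body of a future runs on a pool thread rather than on the calling thread. The names lookahead, caller-thread, and future-thread are made up for this example.

```clojure
;; Same expression pmap uses for its look-ahead window.
(def lookahead
  (+ 2 (.availableProcessors (Runtime/getRuntime))))

;; Name of the thread that evaluates this form.
(def caller-thread
  (.getName (Thread/currentThread)))

;; Name of the thread that runs the body of a future.
(def future-thread
  @(future (.getName (Thread/currentThread))))

(println "look-ahead window:" lookahead)
(println "caller thread:" caller-thread)
(println "future thread:" future-thread)
```

In a standalone script, call (shutdown-agents) when you are done; otherwise idle pool threads can keep the JVM alive for a while after the last form has run.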

路弥 2024-10-24 15:18:36

Building on Alex's excellent answer that explains how pmap works, here's my suggestion for your situation:

(doall
  (map
    #(future (my-web-fetch-function %))
    list-of-xml-feeds-to-fetch))

Rationale:

  • You want as many pieces of work in-flight as you can, since most will block on network IO.
  • Future will fire off an asynchronous piece of work for each request, to be handled in a thread pool. You can let Clojure take care of that intelligently.
  • The doall on the map will force the evaluation of the full sequence (i.e. the launch of all the requests).
  • Your main thread can start dereferencing the futures right away, and can therefore continue making progress as the individual results come back.
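
A minimal sketch of this approach, assuming a hypothetical fetch-feed function as a stand-in for my-web-fetch-function (it sleeps instead of doing real network IO) and placeholder URLs:

```clojure
;; Hypothetical stand-in for a real fetch: sleeps briefly to mimic
;; network latency, then returns a map describing the "response".
(defn fetch-feed [url]
  (Thread/sleep 100)
  {:url url :body (str "<feed src=\"" url "\"/>")})

;; Placeholder URLs, for illustration only.
(def feed-urls
  ["http://example.com/a.xml"
   "http://example.com/b.xml"
   "http://example.com/c.xml"])

;; doall forces the lazy seq, so every future is started right away.
(def pending
  (doall (map #(future (fetch-feed %)) feed-urls)))

;; Dereferencing blocks until each result is ready; results keep the
;; order of the input collection.
(def results
  (mapv deref pending))

(println (map :url results))
```

Note that this starts one future per feed with no upper bound, so for a very large list of feeds you may want to cap the number of requests in flight yourself.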

不美如何 2024-10-24 15:18:36

No time to write a long response, but there's a clojure.contrib http-agent which creates each get/post request as its own agent. So you can fire off a thousand requests and they'll all run in parallel and complete as the results come in.

风蛊 2024-10-24 15:18:36

Looking at how pmap operates, it seems to run 32 threads at a time no matter how many processors you have. The issue is that map runs ahead of the computation by 32 elements, and the futures start on their own as soon as they are created. Sample:
(defn samplef [n]
  (println "starting " n)
  (Thread/sleep 10000)
  n)
(def result (pmap samplef (range 0 100)))

; You will wait for 10 seconds and see 32 prints; then, when you take the 33rd element, another 32
; prints. This means that you are running 32 concurrent threads at a time.
; To me this is not perfect.
; SALUDOS Felipe
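
A likely explanation for the batches of 32 is chunked sequences: map realizes a whole chunk of its input at once, and (range 0 100) is handed out in chunks of 32. The sketch below shows the effect with plain map and a call counter, then with an unchunk helper that hands out one element at a time. The names calls, counting, and unchunk are made up for this example; unchunk is not part of clojure.core.

```clojure
;; Counts how many times the mapped function has been called.
(def calls (atom 0))

(defn counting [x]
  (swap! calls inc)
  x)

;; Taking one element from map over a chunked seq runs the function
;; on the whole first chunk.
(first (map counting (range 0 100)))
(def chunked-calls @calls)

;; Hypothetical helper: rebuilds coll as a plain one-at-a-time lazy seq.
(defn unchunk [coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (cons (first s) (unchunk (rest s))))))

(reset! calls 0)
(first (map counting (unchunk (range 0 100))))
(def unchunked-calls @calls)

(println "chunked:" chunked-calls "unchunked:" unchunked-calls)
```

Under the same assumption, (pmap samplef (unchunk (range 0 100))) should stay much closer to the processors-plus-2 look-ahead instead of starting 32 futures at once.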
