请帮助我理解这些 Clojure Hprof 痕迹

发布于 2024-12-02 08:08:46 字数 4084 浏览 4 评论 0原文

我有一些 Clojure 代码正在模拟然后处理数值数据。数据基本上是双值向量;处理主要涉及以各种方式对它们的值进行求和。我将在下面包含一些代码,但我的问题(我认为)更笼统 - 我只是不知道如何解释 hprof 结果。

无论如何,我的测试代码是:

(defn spin [n]
  (let [c 6000
        signals (spin-signals c)]
      (doseq [_ (range n)] (time (spin-voxels c signals)))))

(defn -main []
  (spin 4))

其中spin-voxels应该比spin-signals更昂贵(特别是重复多次时)。我可以给出较低级别的例程,但我认为这个问题更多的是关于我不理解跟踪的基础知识(如下)。

当我用 lein 编译它,然后进行一些简单的分析时:

> java -cp classes:lib/clojure-1.3.0-beta1.jar -agentlib:hprof=cpu=samples,depth=10,file=hprof.vec com.isti.compset.stack
"Elapsed time: 14118.772924 msecs"
"Elapsed time: 10082.015672 msecs"
"Elapsed time: 9212.522973 msecs"
"Elapsed time: 12968.23877 msecs"
Dumping CPU usage by sampling running threads ... done.

分析跟踪看起来像:

CPU SAMPLES BEGIN (total = 4300) Sun Aug 28 15:51:40 2011
rank   self  accum   count trace method
   1  5.33%  5.33%     229 300791 clojure.core$seq.invoke
   2  5.21% 10.53%     224 300786 clojure.core$seq.invoke
   3  5.05% 15.58%     217 300750 clojure.core$seq.invoke
   4  4.93% 20.51%     212 300787 clojure.lang.Numbers.add
   5  4.74% 25.26%     204 300799 clojure.core$seq.invoke
   6  2.60% 27.86%     112 300783 clojure.lang.RT.more
   7  2.51% 30.37%     108 300803 clojure.lang.Numbers.multiply
   8  2.42% 32.79%     104 300788 clojure.lang.RT.first
   9  2.37% 35.16%     102 300831 clojure.lang.RT.more
  10  2.37% 37.53%     102 300840 clojure.lang.Numbers.add

这非常酷。到这里为止,我很高兴。我可以看到我在数值的通用处理上浪费了时间。

因此,我查看了我的代码并决定,作为第一步,我将用 d-vec 替换 vec

(defn d-vec [collection]
  (apply conj (vector-of :double) collection))

我不确定这是否足够 - 我怀疑我还需要在不同的地方添加一些类型注释 - 但这似乎是一个好的开始。所以我再次编译和分析:

> java -cp classes:lib/clojure-1.3.0-beta1.jar -agentlib:hprof=cpu=samples,depth=10,file=hprof.d-vec com.isti.compset.stack
"Elapsed time: 15944.278043 msecs"
"Elapsed time: 15608.099677 msecs"
"Elapsed time: 16561.659408 msecs"
"Elapsed time: 15416.414548 msecs"
Dumping CPU usage by sampling running threads ... done.

Ewww。所以它的速度要慢得多。还有个人资料?

CPU SAMPLES BEGIN (total = 6425) Sun Aug 28 15:55:12 2011
rank   self  accum   count trace method
   1 26.16% 26.16%    1681 300615 clojure.core.Vec.count
   2 23.28% 49.45%    1496 300607 clojure.core.Vec.count
   3  7.74% 57.18%     497 300608 clojure.lang.RT.seqFrom
   4  5.59% 62.77%     359 300662 clojure.core.Vec.count
   5  3.72% 66.49%     239 300604 clojure.lang.RT.first
   6  3.25% 69.74%     209 300639 clojure.core.Vec.count
   7  1.91% 71.66%     123 300635 clojure.core.Vec.count
   8  1.03% 72.68%      66 300663 clojure.core.Vec.count
   9  1.00% 73.68%      64 300644 clojure.lang.RT.more
  10  0.79% 74.47%      51 300666 clojure.lang.RT.first
  11  0.75% 75.22%      48 300352 clojure.lang.Numbers.double_array
  12  0.75% 75.97%      48 300638 clojure.lang.RT.more
  13  0.64% 76.61%      41 300621 clojure.core.Vec.count
  14  0.62% 77.23%      40 300631 clojure.core.Vec.cons
  15  0.61% 77.84%      39 300025 java.lang.ClassLoader.defineClass1
  16  0.59% 78.43%      38 300670 clojure.core.Vec.cons
  17  0.58% 79.00%      37 300681 clojure.core.Vec.cons
  18  0.54% 79.55%      35 300633 clojure.lang.Numbers.multiply
  19  0.48% 80.03%      31 300671 clojure.lang.RT.seqFrom
  20  0.47% 80.50%      30 300609 clojure.lang.Numbers.add

我在这里添加了更多行,因为这是我不理解的部分。

为什么地球Vec.count出现得如此频繁?它是一个返回向量大小的方法。属性的单行查找。

我认为我速度较慢,因为我仍在 Double 和 double 之间来回跳跃,并且当我添加更多类型注释时,情况可能会再次改善。但我不明白我现在拥有什么,所以我不太确定笨拙的前锋有多大意义。

请问,有人可以一般性地解释一下上面的转储吗?我保证我不会重复调用 count - 相反,我有很多映射和归约以及一些显式循环。

我想知道我是否对 JIT 感到困惑?也许我因为函数被内联而丢失了一堆信息?哦,我正在使用 1.3.0-beta1,因为它似乎有更合理的数字处理。

[更新] 我在 http://www.acooke 总结了我的经验。 org/cute/Optimising1.html - 尽管从未理解这一点,但速度却提高了 5 倍(在清理更多内容并迁移到 1.3 后实际上是 10 倍)。

I have some Clojure code that is simulating and then processing numerical data. The data are basically vectors of double values; the processing mainly involves summing their values in various ways. I will include some code below, but my question is (I think) more general - I just don't have a clue how to interpret the hprof results.

Anyway, my test code is:

(defn spin [n]
  (let [c 6000
        signals (spin-signals c)]
      (doseq [_ (range n)] (time (spin-voxels c signals)))))

(defn -main []
  (spin 4))

where spin-voxels should be more expensive than spin-signals (especially when repeated multiple times). I can give the lower level routines, but I think this question is more about me not understanding basics of the traces (below).

When I compile this with lein and then do some simple profiling:

> java -cp classes:lib/clojure-1.3.0-beta1.jar -agentlib:hprof=cpu=samples,depth=10,file=hprof.vec com.isti.compset.stack
"Elapsed time: 14118.772924 msecs"
"Elapsed time: 10082.015672 msecs"
"Elapsed time: 9212.522973 msecs"
"Elapsed time: 12968.23877 msecs"
Dumping CPU usage by sampling running threads ... done.

and the profile trace looks like:

CPU SAMPLES BEGIN (total = 4300) Sun Aug 28 15:51:40 2011
rank   self  accum   count trace method
   1  5.33%  5.33%     229 300791 clojure.core$seq.invoke
   2  5.21% 10.53%     224 300786 clojure.core$seq.invoke
   3  5.05% 15.58%     217 300750 clojure.core$seq.invoke
   4  4.93% 20.51%     212 300787 clojure.lang.Numbers.add
   5  4.74% 25.26%     204 300799 clojure.core$seq.invoke
   6  2.60% 27.86%     112 300783 clojure.lang.RT.more
   7  2.51% 30.37%     108 300803 clojure.lang.Numbers.multiply
   8  2.42% 32.79%     104 300788 clojure.lang.RT.first
   9  2.37% 35.16%     102 300831 clojure.lang.RT.more
  10  2.37% 37.53%     102 300840 clojure.lang.Numbers.add

which is pretty cool. Up to here, I am happy. I can see that I am wasting time with generic handling of numerical values.

So I look at my code and decide that, as a first step, I will replace vec with d-vec:

(defn d-vec [collection]
  (apply conj (vector-of :double) collection))

I am not sure that will be sufficient - I suspect I will also need to add some type annotations in various places - but it seems like a good start. So I compile and profile again:

> java -cp classes:lib/clojure-1.3.0-beta1.jar -agentlib:hprof=cpu=samples,depth=10,file=hprof.d-vec com.isti.compset.stack
"Elapsed time: 15944.278043 msecs"
"Elapsed time: 15608.099677 msecs"
"Elapsed time: 16561.659408 msecs"
"Elapsed time: 15416.414548 msecs"
Dumping CPU usage by sampling running threads ... done.

Ewww. So it is significantly slower. And the profile?

CPU SAMPLES BEGIN (total = 6425) Sun Aug 28 15:55:12 2011
rank   self  accum   count trace method
   1 26.16% 26.16%    1681 300615 clojure.core.Vec.count
   2 23.28% 49.45%    1496 300607 clojure.core.Vec.count
   3  7.74% 57.18%     497 300608 clojure.lang.RT.seqFrom
   4  5.59% 62.77%     359 300662 clojure.core.Vec.count
   5  3.72% 66.49%     239 300604 clojure.lang.RT.first
   6  3.25% 69.74%     209 300639 clojure.core.Vec.count
   7  1.91% 71.66%     123 300635 clojure.core.Vec.count
   8  1.03% 72.68%      66 300663 clojure.core.Vec.count
   9  1.00% 73.68%      64 300644 clojure.lang.RT.more
  10  0.79% 74.47%      51 300666 clojure.lang.RT.first
  11  0.75% 75.22%      48 300352 clojure.lang.Numbers.double_array
  12  0.75% 75.97%      48 300638 clojure.lang.RT.more
  13  0.64% 76.61%      41 300621 clojure.core.Vec.count
  14  0.62% 77.23%      40 300631 clojure.core.Vec.cons
  15  0.61% 77.84%      39 300025 java.lang.ClassLoader.defineClass1
  16  0.59% 78.43%      38 300670 clojure.core.Vec.cons
  17  0.58% 79.00%      37 300681 clojure.core.Vec.cons
  18  0.54% 79.55%      35 300633 clojure.lang.Numbers.multiply
  19  0.48% 80.03%      31 300671 clojure.lang.RT.seqFrom
  20  0.47% 80.50%      30 300609 clojure.lang.Numbers.add

I have included more rows here because this is the part I don't understand.

Why on earth is Vec.count appearing so often? It's a method that returns the size of the vector. A single line lookup of an attribute.

I assume I am slower because I am still jumping back and forth between Double and double, and that things may improve again when I add more type annotations. But I don't understand what I have now, so I am not so sure blundering fowards makes much sense.

Please, can anyone explain the dump above in general terms? I promise I don't repeatedly call count - instead I have lots of maps and reduces and a few explicit loops.

I wondered if I am perhaps being confused by the JIT? Maybe I am missing a pile of information because functions are being inlined? Oh, and I am using 1.3.0-beta1 because it appears to have more sensible number handling.

[UPDATE] I summarised my experiences at http://www.acooke.org/cute/Optimising1.html - got a 5x speedup (actually was 10x after cleaning some more and moving to 1.3) despite never understanding this.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

小鸟爱天空丶 2024-12-09 08:08:46

对 Vec 对象(由 vector-of 创建的对象)调用 seq 会创建一个 VecSeq 对象。

在 Veca 上创建的 VecSeq 对象在其内部减少方法中调用 Vec.count,该方法由 clojure.core/reduce 使用。

所以看起来由 vector-of 创建的向量在减少时调用 Vec.count 。正如您提到的,代码做了很多减少这似乎是原因

仍然令人毛骨悚然的是 Vec.count 是 Vec.count 似乎非常简单:

clojure.lang.Counted
  (count [_] cnt)

一个不执行任何计数的简单 getter。

calling seq for a Vec object (object created by vector-of) creates a VecSeq Object.

VecSeq object created on Veca calls Vec.count in it's method internal-reduce, which is used by clojure.core/reduce.

so it seems that a vector created by vector-of calls Vec.count while reducing. And as you mentioned that the code did a lot of reducing this seems to be the cause

What remains spooky is that Vec.count is that Vec.count seems to be very simple:

clojure.lang.Counted
  (count [_] cnt)

a simple getter that doesn't do any counting.

瀞厅☆埖开 2024-12-09 08:08:46

只是大声地说出来,看起来您的代码正在与 Seq 进行大量来回转换。

查看 RT.seqFrom ,这调用 ArraySeq.createFromObject

if(array == null || Array.getLength(array) == 0)
   return null;

那么,使用 vec 会使用快速向量访问,而使用 d-vec 会强制使用数组并调用慢速 java.lang.Array.getLength方法(使用反射..)

Just talking out loud, it looks like your code is doing a lot of back and forth conversion to/from Seq.

Looking at RT.seqFrom, this calls ArraySeq.createFromObject

if(array == null || Array.getLength(array) == 0)
   return null;

So would that be that using vec uses fast vector access, and that using d-vec is forcing the use of arrays and the call to a slow java.lang.Array.getLength method (that uses reflection .. )

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文