请帮助我理解这些 Clojure Hprof 痕迹
我有一些 Clojure 代码正在模拟然后处理数值数据。数据基本上是双值向量;处理主要涉及以各种方式对它们的值进行求和。我将在下面包含一些代码,但我的问题(我认为)更笼统 - 我只是不知道如何解释 hprof 结果。
无论如何,我的测试代码是:
(defn spin [n]
(let [c 6000
signals (spin-signals c)]
(doseq [_ (range n)] (time (spin-voxels c signals)))))
(defn -main []
(spin 4))
其中spin-voxels
应该比spin-signals
更昂贵(特别是重复多次时)。我可以给出较低级别的例程,但我认为这个问题更多的是关于我不理解跟踪的基础知识(如下)。
当我用 lein 编译它,然后进行一些简单的分析时:
> java -cp classes:lib/clojure-1.3.0-beta1.jar -agentlib:hprof=cpu=samples,depth=10,file=hprof.vec com.isti.compset.stack
"Elapsed time: 14118.772924 msecs"
"Elapsed time: 10082.015672 msecs"
"Elapsed time: 9212.522973 msecs"
"Elapsed time: 12968.23877 msecs"
Dumping CPU usage by sampling running threads ... done.
分析跟踪看起来像:
CPU SAMPLES BEGIN (total = 4300) Sun Aug 28 15:51:40 2011
rank self accum count trace method
1 5.33% 5.33% 229 300791 clojure.core$seq.invoke
2 5.21% 10.53% 224 300786 clojure.core$seq.invoke
3 5.05% 15.58% 217 300750 clojure.core$seq.invoke
4 4.93% 20.51% 212 300787 clojure.lang.Numbers.add
5 4.74% 25.26% 204 300799 clojure.core$seq.invoke
6 2.60% 27.86% 112 300783 clojure.lang.RT.more
7 2.51% 30.37% 108 300803 clojure.lang.Numbers.multiply
8 2.42% 32.79% 104 300788 clojure.lang.RT.first
9 2.37% 35.16% 102 300831 clojure.lang.RT.more
10 2.37% 37.53% 102 300840 clojure.lang.Numbers.add
这非常酷。到这里为止,我很高兴。我可以看到我在数值的通用处理上浪费了时间。
因此,我查看了我的代码并决定,作为第一步,我将用 d-vec
替换 vec
:
(defn d-vec [collection]
(apply conj (vector-of :double) collection))
我不确定这是否足够 - 我怀疑我还需要在不同的地方添加一些类型注释 - 但这似乎是一个好的开始。所以我再次编译和分析:
> java -cp classes:lib/clojure-1.3.0-beta1.jar -agentlib:hprof=cpu=samples,depth=10,file=hprof.d-vec com.isti.compset.stack
"Elapsed time: 15944.278043 msecs"
"Elapsed time: 15608.099677 msecs"
"Elapsed time: 16561.659408 msecs"
"Elapsed time: 15416.414548 msecs"
Dumping CPU usage by sampling running threads ... done.
Ewww。所以它的速度要慢得多。还有个人资料?
CPU SAMPLES BEGIN (total = 6425) Sun Aug 28 15:55:12 2011
rank self accum count trace method
1 26.16% 26.16% 1681 300615 clojure.core.Vec.count
2 23.28% 49.45% 1496 300607 clojure.core.Vec.count
3 7.74% 57.18% 497 300608 clojure.lang.RT.seqFrom
4 5.59% 62.77% 359 300662 clojure.core.Vec.count
5 3.72% 66.49% 239 300604 clojure.lang.RT.first
6 3.25% 69.74% 209 300639 clojure.core.Vec.count
7 1.91% 71.66% 123 300635 clojure.core.Vec.count
8 1.03% 72.68% 66 300663 clojure.core.Vec.count
9 1.00% 73.68% 64 300644 clojure.lang.RT.more
10 0.79% 74.47% 51 300666 clojure.lang.RT.first
11 0.75% 75.22% 48 300352 clojure.lang.Numbers.double_array
12 0.75% 75.97% 48 300638 clojure.lang.RT.more
13 0.64% 76.61% 41 300621 clojure.core.Vec.count
14 0.62% 77.23% 40 300631 clojure.core.Vec.cons
15 0.61% 77.84% 39 300025 java.lang.ClassLoader.defineClass1
16 0.59% 78.43% 38 300670 clojure.core.Vec.cons
17 0.58% 79.00% 37 300681 clojure.core.Vec.cons
18 0.54% 79.55% 35 300633 clojure.lang.Numbers.multiply
19 0.48% 80.03% 31 300671 clojure.lang.RT.seqFrom
20 0.47% 80.50% 30 300609 clojure.lang.Numbers.add
我在这里添加了更多行,因为这是我不理解的部分。
为什么地球上Vec.count
出现得如此频繁?它是一个返回向量大小的方法。属性的单行查找。
我认为我速度较慢,因为我仍在 Double 和 double 之间来回跳跃,并且当我添加更多类型注释时,情况可能会再次改善。但我不明白我现在拥有什么,所以我不太确定笨拙的前锋有多大意义。
请问,有人可以一般性地解释一下上面的转储吗?我保证我不会重复调用 count
- 相反,我有很多映射和归约以及一些显式循环。
我想知道我是否对 JIT 感到困惑?也许我因为函数被内联而丢失了一堆信息?哦,我正在使用 1.3.0-beta1,因为它似乎有更合理的数字处理。
[更新] 我在 http://www.acooke 总结了我的经验。 org/cute/Optimising1.html - 尽管从未理解这一点,但速度却提高了 5 倍(在清理更多内容并迁移到 1.3 后实际上是 10 倍)。
I have some Clojure code that is simulating and then processing numerical data. The data are basically vectors of double values; the processing mainly involves summing their values in various ways. I will include some code below, but my question is (I think) more general - I just don't have a clue how to interpret the hprof results.
Anyway, my test code is:
(defn spin [n]
(let [c 6000
signals (spin-signals c)]
(doseq [_ (range n)] (time (spin-voxels c signals)))))
(defn -main []
(spin 4))
where spin-voxels
should be more expensive than spin-signals
(especially when repeated multiple times). I can give the lower level routines, but I think this question is more about me not understanding basics of the traces (below).
When I compile this with lein and then do some simple profiling:
> java -cp classes:lib/clojure-1.3.0-beta1.jar -agentlib:hprof=cpu=samples,depth=10,file=hprof.vec com.isti.compset.stack
"Elapsed time: 14118.772924 msecs"
"Elapsed time: 10082.015672 msecs"
"Elapsed time: 9212.522973 msecs"
"Elapsed time: 12968.23877 msecs"
Dumping CPU usage by sampling running threads ... done.
and the profile trace looks like:
CPU SAMPLES BEGIN (total = 4300) Sun Aug 28 15:51:40 2011
rank self accum count trace method
1 5.33% 5.33% 229 300791 clojure.core$seq.invoke
2 5.21% 10.53% 224 300786 clojure.core$seq.invoke
3 5.05% 15.58% 217 300750 clojure.core$seq.invoke
4 4.93% 20.51% 212 300787 clojure.lang.Numbers.add
5 4.74% 25.26% 204 300799 clojure.core$seq.invoke
6 2.60% 27.86% 112 300783 clojure.lang.RT.more
7 2.51% 30.37% 108 300803 clojure.lang.Numbers.multiply
8 2.42% 32.79% 104 300788 clojure.lang.RT.first
9 2.37% 35.16% 102 300831 clojure.lang.RT.more
10 2.37% 37.53% 102 300840 clojure.lang.Numbers.add
which is pretty cool. Up to here, I am happy. I can see that I am wasting time with generic handling of numerical values.
So I look at my code and decide that, as a first step, I will replace vec
with d-vec
:
(defn d-vec [collection]
(apply conj (vector-of :double) collection))
I am not sure that will be sufficient - I suspect I will also need to add some type annotations in various places - but it seems like a good start. So I compile and profile again:
> java -cp classes:lib/clojure-1.3.0-beta1.jar -agentlib:hprof=cpu=samples,depth=10,file=hprof.d-vec com.isti.compset.stack
"Elapsed time: 15944.278043 msecs"
"Elapsed time: 15608.099677 msecs"
"Elapsed time: 16561.659408 msecs"
"Elapsed time: 15416.414548 msecs"
Dumping CPU usage by sampling running threads ... done.
Ewww. So it is significantly slower. And the profile?
CPU SAMPLES BEGIN (total = 6425) Sun Aug 28 15:55:12 2011
rank self accum count trace method
1 26.16% 26.16% 1681 300615 clojure.core.Vec.count
2 23.28% 49.45% 1496 300607 clojure.core.Vec.count
3 7.74% 57.18% 497 300608 clojure.lang.RT.seqFrom
4 5.59% 62.77% 359 300662 clojure.core.Vec.count
5 3.72% 66.49% 239 300604 clojure.lang.RT.first
6 3.25% 69.74% 209 300639 clojure.core.Vec.count
7 1.91% 71.66% 123 300635 clojure.core.Vec.count
8 1.03% 72.68% 66 300663 clojure.core.Vec.count
9 1.00% 73.68% 64 300644 clojure.lang.RT.more
10 0.79% 74.47% 51 300666 clojure.lang.RT.first
11 0.75% 75.22% 48 300352 clojure.lang.Numbers.double_array
12 0.75% 75.97% 48 300638 clojure.lang.RT.more
13 0.64% 76.61% 41 300621 clojure.core.Vec.count
14 0.62% 77.23% 40 300631 clojure.core.Vec.cons
15 0.61% 77.84% 39 300025 java.lang.ClassLoader.defineClass1
16 0.59% 78.43% 38 300670 clojure.core.Vec.cons
17 0.58% 79.00% 37 300681 clojure.core.Vec.cons
18 0.54% 79.55% 35 300633 clojure.lang.Numbers.multiply
19 0.48% 80.03% 31 300671 clojure.lang.RT.seqFrom
20 0.47% 80.50% 30 300609 clojure.lang.Numbers.add
I have included more rows here because this is the part I don't understand.
Why on earth is Vec.count
appearing so often? It's a method that returns the size of the vector. A single line lookup of an attribute.
I assume I am slower because I am still jumping back and forth between Double and double, and that things may improve again when I add more type annotations. But I don't understand what I have now, so I am not so sure blundering fowards makes much sense.
Please, can anyone explain the dump above in general terms? I promise I don't repeatedly call count
- instead I have lots of maps and reduces and a few explicit loops.
I wondered if I am perhaps being confused by the JIT? Maybe I am missing a pile of information because functions are being inlined? Oh, and I am using 1.3.0-beta1 because it appears to have more sensible number handling.
[UPDATE] I summarised my experiences at http://www.acooke.org/cute/Optimising1.html - got a 5x speedup (actually was 10x after cleaning some more and moving to 1.3) despite never understanding this.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
对 Vec 对象(由 vector-of 创建的对象)调用 seq 会创建一个 VecSeq 对象。
在 Veca 上创建的 VecSeq 对象在其内部减少方法中调用 Vec.count,该方法由 clojure.core/reduce 使用。
所以看起来由 vector-of 创建的向量在减少时调用 Vec.count 。正如您提到的,代码做了很多减少这似乎是原因
仍然令人毛骨悚然的是 Vec.count 是 Vec.count 似乎非常简单:
一个不执行任何计数的简单 getter。
calling seq for a Vec object (object created by vector-of) creates a VecSeq Object.
VecSeq object created on Veca calls Vec.count in it's method internal-reduce, which is used by clojure.core/reduce.
so it seems that a vector created by vector-of calls Vec.count while reducing. And as you mentioned that the code did a lot of reducing this seems to be the cause
What remains spooky is that Vec.count is that Vec.count seems to be very simple:
a simple getter that doesn't do any counting.
只是大声地说出来,看起来您的代码正在与 Seq 进行大量来回转换。
查看 RT.seqFrom ,这调用 ArraySeq.createFromObject
那么,使用 vec 会使用快速向量访问,而使用 d-vec 会强制使用数组并调用慢速 java.lang.Array.getLength方法(使用反射..)
Just talking out loud, it looks like your code is doing a lot of back and forth conversion to/from Seq.
Looking at RT.seqFrom, this calls ArraySeq.createFromObject
So would that be that using vec uses fast vector access, and that using d-vec is forcing the use of arrays and the call to a slow java.lang.Array.getLength method (that uses reflection .. )