Speed of Scala when using the get() method on hash tables? (Are temporary Option() objects generated?)
I am converting some code to Scala. It's code that sits in an inner loop with very large amounts of data so it needs to be fast, and it involves looking up keys in a hash table and computing probabilities. It needs to do different things depending on whether a key is found or not. The code would look like this using the "standard" idiom:
counts.get(word) match {
  case None => {
    WordDist.overall_word_probs.get(word) match {
      case None => (unseen_mass*WordDist.globally_unseen_word_prob
                    / WordDist.num_unseen_word_types)
      case Some(owprob) => unseen_mass * owprob / overall_unseen_mass
    }
  }
  case Some(wordcount) => wordcount.toDouble/total_tokens*(1.0 - unseen_mass)
}
but I am concerned that code of this sort is going to be very slow because of all these temporary Some() objects being created and then garbage-collected. The Scala2e book claims that a smart JVM "might" optimize these away so that the code does the right thing efficiency-wise, but does this actually happen using Sun's JVM? Anyone know?
Comments (2)
This may happen if you enable escape analysis in the JVM, enabled with:
-XX:+DoEscapeAnalysis
on JRE 1.6. Essentially, it should detect objects that are created but never escape the method activation frame, and then avoid a normal heap allocation for them (for example by keeping their fields on the stack or in registers), so they never need to be garbage-collected.
One thing you could do is to micro-benchmark your code using the scala.testing.Benchmark trait. Just extend it with a singleton object and implement the run method, then compile it and run it. It will run the run method multiple times and measure execution times.
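The same idea can be sketched without scala.testing.Benchmark, which is no longer part of current Scala standard libraries. The block below is a hand-rolled timing loop, not that trait's API; `LookupBench`, `timeRuns`, `sumWithMatch` and the sample data are all hypothetical names and values chosen for illustration.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a benchmark harness; NOT the scala.testing.Benchmark API.
object LookupBench {
  // Runs `body` `runs` times and returns the elapsed time of each run in nanoseconds.
  def timeRuns(runs: Int)(body: => Unit): Seq[Long] =
    (1 to runs).map { _ =>
      val start = System.nanoTime()
      body
      System.nanoTime() - start
    }

  // Illustrative data only: a tiny table of word counts (contents are made up).
  val counts: mutable.HashMap[String, Int] = mutable.HashMap("a" -> 3, "b" -> 1)
  val words: Array[String] = Array("a", "b", "c")

  // Code under test: hash lookups followed by a pattern match on the Option,
  // in the style of the snippet from the question.
  def sumWithMatch(iterations: Int): Long = {
    var total = 0L
    var i = 0
    while (i < iterations) {
      counts.get(words(i % words.length)) match {
        case Some(n) => total += n
        case None    => ()
      }
      i += 1
    }
    total
  }

  def main(args: Array[String]): Unit = {
    var sink = 0L // accumulate the result so the work is less likely to be optimized away
    val times = timeRuns(10) { sink += sumWithMatch(1000000) }
    times.zipWithIndex.foreach { case (ns, i) =>
      println(s"run ${i + 1}: ${ns / 1000000.0} ms")
    }
    println(s"checksum: $sink")
  }
}
```

The first few runs are usually slower because the JIT has not compiled the loop yet, so it is the later runs that are worth comparing, for instance with and without the escape analysis flag.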
Yes, Some objects will be created (None is a singleton). Unless, of course, the JVM elides that, which depends on many factors, including whether or not the JVM thinks the code is called all that much.

Anyway, that code is not really the standard idiom. There's even a meme about it: once, an experienced Scala developer was writing code like this, and another one replied "What's this? Amateur hour? Flatmap that sh*t!"
Anyway, here's how I'd rewrite it:
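A rewrite along these lines, using map and getOrElse instead of pattern matching, could look like the sketch below. It is an illustration, not necessarily the exact snippet the answer had in mind: the identifiers follow the question, while the enclosing objects, the types and the sample values are assumptions added so that the code compiles on its own.

```scala
// Assumed stand-in for the WordDist object referenced in the question.
object WordDist {
  val overall_word_probs: Map[String, Double] = Map("dog" -> 0.25)
  val globally_unseen_word_prob: Double = 0.5
  val num_unseen_word_types: Int = 10
}

// Hypothetical wrapper; field names mirror the question, values are made up.
object RewriteSketch {
  val counts: Map[String, Int] = Map("the" -> 6, "cat" -> 2)
  val total_tokens: Int = 8
  val unseen_mass: Double = 0.5
  val overall_unseen_mass: Double = 0.5

  // The pattern-matching version from the question, kept for comparison.
  def matchVersion(word: String): Double =
    counts.get(word) match {
      case None =>
        WordDist.overall_word_probs.get(word) match {
          case None => (unseen_mass * WordDist.globally_unseen_word_prob
                        / WordDist.num_unseen_word_types)
          case Some(owprob) => unseen_mass * owprob / overall_unseen_mass
        }
      case Some(wordcount) => wordcount.toDouble / total_tokens * (1.0 - unseen_mass)
    }

  // The same logic expressed with map and getOrElse.
  def chainedVersion(word: String): Double =
    counts.get(word)
      .map(_.toDouble / total_tokens * (1.0 - unseen_mass))
      .getOrElse(
        WordDist.overall_word_probs.get(word)
          .map(unseen_mass * _ / overall_unseen_mass)
          .getOrElse(unseen_mass * WordDist.globally_unseen_word_prob
                     / WordDist.num_unseen_word_types))
}
```

Both versions take the same three paths (word counted locally, word known only globally, word never seen), so they should return identical results for any input.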
You can then refactor this: both getOrElse parameters could be split into separate methods with nice names. Since they just return a value without input, they should be pretty fast.

Now, we call just two methods on Option here: map and getOrElse. Here's the beginning of their implementation:

As the parameter to getOrElse is passed by name, it involves an anonymous function creation. And, of course, the parameter to map is also a function. Other than that, the chance of these methods getting inlined is pretty good.

So, here's the refactored code, though I don't know enough about it to give good names.
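One possible shape for such a refactoring is sketched below. The helper names (`seenProb`, `knownElsewhereProb`, `globallyUnseenProb`, `unseenHereProb`, `prob`) are hypothetical, as are the enclosing object, the nested `WordDist` stand-in and the sample values; only the arithmetic is taken from the question.

```scala
// Hypothetical wrapper so the sketch is self-contained; values are made up.
object RefactorSketch {
  // Assumed stand-in for the WordDist object referenced in the question.
  object WordDist {
    val overall_word_probs: Map[String, Double] = Map("dog" -> 0.25)
    val globally_unseen_word_prob: Double = 0.5
    val num_unseen_word_types: Int = 10
  }

  val counts: Map[String, Int] = Map("the" -> 6, "cat" -> 2)
  val total_tokens: Int = 8
  val unseen_mass: Double = 0.5
  val overall_unseen_mass: Double = 0.5

  // Word was counted in this distribution.
  def seenProb(wordcount: Int): Double =
    wordcount.toDouble / total_tokens * (1.0 - unseen_mass)

  // Word is missing here but has a global probability.
  def knownElsewhereProb(owprob: Double): Double =
    unseen_mass * owprob / overall_unseen_mass

  // Word has never been seen anywhere.
  def globallyUnseenProb: Double =
    unseen_mass * WordDist.globally_unseen_word_prob / WordDist.num_unseen_word_types

  // Fallback used when the word is not in `counts`.
  def unseenHereProb(word: String): Double =
    WordDist.overall_word_probs.get(word)
      .map(knownElsewhereProb)
      .getOrElse(globallyUnseenProb)

  def prob(word: String): Double =
    counts.get(word).map(seenProb).getOrElse(unseenHereProb(word))
}
```

Because getOrElse takes its argument by name, `unseenHereProb(word)` is only evaluated when the lookup in `counts` misses, so the second hash lookup is skipped for words that were counted.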