Curious about HashTable performance issues

I read that hash tables in Haskell had performance issues (on the Haskell-Cafe in 2006 and Flying Frog Consultancy's blog in 2009), and since I like Haskell it worried me.

That was a year ago, what is the status now (June 2010)? Has the "hash table problem" been fixed in GHC?

2 Answers

活雷疯

The problem was that the garbage collector is required to traverse mutable arrays of pointers ("boxed arrays") looking for pointers to data that might be ready to deallocate. Boxed, mutable arrays are the main mechanism for implementing a hashtable, so that particular structure exposed the GC traversal issue. This is common to many languages. The symptom is excessive garbage collection (up to 95% of time spent in GC).
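
To make the boxed/unboxed distinction concrete, here is a minimal sketch using the standard array package (the array size and values are arbitrary): a boxed IOArray holds one pointer per slot, which the collector had to scan, while an unboxed IOUArray holds raw values it can skip entirely.

import Data.Array.IO

main :: IO ()
main = do
  -- Boxed: each slot is a pointer to a heap object (here, a
  -- Maybe Int constructor), so the GC must consider every slot.
  boxed <- newArray (0, 999999) Nothing :: IO (IOArray Int (Maybe Int))
  writeArray boxed 0 (Just 42)
  -- Unboxed: each slot is a raw machine Int with no pointer, so
  -- the GC never traverses the payload at all.
  unboxed <- newArray (0, 999999) 0 :: IO (IOUArray Int Int)
  writeArray unboxed 0 42
  print =<< readArray boxed 0
  print =<< readArray unboxed 0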

The fix was to implement "card marking" in the GC for mutable arrays of pointers, which occurred in late 2009. You shouldn't see excessive GC when using mutable arrays of pointers in Haskell now. In simple benchmarks, hashtable insertion for large hashes improved by 10x.
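
To sketch the idea behind card marking (a toy model for exposition only: GHC's real implementation lives in the C runtime system, and cardSize below is a made-up parameter): the array is divided into fixed-size "cards", each write marks the card containing the written slot as dirty, and the collector rescans only the dirty cards instead of the whole array.

import Data.IORef
import qualified Data.IntSet as IS

-- Hypothetical card width, in array slots.
cardSize :: Int
cardSize = 128

-- Writing slot i dirties card (i `div` cardSize).
markWrite :: IORef IS.IntSet -> Int -> IO ()
markWrite dirty i = modifyIORef' dirty (IS.insert (i `div` cardSize))

main :: IO ()
main = do
  dirty <- newIORef IS.empty
  -- Three writes that land on only two distinct cards...
  mapM_ (markWrite dirty) [5, 100, 4000]
  -- ...so a collection rescans 2 cards, not the full array.
  print . IS.toList =<< readIORef dirty   -- prints [0,31]

This is why a single write no longer forces the collector to walk millions of untouched slots on every collection.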

Note that the GC walking issue doesn't affect purely functional structures, nor unboxed arrays (like most data-parallel arrays, or vector-like arrays, in Haskell). Nor does it affect hashtables stored on the C heap (like judy). Meaning that it didn't affect day-to-day Haskellers not using imperative hash tables.

If you are using hashtables in Haskell, you shouldn't observe any issue now. Here, for example, is a simple hashtable program that inserts 10 million ints into a hash. I'll do the benchmarking, since the original citation doesn't present any code or benchmarks.

import Control.Monad
import qualified Data.HashTable as H
import System.Environment

main = do
  [size] <- fmap (fmap read) getArgs        -- read the element count from argv
  m <- H.new (==) H.hashInt                 -- new mutable hash table keyed by Int
  forM_ [1..size] $ \n -> H.insert m n n    -- insert size key/value pairs
  v <- H.lookup m 100                       -- probe a single key
  print v

With GHC 6.10.2, before the fix, inserting 10M ints:

$ time ./A 10000000 +RTS -s
...
47s.

With GHC 6.13, after the fix:

./A 10000000 +RTS -s 
...
8s

Increasing the default heap area:

./A 10000000 +RTS -s -A2G
...
2.3s

Avoiding hashtables and using an IntMap:

import Control.Monad
import Data.List
import qualified Data.IntMap as I
import System.Environment

main = do
  [size] <- fmap (fmap read) getArgs
  -- build a purely functional IntMap via a strict left fold
  let k = foldl' (\m n -> I.insert n n m) I.empty [1..size]
  print $ I.lookup 100 k

And we get:

$ time ./A 10000000 +RTS -s
...
6s

Or, alternatively, using a judy array (which is a Haskell wrapper calling C code through the foreign-function interface):

import Control.Monad
import System.Environment
import qualified Data.Judy as J

main = do
  [size] <- fmap (fmap read) getArgs
  j <- J.new :: IO (J.JudyL Int)                          -- judy array lives on the C heap
  forM_ [1..size] $ \n -> J.insert (fromIntegral n) n j   -- judy keys are Words
  print =<< J.lookup 100 j

Running this,

$ time ./A 10000000 +RTS -s
...
2.1s

So, as you can see, the GC issue with hashtables is fixed, and there have always been other libraries and data structures which were perfectly suitable. In summary, this is a non-issue.

Note: as of 2013, you should probably just use the hashtables package, which supports a range of mutable hashtables natively.
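
As an illustration, a version of the benchmark above written against the hashtables package might look like this (a sketch using its Data.HashTable.IO module and the BasicHashTable type; consult the package documentation for the current API):

import Control.Monad
import qualified Data.HashTable.IO as H
import System.Environment

main = do
  [size] <- fmap (fmap read) getArgs
  -- BasicHashTable: a mutable hash table that lives in IO
  m <- H.new :: IO (H.BasicHashTable Int Int)
  forM_ [1..size] $ \n -> H.insert m n n
  print =<< H.lookup m 100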

一抹淡然

A question like this can really be settled only by experiment. But if you don't have the time or money to do experiments, you have to ask other people what they think. When you do so, you might want to consider the source and consider whether the information given has been reviewed or vetted in any way.

Jon Harrop has advanced some interesting claims about Haskell. Let me suggest that you search on Google Groups and elsewhere for evidence of Harrop's expertise in Haskell, Lisp, and other functional languages. You could also read the work by Chris Okasaki and Andy Gill on Patricia trees in Haskell, and see how their expertise is regarded. You can also find out whose claims, if any, have been checked by a third party. Then you can make up your own mind how seriously to take different people's claims about the performance of different functional languages.

Oh, and don't feed the troll.


P.S. It would be quite reasonable for you to do your own experiments, but perhaps not necessary, since the trusty Don Stewart presents some nice microbenchmarks in his fine answer. Here's an addendum to Don's answer:


Addendum: Using Don Stewart's code on an AMD Phenom 9850 Black Edition clocked at 2.5GHz with 4GB RAM, in 32-bit mode, with ghc -O,

  • With the default heap, the IntMap is 40% faster than the hash table.
  • With the 2G heap, the hash table is 40% faster than the IntMap.
  • If I go to ten million elements with the default heap, the IntMap is four times faster than the hash table (CPU time) or twice as fast by wall-clock time.

I'm a little surprised by this result, but reassured that functional data structures perform pretty well. And confirmed in my belief that it really pays to benchmark your code under the actual conditions in which it's going to be used.
