Curious about HashTable performance issues

I read that hash tables in Haskell had performance issues (on the Haskell-Cafe in 2006 and Flying Frog Consultancy's blog in 2009), and since I like Haskell it worried me.

That was a year ago, what is the status now (June 2010)? Has the "hash table problem" been fixed in GHC?

2 Answers

活雷疯

The problem was that the garbage collector is required to traverse mutable arrays of pointers ("boxed arrays") looking for pointers to data that might be ready to deallocate. Boxed, mutable arrays are the main mechanism for implementing a hashtable, so that particular structure exposed the GC traversal issue. This is common to many languages. The symptom is excessive garbage collection (up to 95% of time spent in GC).
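
To make the boxed/unboxed distinction concrete, here is a minimal sketch using the standard array package (the array size and values are arbitrary): a boxed IOArray holds one pointer per slot, which the collector had to scan, while an unboxed IOUArray holds raw values it can skip entirely.

import Data.Array.IO

main :: IO ()
main = do
  -- Boxed: each slot is a pointer to a heap object (here, a
  -- Maybe Int constructor), so the GC must consider every slot.
  boxed <- newArray (0, 999999) Nothing :: IO (IOArray Int (Maybe Int))
  writeArray boxed 0 (Just 42)
  -- Unboxed: each slot is a raw machine Int with no pointer, so
  -- the GC never traverses the payload at all.
  unboxed <- newArray (0, 999999) 0 :: IO (IOUArray Int Int)
  writeArray unboxed 0 42
  print =<< readArray boxed 0
  print =<< readArray unboxed 0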

The fix was to implement "card marking" in the GC for mutable arrays of pointers, which occurred in late 2009. You shouldn't see excessive GC when using mutable arrays of pointers in Haskell now. In simple benchmarks, hashtable insertion for large hashes improved by 10x.
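
To sketch the idea behind card marking (a toy model for exposition only: GHC's real implementation lives in the C runtime system, and cardSize below is a made-up parameter): the array is divided into fixed-size "cards", each write marks the card containing the written slot as dirty, and the collector rescans only the dirty cards instead of the whole array.

import Data.IORef
import qualified Data.IntSet as IS

-- Hypothetical card width, in array slots.
cardSize :: Int
cardSize = 128

-- Writing slot i dirties card (i `div` cardSize).
markWrite :: IORef IS.IntSet -> Int -> IO ()
markWrite dirty i = modifyIORef' dirty (IS.insert (i `div` cardSize))

main :: IO ()
main = do
  dirty <- newIORef IS.empty
  -- Three writes that land on only two distinct cards...
  mapM_ (markWrite dirty) [5, 100, 4000]
  -- ...so a collection rescans 2 cards, not the full array.
  print . IS.toList =<< readIORef dirty   -- prints [0,31]

This is why a single write no longer forces the collector to walk millions of untouched slots on every collection.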

Note that the GC walking issue doesn't affect purely functional structures, nor unboxed arrays (like most data-parallel arrays, or vector-like arrays, in Haskell). Nor does it affect hashtables stored on the C heap (like judy). Meaning that it didn't affect day-to-day Haskellers not using imperative hash tables.

If you are using hashtables in Haskell, you shouldn't observe any issue now. Here, for example, is a simple hashtable program that inserts 10 million ints into a hash. I'll do the benchmarking, since the original citation doesn't present any code or benchmarks.

import Control.Monad
import qualified Data.HashTable as H
import System.Environment

main = do
  [size] <- fmap (fmap read) getArgs        -- read the element count from argv
  m <- H.new (==) H.hashInt                 -- new mutable hash table keyed by Int
  forM_ [1..size] $ \n -> H.insert m n n    -- insert size key/value pairs
  v <- H.lookup m 100                       -- probe a single key
  print v

With GHC 6.10.2, before the fix, inserting 10M ints:

$ time ./A 10000000 +RTS -s
...
47s.

With GHC 6.13, after the fix:

./A 10000000 +RTS -s 
...
8s

Increasing the default heap area:

./A 10000000 +RTS -s -A2G
...
2.3s

Avoiding hashtables and using an IntMap:

import Control.Monad
import Data.List
import qualified Data.IntMap as I
import System.Environment

main = do
  [size] <- fmap (fmap read) getArgs
  -- build a purely functional IntMap via a strict left fold
  let k = foldl' (\m n -> I.insert n n m) I.empty [1..size]
  print $ I.lookup 100 k

And we get:

$ time ./A 10000000 +RTS -s
...
6s

Or, alternatively, using a judy array (which is a Haskell wrapper calling C code through the foreign-function interface):

import Control.Monad
import System.Environment
import qualified Data.Judy as J

main = do
  [size] <- fmap (fmap read) getArgs
  j <- J.new :: IO (J.JudyL Int)                          -- judy array lives on the C heap
  forM_ [1..size] $ \n -> J.insert (fromIntegral n) n j   -- judy keys are Words
  print =<< J.lookup 100 j

Running this,

$ time ./A 10000000 +RTS -s
...
2.1s

So, as you can see, the GC issue with hashtables is fixed, and there have always been other libraries and data structures which were perfectly suitable. In summary, this is a non-issue.

Note: as of 2013, you should probably just use the hashtables package, which supports a range of mutable hashtables natively.
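
As an illustration, a version of the benchmark above written against the hashtables package might look like this (a sketch using its Data.HashTable.IO module and the BasicHashTable type; consult the package documentation for the current API):

import Control.Monad
import qualified Data.HashTable.IO as H
import System.Environment

main = do
  [size] <- fmap (fmap read) getArgs
  -- BasicHashTable: a mutable hash table that lives in IO
  m <- H.new :: IO (H.BasicHashTable Int Int)
  forM_ [1..size] $ \n -> H.insert m n n
  print =<< H.lookup m 100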

一抹淡然

A question like this can really be settled only by experiment. But if you don't have the time or money to do experiments, you have to ask other people what they think. When you do so, you might want to consider the source and consider whether the information given has been reviewed or vetted in any way.

Jon Harrop has advanced some interesting claims about Haskell. Let me suggest that you search on Google Groups and elsewhere for evidence of Harrop's expertise in Haskell, Lisp, and other functional languages. You could also read the work by Chris Okasaki and Andy Gill on Patricia trees in Haskell, and see how their expertise is regarded. You can also find out whose claims, if any, have been checked by a third party. Then you can make up your own mind how seriously to take different people's claims about the performance of different functional languages.

Oh, and don't feed the troll.


P.S. It would be quite reasonable for you to do your own experiments, but perhaps not necessary, since the trusty Don Stewart presents some nice microbenchmarks in his fine answer. Here's an addendum to Don's answer:


Addendum: Using Don Stewart's code on an AMD Phenom 9850 Black Edition clocked at 2.5GHz with 4GB RAM, in 32-bit mode, with ghc -O,

  • With the default heap, the IntMap is 40% faster than the hash table.
  • With the 2G heap, the hash table is 40% faster than the IntMap.
  • If I go to ten million elements with the default heap, the IntMap is four times faster than the hash table (CPU time) or twice as fast by wall-clock time.

I'm a little surprised by this result, but reassured that functional data structures perform pretty well. And confirmed in my belief that it really pays to benchmark your code under the actual conditions in which it's going to be used.
