Why does Tokyo Tyrant still slow down exponentially after adjusting bnum?
Has anyone successfully used Tokyo Cabinet / Tokyo Tyrant with large datasets? I am trying to upload a subgraph of the Wikipedia data source. After hitting about 30 million records, I get an exponential slowdown. This occurs with both the HDB and BDB databases. I adjusted bnum to 2-4x the expected number of records for the HDB case, with only a slight speed-up. I also set xmsiz to 1GB or so, but ultimately I still hit a wall.
It seems that Tokyo Tyrant is basically an in-memory database, and after you exceed xmsiz or your RAM, you get a barely usable database. Has anyone else encountered this problem before? Were you able to solve it?
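For reference, the tuning described above can be sketched with the local Tokyo Cabinet C API (with Tokyo Tyrant itself the same parameters are normally passed in the database name given to ttserver, e.g. casket.tch#bnum=...#xmsiz=...). This is only a minimal sketch under the question's assumptions; the file name and the concrete numbers are placeholders, not recommendations.

    /* Bulk-load sketch: tune bnum and xmsiz before opening the HDB. */
    #include <tchdb.h>
    #include <stdio.h>

    int main(void) {
      TCHDB *hdb = tchdbnew();

      /* bnum: number of hash buckets, assumed here to be ~2x the
         expected record count (60M buckets for ~30M records). */
      if (!tchdbtune(hdb, 60000000LL, -1, -1, 0))
        fprintf(stderr, "tune error: %s\n", tchdberrmsg(tchdbecode(hdb)));

      /* xmsiz: extra mapped memory, given in bytes (~1GB here). */
      if (!tchdbsetxmsiz(hdb, 1024LL * 1024 * 1024))
        fprintf(stderr, "xmsiz error: %s\n", tchdberrmsg(tchdbecode(hdb)));

      if (!tchdbopen(hdb, "casket.tch", HDBOWRITER | HDBOCREAT)) {
        fprintf(stderr, "open error: %s\n", tchdberrmsg(tchdbecode(hdb)));
        return 1;
      }

      /* ... bulk-load loop would go here, e.g. tchdbput2(hdb, key, value) ... */

      tchdbclose(hdb);
      tchdbdel(hdb);
      return 0;
    }

(Link against the library, e.g. gcc bulkload.c -ltokyocabinet.)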
4 Answers
I think I may have cracked this one, and I haven't seen this solution anywhere else. On Linux, there are generally two reasons that Tokyo starts to slow down. Let's go through the usual culprits. First, you may have set your bnum too low: you want it to be at least equal to half the number of items in the hash. (Preferably more.) Second, you want to set your xmsiz close to the size of the bucket array. To get the size of the bucket array, just create an empty db with the correct bnum and Tokyo will initialize the file to the appropriate size. (For example, bnum=200000000 is approx 1.5GB for an empty db.)
But now you'll notice that it still slows down, albeit a bit farther along. We found that the trick was to turn off journalling in the filesystem -- for some reason the journalling (on ext3) spikes as your hash file size grows beyond 2-3GB. (The way we realized this was seeing spikes in I/O that did not correspond to changes to the file on disk, alongside CPU bursts from the kjournald daemon.)
For Linux, just unmount your ext3 partition and remount it as ext2. Build your db, and remount as ext3. With journalling disabled we could build 180M-key databases without a problem.
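In case it is useful, here is a rough sketch of the "empty database" trick in the C API, assuming the bnum=200000000 figure above as the target; the reported file size of the freshly created db is then a starting point for xmsiz. (HDBTLARGE, the 64-bit bucket-array option, is included on the assumption that the file will exceed 2GB.)

    /* Create an empty HDB with the target bnum and print its file size. */
    #include <tchdb.h>
    #include <stdio.h>
    #include <inttypes.h>

    int main(void) {
      TCHDB *hdb = tchdbnew();

      /* Tune only bnum (plus the "large" option), then open the db so the
         bucket array is written out to disk. */
      tchdbtune(hdb, 200000000LL, -1, -1, HDBTLARGE);
      if (!tchdbopen(hdb, "empty.tch", HDBOWRITER | HDBOCREAT)) {
        fprintf(stderr, "open error: %s\n", tchdberrmsg(tchdbecode(hdb)));
        return 1;
      }

      /* The file now holds just the header and the bucket array (roughly
         1.5GB per the answer above); use this as a guide for xmsiz. */
      printf("empty db size: %" PRIu64 " bytes\n", tchdbfsiz(hdb));

      tchdbclose(hdb);
      tchdbdel(hdb);
      return 0;
    }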
Tokyo scales wonderfully!! But you have to set your bnum and xmsiz appropriately. bnum should be 0.025 to 4 times the number of records you are planning to store. xmsiz should match the size of the bucket array. Also set opts=l if you are planning to store more than 2GB.
See Greg Fodor's post above about getting the right value for xmsiz. Be careful to note that when setting xmsiz the value is in bytes.
Finally, if you are using a disk-based hash it is very, very, VERY important to turn off journaling on the filesystem that the Tokyo data lives on. This is true for Linux and Mac OS X, and probably Windows, though I have not tested it there yet.
If journaling is turned on you will see severe drops in performance as you approach 30+ million rows. With journaling turned off and the other options set appropriately, Tokyo is a great tool.
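Putting those settings together, a minimal sketch in the C API might look like the following; HDBTLARGE appears to be the C-level counterpart of opts=l, and the record count and the 8-bytes-per-bucket estimate are assumptions for illustration only.

    /* Assumed example: ~180M records in a disk-based hash. */
    #include <tchdb.h>
    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
      const int64_t expected_records = 180000000LL;

      TCHDB *hdb = tchdbnew();

      /* bnum on the order of the record count, opts=l (HDBTLARGE) so the
         file can grow past 2GB. */
      tchdbtune(hdb, expected_records, -1, -1, HDBTLARGE);

      /* xmsiz is in bytes: roughly the bucket-array size, assumed here to
         be about 8 bytes per bucket with the large option. */
      tchdbsetxmsiz(hdb, expected_records * 8);

      if (!tchdbopen(hdb, "wiki.tch", HDBOWRITER | HDBOCREAT)) {
        fprintf(stderr, "open error: %s\n", tchdberrmsg(tchdbecode(hdb)));
        return 1;
      }

      tchdbclose(hdb);
      tchdbdel(hdb);
      return 0;
    }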
I am starting to work on a solution to add sharding to Tokyo Cabinet, called Shardy.
http://github.com/cardmagic/shardy/tree/master
Tokyo Cabinet's key-value store is really good. I think people call it slow because they use Tokyo Cabinet's table-like store.
If you want to store document data, use MongoDB or some other NoSQL engine.