Can Ruby 1.9 use multiple cores to index data in Solr?
I have a Ruby 1.9 / Rails 3.0.7 application that uses lucid/solr to index a large amount of text data (3GB or so). The data is stored in a MongoDB database and consists mainly of emails.
One issue I'm having is that I index the entire data set when I first set up the application, so that it can be searched. This process will actually be repeated quite often, so I have to figure out how to index the entire MongoDB database into Solr quickly and efficiently. According to the Solr docs, one of the main ways to speed up indexing is to use multiple cores. I ran the index on a single-core VM and it took about 1 hour to index the data I have. When I moved it to a 4-core VM and ran it again, it also took about 1 hour. I didn't notice any discernible difference between the two.
This leads me to suspect that Ruby 1.9 may not be able to use multiple cores properly. I'm using an Ubuntu 10.10 Linux VM.
I've read some posts saying that Ruby 1.9 handles multiple cores differently from 1.8, but I admit this is not an area I know much about.
Does anyone know whether Ruby 1.9 can actually take advantage of multiple cores when indexing large amounts of data in Solr?
Comments (1)
According to this question and this one, Ruby 1.9 threads can run on all the cores, but only while a thread has released the Global VM Lock (GVL, also called the Giant VM Lock). Pure Ruby code holds that lock, so it executes on one core at a time.
Since whether the lock gets released probably depends on the gems (and thus the C extensions) you're using, I would suggest doing some testing to check whether your indexing actually uses all the cores. If it doesn't, consider moving to JRuby, which should use all the cores out of the box.
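One rough way to run such a test is to time the same CPU-bound work serially, in threads, and in forked processes. The sketch below is only an illustration: `busy_work`, `WORKERS`, and `N` are made-up names standing in for your real per-document indexing work, and `fork` assumes a Unix-like system such as your Ubuntu VM.

```ruby
require 'benchmark'

# Hypothetical stand-in for CPU-bound, per-document work.
# Replace it with a call into your real indexing code to test that instead.
def busy_work(n)
  sum = 0
  n.times { |i| sum += i * i }
  sum
end

WORKERS = 4          # assumed number of cores on the VM
N       = 2_000_000  # arbitrary amount of work per worker

# Baseline: run every unit of work one after another.
def run_serial
  Array.new(WORKERS) { busy_work(N) }
end

# One Ruby thread per unit of work. On MRI 1.9 pure Ruby code holds the
# GVL, so this is expected to take about as long as the serial run.
def run_threads
  Array.new(WORKERS) { Thread.new { busy_work(N) } }.map(&:value)
end

# One forked process per unit of work. Each child has its own interpreter
# and its own lock, so the operating system can schedule them on separate cores.
def run_processes
  readers = Array.new(WORKERS) do
    reader, writer = IO.pipe
    fork do
      reader.close
      writer.write(busy_work(N).to_s)
      writer.close
      exit!(0)
    end
    writer.close
    reader
  end
  results = readers.map do |reader|
    value = reader.read.to_i
    reader.close
    value
  end
  Process.waitall
  results
end

if __FILE__ == $PROGRAM_NAME
  puts "serial:    #{Benchmark.realtime { run_serial }.round(2)}s"
  puts "threads:   #{Benchmark.realtime { run_threads }.round(2)}s"
  puts "processes: #{Benchmark.realtime { run_processes }.round(2)}s"
end
```

If the threaded time is close to the serial time while the forked run is clearly faster on the 4-core VM, the work is being held back by the lock. In that case, splitting the documents across several worker processes (or switching to JRuby) is more likely to help than adding threads. Watching `top` while the real indexing job runs gives a similar hint: a single Ruby process stuck near 100% of one core suggests the same thing.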
I know that this is not a definitive answer, but I hope it helps you to find out a solution.