What is the fastest full-text search today?
Spoiler:
This is just another Lucene vs. Sphinx vs. whatever question.
I saw that all the other threads were almost two years old, so I decided to start a new one.
Here are the requirements:
Data size: 10 GB max.
Rows: nearly billions.
Indexing should be fast.
Searching should take under 0 ms (OK, joke... laugh... but keep it as low as possible).
In today's world, what should I use, and how do I go about it?
Edit:
I did some timing with Lucene: indexing 1.8 GB of data took 5 minutes.
Searching is pretty fast unless I run a prefix query such as a*, which takes 400 to 500 ms.
My biggest worry is indexing, which takes a long time and a lot of resources!
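One likely reason the a* query stands out: a prefix query has to be expanded into every indexed term that starts with that prefix, and the postings of all those terms are then merged. The toy index below is only a sketch of that idea in plain Python, not Lucene's implementation; `build_index` and `prefix_search` are hypothetical names made up for this illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(docs):
    """Build a toy inverted index: sorted term list plus term -> doc ids."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            postings[term].add(doc_id)
    return sorted(postings), postings


def prefix_search(terms, postings, prefix):
    """Return (matching doc ids, number of terms the prefix expanded to)."""
    hits = set()
    expanded = 0
    # In a sorted term list, all terms sharing a prefix are contiguous,
    # so binary search finds the first candidate and we scan from there.
    i = bisect_left(terms, prefix)
    while i < len(terms) and terms[i].startswith(prefix):
        hits |= postings[terms[i]]  # one postings merge per expanded term
        expanded += 1
        i += 1
    return hits, expanded


docs = ["apple banana", "avocado apple", "banana cherry", "apricot"]
terms, postings = build_index(docs)

# A short prefix touches many terms; a longer one touches few.
broad_hits, broad_terms = prefix_search(terms, postings, "a")
narrow_hits, narrow_terms = prefix_search(terms, postings, "apple")
```

With close to a billion rows, a one-letter prefix can expand to a very large number of terms, so requiring a longer minimum prefix is a common way to keep such queries cheap.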
Comments (3)
I have no experience with anything other than Lucene. It's pretty much the default indexing solution, so I don't think you can go too wrong with it.
10 GB is not a lot of data. You'll be able to re-index it pretty rapidly, or you can keep it on SSDs for extra speed. And of course you can keep the whole index in RAM (which Lucene supports) for super-fast lookups.
Please check the Lucene wiki for tips on improving Lucene indexing speed. It is quite succinct. In general, Lucene is quite fast (it is used for real-time search). The tips are handy for figuring out whether you are missing something "obvious."
Take a look at Lusql. We used it once and, FWIW, 100 GB of data from MySQL took a little more than an hour to index on a decent machine, on a regular filesystem (NTFS).
If you add an SSD or some other ultra-fast disk technology, you can bring that down considerably.
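For a rough sense of scale, the timings quoted in this thread can be turned into throughput figures. This is back-of-the-envelope arithmetic only, under loud assumptions: 1 GB is taken as 1024 MB, indexing time is assumed to scale linearly with data size, and "a little more than an hour" is rounded down to 60 minutes, so the Lusql rate is an upper bound. The two runs were on different machines and data, so the numbers are not a benchmark. The function names are made up for this sketch.

```python
def throughput_mb_per_s(gigabytes, minutes):
    """Average indexing rate in MB/s, assuming 1 GB = 1024 MB."""
    return gigabytes * 1024 / (minutes * 60)


def estimated_minutes(gigabytes, mb_per_s):
    """Time to index `gigabytes` at a constant rate (linear-scaling assumption)."""
    return gigabytes * 1024 / mb_per_s / 60


# Figures reported in the thread.
lucene_rate = throughput_mb_per_s(1.8, 5)   # question: 1.8 GB in 5 minutes
lusql_rate = throughput_mb_per_s(100, 60)   # answer: 100 GB in roughly an hour

# Projected time for the 10 GB target at each rate.
at_lucene_rate = estimated_minutes(10, lucene_rate)
at_lusql_rate = estimated_minutes(10, lusql_rate)

print(f"question's run: {lucene_rate:.1f} MB/s, 10 GB in ~{at_lucene_rate:.0f} min")
print(f"Lusql report:   {lusql_rate:.1f} MB/s, 10 GB in ~{at_lusql_rate:.0f} min")
```

On these assumptions the question's run works out to about 6 MB/s, or a bit under half an hour for the full 10 GB, while the Lusql report corresponds to roughly 28 MB/s. A gap of that size suggests there may be room to tune the indexing setup before reaching for faster disks.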