Search engine architecture based on a distributed key/value store?
Is anyone aware of any links, papers, presentations, or blog posts that describe a large-scale full-text search engine built upon a distributed key/value store?
I'm particularly interested in the organization of the index. What, exactly, is the data structure? Where and how are dictionaries and postings stored? What is the workflow for query processing? How are queries handled in such a way that it's not necessary to haul massive amounts of data across the network?
I gather that Blekko is built this way. I'd like to know what they, or their competitors, actually did.
Comments (2)
I'm not aware of a blog post or article that answers your question exactly. However, here are some resources that I think are relevant to it, and I hope they help you distill an answer.
Firstly, Jeff Dean's keynotes on the evolution of Google's architecture,
Next, there's an open-source search engine called Lucandra that is built on top of a K-V store. As the name suggests, it is Lucene on top of Cassandra, both being Apache projects.
To understand how Lucandra works, check out the implementation and the presentations that discuss how Lucene indexes Cassandra data.
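To make the general idea concrete, here is a minimal sketch of an inverted index laid out on a key/value store, assuming one key per term and a sorted posting list as the value. This is not Lucandra's actual schema: `ShardedKV`, the `term:` key prefix, and the function names are hypothetical, and the "cluster" is just a list of in-process dicts.

```python
import zlib
from collections import defaultdict


class ShardedKV:
    """Toy stand-in for a distributed key/value store (hypothetical)."""

    def __init__(self, num_shards=4):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard(self, key):
        # Route each key to a shard with a deterministic hash.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def get(self, key, default=None):
        return self._shard(key).get(key, default)

    def put(self, key, value):
        self._shard(key)[key] = value


def index_documents(kv, docs):
    """Store one posting list (sorted doc ids) per term."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    for term, doc_ids in postings.items():
        kv.put("term:" + term, sorted(doc_ids))


def search_and(kv, query):
    """Return doc ids containing every query term (one KV read per term)."""
    lists = [kv.get("term:" + t, []) for t in query.lower().split()]
    if not lists:
        return []
    lists.sort(key=len)  # start from the rarest term
    result = set(lists[0])
    for plist in lists[1:]:
        result &= set(plist)
    return sorted(result)


docs = {
    1: "distributed key value store",
    2: "full text search engine",
    3: "search engine on a key value store",
}
kv = ShardedKV()
index_documents(kv, docs)
print(search_and(kv, "search engine"))
print(search_and(kv, "key value search"))
```

With this term-partitioned layout, a query fetches only the posting lists for its own terms, so the data crossing the network grows with the length of those lists rather than with the size of the whole index. Real systems typically also compress the lists and split very long ones across several keys.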
Similarly, you can see how Lucene and HBase coexist. Here's a link to the Apache commit/patch that integrates a search layer by using one on top of the other,
Another similar article for Redis
Next, check out Operational Requirements for Scalable Search Systems
The CIS lab has some excellent research papers on the subject that you should check out,
For general search engine assumptions that may be made above, here are links to books that will help,
Google MapReduce will probably interest you a great deal.
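As a rough illustration of why it is relevant, building an inverted index is a classic fit for the map/shuffle/reduce programming model. The sketch below imitates that model in a single process; it does not use any real MapReduce API, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    # Emit one (term, doc_id) pair per distinct term in the document.
    for term in set(text.lower().split()):
        yield term, doc_id


def shuffle(pairs):
    # Group values by key, as the framework would do between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(term, doc_ids):
    # Produce the final posting list for one term.
    return term, sorted(doc_ids)


def build_index(docs):
    pairs = (
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    )
    return dict(reduce_phase(t, ids) for t, ids in shuffle(pairs).items())


index = build_index({1: "map reduce", 2: "reduce the index"})
print(index["reduce"])
```

Each reducer's output, a term and its posting list, is already shaped like a key/value pair, so it could be written straight into a K-V store.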