HBase, Hypertable, Lucene
I am using a search system built on Lucene. By default it is not distributed, so I am thinking of moving to something like HBase or Hadoop.
Do solutions like HBase or Hypertable have a built-in search capability, or will I need to implement Lucene on top of them?
Comments (4)
Lucene is very different from BigTable clones like HBase or Hypertable. If you are simply looking for a distributed Lucene, then you should look at projects such as Elastic Search or Katta.
Solr/Lucene can also operate over a cluster, but the partitioning is not automatic. You have to create shards and replicas manually to match the distribution of the data you are searching. If your underlying data is stored in something like HBase, this is much easier to set up, modify, and update.
Fundamentally, HBase and Lucene solve different problems. Lucene is an index that allows keyword and other types of searches to return quickly. HBase is a data repository that can serve individual rows in real time; however, HBase does not have an online query capability. For best results, you have to combine them. One example in this area is Lily (http://outerthought.org/site/products/lily.html).
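To illustrate the "combine them" idea, here is a minimal sketch in plain Python. It uses no Lucene or HBase API: the dictionary `row_store` stands in for a row store such as HBase, and `inverted_index` stands in for a search index such as Lucene. The names `put_row`, `search`, and the `body` field are all hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Toy stand-in for a row store such as HBase: row key -> full record.
row_store = {}

# Toy stand-in for a search index such as Lucene: term -> set of row keys.
inverted_index = defaultdict(set)


def put_row(row_key, record):
    """Store the record, then index its text so it can be found by keyword."""
    row_store[row_key] = record
    for term in record["body"].lower().split():
        inverted_index[term].add(row_key)


def search(query):
    """Return the full records whose body contains every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    # Step 1: the index answers "which rows match?" and returns only keys.
    matching_keys = set.intersection(
        *(inverted_index.get(term, set()) for term in terms)
    )
    # Step 2: the row store serves each matching row by its key.
    return [row_store[key] for key in sorted(matching_keys)]


put_row("doc1", {"title": "Intro", "body": "distributed search with shards"})
put_row("doc2", {"title": "Storage", "body": "distributed row storage"})
results = search("distributed search")
```

The point of the split is that the index only has to return row keys, while the store is responsible for serving the rows themselves. A real deployment would also have to keep the two in sync when rows change, which this sketch does not attempt.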
You may also want to look at Lucandra, which is Lucene with a Cassandra backend:
https://github.com/tjake/Lucandra
Another technology to look at is Katta or Distributed Lucene, which can operate over HDFS.
Lucene provides two main features: structured search and full-text search. HBase doesn't provide either of them. Structured search can be done with HBase in a relatively easy way; I think that is what Lily does. Rebuilding full-text search would be more difficult. To scale Lucene, you can still try to partition your index on an attribute that splits your data into separate areas (you won't be able to do cross-area searches). Then you can have one cluster per area.
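The partitioning idea can be sketched roughly as follows, again in plain Python rather than with any Lucene API. The class `PartitionedIndex`, the `area` attribute, and the sample area names are assumptions made up for illustration; each partition here is an in-memory dictionary, where in practice it would be a separate index or cluster.

```python
from collections import defaultdict


class PartitionedIndex:
    """Toy term index split by one attribute ("area"); not a Lucene API."""

    def __init__(self):
        # area -> term -> set of document ids
        self._partitions = defaultdict(lambda: defaultdict(set))

    def add(self, area, doc_id, text):
        """Index a document only in the partition that owns its area."""
        for term in text.lower().split():
            self._partitions[area][term].add(doc_id)

    def search(self, area, term):
        """Search a single partition; other areas are never consulted."""
        partition = self._partitions.get(area)
        if partition is None:
            return []
        return sorted(partition.get(term.lower(), set()))


index = PartitionedIndex()
index.add("europe", "doc1", "train timetable")
index.add("asia", "doc2", "train network")
europe_hits = index.search("europe", "train")
```

Because every query names exactly one area, each partition can live on its own cluster and grow independently. The cost is the one mentioned above: a query for "train" in one area never sees matching documents from another.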