Hive and Lucene
Is it possible to use Hive to query a Lucene index that is distributed over Hadoop?
Comments (4)
Hadapt is a startup whose software bridges Hadoop with a SQL front-end (like Hive) and hybrid storage engines. They offer an archival text search capability that may meet your needs.
Disclaimer: I work for Hadapt.
As far as I know, you can essentially write custom "row-extraction" code in Hive, so I would guess that you could. I've never used Lucene and have barely used Hive, so I can't be sure. If you find a more conclusive answer to your question, please post it!
I know this is a fairly old post, but I thought I could offer a better alternative.
In your case, instead of going through the hassle of mapping your HDFS Lucene index to a Hive schema, it's better to push the data into Pig, because Pig can read flat files. Unless you want a relational way of storing your data, you could process it through Pig and use HBase as your database.
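The Pig route above might look something like the following. This is a minimal sketch under the assumption that the Lucene index's stored fields have first been dumped to a tab-delimited flat file on HDFS; the path, field names, and HBase table name are all hypothetical:

```pig
-- Load the exported Lucene documents as a flat file (hypothetical path/schema).
docs = LOAD '/data/lucene_export/docs.tsv'
       USING PigStorage('\t')
       AS (doc_id:chararray, title:chararray, body:chararray);

-- Example processing step: keep only documents mentioning "hadoop".
hits = FILTER docs BY body MATCHES '.*hadoop.*';

-- Store the results into an HBase table (hypothetical table 'docs_table',
-- column family 'cf'), using Pig's built-in HBaseStorage loader/storer.
STORE hits INTO 'hbase://docs_table'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:title cf:body');
```

The key design point is that Pig never touches the Lucene index format itself — it only sees plain delimited text — which is what avoids the schema-mapping hassle mentioned above.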
You could write a custom InputFormat for Hive to access a Lucene index in Hadoop.
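Once such an InputFormat exists, wiring it into Hive is done in the table DDL. A minimal sketch, assuming a hypothetical class `com.example.hive.LuceneInputFormat` that you would implement yourself (e.g. by extending `org.apache.hadoop.mapred.FileInputFormat` to read the index segments and emit one row per document):

```sql
-- The INPUTFORMAT class below is hypothetical; you would write and jar it
-- yourself, then ADD JAR it into the Hive session before querying.
CREATE EXTERNAL TABLE lucene_docs (
  doc_id STRING,
  title  STRING,
  body   STRING
)
STORED AS
  INPUTFORMAT  'com.example.hive.LuceneInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/indexes/lucene';
```

After that, ordinary HiveQL (`SELECT ... FROM lucene_docs WHERE ...`) runs MapReduce jobs that read the index through your InputFormat.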