Ad hoc reporting with Hadoop
I want to let people enter simple text search terms, run a Pig job (if that's the best option; it's what I know best), and output the results (as TSV files?) so I can show them in a web interface.
Is there anything that already addresses this problem?
Is there anything known that links together the few disjointed pieces of the flow I'm going for?
Thanks
Comments (2)
Why don't you index the docs into Lucene or Solr? Then you can do text search in real time. Hadoop is designed for batch-oriented processing, which doesn't seem to be what you want in this case.
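To make the Solr suggestion concrete, here is a minimal sketch of how a web backend might query Solr's standard `/select` handler once the docs are indexed. The base URL, the core name `docs`, and the field name `text` are assumptions for illustration; they depend on how your Solr instance and schema are set up.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url, core, term, rows=10):
    """Build a Solr /select URL that phrase-searches an assumed `text` field."""
    # Escape backslashes and double quotes so the term stays inside the phrase.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    params = urlencode({"q": f'text:"{escaped}"', "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


def extract_docs(raw_json):
    """Pull the list of matching documents out of a Solr JSON response."""
    return json.loads(raw_json)["response"]["docs"]


def search(base_url, core, term, rows=10):
    """Run the query against a live Solr server and return the matching docs."""
    with urlopen(build_select_url(base_url, core, term, rows), timeout=5) as resp:
        return extract_docs(resp.read().decode("utf-8"))
```

A web handler could then call something like `search("http://localhost:8983/solr", "docs", user_term)` and render the returned documents directly, with no batch job in the request path.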
Well, it depends on your project's requirements: does it need low latency, and how complex are the ad hoc searches? I think HBase + Pig might be a compromise solution. HBase can serve the real-time search side (although its search capabilities are less powerful than an RDBMS's), and Pig can handle batch processing of large amounts of data.
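For the batch side, the glue between a web form and a Pig job can be fairly thin. The sketch below assumes a hypothetical script `search.pig` that reads `$term` and `$out` via Pig's parameter substitution and stores its results with the default tab-delimited `PigStorage`, and it assumes the output directory is readable from the local filesystem (for example, local mode, or after copying it out of HDFS). The single-word restriction on the term is also an assumption made here to keep the substitution safe.

```python
import csv
import glob
import os
import re


def build_pig_command(script_path, term, output_dir):
    """Return the argv list for running a Pig script with the search term."""
    # Assumption: only plain single-word terms are accepted, because the value
    # is substituted textually into the Pig script.
    if not re.fullmatch(r"[A-Za-z0-9_]+", term):
        raise ValueError("search term must be a single alphanumeric word")
    return [
        "pig",
        "-param", f"term={term}",
        "-param", f"out={output_dir}",
        "-f", script_path,
    ]


def read_tsv_results(output_dir):
    """Read every part-* file Pig wrote and return the rows as lists of strings."""
    rows = []
    for path in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        with open(path, newline="", encoding="utf-8") as handle:
            rows.extend(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))
    return rows
```

The command list can be handed to `subprocess.run(..., check=True)` without a shell, and the rows from `read_tsv_results` passed to whatever renders the web page. Since a Pig job usually takes far longer than a web request should, this fits a "submit now, view results later" flow better than an interactive search box.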