What are some ways to run multiple Pig scripts sequentially?
I need to run some Pig scripts sequentially in Hadoop. They must be run separately. Any suggestions?
update
Just a quick update that we're working toward running the Pig scripts from one Java class. Oozie is a possibility that was mentioned in a comment (though much too heavy for our needs). I've also heard that it's possible to orchestrate Pig scripts as a part of a larger job flow in Cascading (http://www.cascading.org/) and looked at that a little.
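Since the update mentions driving the scripts from a single Java class, here is a minimal sketch of that approach: it simply shells out to the `pig` command-line client once per script with `ProcessBuilder` and stops at the first failure. `SequentialPigRunner` and the injectable command name are hypothetical, not part of Pig; for tighter integration you could instead look at Pig's own `PigServer`/`PigRunner` classes.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of Pig's API): launches the `pig`
 * command-line client once per script, in order, and stops at the
 * first non-zero exit code so later scripts never run against a
 * broken intermediate result.
 */
public class SequentialPigRunner {

    private final String pigCommand; // normally "pig"; injectable for dry runs

    public SequentialPigRunner(String pigCommand) {
        this.pigCommand = pigCommand;
    }

    /** Returns true only if every script exits with status 0. */
    public boolean runAll(List<String> scripts) throws Exception {
        for (String script : scripts) {
            Process p = new ProcessBuilder(pigCommand, script)
                    .inheritIO() // stream Pig's console output through
                    .start();
            if (p.waitFor() != 0) {
                System.err.println("Aborting: " + script + " failed");
                return false;
            }
        }
        return true;
    }
}
```

Usage would be something like `new SequentialPigRunner("pig").runAll(List.of("myscript1.pig", "myscript2.pig"))`, which mirrors the `&&`-chained shell approach below but from Java.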
2 Answers
For a simple sequence of tasks, I guess what orangeoctopus suggested would probably suffice. If you want to put together a more complex workflow of Pig and/or plain-vanilla MapReduce jobs, you should probably take a look at Oozie.
Update:
If you are using Pig 0.9, you could also take a look at embedding Pig in a language like Python. Here's the link.
In practice, I wrap the majority of my Pig scripts in bash scripts. You could control the sequential execution inside of the shell script:
pig myscript1.pig && pig myscript2.pig && pig myscript3.pig