Apache Pig questions
I have a few questions about running Pig scripts and the MapReduce jobs they generate.
1. I know that Pig creates logical, physical, and then execution plans before it actually starts executing the MapReduce jobs. I can look at the logical and physical plans with the command explain <alias_name>; but how do I view the execution plan (which I suppose lists the planned map/reduce tasks)? During Pig execution I see that many jobs (map/reduce pairs) are created, and I want to understand what each of these jobs does.
2. Is there a definitive guide I can use to understand the generated plans? The output is difficult to read.
3. I can change the number of map tasks by changing the number of input file blocks. Do I have control over the number of reduce tasks as well? How do I set the number of reducers?
4. What is the default heap size on the mapper/reducer nodes, and which job parameters reflect it? Can I change the heap size with the -Xmx 1024m option? My jobs used to fail when I set the heap memory this way. Maybe there are restrictions on which values can be supplied?
Thanks much!
Comments (2)
1. explain generates several kinds of plans. Give it a directory path instead of a file as the output location to get all 3 plans.
2. No idea.
3. set default_parallel 10 would set the number of reducers to 10.
4. It must be in your Hadoop settings.
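As a rough illustration of the explain suggestion, the sketch below asks explain to write its plans into a directory through the -out option. The input path, field names, aliases, and the output directory plans_out are all hypothetical, and the exact names of the generated files may differ between Pig versions.

```pig
-- Hypothetical input and schema, for illustration only.
logs = LOAD 'input/logs.tsv' AS (user:chararray, bytes:long);
by_user = GROUP logs BY user;
totals = FOREACH by_user GENERATE group AS user, SUM(logs.bytes) AS total_bytes;

-- Write the plans for 'totals' to a path instead of the console.
-- Assumption: 'plans_out' is treated as a directory, so the logical,
-- physical, and MapReduce plans end up in separate files inside it.
explain -out plans_out totals;
```

The MapReduce plan is the part to read when trying to work out which aliases ended up in which job.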
1. explain <alias_name> (a Pig command) shows the physical plan in terms of MapReduce jobs.
2. Aliases are grouped together into MR jobs. In the plan itself you can see which aliases have been grouped into a given MR job.
3. To control the number of reducers, add a PARALLEL <n> clause when writing operators such as JOIN and GROUP BY, or put set default_parallel <n> at the start of the Pig script.
4. This depends on where Pig is running.
If it is MRv1: set mapred.child.java.opts to -Xmx<size>.
In MRv2: set mapreduce.map.memory.mb, and set mapreduce.map.java.opts to -Xmx<size>.
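The reducer and heap settings described above might look like this in a script. This is a sketch under assumptions: the paths, aliases, and all numeric values are hypothetical, the property names are the MRv2 ones, and a cluster administrator may have capped or locked some of them. Note that the heap flag is written without a space (-Xmx1024m); the JVM does not accept "-Xmx 1024m", which may be one reason a job fails at startup.

```pig
-- Script-wide default number of reducers (arbitrary value).
set default_parallel 10;

-- MRv2 (YARN) memory settings; values are illustrative, not recommendations.
-- The container size should be larger than the JVM heap it hosts.
set mapreduce.map.memory.mb '1536';
set mapreduce.map.java.opts '-Xmx1024m';
set mapreduce.reduce.memory.mb '3072';
set mapreduce.reduce.java.opts '-Xmx2048m';

-- On MRv1 a single property covers both map and reduce JVMs:
-- set mapred.child.java.opts '-Xmx1024m';

-- Hypothetical input and schema, for illustration only.
logs = LOAD 'input/logs.tsv' AS (user:chararray, bytes:long);

-- Per-operator override: this GROUP asks for 20 reducers instead of the default 10.
by_user = GROUP logs BY user PARALLEL 20;
totals = FOREACH by_user GENERATE group AS user, SUM(logs.bytes) AS total_bytes;
STORE totals INTO 'output/totals';
```

PARALLEL only affects operators that trigger a reduce phase; the number of map tasks is still driven by the input splits.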