Reducers running out of heap memory

Posted on 2024-12-24 18:18:21

I have a few Pig scripts that keep dying in the reduce phase of the job with errors saying the Java heap has run out of space. So far my only solution has been to increase the reducer count, but that doesn't seem to be getting me anywhere reliable. Part of this may just be the massive growth in the data we are getting, but I can't be sure.

I've thought about changing the spill threshold setting (I can't recall its name), but I'm not sure whether that would help at all or just slow things down. What else can I look at to solve this issue?

On a side note, when this starts happening I also occasionally get errors about bash failing to allocate memory, for what I assume is the spill operation. Would this be the Hadoop node running out of memory? If so, would just turning down the heap size on these boxes be the solution?

Edit 1
1) Pig 0.8.1
2) The only UDF is an eval udf that just looks at single rows with no bags or maps.
3) I haven't noticed any hotspots caused by bad key distribution. I have also been using a prime number of reducers to reduce this issue.

Edit 2
Here is the error in question:

2012-01-04 09:58:11,179 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201112070707_75699_r_000054_1 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)

Here is the bash error I keep getting:

java.io.IOException: Task: attempt_201112070707_75699_r_000054_0 - The reduce copier failed
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:380)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.IOException: Cannot run program "bash": java.io.IOException: error=12, Cannot allocate memory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
at org.apache.hadoop.util.Shell.run(Shell.java:134)
at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
at org.apache.hadoop.mapred.MapOutputFile.getInputFileForWrite(MapOutputFile.java:160)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2537)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2501)


Comments (2)

放我走吧 2024-12-31 18:18:21

Obviously you are running out of memory somewhere. Increasing the number of reducers is actually quite reasonable. Take a look at the stats on the JobTracker Web GUI and see how many bytes are going out of the mapper. Divide that by the number of reduce tasks, and that is a pretty rough estimate of what each reducer is getting. Unfortunately, this only works in the long run if your keys are evenly distributed.
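As a rough sketch of that arithmetic: the byte count and reducer count below are made-up numbers, not values from this job, and the helper name `bytes_per_reducer` is purely illustrative. Plug in the map output byte counter you read off the JobTracker page.

```python
def bytes_per_reducer(map_output_bytes: int, num_reducers: int) -> float:
    """Rough average shuffle input per reducer, assuming evenly distributed keys."""
    if num_reducers <= 0:
        raise ValueError("num_reducers must be positive")
    return map_output_bytes / num_reducers


# Hypothetical numbers: 200 GiB of map output spread over 199 reducers.
map_output_bytes = 200 * 1024**3
estimate = bytes_per_reducer(map_output_bytes, 199)
print(f"~{estimate / 1024**2:.0f} MiB per reducer")
```

If that average is already close to the reducer's heap size, a skewed key will push individual reducers well past it.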

In some cases, JOIN (especially the replicated kind) will cause this type of issue. This happens when you have a "hot spot" of a particular key. For example, say you are doing some sort of join and one of those keys shows up 50% of the time. Whatever reducer gets lucky to handle that key is going to get clobbered. You may want to investigate which keys are causing hot spots and handle them accordingly. In my data, usually these hot spots are useless anyways. To find out what's hot, just do a GROUP BY and COUNT and figure out what's showing up a lot. Then, if it's not useful, just FILTER it out.
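The GROUP BY and COUNT idea can be illustrated outside of Pig on a sample of keys. This is only a sketch in plain Python: the function name `hot_keys`, the 20% threshold, and the sample data are all assumptions made for the example, not anything taken from the job in question.

```python
from collections import Counter


def hot_keys(keys, threshold=0.2):
    """Return (key, share) pairs for keys whose share of all records exceeds threshold."""
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(k, c / total) for k, c in counts.most_common() if c / total > threshold]


# Hypothetical sample of join keys: "unknown" makes up half of the records.
sample = ["unknown"] * 5 + ["a", "b", "c", "d", "e"]
print(hot_keys(sample))  # [('unknown', 0.5)]
```

Any key that shows up in this list is a candidate for filtering out before the join, if it carries no useful information.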

Another source of this problem is a Java UDF that is aggregating way too much data. For example, if you have a UDF that goes through a data bag and collects the records into some sort of list data structure, you may be blowing your memory with a hot spot value.
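To make the difference concrete, here is a minimal sketch in plain Python rather than a real Pig UDF. The "bag" is just an iterable standing in for the records of one group, and both function names are hypothetical. The first version holds every record of the group in memory at once; the second holds only a counter.

```python
def collect_then_count(bag):
    """Memory-hungry pattern: copies every record of the bag into a list first."""
    records = list(bag)  # grows with the size of the group
    return len(records)


def streaming_count(bag):
    """Same result, but only a single counter is held in memory."""
    count = 0
    for _ in bag:
        count += 1
    return count


# Hypothetical bag: a generator standing in for the records of one hot key.
big_bag = (("hot_key", i) for i in range(100_000))
print(streaming_count(big_bag))
```

With a hot key, the list in the first version is what grows until the heap gives out.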

I found that the newer versions of Pig (0.8 and 0.9 in particular) have far fewer memory issues. I had quite a few instances of running out of heap in 0.7. The newer versions have much better spill-to-disk detection, so if a task is about to blow the heap, Pig is smart enough to spill to disk.


In order for me to be more helpful, you could post your Pig script and also mention what version of Pig you are using.

秋心╮凉 2024-12-31 18:18:21

I'm not an experienced user or anything, but I did run into a similar problem when running Pig jobs on a VM.

My particular problem was that the VM had no swap space configured, so it would eventually run out of memory. I guess you're trying this on a proper Linux configuration, but it wouldn't hurt to run free -m and see what you get. Maybe the problem is that you have too little swap configured.
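If you would rather check this from a script than read the free -m output by eye, something like the following sketch could work. It assumes a Linux box where /proc/meminfo is available; the helper name `swap_total_kb` and the sample text are made up for the example.

```python
def swap_total_kb(meminfo_text: str) -> int:
    """Extract SwapTotal (in kB) from text in /proc/meminfo format."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    raise ValueError("SwapTotal not found")


# Hypothetical sample in /proc/meminfo format for a box with no swap.
sample = (
    "MemTotal:        8167848 kB\n"
    "MemFree:         1409696 kB\n"
    "SwapTotal:             0 kB\n"
    "SwapFree:              0 kB\n"
)
if swap_total_kb(sample) == 0:
    print("no swap configured")
```

On a real node you would pass in the contents of /proc/meminfo instead of the sample string.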

Just a thought, let me know if it helps. Good luck with your problem!
