MapReduce作业运行时,任务可能会失败,报out of memory错误。这个时候可以采用以下几个过程调优
简单粗暴: 加大内存
哪个阶段报错就增加那个阶段的内存。以reduce阶段为例,map阶段的类似
代码语言:javascript复制mapreduce.reduce.memory.mb=5120 //设置reduce container的内存大小
mapreduce.reduce.java.opts=-Xms2000m -Xmx4600m; //设置reduce任务的JVM参数
案例一:copy阶段占用内存过大
有时候将内存设置大不管用,案例如下:
代码语言:javascript复制Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:378)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63) at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:304)
at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:294)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:514) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
原因 这是reduce从map取数据阶段报的错,reduce从map取数阶段使用的buffer可以占到reduce任务最大堆的70%的内存。报错之前copy还在运行,而reduce阶段其他过程占用了超过30%的内存,这个时候copy阶段继续取数,扩展buffer的时候,申请不到内存就报错了
解决方案 设置copy阶段buffer占用的内存大小,将mapreduce.reduce.shuffle.input.buffer.percent设置成0.5甚至更小 参数在wiki上的解释 The percentage of memory to be allocated from the maximum heap size to storing map outputs during the shuffle.
参考
MapReduce任务Shuffle Error错误