Handling the OOM error: GC overhead limit exceeded

This post describes the "GC overhead limit exceeded" error hit while processing a large dataset with Spark, and records how it was resolved by adjusting spark.driver.memory. It also discusses how spark.sql.shuffle.partitions affects the number of output files.


*Keywords: OOM: GC overhead limit exceeded, spark.driver.memory, spark.sql.shuffle.partitions*
Error symptom:
The job failed with the following error under various resource settings:
18/07/26 17:02:03 INFO spark.ContextCleaner: Cleaned accumulator 18
Exception in thread "broadcast-hash-join-1" 18/07/26 17:10:59 WARN nio.NioEventLoop: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: GC overhead limit exceeded

18/07/26 15:11:04 INFO spark.ContextCleaner: Cleaned accumulator 18
Exception in thread "broadcast-hash-join-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.spark.sql.catalyst.expressions.UnsafeRow.copy(UnsafeRow.java:537)
    at org.apache.spark.sql.execution.joins.UnsafeHashedRelation$.apply(HashedRelation.scala:403)
    at org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:128)
    at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:92)
    at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:82)
    at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:90)
    at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashOuterJoin.scala:82)
    at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashOuterJoin.scala:82)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

A search online turned up two common suggestions, both for yarn/standalone deployments: increase executor.memory, or increase driver.memory (most posts recommend raising it from 512m or 1g to 2g). On the cluster where the problem occurred, however, the job already specified driver.memory as 3G, which is the maximum the cluster allowed, and the error persisted; repeatedly adjusting the SQL and other settings did not solve it either, so we asked for the cluster's driver.memory limit to be raised.
The cluster parameter (spark.driver.memory, set in SPARK_HOME/conf/spark-defaults.conf) was then raised to 5g; after that, specifying driver.memory as 5g when launching the job with spark-submit let it run through successfully.
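
For reference, here is a minimal sketch of the two places the value ended up being set. The --driver-memory flag and the spark-defaults.conf key are standard Spark mechanisms; the application class, jar name, and executor sizing below are hypothetical and only for illustration:

```bash
# 1) Cluster-wide default, set in $SPARK_HOME/conf/spark-defaults.conf:
#      spark.driver.memory    5g

# 2) Explicit request at submit time (class, jar, and executor sizing are made up):
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 5g \
  --executor-memory 4g \
  --num-executors 10 \
  --executor-cores 2 \
  --class com.example.MyEtlJob \
  my-etl-job.jar
```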

Separately, while running the job we noticed that it wrote out quite a large number of files. At submit time, spark.sql.shuffle.partitions and spark.default.parallelism were set to 1 to 3 times the product of
spark.executor.cores and spark.executor.instances. After repeated tests, the number of output files turned out to be directly related to spark.sql.shuffle.partitions: a single write to Hive generally produces about that many files (plus one or two), so lowering this parameter reduces the file count. I have not yet figured out exactly why; explanations from passing experts are welcome. A sketch of the setting is shown below.
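
Here is a minimal Scala sketch of how the parameter was exercised (Spark 1.6-era HiveContext API; the database, table, and column names are made up). My tentative reading, not confirmed in the original post, is that each reduce task of the final shuffle writes its own output file, which is why the file count tracks spark.sql.shuffle.partitions:

```scala
// Sketch: the join below shuffles into 64 partitions, and the write to Hive
// produces roughly one file per (non-empty) shuffle partition, i.e. about 64 files.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object ShufflePartitionsDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("shuffle-partitions-demo"))
    val hc = new HiveContext(sc)

    // Number of reduce tasks used by Spark SQL joins and aggregations
    hc.setConf("spark.sql.shuffle.partitions", "64")

    val joined = hc.sql(
      """SELECT a.id, a.name, b.amount
        |FROM   src_db.table_a a
        |LEFT OUTER JOIN src_db.table_b b ON a.id = b.id""".stripMargin)

    // Writes roughly 64 files, one per reduce task; calling joined.coalesce(n)
    // before the write is another way to force fewer, larger files.
    joined.write.mode("overwrite").saveAsTable("result_table")
  }
}
```

With 64 shuffle partitions the insert above lands as roughly 64 files; dropping the setting, or coalescing before the write, shrinks that number accordingly.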
