JAVA 性能调优 -- 带你解读 CMS GC 日志

1. 引言

连续两篇文章，我们介绍了 java 最为常用的两大垃圾收集器 -- CMS 与 G1：

CMS 执行的七个阶段

驾驭一切的垃圾收集器 -- G1

在 G1 收集器的介绍中，我们提到，CMS 有着非常多的参数来控制整个流程，因此，要想进行 GC 相关的 java 性能调优，首要的基础就是要去深入了解垃圾回收相关的全部参数，在此基础上，看懂和分析 GC 日志，就成为了 java 程序员必备的一项技能。

本文我们就深入浅出，逐步带你走进 CMS GC 性能优化的世界。

2. 使用 CMS 需要配置的常用参数

2.1 基础参数

-server # 开启服务端模式，启动慢但性能更优 -Xms<size> # 初始堆内存大小，默认为物理内存的 1/64 -Xmx<size> # 最大堆内存大小，默认为物理内存的 1/4 -Xmn<size> # 新生代大小 -Xss<size> # 每个线程的堆栈大小，jdk1.5 以上默认为 1M

2.2 gc log 相关配置参数

-Xloggc:<path> # gc log 的输出路径 -XX: PrintGC # 输出 gc log -XX: PrintGCDetails # 输出 gc 详细信息 -XX: PrintGCDateStamps # 在 gc log 中记录 gc 启动时的系统时间 -XX: PrintGCTimeStamps # 在 gc log 中记录 gc 启动时相对于 jvm 启动的相对时间

2.3 堆 dump 相关参数

-XX: HeapDumpOnOutOfMemoryError # 当 jvm OOM 时，自动生成 dump 文件 -XX: AlwaysPreTouch # jvm 启动时预分配物理内存 -XX:HeapDumpPath=<path> # jvm OOM 时，dump 文件输出路径 -XX:ErrorFile=<path> # jvm OOM 时，输出 hs_err_pid*.log 的位置

2.4 分代大小参数

-XX:MetaspaceSize=<size> # 元空间大小 -XX:MaxMetaspaceSize=<size> # 最大元空间大小 -XX:NewRatio=<int> # 老年代与新生代的比例，默认为 2 -XX:SurvivorRatio=<int> # eden 区与 survive 区比例，默认为 8

2.5 CMS 相关参数

-XX: UseParNewGC # 使用 ParNew 作为新生代垃圾收集器 -XX: UseConcMarkSweepGC # 使用 CMS 作为 minor GC 收集器 -XX: CMSClassUnloadingEnabled # CMS 可以回收永久代 -XX: UseCMSInitiatingOccupancyOnly # 只有当堆内存超过阈值，才运行 CMS 回收，不建议开启 -XX:CMSInitiatingOccupancyFraction=<int> # 堆满阈值，默认为 75 -XX: UseCMSCompactAtFullCollection # full gc 后是否进行碎片整理 -XX:CMSFullGCsBeforeCompaction=<int> # 多少次 full gc 后进行一次碎片整理，默认为 5，需要与上一项配置搭配使用 -XX: CMSConcurrentMTEnabled # 是否多线程并发回收，默认开启 -XX:ConcGCThreads=<int> # CMS 并发收集线程数，默认为 CPU 核心数 -XX: CMSIncrementalMode # 开启 CMS 增量模式，每次回收一部分，多次执行后完成回收，不建议开启 -XX: ExplicitGCInvokesConcurrent # 调用 System.gc() 将执行 CMS 而不是 full GC -XX: ExplicitGCInvokesConcurrentAndUnloadsClasses # 在上一项配置基础上，System.gc() 执行后回收范围包含永久代 -XX: DisableExplicitGC # 禁用 System.gc()

3. 实战 GC

3.1 demo 程序代码

下面我们通过一个 demo 程序来看看具体的 GC 日志：

代码语言：javascript复制

 package cn.techlog.testjava.main;
 
 public class TestCMSGC{
     public static void main(String[] args) {
         int size = 1024 * 1024;
 
         for (int i = 0; i < 10;   i) {
             byte[] alloc1 = new byte[size * 4];
             System.out.println("alloc1");
 
             byte[] alloc2 = new byte[size * 2];
             System.out.println("alloc2");
 
             byte[] alloc3 = new byte[size * 4];
             System.out.println("alloc3");
         }
     }
 }

代码看起来很简单，我们创建了个 byte[]，分别占据 4M、2M、4M 空间，并将这一过程循环 10 次。

3.2 jvm 参数

接下里我们配置以下参数：

-server -Xms40M -Xmx40M # 堆内存 40MB -Xmn20M # 新生代内存 20MB -XX:SurvivorRatio=8 # eden 区与 survive 区比例为 8:2 -verbose:gc # 在控制台输出 gc 日志 -XX: PrintGCDetails # 输出 gc 日志详细信息 -XX: PrintGCDateStamps # 打印日志时间 -XX: UseConcMarkSweepGC # 启用 CMS

3.3 执行日志

启动程序，输出了以下日志：

alloc1alloc2alloc3alloc12021-02-06T12:42:47.088-0800: [GC (Allocation Failure) 2021-02-06T12:42:47.088-0800: [ParNew: 16056K->419K(18432K), 0.0084200 secs] 16056K->5852K(38912K), 0.0084561 secs] [Times: user=0.03 sys=0.01, real=0.01 secs] alloc2alloc3alloc1alloc22021-02-06T12:42:47.097-0800: [GC (Allocation Failure) 2021-02-06T12:42:47.097-0800: [ParNew: 13028K->209K(18432K), 0.0099102 secs] 18461K->11787K(38912K), 0.0099420 secs] [Times: user=0.06 sys=0.00, real=0.01 secs] 2021-02-06T12:42:47.107-0800: [GC (CMS Initial Mark) [1 CMS-initial-mark: 11577K(20480K)] 15883K(38912K), 0.0002270 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 2021-02-06T12:42:47.107-0800: [CMS-concurrent-mark-start]alloc3alloc1alloc2alloc32021-02-06T12:42:47.108-0800: [GC (Allocation Failure) 2021-02-06T12:42:47.108-0800: [ParNew: 14828K->14828K(18432K), 0.0000186 secs]2021-02-06T12:42:47.108-0800: [CMS2021-02-06T12:42:47.109-0800: [CMS-concurrent-mark: 0.001/0.002 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] (concurrent mode failure): 11577K->1320K(20480K), 0.0070740 secs] 26406K->1320K(38912K), [Metaspace: 7602K->7602K(1056768K)], 0.0071292 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] alloc1alloc2alloc3alloc12021-02-06T12:42:47.116-0800: [GC (Allocation Failure) 2021-02-06T12:42:47.116-0800: [ParNew: 15287K->17K(18432K), 0.0012768 secs] 16608K->5436K(38912K), 0.0013161 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] alloc2alloc3alloc1alloc22021-02-06T12:42:47.118-0800: [GC (Allocation Failure) 2021-02-06T12:42:47.118-0800: [ParNew: 12612K->4K(18432K), 0.0014321 secs] 18031K->11569K(38912K), 0.0014586 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 2021-02-06T12:42:47.120-0800: [GC (CMS Initial Mark) [1 CMS-initial-mark: 11564K(20480K)] 15665K(38912K), 0.0001875 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 2021-02-06T12:42:47.120-0800: [CMS-concurrent-mark-start]alloc3alloc1alloc2alloc32021-02-06T12:42:47.121-0800: [GC (Allocation Failure) 2021-02-06T12:42:47.121-0800: [ParNew: 14633K->4K(18432K), 0.0008106 secs] 26198K->11568K(38912K), 0.0008370 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] alloc1alloc2alloc3alloc12021-02-06T12:42:47.122-0800: [GC (Allocation Failure) 2021-02-06T12:42:47.122-0800: [ParNew: 14643K->4K(18432K), 0.0022952 secs] 26207K->15664K(38912K), 0.0023281 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] alloc2alloc32021-02-06T12:42:47.125-0800: [CMS-concurrent-mark: 0.002/0.005 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 2021-02-06T12:42:47.125-0800: [CMS-concurrent-preclean-start]2021-02-06T12:42:47.126-0800: [CMS-concurrent-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 2021-02-06T12:42:47.126-0800: [CMS-concurrent-abortable-preclean-start]2021-02-06T12:42:47.126-0800: [CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 2021-02-06T12:42:47.126-0800: [GC (CMS Final Remark) [YG occupancy: 7115 K (18432 K)]2021-02-06T12:42:47.126-0800: [Rescan (parallel) , 0.0007304 secs]2021-02-06T12:42:47.126-0800: [weak refs processing, 0.0000118 secs]2021-02-06T12:42:47.126-0800: [class unloading, 0.0007235 secs]2021-02-06T12:42:47.127-0800: [scrub symbol table, 0.0005133 secs]2021-02-06T12:42:47.128-0800: [scrub string table, 0.0001555 secs][1 CMS-remark: 15660K(20480K)] 22776K(38912K), 0.0021777 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 2021-02-06T12:42:47.128-0800: [CMS-concurrent-sweep-start]2021-02-06T12:42:47.128-0800: [CMS-concurrent-sweep: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 2021-02-06T12:42:47.128-0800: [CMS-concurrent-reset-start]2021-02-06T12:42:47.128-0800: [CMS-concurrent-reset: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] Heap par new generation total 18432K, used 7115K [0x00000007bd800000, 0x00000007bec00000, 0x00000007bec00000) eden space 16384K, 43% used [0x00000007bd800000, 0x00000007bdef1db0, 0x00000007be800000) from space 2048K, 0% used [0x00000007bea00000, 0x00000007bea01020, 0x00000007bec00000) to space 2048K, 0% used [0x00000007be800000, 0x00000007be800000, 0x00000007bea00000) concurrent mark-sweep generation total 20480K, used 11562K [0x00000007bec00000, 0x00000007c0000000, 0x00000007c0000000) Metaspace used 7608K, capacity 7782K, committed 7936K, reserved 1056768K class space used 877K, capacity 908K, committed 1024K, reserved 1048576K

3.4 日志分析

接下来，我们就来逐条看看上述 GC log 究竟向我们讲述了什么。

3.4.1 young gc

开始的时候，程序顺次打印出了：

alloc1alloc2alloc3alloc1

分别申请了 4M、2M、4M、4M 的空间，总计 14 MB，而我们的 eden 区是 20M(新生代) * 0.8(eden 区比例) = 16M，所以恰好是可以容纳上述 14MB 的对象的。

接着，我们要再申请 2MB 的空间，此时 eden 区就会被占满，于是触发了第一次 young gc：

2021-02-06T12:42:47.116-0800: [GC (Allocation Failure) 2021-02-06T12:42:47.116-0800: [ParNew: 15287K->17K(18432K), 0.0012768 secs] 16608K->5436K(38912K), 0.0013161 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]

这行日志中，有以下关键信息：

Allocation Failure -- 表示本次 GC 是由于分配空间失败引起的
ParNew -- 表示使用的垃圾收集器为 ParNew
15287K->17K(18432K) -- 表示年轻代垃圾回收前占用从 15287KB 到回收后占用 17KB，年轻代总大小为 18432KB
0.0012768 secs -- 本次 GC 耗时 0.0012768 秒
16608K->5436K(38912K) -- 堆区回收前占用 16608KB，回收后占用 5436KB，堆区总占用空间为 38912KB
[Times: user=0.01 sys=0.00, real=0.00 secs] -- 三个耗时分别是用户态耗时、内核态耗时和总耗时

3.4.2 minor gc

紧接着，我们的程序重复着上述过程，分配、young gc。

最后，程序进行了一次完整的 CMS minor GC：

2021-02-06T12:42:47.120-0800: [GC (CMS Initial Mark) [1 CMS-initial-mark: 11564K(20480K)] 15665K(38912K), 0.0001875 secs] [Times: user=0.00 sys=0.00, real=0.00 2021-02-06T12:42:47.120-0800: [CMS-concurrent-mark-start]2021-02-06T12:42:47.125-0800: [CMS-concurrent-mark: 0.002/0.005 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 2021-02-06T12:42:47.125-0800: [CMS-concurrent-preclean-start]2021-02-06T12:42:47.126-0800: [CMS-concurrent-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 2021-02-06T12:42:47.126-0800: [CMS-concurrent-abortable-preclean-start]2021-02-06T12:42:47.126-0800: [CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 2021-02-06T12:42:47.126-0800: [GC (CMS Final Remark) [YG occupancy: 7115 K (18432 K)]2021-02-06T12:42:47.126-0800: [Rescan (parallel) , 0.0007304 secs]2021-02-06T12:42:47.126-0800: [weak refs processing, 0.0000118 secs]2021-02-06T12:42:47.126-0800: [class unloading, 0.0007235 secs]2021-02-06T12:42:47.127-0800: [scrub symbol table, 0.0005133 secs]2021-02-06T12:42:47.128-0800: [scrub string table, 0.0001555 secs][1 CMS-remark: 15660K(20480K)] 22776K(38912K), 0.0021777 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 2021-02-06T12:42:47.128-0800: [CMS-concurrent-sweep-start]2021-02-06T12:42:47.128-0800: [CMS-concurrent-sweep: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 2021-02-06T12:42:47.128-0800: [CMS-concurrent-reset-start]2021-02-06T12:42:47.128-0800: [CMS-concurrent-reset: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]

此前的文章中介绍过 CMS 执行的七个步骤，而上述日志正好对应着七个步骤每一步的开始、结束以及运行信息：

初始标记 -- CMS Initial Mark
并发标记 -- CMS-concurrent-mark
并发预清理 -- CMS-concurrent-preclean
并发可中断预清理 -- CMS-concurrent-abortable
最终标记 -- CMS Final Remark
并发清除 -- CMS-concurrent-sweep
并发重置 -- CMS-concurrent-reset

上述大部分日志信息都是关于对应流程的耗时情况，这里我们只对最终标记流程的日志进行详细解读：

2021-02-06T12:42:47.126-0800: [GC (CMS Final Remark) [YG occupancy: 7115 K (18432 K)]2021-02-06T12:42:47.126-0800: [Rescan (parallel) , 0.0007304 secs]2021-02-06T12:42:47.126-0800: [weak refs processing, 0.0000118 secs]2021-02-06T12:42:47.126-0800: [class unloading, 0.0007235 secs]2021-02-06T12:42:47.127-0800: [scrub symbol table, 0.0005133 secs]2021-02-06T12:42:47.128-0800: [scrub string table, 0.0001555 secs][1 CMS-remark: 15660K(20480K)] 22776K(38912K), 0.0021777 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]

YG occupancy: 7115 K (18432 K) -- 表示年轻代占用空间为 7115 K，年轻代总空间为 18432 K
Rescan (parallel) , 0.0007304 secs -- 老年代重新扫描耗时 0.0007304 秒
weak refs processing, 0.0000118 secs -- 弱引用处理耗时
class unloading, 0.0007235 secs -- 类加载处理耗时
scrub symbol table, 0.0005133 secs -- 符号表处理耗时
scrub string table, 0.0001555 secs -- 字符表处理耗时
1 CMS-remark: 15660K(20480K)] 22776K(38912K), 0.0021777 secs -- CMS 重新标记后，老年代占用 15660K，老年代总空间 20480K，堆占用空间 22776K，堆总空间 38912K，以及该阶段耗时

3.4.3 gc 结束后堆的空间占用情况

整个 CMS 垃圾收集器处理完成之后，接下来我们可以看到堆的占用情况：

Heap par new generation total 18432K, used 7115K [0x00000007bd800000, 0x00000007bec00000, 0x00000007bec00000) eden space 16384K, 43% used [0x00000007bd800000, 0x00000007bdef1db0, 0x00000007be800000) from space 2048K, 0% used [0x00000007bea00000, 0x00000007bea01020, 0x00000007bec00000) to space 2048K, 0% used [0x00000007be800000, 0x00000007be800000, 0x00000007bea00000) concurrent mark-sweep generation total 20480K, used 11562K [0x00000007bec00000, 0x00000007c0000000, 0x00000007c0000000) Metaspace used 7608K, capacity 7782K, committed 7936K, reserved 1056768K class space used 877K, capacity 908K, committed 1024K, reserved 1048576K

它包含了新生代、老年代、元空间的占用情况，信息非常详尽。