起因
最近排查一个历史老问题,历史问题就是大家都知道,但是没人愿意处理。
线上节点 16 和 32 G 内存频繁的出现GC,并且由于 GC 还导至两个问题,1.CPU使用率在GC时变高,2.业务出现异常。
这个问题变得无法忍耐了,跟踪程序线程使用,CMS GC 线程CPU 使用在初始化时都 11-99% 之间,严重影响业务处理。
通过 jstat
观察,FullGC 不多,但是 YoungGC 非常频繁,YoungGC 多不影响服务性能吗?也影响。
计算一个服务GC的好坏,不是看 FullGC 多少,而是看整体服务的吞吐量。
S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
68096.0 68096.0 16840.1 0.0 545344.0 115481.1 6658496.0 4835834.6 63280.0 60712.9 7084.0 6615.1 1930261 53981.101 4804 1248.371 55229.471
68096.0 68096.0 0.0 17347.7 545344.0 12072.1 6658496.0 4836565.4 63280.0 60712.9 7084.0 6615.1 1930262 53981.125 4804 1248.371 55229.496
68096.0 68096.0 0.0 12854.1 545344.0 252246.5 6658496.0 4838818.8 63280.0 60712.9 7084.0 6615.1 1930264 53981.168 4804 1248.371 55229.539
68096.0 68096.0 19501.0 0.0 545344.0 187933.5 6658496.0 4838869.4 63280.0 60712.9 7084.0 6615.1 1930265 53981.189 4804 1248.371 55229.559
68096.0 68096.0 19501.0 0.0 545344.0 445775.5 6658496.0 4838869.4 63280.0 60712.9 7084.0 6615.1 1930265 53981.189 4804 1248.371 55229.559
68096.0 68096.0 11858.3 0.0 545344.0 515043.9 6658496.0 4845614.9 63280.0 60712.9 7084.0 6615.1 1930267 53981.239 4804 1248.371 55229.610
68096.0 68096.0 0.0 16469.0 545344.0 361227.7 6658496.0 4846356.5 63280.0 60712.9 7084.0 6615.1 1930268 53981.263 4804 1248.371 55229.634
68096.0 68096.0 0.0 20075.6 545344.0 226522.4 6658496.0 4847100.6 63280.0 60712.9 7084.0 6615.1 1930270 53981.316 4804 1248.371 55229.687
68096.0 68096.0 19591.2 0.0 545344.0 325872.6 6658496.0 4852312.8 63280.0 60712.9 7084.0 6615.1 1930271 53981.344 4804 1248.371 55229.715
68096.0 68096.0 0.0 19282.2 545344.0 206195.9 6658496.0 4853497.6 63280.0 60712.9 7084.0 6615.1 1930272 53981.368 4804 1248.371 55229.739
68096.0 68096.0 0.0 16319.6 545344.0 520305.2 6658496.0 4858184.1 63280.0 60712.9 7084.0 6615.1 1930274 53981.414 4804 1248.371 55229.785
68096.0 68096.0 19240.3 0.0 545344.0 349365.5 6658496.0 4859016.3 63280.0 60712.9 7084.0 6615.1 1930275 53981.445 4804 1248.371 55229.815
计算吞吐量
排查
先排除环境原因,从开发环境到线上环境发现情况是相同的,可以初步认定是一个普遍问题。
定位 16G 内存服务情况
先从 16G 内存开始,收集一些基本信息,再观察运行情况。
代码语言:javascript复制java -Xms9g -Xmx9g
-XX: UseConcMarkSweepGC
-XX: PrintGCDetails -Xloggc:./gc.log
-XX: PrintGCDateStamps
-XX: CMSParallelRemarkEnabled
-XX:ReservedCodeCacheSize=256m
-XX: CMSScavengeBeforeRemark
-XX: HeapDumpOnOutOfMemoryError
-XX:MaxDirectMemorySize=1g
-jar Node.jar
-c config.conf
从这上面能看到关于 CMS 关相的几个参数。
代码语言:javascript复制-XX: UseConcMarkSweepGC
-XX: CMSParallelRemarkEnabled
-XX: CMSScavengeBeforeRemark
内存使用情况
代码语言:javascript复制Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 9663676416 (9216.0MB)
NewSize = 697892864 (665.5625MB)
MaxNewSize = 697892864 (665.5625MB)
OldSize = 8965783552 (8550.4375MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
New Generation (Eden 1 Survivor Space):
capacity = 628162560 (599.0625MB)
used = 511774816 (488.0664978027344MB)
free = 116387744 (110.99600219726562MB)
81.47171585648148% used
Eden Space:
capacity = 558432256 (532.5625MB)
used = 487164440 (464.5962142944336MB)
free = 71267816 (67.9662857055664MB)
87.23787617311275% used
From Space:
capacity = 69730304 (66.5MB)
used = 24610376 (23.47028350830078MB)
free = 45119928 (43.02971649169922MB)
35.29365941097862% used
To Space:
capacity = 69730304 (66.5MB)
used = 0 (0.0MB)
free = 69730304 (66.5MB)
0.0% used
concurrent mark-sweep generation:
capacity = 8965783552 (8550.4375MB)
used = 4737287088 (4517.829025268555MB)
free = 4228496464 (4032.6084747314453MB)
52.837401890471156% used
20983 interned Strings occupying 2039024 bytes.
这里看出一个问题
代码语言:javascript复制Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 9663676416 (9216.0MB)
NewSize = 697892864 (665.5625MB)
MaxNewSize = 697892864 (665.5625MB)
OldSize = 8965783552 (8550.4375MB)
NewRatio = 2
NewRatio=2 但是实际新生代获得的内存空间只有 665.5625MB,这就有点奇怪了,不是说好的老年代:新生代 2 : 1 吗。 这是一个问题,另外从当前的 GC 回收时间推算出,当前吞吐量为: 96%,还没有达到最优的情况。
修复新生代比例大小
添加启动参数 -XX:NewRatio=2,完整启动命令:
代码语言:javascript复制java -Xms9g -Xmx9g
-XX:NewRatio=2
-XX: UseConcMarkSweepGC
-XX: PrintGCDetails -Xloggc:./gc.log
-XX: PrintGCDateStamps
-XX: CMSParallelRemarkEnabled
-XX:ReservedCodeCacheSize=256m
-XX: CMSScavengeBeforeRemark
-XX: HeapDumpOnOutOfMemoryError
-XX:MaxDirectMemorySize=1g
-jar Node.jar
-c config.conf
通过 jstat -gc pid 1000
观察一下 GC 情况
添加 -XX:NewRatio=2
启动参数的机器情况
jmap -heap 查看一下调整后的新生代占比,现在 NewSize=3072.0MB
OldSize=6144.0MB
这时比例生效了。
Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 9663676416 (9216.0MB)
NewSize = 3221225472 (3072.0MB)
MaxNewSize = 3221225472 (3072.0MB)
OldSize = 6442450944 (6144.0MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
New Generation (Eden 1 Survivor Space):
capacity = 2899116032 (2764.8125MB)
used = 2516795744 (2400.2034606933594MB)
free = 382320288 (364.6090393066406MB)
86.81252203154317% used
Eden Space:
capacity = 2577006592 (2457.625MB)
used = 2194686304 (2093.0159606933594MB)
free = 382320288 (364.6090393066406MB)
85.16417112836008% used
From Space:
capacity = 322109440 (307.1875MB)
used = 322109440 (307.1875MB)
free = 0 (0.0MB)
100.0% used
To Space:
capacity = 322109440 (307.1875MB)
used = 0 (0.0MB)
free = 322109440 (307.1875MB)
0.0% used
concurrent mark-sweep generation:
capacity = 6442450944 (6144.0MB)
used = 525052336 (500.7289276123047MB)
free = 5917398608 (5643.271072387695MB)
8.149884889523188% used
13727 interned Strings occupying 1224528 bytes.
代码语言:javascript复制S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
314560.0 314560.0 0.0 85539.9 2516608.0 1975197.4 6291456.0 4613067.7 63012.0 59752.3 7024.0 6520.3 9317 494.820 222 22.566 517.386
314560.0 314560.0 0.0 85539.9 2516608.0 2228635.2 6291456.0 4613067.7 63012.0 59752.3 7024.0 6520.3 9317 494.820 222 22.566 517.386
314560.0 314560.0 87689.1 0.0 2516608.0 681984.2 6291456.0 4621812.0 63012.0 59752.3 7024.0 6520.3 9318 494.875 222 22.566 517.441
314560.0 314560.0 87689.1 0.0 2516608.0 1965292.1 6291456.0 4621812.0 63012.0 59752.3 7024.0 6520.3 9318 494.875 222 22.566 517.441
314560.0 314560.0 0.0 95789.9 2516608.0 145469.7 6291456.0 4629480.3 63012.0 59752.3 7024.0 6520.3 9319 494.928 222 22.566 517.494
314560.0 314560.0 0.0 95789.9 2516608.0 1741481.8 6291456.0 4629480.3 63012.0 59752.3 7024.0 6520.3 9319 494.928 222 22.566 517.494
314560.0 314560.0 0.0 95789.9 2516608.0 1978876.4 6291456.0 4629480.3 63012.0 59752.3 7024.0 6520.3 9319 494.928 222 22.566 517.494
314560.0 314560.0 74735.8 0.0 2516608.0 306828.0 6291456.0 4638677.5 63012.0 59752.3 7024.0 6520.3 9320 494.982 222 22.566 517.549
314560.0 314560.0 74735.8 0.0 2516608.0 1513196.3 6291456.0 4638677.5 63012.0 59752.3 7024.0 6520.3 9320 494.982 222 22.566 517.549
314560.0 314560.0 74735.8 0.0 2516608.0 1941220.4 6291456.0 4638677.5 63012.0 59752.3 7024.0 6520.3 9320 494.982 222 22.566 517.549
314560.0 314560.0 74735.8 0.0 2516608.0 2415940.5 6291456.0 4638677.5 63012.0 59752.3 7024.0 6520.3 9320 494.982 222 22.566 517.549
对比存在问题的情况
代码语言:javascript复制S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
68096.0 68096.0 0.0 16853.2 545344.0 371165.3 8755648.0 5890791.6 62312.0 59746.5 7076.0 6608.5 43879 1312.752 60 5.206 1317.958
68096.0 68096.0 19377.9 0.0 545344.0 420827.3 8755648.0 5891528.9 62312.0 59746.5 7076.0 6608.5 43880 1312.785 60 5.206 1317.991
68096.0 68096.0 0.0 22147.2 545344.0 295435.0 8755648.0 5891577.4 62312.0 59746.5 7076.0 6608.5 43881 1312.816 60 5.206 1318.022
68096.0 68096.0 21257.5 0.0 545344.0 461020.2 8755648.0 5898299.9 62312.0 59746.5 7076.0 6608.5 43882 1312.850 60 5.206 1318.056
68096.0 68096.0 0.0 13895.9 545344.0 419534.8 8755648.0 5901746.3 62312.0 59746.5 7076.0 6608.5 43885 1312.961 60 5.206 1318.167
68096.0 68096.0 0.0 17542.7 545344.0 63902.2 8755648.0 5906856.1 62312.0 59746.5 7076.0 6608.5 43887 1313.028 60 5.206 1318.234
68096.0 68096.0 0.0 17542.7 545344.0 444878.0 8755648.0 5906856.1 62312.0 59746.5 7076.0 6608.5 43887 1313.028 60 5.206 1318.234
68096.0 68096.0 15241.8 0.0 545344.0 403475.0 8755648.0 5909706.0 62312.0 59746.5 7076.0 6608.5 43890 1313.108 60 5.206 1318.314
68096.0 68096.0 24031.2 0.0 545344.0 52019.0 8755648.0 5916571.4 62312.0 59746.5 7076.0 6608.5 43892 1313.174 60 5.206 1318.380
68096.0 68096.0 17381.4 0.0 545344.0 349457.1 8755648.0 5919082.7 62312.0 59746.5 7076.0 6608.5 43894 1313.232 60 5.206 1318.438
68096.0 68096.0 0.0 20887.7 545344.0 442091.1 8755648.0 5919151.3 62312.0 59746.5 7076.0 6608.5 43895 1313.261 60 5.206 1318.467
这两组数据能看出什么?
YGC : 新生代垃圾回收次数 YGCT : 新生代垃圾回收时间 FGC : 老年代垃圾回收次数 FGCT : 老年代垃圾回收时间 GCT : 垃圾回收总消耗时间
对比两组GC回收时间数据
- YGC
9320 : 43895
,次数减少 4.7倍 - YGCT
494.982 : 1313.261
, 减少 2.6倍 - GCT
517.549: 1318.467
,减少 2.5 倍 - FGC
222 : 60
却增加了 3.7 倍的次数
这有点出乎意料,如果新生代空间够大之后,对象有足够的新生代空间可以分配对象,保证尽可能多的对象在新生代中被回收,熬不到老年代。