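Daily Operations | Parsing OGG Parameter Templates with ANTLR4 (Part 2)

1 Background

The previous article described the errors exposed by the running program. This one covers the troubleshooting approach and the concrete fixes.

To recap the previous article: when ANTLR4 was used to parse the OGG parameter file, one remaining problem was that the OGG tasks were not parsed. This article addresses that as well. Previous article: Daily Operations | OGG Query Lag Chkpt / Time Chkpt (Part 1)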

2 Troubleshooting parse errors in the OGG task collection template file

2.1-Q: Error parsing a special character

line 38524:33 token recognition error at: '#'
line 38526:26 token recognition error at: '#'
line 38534:35 token recognition error at: '#'
line 38548:31 token recognition error at: '#'
line 38551:38 token recognition error at: '#'
line 38557:29 token recognition error at: '#'
line 38560:35 token recognition error at: '#'
line 38564:25 token recognition error at: '#'
line 38577:33 token recognition error at: '#'
line 38589:25 token recognition error at: '#'
line 38599:39 token recognition error at: '#'
line 38602:36 token recognition error at: '#'
line 38603:29 token recognition error at: '#'
line 38634:30 token recognition error at: '#'
line 38660:34 token recognition error at: '#'
line 38673:36 token recognition error at: '#'
line 38675:31 token recognition error at: '#'
line 38689:40 token recognition error at: '#'
line 38704:32 token recognition error at: '#'
line 38715:39 token recognition error at: '#'
line 38721:33 token recognition error at: '#'
line 38743:40 token recognition error at: '#'
line 38751:29 token recognition error at: '#'
line 38754:38 token recognition error at: '#'

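Solution:

Upgrade from antlr-4.7.2-runtime.jar to antlr4-4.9.1.jar, and add the # character to the grammar, since the original grammar had no rule that recognized it. Then regenerate the lexer and parser code from the updated grammar.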
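
Whether done before or after editing the grammar, it can be worth confirming that # is the only text the lexer rejects. The sketch below is not part of the original tool. It tallies the offending text found in ANTLR's default "token recognition error" messages, such as the ones listed above. The class name TokenErrorTally and the idea of feeding it captured log lines are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenErrorTally {
    // Matches ANTLR's default lexer error message, for example:
    // line 38524:33 token recognition error at: '#'
    private static final Pattern ERROR_LINE =
        Pattern.compile("^line (\\d+):(\\d+) token recognition error at: '(.*)'$");

    /** Counts how often each rejected piece of text appears in the given log lines. */
    public static Map<String, Integer> tally(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = ERROR_LINE.matcher(line.trim());
            if (m.matches()) {
                // group(3) is the text between the quotes, i.e. what the lexer could not match
                counts.merge(m.group(3), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "line 38524:33 token recognition error at: '#'",
            "line 38526:26 token recognition error at: '#'",
            "line 38534:35 token recognition error at: '#'");
        // Prints each rejected text with its number of occurrences
        tally(sample).forEach((text, n) -> System.out.println("'" + text + "' x " + n));
    }
}
```

If the tally contains a single key, one grammar change should clear the whole batch of errors. Several keys would mean more characters are missing from the grammar.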

2.2-Q: Stack overflow error

Exception in thread "main" java.lang.StackOverflowError

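3 Solution

Increase the memory available to the program at run time. A StackOverflowError means a thread ran out of stack rather than heap, so the setting that matters most here is the thread stack size, -Xss. Checking afterwards, the file to be parsed is 1.9 MB, and the table with the most data has about 39,000 entries. A typical configuration: -Xmn2g -Xms3550m -Xmx3550m -Xss16m.

Adjust the actual values to suit the machine, increasing them as needed.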
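
The -Xss flag applies to threads in general. As an alternative sketch (an assumption for illustration, not what the program in this article does), deeply recursive work can be run on a dedicated thread whose stack size is requested through the four-argument Thread constructor. The JVM treats that size as a hint and some platforms may ignore it, so the command-line flag remains the more predictable option. The names DeepStackRunner and sumTo are hypothetical; sumTo stands in for a deeply recursive parse.

```java
public class DeepStackRunner {
    /** Recursively computes 1 + 2 + ... + n; every level occupies one stack frame. */
    static long sumTo(int n) {
        return n == 0 ? 0 : n + sumTo(n - 1);
    }

    /** Runs sumTo(depth) on a thread created with the requested stack size in bytes. */
    public static long runWithStack(int depth, long stackBytes) {
        long[] result = new long[1];
        Thread worker = new Thread(null, () -> result[0] = sumTo(depth), "deep-parse", stackBytes);
        worker.start();
        try {
            // join() also makes the worker's write to result[0] visible to this thread
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the worker", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        // Request a 64 MiB stack for a recursion 100,000 levels deep
        System.out.println(runWithStack(100_000, 64L * 1024 * 1024));
    }
}
```

The design choice here is isolation: only the thread that needs a deep stack pays for it, whereas -Xss16m raises the stack size for every thread the JVM creates.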

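The number of lines in the file to be parsed is as follows:

In the program's unit tests, the VM arguments can be added like this:

In the IDE's default settings, you can check which VM options your own IDEA is configured with: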
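
Whichever place the options are set in, it helps to confirm that the running JVM actually received them. The following is a minimal sketch using the standard java.lang.management API; the class name JvmOptionsReport is hypothetical. It could be called from a unit test or at program start.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmOptionsReport {
    /** Returns the options the JVM was started with, such as -Xss16m or -Xmx3550m. */
    public static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** Returns the maximum amount of memory the JVM will attempt to use, in MiB. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("VM options: " + inputArguments());
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

If -Xss16m is missing from the printed list, the option was set on a different run configuration than the one being executed.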

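4 JVM notes

For the young generation, the recommendation is to keep its size between a half and a quarter of the overall heap size. By default, the initial (and minimum) heap is 1/64 of physical memory and the maximum heap is 1/4 of physical memory. The thread stack size depends on the platform architecture, for example 320 KB on 32-bit and 1 MB on 64-bit. The initial and maximum heap sizes (-Xms and -Xmx) can be set to the same value, which keeps the JVM from resizing the heap after each garbage collection.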
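
These rules of thumb reduce to simple arithmetic. The sketch below applies them to the flags used in this article (-Xmn2g with a 3550 MiB heap) and to a machine with 16 GB of memory. The class and method names are hypothetical, and the 16 GB figure is taken from the environment section on the assumption that it refers to RAM.

```java
public class HeapSizingCheck {
    /** Young generation size as a fraction of the total heap. */
    public static double youngFraction(long youngMiB, long heapMiB) {
        return (double) youngMiB / heapMiB;
    }

    /** True when the young generation is between a quarter and a half of the heap. */
    public static boolean withinRecommendedRange(long youngMiB, long heapMiB) {
        double f = youngFraction(youngMiB, heapMiB);
        return f >= 0.25 && f <= 0.5;
    }

    /** Default initial heap: 1/64 of physical memory. */
    public static long defaultInitialHeapMiB(long physicalMiB) {
        return physicalMiB / 64;
    }

    /** Default maximum heap: 1/4 of physical memory. */
    public static long defaultMaxHeapMiB(long physicalMiB) {
        return physicalMiB / 4;
    }

    public static void main(String[] args) {
        long youngMiB = 2048;      // -Xmn2g
        long heapMiB = 3550;       // -Xms3550m -Xmx3550m
        long physicalMiB = 16384;  // assumed 16 GB of RAM
        System.out.println("young fraction: " + youngFraction(youngMiB, heapMiB));
        System.out.println("within 1/4..1/2: " + withinRecommendedRange(youngMiB, heapMiB));
        System.out.println("default initial heap (MiB): " + defaultInitialHeapMiB(physicalMiB));
        System.out.println("default max heap (MiB): " + defaultMaxHeapMiB(physicalMiB));
    }
}
```

With these inputs the young generation comes out at roughly 58% of the heap, slightly above the recommended upper bound of one half, so -Xmn could be lowered a little if old-generation space ever becomes tight.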

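To see how memory behaves while the program runs, I turned to logging. Detailed logs can be printed at each GC by adding the option -XX:+PrintGCDetails. The Java 8 documentation describes it as follows (the option was deprecated in later JDK releases in favor of unified logging, but it still works):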

-XX:+PrintGCDetails
        Enables printing of detailed messages at every GC. By default, this option is disabled.

A companion option, -Xloggc:filename, sends the GC log to a file instead:

-Xloggc:filename
    Sets the file to which verbose GC events information should be redirected for logging. The information written to this file is similar to the output of -verbose:gc with the time elapsed since the first GC event preceding each logged event. The -Xloggc option overrides -verbose:gc if both are given with the same java command.

    Example:

    -Xloggc:garbage-collection.log

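The typical configuration then becomes -XX:+PrintGCDetails -Xmn2g -Xms3550m -Xmx3550m -Xss16m.

The resulting log output is as follows (the first line is in the JDK 8 format; the lines after the ellipsis are in the unified logging format used from JDK 9 onward):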

[GC (Allocation Failure) [PSYoungGen: 1572864K->16169K(1835008K)] 1572864K->16177K(3373056K), 0.0335576 secs] [Times: user=0.28 sys=0.02, real=0.03 secs] 
……
[54.000s][info   ][gc,heap,exit ] Heap
[54.000s][info   ][gc,heap,exit ]  garbage-first heap   total 3635200K, used 1553408K [0x0000000722200000, 0x0000000800000000)
[54.000s][info   ][gc,heap,exit ]   region size 1024K, 1504 young (1540096K), 0 survivors (0K)
[54.000s][info   ][gc,heap,exit ]  Metaspace       used 7719K, capacity 7814K, committed 7936K, reserved 1056768K
[54.000s][info   ][gc,heap,exit ]   class space    used 683K, capacity 726K, committed 768K, reserved 1048576K
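
To read such a log in bulk rather than by eye, the young-generation figures can be pulled out with a regular expression. The sketch below handles only the JDK 8 PSYoungGen format shown in the first log line; the class name YoungGcLine and its methods are hypothetical, and other collectors or the unified logging format would need their own patterns.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YoungGcLine {
    // Matches the young-generation part of a JDK 8 -XX:+PrintGCDetails line, for example:
    // [PSYoungGen: 1572864K->16169K(1835008K)]
    private static final Pattern YOUNG =
        Pattern.compile("\\[PSYoungGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\]");

    /** Returns {usedBeforeK, usedAfterK, capacityK}, or null if the line has no PSYoungGen entry. */
    public static long[] parse(String line) {
        Matcher m = YOUNG.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3))
        };
    }

    /** Amount freed from the young generation by this collection, in K as printed in the log. */
    public static long reclaimedK(String line) {
        long[] v = parse(line);
        if (v == null) {
            throw new IllegalArgumentException("no PSYoungGen entry: " + line);
        }
        return v[0] - v[1];
    }

    public static void main(String[] args) {
        String line = "[GC (Allocation Failure) [PSYoungGen: 1572864K->16169K(1835008K)] "
            + "1572864K->16177K(3373056K), 0.0335576 secs]";
        System.out.println("reclaimed (K): " + reclaimedK(line));
    }
}
```

For the line above, the collection takes the young generation from 1572864K down to 16169K, so almost everything allocated during parsing up to that point was short-lived.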

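5 Summary

Some problems are not necessarily code problems. They may be configuration problems, so analyze each problem on its own terms and handle it calmly.

First, pin down the definition and nature of the problem, and understand its background and related factors, so that its essence is clear.

Next, collect information related to the problem, including but not limited to the people, events, times, places, and causes involved, along with any evidence and data that may help solve it.

Then analyze the problem in depth, identify its root cause and key factors, and determine where its core lies.

Finally, once the problem is solved, track how well the fix holds up and evaluate its actual effect, so that the lessons learned can serve as a reference for similar problems.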

6 Environment and references

6.1 Current environment

IntelliJ IDEA 2020.3.2
JDK 1.8.0.202 (JDK 11.0.2 was also tested)
Memory 16 GB
Processor 2.6 GHz 6-core Intel Core i7

6.2 References

https://docs.oracle.com/javase/8/docs/technotes/tools/unix/java.html
