PostgreSQL 如何通过工具来分析PG 内存泄露

2024-06-04 18:02:46 浏览数 (1)

PostgreSQL在运行中,是否可以在运行中去分析内存的分配,通过内存的分配来分析PG的一些原理和工作状态,答案是可以的,我们可以通过工具valgrind 工具来进行相关的深层次的PostgreSQL 的内存分配的分析。

在操作这个部分之前,我们需要注意几点

1 通过valgrind 软件来监控PostgreSQL的数据库中的部分必须是PostgreSQL 的重新带有特殊编译参数的POSTGRESQL ,rpm安装的也可,但这篇里面不说明对于那样安装的POSTGRESQL 怎么监控。

2 在生产环境中,不能这样安装和部署POSTGRESQL ,这样的方式仅仅限于分析问题和学习研究使用,严禁在生产中使用。

编译参数

在对PostgreSQL 进行编译的情况下,要加上这两个参数,否则命令无法获得相关的内存的信息。

./configure --enable-debug --enable-cassert

在编译后,直接初始化数据库

同时需要编译 valgrind 软件,在数据库所在的系统上

通过下面的命令来启动,数据库产品,启动后,后续数据库的一些的操作所涉及的内存方面的分配和工作都会在日志里面进行展示。

valgrind --leak-check=full --track-origins=yes --log-file=/home/postgres/valgrind.log --trace-children=yes /usr/local/postgres/bin/pg_ctl -D /pgdata/data start

比如下面这段就是

代码语言:javascript复制
==37794== 584 bytes in 1 blocks are possibly lost in loss record 16 of 82
==37794==    at 0x484880F: malloc (vg_replace_malloc.c:446)
==37794==    by 0x56027C: RegisterBackgroundWorker (bgworker.c:929)
==37794==    by 0x580E8B: ApplyLauncherRegister (launcher.c:923)
==37794==    by 0x56811F: PostmasterMain (postmaster.c:1003)
==37794==    by 0x26F2D0: main (main.c:198)
==37794== 
==37794== 2,666 (232 direct, 2,434 indirect) bytes in 1 blocks are definitely lost in loss record 35 of 82
==37794==    at 0x484880F: malloc (vg_replace_malloc.c:446)
==37794==    by 0x761C5A: save_ps_display_args (ps_status.c:165)
==37794==    by 0x26F101: main (main.c:91)
==37794== 
==37794== LEAK SUMMARY:
==37794==    definitely lost: 245 bytes in 2 blocks
==37794==    indirectly lost: 2,434 bytes in 28 blocks
==37794==      possibly lost: 624 bytes in 2 blocks
==37794==    still reachable: 1,229,816 bytes in 204 blocks
==37794==         suppressed: 0 bytes in 0 blocks
==37794== Reachable blocks (those to which a pointer was found) are not shown.
==37794== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==37794== 
==37794== For lists of detected and suppressed errors, rerun with: -s
==37794== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
==37795== 
==37795== HEAP SUMMARY:
==37795==     in use at exit: 1,294,559 bytes in 240 blocks
==37795==   total heap usage: 7,193 allocs, 6,953 frees, 3,359,897 bytes allocated
==37795== 
==37795== 13 bytes in 1 blocks are definitely lost in loss record 6 of 84
==37795==    at 0x484880F: malloc (vg_replace_malloc.c:446)
==37795==    by 0x570258E: strdup (strdup.c:42)
==37795==    by 0x567D71: PostmasterMain (postmaster.c:737)
==37795==    by 0x26F2D0: main (main.c:198)
==37795== 
==37795== 40 bytes in 1 blocks are possibly lost in loss record 10 of 84
==37795==    at 0x484880F: malloc (vg_replace_malloc.c:446)
==37795==    by 0x565463: assign_backendlist_entry (postmaster.c:5864)
==37795==    by 0x565463: do_start_bgworker (postmaster.c:5715)
==37795==    by 0x565463: maybe_start_bgworkers (postmaster.c:5989)
==37795==    by 0x56626C: process_pm_child_exit (postmaster.c:3106)
==37795==    by 0x56626C: ServerLoop (postmaster.c:1771)
==37795==    by 0x5687D5: PostmasterMain (postmaster.c:1466)
==37795==    by 0x26F2D0: main (main.c:198)
==37795== 
==37795== 584 bytes in 1 blocks are possibly lost in loss record 16 of 84
==37795==    at 0x484880F: malloc (vg_replace_malloc.c:446)
==37795==    by 0x56027C: RegisterBackgroundWorker (bgworker.c:929)
==37795==    by 0x580E8B: ApplyLauncherRegister (launcher.c:923)
==37795==    by 0x56811F: PostmasterMain (postmaster.c:1003)
==37795==    by 0x26F2D0: main (main.c:198)
==37795== 
==37795== 2,666 (232 direct, 2,434 indirect) bytes in 1 blocks are definitely lost in loss record 34 of 84
==37795==    at 0x484880F: malloc (vg_replace_malloc.c:446)
==37795==    by 0x761C5A: save_ps_display_args (ps_status.c:165)
==37795==    by 0x26F101: main (main.c:91)
==37795== 
==37795== LEAK SUMMARY:
==37795==    definitely lost: 245 bytes in 2 blocks
==37795==    indirectly lost: 2,434 bytes in 28 blocks
==37795==      possibly lost: 624 bytes in 2 blocks
==37795==    still reachable: 1,291,256 bytes in 208 blocks
==37795==         suppressed: 0 bytes in 0 blocks
==37795== Reachable blocks (those to which a pointer was found) are not shown.
==37795== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==37795== 
==37795== For lists of detected and suppressed errors, rerun with: -s
==37795== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
tail: valgrind.log: file truncated
==37796== Memcheck, a memory error detector
==37796== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==37796== Using Valgrind-3.23.0 and LibVEX; rerun with -h for copyright info
==37796== Command: /bin/sh -c test ! -f /pgdata/archive/000000010000000000000001 && cp pg_wal/000000010000000000000001 /pgdata/archive/000000010000000000000001
==37796== Parent PID: 37524
==37796== 
==37796== 
==37796== HEAP SUMMARY:
==37796==     in use at exit: 1,114 bytes in 36 blocks
==37796==   total heap usage: 40 allocs, 4 frees, 5,874 bytes allocated
==37796== 
==37796== LEAK SUMMARY:
==37796==    definitely lost: 0 bytes in 0 blocks
==37796==    indirectly lost: 0 bytes in 0 blocks
==37796==      possibly lost: 0 bytes in 0 blocks
==37796==    still reachable: 1,114 bytes in 36 blocks
==37796==         suppressed: 0 bytes in 0 blocks
==37796== Reachable blocks (those to which a pointer was found) are not shown.
==37796== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==37796== 
==37796== For lists of detected and suppressed errors, rerun with: -s
==37796== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
tail: valgrind.log: file truncated
==37797== Memcheck, a memory error detector
==37797== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==37797== Using Valgrind-3.23.0 and LibVEX; rerun with -h for copyright info
==37797== Command: /bin/sh -c test ! -f /pgdata/archive/000000010000000000000001 && cp pg_wal/000000010000000000000001 /pgdata/archive/000000010000000000000001
==37797== Parent PID: 37524
==37797== 
==37797== 
==37797== HEAP SUMMARY:
==37797==     in use at exit: 1,114 bytes in 36 blocks
==37797==   total heap usage: 40 allocs, 4 frees, 5,874 bytes allocated
==37797== 
==37797== LEAK SUMMARY:
==37797==    definitely lost: 0 bytes in 0 blocks
==37797==    indirectly lost: 0 bytes in 0 blocks
==37797==      possibly lost: 0 bytes in 0 blocks
==37797==    still reachable: 1,114 bytes in 36 blocks
==37797==         suppressed: 0 bytes in 0 blocks
==37797== Reachable blocks (those to which a pointer was found) are not shown.
==37797== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==37797== 
==37797== For lists of detected and suppressed errors, rerun with: -s
==37797== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
tail: valgrind.log: file truncated
==37798== Memcheck, a memory error detector
==37798== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==37798== Using Valgrind-3.23.0 and LibVEX; rerun with -h for copyright info
==37798== Command: /bin/sh -c test ! -f /pgdata/archive/000000010000000000000001 && cp pg_wal/000000010000000000000001 /pgdata/archive/000000010000000000000001
==37798== Parent PID: 37524
==37798== 
==37798== 
==37798== HEAP SUMMARY:
==37798==     in use at exit: 1,114 bytes in 36 blocks
==37798==   total heap usage: 40 allocs, 4 frees, 5,874 bytes allocated
==37798== 
==37798== LEAK SUMMARY:
==37798==    definitely lost: 0 bytes in 0 blocks
==37798==    indirectly lost: 0 bytes in 0 blocks
==37798==      possibly lost: 0 bytes in 0 blocks
==37798==    still reachable: 1,114 bytes in 36 blocks
==37798==         suppressed: 0 bytes in 0 blocks
==37798== Reachable blocks (those to which a pointer was found) are not shown.
==37798== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==37798== 
==37798== For lists of detected and suppressed errors, rerun with: -s
==37798== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==37799== 
==37799== HEAP SUMMARY:
==37799==     in use at exit: 1,233,119 bytes in 236 blocks
==37799==   total heap usage: 7,191 allocs, 6,955 frees, 3,325,897 bytes allocated
==37799== 
==37799== 13 bytes in 1 blocks are definitely lost in loss record 6 of 82
==37799==    at 0x484880F: malloc (vg_replace_malloc.c:446)
==37799==    by 0x570258E: strdup (strdup.c:42)
==37799==    by 0x567D71: PostmasterMain (postmaster.c:737)
==37799==    by 0x26F2D0: main (main.c:198)
==37799== 
==37799== 40 bytes in 1 blocks are possibly lost in loss record 10 of 82
==37799==    at 0x484880F: malloc (vg_replace_malloc.c:446)
==37799==    by 0x565463: assign_backendlist_entry (postmaster.c:5864)
==37799==    by 0x565463: do_start_bgworker (postmaster.c:5715)
==37799==    by 0x565463: maybe_start_bgworkers (postmaster.c:5989)
==37799==    by 0x56626C: process_pm_child_exit (postmaster.c:3106)
==37799==    by 0x56626C: ServerLoop (postmaster.c:1771)
==37799==    by 0x5687D5: PostmasterMain (postmaster.c:1466)
==37799==    by 0x26F2D0: main (main.c:198)
==37799== 
==37799== 584 bytes in 1 blocks are possibly lost in loss record 16 of 82
==37799==    at 0x484880F: malloc (vg_replace_malloc.c:446)
==37799==    by 0x56027C: RegisterBackgroundWorker (bgworker.c:929)
==37799==    by 0x580E8B: ApplyLauncherRegister (launcher.c:923)
==37799==    by 0x56811F: PostmasterMain (postmaster.c:1003)
==37799==    by 0x26F2D0: main (main.c:198)
==37799== 
==37799== 2,666 (232 direct, 2,434 indirect) bytes in 1 blocks are definitely lost in loss record 35 of 82
==37799==    at 0x484880F: malloc (vg_replace_malloc.c:446)
==37799==    by 0x761C5A: save_ps_display_args (ps_status.c:165)
==37799==    by 0x26F101: main (main.c:91)
==37799== 
==37799== LEAK SUMMARY:
==37799==    definitely lost: 245 bytes in 2 blocks
==37799==    indirectly lost: 2,434 bytes in 28 blocks
==37799==      possibly lost: 624 bytes in 2 blocks
==37799==    still reachable: 1,229,816 bytes in 204 blocks
==37799==         suppressed: 0 bytes in 0 blocks
==37799== Reachable blocks (those to which a pointer was found) are not shown.
==37799== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==37799== 

下面我们打开一个客户进程,并且在客户得进程中创建一个数据库

代码语言:javascript复制
==37945== HEAP SUMMARY:
==37945==     in use at exit: 1,114 bytes in 36 blocks
==37945==   total heap usage: 40 allocs, 4 frees, 5,874 bytes allocated
==37945== 
==37945== LEAK SUMMARY:
==37945==    definitely lost: 0 bytes in 0 blocks
==37945==    indirectly lost: 0 bytes in 0 blocks
==37945==      possibly lost: 0 bytes in 0 blocks
==37945==    still reachable: 1,114 bytes in 36 blocks
==37945==         suppressed: 0 bytes in 0 blocks
==37945== Reachable blocks (those to which a pointer was found) are not shown.
==37945== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==37945== 
==37945== For lists of detected and suppressed errors, rerun with: -s
==37945== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
tail: valgrind.log: file truncated
==37946== Memcheck, a memory error detector
==37946== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==37946== Using Valgrind-3.23.0 and LibVEX; rerun with -h for copyright info
==37946== Command: /bin/sh -c test ! -f /pgdata/archive/000000010000000000000001 && cp pg_wal/000000010000000000000001 /pgdata/archive/000000010000000000000001
==37946== Parent PID: 37844
==37946== 
==37946== 
==37946== HEAP SUMMARY:
==37946==     in use at exit: 1,114 bytes in 36 blocks
==37946==   total heap usage: 40 allocs, 4 frees, 5,874 bytes allocated
==37946== 
==37946== LEAK SUMMARY:
==37946==    definitely lost: 0 bytes in 0 blocks
==37946==    indirectly lost: 0 bytes in 0 blocks
==37946==      possibly lost: 0 bytes in 0 blocks
==37946==    still reachable: 1,114 bytes in 36 blocks
==37946==         suppressed: 0 bytes in 0 blocks
==37946== Reachable blocks (those to which a pointer was found) are not shown.
==37946== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==37946== 
==37946== For lists of detected and suppressed errors, rerun with: -s
==37946== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==37944== 13 bytes in 1 blocks are definitely lost in loss record 6 of 82
==37944==    at 0x484880F: malloc (vg_replace_malloc.c:446)
==37944==    by 0x570258E: strdup (strdup.c:42)
==37944==    by 0x567D71: PostmasterMain (postmaster.c:737)
==37944==    by 0x26F2D0: main (main.c:198)
==37944== 
==37944== 40 bytes in 1 blocks are possibly lost in loss record 10 of 82
==37944==    at 0x484880F: malloc (vg_replace_malloc.c:446)
==37944==    by 0x565463: assign_backendlist_entry (postmaster.c:5864)
==37944==    by 0x565463: do_start_bgworker (postmaster.c:5715)
==37944==    by 0x565463: maybe_start_bgworkers (postmaster.c:5989)
==37944==    by 0x56626C: process_pm_child_exit (postmaster.c:3106)
==37944==    by 0x56626C: ServerLoop (postmaster.c:1771)
==37944==    by 0x5687D5: PostmasterMain (postmaster.c:1466)
==37944==    by 0x26F2D0: main (main.c:198)
==37944== 
==37944== 584 bytes in 1 blocks are possibly lost in loss record 16 of 82
==37944==    at 0x484880F: malloc (vg_replace_malloc.c:446)
==37944==    by 0x56027C: RegisterBackgroundWorker (bgworker.c:929)
==37944==    by 0x580E8B: ApplyLauncherRegister (launcher.c:923)
==37944==    by 0x56811F: PostmasterMain (postmaster.c:1003)
==37944==    by 0x26F2D0: main (main.c:198)
==37944== 
==37944== 2,666 (232 direct, 2,434 indirect) bytes in 1 blocks are definitely lost in loss record 35 of 82
==37944==    at 0x484880F: malloc (vg_replace_malloc.c:446)
==37944==    by 0x761C5A: save_ps_display_args (ps_status.c:165)
==37944==    by 0x26F101: main (main.c:91)
==37944== 
==37944== LEAK SUMMARY:
==37944==    definitely lost: 245 bytes in 2 blocks
==37944==    indirectly lost: 2,434 bytes in 28 blocks
==37944==      possibly lost: 624 bytes in 2 blocks
==37944==    still reachable: 1,229,816 bytes in 204 blocks
==37944==         suppressed: 0 bytes in 0 blocks
==37944== Reachable blocks (those to which a pointer was found) are not shown.
==37944== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==37944== 
==37944== For lists of detected and suppressed errors, rerun with: -s
==37944== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
==37947== 
==37947== HEAP SUMMARY:
==37947==     in use at exit: 1,294,559 bytes in 240 blocks
==37947==   total heap usage: 7,137 allocs, 6,897 frees, 3,358,113 bytes allocated
==37947== 
==37947== 13 bytes in 1 blocks are definitely lost in loss record 6 of 84
==37947==    at 0x484880F: malloc (vg_replace_malloc.c:446)
==37947==    by 0x570258E: strdup (strdup.c:42)
==37947==    by 0x567D71: PostmasterMain (postmaster.c:737)
==37947==    by 0x26F2D0: main (main.c:198)
==37947== 
==37947== 40 bytes in 1 blocks are possibly lost in loss record 10 of 84
==37947==    at 0x484880F: malloc (vg_replace_malloc.c:446)
==37947==    by 0x565463: assign_backendlist_entry (postmaster.c:5864)
==37947==    by 0x565463: do_start_bgworker (postmaster.c:5715)
==37947==    by 0x565463: maybe_start_bgworkers (postmaster.c:5989)
==37947==    by 0x56626C: process_pm_child_exit (postmaster.c:3106)
==37947==    by 0x56626C: ServerLoop (postmaster.c:1771)
==37947==    by 0x5687D5: PostmasterMain (postmaster.c:1466)
==37947==    by 0x26F2D0: main (main.c:198)
==37947== 
==37947== 584 bytes in 1 blocks are possibly lost in loss record 16 of 84
==37947==    at 0x484880F: malloc (vg_replace_malloc.c:446)
==37947==    by 0x56027C: RegisterBackgroundWorker (bgworker.c:929)
==37947==    by 0x580E8B: ApplyLauncherRegister (launcher.c:923)
==37947==    by 0x56811F: PostmasterMain (postmaster.c:1003)
==37947==    by 0x26F2D0: main (main.c:198)
==37947== 
==37947== 2,666 (232 direct, 2,434 indirect) bytes in 1 blocks are definitely lost in loss record 34 of 84
==37947==    at 0x484880F: malloc (vg_replace_malloc.c:446)
==37947==    by 0x761C5A: save_ps_display_args (ps_status.c:165)
==37947==    by 0x26F101: main (main.c:91)
==37947== 
==37947== LEAK SUMMARY:
==37947==    definitely lost: 245 bytes in 2 blocks
==37947==    indirectly lost: 2,434 bytes in 28 blocks
==37947==      possibly lost: 624 bytes in 2 blocks
==37947==    still reachable: 1,291,256 bytes in 208 blocks
==37947==         suppressed: 0 bytes in 0 blocks
==37947== Reachable blocks (those to which a pointer was found) are not shown.
==37947== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==37947== 
==37947== For lists of detected and suppressed errors, rerun with: -s
==37947== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)


产生以上相关的数据库内存方面的活动,但这样是针对整体的数据库的内存方面进行监控,如果需要对一个客户的链接进行具体的使用内存的监控,需要通过命令单独启动一个客户访问进程来进行数据库的访问

valgrind --leak-check=full --track-origins=yes --log-file=/home/postgres/valgrind_client.log /usr/local/postgres/bin/psql

代码语言:javascript复制
postgres=# exit 
postgres@pg16:~$ cd /home/postgres/
postgres@pg16:~$ ls
data.backup  valgrind_client.log  valgrind.log
postgres@pg16:~$ cat valgrind_client.log 
==38001== Memcheck, a memory error detector
==38001== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==38001== Using Valgrind-3.23.0 and LibVEX; rerun with -h for copyright info
==38001== Command: /usr/local/postgres/bin/psql
==38001== Parent PID: 37352
==38001== 
postgres@pg16:~$ tail -f  valgrind_client.log 
==38001== Memcheck, a memory error detector
==38001== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==38001== Using Valgrind-3.23.0 and LibVEX; rerun with -h for copyright info
==38001== Command: /usr/local/postgres/bin/psql
==38001== Parent PID: 37352
==38001== 


但在观察中,发现即使单独开了相关的监控客户端的进程,实际的内存分配还是在主进程中进行的。

代码语言:javascript复制
==38568== 2,666 (232 direct, 2,434 indirect) bytes in 1 blocks are definitely lost in loss record 37 of 78
==38568==    at 0x484880F: malloc (vg_replace_malloc.c:446)
==38568==    by 0x761C5A: save_ps_display_args (ps_status.c:165)
==38568==    by 0x26F101: main (main.c:91)
==38568== 
==38568== LEAK SUMMARY:
==38568==    definitely lost: 245 bytes in 2 blocks
==38568==    indirectly lost: 2,434 bytes in 28 blocks
==38568==      possibly lost: 704 bytes in 4 blocks
==38568==    still reachable: 1,165,693 bytes in 197 blocks
==38568==         suppressed: 0 bytes in 0 blocks
==38568== Reachable blocks (those to which a pointer was found) are not shown.
==38568== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==38568== 
==38568== For lists of detected and suppressed errors, rerun with: -s
==38568== ERROR SUMMARY: 6 errors from 6 contexts (suppressed: 0 from 0)
==38571== 
==38571== HEAP SUMMARY:
==38571==     in use at exit: 1,294,559 bytes in 240 blocks
==38571==   total heap usage: 7,202 allocs, 6,962 frees, 3,362,817 bytes allocated
==38571== 

咱们以这段操作中获得代码来说,这段就是你开启一个psql后产生的内存日志的内容,我们可以分析出几个问题

==38568== 2,666 (232 direct, 2,434 indirect) bytes in 1 blocks are definitely lost in loss record 37 of 78

比如如definiely lost 这个部分的意思是内存存在泄露,也就是没有任何的指针指向这些内存块,他们是无法被访问的。

==38568== at 0x484880F: malloc (vg_replace_malloc.c:446)

==38568== by 0x761C5A: save_ps_display_args (ps_status.c:165==38568== by 0x26F101: main (main.c:91)

上面的部分,告知了在ps_status.c 的部分中的函数pg_display_args 发生了内存的泄露。

==38568== LEAK SUMMARY: ==38568== definitely lost: 245 bytes in 2 blocks

这里给出一些valgrind 的使用技巧,在分析大型应用的时候可以通过添加参数 --smc-check=all 来减少内存的开销,同时如果要经常使用valgrind 则可以通过 export VALGRIND_OPTS="--leak-check=full --track-origins=yes" 将常用的选择项封装后,直接在下次使用valgrind 的时候不用在写选择项等.

0 人点赞