作者:程柳润,爱可生 DBA 团队成员,OceanBase 数据库技术爱好者。
爱可生开源社区出品,原创内容未经授权不得随意使用,转载请联系小编并注明来源。
本文约 1000 字,预计阅读需要 3 分钟。
1问题描述
业务反馈一个存储过程执行慢,需要排查原因。由于存储过程过长且涉及多张表,所以主要分享排查思路。
2分析过程
1. 查找该存储过程的 trace_id
每次执行存储过程后会有两条记录,两条记录 SQL 的请求类型不同,先产生的记录是预编译的记录(执行耗时很小),另一个是实际执行的记录。
代码语言:javascript复制obclient [oceanbase]> select usec_to_time(request_time),svr_ip,trace_id,query_sql,elapsed_time from gv$sql_audit where query_sql like 'call TEST_CHENG%' order by request_time;
---------------------------- ---------------- ----------------------------------- ------------------- -------------- --------------
| usec_to_time(request_time) | svr_ip | trace_id | query_sql | elapsed_time | request_type |
---------------------------- ---------------- ----------------------------------- ------------------- -------------- --------------
| 2023-12-16 18:20:31.741933 | xx.xxx.xxx.xxx | YB420ABA3D1F-00060A149FDC35FD-0-0 | call TEST_CHENG() | 73 | 5|
| 2023-12-16 18:20:31.742838 | xx.xxx.xxx.xxx | YB420ABA3D1F-00060A149CFDF57C-0-0 | call TEST_CHENG() | 2362075 | 6|
---------------------------- ---------------- ----------------------------------- ------------------- -------------- --------------
2 rows in set (0.614 sec)
2. 分析存储中较慢的 SQL
根据上一步中执行较慢的 trace_id
查看该存储过程中执行较慢的 SQL。发现存储过程中的几个 UPDATE
耗时较长。另外,IS_HIT_PLAN
都为 0,表示没有命中 plan cache。然而其 GET_PLAN_TIME
(获取执行计划时间)也为 0,是不正常的,这一点会在后面解释。
obclient [oceanbase]> select usec_to_time(request_time),query_sql,elapsed_time,execute_time,AFFECTED_ROWS,EVENT,TOTAL_WAIT_TIME_MICRO,WAIT_TIME_MICRO,TOTAL_WAITS,GET_PLAN_TIME,QUEUE_TIME,IS_HIT_PLAN,plan_id,svr_ip from gv$sql_audit where trace_id='YB420ABA3D1F-00060A149CFDF57C-0-0' order by elapsed_time desc limit 6G
*************************** 1. row ***************************
usec_to_time(request_time): 2023-12-16 18:20:31.742838
query_sql: call TEST_CHENG()
elapsed_time: 2362075
execute_time: 2362015
......
GET_PLAN_TIME: 21
QUEUE_TIME: 15
IS_HIT_PLAN: 0
......
*************************** 2. row ***************************
usec_to_time(request_time): 2023-12-16 18:20:32.359304
query_sql: update "CC"."TEST1" "T" set "T"."CITY" = '1',"T"."COUNTY" = (select
elapsed_time: 630735
execute_time: 630735
......
GET_PLAN_TIME: 0
QUEUE_TIME: 0
IS_HIT_PLAN: 0
......
*************************** 3. row ***************************
usec_to_time(request_time): 2023-12-16 18:20:32.999169
query_sql: update "CC"."TEST1" "T" set "T"."CITY" = '0',"T"."COUNTY" = (select
elapsed_time: 400111
execute_time: 400111
......
GET_PLAN_TIME: 0
QUEUE_TIME: 0
IS_HIT_PLAN: 0
......
*************************** 4. row ***************************
usec_to_time(request_time): 2023-12-16 18:20:33.731726
query_sql: update "CC"."TEST1" "T" set "T"."CITY" = '0',"T"."ROAD" = (select
elapsed_time: 335918
execute_time: 335918
......
GET_PLAN_TIME: 0
QUEUE_TIME: 0
IS_HIT_PLAN: 0
......
*************************** 5. row ***************************
usec_to_time(request_time): 2023-12-16 18:20:33.402608
query_sql: update "CC"."TEST1" "T" set "T"."CITY" = '0',"T"."ROAD" = (select
elapsed_time: 319150
execute_time: 319150
......
GET_PLAN_TIME: 0
QUEUE_TIME: 0
IS_HIT_PLAN: 0
......
*************************** 6. row ***************************
usec_to_time(request_time): 2023-12-16 18:20:32.086302
query_sql: truncate Table TEST1
elapsed_time: 142128
execute_time: 142128
......
GET_PLAN_TIME: 0
QUEUE_TIME: 0
IS_HIT_PLAN: 0
......
6 rows in set (5.342 sec)
3. 查看存储过程执行情况
发现调用该存储过程之后,对 __all_tenant_dependency
等系统视图有增删改查的操作(篇幅有限,这里只截取了一部分)。这是由于编译 PL 对象会对系统视图做一些操作,gv$sql_audit
会记录下来,可以用 tenant_id=业务租户 ID
来过滤掉这些记录。
另外,在 UPDATE
前有 TRUNCATE TEST1
的操作。该操作会导致两个问题:
- 整个存储过程的 plan cache 无法命中,每次都要重新编译。
- 存储过程中这张表相关的 SQL 的 plan cache 也会失效,每次执行也需要重新获取执行计划。
obclient [oceanbase]> select substr(query_sql,1,60),usec_to_time(request_time),elapsed_time,svr_ip from gv$sql_audit
where trace_id='YB420ABA3D1F-00060A149CFDF57C-0-0' order by request_time;
--------------------------------------------------- ---------------------------- -------------- ----------------
| substr(query_sql,1,60) | usec_to_time(request_time) | elapsed_time | svr_ip |
--------------------------------------------------- ---------------------------- -------------- ----------------
| call TEST_CHENG(); | 2023-12-16 18:50:30.086159 | 2514812 | xx.xxx.xxx.xxx |
| START TRANSACTION | 2023-10-07 18:50:30.472741 | 194 | xx.xxx.xxx.xxx |
| delete FROM __all_tenant_dependency WHERE dep_obj_| 2023-10-07 18:50:30.473589 | 1570 | xx.xxx.xxx.xxx |
| INSERT INTO __all_tenant_dependency (tenant_id, de| 2023-10-07 18:50:30.476450 | 243 | xx.xxx.xxx.xxx |
|...... |
|...... |
| insert into "CC"."TEST5"( "LOG_DATE", "LOG_LEVEL",| 2023-12-16 18:50:30.504996 | 393 | xx.xxx.xxx.xxx |
| truncate Table TEST1 | 2023-12-16 18:50:30.505998 | 569828 | xx.xxx.xxx.xxx |
| select max("T"."CREATE_TIME") from "CC"."TEST6 | 2023-12-16 18:50:31.176795 | 295 | xx.xxx.xxx.xxx |
| select max("T"."CREATE_TIME") from "CC"."TEST6 | 2023-12-16 18:50:31.177108 | 180 | xx.xxx.xxx.xxx |
| insert into "CC"."TEST1"( "ID", "NUMBER | 2023-12-16 18:50:31.178344 | 8039 | xx.xxx.xxx.xxx |
| SELECT * FROM __all_tenant_meta_table WHERE tenant| 2023-12-16 18:50:31.184392 | 328 | xx.xxx.xxx.xxx |
| update "CC"."TEST1" "T" set "T"."ID" = | 2023-12-16 18:50:31.187246 | 4696 | xx.xxx.xxx.xxx |
| SELECT partition_id, object_type, row_cnt as row_c| 2023-12-16 18:50:31.189366 | 181 | xx.xxx.xxx.xxx |
| SELECT tenant_id, table_id, partition_id, column_i| 2023-12-16 18:50:31.190145 | 224 | xx.xxx.xxx.xxx |
| SELECT tenant_id, table_id, partition_id, column_i| 2023-12-16 18:50:31.190774 | 131 | xx.xxx.xxx.xxx |
| update "CC"."TEST1" "T" set "T"."CITY" = | 2023-12-16 18:50:31.192182 | 4577 | xx.xxx.xxx.xxx |
| update "CC"."TEST1" "T" set "T"."CITY" = | 2023-12-16 18:50:31.197606 | 3896 | xx.xxx.xxx.xxx |
| update "CC"."TEST1" "T" set "T"."CITY" = | 2023-12-16 18:50:31.202269 | 347793 | xx.xxx.xxx.xxx |
| update "CC"."TEST1" "T" set "T"."CITY" = | 2023-12-16 18:50:31.559553 | 338883 | xx.xxx.xxx.xxx |
| update "CC"."TEST1" "T" set "T"."CITY" = | 2023-12-16 18:50:31.901602 | 311209 | xx.xxx.xxx.xxx |
| update "CC"."TEST1" "T" set "T"."CITY" = | 2023-12-16 18:50:32.218374 | 6085 | xx.xxx.xxx.xxx |
| update "CC"."TEST1" "T" set "T"."CITY" = | 2023-12-16 18:50:32.225213 | 338044 | xx.xxx.xxx.xxx |
| update "CC"."TEST1" "T" set "T"."ORGAN_CODE" = | 2023-12-16 18:50:32.566794 | 15760 | xx.xxx.xxx.xxx |
| insert into "CC"."TEST8"( "MANAGER_ID", "APART_ID | 2023-12-16 18:50:32.593934 | 5915 | xx.xxx.xxx.xxx |
--------------------------------------------------- ---------------------------- -------------- ----------------
154 rows in set (4.125 sec)
4. 手工执行慢的几个 Update
手工执行慢的几个 update,发现命中 plan cache 时只需要几毫秒,而没有命中 plan cache 时需要 300 多毫秒。
手工执行时会命中执行计划
代码语言:javascript复制obclient [oceanbase]> select usec_to_time(request_time),query_sql,elapsed_time,execute_time,AFFECTED_ROWS,EVENT,TOTAL_WAIT_TIME_MICRO,WAIT_TIME_MICRO,TOTAL_WAITS,GET_PLAN_TIME,QUEUE_TIME,IS_HIT_PLAN,plan_id,svr_ip from gv$sql_audit where query_sql like 'update "CC"."TEST1" "T" set "T"."CITY"%' and GET_PLAN_TIME>0 order by request_timeG
usec_to_time(request_time): 2023-12-16 19:12:30.063271
query_sql: update "CC"."TEST1" "T" set "T"."CITY" = '1',"T"."COUNTY" = (select
elapsed_time: 8810
execute_time: 8810
......
GET_PLAN_TIME: 0
QUEUE_TIME: 0
IS_HIT_PLAN: 1
......
手工执行 update 前先做 truncate 操作
不会命中执行计划,此时 GET_PLAN_TIME
为 372ms。
obclient [oceanbase]> select usec_to_time(request_time),query_sql,elapsed_time,execute_time,AFFECTED_ROWS,EVENT,TOTAL_WAIT_TIME_MICRO,WAIT_TIME_MICRO,TOTAL_WAITS,GET_PLAN_TIME,QUEUE_TIME,IS_HIT_PLAN,plan_id,svr_ip from gv$sql_audit where query_sql like 'update "CC"."TEST1" "T" set "T"."CITY"%' and GET_PLAN_TIME>0 order by request_timeG
*************************** 1. row ***************************
usec_to_time(request_time): 2023-12-16 19:21:53.069019
query_sql: update "CC"."TEST1" "T" set "T"."CITY" = '1',"T"."COUNTY" = (select
elapsed_time: 378148
execute_time: 6036
......
GET_PLAN_TIME: 372055
QUEUE_TIME: 28
IS_HIT_PLAN: 0
......
注意:存储过程里的 SQL,即使没有命中 plan cache,在 gv$sql_audit
中还是会显示 GET_PLAN_TIME: 0
,都体现在 execute_time
。正常来说(比如手工执行的 SQL),如果没有命中 plan cache,GET_PLAN_TIME
值会比较大,并且不会计算到 execute_time
里。
3结论
TRUNCATE
导致 UPDATE
无法命中 plan cache,所以存储过程慢。
4解决方案
将存储过程里的TRUNCATE
改成了 DELETE
。
本文关键字:#OceanBase# #存储过程# #SQL优化# #plan cache#