OB 运维 | 存储过程慢?可能是 plan cache 失效了

2024-09-19 19:17:01 浏览数 (3)

作者:程柳润,爱可生 DBA 团队成员,OceanBase 数据库技术爱好者。

爱可生开源社区出品,原创内容未经授权不得随意使用,转载请联系小编并注明来源。

本文约 1000 字,预计阅读需要 3 分钟。


1问题描述

业务反馈一个存储过程执行慢,需要排查原因。由于存储过程过长且涉及多张表,所以主要分享排查思路。

2分析过程

1. 查找该存储过程的 trace_id

每次执行存储过程后会有两条记录,两条记录 SQL 的请求类型不同,先产生的记录是预编译的记录(执行耗时很小),另一个是实际执行的记录。

代码语言:javascript复制
obclient [oceanbase]> select usec_to_time(request_time),svr_ip,trace_id,query_sql,elapsed_time from gv$sql_audit where query_sql like 'call TEST_CHENG%' order by request_time;
 ---------------------------- ---------------- ----------------------------------- ------------------- -------------- -------------- 
| usec_to_time(request_time) | svr_ip         | trace_id                          | query_sql         | elapsed_time | request_type |
 ---------------------------- ---------------- ----------------------------------- ------------------- -------------- -------------- 
| 2023-12-16 18:20:31.741933 | xx.xxx.xxx.xxx | YB420ABA3D1F-00060A149FDC35FD-0-0 | call TEST_CHENG() |           73 |             5|
| 2023-12-16 18:20:31.742838 | xx.xxx.xxx.xxx | YB420ABA3D1F-00060A149CFDF57C-0-0 | call TEST_CHENG() |      2362075 |             6|
 ---------------------------- ---------------- ----------------------------------- ------------------- -------------- -------------- 
2 rows in set (0.614 sec)

2. 分析存储中较慢的 SQL

根据上一步中执行较慢的 trace_id 查看该存储过程中执行较慢的 SQL。发现存储过程中的几个 UPDATE 耗时较长。另外,IS_HIT_PLAN 都为 0,表示没有命中 plan cache。然而其 GET_PLAN_TIME(获取执行计划时间)也为 0,是不正常的,这一点会在后面解释。

代码语言:javascript复制
obclient [oceanbase]> select usec_to_time(request_time),query_sql,elapsed_time,execute_time,AFFECTED_ROWS,EVENT,TOTAL_WAIT_TIME_MICRO,WAIT_TIME_MICRO,TOTAL_WAITS,GET_PLAN_TIME,QUEUE_TIME,IS_HIT_PLAN,plan_id,svr_ip from gv$sql_audit where trace_id='YB420ABA3D1F-00060A149CFDF57C-0-0' order by elapsed_time desc limit 6G
*************************** 1. row ***************************
usec_to_time(request_time): 2023-12-16 18:20:31.742838
                 query_sql: call TEST_CHENG()
              elapsed_time: 2362075
              execute_time: 2362015
......
             GET_PLAN_TIME: 21
                QUEUE_TIME: 15
               IS_HIT_PLAN: 0
......
*************************** 2. row ***************************
usec_to_time(request_time): 2023-12-16 18:20:32.359304
                 query_sql: update "CC"."TEST1" "T" set "T"."CITY" = '1',"T"."COUNTY" = (select 
              elapsed_time: 630735
              execute_time: 630735
......
             GET_PLAN_TIME: 0
                QUEUE_TIME: 0
               IS_HIT_PLAN: 0
......
*************************** 3. row ***************************
usec_to_time(request_time): 2023-12-16 18:20:32.999169
                 query_sql: update "CC"."TEST1" "T" set "T"."CITY" = '0',"T"."COUNTY" = (select 
              elapsed_time: 400111
              execute_time: 400111
......
             GET_PLAN_TIME: 0
                QUEUE_TIME: 0
               IS_HIT_PLAN: 0
......
*************************** 4. row ***************************
usec_to_time(request_time): 2023-12-16 18:20:33.731726
                 query_sql: update "CC"."TEST1" "T" set "T"."CITY" = '0',"T"."ROAD" = (select 
              elapsed_time: 335918
              execute_time: 335918
......
             GET_PLAN_TIME: 0
                QUEUE_TIME: 0
               IS_HIT_PLAN: 0
......
*************************** 5. row ***************************
usec_to_time(request_time): 2023-12-16 18:20:33.402608
                 query_sql: update "CC"."TEST1" "T" set "T"."CITY" = '0',"T"."ROAD" = (select 
              elapsed_time: 319150
              execute_time: 319150
......
             GET_PLAN_TIME: 0
                QUEUE_TIME: 0
               IS_HIT_PLAN: 0
......
*************************** 6. row ***************************
usec_to_time(request_time): 2023-12-16 18:20:32.086302
                 query_sql: truncate Table  TEST1
              elapsed_time: 142128
              execute_time: 142128
......
             GET_PLAN_TIME: 0
                QUEUE_TIME: 0
               IS_HIT_PLAN: 0
......
6 rows in set (5.342 sec)

3. 查看存储过程执行情况

发现调用该存储过程之后,对 __all_tenant_dependency 等系统视图有增删改查的操作(篇幅有限,这里只截取了一部分)。这是由于编译 PL 对象会对系统视图做一些操作,gv$sql_audit 会记录下来,可以用 tenant_id=业务租户 ID 来过滤掉这些记录。

另外,在 UPDATE 前有 TRUNCATE TEST1 的操作。该操作会导致两个问题:

  • 整个存储过程的 plan cache 无法命中,每次都要重新编译。
  • 存储过程中这张表相关的 SQL 的 plan cache 也会失效,每次执行也需要重新获取执行计划。
代码语言:javascript复制
obclient [oceanbase]> select substr(query_sql,1,60),usec_to_time(request_time),elapsed_time,svr_ip from gv$sql_audit 
where trace_id='YB420ABA3D1F-00060A149CFDF57C-0-0' order by request_time;
 --------------------------------------------------- ---------------------------- -------------- ---------------- 
| substr(query_sql,1,60)                            | usec_to_time(request_time) | elapsed_time | svr_ip         |
 --------------------------------------------------- ---------------------------- -------------- ---------------- 
| call TEST_CHENG();                                | 2023-12-16 18:50:30.086159 |      2514812 | xx.xxx.xxx.xxx |
| START TRANSACTION                                 | 2023-10-07 18:50:30.472741 |          194 | xx.xxx.xxx.xxx |
| delete FROM __all_tenant_dependency WHERE dep_obj_| 2023-10-07 18:50:30.473589 |         1570 | xx.xxx.xxx.xxx |
| INSERT INTO __all_tenant_dependency (tenant_id, de| 2023-10-07 18:50:30.476450 |          243 | xx.xxx.xxx.xxx |
|......                                                                                                                                                                                                                         |
|......                                                                                                                                                                                                                         |
| insert into "CC"."TEST5"( "LOG_DATE", "LOG_LEVEL",| 2023-12-16 18:50:30.504996 |          393 | xx.xxx.xxx.xxx |
| truncate Table  TEST1                             | 2023-12-16 18:50:30.505998 |       569828 | xx.xxx.xxx.xxx |
| select max("T"."CREATE_TIME") from "CC"."TEST6    | 2023-12-16 18:50:31.176795 |          295 | xx.xxx.xxx.xxx |
| select max("T"."CREATE_TIME") from "CC"."TEST6    | 2023-12-16 18:50:31.177108 |          180 | xx.xxx.xxx.xxx |
| insert into "CC"."TEST1"( "ID", "NUMBER           | 2023-12-16 18:50:31.178344 |         8039 | xx.xxx.xxx.xxx |
| SELECT * FROM __all_tenant_meta_table WHERE tenant| 2023-12-16 18:50:31.184392 |          328 | xx.xxx.xxx.xxx |
| update "CC"."TEST1" "T" set "T"."ID" =            | 2023-12-16 18:50:31.187246 |         4696 | xx.xxx.xxx.xxx |
| SELECT partition_id, object_type, row_cnt as row_c| 2023-12-16 18:50:31.189366 |          181 | xx.xxx.xxx.xxx |
| SELECT tenant_id, table_id, partition_id, column_i| 2023-12-16 18:50:31.190145 |          224 | xx.xxx.xxx.xxx |
| SELECT tenant_id, table_id, partition_id, column_i| 2023-12-16 18:50:31.190774 |          131 | xx.xxx.xxx.xxx |
| update "CC"."TEST1" "T" set "T"."CITY" =          | 2023-12-16 18:50:31.192182 |         4577 | xx.xxx.xxx.xxx |
| update "CC"."TEST1" "T" set "T"."CITY" =          | 2023-12-16 18:50:31.197606 |         3896 | xx.xxx.xxx.xxx |
| update "CC"."TEST1" "T" set "T"."CITY" =          | 2023-12-16 18:50:31.202269 |       347793 | xx.xxx.xxx.xxx |
| update "CC"."TEST1" "T" set "T"."CITY" =          | 2023-12-16 18:50:31.559553 |       338883 | xx.xxx.xxx.xxx |
| update "CC"."TEST1" "T" set "T"."CITY" =          | 2023-12-16 18:50:31.901602 |       311209 | xx.xxx.xxx.xxx |
| update "CC"."TEST1" "T" set "T"."CITY" =          | 2023-12-16 18:50:32.218374 |         6085 | xx.xxx.xxx.xxx |
| update "CC"."TEST1" "T" set "T"."CITY" =          | 2023-12-16 18:50:32.225213 |       338044 | xx.xxx.xxx.xxx |
| update "CC"."TEST1" "T" set "T"."ORGAN_CODE" =    | 2023-12-16 18:50:32.566794 |        15760 | xx.xxx.xxx.xxx |
| insert into "CC"."TEST8"( "MANAGER_ID", "APART_ID | 2023-12-16 18:50:32.593934 |         5915 | xx.xxx.xxx.xxx |
 --------------------------------------------------- ---------------------------- -------------- ---------------- 
154 rows in set (4.125 sec)

4. 手工执行慢的几个 Update

手工执行慢的几个 update,发现命中 plan cache 时只需要几毫秒,而没有命中 plan cache 时需要 300 多毫秒。

手工执行时会命中执行计划
代码语言:javascript复制
obclient [oceanbase]> select usec_to_time(request_time),query_sql,elapsed_time,execute_time,AFFECTED_ROWS,EVENT,TOTAL_WAIT_TIME_MICRO,WAIT_TIME_MICRO,TOTAL_WAITS,GET_PLAN_TIME,QUEUE_TIME,IS_HIT_PLAN,plan_id,svr_ip from gv$sql_audit where query_sql like 'update "CC"."TEST1" "T" set "T"."CITY"%' and GET_PLAN_TIME>0 order by request_timeG
usec_to_time(request_time): 2023-12-16 19:12:30.063271
                 query_sql: update "CC"."TEST1" "T" set "T"."CITY" = '1',"T"."COUNTY" = (select 
              elapsed_time: 8810
              execute_time: 8810
......
             GET_PLAN_TIME: 0
                QUEUE_TIME: 0
               IS_HIT_PLAN: 1
......
手工执行 update 前先做 truncate 操作

不会命中执行计划,此时 GET_PLAN_TIME 为 372ms。

代码语言:javascript复制
obclient [oceanbase]> select usec_to_time(request_time),query_sql,elapsed_time,execute_time,AFFECTED_ROWS,EVENT,TOTAL_WAIT_TIME_MICRO,WAIT_TIME_MICRO,TOTAL_WAITS,GET_PLAN_TIME,QUEUE_TIME,IS_HIT_PLAN,plan_id,svr_ip from gv$sql_audit where query_sql like 'update "CC"."TEST1" "T" set "T"."CITY"%' and GET_PLAN_TIME>0 order by request_timeG
*************************** 1. row ***************************
usec_to_time(request_time): 2023-12-16 19:21:53.069019
                 query_sql: update "CC"."TEST1" "T" set "T"."CITY" = '1',"T"."COUNTY" = (select 
              elapsed_time: 378148
              execute_time: 6036
......
             GET_PLAN_TIME: 372055
                QUEUE_TIME: 28
               IS_HIT_PLAN: 0
......

注意:存储过程里的 SQL,即使没有命中 plan cache,在 gv$sql_audit 中还是会显示 GET_PLAN_TIME: 0,都体现在 execute_time。正常来说(比如手工执行的 SQL),如果没有命中 plan cacheGET_PLAN_TIME 值会比较大,并且不会计算到 execute_time 里。

3结论

TRUNCATE 导致 UPDATE 无法命中 plan cache,所以存储过程慢。

4解决方案

将存储过程里的TRUNCATE 改成了 DELETE

本文关键字:#OceanBase# #存储过程# #SQL优化# #plan cache#

0 人点赞