PawSQL 索引推荐引擎是PawSQL自动化SQL优化平台的核心功能,它支持ANSI标准的DML语法以及其他SQL方言的解析,通过对SQL语句的语法分析,结合表结构定义信息及统计信息,对所有可能的语法组合能推荐出合适的索引,以提升数据库查询性能。
索引的作用
从性能收益的角度,索引的作用有如下三个方面(更多和索引相关的介绍,请参考本公众号的两篇文章数据库索引的类型和如何创建高效的索引):
1、快速定位
- 等值常量条件
- 等值关联条件
- 范围条件
2、避免排序(B Tree)
- order by
- group by
- distinct
- join planning
3、避免回表(index only)
案例解释
对于下面的SQL查询语句,基于快速定位、避免排序和避免回表的策略,分別推荐下面的三個SQL索引。
代码语言:javascript复制SELECT classid, avg(score) tscore
FROM score
WHERE subject = '语文'
GROUP BY classid
代码语言:javascript复制Index on score(subject ) - 快速定位
Index on score(subject,classid) - 快速定位 避免排序
Index on score(subject,classid,score) - 快速定位 避免排序 回表
基于SQL语法的索引推荐策略
- 等值条件
-- 单列条件
select * from lineitem where l_shipdate = date '1998-12-01';
-- 推荐索引
CREATE INDEX PAW_IDX1533504424 ON LINEITEM(L_SHIPDATE);
-- is null
select * from lineitem where l_shipmode is null;
CREATE INDEX PAW_IDX2000792795 ON LINEITEM(L_SHIPMODE);
-- in 单值
select * from lineitem where l_shipmode in ('0');
CREATE INDEX PAW_IDX2000792795 ON LINEITEM(L_SHIPMODE);
-- 多列条件,区分度大的列放在前面
select * from lineitem where l_shipdate = date '1998-12-01' and l_shipmode = '0';
CREATE INDEX PAW_IDX1942674217 ON LINEITEM(L_SHIPDATE,L_SHIPMODE);
-- 标量子查询
select * from lineitem where l_shipdate = (select max(l_shipdate) from lineitem);
CREATE INDEX PAW_IDX1533504424 ON LINEITEM(L_SHIPDATE);
- 范围条件
-- 范围条件>、<、>=、<=
select * from lineitem where l_shipdate >= date '1998-12-01';
CREATE INDEX PAW_IDX0156881833 ON LINEITEM(L_SHIPDATE);
-- in 多值
select * from lineitem where l_shipmode in ('0','1');
CREATE INDEX PAW_IDX2000792795 ON LINEITEM(L_SHIPMODE);
-- between ... and ...
select * from lineitem where l_shipdate between date '2010-12-01' and date '2020-12-01';
CREATE INDEX PAW_IDX1241878058 ON LINEITEM(L_SHIPDATE);
-- Like 一个左前缀
select * from customer where c_phone like "139%";
CREATE INDEX PAW_IDX0326568991 ON CUSTOMER(C_PHONE);
- 分组 - 避免排序
-- grouping
select l_shipdate, count(*) as sum_qty from lineitem group by l_shipdate;
-- 推荐的索引为
CREATE INDEX PAW_IDX1614428511 ON LINEITEM(L_SHIPDATE);
- 排序 - 避免排序
-- order by
select * from lineitem order by l_shipdate limit 10;
-- 推荐的索引为
CREATE INDEX PAW_IDX1424903467 ON LINEITEM(L_SHIPDATE);
- 覆盖索引 - 避免回表
-- index only
select l_shipdate, sum(l_quantity) as sum_qty from lineitem group by l_shipdate;
-- 推荐的索引为
CREATE INDEX PAW_IDX1614428511 ON LINEITEM(L_SHIPDATE,L_QUANTITY);
- 语法组合:等值 范围,等值条件放在前面
-- equal range
select * from lineitem where l_shipdate = date '1998-12-01' and l_quantity >100;
-- 推荐的索引为
CREATE INDEX PAW_IDX2048143506 ON LINEITEM(L_SHIPDATE,L_QUANTITY);
- 语法组合:等值 分组 覆盖,等值条件放在前面,覆盖列放到最后
-- where group
select l_shipdate, sum(l_quantity) as sum_qty from lineitem where l_receiptdate = '2020-01-01' group by l_shipdate;
-- 推荐的索引为
CREATE INDEX PAW_IDX2000792795 ON LINEITEM(L_RECEIPTDATE,L_SHIPDATE,l_quantity);
- 多表关联
-- 内连接,两个表都可以作为驱动表
SELECT * FROM ORDERS, LINEITEM WHERE O_ORDERKEY = L_ORDERKEY;
-- 推荐的索引为
CREATE INDEX PAW_IDX2127618499 ON ORDERS(O_ORDERKEY);
CREATE INDEX PAW_IDX0339323878 ON LINEITEM(L_ORDERKEY);
-- 内连接,两个表都可以作为驱动表
SELECT * FROM ORDERS JOIN LINEITEM ON O_ORDERKEY = L_ORDERKEY;
CREATE INDEX PAW_IDX1531629550 ON ORDERS(O_ORDERKEY);
CREATE INDEX PAW_IDX1365836084 ON LINEITEM(L_ORDERKEY);
-- 外连接,外表作为驱动表,join条件作为升级为等值条件,参与索引推荐
SELECT * FROM ORDERS LEFT JOIN LINEITEM ON O_ORDERKEY = L_ORDERKEY;
CREATE INDEX PAW_IDX1336974557 ON LINEITEM(L_ORDERKEY);
SELECT * FROM ORDERS RIGHT JOIN LINEITEM ON O_ORDERKEY = L_ORDERKEY;
CREATE INDEX PAW_IDX1002609246 ON ORDERS(O_ORDERKEY);
SELECT * FROM ORDERS LEFT JOIN LINEITEM ON O_ORDERKEY = L_ORDERKEY WHERE l_shipdate = date '1998-12-01';
CREATE INDEX PAW_IDX0711368375 ON LINEITEM(L_ORDERKEY,L_SHIPDATE);
- DT子查询
-- Derived table,不同的查询块分别进行索引推荐,然后进行合并去重
select *from supplier,(select l_suppkey, sum(l_extendedprice) as total_revenue from lineitem group by l_suppkey) revenuewhere s_suppkey = revenue.l_suppkey;
-- 推荐的索引为
CREATE INDEX PAW_IDX0151075817 ON SUPPLIER(S_SUPPKEY);
CREATE INDEX PAW_IDX1968327707 ON LINEITEM(L_SUPPKEY,L_EXTENDEDPRICE);
- 条件子查询
-- 条件子查询
select *from supplierwhere s_suppkey = (select l_suppkey from lineitem order by l_suppkey desc limit 1);
-- 推荐的索引为
CREATE INDEX PAW_IDX0664775210 ON SUPPLIER(S_SUPPKEY);
CREATE INDEX PAW_IDX1518532301 ON LINEITEM(L_SUPPKEY);
- CTE子查询
-- CTE
with revenue as (select l_suppkey, sum(l_extendedprice) as total_revenue from lineitem group by l_suppkey)select *from supplier, revenuewhere s_suppkey = l_suppkey;
-- 推荐的索引为
CREATE INDEX PAW_IDX0343576594 ON SUPPLIER(S_SUPPKEY);
CREATE INDEX PAW_IDX1518532301 ON LINEITEM(L_SUPPKEY,L_EXTENDEDPRICE);
- 传递闭包重写 - 重写后进行索引推荐
-- 传递闭包,重写后进行索引推荐
select o_custkey as cust_no, l_extendedprice * (1 - l_discount)from orders, lineitemwhere l_orderkey = o_orderkey and l_orderkey = 'ORD1234';
-- 重写为,orders表上新增一个等值条件o_orderkey = 'ORD1234'
select o_custkey as cust_no, l_extendedprice * (1 - l_discount)from orders, lineitemwhere l_orderkey = o_orderkey and l_orderkey = 'ORD1234'and o_orderkey = 'ORD1234';
-- 推荐的索引为
CREATE INDEX PAW_IDX0837835805 ON ORDERS(O_ORDERKEY,O_CUSTKEY);
CREATE INDEX PAW_IDX1989932894 ON LINEITEM(L_ORDERKEY,L_DISCOUNT,L_EXTENDEDPRICE);
- Ordinal重写 - 重写后进行索引推荐
-- orderby ordinal 重写
select L_SHIPDATE, count(*) as cnt, sum(bal) as bal from lineitem order by 1;
-- 重写后的sql为
select L_SHIPDATE, count(*) as cnt, sum(bal) as bal from lineitem order by L_SHIPDATE;
-- 推荐的索引为
CREATE INDEX PAW_IDX1424903467 ON LINEITEM(L_SHIPDATE);
-- groupby ordinal 重写
select l_shipdate, sum(l_quantity) as sum_qty from lineitem group by 1;
-- 重写后的SQL为
select l_shipdate, sum(l_quantity) as sum_qty from lineitem group by l_shipdate;
-- 推荐的索引为
CREATE INDEX PAW_IDX1614428511 ON LINEITEM(L_SHIPDATE,L_QUANTITY);
- DELETE语句
delete from lineitem where l_shipdate = date '1998-12-01';
CREATE INDEX PAW_IDX1533504424 ON LINEITEM(L_SHIPDATE);
- UPDATE语句
update lineitem set l_shipmode='' where l_shipmode is null;
CREATE INDEX PAW_IDX2000792795 ON LINEITEM(L_SHIPMODE);
- INSERT语句
insert into lineitem select * from lineitem where l_shipmode in ('0');
CREATE INDEX PAW_IDX2000792795 ON LINEITEM(L_SHIPMODE);
- Merge语句
replace into lineitem select * where l_shipdate = date '1998-12-01' and l_shipmode = '0';
CREATE INDEX PAW_IDX1942674217 ON LINEITEM(L_SHIPDATE,L_SHIPMODE);
查询重写与索引推荐
PawSQL的索引推荐是基于查询块(Query Block)的,某些重写优化算法能够将索引推荐相关的部分推导或是下推至查询块中,从而让索引推荐引擎推荐出合适的索引,下面以过滤谓词下推重写为例,介绍重写优化如何影响索引的推荐。
- 譬如对于下面的SQL,
select *
from (select c_nationkey nationkey
from customer
where c_acctbal>0
group by nationkey
) as p
where p.nationkey= 100
单纯的分析查询块p
,PawSQL索引引擎推荐出的索引是(c_acctbal,c_nationkey),其只能被用来做索引范围扫描
而当开启了过滤条件下推重写优化后,推荐的索引为(c_nationkey, c_acctbal)。可以看到,此索引筛选率更高,且能够避免排序,效率更好。
类似的,能够帮助PawSQL索引引擎推荐更有效的索引的重写优化算法还包括:LIMIT子句下推重写优化,外连接转化为内连接优化,投影下推优化,SATTC优化等等。
关于PawSQL
PawSQL专注数据库性能优化的自动化和智能化,支持MySQL,PostgreSQL,Opengauss,Oracle等数据库,提供的SQL优化产品包括
- PawSQL Cloud,在线自动化SQL优化工具,支持SQL审查,智能查询重写、基于代价的索引推荐,适用于数据库管理员及数据应用开发人员,
- PawSQL Advisor,IntelliJ 插件, 适用于数据应用开发人员,可以IDEA/DataGrip应用市场通过名称搜索“PawSQL Advisor”安装。
- PawSQL Engine, 是PawSQL系列产品的后端优化引擎,可以以docker镜像的方式独立安装部署,并通过http/json的接口提供SQL优化服务。