PG基于dexter的自动化索引推荐

2024-01-07 18:39:09 浏览数 (1)

github地址: https://github.com/ankane/dexter

安装高版本ruby

代码语言:shell复制
安装scl源:
yum install  scl-utils centos-release-scl scl-utils-build


查看从 SCL 中安装的包的列表:
scl --list

列出scl源有哪些包可以用:
yum list all --enablerepo='centos-sclo-rh' | grep ruby

	
查看从 SCL 中安装的包的列表:
scl --list

安装ruby3.0版本
yum install rh-ruby30 rh-ruby30-ruby-devel

切到ruby3.0环境下
scl enable rh-ruby30 bash

# ruby --version
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux]

安装pgdexter组件

代码语言:shell复制
$ scl enable rh-ruby30 bash
# export PATH=/usr/pgsql-15/bin/:/usr/pgsql-15/lib/:$PATH
# export https_proxy=http://192.168.31.1:7890 && export http_proxy=http://192.168.31.1:7890
# gem install pgdexter
Successfully installed google-protobuf-3.25.1-x86_64-linux
Building native extensions. This could take a while...
Successfully installed pg-1.5.4
Fetching pgdexter-0.5.1.gem
Successfully installed pgdexter-0.5.1
Parsing documentation for google-protobuf-3.25.1-x86_64-linux
Installing ri documentation for google-protobuf-3.25.1-x86_64-linux
Parsing documentation for pg-1.5.4
Installing ri documentation for pg-1.5.4
Parsing documentation for pgdexter-0.5.1
Installing ri documentation for pgdexter-0.5.1
Done installing documentation for google-protobuf, pg, pgdexter after 13 seconds
3 gems installed

对pg启用2个组件

插件不是本文重点,安装步骤忽略

代码语言:sql复制
CREATE EXTENSION hypopg;
CREATE EXTENSION pg_stat_statements

造点测试数据,并进行压测

代码语言:shell复制
$ pgbench  postgres  --initialize   --fillfactor=10 --scale=10
dropping old tables...
creating tables...
generating data (client-side)...
1000000 of 1000000 tuples (100%) done (elapsed 14.24 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 23.42 s (drop tables 0.01 s, create tables 0.01 s, client-side generate 21.56 s, vacuum 0.44 s, primary keys 1.41 s).


postgres=# dt 
                                                List of relations
 Schema |          Name           |   Type   |  Owner   | Persistence | Access method |    Size    | Description
-------- ------------------------- ---------- ---------- ------------- --------------- ------------ -------------
 public | pgbench_accounts        | table    | postgres | permanent   | heap          | 1302 MB    |
 public | pgbench_branches        | table    | postgres | permanent   | heap          | 40 kB      |
 public | pgbench_history         | table    | postgres | permanent   | heap          | 0 bytes    |
 public | pgbench_tellers         | table    | postgres | permanent   | heap          | 80 kB      |


然后我们把原表的主键索引删掉,制造出全表扫描的场景
postgres=# alter table pgbench_accounts drop constraint pgbench_accounts_pkey ;
ALTER TABLE
Time: 9.403 ms


另外开一个窗口,执行压测
$ pgbench -c 10 -j 10
pgbench (16.0 - Percona Distribution, server 15.4 - Percona Distribution)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 10
maximum number of tries: 1
number of transactions per client: 10
number of transactions actually processed: 100/100
number of failed transactions: 0 (0.000%)
latency average = 2140.750 ms
initial connection time = 74.566 ms
tps = 4.671260 (without initial connection time)

将dexer连接到pg并读取pg-stat-statements信息进行分析【推荐】

代码语言:shell复制
# dexter -d postgres -Udts -h192.168.31.181 --pg-stat-statements --enable-hypopg
返回结果:
Processing 269 new query fingerprints
Index found: public.pgbench_accounts (aid)

如果没有发现需要优化的sql,则返回类似如下:
Processing 192 new query fingerprints
No new indexes found


另外,如果需要直接创建索引,可以加参数 --create 【生产上不推荐这种】
# dexter -d postgres -Udts -h192.168.31.181 --pg-stat-statements  --enable-hypopg --create
Processing 269 new query fingerprints
Index found: public.pgbench_accounts (aid)
Creating index: CREATE INDEX CONCURRENTLY ON "public"."pgbench_accounts" ("aid")
Index created: 1967 ms

另外,也可以使用docker方式

代码语言:shell复制
docker pull ankane/dexter

docker run --rm ankane/dexter dexter -d postgres -Udts -h192.168.31.181 --pg-stat-statements --enable-hypopg
Processing 44 new query fingerprints
Index found: public.pgbench_accounts (aid)

dexter的帮助

代码语言:shell复制
# dexter --help
Usage:
    dexter [options]

Input options:
    --input-format          input format
    --pg-stat-activity      use pg_stat_activity
    --stdin                 use stdin
    -s, --statement         process a single statement

Connection options:
    -d, --dbname            database name
    -h, --host              database host
    -p, --port              database port
    -U, --username          database user

Processing options:
    --interval              time to wait between processing queries, in seconds
    --min-calls             only process queries that have been called a certain number of times
    --min-time              only process queries that have consumed a certain amount of DB time, in minutes

Indexing options:
    --analyze               analyze tables that haven't been analyzed in the past hour
    --create                create indexes
    --enable-hypopg         enable the HypoPG extension
    --exclude               prevent specific tables from being indexed
    --include               only include specific tables
    --tablespace            tablespace to create indexes

Logging options:
    --log-level             log level
    --log-sql               log sql

Other options:
    -v, --version           print the version
    --help                  prints help

其它

直接从pg当前会话进行分析【推荐】

代码语言:shell复制
# dexter -d postgres -Udts -h192.168.31.181  --pg-stat-activity   --enable-hypopg
Started
Processing 2 new query fingerprints
Index found: public.pgbench_accounts (aid)

从pg日志文件进行分析【不推荐】

代码语言:shell复制
先调低pg慢日志时间
log_min_duration_statement = 10 # ms

然后使用
# dexter -d postgres -Udts -h192.168.31.181 postgresql-Wed.log
Processing 276 new query fingerprints
Index found: public.pgbench_accounts (aid)


支持stderr、csvlog、 和jsonlog格式。
对于实时索引,请通过管道传输日志文件:
tail -F -n  1 postgresql.csv | dexter -d postgres -Udts -h192.168.31.181 --enable-hypopg --stdin

从sql文件进行分析【不推荐】

代码语言:shell复制
dexter -d postgres -Udts -h192.168.31.181 --enable-hypopg queries.sql
dexter -d postgres -Udts -h192.168.31.181 --enable-hypopg -s "SELECT * FROM xxx where xx=x"

收集选项

代码语言:shell复制
可以忽略某些极少量的一次性的查询被索引,只分析运行了至少100次的sql
dexter --min-calls 100

只分析运行的总时长超过10分钟的sql
dexter --min-time 10 # minutes

流式传输日志时,指定处理查询之间等待的时间
dexter --interval 60 # seconds

分析

为了获得最佳结果,请确保您的表最近已被分析过,以便统计数据是最新的。您可以要求 Dexter 分析它遇到的在过去一小时内尚未分析过的表:

代码语言:shell复制
# dexter -d postgres -Udts -h192.168.31.181 --pg-stat-statements  --analyze
Processing 276 new query fingerprints
Index found: public.pgbench_accounts (aid)

表格的过滤

代码语言:shell复制
您可以使用以下方法从索引中排除大型表或写入量大的表:
dexter --exclude table1,table2

或者,您可以指定要索引的表:
dexter --include table3,table4

例:
# dexter -d postgres -Udts -h192.168.31.181 --pg-stat-statements --enable-hypopg --exclude=sm1.t1 --include=public.pgbench_accounts
Processing 44 new query fingerprints
Index found: public.pgbench_accounts (aid)

注意: --exclude=sm1.t1 和 --include 其中的表名不支持通配符的写法

调试

查看 Dexter 如何处理查询:

代码语言:shell复制
dexter --log-sql --log-level debug2
dexter --log-sql --log-level debug3

发现的bug

update: 2024-01-07 ,作者已修复该问题并合并到master分支,详见 https://github.com/ankane/dexter/issues/48

代码语言:shell复制
1、如果pg启用了auto_explain插件,且设置 auto_explain.log_min_duration = '0'



则执行dexter --pg-stat-statements 的时候会报错,如下是加了参数 --log-level debug2 打出的日志情况

# dexter -d postgres -Udts -h192.168.31.181 --pg-stat-statements --enable-hypopg --log-level debug2
Processing 5 new query fingerprints
Last analyze: public.pgbench_accounts : 2024-01-03T10:49:16 08:00
/opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb:310:in `block in determine_indexes': undefined method `<' for nil:NilClass (NoMethodError)
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb:306:in `each'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb:306:in `determine_indexes'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb:102:in `process_queries'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb:32:in `process_stat_statements'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/client.rb:27:in `perform'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/client.rb:9:in `start'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/exe/dexter:7:in `<top (required)>'
        from /opt/rh/rh-ruby30/root/usr/local/bin/dexter:23:in `load'
        from /opt/rh/rh-ruby30/root/usr/local/bin/dexter:23:in `<main>'






根据报错日志,加些log输出
vim /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb  310  , 加一些 puts 输出变量到控制台

      savings_ratio = (1 - @min_cost_savings_pct / 100.0)
      puts "savings_ratio ---> ",savings_ratio

      queries.each do |query|
        if query.explainable? && query.high_cost?
          puts "query.costs -->", query.costs
          puts "query.costs.length -->", query.costs.length
          new_cost, new_cost2 = query.costs[1..2]
          puts "new_cost -->", new_cost
          puts "new_cost2 -->", new_cost2
          puts  "query.initial_cost -->", query.initial_cost
          puts  "savings_ratio -->",savings_ratio


再次执行上面的dexter命令,可以看到输出变量情况如下
# dexter -d postgres -Udts -h192.168.31.181 --pg-stat-statements --enable-hypopg --log-level debug2
Processing 25 new query fingerprints
Last analyze: public.pgbench_accounts : 2024-01-03T10:49:16 08:00
savings_ratio --->
0.5
query.costs -->
172875.43
query.costs.length -->   注意这里数组长度为1,造成 query.costs[1..2] 解包失败,导致 new_cost和new_cost2都为nil
1
new_cost -->

new_cost2 -->

query.initial_cost -->
172875.43
savings_ratio -->
0.5
/opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb:318:in `block in determine_indexes': undefined method `<' for nil:NilClass (NoMethodError)
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb:308:in `each'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb:308:in `determine_indexes'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb:102:in `process_queries'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/indexer.rb:32:in `process_stat_statements'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/client.rb:27:in `perform'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/lib/dexter/client.rb:9:in `start'
        from /opt/rh/rh-ruby30/root/usr/local/share/gems/gems/pgdexter-0.5.1/exe/dexter:7:in `<top (required)>'
        from /opt/rh/rh-ruby30/root/usr/local/bin/dexter:23:in `load'
        from /opt/rh/rh-ruby30/root/usr/local/bin/dexter:23:in `<main>'




2、如果pg启用了auto_explain插件,但是设置 auto_explain.log_min_duration不为0,例如设置为 100 ,则不会有报错

postgres=# select pg_stat_statements_reset() ;
 pg_stat_statements_reset
--------------------------

(1 row)

Time: 2.176 ms
postgres=# select * from pgbench_accounts where aid=1000;
 aid  | bid | abalance |                                        filler
------ ----- ---------- --------------------------------------------------------------------------------------
 1000 |   1 |        0 |
(1 row)


# dexter -d postgres -Udts -h192.168.31.181 --pg-stat-statements --enable-hypopg --log-level debug2
Processing 5 new query fingerprints
Last analyze: public.pgbench_accounts : 2024-01-03T10:49:16 08:00
savings_ratio --->
0.5
query.costs -->
172875.43
8.07
8.07
query.costs.length -->
3
new_cost -->
8.07
new_cost2 -->
8.07
query.initial_cost -->
172875.43
savings_ratio -->
0.5
Index found: public.pgbench_accounts (aid)
--------------------------------------------------------------------------------
Query 1edaea087b983cab
Start: 172875.43
Pass1: 8.07 : public.pgbench_accounts (aid)
Pass2: 8.07 : public.pgbench_accounts (aid)
Final: 8.07 : public.pgbench_accounts (aid)

select * from pgbench_accounts where aid=$1

--------------------------------------------------------------------------------
Query 300ac36fac258016
No tables

select pg_stat_statements_reset()

--------------------------------------------------------------------------------
Query 8d6f8ae553bbb2a4
No candidate tables for indexes

SELECT installed_version FROM pg_available_extensions WHERE name = $1 /*dexter*/

--------------------------------------------------------------------------------
Query c43bbb6237d6cbc6
No tables

SET lock_timeout = '5s' /*dexter*/

--------------------------------------------------------------------------------
Query ce4b1350f91bbc5f
No tables

SHOW server_version_num /*dexter*/



大致看了下indexer.rb的代码,正常情况下 calculate_plan 会执行3次,计算3次的costs存到数组里面,流程就不会抛错。 具体是哪个逻辑里面出问题的,ruby代码不想去深入了。

简单看了下dexter的逻辑(不到800行的代码),发现和我们目前生产在用的MySQL的sql自动化索引推荐服务有点类似,但PG有hypopg加持,MySQL我只能抽样数据做成沙箱环境。

0 人点赞