PostgreSQL pg_dump 原理探究与备份数据一致性问题和备份注意事项 与 PG MYSQL 到底谁是NO.1

2024-01-04 09:42:47 浏览数 (5)

我其实是想郑重其事的回答以下在上个礼拜分享会分享后,一个人最后对我的一个问题,问题是, PG 和 MYSQL 到底那个是 NO.1 ,当时我的语气不是很好,因为我觉得很无聊,在此表示道歉,但观点我是不变的。

作为一个技术人员,不是对自己的技术有要求,而是成天的讨论 PG 第一还是 MYSQL 第一,我不觉得是光彩的事情,这只能说明一件事情,你无能,提出这个问题的人,很可能在技术上是 physically challenged individual.

你在问这个问题的时候,其实你在告诉别人,这两个数据库你必然有一个你不会,不管是PG 或是 MYSQL 。为什么不能提高自己,两个都会呢?我想那时的你,不会在提出这样问题,因为你有自信,告诉任何人,"Who cares about which one is NO.1? I can operate both of them." That's all.

——————————————————————————————

技术正文

最近有人问过关于pg_dump的问题,说实在的,我这点在写这篇文章之前属于并未十分注意,并且概念也大多停留在操作的层面。但问的人多了尤其到了备份数据的一致性的问题上,这点我讲不清。基于这个问题,

1 回答问问题的人

2 回答我自己对于一些问题的模糊不清

要搞清楚这个问题,分为两个步骤

1 跟踪操作

2 源代码深入的理解

这里我们采用postgresql 16 版本,将相关记录操作的功能的选项在PG数据库中的配置中都进行打开,并进行一次简单的备份操作,我们获取到了下面的信息。

代码语言:javascript复制
  [] 2023-12-29 05:48:51.887 UTC [256413]  postmaster 00000 0 LOG:  starting PostgreSQL 16.0 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0, 64-bit
  [] 2023-12-29 05:48:51.887 UTC [256413]  postmaster 00000 0 LOG:  listening on IPv4 address "0.0.0.0", port 5432
  [] 2023-12-29 05:48:51.887 UTC [256413]  postmaster 00000 0 LOG:  listening on IPv6 address "::", port 5432
  [] 2023-12-29 05:48:51.889 UTC [256413]  postmaster 00000 0 LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
  [] 2023-12-29 05:48:51.893 UTC [256417]  startup 00000 0 LOG:  database system was shut down at 2023-12-29 05:48:51 UTC
  [] 2023-12-29 05:48:51.897 UTC [256413]  postmaster 00000 0 LOG:  database system is ready to accept connections
[local] [unknown] [[unknown]] 2023-12-29 05:49:04.686 UTC [256427] [unknown] not initialized 00000 0 LOG:  connection received: host=[local]
[local] postgres [[unknown]] 2023-12-29 05:49:04.687 UTC [256427] test client backend 00000 0 LOG:  connection authorized: user=postgres database=test application_name=pg_dump
[local] postgres [pg_dump] 2023-12-29 05:49:04.692 UTC [256427] test client backend 00000 0 LOG:  duration: 0.417 ms  statement: SELECT pg_catalog.set_config('search_path', '', false);
[local] postgres [pg_dump] 2023-12-29 05:49:04.692 UTC [256427] test client backend 00000 0 LOG:  duration: 0.121 ms  statement: SELECT pg_catalog.pg_is_in_recovery()
[local] postgres [pg_dump] 2023-12-29 05:49:04.692 UTC [256427] test client backend 00000 0 LOG:  duration: 0.048 ms  statement: SELECT pg_catalog.set_config('search_path', '', false);
[local] postgres [pg_dump] 2023-12-29 05:49:04.692 UTC [256427] test client backend 00000 0 LOG:  duration: 0.100 ms  statement: SET DATESTYLE = ISO
[local] postgres [pg_dump] 2023-12-29 05:49:04.692 UTC [256427] test client backend 00000 0 LOG:  duration: 0.014 ms  statement: SET INTERVALSTYLE = POSTGRES
[local] postgres [pg_dump] 2023-12-29 05:49:04.692 UTC [256427] test client backend 00000 0 LOG:  duration: 0.048 ms  statement: SET extra_float_digits TO 3
[local] postgres [pg_dump] 2023-12-29 05:49:04.693 UTC [256427] test client backend 00000 0 LOG:  duration: 0.013 ms  statement: SET synchronize_seqscans TO off
[local] postgres [pg_dump] 2023-12-29 05:49:04.693 UTC [256427] test client backend 00000 0 LOG:  duration: 0.054 ms  statement: SET statement_timeout = 0
[local] postgres [pg_dump] 2023-12-29 05:49:04.693 UTC [256427] test client backend 00000 0 LOG:  duration: 0.034 ms  statement: SET lock_timeout = 0
[local] postgres [pg_dump] 2023-12-29 05:49:04.693 UTC [256427] test client backend 00000 0 LOG:  duration: 0.011 ms  statement: SET idle_in_transaction_session_timeout = 0
[local] postgres [pg_dump] 2023-12-29 05:49:04.693 UTC [256427] test client backend 00000 0 LOG:  duration: 0.009 ms  statement: SET row_security = off
[local] postgres [pg_dump] 2023-12-29 05:49:04.693 UTC [256427] test client backend 00000 0 LOG:  duration: 0.008 ms  statement: BEGIN
[local] postgres [pg_dump] 2023-12-29 05:49:04.693 UTC [256427] test client backend 00000 0 LOG:  duration: 0.012 ms  statement: SET TRANSACTION ISOLATION LEVEL REPEATABLE READ, READ ONLY
[local] postgres [pg_dump] 2023-12-29 05:49:04.694 UTC [256427] test client backend 00000 0 LOG:  duration: 1.357 ms  statement: SELECT oid, rolname FROM pg_catalog.pg_roles ORDER BY 1
[local] postgres [pg_dump] 2023-12-29 05:49:04.695 UTC [256427] test client backend 00000 0 LOG:  duration: 0.576 ms  statement: SELECT x.tableoid, x.oid, x.extname, n.nspname, x.extrelocatable, x.extversion, x.extconfig, x.extcondition FROM pg_extension x JOIN pg_namespace n ON n.oid = x.extnamespace
[local] postgres [pg_dump] 2023-12-29 05:49:04.696 UTC [256427] test client backend 00000 0 LOG:  duration: 0.788 ms  statement: SELECT classid, objid, refobjid FROM pg_depend WHERE refclassid = 'pg_extension'::regclass AND deptype = 'e' ORDER BY 3
[local] postgres [pg_dump] 2023-12-29 05:49:04.696 UTC [256427] test client backend 00000 0 LOG:  duration: 0.194 ms  statement: SELECT n.tableoid, n.oid, n.nspname, n.nspowner, n.nspacl, acldefault('n', n.nspowner) AS acldefault FROM pg_namespace n
[local] postgres [pg_dump] 2023-12-29 05:49:04.700 UTC [256427] test client backend 00000 0 LOG:  duration: 3.419 ms  statement: SELECT c.tableoid, c.oid, c.relname, c.relnamespace, c.relkind, c.reltype, c.relowner, c.relchecks, c.relhasindex, c.relhasrules, c.relpages, c.relhastriggers, c.relpersistence, c.reloftype, c.relacl, acldefault(CASE WHEN c.relkind = 'S' THEN 's'::"char" ELSE 'r'::"char" END, c.relowner) AS acldefault, CASE WHEN c.relkind = 'f' THEN (SELECT ftserver FROM pg_catalog.pg_foreign_table WHERE ftrelid = c.oid) ELSE 0 END AS foreignserver, c.relfrozenxid, tc.relfrozenxid AS tfrozenxid, tc.oid AS toid, tc.relpages AS toastpages, tc.reloptions AS toast_reloptions, d.refobjid AS owning_tab, d.refobjsubid AS owning_col, tsp.spcname AS reltablespace, false AS relhasoids, c.relispopulated, c.relreplident, c.relrowsecurity, c.relforcerowsecurity, c.relminmxid, tc.relminmxid AS tminmxid, array_remove(array_remove(c.reloptions,'check_option=local'),'check_option=cascaded') AS reloptions, CASE WHEN 'check_option=local' = ANY (c.reloptions) THEN 'LOCAL'::text WHEN 'check_option=cascaded' = ANY (c.reloptions) THEN 'CASCADED'::text ELSE NULL END AS checkoption, am.amname, (d.deptype = 'i') IS TRUE AS is_identity_sequence, c.relispartition AS ispartition 
 FROM pg_class c
 LEFT JOIN pg_depend d ON (c.relkind = 'S' AND d.classid = 'pg_class'::regclass AND d.objid = c.oid AND d.objsubid = 0 AND d.refclassid = 'pg_class'::regclass AND d.deptype IN ('a', 'i'))
 LEFT JOIN pg_tablespace tsp ON (tsp.oid = c.reltablespace)
 LEFT JOIN pg_am am ON (c.relam = am.oid)
 LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid AND tc.relkind = 't' AND c.relkind <> 'p')
 WHERE c.relkind IN ('r', 'S', 'v', 'c', 'm', 'f', 'p')
 ORDER BY c.oid
[local] postgres [pg_dump] 2023-12-29 05:49:04.701 UTC [256427] test client backend 00000 0 LOG:  duration: 0.099 ms  statement: LOCK TABLE public.test IN ACCESS SHARE MODE
[local] postgres [pg_dump] 2023-12-29 05:49:04.707 UTC [256427] test client backend 00000 0 LOG:  duration: 5.525 ms  statement: SELECT p.tableoid, p.oid, p.proname, p.prolang, p.pronargs, p.proargtypes, p.prorettype, p.proacl, acldefault('f', p.proowner) AS acldefault, p.pronamespace, p.proowner FROM pg_proc p LEFT JOIN pg_init_privs pip ON (p.oid = pip.objoid AND pip.classoid = 'pg_proc'::regclass AND pip.objsubid = 0) WHERE p.prokind <> 'a'
   AND NOT EXISTS (SELECT 1 FROM pg_depend WHERE classid = 'pg_proc'::regclass AND objid = p.oid AND deptype = 'i')
   AND (
   pronamespace != (SELECT oid FROM pg_namespace WHERE nspname = 'pg_catalog')
   OR EXISTS (SELECT 1 FROM pg_cast
   WHERE pg_cast.oid > 16383 
   AND p.oid = pg_cast.castfunc)
   OR EXISTS (SELECT 1 FROM pg_transform
   WHERE pg_transform.oid > 16383 AND 
   (p.oid = pg_transform.trffromsql
   OR p.oid = pg_transform.trftosql))
   OR p.proacl IS DISTINCT FROM pip.initprivs)
[local] postgres [pg_dump] 2023-12-29 05:49:04.709 UTC [256427] test client backend 00000 0 LOG:  duration: 2.385 ms  statement: SELECT tableoid, oid, typname, typnamespace, typacl, acldefault('T', typowner) AS acldefault, typowner, typelem, typrelid, CASE WHEN typrelid = 0 THEN ' '::"char" ELSE (SELECT relkind FROM pg_class WHERE oid = typrelid) END AS typrelkind, typtype, typisdefined, typname[0] = '_' AND typelem != 0 AND (SELECT typarray FROM pg_type te WHERE oid = pg_type.typelem) = oid AS isarray FROM pg_type
[local] postgres [pg_dump] 2023-12-29 05:49:04.711 UTC [256427] test client backend 00000 0 LOG:  duration: 0.591 ms  statement: SELECT tableoid, oid, lanname, lanpltrusted, lanplcallfoid, laninline, lanvalidator, lanacl, acldefault('l', lanowner) AS acldefault, lanowner FROM pg_language WHERE lanispl ORDER BY oid
[local] postgres [pg_dump] 2023-12-29 05:49:04.712 UTC [256427] test client backend 00000 0 LOG:  duration: 1.651 ms  statement: SELECT p.tableoid, p.oid, 5:49:04.714 UTC [256427] test client backend 00000 0 LOG:  duration: 0.929 ms  statement: SELECT tableoid, oid, oprname, oprnamespace, oprowner, oprkind, oprcode::oid AS oprcode FROM pg_operator
[local] postgres [pg_dump] 2023-12-29 05:49:04.714 UTC [256427] test client backend 00000 0 LOG:  duration: 0.226 ms  statement: SELECT tableoid, oid, amname, amtype, amhandler::pg_catalog.regproc AS amhandler FROM pg_am
[local] postgres [pg_dump] 2023-12-29 05:49:04.715 UTC [256427] test client backend 00000 0 LOG:  duration: 0.251 ms  statement: SELECT tableoid, oid, opcname, opcnamespace, opcowner FROM pg_opclass
[local] postgres [pg_dump] 2023-12-29 05:49:04.715 UTC [256427] test client backend 00000 0 LOG:  duration: 0.272 ms  statement: SELECT tableoid, oid, opfname, opfnamespace, opfowner FROM pg_opfamily
[local] postgres [pg_dump] 2023-12-29 05:49:04.715 UTC [256427] test client backend 00000 0 LOG:  duration: 0.228 ms  statement: SELECT tableoid, oid, prsname, prsnamespace, prsstart::oid, prstoken::oid, prsend::oid, prsheadline::oid, prslextype::oid FROM pg_ts_parser
[local] postgres [pg_dump] 2023-12-29 05:49:04.715 UTC [256427] test client backend 00000 0 LOG:  duration: 0.141 ms  statement: SELECT tableoid, oid, tmplname, tmplnamespace, tmplinit::oid, tmpllexize::oid FROM pg_ts_template
[local] postgres [pg_dump] 2023-12-29 05:49:04.716 UTC [256427] test client backend 00000 0 LOG:  duration: 0.205 ms  statement: SELECT tableoid, oid, dictname, dictnamespace, dictowner, dicttemplate, dictinitoption FROM pg_ts_dict
[local] postgres [pg_dump] 2023-12-29 05:49:04.716 UTC [256427] test client backend 00000 0 LOG:  duration: 0.151 ms  statement: SELECT tableoid, oid, cfgname, cfgnamespace, cfgowner, cfgparser FROM pg_ts_config
[local] postgres [pg_dump] 2023-12-29 05:49:04.717 UTC [256427] test client backend 00000 0 LOG:  duration: 0.505 ms  statement: SELECT tableoid, oid, fdwname, fdwowner, fdwhandler::pg_catalog.regproc, fdwvalidator::pg_catalog.regproc, fdwacl, acldefault('F', fdwowner) AS acldefault, array_to_string(ARRAY(SELECT quote_ident(option_name) || ' ' || quote_literal(option_value) FROM pg_options_to_table(fdwoptions) ORDER BY option_name), E',
     ') AS fdwoptions FROM pg_foreign_data_wrapper
[local] postgres [pg_dump] 2023-12-29 05:49:04.717 UTC [256427] test client backend 00000 0 LOG:  duration: 0.284 ms  statement: SELECT tableoid, oid, srvname, srvowner, srvfdw, srvtype, srvversion, srvacl, acldefault('S', srvowner) AS acldefault, array_to_string(ARRAY(SELECT quote_ident(option_name) || ' ' || quote_literal(option_value) FROM pg_options_to_table(srvoptions) ORDER BY option_name), E',
     ') AS srvoptions FROM pg_foreign_server
[local] postgres [pg_dump] 2023-12-29 05:49:04.717 UTC [256427] test client backend 00000 0 LOG:  duration: 0.227 ms  statement: SELECT oid, tableoid, defaclrole, defaclnamespace, defaclobjtype, defaclacl, CASE WHEN defaclnamespace = 0 THEN acldefault(CASE WHEN defaclobjtype = 'S' THEN 's'::"char" ELSE defaclobjtype END, defaclrole) ELSE '{}' END AS acldefault FROM pg_default_acl
[local] postgres [pg_dump] 2023-12-29 05:49:04.718 UTC [256427] test client backend 00000 0 LOG:  duration: 0.737 ms  statement: SELECT tableoid, oid, collname, collnamespace, collowner FROM pg_collation
[local] postgres [pg_dump] 2023-12-29 05:49:04.719 UTC [256427] test client backend 00000 0 LOG:  duration: 0.279 ms  statement: SELECT tableoid, oid, conname, connamespace, conowner FROM pg_conversion
[local] postgres [pg_dump] 2023-12-29 05:49:04.720 UTC [256427] test client backend 00000 0 LOG:  duration: 0.605 ms  statement: SELECT tableoid, oid, castsource, casttarget, castfunc, castcontext, castmethod FROM pg_cast c WHERE NOT EXISTS ( SELECT 1 FROM pg_range r WHERE c.castsource = r.rngtypid AND c.casttarget = r.rngmultitypid ) ORDER BY 3,4
[local] postgres [pg_dump] 2023-12-29 05:49:04.720 UTC [256427] test client backend 00000 0 LOG:  duration: 0.125 ms  statement: SELECT tableoid, oid, trftype, trflang, trffromsql::oid, trftosql::oid FROM pg_transform ORDER BY 3,4
[local] postgres [pg_dump] 2023-12-29 05:49:04.720 UTC [256427] test client backend 00000 0 LOG:  duration: 0.212 ms  statement: SELECT inhrelid, inhparent FROM pg_inherits
[local] postgres [pg_dump] 2023-12-29 05:49:04.721 UTC [256427] test client backend 00000 0 LOG:  duration: 0.276 ms  statement: SELECT e.tableoid, e.oid, evtname, evtenabled, evtevent, evtowner, array_to_string(array(select quote_literal(x)  from unnest(evttags) as t(x)), ', ') as evttags, e.evtfoid::regproc as evtfname FROM pg_event_trigger e ORDER BY e.oid
[local] postgres [pg_dump] 2023-12-29 05:49:04.721 UTC [256427] test client backend 00000 0 LOG:  duration: 0.476 ms  statement: SELECT conrelid, confrelid FROM pg_constraint JOIN pg_depend ON (objid = confrelid) WHERE contype = 'f' AND refclassid = 'pg_extension'::regclass AND classid = 'pg_class'::regclass;
[local] postgres [pg_dump] 2023-12-29 05:49:04.722 UTC [256427] test client backend 00000 0 LOG:  duration: 0.927 ms  statement: SELECT
 a.attrelid,
 a.attnum,
 a.attname,
 a.attstattarget,
 a.attstorage,
 t.typstorage,
 a.attnotnull,
 a.atthasdef,
 a.attisdropped,
 a.attlen,
 a.attalign,
 a.attislocal,
 pg_catalog.format_type(t.oid, a.atttypmod) AS atttypname,
 array_to_string(a.attoptions, ', ') AS attoptions,
 CASE WHEN a.attcollation <> t.typcollation THEN a.attcollation ELSE 0 END AS attcollation,
 pg_catalog.array_to_string(ARRAY(SELECT pg_catalog.quote_ident(option_name) || ' ' || pg_catalog.quote_literal(option_value) FROM pg_catalog.pg_options_to_table(attfdwoptions) ORDER BY option_name), E',
     ') AS attfdwoptions,
 a.attcompression AS attcompression,
 a.attidentity,
 CASE WHEN a.atthasmissing AND NOT a.attisdropped THEN a.attmissingval ELSE null END AS attmissingval,
 a.attgenerated
 FROM unnest('{16394}'::pg_catalog.oid[]) AS src(tbloid)
 JOIN pg_catalog.pg_attribute a ON (src.tbloid = a.attrelid) LEFT JOIN pg_catalog.pg_type t ON (a.atttypid = t.oid)
 WHERE a.attnum > 0::pg_catalog.int2
 ORDER BY a.attrelid, a.attnum
[local] postgres [pg_dump] 2023-12-29 05:49:04.723 UTC [256427] test client backend 00000 0 LOG:  duration: 0.433 ms  statement: SELECT partrelid FROM pg_partitioned_table WHERE
 (SELECT c.oid FROM pg_opclass c JOIN pg_am a ON c.opcmethod = a.oid
 WHERE opcname = 'enum_ops' AND opcnamespace = 'pg_catalog'::regnamespace AND amname = 'hash') = ANY(partclass)
[local] postgres [pg_dump] 2023-12-29 05:49:04.726 UTC [256427] test client backend 00000 0 LOG:  duration: 2.790 ms  statement: SELECT t.tableoid, t.oid, i.indrelid, t.relname AS indexname, pg_catalog.pg_get_indexdef(i.indexrelid) AS indexdef, i.indkey, i.indisclustered, c.contype, c.conname, c.condeferrable, c.condeferred, c.tableoid AS contableoid, c.oid AS conoid, pg_catalog.pg_get_constraintdef(c.oid, false) AS condef, (SELECT spcname FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) AS tablespace, t.reloptions AS indreloptions, i.indisreplident, inh.inhparent AS parentidx, i.indnkeyatts AS indnkeyatts, i.indnatts AS indnatts, (SELECT pg_catalog.array_agg(attnum ORDER BY attnum)   FROM pg_catalog.pg_attribute   WHERE attrelid = i.indexrelid AND     attstattarget >= 0) AS indstatcols, (SELECT pg_catalog.array_agg(attstattarget ORDER BY attnum)   FROM pg_catalog.pg_attribute   WHERE attrelid = i.indexrelid AND     attstattarget >= 0) AS indstatvals, i.indnullsnotdistinct FROM unnest('{}'::pg_catalog.oid[]) AS src(tbloid)
 JOIN pg_catalog.pg_index i ON (src.tbloid = i.indrelid) JOIN pg_catalog.pg_class t ON (t.oid = i.indexrelid) JOIN pg_catalog.pg_class t2 ON (t2.oid = i.indrelid) LEFT JOIN pg_catalog.pg_constraint c ON (i.indrelid = c.conrelid AND i.indexrelid = c.conindid AND c.contype IN ('p','u','x')) LEFT JOIN pg_catalog.pg_inherits inh ON (inh.inhrelid = indexrelid) WHERE (i.indisvalid OR t2.relkind = 'p') AND i.indisready ORDER BY i.indrelid, indexname
[local] postgres [pg_dump] 2023-12-29 05:49:04.726 UTC [256427] test client backend 00000 0 LOG:  duration: 0.241 ms  statement: SELECT tableoid, oid, stxname, stxnamespace, stxowner, stxstattarget FROM pg_catalog.pg_statistic_ext
[local] postgres [pg_dump] 2023-12-29 05:49:04.726 UTC [256427] test client backend 00000 0 LOG:  duration: 0.257 ms  statement: SELECT c.tableoid, c.oid, conrelid, conname, confrelid, conindid, pg_catalog.pg_get_constraintdef(c.oid) AS condef
 FROM unnest('{}'::pg_catalog.oid[]) AS src(tbloid)
 JOIN pg_catalog.pg_constraint c ON (src.tbloid = c.conrelid)
 WHERE contype = 'f' AND conparentid = 0 ORDER BY conrelid, conname
[local] postgres [pg_dump] 2023-12-29 05:49:04.727 UTC [256427] test client backend 00000 0 LOG:  duration: 0.446 ms  statement: SELECT t.tgrelid, t.tgname, t.tgfoid::pg_catalog.regproc AS tgfname, pg_catalog.pg_get_triggerdef(t.oid, false) AS tgdef, t.tgenabled, t.tableoid, t.oid, t.tgparentid <> 0 AS tgispartition
 FROM unnest('{}'::pg_catalog.oid[]) AS src(tbloid)
 JOIN pg_catalog.pg_trigger t ON (src.tbloid = t.tgrelid) LEFT JOIN pg_catalog.pg_trigger u ON (u.oid = t.tgparentid) WHERE ((NOT t.tgisinternal AND t.tgparentid = 0) OR t.tgenabled != u.tgenabled) ORDER BY t.tgrelid, t.tgname
[local] postgres [pg_dump] 2023-12-29 05:49:04.728 UTC [256427] test client backend 00000 0 LOG:  duration: 0.705 ms  statement: SELECT tableoid, oid, rulename, ev_class AS ruletable, ev_type, is_instead, ev_enabled FROM pg_rewrite ORDER BY oid
[local] postgres [pg_dump] 2023-12-29 05:49:04.729 UTC [256427] test client backend 00000 0 LOG:  duration: 0.883 ms  statement: SELECT pol.oid, pol.tableoid, pol.polrelid, pol.polname, pol.polcmd, pol.polpermissive, CASE WHEN pol.polroles = '{0}' THEN NULL ELSE    pg_catalog.array_to_string(ARRAY(SELECT pg_catalog.quote_ident(rolname) from pg_catalog.pg_roles WHERE oid = ANY(pol.polroles)), ', ') END AS polroles, pg_catalog.pg_get_expr(pol.polqual, pol.polrelid) AS polqual, pg_catalog.pg_get_expr(pol.polwithcheck, pol.polrelid) AS polwithcheck FROM unnest('{16394}'::pg_catalog.oid[]) AS src(tbloid)
 JOIN pg_catalog.pg_policy pol ON (src.tbloid = pol.polrelid)
[local] postgres [pg_dump] 2023-12-29 05:49:04.729 UTC [256427] test client backend 00000 0 LOG:  duration: 0.188 ms  statement: SELECT p.tableoid, p.oid, p.pubname, p.pubowner, p.puballtables, p.pubinsert, p.pubupdate, p.pubdelete, p.pubtruncate, p.pubviaroot FROM pg_publication p
[local] postgres [pg_dump] 2023-12-29 05:49:04.730 UTC [256427] test client backend 00000 0 LOG:  duration: 0.601 ms  statement: SELECT tableoid, oid, prpubid, prrelid, pg_catalog.pg_get_expr(prqual, prrelid) AS prrelqual, (CASE
   WHEN pr.prattrs IS NOT NULL THEN
     (SELECT array_agg(attname)
        FROM
          pg_catalog.generate_series(0, pg_catalog.array_upper(pr.prattrs::pg_catalog.int2[], 1)) s,
          pg_catalog.pg_attribute
       WHERE attrelid = pr.prrelid AND attnum = prattrs[s])
   ELSE NULL END) prattrs FROM pg_catalog.pg_publication_rel pr
[local] postgres [pg_dump] 2023-12-29 05:49:04.730 UTC [256427] test client backend 00000 0 LOG:  duration: 0.136 ms  statement: SELECT tableoid, oid, pnpubid, pnnspid FROM pg_catalog.pg_publication_namespace
[local] postgres [pg_dump] 2023-12-29 05:49:04.730 UTC [256427] test client backend 00000 0 LOG:  duration: 0.355 ms  statement: SELECT s.tableoid, s.oid, s.subname,
  s.subowner,
  s.subconninfo, s.subslotname, s.subsynccommit,
  s.subpublications,
  s.subbinary,
  s.substream,
  s.subtwophasestate,
  s.subdisableonerr,
  s.suborigin,
  s.subpasswordrequired
 FROM pg_subscription s
 WHERE s.subdbid = (SELECT oid FROM pg_database
                    WHERE datname = current_database())
[local] postgres [pg_dump] 2023-12-29 05:49:04.732 UTC [256427] test client backend 00000 0 LOG:  duration: 1.315 ms  statement: WITH RECURSIVE w AS ( SELECT d1.objid, d2.refobjid, c2.relkind AS refrelkind FROM pg_depend d1 JOIN pg_class c1 ON c1.oid = d1.objid AND c1.relkind = 'm' JOIN pg_rewrite r1 ON r1.ev_class = d1.objid JOIN pg_depend d2 ON d2.classid = 'pg_rewrite'::regclass AND d2.objid = r1.oid AND d2.refobjid <> d1.objid JOIN pg_class c2 ON c2.oid = d2.refobjid AND c2.relkind IN ('m','v') WHERE d1.classid = 'pg_class'::regclass UNION SELECT w.objid, d3.refobjid, c3.relkind FROM w JOIN pg_rewrite r3 ON r3.ev_class = w.refobjid JOIN pg_depend d3 ON d3.classid = 'pg_rewrite'::regclass AND d3.objid = r3.oid AND d3.refobjid <> w.refobjid JOIN pg_class c3 ON c3.oid = d3.refobjid AND c3.relkind IN ('m','v') ) SELECT 'pg_class'::regclass::oid AS classid, objid, refobjid FROM w WHERE refrelkind = 'm'
[local] postgres [pg_dump] 2023-12-29 05:49:04.732 UTC [256427] test client backend 00000 0 LOG:  duration: 0.161 ms  statement: SELECT oid, lomowner, lomacl, acldefault('L', lomowner) AS acldefault FROM pg_largeobject_metadata
[local] postgres [pg_dump] 2023-12-29 05:49:04.734 UTC [256427] test client backend 00000 0 LOG:  duration: 2.329 ms  statement: SELECT classid, objid, refclassid, refobjid, deptype FROM pg_depend WHERE deptype != 'p' AND deptype != 'e'
 UNION ALL
 SELECT 'pg_opfamily'::regclass AS classid, amopfamily AS objid, refclassid, refobjid, deptype FROM pg_depend d, pg_amop o WHERE deptype NOT IN ('p', 'e', 'i') AND classid = 'pg_amop'::regclass AND objid = o.oid AND NOT (refclassid = 'pg_opfamily'::regclass AND amopfamily = refobjid)
 UNION ALL
 SELECT 'pg_opfamily'::regclass AS classid, amprocfamily AS objid, refclassid, refobjid, deptype FROM pg_depend d, pg_amproc p WHERE deptype NOT IN ('p', 'e', 'i') AND classid = 'pg_amproc'::regclass AND objid = p.oid AND NOT (refclassid = 'pg_opfamily'::regclass AND amprocfamily = refobjid)
 ORDER BY 1,2
[local] postgres [pg_dump] 2023-12-29 05:49:04.736 UTC [256427] test client backend 00000 0 LOG:  duration: 0.763 ms  statement: SELECT DISTINCT attrelid FROM pg_attribute WHERE attacl IS NOT NULL
[local] postgres [pg_dump] 2023-12-29 05:49:04.736 UTC [256427] test client backend 00000 0 LOG:  duration: 0.286 ms  statement: SELECT objoid, classoid, objsubid, privtype, initprivs FROM pg_init_privs
[local] postgres [pg_dump] 2023-12-29 05:49:04.744 UTC [256427] test client backend 00000 0 LOG:  duration: 7.591 ms  statement: SELECT description, classoid, objoid, objsubid FROM pg_catalog.pg_description ORDER BY classoid, objoid, objsubid
[local] postgres [pg_dump] 2023-12-29 05:49:04.745 UTC [256427] test client backend 00000 0 LOG:  duration: 0.436 ms  statement: SELECT label, provider, classoid, objoid, objsubid FROM pg_catalog.pg_seclabel ORDER BY classoid, objoid, objsubid
[local] postgres [pg_dump] 2023-12-29 05:49:04.747 UTC [256427] test client backend 00000 0 LOG:  duration: 0.100 ms  statement: SELECT pg_catalog.current_schemas(false)
[local] postgres [pg_dump] 2023-12-29 05:49:04.747 UTC [256427] test client backend 00000 0 LOG:  duration: 0.596 ms  statement: SELECT tableoid, oid, datname, datdba, pg_encoding_to_char(encoding) AS encoding, datcollate, datctype, datfrozenxid, datacl, acldefault('d', datdba) AS acldefault, datistemplate, datconnlimit, datminmxid, datlocprovider, daticulocale, datcollversion, daticurules, (SELECT spcname FROM pg_tablespace t WHERE t.oid = dattablespace) AS tablespace, shobj_description(oid, 'pg_database') AS description FROM pg_database WHERE datname = current_database()
[local] postgres [pg_dump] 2023-12-29 05:49:04.748 UTC [256427] test client backend 00000 0 LOG:  duration: 0.130 ms  statement: SELECT provider, label FROM pg_catalog.pg_shseclabel WHERE classoid = 'pg_catalog.pg_database'::pg_catalog.regclass AND objoid = '16393'
[local] postgres [pg_dump] 2023-12-29 05:49:04.748 UTC [256427] test client backend 00000 0 LOG:  duration: 0.127 ms  statement: SELECT unnest(setconfig) FROM pg_db_role_setting WHERE setrole = 0 AND setdatabase = '16393'::oid
[local] postgres [pg_dump] 2023-12-29 05:49:04.748 UTC [256427] test client backend 00000 0 LOG:  duration: 0.183 ms  statement: SELECT rolname, unnest(setconfig) FROM pg_db_role_setting s, pg_roles r WHERE setrole = r.oid AND setdatabase = '16393'::oid
[local] postgres [pg_dump] 2023-12-29 05:49:04.749 UTC [256427] test client backend 00000 0 LOG:  duration: 0.188 ms  statement: PREPARE getColumnACLs(pg_catalog.oid) AS
 SELECT at.attname, at.attacl, '{}' AS acldefault, pip.privtype, pip.initprivs FROM pg_catalog.pg_attribute at LEFT JOIN pg_catalog.pg_init_privs pip ON (at.attrelid = pip.objoid AND pip.classoid = 'pg_catalog.pg_class'::pg_catalog.regclass AND at.attnum = pip.objsubid) WHERE at.attrelid = $1 AND NOT at.attisdropped AND (at.attacl IS NOT NULL OR pip.initprivs IS NOT NULL) ORDER BY at.attnum
[local] postgres [pg_dump] 2023-12-29 05:49:04.749 UTC [256427] test client backend 00000 0 LOG:  duration: 0.440 ms  statement: EXECUTE getColumnACLs('6100')
[local] postgres [pg_dump] 2023-12-29 05:49:04.749 UTC [256427] test client backend 00000 0 DETAIL:  prepare: PREPARE getColumnACLs(pg_catalog.oid) AS
 SELECT at.attname, at.attacl, '{}' AS acldefault, pip.privtype, pip.initprivs FROM pg_catalog.pg_attribute at LEFT JOIN pg_catalog.pg_init_privs pip ON (at.attrelid = pip.objoid AND pip.classoid = 'pg_catalog.pg_class'::pg_catalog.regclass AND at.attnum = pip.objsubid) WHERE at.attrelid = $1 AND NOT at.attisdropped AND (at.attacl IS NOT NULL OR pip.initprivs IS NOT NULL) ORDER BY at.attnum
[local] postgres [pg_dump] 2023-12-29 05:49:04.749 UTC [256427] test client backend 00000 0 LOG:  duration: 0.149 ms  statement: COPY public.test (id, text) TO stdout;
[local] postgres [pg_dump] 2023-12-29 05:49:04.751 UTC [256427] test client backend 00000 0 LOG:  disconnection: session time: 0:00:00.065 user=postgres database=test host=[local]
  [] 2023-12-29 05:53:51.992 UTC [256415]  checkpointer 00000 0 LOG:  checkpoint starting: time
  [] 2023-12-29 05:53:52.004 UTC [256415]  checkpointer 00000 0 LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.003 s, sync=0.001 s, total=0.013 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB; lsn=0/5000110, redo lsn=0/50000D8

这里我们对于信息进行分块化的处理

代码语言:javascript复制
statement: SELECT pg_catalog.set_config('search_path', '', false);
statement: SELECT pg_catalog.pg_is_in_recovery()
statement: SELECT pg_catalog.set_config('search_path', '', false);
statement: SET DATESTYLE = ISO
statement: SET INTERVALSTYLE = POSTGRES
statement: SET extra_float_digits TO 3
statement: SET synchronize_seqscans TO off
statement: SET statement_timeout = 0
statement: SET lock_timeout = 0
statement: SET idle_in_transaction_session_timeout = 0
statement: SET row_security = off

上面这段主要是是pg_dump开始时要做的一些准备工作,search_path为针对当前要备份的搜索路径进行设置,这里使用的是默认的路径,进行数据库中需要备份的OBJECTS 的设定。

后面针对数据库是否是主库或是从库进行了判断,pg_is_in_recovery 便是对于数据库是主库还是备库的判断函数。这里有几个重要的设置,如设置了锁的超时时间为,不超时,设置了链接idel超时的时间为不超时,设置了语句的超时时间为不超时,也就是将备份中被系统的超时设置影响而导致的备份失败的情况都排除了。

代码语言:javascript复制
statement: BEGIN
statement: SET TRANSACTION ISOLATION LEVEL REPEATABLE READ, READ ONLY

中间在进行数据备份开始前,将备份的过程作为一个事务来进行,同时将当前的数据库隔离方式设置为 repeatable read,在这样的情况下,不会产生诸如,脏读,幻读,不可重复读等数据在备份期间产生数据不一致的情况。后面会对当前数据库的一些状态信息进行获取,这里就不一一列出了。

代码语言:javascript复制
test client backend 00000 0 LOG 
statement: LOCK TABLE public.test IN ACCESS SHARE MODE
test client backend 00000 0 LOG 

这里需要注意在对表备份中,是存在对于表进行锁定的情况的,所以这里有部分同学反馈,有些情况下pg_dump 有失败的情况,是正常的,因为在你的数据库中表的数量比较多,同时在对表进行锁定的情况下,你的表正在进行大事务的操作,比如批量的DML 操作,那么此时你的备份操作很可能有失败的情况,因为有些同学反馈,pg_dump 一直没有响应。

这里建议你,可以将日志打开,跟踪一下到底备份到哪里出现问题,大概率是 lock table 无法lock。

最后的操作是访问控制访问列表,通过下面的语句直接将需要备份表的每个列的名称和想的访问的访问控制列表获得。最后执行标准的表的copy 命令进行数据标准输出。

代码语言:javascript复制
EXECUTE getColumnACLs('6100')
prepare: PREPARE getColumnACLs(pg_catalog.oid) AS
SELECT at.attname, at.attacl, '{}' AS acldefault, pip.privtype, pip.initprivs FROM pg_catalog.pg_attribute at LEFT JOIN pg_catalog.pg_init_privs pip ON (at.attrelid = pip.objoid AND pip.classoid = 'pg_catalog.pg_class'::pg_catalog.regclass AND at.attnum = pip.objsubid) WHERE at.attrelid = $1 AND NOT at.attisdropped AND (at.attacl IS NOT NULL OR pip.initprivs IS NOT NULL) ORDER BY at.attnum
COPY public.test (id, text) TO stdout;
代码语言:javascript复制
attrelid |       attname       |    attacl     | acldefault | privtype |   initprivs   
---------- --------------------- --------------- ------------ ---------- ---------------
   6100 | oid                 | {=r/postgres} | {}         | i        | {=r/postgres}
   6100 | subdbid             | {=r/postgres} | {}         | i        | {=r/postgres}
   6100 | subskiplsn          | {=r/postgres} | {}         | i        | {=r/postgres}
   6100 | subname             | {=r/postgres} | {}         | i        | {=r/postgres}
   6100 | subowner            | {=r/postgres} | {}         | i        | {=r/postgres}
   6100 | subenabled          | {=r/postgres} | {}         | i        | {=r/postgres}
   6100 | subbinary           | {=r/postgres} | {}         | i        | {=r/postgres}
   6100 | substream           | {=r/postgres} | {}         | i        | {=r/postgres}
   6100 | subtwophasestate    | {=r/postgres} | {}         | i        | {=r/postgres}
   6100 | subdisableonerr     | {=r/postgres} | {}         | i        | {=r/postgres}
   6100 | subpasswordrequired | {=r/postgres} | {}         | i        | {=r/postgres}
   6100 | subrunasowner       | {=r/postgres} | {}         | i        | {=r/postgres}
   6100 | subslotname         | {=r/postgres} | {}         | i        | {=r/postgres}
   6100 | subsynccommit       | {=r/postgres} | {}         | i        | {=r/postgres}
   6100 | subpublications     | {=r/postgres} | {}         | i        | {=r/postgres}
   6100 | suborigin           | {=r/postgres} | {}         | i        | {=r/postgres}
(16 rows)

整体的备份过程如上。

这里有几个问题

1 备份中为什么将备份的过程,转为 RR 模式

2 备份中为什么要给表加 access share 锁 第一个问题

我们截取程序的一段源代码,这段源代码是来自于 /src/bin/pg_dump/pg_dump.c 下的代码。

代码语言:javascript复制
  * Initialize prepared-query state to "nothing prepared".  We do this here
  * so that a parallel dump worker will have its own state.
  */
 AH->is_prepared = (bool *) pg_malloc0(NUM_PREP_QUERIES * sizeof(bool));

 /*
  * Start transaction-snapshot mode transaction to dump consistent data.
  */
 ExecuteSqlStatement(AH, "BEGIN");

 /*
  * To support the combination of serializable_deferrable with the jobs
  * option we use REPEATABLE READ for the worker connections that are
  * passed a snapshot.  As long as the snapshot is acquired in a
  * SERIALIZABLE, READ ONLY, DEFERRABLE transaction, its use within a
  * REPEATABLE READ transaction provides the appropriate integrity
  * guarantees.  This is a kluge, but safe for back-patching.
  */
 if (dopt->serializable_deferrable && AH->sync_snapshot_id == NULL)
  ExecuteSqlStatement(AH,
       "SET TRANSACTION ISOLATION LEVEL "
       "SERIALIZABLE, READ ONLY, DEFERRABLE");
 else
  ExecuteSqlStatement(AH,
       "SET TRANSACTION ISOLATION LEVEL "
       "REPEATABLE READ, READ ONLY");

 /*
  * If user specified a snapshot to use, select that.  In a parallel dump
  * worker, we'll be passed dumpsnapshot == NULL, but AH->sync_snapshot_id
  * is already set (if the server can handle it) and we should use that.
  */
 if (dumpsnapshot)
  AH->sync_snapshot_id = pg_strdup(dumpsnapshot);

 if (AH->sync_snapshot_id)
 {
  PQExpBuffer query = createPQExpBuffer();

  appendPQExpBufferStr(query, "SET TRANSACTION SNAPSHOT ");
  appendStringLiteralConn(query, AH->sync_snapshot_id, conn);
  ExecuteSqlStatement(AH, query->data);
  destroyPQExpBuffer(query);
 }
 else if (AH->numWorkers > 1)
 {
  if (AH->isStandby && AH->remoteVersion < 100000)
   pg_fatal("parallel dumps from standby servers are not supported by this server version");
  AH->sync_snapshot_id = get_synchronized_snapshot(AH);
 }
}

在这段代码中,我将一些重要的部分调整成蓝色,大字体,这里的第一个蓝色的行,表达的是,要开始一个事务,通过创造一个事务的方式,让在备份期间的数据库达到一个一致性的状态。 第二个部分的蓝色的代码,是经过判断的,如果数据库当前的状态无法进行snapshot 则就必须将数据库设置为序列化的状态,但正常的情况是不会无法创建snapshot的,所以一般都是要到蓝色的部分,去产生一个repeatable read 的隔离级别来完成备份中数据一致性的要求。 这里使用repeatable read的原因是,在这个隔离级别下,PG 在事务周期中看到的数据是一致的,及时此事其他的事务修改了表中的行,但在这个事务里面 是不可见的,所以保证了此时这个事务中的所有要备份的表的数据是一致的要求中,可以采集到的所有系统表的数据的数据都是一致的。 这也是我们在最上面的整体备份过程中,一大片在获取数据库中各种信息原因并且保证这些表的数据都是一致的机理。 第三个蓝色的地方是保证在执行数据输出的时候,保证输出的每个表的数据都应用当前事务中的数据状态是一致的,包括此时事务是在并发进行工作的。 第二个问题,在对每个表进行 copy的时候要先对表进行access share mode 下方蓝色的部分,对于表的要进行access share 锁的设定,主要的原因在是备份的过长中,虽然我们有表的在发起备份时刻的snapshot 但是如果此时表要被进行DDL 是不会被允许的,因为这将改变获取表数据的准确性,所以在获取表的schema 信息是需要加这个锁防止表被改变和修改。 所以在备份期间,不要做 1 对于表结构进行修改的事情2 2 超大事务,在备份期间对于要备份的表进行长时间的霸占导致获取锁失败(不包含获取 SM 锁的行为) 3 删除表,truncate 表的行为 4 vacuum full 的操作

代码语言:javascript复制
if (query->len == 0)
    appendPQExpBuffer(query, "LOCK TABLE %s",
                      fmtQualifiedDumpable(&tblinfo[i]));
else
{
    appendPQExpBuffer(query, ", %s",
                      fmtQualifiedDumpable(&tblinfo[i]));

    /* Arbitrarily end a batch when query length reaches 100K. */
    if (query->len >= 100000)
    {
        /* Lock another batch of tables. */
        appendPQExpBufferStr(query, " IN ACCESS SHARE MODE");
        ExecuteSqlStatement(fout, query->data);
        resetPQExpBuffer(query);
    }
}

相关MySQLDUMP 的分析也会在周五发送,并且比对二者在整体操作过程中的特点和优点,想了解MYSQLDUMP 在数据备份一致性形成的原理和方法的同学可以关注周五,在有一个点上MYSQL相对PG减少了备份中冲突的发生的可能性。

0 人点赞