【什么情况需要重建抽取进程】
1、对于RAC增加与删除节点后出现RAC线程与OGG线程不一致.
2、对于抽取进程所在的文件系统损坏或者被删除的情况,这个重建需要依赖应用进程或者传输进程,需要具体问题具体分析(如果文件损坏或误删除,大概率传输也会遭殃(具体情况具体分析),所以依赖应用进程比较靠谱)
【重建抽取进程--增加与删除节点大概步骤】
1、对于删除节点后,此线程对应日志必须extract完成,才能停止extract否则会丢失数据.
对于新加节点后,RAC线程与OGG线程不一致,可以停止extract进程.对于OGG停止之前,先判断数据库是否有长事务运行,如果有等待长事务结束.
对于单纯重建extract的,以下步骤也是适用的.与上面新加与删除节点原理都是一样的.对于非静态系统的执行2-9步骤,如果是静态系统的话,等待extract无延迟后,直接执行6,9的2个步骤,不存在更新checkpoint信息,但是exttrail或者rmttrail信息需要关注下(参考步骤7中)
2、停止EXTRACT--stop extract extractname,status extract extractname(前者是停止,后者是检查是否状态)
3、备份EXTRACT的dirchk目录下extractname.cpe文件,避免误操作可以恢复(extractname默认都是大写)
4、获取EXTRACT的checkpoint信息,用于后续重建--info extractname showch,info extractname detail
5、删除EXTRACT(可以先dblogin数据库,这样数据库checkpoint table中信息以及register进程被正常删除)--delete extract extractname
6、重新增加EXTRACT--add extract extractname,begin now,threads n(这个n表示总的线程数,跟thread n有区别,后者表示第几个线程)
7、更新EXTRACT的checkpoint信息,利用info extractname showch输出信息的,更新包括recovery checkpoint,current checkpoint以及增加exttrail
增加exttrail或者rmttrail信息(根据实际情况,这些是否存在)--add exttrail exttrailname,megabytes xxm,extract extractname(exttrailname可以是相对路径
或者绝对路径,各种好处,相对迁移之类简单,绝对路径管理容易些)
8、再次检查extract checkpoint信息--info extractname showch,info extractname detail
9、启动extract --start extract extractname
【重建抽取进程--增加与删除节点:以oracle为例子】
1、先停止extract进程
GGSCI> stop EXIAOXU
Sending STOP request to EXTRACT EXIAOXU ...
Request processed.
2、检查extract进程是否状态
GGSCI> info EXIAOXU
EXTRACT EXIAOXU Last Started 2018-02-05 23:48 Status STOPPED
Checkpoint Lag 00:00:00 (updated 00:03:28 ago)
3、获取extract EXIAOXU的checkpoint信息
备注:checkpoint信息包括:Recovery Checkpoint、Current Checkpoint以及Write Checkpoint三个信息,用于后续重建时使用
GGSCI> info EXIAOXU showch
EXTRACT EXIAOXU Last Started 2018-02-05 23:48 Status STOPPED
Checkpoint Lag 00:00:00 (updated 00:03:28 ago)
Log Read Checkpoint Oracle Redo Logs
2018-03-21 23:29:30 Thread 1, Seqno 19290, RBA 3958848
SCN 312.1419249828 (131449046180)
Log Read Checkpoint Oracle Redo Logs
2018-03-21 23:29:28 Thread 2, Seqno 21339, RBA 58046720
SCN 312.1419248878 (131449045230)
Current Checkpoint Detail:
Read Checkpoint #1
Oracle Threaded Redo Log
Startup Checkpoint (starting position in the data source):
Thread #: 1
Sequence #: 196024
RBA: 76904464
Timestamp: 2018-02-05 18:24:29.000000
SCN: 307.4007933433 (132562893305)
Redo File: FRA/xiaoxu/onlinelog/group5
Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
Thread #: 1
Sequence #: 19290
RBA: 3958352
Timestamp: 2018-03-21 23:29:30.000000
SCN: 312.1419249828 (131449046180)
Redo File: FRA/xiaoxu/onlinelog/group6
Current Checkpoint (position of last record read in the data source):
Thread #: 1
Sequence #: 19290
RBA: 3958848
Timestamp: 2018-03-21 23:29:30.000000
SCN: 312.1419249828 (131449046180)
Redo File: FRA/xiaoxu/onlinelog/group6
BR Previous Recovery Checkpoint:
Thread #: 1
Sequence #: 0
RBA: 0
Timestamp: 2018-02-05 23:48:51.600282
SCN: Not available
Redo File:
BR Begin Recovery Checkpoint:
Thread #: 1
Sequence #: 19287
RBA: 18735824
Timestamp: 2018-03-21 20:13:00.000000
SCN: 312.1372881091 (131402677443)
Redo File:
BR End Recovery Checkpoint:
Thread #: 1
Sequence #: 19287
RBA: 18823360
Timestamp: 2018-03-21 20:13:01.000000
SCN: 312.1372881363 (131402677715)
Redo File:
Read Checkpoint #2
Oracle Threaded Redo Log
Startup Checkpoint (starting position in the data source):
Thread #: 2
Sequence #: 21542
RBA: 6912912
Timestamp: 2018-02-05 18:24:30.000000
SCN: 307.4007933465 (132562893337)
Redo File: FRA/xiaoxu/onlinelog/group8
Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
Thread #: 2
Sequence #: 21339
RBA: 58008848
Timestamp: 2018-03-21 23:29:28.000000
SCN: 312.1419248878 (131449045230)
Redo File: FRA/xiaoxu/onlinelog/group9
Current Checkpoint (position of last record read in the data source):
Thread #: 2
Sequence #: 21339
RBA: 58046720
Timestamp: 2018-03-21 23:29:28.000000
SCN: 312.1419248878 (131449045230)
Redo File: FRA/xiaoxu/onlinelog/group9
BR Previous Recovery Checkpoint:
Thread #: 2
Sequence #: 0
RBA: 0
Timestamp: 2018-02-05 23:48:51.600282
SCN: Not available
Redo File:
BR Begin Recovery Checkpoint:
Thread #: 2
Sequence #: 21336
RBA: 23109392
Timestamp: 2018-03-21 20:13:01.000000
SCN: 312.1372881231 (131402677583)
Redo File:
BR End Recovery Checkpoint:
Thread #: 2
Sequence #: 21336
RBA: 23337728
Timestamp: 2018-03-21 20:13:02.000000
SCN: 312.1372881525 (131402677877)
Redo File:
Write Checkpoint #1
GGS Log Trail
Current Checkpoint (current write position):
Sequence #: 6715
RBA: 6717
Timestamp: 2018-03-21 23:29:30.937451
Extract Trail: /u01goldengate/112105/dirdat/tt
4、创建extract EXIAOXU
备注:创建抽取进程EXIAOXU,线程总数是2,从当前时间开始读取日志
add extract EXIAOXU,tranlog,begin now,threads 2
alter extract EXIAOXU,extseqno 19290,extrba 3958848 thread 1--update current checkpoint
alter extract EXIAOXU,ioextseqno 19290,ioextrba 3958352 thread 1--update recovery checkpoint,这一步不是必须的,如果停止没长有事务运行
alter extract EXIAOXU,extseqno 2139,extrba 5846720 thread 2--update current checkpoint
alter extract EXIAOXU,ioextseqno 2139,ioextrba 5808848 thread 2--update recovery checkpoint,这一步不是必须的,如果停止没长有事务运行
--以下是具体执行步骤
GGSCI > add extract EXIAOXU,tranlog,begin now,threads 2
EXTRACT added.
GGSCI > alter extract EXIAOXU,extseqno 19290,extrba 3958848 thread 1
EXTRACT altered.
GGSCI > alter extract EXIAOXU,ioextseqno 19290,ioextrba 3958352 thread 1
2018-03-21 23:40:19 INFO OGG-00988 WARNING: Unsupported operation. This might cause transactional inconsistency. Modifying input checkpoint #1, Oracle thread #1 of EXTRACT: ioseq = 19290 iorba = 3958352.
Are you sure you want to continue? y
EXTRACT altered.
GGSCI> alter extract EXIAOXU,extseqno 21339,extrba 58046720 thread 2
EXTRACT altered.
GGSCI> alter extract EXIAOXU,ioextseqno 21339,ioextrba 58008848 thread 2
2018-03-21 23:40:37 INFO OGG-00988 WARNING: Unsupported operation. This might cause transactional inconsistency. Modifying input checkpoint #2, Oracle thread #2 of EXTRACT: ioseq = 21339 iorba = 58008848.
Are you sure you want to continue? y
EXTRACT altered.
5、增加extract EXIAOXU的exttrail信息
备注--有bug,不能区分rmttrail和exttrail全部被删除了.
当没有pump时,手动指定exttrail的seqno是下一个时,replicat无法察觉到extract已经切换,此时replicat的seqno是6715,rba是1727,那么extract指定为相同即可,因为extract重新启动后触发切换到一个.这个也需要注意的.类似存在pump也是一样,需要由extract或者pump去触发下游去自动轮询到下一个trailfile,否则会出现饿死的情况.
add exttrail /u01/goldengate/112105/dirdat/tt,seqno 6715,rba 1727 extract EXIAOXU
GGSCI > add exttrail /u01/goldengate/112105/dirdat/tt,seqno 6715,rba 1727 extract EXIAOXU
EXTTRAIL added.
6、启动extract EXIAOXU
GGSCI > start EXIAOXU
7、检查extract EXIAOXU状态
GGSCI> info EXIAOXU
EXTRACT EXIAOXU Last Started 2018-03-05 11:48 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:03:28 ago)
【重建抽取进程--对于抽取进程所在的文件系统损坏或者被删除的情况】
--备注:对于OGG 11.2.1.0.5开始,replicat可以跳过已经应用的事务,所以源端extract将时间点往前推N个小时,不过还是会存在丢失事务的情况,例如N 1之前启动的事务还没有提交,第N-2小时又应用需要同步的表,则此时会出现丢失数据的情况.根据自己的系统情况来判断最长事务避免丢失事务的情况.
1、如果源端是相对静态,可以直接重新创建即可
2、如果源端归档日志存在(包括恢复),可以直接重建即可
3、如果源端无法恢复或者已删除,则需要重建抽取进程并重新初始化.
4、如果不考虑数据可以丢失的情况,如果可以丢失的话,重建抽取进程使用begin now.
【通过获取replicat端信息来重建抽取进程:以oracle为例】
1、正常停止ogg replicat进程,获取replicat的checkpoint信息
GGSCI> info RXIAOXU showch
REPLICAT RXIAOXU Last Started 2019-03-24 12:44 Status STOPPED
Checkpoint Lag 00:00:00 (updated 45:04:11 ago)
Log Read Checkpoint File /u01/goldengate/112106/dirdat/tt000013
2018-03-25 17:11:11.000000 RBA 3824279
Current Checkpoint Detail:
Current Checkpoint (position of last record read in the data source):
Sequence #: 13
RBA: 3824279
Timestamp: 2018-03-25 17:11:11.000000
Extract Trail: /u01/goldengate/112106/dirdat/tt
2、通过logdump获取详细信息
--获取事务提交时间,事务scn,记录rowid、DML的类型以及记录属于哪个归档日志等信息
Logdump 6567 >open /u01/goldengate/112106/dirdat/tt000013
Current LogTrail is /u01/goldengate/112106/dirdat/tt000013
Logdump 6568 >ghdr on
Logdump 6569 >detail data on
Logdump 6570 >ggstoken detail
Logdump 6571 >pos 45953765
Reading forward from RBA 45953765
Logdump 6572 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : . (x04)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 279 (x0117) IO Time : 2018/03/20 10:33:18.000.000 --这个事务提交时间
IOType : 5 (x05) OrigNode : 255 (xff)
TransInd : . (x03) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 140545 AuditPos : 3628567568
Continued : N (x00) RecCount : 1 (x01)
--TransInd来表事务的开始,中间位置,结尾还是单条记录.
TransInd : . (x00) 表示记录开始位
TransInd : . (x01) 表示记录中间位置
TransInd : . (x02) 表示记录结尾位置
TransInd : . (x03) 表示这个事务只有1条记录
2018/03/20 10:33:18.000.000 Insert Len 279 RBA 45953765
Name: XIAOXU.XIAOXU
--对XIAOXU用户下的XIAOXU表进行插入数据且当前事务只有一条记录.
Column 0 (x0000), Len 40 (x0028)
0000 0024 3764 6630 3232 3365 2d33 3932 632d 3438 | ...$7df0223e-392c-48
3838 2d38 3735 362d 6331 3439 3935 3539 6264 6265 | 88-8756-c1499559bdbe
Column 1 (x0001), Len 40 (x0028)
0000 0024 6665 3138 6437 6437 2d36 3535 372d 3462 | ...$fe18d7d7-6557-4b
3565 2d38 3964 612d 6639 6266 3165 6533 6266 3031 | 5e-89da-f9bf1ee3bf01
Column 2 (x0002), Len 3 (x0003)
0000 59 | ..Y
GGS tokens:
TokenID x52 'R' ORAROWID Info x00 Length 20
4141 4f4b 3651 4143 3741 4149 495a 7141 4142 0001 | AAOK6QAC7AAIIZqCCB..--这个是记录的rowid,之前遇到遇到有的软件复制时候通过rowid来更新数据,这个最快,但是如果源或者目标表的记录rowid改变就异常了.
TokenID x4c 'L' LOGCSN Info x00 Length 13
3133 3430 3739 3237 3834 3236 39 | 1340792784269 --这个是源端数据库事务的scn
TokenID x36 '6' TRANID Info x00 Length 14
3334 3332 2e32 2e31 3330 3632 3939 | 3432.2.1306299
3、通过上面信息来重建extract
--本次以IO Time:2018/03/20 10:33:18.000.000来重建源端的extract,往前推2小时,目标replicat通过aftercsn来跳过重复事务.
--如果goldengate软件丢失,需要预先安装满足数据库版本的软件
本次跳过安装.
--重建extract
add extract exiaoxu,tranlog,begin 2018/03/20 08:33:18.000.000,threads 2
add exttrail /u01/goldengate/112106/dirdat/tt,extract exiaoxu
--重建pump
add extract pxiaoxu,exttrailsource /u01/goldengate/112106/dirdat/tt
add rmttrail /u01/goldengate/112106/dirdat/tt,seqno 14,rba 0
--replicat
alter replicat rxiaoxu,etrollover
alter replicat rxiaoxu,aftercsn 1340792784269
至此完成2种下重建extract,欢迎大家多交流和评论,如果觉得不错,可以转发,顺手关注udapp.谢谢。