Summary
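- Entering the standby flow is similar to the primary's startup flow. The difference is that the primary lets the startup process exit once xlog (archive-mode) recovery is done and then starts normally, whereas the standby never exits startup and keeps waiting for and reading WAL (which lock is it waiting on? See the next post).
- All PG auxiliary processes are entered through AuxiliaryProcessMain; what differs is the -x argument passed to it, e.g. -x2 means startup.
- The entry point for both standby operation and redo in PG is the StartupXLOG function.
- StartupXLOG breaks down into these main steps:
  - Parse the control file.
  - Parse recovery.conf.
  - Find the location of the last checkpoint (chk).
  - Read the checkpoint record.
  - Use the checkpoint information to update the transaction-system state in shared memory.
  - Use the checkpoint's redo location as the redo start point (this is the WAL position at which the checkpoint began).
  - Fetch one record at a time; the data is read out and cached in the main_data part of XLogReaderState, while record mainly serves to classify it.
  - Pass each record through RmgrTable to perform the actual redo.
  - In streaming replication, the process finally parks inside ReadRecord, waiting for new WAL to arrive. Note that this function can enter the redo loop as soon as it has fetched an 8k page and confirmed that it holds one complete xlog record, so it is more timely than MySQL binlog logical replication.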
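The last two steps (fetch one record, then dispatch it through RmgrTable) can be pictured with a toy model. This is only a sketch under stated assumptions: `ToyRecord`, `toy_rmgr_table`, `toy_replay` and the two toy redo functions are hypothetical names invented for illustration and are not PostgreSQL's real definitions; the real table is indexed by `record->xl_rmid` and each entry supplies an `rm_redo` callback.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a decoded WAL record. */
typedef struct ToyRecord {
    uint8_t  rmid;       /* which resource manager owns the record */
    uint32_t main_data;  /* stand-in for the record payload */
} ToyRecord;

typedef void (*toy_redo_fn)(const ToyRecord *rec, uint64_t *state);

/* Two toy redo routines; each changes the "database state" differently. */
static void toy_xlog_redo(const ToyRecord *rec, uint64_t *state)
{
    *state += rec->main_data;
}

static void toy_standby_redo(const ToyRecord *rec, uint64_t *state)
{
    *state ^= rec->main_data;
}

enum { TOY_RM_XLOG = 0, TOY_RM_STANDBY = 1, TOY_RM_COUNT };

/* Toy counterpart of RmgrTable: one redo callback per resource manager. */
static const toy_redo_fn toy_rmgr_table[TOY_RM_COUNT] = {
    [TOY_RM_XLOG]    = toy_xlog_redo,
    [TOY_RM_STANDBY] = toy_standby_redo,
};

/* Replay records in order; stop at the first unknown rmid.
 * Returns the number of records applied. */
size_t toy_replay(const ToyRecord *recs, size_t n, uint64_t *state)
{
    size_t applied = 0;

    for (size_t i = 0; i < n; i++) {
        if (recs[i].rmid >= TOY_RM_COUNT)
            break;
        toy_rmgr_table[recs[i].rmid](&recs[i], state);
        applied++;
    }
    return applied;
}
```

The point of the table is that the loop itself knows nothing about record contents: the record header only selects which callback interprets the payload.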
The LSN representations that appear in the analysis below:
Three ways PostgreSQL records an LSN, and the related code
Standby startup and redo flow
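1. The redo loop begins from step 【7】 below.
2. checkPoint.redo is the redo start point, i.e. the WAL position at which the checkpoint began.
3. checkPointLoc is the end of the checkpoint, i.e. the position where the checkpoint record itself was inserted.
4. Two pointers in XLogCtl in shared memory track the xlog record just replayed and keep advancing:
EndRecPtr refers to the record about to be replayed (its end position).
Before replay, update: XLogCtl->replayEndRecPtr = EndRecPtr;
Replay: redo
After replay, update: XLogCtl->lastReplayedEndRecPtr = EndRecPtr;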
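The before/after ordering of the two pointers can be illustrated with a small model. This is a sketch, not PostgreSQL code: `ToyReplayState` and the three helper functions are hypothetical, and the spinlock (`info_lck`) that protects the real fields is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ToyLsn;

/* Hypothetical miniature of the two replay pointers kept in XLogCtl. */
typedef struct ToyReplayState {
    ToyLsn replayEndRecPtr;        /* end of the record being replayed (or last replayed) */
    ToyLsn lastReplayedEndRecPtr;  /* end of the last record fully replayed */
} ToyReplayState;

/* Called before redo: announce which record is about to be applied. */
void toy_begin_record(ToyReplayState *s, ToyLsn end_rec_ptr)
{
    s->replayEndRecPtr = end_rec_ptr;
}

/* Called after redo: the announced record is now fully applied. */
void toy_finish_record(ToyReplayState *s)
{
    s->lastReplayedEndRecPtr = s->replayEndRecPtr;
}

/* The two pointers differ exactly while a record is being applied. */
bool toy_replay_in_progress(const ToyReplayState *s)
{
    return s->replayEndRecPtr != s->lastReplayedEndRecPtr;
}
```

Keeping two pointers lets an observer tell "fully replayed up to here" apart from "currently working on the record that ends here".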
【Case study: LSN state from pg_controldata】
Latest checkpoint location: 2F/F849D720
Prior checkpoint location: 2F/F849D720
Latest checkpoint REDO location: 2F/F849D6E8
Latest checkpoint REDO WAL file: 000000010000002F000000F8
Latest checkpoint TimeLineID: 1
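The pg_controldata values above relate to each other by simple arithmetic, which the following sketch reproduces. The function names `toy_parse_lsn` and `toy_wal_file_name` are hypothetical helpers written for this illustration, and the code assumes the default 16 MB WAL segment size; a cluster initialized with a different segment size would yield different file names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: default WAL segment size of 16 MB. */
#define TOY_WAL_SEG_SIZE (16u * 1024 * 1024)

/* Parse the textual "hi/lo" hex form (e.g. "2F/F849D720") into a 64-bit LSN.
 * Returns 0 on success, -1 if the text does not match. */
int toy_parse_lsn(const char *text, uint64_t *lsn)
{
    unsigned int hi, lo;

    if (sscanf(text, "%X/%X", &hi, &lo) != 2)
        return -1;
    *lsn = ((uint64_t) hi << 32) | lo;
    return 0;
}

/* Build the 24-character WAL segment file name that contains `lsn`
 * on timeline `tli`. `buf` needs room for at least 25 bytes. */
void toy_wal_file_name(char *buf, size_t buflen, uint32_t tli, uint64_t lsn)
{
    uint64_t segno = lsn / TOY_WAL_SEG_SIZE;
    uint64_t segs_per_xlogid = UINT64_C(0x100000000) / TOY_WAL_SEG_SIZE;

    snprintf(buf, buflen, "%08X%08X%08X",
             (unsigned int) tli,
             (unsigned int) (segno / segs_per_xlogid),
             (unsigned int) (segno % segs_per_xlogid));
}
```

Feeding in the REDO location 2F/F849D6E8 with timeline 1 gives the file name shown in the "Latest checkpoint REDO WAL file" line, and the checkpoint location 2F/F849D720 is the decimal value 206029051680 that shows up in the trace below.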
【Case study: startup flow】
PostmasterMain
StartupDataBase : macro dedicated to launching the startup process
StartChildProcess(StartupProcess)
fork_process
InitPostmasterChild
...
InitializeLatchSupport
InitLatch
AuxiliaryProcessMain(argc=2, argv='postgres -x2') : every auxiliary process is launched through this function with different arguments; 2 means the startup process
...
BaseInit
InitAuxiliaryProcess
ProcSignalInit
InitBufferPoolBackend
switch (MyAuxProcType) : the process identity comes from postgres -x2
case StartupProcess
StartupProcessMain
******************StartupXLOG : 【redo flow starts here】
ReadControlFile : 【1】parse the control file
readRecoveryCommandFile : 【2】parse the recovery.conf parameter file
ArchiveRecoveryRequested=true
StandbyModeRequested=true : true if standby_mode == 'on'
if (ArchiveRecoveryRequested && StandbyModeRequested) : enter standby mode
OwnLatch(&XLogCtl->recoveryWakeupLatch) : bind the latch to the current PID
MemSet(&private, 0, sizeof(XLogPageReadPrivate))
xlogreader = XLogReaderAllocate(&XLogPageRead, &private) : 【3】initialize the whole XLogReaderState struct and register the callback XLogPageRead
read_backup_label : if backup_label exists, use only the location recorded in backup_label
stat(TABLESPACE_MAP, &st) : if tablespace_map exists, use its mapping
checkPointLoc = ControlFile->checkPoint : 206029051680 is 0x2FF849D720 in hex: Latest checkpoint location 2F/F849D720
RedoStartLSN = ControlFile->checkPointCopy.redo : 206029051624 is 0x2FF849D6E8 in hex: Latest checkpoint REDO location 2F/F849D6E8
record = ReadCheckpointRecord(xlogreader, checkPointLoc) : 【4】read the checkpoint record; analyzed in the section below
memcpy(&checkPoint, XLogRecGetData(xlogreader), ...) : rebuild checkPoint from the record; XLogRecGetData returns xlogreader->main_data
LastRec = RecPtr = checkPointLoc (the checkpoint end location) : p/x checkPoint->redo 0x2ff849d6e8
 : p/x checkPointLoc 0x2ff849d720
Update ShmemVariableCache, MultiXactSetNextMXact
Update AdvanceOldestClogXid, SetTransactionIdLimit
Update SetMultiXactIdLimit, SetCommitTsLimit
RedoRecPtr = XLogCtl->RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo (redo is where the checkpoint began) : 【5】all set to the start point 0x2ff849d6e8
minRecoveryPoint = ControlFile->minRecoveryPoint : 0x2ff849d7c8
minRecoveryPointTLI = ControlFile->minRecoveryPointTLI : 1
XLogCtl->lastReplayedEndRecPtr = XLogCtl->replayEndRecPtr = 0x2ff849d6e8 : lastReplayedEndRecPtr is advanced after each replayed record
XLogCtl->replayEndRecPtr = checkPoint.redo = 0x2ff849d6e8 : replayEndRecPtr = lastReplayedEndRecPtr
record = ReadRecord(xlogreader, checkPoint.redo, PANIC, false) : 【6】find the first record to replay
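【Round 1 of redo: first record, length 0x18】
/* main redo apply loop */
RmgrTable[record->xl_rmid].rm_redo(xlogreader) : 【7】go! Starting from checkPoint.redo
standby_redo(XLogReaderState *record) : the data is in xlogreader->main_data
XLOG_RUNNING_XACTS
SpinLockAcquire(&XLogCtl->info_lck);
XLogCtl->lastReplayedEndRecPtr = EndRecPtr; : 【8】update lastReplayedEndRecPtr; record length 0x18
SpinLockRelease(&XLogCtl->info_lck); : record starts at 0x2ff849d6e8, EndRecPtr = 0x2ff849d720 (0x38 bytes of WAL; 0x18 is presumably the main-data length, with the record header and alignment padding making up the rest)
LastRec = ReadRecPtr : the record just replayed is the first one, at 0x2ff849d6e8, so 0x2ff849d6e8 is recorded
record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false) : fetch the next record and continue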
/* end of main redo apply loop */
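【Round 2 of redo: second record, length 0x50】
/* main redo apply loop */
RmgrTable[record->xl_rmid].rm_redo(xlogreader)
xlog_redo(XLogReaderState *record) : the data is in xlogreader->main_data
XLOG_CHECKPOINT_ONLINE
SpinLockAcquire(&XLogCtl->info_lck);
XLogCtl->lastReplayedEndRecPtr = EndRecPtr; : 【8】update lastReplayedEndRecPtr; record length 0x50
SpinLockRelease(&XLogCtl->info_lck); : record starts at 0x2ff849d720, EndRecPtr = 0x2ff849d790 (0x70 bytes of WAL; 0x50 is presumably the main-data length, with the record header and alignment padding making up the rest)
LastRec = ReadRecPtr : the record just replayed is the second one, at 0x2ff849d720, so 0x2ff849d720 is recorded
record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false) : fetch the next record and continue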
/* end of main redo apply loop */
...
【Round n】the startup process keeps waiting for xlog
/* main redo apply loop */
...
record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false)
/* end of main redo apply loop */
Stack:
__epoll_wait_nocancel
WaitEventSetWaitBlock
WaitEventSetWait
WaitLatchOrSocket
WaitLatch
WaitForWALToBecomeAvailable
XLogPageRead
ReadPageInternal
XLogReadRecord
ReadRecord
StartupXLOG
Reading one record: ReadCheckpointRecord
XLogReaderAllocate
ReadCheckpointRecord
record = ReadRecord(xlogreader, RecPtr: 0x2FF849D720) : read position 0x2FF849D720
XLogReadRecord(xlogreader, RecPtr: 0x2FF849D720)
state->currRecPtr = 0x2FF849D720
targetPagePtr = RecPtr - (RecPtr % XLOG_BLCKSZ) : start of the containing page, 0x2FF849C000
targetRecOff = RecPtr % XLOG_BLCKSZ : offset within the page, 0x1720
:
ReadPageInternal(state, targetPagePtr: 0x2FF849C000, Min(targetRecOff + SizeOfXLogRecord, XLOG_BLCKSZ): 0x1738) : read one page, at most 8k
XLByteToSeg(pageptr, targetSegNo) : compute the segment number, targetSegNo=0x2ff8
targetPageOff = (pageptr % XLogSegSize) : offset within the segment, targetPageOff=0x49c000
targetSegmentPtr = pageptr - targetPageOff : start of the target segment
(First read: at the segment header)
state->read_page : invoke the callback
XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen, XLogRecPtr targetRecPtr, char *readBuf, TimeLineID *readTLI)
targetPagePtr = 0x2ff8000000
reqLen = 8192
targetRecPtr = 0x2ff849d720
XLByteToSeg(targetPagePtr, targetSegNo) : targetSegNo = 0x2ff8
targetPageOff = targetPagePtr % XLogSegSize : targetPageOff = 0
(This is the first file, or the previous file has been read to the end, so start reading the next file)
XLByteToSeg(targetPagePtr, readSegNo) : readSegNo = 0x2ff8
WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, bool fetching_ckpt, XLogRecPtr tliRecPtr)
RecPtr = 0x2ff8002000
randAccess = 1
fetching_ckpt = 1
tliRecPtr = 0x2ff849d720
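XLogFileReadAnyTLI(readSegNo: 0x2ff8) : open the given XLOG file and obtain its fd
lseek(readFile, (off_t) readOff, SEEK_SET) : start reading the file
read(readFile, readBuf, XLOG_BLCKSZ)
XLogReaderValidatePageHeader : verify that the page header is valid
XLogReaderValidatePageHeader : verify that the page header is valid
(Second read: at the position inside the segment)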
state->read_page
XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen, XLogRecPtr targetRecPtr, char *readBuf, TimeLineID *readTLI)
targetPagePtr = 0x2ff849c000
reqLen = 5944
targetRecPtr = 0x2ff849d720
XLogReaderValidatePageHeader : verify that the page header is valid
************(ReadPageInternal is done; the data is in state->readBuf)
record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ) : a full page has been read; the read position 0x2ff849d720 modulo 8k gives the record's location, state->readBuf + 0x1720
total_len = record->xl_tot_len;
ReadPageInternal
ValidXLogRecord
(CRC check)
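The address arithmetic in this trace (page start, offset within the page, segment number, offset within the segment, and the requested read length) can be checked with a short sketch. `ToyReadTarget` and `toy_locate` are hypothetical names for this illustration; the constants assume an 8 kB XLOG block, a 16 MB segment, and a 24-byte record header (SizeOfXLogRecord), which match the values observed in the trace but are build-time or initdb-time settings in a real installation.

```c
#include <stdint.h>

/* Assumptions matching the trace: 8 kB WAL pages, 16 MB segments,
 * 24-byte fixed record header. */
#define TOY_XLOG_BLCKSZ        8192u
#define TOY_WAL_SEG_SIZE       (16u * 1024 * 1024)
#define TOY_SIZE_OF_XLOGRECORD 24u

/* Hypothetical bundle of the values derived from one record pointer. */
typedef struct ToyReadTarget {
    uint64_t page_ptr;  /* LSN of the start of the page holding the record */
    uint32_t rec_off;   /* offset of the record inside that page */
    uint64_t segno;     /* segment number holding the page */
    uint32_t page_off;  /* offset of the page inside its segment */
    uint32_t req_len;   /* bytes of the page needed to see the record header */
} ToyReadTarget;

ToyReadTarget toy_locate(uint64_t rec_ptr)
{
    ToyReadTarget t;
    uint32_t want;

    t.rec_off  = (uint32_t) (rec_ptr % TOY_XLOG_BLCKSZ);
    t.page_ptr = rec_ptr - t.rec_off;
    t.segno    = t.page_ptr / TOY_WAL_SEG_SIZE;
    t.page_off = (uint32_t) (t.page_ptr % TOY_WAL_SEG_SIZE);

    /* Min(rec_off + header size, page size): never ask past the page end. */
    want = t.rec_off + TOY_SIZE_OF_XLOGRECORD;
    t.req_len = want < TOY_XLOG_BLCKSZ ? want : TOY_XLOG_BLCKSZ;
    return t;
}
```

For the checkpoint record at 0x2FF849D720 this yields the page 0x2FF849C000, in-page offset 0x1720, segment 0x2ff8, in-segment offset 0x49c000, and a request length of 5944, the same reqLen seen in the second XLogPageRead call.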