PostgreSQL source code (35): analysis of standby startup-process launch and the redo flow

2022-05-12 09:00:43

Summary

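  1. The standby startup flow is similar to the primary's. The difference is that on the primary the startup process exits once recovery (including archive recovery of xlog) finishes and the server starts normally, while on a standby the startup process never exits: it keeps waiting for and reading WAL (which lock does it wait on? See the next article).
  2. All PG auxiliary processes are entered through AuxiliaryProcessMain; they differ only in the -x argument passed to it. For example, -x2 is the startup process.
  3. Both standby mode and redo in PG enter through the StartupXLOG function.
  4. StartupXLOG breaks down into these steps:
    1. Parse the control file.
    2. Parse recovery.conf.
    3. Locate the latest checkpoint (chk) position.
    4. Read the checkpoint record.
    5. Use the checkpoint contents to update the transaction-system state in shared memory.
    6. Use the checkpoint's redo location as the redo start point (this is the WAL position at which the checkpoint began).
    7. Fetch one record at a time. The data is read into the main_data part of XLogReaderState; record mainly serves to classify it.
    8. Dispatch each record through RmgrTable to the concrete redo routine.
    9. Under streaming replication, the process ends up parked in ReadRecord waiting for new WAL to arrive. Note that this function reads an 8k page and, as soon as it has confirmed one complete xlog record, returns to the redo loop, so replay is more timely than MySQL binlog logical replication.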
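
Steps 7 and 8 above amount to a table-driven dispatch loop. The following is a minimal sketch of that idea and not PostgreSQL's code: `ToyRecord`, `ToyRmgr`, `toy_rmgr_table` and `toy_replay` are hypothetical names, and the two resource-manager ids are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical, heavily simplified record: only what the loop needs. */
typedef struct ToyRecord {
    uint8_t    rmid;     /* index into the dispatch table (made-up ids) */
    XLogRecPtr end_ptr;  /* LSN just past the end of this record */
} ToyRecord;

typedef struct ToyRmgr {
    const char *name;
    void (*redo)(const ToyRecord *rec);
} ToyRmgr;

/* Call counters so the effect of the dispatch is observable. */
int toy_xlog_calls = 0;
int toy_standby_calls = 0;

static void toy_xlog_redo(const ToyRecord *rec)    { (void) rec; toy_xlog_calls++; }
static void toy_standby_redo(const ToyRecord *rec) { (void) rec; toy_standby_calls++; }

static const ToyRmgr toy_rmgr_table[] = {
    { "XLOG",    toy_xlog_redo },     /* toy id 0 */
    { "Standby", toy_standby_redo },  /* toy id 1 */
};
#define TOY_RMGR_COUNT (sizeof(toy_rmgr_table) / sizeof(toy_rmgr_table[0]))

/*
 * Replay records in order, dispatching each one by its rmid.
 * Returns the end LSN of the last record that was replayed.
 */
XLogRecPtr toy_replay(const ToyRecord *recs, size_t nrecs, XLogRecPtr start)
{
    XLogRecPtr last_replayed = start;

    for (size_t i = 0; i < nrecs; i++) {
        const ToyRecord *rec = &recs[i];

        if (rec->rmid >= TOY_RMGR_COUNT)
            break;                           /* unknown id: stop replaying */
        toy_rmgr_table[rec->rmid].redo(rec); /* the per-record redo call */
        last_replayed = rec->end_ptr;
    }
    return last_replayed;
}
```

Feeding it a Standby record followed by an XLOG record calls each handler once and returns the end LSN of the second record. The real loop differs in that it blocks in ReadRecord for the next record instead of walking a fixed array.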

The analysis below involves several representations of an LSN:

Three ways an LSN is recorded in PostgreSQL, and the related code
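
The trace in this article shows the same position as a decimal 64-bit integer, as a hex value, and in the `hi/lo` form printed by pg_controldata, alongside a WAL file name. As a rough sketch of how these relate, assuming the default 16MB WAL segment size: `format_lsn` and `wal_file_name` are hypothetical helper names, not PostgreSQL functions.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: default 16MB WAL segments. */
#define TOY_WAL_SEGMENT_SIZE (UINT64_C(16) * 1024 * 1024)

/* Format a 64-bit LSN as "hi/lo" in uppercase hex, e.g. 2F/F849D720. */
void format_lsn(uint64_t lsn, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%" PRIX32 "/%" PRIX32,
             (uint32_t) (lsn >> 32), (uint32_t) lsn);
}

/* Build the 24-character WAL file name that contains the given LSN. */
void wal_file_name(uint32_t tli, uint64_t lsn, char *buf, size_t buflen)
{
    uint64_t segno = lsn / TOY_WAL_SEGMENT_SIZE;
    /* Number of segments covered by one value of the high 32 bits. */
    uint64_t segs_per_id = UINT64_C(0x100000000) / TOY_WAL_SEGMENT_SIZE;

    snprintf(buf, buflen, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
             tli,
             (uint32_t) (segno / segs_per_id),
             (uint32_t) (segno % segs_per_id));
}
```

With the values from the analyzed case, 206029051680 formats as 2F/F849D720, and the redo location on timeline 1 falls in file 000000010000002F000000F8, matching the pg_controldata output.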

Standby startup and redo flow

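1. From 【7】 below onward, the flow is inside the redo loop.

2. checkPoint.redo is the redo start point, i.e. the WAL position at which the checkpoint began.

3. checkPointLoc is the end of the checkpoint, i.e. the position where the checkpoint record was inserted.

4. Two pointers in the shared-memory XLogCtl track the xlog that has just been replayed and keep moving forward:

EndRecPtr refers to the record about to be replayed.
  Before replay, update: XLogCtl->replayEndRecPtr = EndRecPtr;
  Replay: redo
  After replay, update: XLogCtl->lastReplayedEndRecPtr = EndRecPtr;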
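
The before/after ordering of the two pointers can be sketched as follows. This is an illustration under simplifying assumptions: `ToyXLogCtl`, `toy_apply_one`, `toy_replay_in_progress` and `toy_record_progress` are hypothetical names, and the spinlock that the real code takes around each update (XLogCtl->info_lck in the trace) is omitted.

```c
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical stand-in for the two replay pointers kept in XLogCtl. */
typedef struct ToyXLogCtl {
    XLogRecPtr replayEndRecPtr;        /* set before a record is replayed */
    XLogRecPtr lastReplayedEndRecPtr;  /* set after the record is replayed */
} ToyXLogCtl;

typedef void (*toy_redo_fn)(const ToyXLogCtl *ctl, void *arg);

/* Nonzero while a record sits between the "before" and "after" updates. */
int toy_replay_in_progress(const ToyXLogCtl *ctl)
{
    return ctl->replayEndRecPtr != ctl->lastReplayedEndRecPtr;
}

/* Example redo callback: records whether replay looked in progress. */
void toy_record_progress(const ToyXLogCtl *ctl, void *arg)
{
    *(int *) arg = toy_replay_in_progress(ctl);
}

/* Apply one record whose end LSN is end_rec_ptr. */
void toy_apply_one(ToyXLogCtl *ctl, XLogRecPtr end_rec_ptr,
                   toy_redo_fn redo, void *arg)
{
    ctl->replayEndRecPtr = end_rec_ptr;        /* before replay */
    if (redo)
        redo(ctl, arg);                        /* replay */
    ctl->lastReplayedEndRecPtr = end_rec_ptr;  /* after replay */
}
```

Between the two updates the pointers differ, so an observer can tell that a record is partway through replay. Once the call returns, both hold the same end LSN.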

【Positions for the analyzed case, from pg_controldata】
Latest checkpoint location:           2F/F849D720
Prior checkpoint location:            2F/F849D720
Latest checkpoint REDO location:      2F/F849D6E8
Latest checkpoint REDO WAL file:      000000010000002F000000F8
Latest checkpoint TimeLineID:         1


【Startup flow for the analyzed case】
PostmasterMain
  StartupDataBase:macro dedicated to launching the startup process
    StartChildProcess(StartupProcess)
      fork_process
        InitPostmasterChild
          ...
          InitializeLatchSupport
          InitLatch
          AuxiliaryProcessMain(argc=2, argv='postgres -x2'):all auxiliary processes are launched through this function with different arguments; 2 means the startup process
            ...
            BaseInit
            InitAuxiliaryProcess
            ProcSignalInit
            InitBufferPoolBackend
            switch (MyAuxProcType)                                               :process identity comes from postgres -x2
              case StartupProcess
                StartupProcessMain
******************StartupXLOG                                                    :【redo flow starts here】
                    ReadControlFile                                              :【1】parse the control file
                    readRecoveryCommandFile                                      :【2】parse the recovery.conf parameter file
                      ArchiveRecoveryRequested=true
                      StandbyModeRequested=true                                  :true if standby_mode == 'on'
                    if (ArchiveRecoveryRequested && StandbyModeRequested)        :enter standby mode
                    OwnLatch(&XLogCtl->recoveryWakeupLatch)                      :associate the latch with the current PID
                    MemSet(&private, 0, sizeof(XLogPageReadPrivate))             :
                    xlogreader = XLogReaderAllocate(&XLogPageRead, &private)     :【3】initialize the whole XLogReaderState structure and register the callback XLogPageRead
                    read_backup_label                                            :if backup_label exists, use only the position recorded in it
                    stat(TABLESPACE_MAP, &st)                                    :if tablespace_map exists, use its mappings
                    
                    checkPointLoc = ControlFile->checkPoint                      :206029051680 in hex is 0x2FF849D720:Latest checkpoint location 2F/F849D720
                    RedoStartLSN = ControlFile->checkPointCopy.redo              :206029051624 in hex is 0x2FF849D6E8:Latest checkpoint REDO location 2F/F849D6E8
                    record = ReadCheckpointRecord(xlogreader, checkPointLoc)     :【4】read the checkpoint record; see the section below

                    memcpy(&checkPoint, XLogRecGetData(xlogreader), ...)         :restore checkPoint from the record:XLogRecGetData returns xlogreader->main_data


chk end position    LastRec = RecPtr = checkPointLoc                             : p/x checkPoint->redo 0x2ff849d6e8
                                                                                 : p/x checkPointLoc    0x2ff849d720
                    update ShmemVariableCache, MultiXactSetNextMXact
                    update AdvanceOldestClogXid, SetTransactionIdLimit
                    update SetMultiXactIdLimit, SetCommitTsLimit

redo = chk start    RedoRecPtr = XLogCtl->RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo  :【5】all set to the start point 0x2ff849d6e8
                    minRecoveryPoint = ControlFile->minRecoveryPoint                                 :0x2ff849d7c8
                    minRecoveryPointTLI = ControlFile->minRecoveryPointTLI                           :1

                    XLogCtl->lastReplayedEndRecPtr = XLogCtl->replayEndRecPtr = 0x2ff849d6e8         : lastReplayedEndRecPtr, advanced once per replayed record
                    XLogCtl->replayEndRecPtr = checkPoint.redo = 0x2ff849d6e8                        : replayEndRecPtr = lastReplayedEndRecPtr


                    record = ReadRecord(xlogreader, checkPoint.redo, PANIC, false)                   :【6】fetch the first record to replay

【Round 1 of redo: first record, length 0x18】
                    /* main redo apply loop */
                    RmgrTable[record->xl_rmid].rm_redo(xlogreader)                                   :【7】go! start from checkPoint.redo
                        standby_redo(XLogReaderState *record)                                        :the data is in xlogreader->main_data
                            XLOG_RUNNING_XACTS

                    SpinLockAcquire(&XLogCtl->info_lck);
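                    XLogCtl->lastReplayedEndRecPtr = EndRecPtr;                                      :【8】update lastReplayedEndRecPtr; record length 0x18
                    SpinLockRelease(&XLogCtl->info_lck);                                             : 0x2ff849d6e8 to EndRecPtr 0x2ff849d720 is 0x38 bytes (the 0x18 looks like the main-data length; header and alignment add the rest)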


                    LastRec = ReadRecPtr                                                             :the record just replayed is the first one, at 0x2ff849d6e8; remember 0x2ff849d6e8
                    record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false)                   :fetch the next record and continue
                    /* end of main redo apply loop */

【Round 2 of redo: second record, length 0x50】
                    /* main redo apply loop */
                    RmgrTable[record->xl_rmid].rm_redo(xlogreader)                                   :
                        xlog_redo(XLogReaderState *record)                                           :the data is in xlogreader->main_data
                            XLOG_CHECKPOINT_ONLINE

                    SpinLockAcquire(&XLogCtl->info_lck);
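                    XLogCtl->lastReplayedEndRecPtr = EndRecPtr;                                      :【8】update lastReplayedEndRecPtr; record length 0x50
                    SpinLockRelease(&XLogCtl->info_lck);                                             : 0x2ff849d720 to EndRecPtr 0x2ff849d790 is 0x70 bytes (the 0x50 looks like the main-data length; header and alignment add the rest)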


                    LastRec = ReadRecPtr                                                             :the record just replayed is the second one, at 0x2ff849d720; remember 0x2ff849d720
                    record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false)                   :fetch the next record and continue
                    /* end of main redo apply loop */  

...

【Round n】the startup process keeps waiting for xlog
                    /* main redo apply loop */
                    ...
                    record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false)
                    /* end of main redo apply loop */  


Stack:
__epoll_wait_nocancel
WaitEventSetWaitBlock
WaitEventSetWait
WaitLatchOrSocket
WaitLatch
WaitForWALToBecomeAvailable
XLogPageRead
ReadPageInternal
XLogReadRecord
ReadRecord
StartupXLOG
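
The gaps between the LSNs in the two redo rounds (0x38 and 0x70) are larger than the lengths 0x18 and 0x50 noted in the trace. One reading that reproduces the numbers is that those lengths are main-data sizes and each record also carries a fixed header plus a short data header, rounded up to 8-byte alignment. The sketch below encodes that reading. The constants 24, 2 and 8 are assumptions stated here and not taken from the article, `toy_end_rec_ptr` is a hypothetical name, and records with block references or records that cross a page boundary are not handled.

```c
#include <stdint.h>

/* Assumptions, not taken from the article: */
#define TOY_SIZE_OF_XLOG_RECORD  24u  /* fixed record header size */
#define TOY_SHORT_DATA_HEADER     2u  /* id byte + 1-byte length for main data */
#define TOY_MAXALIGN(x) (((uint64_t) (x) + 7u) & ~UINT64_C(7))  /* 8-byte alignment */

/*
 * Given where a record starts and how much main data it carries,
 * estimate the LSN where the next record starts.
 */
uint64_t toy_end_rec_ptr(uint64_t read_rec_ptr, uint32_t main_data_len)
{
    uint32_t total_len = TOY_SIZE_OF_XLOG_RECORD
                       + TOY_SHORT_DATA_HEADER
                       + main_data_len;

    return read_rec_ptr + TOY_MAXALIGN(total_len);
}
```

Under these assumptions, a record at 0x2ff849d6e8 with 0x18 bytes of main data ends at 0x2ff849d720, and the next one with 0x50 bytes ends at 0x2ff849d790, which lines up with the EndRecPtr values in the trace.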

Reading one record: ReadCheckpointRecord

XLogReaderAllocate
ReadCheckpointRecord
    record = ReadRecord(xlogreader, RecPtr: 0x2FF849D720)            : read position 0x2FF849D720
        XLogReadRecord(xlogreader, RecPtr: 0x2FF849D720)
            state->currRecPtr = 0x2FF849D720
      
            targetPagePtr = RecPtr - (RecPtr % XLOG_BLCKSZ)          : compute the page start 0x2FF849C000
            targetRecOff = RecPtr % XLOG_BLCKSZ                      : compute the offset within the page 0x1720
                                                                     : 
read one 8k  ReadPageInternal(state, targetPagePtr:0x2FF849C000 , Min(targetRecOff + SizeOfXLogRecord, XLOG_BLCKSZ): 0x1738) :read at most 8k
                XLByteToSeg(pageptr, targetSegNo)                    : compute the segment number targetSegNo=0x2ff8
                targetPageOff = (pageptr % XLogSegSize)              : compute the offset within the segment targetPageOff=0x49c000
                targetSegmentPtr = pageptr - targetPageOff           : start position of the target segment
                
                (first read: at the start of the segment)
                state->read_page                                     : invoke the callback
                XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen, XLogRecPtr targetRecPtr, char *readBuf, TimeLineID *readTLI)
                    targetPagePtr = 0x2ff8000000
                    reqLen = 8192
                    targetRecPtr = 0x2ff849d720
        
                    XLByteToSeg(targetPagePtr, targetSegNo)           : targetSegNo = 0x2ff8
                    targetPageOff = targetPagePtr % XLogSegSize       : targetPageOff = 0
        
                  (first file, or the previous file has been fully read: move on to the next file)
                    XLByteToSeg(targetPagePtr, readSegNo)             : readSegNo = 0x2ff8
                    WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, bool fetching_ckpt, XLogRecPtr tliRecPtr)
                        RecPtr = 0x2ff8002000
                        randAccess = 1
                        fetching_ckpt = 1
                        tliRecPtr = 0x2ff849d720
                      
                        XLogFileReadAnyTLI(readSegNo: 0x2ff8)         : open the given XLOG file and obtain its FD
                  
                    lseek(readFile, (off_t) readOff, SEEK_SET)        : start reading the file
                    read(readFile, readBuf, XLOG_BLCKSZ)
        
                    XLogReaderValidatePageHeader                      : check that the page is valid
        
                XLogReaderValidatePageHeader                          : check that the page is valid
        
                (second read: at the target position within the segment)
                state->read_page
                XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen, XLogRecPtr targetRecPtr, char *readBuf, TimeLineID *readTLI)
                    targetPagePtr = 0x2ff849c000
                    reqLen = 5944
                    targetRecPtr = 0x2ff849d720
        
                XLogReaderValidatePageHeader                          : check that the page is valid

************(after ReadPageInternal returns, the data is in state->readBuf)
            record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ)   : a full page has been read; the read position 0x2ff849d720 modulo 8k gives the record's location, state->readBuf + 0x1720
            total_len = record->xl_tot_len;
            ReadPageInternal
            ValidXLogRecord
                (CRC check)
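
The address arithmetic in this trace can be reproduced with a few integer operations. The sketch below is an illustration and not the reader code itself: `ToyReadTarget` and `toy_locate` are hypothetical names, and it assumes 8k WAL pages, 16MB segments and a 24-byte fixed record header.

```c
#include <stdint.h>

/* Assumptions: 8k WAL pages, 16MB segments, 24-byte fixed record header. */
#define TOY_XLOG_BLCKSZ          UINT64_C(8192)
#define TOY_WAL_SEGMENT_SIZE     (UINT64_C(16) * 1024 * 1024)
#define TOY_SIZE_OF_XLOG_RECORD  UINT64_C(24)

typedef struct ToyReadTarget {
    uint64_t page_ptr;  /* LSN of the start of the page holding the record */
    uint64_t rec_off;   /* offset of the record inside that page */
    uint64_t seg_no;    /* segment number holding the page */
    uint64_t page_off;  /* offset of the page inside the segment */
    uint64_t seg_ptr;   /* LSN of the start of the segment */
    uint64_t req_len;   /* bytes of the page needed to see the record header */
} ToyReadTarget;

/* Split a record position into page, segment and request-length parts. */
ToyReadTarget toy_locate(uint64_t rec_ptr)
{
    ToyReadTarget t;

    t.rec_off  = rec_ptr % TOY_XLOG_BLCKSZ;
    t.page_ptr = rec_ptr - t.rec_off;
    t.seg_no   = t.page_ptr / TOY_WAL_SEGMENT_SIZE;
    t.page_off = t.page_ptr % TOY_WAL_SEGMENT_SIZE;
    t.seg_ptr  = t.page_ptr - t.page_off;

    /* Min(rec_off + header size, page size) */
    t.req_len = t.rec_off + TOY_SIZE_OF_XLOG_RECORD;
    if (t.req_len > TOY_XLOG_BLCKSZ)
        t.req_len = TOY_XLOG_BLCKSZ;
    return t;
}
```

For the checkpoint position 0x2FF849D720 this gives page 0x2FF849C000, in-page offset 0x1720, segment 0x2ff8, page offset 0x49c000 within the segment, segment start 0x2ff8000000 and a request length of 5944 bytes, the same values that appear in the two XLogPageRead calls above.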
