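Summary
In principle, MVCC must be able to look up a transaction's state given its transaction ID.
In PG, the transaction state can be obtained through several paths:
- Look it up in the snapshot (active transactions)
- Look it up in the status bits of the tuple header (inactive transactions)
- Look it up in the CLOG (inactive transactions)
Setting the implementation aside and looking only at the concept, the commit state of an inactive transaction could also be found in the XLOG. CLOG can be viewed as a cache or mapping of the XLOG commit/rollback records: a fast way to look up a transaction's commit state.
So under the write-WAL-before-data rule, CLOG is treated as data; only XLOG counts as WAL.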
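The three lookup paths listed above can be sketched as a toy model. This is only an illustration under stated assumptions: every type and function name here (ToySnapshot, ToyTupleHeader, ToyClog, toy_xact_status) is hypothetical, the checks follow the order of the list above, and PostgreSQL's actual visibility routines are considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types; none of these exist in PostgreSQL. */
typedef uint32_t ToyXid;

typedef enum { TOY_IN_PROGRESS, TOY_COMMITTED, TOY_ABORTED } ToyXactStatus;

typedef struct
{
    const ToyXid *active;   /* xids still running when the snapshot was taken */
    size_t        nactive;
} ToySnapshot;

typedef struct
{
    ToyXid xmin;            /* transaction that created the tuple */
    bool   hint_committed;  /* cached "xmin committed" status bit */
    bool   hint_aborted;    /* cached "xmin aborted" status bit */
} ToyTupleHeader;

typedef struct
{
    const ToyXactStatus *status;  /* toy CLOG: one entry per xid */
    size_t               nxids;
} ToyClog;

static bool
toy_xid_in_snapshot(const ToySnapshot *snap, ToyXid xid)
{
    for (size_t i = 0; i < snap->nactive; i++)
        if (snap->active[i] == xid)
            return true;
    return false;
}

/* Resolve the state of tup->xmin: snapshot, then status bits, then CLOG. */
static ToyXactStatus
toy_xact_status(const ToySnapshot *snap, ToyTupleHeader *tup, const ToyClog *clog)
{
    ToyXid        xid = tup->xmin;
    ToyXactStatus st;

    if (toy_xid_in_snapshot(snap, xid))
        return TOY_IN_PROGRESS;         /* path 1: active per the snapshot */
    if (tup->hint_committed)
        return TOY_COMMITTED;           /* path 2: cached in the tuple header */
    if (tup->hint_aborted)
        return TOY_ABORTED;

    assert(xid < clog->nxids);
    st = clog->status[xid];             /* path 3: authoritative CLOG lookup */

    /* Cache the answer in the tuple header so later readers skip the CLOG. */
    if (st == TOY_COMMITTED)
        tup->hint_committed = true;
    else if (st == TOY_ABORTED)
        tup->hint_aborted = true;
    return st;
}
```

In this sketch the tuple-header bits act purely as a cache in front of the CLOG, which matches the view above that each layer is a faster way to answer the same question.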
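CLOG write-out in PostgreSQL: SlruPhysicalWritePage
PostgreSQL reads and writes the CLOG through the SLRU mechanism. Before an SLRU page is written to disk, a mechanism ensures that the XLOG is written first:
- group_lsn holds, for each group of 32 transactions, the largest log sequence number (LSN) in that group.
- group_lsn matters mainly for asynchronous commit, where the commit record is not flushed to disk synchronously.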
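As a rough sketch of how a transaction ID maps to its group_lsn entry, the snippet below uses toy constants that mirror the CLOG layout as I understand it (2 status bits per transaction, so 4 transactions per byte, and 32 transactions per LSN group), assuming the default 8 kB block size. All TOY_* macros and toy_* functions are hypothetical names for illustration, not PostgreSQL's API.

```c
#include <assert.h>
#include <stdint.h>

/* Toy constants; the 8192-byte block size is an assumption (the default). */
#define TOY_BLCKSZ              8192
#define TOY_XACTS_PER_BYTE      4     /* 2 status bits per transaction */
#define TOY_XACTS_PER_PAGE      (TOY_BLCKSZ * TOY_XACTS_PER_BYTE)
#define TOY_XACTS_PER_LSN_GROUP 32
#define TOY_LSN_GROUPS_PER_PAGE (TOY_XACTS_PER_PAGE / TOY_XACTS_PER_LSN_GROUP)

typedef uint32_t ToyXid;
typedef uint64_t ToyLsn;

/* Index into group_lsn[] for a transaction whose page sits in buffer slot slotno. */
static int
toy_lsn_index(int slotno, ToyXid xid)
{
    assert(slotno >= 0);
    return slotno * TOY_LSN_GROUPS_PER_PAGE
        + (int) ((xid % TOY_XACTS_PER_PAGE) / TOY_XACTS_PER_LSN_GROUP);
}

/*
 * Record an asynchronous commit: keep only the largest commit-record LSN
 * seen so far for the transaction's group.
 */
static void
toy_note_async_commit(ToyLsn *group_lsn, int slotno, ToyXid xid, ToyLsn commit_lsn)
{
    int idx = toy_lsn_index(slotno, xid);

    if (group_lsn[idx] < commit_lsn)
        group_lsn[idx] = commit_lsn;
}
```

With these numbers one page covers 32768 transactions and carries 1024 group LSNs. Tracking one LSN per group rather than per transaction keeps the array small, at the cost of sometimes flushing slightly more XLOG than strictly required.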
static bool
SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruWriteAll fdata)
{
    ...
    if (shared->group_lsn != NULL)
    {
        /*
         * We must determine the largest async-commit LSN for the page. This
         * is a bit tedious, but since this entire function is a slow path
         * anyway, it seems better to do this here than to maintain a per-page
         * LSN variable (which'd need an extra comparison in the
         * transaction-commit path).
         */
        XLogRecPtr  max_lsn;
        int         lsnindex,
                    lsnoff;

        lsnindex = slotno * shared->lsn_groups_per_page;
        max_lsn = shared->group_lsn[lsnindex++];
        for (lsnoff = 1; lsnoff < shared->lsn_groups_per_page; lsnoff++)
        {
            XLogRecPtr  this_lsn = shared->group_lsn[lsnindex++];

            if (max_lsn < this_lsn)
                max_lsn = this_lsn;     /* <<< find the largest LSN */
        }

        if (!XLogRecPtrIsInvalid(max_lsn))
        {
            /*
             * As noted above, elog(ERROR) is not acceptable here, so if
             * XLogFlush were to fail, we must PANIC. This isn't much of a
             * restriction because XLogFlush is just about all critical
             * section anyway, but let's make sure.
             */
            START_CRIT_SECTION();
            XLogFlush(max_lsn);         /* <<< first make sure XLOG is flushed up to this position! */
            END_CRIT_SECTION();
        }
    }
    ...
    if (pg_pwrite(fd, shared->page_buffer[slotno], BLCKSZ, offset) != BLCKSZ)
    {
        ...
    }
}
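The ordering enforced by the function above can be modelled in a few lines. The following is a minimal sketch, not PostgreSQL code: ToyWal, toy_wal_flush, toy_page_max_lsn and toy_write_clog_page are hypothetical names, the page is shrunk to 4 LSN groups to keep the example small, and the actual page write is left out.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t ToyLsn;

#define TOY_INVALID_LSN     ((ToyLsn) 0)
#define TOY_GROUPS_PER_PAGE 4   /* tiny value chosen for illustration only */

/* Toy WAL state: everything up to flushed_upto is assumed durable. */
typedef struct
{
    ToyLsn flushed_upto;
    int    flush_calls;
} ToyWal;

static void
toy_wal_flush(ToyWal *wal, ToyLsn upto)
{
    wal->flush_calls++;
    if (wal->flushed_upto < upto)
        wal->flushed_upto = upto;
}

/* Largest group LSN among the groups that belong to buffer slot slotno. */
static ToyLsn
toy_page_max_lsn(const ToyLsn *group_lsn, int slotno)
{
    int    idx = slotno * TOY_GROUPS_PER_PAGE;
    ToyLsn max_lsn = group_lsn[idx++];

    for (int off = 1; off < TOY_GROUPS_PER_PAGE; off++)
    {
        ToyLsn this_lsn = group_lsn[idx++];

        if (max_lsn < this_lsn)
            max_lsn = this_lsn;
    }
    return max_lsn;
}

/* Flush the WAL first, then (conceptually) write the CLOG page. */
static ToyLsn
toy_write_clog_page(ToyWal *wal, const ToyLsn *group_lsn, int slotno)
{
    ToyLsn max_lsn = toy_page_max_lsn(group_lsn, slotno);

    if (max_lsn != TOY_INVALID_LSN)
        toy_wal_flush(wal, max_lsn);

    /* Invariant: no commit bit reaches disk before its commit record. */
    assert(wal->flushed_upto >= max_lsn);

    /* A real implementation would write the page buffer to disk here. */
    return max_lsn;
}
```

A page whose group LSNs are all invalid has no asynchronously committed transactions pending, so in this model it is written without touching the WAL at all.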
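User data write-out in PostgreSQL: FlushBuffer
Data pages follow the same rule: first find the page's LSN, flush the XLOG up to it, and only then write the data.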
static void
FlushBuffer(BufferDesc *buf, SMgrRelation reln)
{
    ...
    buf_state = LockBufHdr(buf);

    /*
     * Run PageGetLSN while holding header lock, since we don't have the
     * buffer locked exclusively in all cases.
     */
    recptr = BufferGetLSN(buf);         /* <<< find the page's LSN */

    /* To check if block content changes while flushing. - vadim 01/17/97 */
    buf_state &= ~BM_JUST_DIRTIED;
    UnlockBufHdr(buf, buf_state);

    /*
     * Force XLOG flush up to buffer's LSN. This implements the basic WAL
     * rule that log updates must hit disk before any of the data-file changes
     * they describe do.
     *
     * However, this rule does not apply to unlogged relations, which will be
     * lost after a crash anyway. Most unlogged relation pages do not bear
     * LSNs since we never emit WAL records for them, and therefore flushing
     * up through the buffer LSN would be useless, but harmless. However,
     * GiST indexes use LSNs internally to track page-splits, and therefore
     * unlogged GiST pages bear "fake" LSNs generated by
     * GetFakeLSNForUnloggedRel. It is unlikely but possible that the fake
     * LSN counter could advance past the WAL insertion point; and if it did
     * happen, attempting to flush WAL through that location would fail, with
     * disastrous system-wide consequences. To make sure that can't happen,
     * skip the flush if the buffer isn't permanent.
     */
    if (buf_state & BM_PERMANENT)
        XLogFlush(recptr);              /* <<< first make sure XLOG is flushed up to this position! */
    ...
    smgrwrite(reln,
              BufTagGetForkNum(&buf->tag),
              buf->tag.blockNum,
              bufToWrite,
              false);
    ...
}