一、刷盘机制
1 刷盘时机
RocketMQ消息存储有了顺序写和内存映射的加持,写入性能得到了极大保证。
内存映射是在内核中维护用户空间虚拟地址与文件偏移的映射关系,可以让用户态向操作数组一样读写文件,当对应页数据未读入内存时就会触发缺页中断,再由CPU响应中断根据映射关系读取文件中指定位置的数据并添加用户页表项。
同步刷盘又称为组提交,RocketMQ的GroupCommitService服务每次收集10ms内的写请求,刷盘一次CommitLog文件。优点是能够保证消息不丢失,但是效率偏低。
异步刷盘将消息写入到直接内存后就响应客户端,不会立刻刷盘,而是由异步线程每隔500ms执行FileChannel.forch()刷盘。
2 读写分离
RocketMQ读写请求都会到达页缓存,容易出现Broker busy异常。为了降低页缓存压力,引入了transientStorePoolEnable机制,即内存级别的读写分离机制。
RocketMQ先将消息写入到堆外并立即返回响应生产端,然后异步将堆外的消息提交到页缓存,再异步刷盘。该机制最大优势是实现了批量化消息写入,缺点是消息会丢失。消息写入时写入堆外内存,消息读取时从页缓存读取,读写分离减轻页缓存压力。
二、同步刷盘
同步刷盘采用组提交机制GroupCommitService,每次发送线程将消息写入到mmapedFile后,创建一个刷盘请求GroupCommitRequest,添加到requestsWrite。
代码语言:java复制/**
* 写队列,发送线程会将消息写入写队列,如果同步刷盘会阻塞发送线程,在刷盘后被组提交线程唤醒
*/
private volatile LinkedList<GroupCommitRequest> requestsWrite = new LinkedList<GroupCommitRequest>();
/**
* 读队列,组提交线程每隔10ms交换读写队列并将读队列中消息刷盘
*/
private volatile LinkedList<GroupCommitRequest> requestsRead = new LinkedList<GroupCommitRequest>();
private final PutMessageSpinLock lock = new PutMessageSpinLock();
GroupCommitService每次阻塞10ms然后swapRequests,将读写队列交换一次,刷盘时不影响发送线程写入提交请求。
代码语言:java复制public void run() {
CommitLog.log.info(this.getServiceName() " service started");
while (!this.isStopped()) {
try {
this.waitForRunning(10);
this.doCommit();
} catch (Exception e) {
CommitLog.log.warn(this.getServiceName() " service has exception. ", e);
}
}
//结束代码,最后一次提交
}
protected void waitForRunning(long interval) {
if (hasNotified.compareAndSet(true, false)) {
this.onWaitEnd();
return;
}
//entry to wait
waitPoint.reset();
try {
waitPoint.await(interval, TimeUnit.MILLISECONDS);
} catch (InterruptedException e) {
log.error("Interrupted", e);
} finally {
hasNotified.set(false);
this.onWaitEnd();
}
}
@Override
protected void onWaitEnd() {
this.swapRequests();
}
private void swapRequests() {
lock.lock();
try {
LinkedList<GroupCommitRequest> tmp = this.requestsWrite;
this.requestsWrite = this.requestsRead;
this.requestsRead = tmp;
} finally {
lock.unlock();
}
}
然后doCommit处理读队列中GroupCommitRequest进行刷盘,最后唤醒发送线程响应生产者。
代码语言:java复制private void doCommit() {
if (!this.requestsRead.isEmpty()) {
for (GroupCommitRequest req : this.requestsRead) {
// There may be a message in the next file, so a maximum of
// two times the flush
boolean flushOK = CommitLog.this.mappedFileQueue.getFlushedWhere() >= req.getNextOffset();
for (int i = 0; i < 2 && !flushOK; i ) {
//0表示只要有消息就刷盘,异步刷盘默认至少16KB才会刷盘
CommitLog.this.mappedFileQueue.flush(0);
flushOK = CommitLog.this.mappedFileQueue.getFlushedWhere() >= req.getNextOffset();
}
//刷盘之后唤醒发送线程响应客户端
req.wakeupCustomer(flushOK ? PutMessageStatus.PUT_OK : PutMessageStatus.FLUSH_DISK_TIMEOUT);
}
long storeTimestamp = CommitLog.this.mappedFileQueue.getStoreTimestamp();
if (storeTimestamp > 0) {
CommitLog.this.defaultMessageStore.getStoreCheckpoint().setPhysicMsgTimestamp(storeTimestamp);
}
this.requestsRead = new LinkedList<>();
} else {
// Because of individual messages is set to not sync flush, it
// will come to this process
CommitLog.this.mappedFileQueue.flush(0);
}
}
三、异步刷盘
如果非同步刷盘且开启了transientStorePoolEnable的话,writeBuffer就是transientStorePool的buffer,向mappedFile写入消息时并非向commitLog文件写入,而是向transientStorePool的buffer写入。
代码语言:java复制public CompletableFuture<PutMessageStatus> submitFlushRequest(AppendMessageResult result, MessageExt messageExt) {
// Synchronization flush
if (FlushDiskType.SYNC_FLUSH == this.defaultMessageStore.getMessageStoreConfig().getFlushDiskType()) {
final GroupCommitService service = (GroupCommitService) this.flushCommitLogService;
if (messageExt.isWaitStoreMsgOK()) {
GroupCommitRequest request = new GroupCommitRequest(result.getWroteOffset() result.getWroteBytes(),
this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
flushDiskWatcher.add(request);
service.putRequest(request);
return request.future();
} else {
service.wakeup();
return CompletableFuture.completedFuture(PutMessageStatus.PUT_OK);
}
}
// Asynchronous flush
else {
if (!this.defaultMessageStore.getMessageStoreConfig().isTransientStorePoolEnable()) {
flushCommitLogService.wakeup();
} else {
commitLogService.wakeup();
}
return CompletableFuture.completedFuture(PutMessageStatus.PUT_OK);
}
}
public void init(final String fileName, final int fileSize,
final TransientStorePool transientStorePool) throws IOException {
init(fileName, fileSize);
this.writeBuffer = transientStorePool.borrowBuffer();//直接内存
this.transientStorePool = transientStorePool;
}
每次将消息先写入直接内存writeBuffer,再由CommitRealTimeService每隔200ms将writeBuffer中数据写入到fileChannel中。
代码语言:java复制public void run() {
CommitLog.log.info(this.getServiceName() " service started");
while (!this.isStopped()) {
int interval = CommitLog.this.defaultMessageStore.getMessageStoreConfig().getCommitIntervalCommitLog();
//默认是4页才会刷盘,如果达到时间间隔200ms那么只要有数据就写入页缓存
int commitDataLeastPages = CommitLog.this.defaultMessageStore.getMessageStoreConfig().getCommitCommitLogLeastPages();
int commitDataThoroughInterval =
CommitLog.this.defaultMessageStore.getMessageStoreConfig().getCommitCommitLogThoroughInterval();
long begin = System.currentTimeMillis();
if (begin >= (this.lastCommitTimestamp commitDataThoroughInterval)) {
this.lastCommitTimestamp = begin;
commitDataLeastPages = 0;
}
try {
//将writeBuffer中消息刷入fileChannel
boolean result = CommitLog.this.mappedFileQueue.commit(commitDataLeastPages);
long end = System.currentTimeMillis();
if (!result) {
this.lastCommitTimestamp = end; // result = false means some data committed.
//now wake up flush thread.
flushCommitLogService.wakeup();
}
if (end - begin > 500) {
log.info("Commit data to file costs {} ms", end - begin);
}
this.waitForRunning(interval);
} catch (Throwable e) {
CommitLog.log.error(this.getServiceName() " service has exception. ", e);
}
}
//...
}
protected void commit0() {
int writePos = this.wrotePosition.get();
int lastCommittedPosition = this.committedPosition.get();
if (writePos - lastCommittedPosition > 0) {
try {
ByteBuffer byteBuffer = writeBuffer.slice();
byteBuffer.position(lastCommittedPosition);
byteBuffer.limit(writePos);
this.fileChannel.position(lastCommittedPosition);
//将writeBuffer写入到fileChannel中
this.fileChannel.write(byteBuffer);
this.committedPosition.set(writePos);
} catch (Throwable e) {
log.error("Error occurred when commit data to FileChannel.", e);
}
}
}
最后由FlushRealTimeService每隔500ms进行刷盘。
代码语言:java复制public void run() {
CommitLog.log.info(this.getServiceName() " service started");
while (!this.isStopped()) {
boolean flushCommitLogTimed = CommitLog.this.defaultMessageStore.getMessageStoreConfig().isFlushCommitLogTimed();
int interval = CommitLog.this.defaultMessageStore.getMessageStoreConfig().getFlushIntervalCommitLog();
//默认4页刷盘
int flushPhysicQueueLeastPages = CommitLog.this.defaultMessageStore.getMessageStoreConfig().getFlushCommitLogLeastPages();
int flushPhysicQueueThoroughInterval =
CommitLog.this.defaultMessageStore.getMessageStoreConfig().getFlushCommitLogThoroughInterval();
boolean printFlushProgress = false;
// Print flush progress
long currentTimeMillis = System.currentTimeMillis();
//时间达到500ms时只要有消息也会刷盘
if (currentTimeMillis >= (this.lastFlushTimestamp flushPhysicQueueThoroughInterval)) {
this.lastFlushTimestamp = currentTimeMillis;
flushPhysicQueueLeastPages = 0;
printFlushProgress = (printTimes % 10) == 0;
}
try {
//阻塞500ms
if (flushCommitLogTimed) {
Thread.sleep(interval);
} else {
this.waitForRunning(interval);
}
if (printFlushProgress) {
this.printFlushProgress();
}
long begin = System.currentTimeMillis();
CommitLog.this.mappedFileQueue.flush(flushPhysicQueueLeastPages);
long storeTimestamp = CommitLog.this.mappedFileQueue.getStoreTimestamp();
if (storeTimestamp > 0) {
CommitLog.this.defaultMessageStore.getStoreCheckpoint().setPhysicMsgTimestamp(storeTimestamp);
}
long past = System.currentTimeMillis() - begin;
if (past > 500) {
log.info("Flush data to disk costs {} ms", past);
}
} catch (Throwable e) {
CommitLog.log.warn(this.getServiceName() " service has exception. ", e);
this.printFlushProgress();
}
}
// Normal shutdown, to ensure that all the flush before exit
boolean result = false;
for (int i = 0; i < RETRY_TIMES_OVER && !result; i ) {
result = CommitLog.this.mappedFileQueue.flush(0);
CommitLog.log.info(this.getServiceName() " service shutdown, retry " (i 1) " times " (result ? "OK" : "Not OK"));
}
this.printFlushProgress();
CommitLog.log.info(this.getServiceName() " service end");
}
public int flush(final int flushLeastPages) {
if (this.isAbleToFlush(flushLeastPages)) {
if (this.hold()) {
int value = getReadPosition();
try {
//We only append data to fileChannel or mappedByteBuffer, never both.
if (writeBuffer != null || this.fileChannel.position() != 0) {
//刷盘
this.fileChannel.force(false);
} else {
this.mappedByteBuffer.force();
}
} catch (Throwable e) {
log.error("Error occurred when force data to disk.", e);
}
this.flushedPosition.set(value);
this.release();
} else {
log.warn("in flush, hold failed, flush offset = " this.flushedPosition.get());
this.flushedPosition.set(getReadPosition());
}
}
return this.getFlushedPosition();
}
四、小结
同步刷盘可确保消息能够持久化,但是每隔10ms刷盘一次,性能不如异步刷盘。异步刷盘消息会先写入直接内存,再由异步线程每隔500ms将消息从直接内存写入到磁盘,性能好,而且页缓存压力小,但是丢失500ms的数据,不可靠。两种机制各有优缺点,需要根据业务场景来设置参数。