Redis Usage and Source Code Analysis - 11. Redis Persistence - 2021-1-27

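Table of Contents

Preface
I. RDB Persistence
- 1. Introduction to RDB Persistence
- 2. Creating RDB Files
- 3. Loading RDB Files
- 4. RDB File Structure
- 5. The SAVE Function
II. AOF Persistence
- 1. Introduction to AOF Persistence
- 2. The AOF Buffer
- 3. Writing and Syncing the AOF File
- 4. AOF Rewrite
- 5. The AOF Write and Sync Function
Summary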

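Preface

As is well known, Redis is an in-memory key-value database: every database operation touches only memory, with no disk reads or writes, which makes it very fast. The resulting problem is how to keep data safe and reliable when the server process exits, whether normally or abnormally. This is where persistence comes in. Redis offers two kinds of persistence, RDB and AOF, and this article covers each in turn.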

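I. RDB Persistence

1. Introduction to RDB Persistence

The Redis server maintains multiple databases, each holding many key-value pairs. RDB persistence saves the state of all databases on the server into an RDB file, so that the server process can reload the file at startup to restore the data.

2. Creating RDB Files

Users can create an RDB file manually with either of the following two commands. SAVE blocks the server process until the RDB file has been created; BGSAVE forks a child process, and the child creates the RDB file.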

SAVE
BGSAVE
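
To make the difference between the two commands concrete, here is a minimal sketch of the fork-based pattern that BGSAVE relies on. It is not the Redis implementation: `background_save`, `wait_for_save`, and the stand-in `noop_save` are hypothetical names introduced only for illustration, and the sketch assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a real save routine: pretends to succeed. */
int noop_save(const char *filename) {
    (void)filename;
    return 0;
}

/* Run save_fn(filename) in a child process.
 * Returns the child's pid to the parent, or -1 if fork() failed.
 * The parent returns immediately and can keep serving requests. */
pid_t background_save(int (*save_fn)(const char *), const char *filename) {
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: works on a copy-on-write snapshot of the parent's memory. */
        int rc = save_fn(filename);
        _exit(rc == 0 ? 0 : 1);
    }
    return pid;
}

/* Reap the child and report whether the save succeeded (0) or not (-1). */
int wait_for_save(pid_t pid) {
    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A blocking SAVE corresponds to calling the save routine directly in the parent; the background variant trades extra memory during the snapshot for an unblocked server.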

The server can also generate RDB files automatically through the save configuration option. The default configuration is:

# at least 1 change to the database within 900 seconds
save 900 1
# at least 10 changes to the database within 300 seconds
save 300 10
# at least 10000 changes to the database within 60 seconds
save 60 10000

At startup, the server stores these settings in the saveparams member of the server structure redisServer, as shown below:

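// Save conditions of the server (conditions that trigger an automatic BGSAVE)
struct saveparam {

    // Within this many seconds
    time_t seconds;

    // At least this many changes occurred
    int changes;

};
struct redisServer {
    // ... (other members omitted)
    struct saveparam *saveparams;   /* Save points array for RDB */
    // Number of changes to the database since the last SAVE
    long long dirty;                /* Changes to DB from the last save */
    // Time of the last successful SAVE
    time_t lastsave;                /* Unix time of last successful save */
};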

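The server also records lastsave, the time of the last successful SAVE or BGSAVE, and dirty, the number of modifications (insertions, deletions, updates, and so on) made to the database since then. When the periodic function serverCron runs (every 100 ms by default), it compares the difference between the current time and lastsave, together with dirty, against each entry in saveparams. As soon as any one entry is satisfied, the server performs a BGSAVE.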
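
The check described above can be sketched as a small pure function. This is a simplified illustration, not the code in serverCron: `should_bgsave` is a hypothetical helper, and the struct is a reduced copy of the one shown earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Reduced copy of the save point structure discussed above. */
struct saveparam {
    time_t seconds;   /* length of the time window, in seconds */
    int changes;      /* minimum number of changes within the window */
};

/* Hypothetical helper: return 1 if any save point is satisfied, else 0.
 * A save point is satisfied when at least `changes` modifications have
 * accumulated and more than `seconds` seconds have passed since the
 * last successful save. */
int should_bgsave(const struct saveparam *params, size_t n,
                  long long dirty, time_t lastsave, time_t now) {
    assert(params != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (dirty >= params[i].changes &&
            now - lastsave > params[i].seconds) {
            return 1;
        }
    }
    return 0;
}
```

With the default save points, a single change is enough once more than 900 seconds have passed, while 10 changes trigger a save after a little over 300 seconds.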

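3. Loading RDB Files

At startup, the Redis server automatically checks whether an RDB file exists and, if so, loads it to restore the database state. Note that if AOF persistence is enabled, the server loads the AOF file instead and does not load the RDB file.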

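4. RDB File Structure

A complete RDB file consists of five parts, in order: REDIS, db_version, databases, EOF, and check_num. REDIS is a constant used at load time to check whether the file is an RDB file, db_version is the version number, databases holds the saved database state, EOF is the end marker, and check_num is the checksum.

The databases part holds the state of every non-empty database, one database section after another.

Each database section consists of SELECTDB, db_number, and key_value_pairs. SELECTDB is a constant marking the start of a database, db_number is the database number, and key_value_pairs holds the saved key-value pairs.

Each key-value pair in key_value_pairs is stored in one of two forms, depending on whether the key has an expiration time.

A key with an expiration time starts with the EXPIRETIME_MS constant, followed by the expiration time. Next comes type, which every key-value pair has and which indicates the type of the value, and finally the key and the value themselves. The layout of the value part depends on type and is not covered further here.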
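
As a small illustration of the leading REDIS and db_version parts, the sketch below validates the first nine bytes of a file: the five-byte magic string followed by a four-digit ASCII version, which is what rdbSave writes with its magic buffer. The function name `rdb_header_version` is an assumption made for this example and is not part of Redis.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse the 9-byte RDB header.
 * Returns the version number, or -1 if the buffer does not start with
 * "REDIS" followed by four ASCII digits. */
int rdb_header_version(const char *buf, size_t len) {
    char digits[5];

    if (buf == NULL || len < 9) return -1;
    if (memcmp(buf, "REDIS", 5) != 0) return -1;

    /* The version is stored as text, for example "0006". */
    for (int i = 5; i < 9; i++) {
        if (buf[i] < '0' || buf[i] > '9') return -1;
    }
    memcpy(digits, buf + 5, 4);
    digits[4] = '\0';
    return atoi(digits);
}
```

A loader would typically reject the file when this returns -1 or a version newer than it understands, before reading any database section.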

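5. The SAVE Function

The function that creates the RDB file, rdbSave, and the SAVE command handler that calls it are shown below: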

/* Save the DB on disk. Return REDIS_ERR on error, REDIS_OK on success */
int rdbSave(char *filename) {
    dictIterator *di = NULL;
    dictEntry *de;
    char tmpfile[256];
    char magic[10];
    int j;
    long long now = mstime();
    FILE *fp;
    rio rdb;
    uint64_t cksum;

    // Create the temporary file
    snprintf(tmpfile,256,"temp-%d.rdb", (int) getpid());
    fp = fopen(tmpfile,"w");
    if (!fp) {
        redisLog(REDIS_WARNING, "Failed opening .rdb for saving: %s",
            strerror(errno));
        return REDIS_ERR;
    }

    // Initialize I/O
    rioInitWithFile(&rdb,fp);

    // Set the checksum function
    if (server.rdb_checksum)
        rdb.update_cksum = rioGenericUpdateChecksum;

    // Write the magic string and the RDB version number
    snprintf(magic,sizeof(magic),"REDIS%04d",REDIS_RDB_VERSION);
    if (rdbWriteRaw(&rdb,magic,9) == -1) goto werr;

    // Iterate over all databases
    for (j = 0; j < server.dbnum; j++) {

        // Point to the database
        redisDb *db = server.db+j;

        // Point to the key space of the database
        dict *d = db->dict;

        // Skip empty databases
        if (dictSize(d) == 0) continue;

        // Create a key space iterator
        di = dictGetSafeIterator(d);
        if (!di) {
            fclose(fp);
            return REDIS_ERR;
        }

        /* Write the SELECT DB opcode */
        if (rdbSaveType(&rdb,REDIS_RDB_OPCODE_SELECTDB) == -1) goto werr;
        if (rdbSaveLen(&rdb,j) == -1) goto werr;

        /* Iterate this DB writing every entry */
        while((de = dictNext(di)) != NULL) {
            sds keystr = dictGetKey(de);
            robj key, *o = dictGetVal(de);
            long long expire;

            // Create a key object on the stack from keystr
            initStaticStringObject(key,keystr);

            // Get the expiration time of the key
            expire = getExpire(db,&key);

            // Save the key-value pair
            if (rdbSaveKeyValuePair(&rdb,&key,o,expire,now) == -1) goto werr;
        }
        dictReleaseIterator(di);
    }
    di = NULL; /* So that we don't release it again on error. */

    /* EOF opcode */
    if (rdbSaveType(&rdb,REDIS_RDB_OPCODE_EOF) == -1) goto werr;

    /* CRC64 checksum. It will be zero if checksum computation is disabled, the
     * loading code skips the check in this case. */
    cksum = rdb.cksum;
    memrev64ifbe(&cksum);
    rioWrite(&rdb,&cksum,8);

    /* Make sure data will not remain on the OS's output buffers */
    if (fflush(fp) == EOF) goto werr;
    if (fsync(fileno(fp)) == -1) goto werr;
    if (fclose(fp) == EOF) goto werr;

    /* Use RENAME to make sure the DB file is changed atomically only
     * if the generated DB file is ok. */
    if (rename(tmpfile,filename) == -1) {
        redisLog(REDIS_WARNING,"Error moving temp DB file on the final destination: %s", strerror(errno));
        unlink(tmpfile);
        return REDIS_ERR;
    }

    // Writing finished, log it
    redisLog(REDIS_NOTICE,"DB saved on disk");

    // Reset the dirty counter
    server.dirty = 0;

    // Record the time of the last successful SAVE
    server.lastsave = time(NULL);

    // Record the status of the last SAVE
    server.lastbgsave_status = REDIS_OK;

    return REDIS_OK;

werr:
    // Close the file
    fclose(fp);
    // Remove the temporary file
    unlink(tmpfile);

    redisLog(REDIS_WARNING,"Write error saving DB on disk: %s", strerror(errno));

    if (di) dictReleaseIterator(di);

    return REDIS_ERR;
}

void saveCommand(redisClient *c) {

    // A BGSAVE is already in progress, so SAVE cannot run now;
    // otherwise a race condition would occur
    if (server.rdb_child_pid != -1) {
        addReplyError(c,"Background save already in progress");
        return;
    }

    // Perform the save
    if (rdbSave(server.rdb_filename) == REDIS_OK) {
        addReply(c,shared.ok);
    } else {
        addReply(c,shared.err);
    }
}
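
The part of rdbSave that matters most for safety is the write-to-temporary-file-then-rename sequence. The sketch below isolates that pattern outside of Redis. It assumes a POSIX system; `atomic_write_file` and the temporary file name are hypothetical choices for this example only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes to `filename` so that a reader
 * sees either the old file or the complete new one, never a partial one.
 * Returns 0 on success, -1 on error. */
int atomic_write_file(const char *filename, const void *data, size_t len) {
    char tmpfile[256];
    FILE *fp;

    snprintf(tmpfile, sizeof(tmpfile), "temp-%d.tmp", (int)getpid());
    fp = fopen(tmpfile, "w");
    if (!fp) return -1;

    if (len > 0 && fwrite(data, 1, len, fp) != len) goto werr;
    if (fflush(fp) == EOF) goto werr;        /* stdio buffer -> kernel */
    if (fsync(fileno(fp)) == -1) goto werr;  /* kernel cache -> disk */
    if (fclose(fp) == EOF) { fp = NULL; goto werr; }
    fp = NULL;

    /* rename() replaces the destination atomically on POSIX systems. */
    if (rename(tmpfile, filename) == -1) {
        unlink(tmpfile);
        return -1;
    }
    return 0;

werr:
    if (fp) fclose(fp);   /* never close the same stream twice */
    unlink(tmpfile);
    return -1;
}
```

Because the destination is only touched by rename, a crash in the middle of the write leaves the previous file intact.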

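II. AOF Persistence

1. Introduction to AOF Persistence

AOF persistence records the database state by logging every write command the server executes. All write commands are saved in the AOF file, so that on restart the server only needs to load the AOF file and re-execute the commands in order to restore the database state.

2. The AOF Buffer

When AOF persistence is enabled, each time the server executes a write command it appends that command to the aof_buf buffer in the server structure redisServer: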

struct redisServer {
    // ... (other members omitted)
    // AOF buffer
    sds aof_buf;      /* AOF buffer, written before entering the event loop */
    // The fsync policy in use (every write / every second / never)
    int aof_fsync;                  /* Kind of fsync() policy */
};

For example, after the following command is executed:

redis> SET KEY VALUE
OK

the following content is appended to the end of the buffer:

*3\r\n$3\r\nSET\r\n$3\r\nKEY\r\n$5\r\nVALUE\r\n
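
The buffer content follows the Redis protocol format: an argument count prefixed with `*`, then each argument as a length prefixed with `$` followed by the bytes, every element terminated by `\r\n`. The sketch below reproduces that encoding. `resp_encode` is a hypothetical helper written for this article, and it assumes NUL-terminated arguments, so unlike the real buffer it is not binary-safe.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: encode argc string arguments as a multi-bulk
 * request into `out`. Returns the number of bytes written (excluding
 * the terminating '\0'), or -1 if `out` is too small. */
int resp_encode(char *out, size_t cap, int argc, const char **argv) {
    size_t used = 0;

    /* Header: "*<number of arguments>\r\n" */
    int n = snprintf(out, cap, "*%d\r\n", argc);
    if (n < 0 || (size_t)n >= cap) return -1;
    used += (size_t)n;

    /* Each argument: "$<length>\r\n<bytes>\r\n" */
    for (int i = 0; i < argc; i++) {
        n = snprintf(out + used, cap - used, "$%zu\r\n%s\r\n",
                     strlen(argv[i]), argv[i]);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Encoding the three arguments SET, KEY, and VALUE with this function yields exactly the 33-byte string shown above.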

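3. Writing and Syncing the AOF File

The Redis server runs in an event loop. In each iteration it processes a batch of client commands, which appends a series of commands to the AOF buffer. Before each iteration ends, the server uses the value of aof_fsync in redisServer to decide how to write and sync the AOF file.
a. aof_fsync=always: the contents of the AOF buffer are written to the AOF file, and fsync is called right away to sync the file to disk.
b. aof_fsync=everysec: the contents of the AOF buffer are written to the AOF file, and fsync is called once more than one second has passed since the last sync.
c. aof_fsync=no: the contents of the AOF buffer are written to the AOF file, and syncing to disk is left to the operating system.
The default policy is aof_fsync=everysec.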
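
The three policies can be summarized in a few lines of code. The sketch below is a simplification under stated assumptions: the enum and both function names are hypothetical, and the fsync is issued synchronously, whereas Redis hands the everysec fsync to a background thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical names for the three policies described above. */
enum fsync_policy { POLICY_NO, POLICY_EVERYSEC, POLICY_ALWAYS };

/* Return 1 if an fsync should follow the write, 0 otherwise. */
int should_fsync(enum fsync_policy policy, time_t now, time_t last_fsync) {
    switch (policy) {
    case POLICY_ALWAYS:   return 1;                 /* after every write */
    case POLICY_EVERYSEC: return now > last_fsync;  /* at most once a second */
    case POLICY_NO:       return 0;                 /* left to the OS */
    }
    return 0;
}

/* Append `len` bytes to `fd`, then sync according to `policy`.
 * Returns 0 on success, -1 on a failed or short write or a failed fsync.
 * `*last_fsync` is updated whenever an fsync is issued. */
int append_with_policy(int fd, const void *buf, size_t len,
                       enum fsync_policy policy, time_t now,
                       time_t *last_fsync) {
    ssize_t nwritten = write(fd, buf, len);
    if (nwritten < 0 || (size_t)nwritten != len) return -1;

    if (should_fsync(policy, now, *last_fsync)) {
        if (fsync(fd) == -1) return -1;
        *last_fsync = now;
    }
    return 0;
}
```

In all three cases the write itself happens immediately; the policies differ only in when the data is forced from the kernel cache to disk, which is the trade-off between durability and throughput.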

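4. AOF Rewrite

Because the AOF file records every write command the server executes, it keeps growing as the server runs. To shrink it, the AOF file has to be rewritten. A rewrite does not read the existing AOF file. Instead, it reads the current database state, records that state as the commands needed to recreate each key, and produces a new AOF file that replaces the old one. Suppose a client executes the following commands: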

rpush name wyl
rpush name sjx
rpush name wyq
lpop name
lpop name

The AOF file now has five new commands, while the list name in the database holds a single element, wyq. With an AOF rewrite, one command is enough:

rpush name wyq

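This greatly reduces the size of the AOF file. Because a rewrite involves a large amount of writing, the server forks a child process to do it, so that the server process can keep serving clients and its performance is not affected. Write commands executed during the rewrite are additionally saved in a separate AOF rewrite buffer. When the rewrite finishes, the main server process appends the contents of the rewrite buffer to the new AOF file so that no write is lost. Finally, the new AOF file replaces the old one, which completes the compaction.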
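
The core idea of the rewrite, deriving commands from the current state rather than from history, can be sketched for a single list key. This is an illustration only: `rewrite_list_as_rpush` is a hypothetical function, and it emits a plain text command for readability, whereas the real AOF stores commands in the protocol format shown in the buffer section.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "RPUSH <key> <item1> <item2> ..." from the
 * current contents of a list. Returns the number of bytes written
 * (excluding the terminating '\0'), 0 for an empty list (nothing needs
 * to be recorded), or -1 if `out` is too small. */
int rewrite_list_as_rpush(char *out, size_t cap, const char *key,
                          const char **items, size_t count) {
    size_t used;
    int n;

    if (cap == 0) return -1;
    out[0] = '\0';
    if (count == 0) return 0;

    n = snprintf(out, cap, "RPUSH %s", key);
    if (n < 0 || (size_t)n >= cap) return -1;
    used = (size_t)n;

    /* One command carries every surviving element, in list order. */
    for (size_t i = 0; i < count; i++) {
        n = snprintf(out + used, cap - used, " %s", items[i]);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

However many pushes and pops produced the list, the output depends only on what the list contains now, which is why the rewritten file is smaller.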

5. The AOF Write and Sync Function

The function that writes and syncs the AOF file is shown below:

#define AOF_WRITE_LOG_ERROR_RATE 30 /* Seconds between errors logging. */
void flushAppendOnlyFile(int force) {
    ssize_t nwritten;
    int sync_in_progress = 0;

    // Nothing in the buffer, return immediately
    if (sdslen(server.aof_buf) == 0) return;

    // The policy is fsync every second
    if (server.aof_fsync == AOF_FSYNC_EVERYSEC)
        // Is there an fsync still running in the background?
        sync_in_progress = bioPendingJobsOfType(REDIS_BIO_AOF_FSYNC) != 0;

    // fsync every second, and the write is not forced
    if (server.aof_fsync == AOF_FSYNC_EVERYSEC && !force) {

        /* With this append fsync policy we do background fsyncing.
         * If the fsync is still in progress we can try to delay
         * the write for a couple of seconds.
         * (A forced write would block the main thread on write().) */
        if (sync_in_progress) {

            // An fsync is still running in the background

            if (server.aof_flush_postponed_start == 0) {
                /* No previous write postponing, remember that we are
                 * postponing the flush and return. */
                server.aof_flush_postponed_start = server.unixtime;
                return;

            } else if (server.unixtime - server.aof_flush_postponed_start < 2) {
                /* We were already waiting for fsync to finish, but for less
                 * than two seconds this is still ok. Postpone again. */
                return;

            }

            /* Otherwise fall through, and go write since we can't wait
             * over two seconds. The write may block. */
            server.aof_delayed_fsync++;
            redisLog(REDIS_NOTICE,"Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.");
        }
    }

    /* If you are following this code path, then we are going to write so
     * reset the postponed flush sentinel to zero. */
    server.aof_flush_postponed_start = 0;

    /* We want to perform a single write. This should be guaranteed atomic
     * at least if the filesystem we are writing is a real physical one.
     * While this will save us against the server being killed I don't think
     * there is much to do about the whole server stopping for power problems
     * or alike. In that case the AOF file may be damaged and has to be
     * repaired with the redis-check-aof program. */
    nwritten = write(server.aof_fd,server.aof_buf,sdslen(server.aof_buf));
    if (nwritten != (signed)sdslen(server.aof_buf)) {

        static time_t last_write_error_log = 0;
        int can_log = 0;

        /* Limit logging rate to 1 line per AOF_WRITE_LOG_ERROR_RATE seconds. */
        if ((server.unixtime - last_write_error_log) > AOF_WRITE_LOG_ERROR_RATE) {
            can_log = 1;
            last_write_error_log = server.unixtime;
        }

        /* Log the AOF write error and record the error code. */
        if (nwritten == -1) {
            if (can_log) {
                redisLog(REDIS_WARNING,"Error writing to the AOF file: %s",
                    strerror(errno));
                server.aof_last_write_errno = errno;
            }
        } else {
            if (can_log) {
                redisLog(REDIS_WARNING,"Short write while writing to "
                                       "the AOF file: (nwritten=%lld, "
                                       "expected=%lld)",
                                       (long long)nwritten,
                                       (long long)sdslen(server.aof_buf));
            }

            // Try to remove the incomplete data that was just appended
            if (ftruncate(server.aof_fd, server.aof_current_size) == -1) {
                if (can_log) {
                    redisLog(REDIS_WARNING, "Could not remove short write "
                             "from the append-only file.  Redis may refuse "
                             "to load the AOF the next time it starts.  "
                             "ftruncate: %s", strerror(errno));
                }
            } else {
                /* If the ftruncate() succeeded we can set nwritten to
                 * -1 since there is no longer partial data into the AOF. */
                nwritten = -1;
            }
            server.aof_last_write_errno = ENOSPC;
        }

        /* Handle the AOF write error. */
        if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
            /* We can't recover when the fsync policy is ALWAYS since the
             * reply for the client is already in the output buffers, and we
             * have the contract with the user that on acknowledged write data
             * is synched on disk. */
            redisLog(REDIS_WARNING,"Can't recover from AOF write error when the AOF fsync policy is 'always'. Exiting...");
            exit(1);
        } else {
            /* Recover from failed write leaving data into the buffer. However
             * set an error to stop accepting writes as long as the error
             * condition is not cleared. */
            server.aof_last_write_status = REDIS_ERR;

            /* Trim the sds buffer if there was a partial write, and there
             * was no way to undo it with ftruncate(2). */
            if (nwritten > 0) {
                server.aof_current_size += nwritten;
                sdsrange(server.aof_buf,nwritten,-1);
            }
            return; /* We'll try again on the next call... */
        }
    } else {
        /* Successful write(2). If AOF was in error state, restore the
         * OK state and log the event. */
        if (server.aof_last_write_status == REDIS_ERR) {
            redisLog(REDIS_WARNING,
                "AOF write error looks solved, Redis can write again.");
            server.aof_last_write_status = REDIS_OK;
        }
    }

    // Update the AOF file size after the write
    server.aof_current_size += nwritten;

    /* Re-use AOF buffer when it is small enough. The maximum comes from the
     * arena size of 4k minus some overhead (but is otherwise arbitrary). */
    if ((sdslen(server.aof_buf)+sdsavail(server.aof_buf)) < 4000) {
        // Clear the buffer contents and keep it for reuse
        sdsclear(server.aof_buf);
    } else {
        // Free the buffer and allocate a new empty one
        sdsfree(server.aof_buf);
        server.aof_buf = sdsempty();
    }

    /* Don't fsync if no-appendfsync-on-rewrite is set to yes and there are
     * children doing I/O in the background (a BGSAVE or BGREWRITEAOF). */
    if (server.aof_no_fsync_on_rewrite &&
        (server.aof_child_pid != -1 || server.rdb_child_pid != -1))
            return;

    /* Perform the fsync if needed. */

    // Always fsync
    if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
        /* aof_fsync is defined as fdatasync() for Linux in order to avoid
         * flushing metadata. */
        aof_fsync(server.aof_fd); /* Let's try to get this data on the disk */

        // Update the time of the last fsync
        server.aof_last_fsync = server.unixtime;

    // The policy is fsync every second, and at least one second has
    // passed since the last fsync
    } else if ((server.aof_fsync == AOF_FSYNC_EVERYSEC &&
                server.unixtime > server.aof_last_fsync)) {
        // Run it in the background
        if (!sync_in_progress) aof_background_fsync(server.aof_fd);
        // Update the time of the last fsync
        server.aof_last_fsync = server.unixtime;
    }

    // Both branches above update the fsync time,
    // so the assignment could be moved down here:
    // server.aof_last_fsync = server.unixtime;
}

Summary

This article gave a brief introduction to the two common persistence mechanisms in Redis. Corrections and suggestions are welcome.
