PostgreSQL writes data through two layers of buffering: shared_buffers and the OS page cache.
OS dirty-page flushing
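sysctl -a|grep dirty
[Background, asynchronous]
vm.dirty_background_bytes = 409600000 # similar to PostgreSQL's bgwriter: flushed by a background process, not by user processes
vm.dirty_background_ratio = 0
[Foreground, blocking]
vm.dirty_bytes = 0 # similar to a PostgreSQL server process flushing dirty buffers itself: user processes take part, so their response time (RT) rises
vm.dirty_ratio = 95
vm.dirty_writeback_centisecs = 100 # wake-up interval of the background flusher
vm.dirty_expire_centisecs = 3000 # only dirty pages that have stayed in the page cache longer than this are written to disk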
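To see which of the paired settings above is actually in force, here is a minimal sketch. It assumes the kernel rule that a nonzero `*_bytes` value takes precedence over its `*_ratio` counterpart, and it approximates the base for the ratio with a caller-supplied `dirtyable_bytes` (the kernel uses free plus reclaimable memory, not total RAM). The function name and the 64 GiB figure are illustrative, not part of the original notes.

```python
def dirty_thresholds(settings, dirtyable_bytes):
    """Return (background_bytes, foreground_bytes) thresholds.

    Assumption: a nonzero *_bytes setting overrides the matching *_ratio;
    otherwise the ratio is a percentage of dirtyable memory.
    """
    def pick(bytes_key, ratio_key):
        if settings.get(bytes_key, 0) > 0:
            return settings[bytes_key]
        return dirtyable_bytes * settings.get(ratio_key, 0) // 100

    background = pick("vm.dirty_background_bytes", "vm.dirty_background_ratio")
    foreground = pick("vm.dirty_bytes", "vm.dirty_ratio")
    return background, foreground


# Values from the sysctl output above; 64 GiB of dirtyable memory is hypothetical.
settings = {
    "vm.dirty_background_bytes": 409600000,
    "vm.dirty_background_ratio": 0,
    "vm.dirty_bytes": 0,
    "vm.dirty_ratio": 95,
}
background, foreground = dirty_thresholds(settings, 64 * 1024**3)
print("background flush starts at", background, "bytes")
print("writers are throttled at", foreground, "bytes")
```

With these settings the background flusher starts at roughly 400 MB of dirty data, while user processes are only pulled into flushing once dirty data reaches 95% of dirtyable memory.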
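PostgreSQL dirty-buffer flushing
Method 1: bgwriter. Flushing happens in the background and does not disturb user sessions, but globally the same page may be written out several times.
Method 2: checkpoint. Flushing is concentrated and far more disruptive, with a heavy impact on QPS, but globally a page can absorb many modifications before it is written once, so fewer writes are issued.
Method 1: background-writer parameters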
bgwriter_delay
Specifies the delay between activity rounds for the background writer. In each round the writer issues writes for some number of dirty buffers (controllable by the following parameters). It then sleeps for the length of bgwriter_delay, and repeats. When there are no dirty buffers in the buffer pool, though, it goes into a longer sleep regardless of bgwriter_delay. If this value is specified without units, it is taken as milliseconds. The default value is 200 milliseconds (200ms). Note that on many systems, the effective resolution of sleep delays is 10 milliseconds; setting bgwriter_delay to a value that is not a multiple of 10 might have the same results as setting it to the next higher multiple of 10. This parameter can only be set in the postgresql.conf file or on the server command line.
In short: this is how long bgwriter rests after each round of work, so that it does not compete with user processes. On many systems the effective value is rounded up to the next multiple of 10ms.
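The rounding behaviour described above can be sketched as follows. This is only an illustration of the arithmetic, assuming a 10 ms sleep resolution as the documentation says is common; `effective_delay_ms` is a hypothetical helper, not a PostgreSQL function.

```python
def effective_delay_ms(bgwriter_delay_ms, resolution_ms=10):
    """Round a requested delay up to the next multiple of the sleep resolution."""
    # Integer ceiling division avoids floating-point surprises.
    return -(-bgwriter_delay_ms // resolution_ms) * resolution_ms


for requested in (10, 15, 200, 201):
    print(requested, "ms requested ->", effective_delay_ms(requested), "ms effective")
```

On a system with finer timer resolution the requested value may be honoured as-is, so treat the result as a worst case.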
bgwriter_flush_after
Whenever more than this amount of data has been written by the background writer, attempt to force the OS to issue these writes to the underlying storage. Doing so will limit the amount of dirty data in the kernel’s page cache, reducing the likelihood of stalls when an fsync is issued at the end of a checkpoint, or when the OS writes data back in larger batches in the background. Often that will result in greatly reduced transaction latency, but there also are some cases, especially with workloads that are bigger than shared_buffers, but smaller than the OS’s page cache, where performance might degrade. This setting may have no effect on some platforms. If this value is specified without units, it is taken as blocks, that is BLCKSZ bytes, typically 8kB. The valid range is between 0, which disables forced writeback, and 2MB. The default is 512kB on Linux, 0 elsewhere. (If BLCKSZ is not 8kB, the default and maximum values scale proportionally to it.) This parameter can only be set in the postgresql.conf file or on the server command line.
In short: once bgwriter has written this much data, it forces the OS to flush it. Left alone, the OS may accumulate a large amount of dirty data before writing to disk, and the resulting burst of write I/O competes with user I/O.
Benefit: by prompting the OS to write dirty pages every so often, bgwriter keeps the OS from piling up dirty pages before it flushes, which avoids large write bursts that compete with user I/O. The OS has its own writeback policy, however. As the sysctl settings above show, setting vm.dirty_background_bytes to a small value also avoids large bursts.
Drawbacks: forcing the OS to write frequently has a cost. If a page in shared buffers becomes dirty several times within one OS writeback window, and bgwriter writes it out and forces a flush each time, the same physical storage region is written repeatedly. Likewise, adjacent pages written out within one window cannot be merged into a single disk write because bgwriter keeps forcing flushes, which also increases disk I/O.
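To get a feel for how often bgwriter_flush_after fires, the sketch below converts the setting into blocks and estimates the number of forced-writeback requests in one round. It is a simplified model, assuming one request per `flush_after` worth of blocks written and the typical 8kB block size; the real bookkeeping inside PostgreSQL is more involved, and `flush_requests_per_round` is a hypothetical helper.

```python
BLCKSZ = 8192  # typical PostgreSQL block size in bytes (assumes a default build)


def flush_requests_per_round(pages_written, flush_after_bytes, block_size=BLCKSZ):
    """Estimate forced-writeback requests issued in one bgwriter round.

    Simplified model: one request each time `flush_after` worth of blocks
    has been written. A setting of 0 disables forced writeback.
    """
    if flush_after_bytes == 0:
        return 0
    blocks_per_flush = max(1, flush_after_bytes // block_size)
    return pages_written // blocks_per_flush


# 512kB is the Linux default quoted above; 100 and 1000 pages per round are
# the default and the recommended bgwriter_lru_maxpages from these notes.
print(flush_requests_per_round(100, 512 * 1024))
print(flush_requests_per_round(1000, 512 * 1024))
```

Under this model 512kB corresponds to 64 blocks, so a round that writes many pages triggers many flush requests, which is where the repeated and unmerged I/O described above comes from.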
bgwriter_lru_maxpages
In each round, no more than this many buffers will be written by the background writer. Setting this to zero disables background writing. (Note that checkpoints, which are managed by a separate, dedicated auxiliary process, are unaffected.) The default value is 100 buffers. This parameter can only be set in the postgresql.conf file or on the server command line.
In short: the maximum number of dirty pages bgwriter writes out in one round. Setting it to 0 disables background writing.
bgwriter_lru_multiplier
The number of dirty buffers written in each round is based on the number of new buffers that have been needed by server processes during recent rounds. The average recent need is multiplied by bgwriter_lru_multiplier to arrive at an estimate of the number of buffers that will be needed during the next round. Dirty buffers are written until there are that many clean, reusable buffers available. (However, no more than bgwriter_lru_maxpages buffers will be written per round.) Thus, a setting of 1.0 represents a “just in time” policy of writing exactly the number of buffers predicted to be needed. Larger values provide some cushion against spikes in demand, while smaller values intentionally leave writes to be done by server processes. The default is 2.0. This parameter can only be set in the postgresql.conf file or on the server command line.
In short: how many dirty pages should bgwriter write out in one round? Equivalently, how many clean, reusable buffer pages should shared buffers always hold?
Target = (new buffer pages requested by server processes in the previous round, or the average over several recent rounds) * bgwriter_lru_multiplier
For example, if server processes requested 100 new buffer pages in the previous round (or on average over recent rounds) and bgwriter_lru_multiplier = 2.0, then shared buffers should hold at least 100 * 2.0 = 200 clean, reusable buffer pages.
If shared buffers already hold 200 or more clean, reusable buffer pages, bgwriter does not need to write anything this round. If reaching 200 clean, reusable pages would require writing out 90 dirty pages but bgwriter_lru_maxpages is set to 50, bgwriter writes only 50 dirty pages this round.
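The per-round calculation described above can be sketched as a small function. This is an illustration of the arithmetic only, not PostgreSQL's actual implementation (which works from smoothed estimates while scanning the buffer pool); `pages_to_write` and its parameter names are hypothetical, and the figure of 110 clean pages is inferred from the example's shortfall of 90.

```python
def pages_to_write(recent_alloc, lru_multiplier, clean_reusable, lru_maxpages):
    """Estimate how many dirty pages bgwriter writes in one round.

    recent_alloc   -- new buffers requested by server processes recently
    lru_multiplier -- bgwriter_lru_multiplier
    clean_reusable -- clean, reusable buffers already available
    lru_maxpages   -- bgwriter_lru_maxpages (0 disables background writing)
    """
    if lru_maxpages == 0:
        return 0
    target_clean = int(recent_alloc * lru_multiplier)
    shortfall = max(0, target_clean - clean_reusable)
    return min(shortfall, lru_maxpages)


# Target is 100 * 2.0 = 200 clean pages in both cases.
print(pages_to_write(100, 2.0, clean_reusable=200, lru_maxpages=50))  # already enough clean pages
print(pages_to_write(100, 2.0, clean_reusable=110, lru_maxpages=50))  # shortfall of 90, capped by maxpages
```

Whatever the cap leaves unwritten falls to server processes, which is why a low bgwriter_lru_maxpages can push write latency onto user sessions.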
Recommended values
PostgreSQL (postgresql.conf)
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 10.0
bgwriter_flush_after = 512kB
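To compare these values with the defaults quoted earlier (200ms delay, 100 pages per round), the sketch below computes an upper bound on bgwriter write throughput. It assumes 8kB blocks and ignores the time the writes themselves take, so real throughput is lower; `max_bgwriter_bytes_per_sec` is a hypothetical helper.

```python
def max_bgwriter_bytes_per_sec(delay_ms, lru_maxpages, block_size=8192):
    """Upper bound on bytes written per second by bgwriter.

    Assumes every round writes the full bgwriter_lru_maxpages and that the
    writes take no time, so only the sleep between rounds limits the rate.
    """
    rounds_per_sec = 1000 / delay_ms
    return lru_maxpages * block_size * rounds_per_sec


MIB = 1024 * 1024
default_rate = max_bgwriter_bytes_per_sec(delay_ms=200, lru_maxpages=100)
recommended_rate = max_bgwriter_bytes_per_sec(delay_ms=10, lru_maxpages=1000)
print(f"default:     {default_rate / MIB:.2f} MiB/s")
print(f"recommended: {recommended_rate / MIB:.2f} MiB/s")
```

The defaults cap bgwriter at a few MiB/s, while the recommended settings raise the ceiling by a factor of 200, leaving far fewer dirty buffers for server processes and checkpoints to write.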
OS (sysctl)
vm.dirty_background_bytes = 409600000
vm.dirty_background_ratio = 0
vm.dirty_bytes = 0
vm.dirty_ratio = 95
vm.dirty_expire_centisecs = 3000
vm.dirty_writeback_centisecs = 100
vm.dirtytime_expire_seconds = 43200
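A quick way to check a host against the OS values above is to parse the output of `sysctl -a | grep dirty` and list the differences. The sketch below does this on a text sample; the sample values are hypothetical stand-ins for a host that has not been tuned, and `parse_sysctl` and `drift` are illustrative helper names.

```python
RECOMMENDED = {
    "vm.dirty_background_bytes": 409600000,
    "vm.dirty_background_ratio": 0,
    "vm.dirty_bytes": 0,
    "vm.dirty_ratio": 95,
    "vm.dirty_expire_centisecs": 3000,
    "vm.dirty_writeback_centisecs": 100,
    "vm.dirtytime_expire_seconds": 43200,
}


def parse_sysctl(text):
    """Parse `key = value` lines (as printed by `sysctl -a`) into a dict of ints."""
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        try:
            values[key.strip()] = int(value.strip())
        except ValueError:
            continue  # skip settings that are not plain integers
    return values


def drift(current, recommended=RECOMMENDED):
    """Return {key: (current_value, recommended_value)} for every mismatch."""
    return {
        key: (current.get(key), want)
        for key, want in recommended.items()
        if current.get(key) != want
    }


# Hypothetical output from an untuned host.
sample = """vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_ratio = 20
vm.dirty_expire_centisecs = 3000
vm.dirty_writeback_centisecs = 500
vm.dirtytime_expire_seconds = 43200
"""

for key, (have, want) in drift(parse_sysctl(sample)).items():
    print(f"{key}: current={have} recommended={want}")
```

On a real host, the `sample` string would be replaced with the captured command output.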