In production, if the cached data set is small (around 5 GB), one master with several slaves is enough to scale read throughput, with Sentinel for high availability. When the data set is large, Redis Cluster is the usual choice.
Keep the data on each Redis machine modest (not over 10 GB): the more data in memory, the longer fork() takes when creating a child process for persistence, and Redis may pause serving requests during the fork.
If any slots in a Redis Cluster become unavailable, the whole cluster goes down. So every master must have at least one slave; otherwise a single master failure takes down the entire cluster.
Installation
mkdir -p /etc/redis-cluster //cluster config files, maintained by the cluster itself
mkdir -p /var/log/redis //redis log files
mkdir -p /var/redis/7001 //redis persistence files
Edit each node's configuration file:
port 7001
cluster-enabled yes
cluster-config-file /etc/redis-cluster/node-7001.conf
cluster-node-timeout 15000
daemonize yes
pidfile /var/run/redis_7001.pid
dir /var/redis/7001
logfile /var/log/redis/7001.log
bind 192.168.31.187
A cluster needs at least 3 master nodes; give each master one slave, so pick 6 nodes and start 6 instances.
Configure startup scripts the same way as the earlier production setup and start all 6 Redis instances.
redis-trib.rb is used to manage the cluster, and it is a Ruby script, so install Ruby:
yum install -y ruby
yum install -y rubygems
gem install redis
gem install redis fails, complaining that Ruby 2.2.2 or newer is required,
so install RVM to upgrade Ruby.
//RVM cannot be downloaded directly on this CentOS box
Download the RVM tarball on a Windows machine and upload it to the server.
cd into the extracted directory.
Run ./install --auto-dotfiles
source /etc/profile.d/rvm.sh
rvm list known //list the known Ruby versions
rvm install 2.4.6 //install Ruby
ruby -v //check the version
cp /usr/local/redis-3.2.8/src/redis-trib.rb /usr/local/bin
redis-trib.rb create --replicas 1 192.168.31.187:7001 192.168.31.187:7002 192.168.31.19:7003 192.168.31.19:7004 192.168.31.227:7005 192.168.31.227:7006
--replicas: number of slaves per master
//Check the cluster state
[root@localhost init.d]# redis-trib.rb check 192.168.144.4:7001
>>> Performing Cluster Check (using node 192.168.144.4:7001)
M: d94b2e648f54ca44c642b397e6a9ad2368b8396e 192.168.144.4:7001
slots:0-5460 (5461 slots) master
1 additional replica(s)
S: 084bed8f2d78eda2484149a9bdead5fced5c94a1 192.168.144.8:7006
slots: (0 slots) slave
replicates e546c0a6d00dc26715a992796e3a351181fe9792
M: e546c0a6d00dc26715a992796e3a351181fe9792 192.168.144.4:7002
slots:10923-16383 (5461 slots) master
1 additional replica(s)
M: 686f3af7e5a3814cb6c54ce440325b08bace702d 192.168.144.8:7004
slots:5461-10922 (5462 slots) master
1 additional replica(s)
S: 8ed652e24364576544cb464a840d4900b0cf625e 192.168.144.8:7005
slots: (0 slots) slave
replicates d94b2e648f54ca44c642b397e6a9ad2368b8396e
S: 52ada47dc639775322c26502f12d75ace964e22f 192.168.144.4:7003
slots: (0 slots) slave
replicates 686f3af7e5a3814cb6c54ce440325b08bace702d
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
[root@localhost init.d]#
//Test failover: kill master 7002; 7006 should be promoted to master (this may take a moment while the other nodes agree that 7002 is down)
[root@localhost init.d]# ps -ef |grep redis
root 62293 1 0 14:18 ? 00:00:02 /usr/local/bin/redis-server 192.168.144.4:7002 [cluster]
root 62298 1 0 14:18 ? 00:00:03 /usr/local/bin/redis-server 192.168.144.4:7003 [cluster]
root 62347 1 0 14:34 ? 00:00:01 /usr/local/bin/redis-server 192.168.144.4:7001 [cluster]
root 62430 6373 0 14:50 pts/0 00:00:00 grep --color=auto redis
[root@localhost init.d]# kill 62293
[root@localhost init.d]# redis-trib.rb check 192.168.144.4:7001
>>> Performing Cluster Check (using node 192.168.144.4:7001)
M: d94b2e648f54ca44c642b397e6a9ad2368b8396e 192.168.144.4:7001
slots:0-5460 (5461 slots) master
1 additional replica(s)
M: 084bed8f2d78eda2484149a9bdead5fced5c94a1 192.168.144.8:7006
slots:10923-16383 (5461 slots) master
0 additional replica(s)
M: 686f3af7e5a3814cb6c54ce440325b08bace702d 192.168.144.8:7004
slots:5461-10922 (5462 slots) master
1 additional replica(s)
S: 8ed652e24364576544cb464a840d4900b0cf625e 192.168.144.8:7005
slots: (0 slots) slave
replicates d94b2e648f54ca44c642b397e6a9ad2368b8396e
S: 52ada47dc639775322c26502f12d75ace964e22f 192.168.144.4:7003
slots: (0 slots) slave
replicates 686f3af7e5a3814cb6c54ce440325b08bace702d
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
//Kill all the slave nodes; the cluster is still usable
[root@localhost init.d]# redis-trib.rb check 192.168.144.4:7001
>>> Performing Cluster Check (using node 192.168.144.4:7001)
M: d94b2e648f54ca44c642b397e6a9ad2368b8396e 192.168.144.4:7001
slots:0-5460 (5461 slots) master
0 additional replica(s)
M: 084bed8f2d78eda2484149a9bdead5fced5c94a1 192.168.144.8:7006
slots:10923-16383 (5461 slots) master
0 additional replica(s)
M: 686f3af7e5a3814cb6c54ce440325b08bace702d 192.168.144.8:7004
slots:5461-10922 (5462 slots) master
0 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
[root@localhost init.d]#
//With no slaves left, killing 7004 makes the cluster unavailable
[root@localhost init.d]# redis-trib.rb check 192.168.144.4:7001
>>> Performing Cluster Check (using node 192.168.144.4:7001)
M: d94b2e648f54ca44c642b397e6a9ad2368b8396e 192.168.144.4:7001
slots:0-5460 (5461 slots) master
0 additional replica(s)
M: 084bed8f2d78eda2484149a9bdead5fced5c94a1 192.168.144.8:7006
slots:10923-16383 (5461 slots) master
0 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[ERR] Not all 16384 slots are covered by nodes.
After restarting 7004, the cluster recovers, back to 3 masters.
Suppose the cluster is now: master 7001 / slave 7004, master 7002 / slave 7005, master 7003 / slave 7006.
Kill slave 7004, then kill master 7001, then restart 7004: 7004 comes back up still as a slave.
Adding a master node
1. Configure a new Redis instance the same way as the existing cluster nodes, and start it.
2. Run redis-trib.rb add-node new_host:new_port existing_host:existing_port, where existing_host:existing_port is any master already in the cluster.
3. Reshard some slots to the new node: redis-trib.rb reshard old-master-ip:old-master-port. It asks how many slots to move (work out how many each node should hold, and move the surplus to the new node) and which node to move them to.
Adding a slave node
redis-trib.rb add-node --slave --master-id <master-id> new_host:new_port existing_host:existing_port
redis-trib.rb add-node --slave --master-id 28927912ea0d59f6b790a50cf606602a5ee48108 192.168.31.227:7008 192.168.31.187:7001
Redis Cluster node communication
Redis uses the gossip protocol for node-to-node communication: each node keeps a copy of the whole cluster's metadata, and when a node's metadata changes, the change is gradually propagated to the other nodes.
The cluster bus port is the client port plus 10000: if an instance serves clients on 6379, its cluster bus port is 16379.
The gossip protocol carries several message types, including ping, pong, meet, and fail.
meet: sent to a new node to bring it into the cluster; the new node then starts communicating with the other nodes
ping: every node frequently pings the others, carrying its own state and its copy of the cluster metadata; nodes exchange metadata through these pings
pong: the reply to ping and meet, carrying the sender's state and other info; also used to broadcast and propagate updates
fail: once a node judges another node as failed, it sends fail to the rest of the cluster to announce that the node is down
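The metadata exchange above can be sketched as a toy model (this is NOT Redis's actual implementation; node names, the epoch-based merge rule, and the `ping` method here are illustrative assumptions):

```python
# Toy sketch of gossip-style metadata exchange: each node keeps a view of
# the cluster keyed by node id, with a config epoch; on a ping/pong round,
# each side keeps whichever entry has the higher epoch, so updates spread
# through the cluster indirectly instead of via a central coordinator.

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        # view: node_id -> (epoch, state), e.g. state = "master"/"slave"/"fail"
        self.view = {node_id: (0, "master")}

    def ping(self, other):
        """Exchange metadata with another node, keeping the newest entries."""
        # ping: our view flows to the other node
        for node_id, (epoch, state) in self.view.items():
            cur = other.view.get(node_id)
            if cur is None or cur[0] < epoch:
                other.view[node_id] = (epoch, state)
        # pong: the other node's view flows back the same way
        for node_id, (epoch, state) in list(other.view.items()):
            cur = self.view.get(node_id)
            if cur is None or cur[0] < epoch:
                self.view[node_id] = (epoch, state)

a, b, c = Node("a"), Node("b"), Node("c")
a.view["c"] = (2, "fail")   # a holds newer info about c
a.ping(b)                   # b learns about a, and about c's failure
b.ping(c)                   # the update reaches c via b, not directly from a
```

After the two pings, c has learned its own "fail" marking through b, even though a never contacted c directly; that indirect spread is the essence of gossip.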
Failover
If a single node believes another node is down, that is pfail: subjectively down.
If a majority of nodes believe a node is down, that becomes fail: objectively down.
A node that has not returned a pong within cluster-node-timeout is marked pfail.
A node that marks another as pfail spreads that suspicion in its gossip pings; once more than half of the nodes agree on pfail, the state is promoted to fail.
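The pfail-to-fail promotion rule can be sketched as follows (a simplified model, not Redis's real code; the function name and report structure are assumptions for illustration):

```python
# Simplified pfail -> fail promotion: each master reports which peers it
# currently suspects (pfail); a suspect is marked objectively down (fail)
# once more than half of the masters share the suspicion.

def objectively_down(suspect, pfail_reports, num_masters):
    """pfail_reports: reporter_id -> set of node ids that reporter marked pfail."""
    votes = sum(1 for suspects in pfail_reports.values() if suspect in suspects)
    return votes > num_masters // 2

reports = {
    "m1": {"m4"},
    "m2": {"m4"},
    "m3": set(),      # m3 can still reach m4
}
print(objectively_down("m4", reports, num_masters=3))  # True: 2 of 3 masters agree
```

With 3 masters, 2 suspicions exceed the majority threshold, so m4 is promoted from pfail to fail; a single suspicion would not be enough.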
Slave filtering
For the failed master, one of its slaves is chosen and switched to master.
Each slave's disconnection time from the master is checked; a slave that has been disconnected longer than cluster-node-timeout * cluster-slave-validity-factor is not eligible to become master.
(Sentinel, by comparison, ranks all slaves by slave priority, replication offset, and run id.)
Each slave sets an election delay based on its replication offset: the larger the offset (the more data it has replicated), the earlier its election attempt, so the most up-to-date slave tends to run first.
All the master nodes then vote in the slave election; if a majority of masters (N/2 + 1) vote for a slave, the election passes and that slave becomes the new master.
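The offset-based ordering can be sketched like this (only the rank-by-offset part is modeled; Redis's real delay formula also has a fixed base and a random component, which are omitted here, and the function name is an assumption):

```python
# Rank slaves for election: a larger replication offset means more data
# replicated from the failed master, so that slave gets the shortest
# election delay and tends to start (and win) the election first.

def election_order(slaves):
    """slaves: list of (name, replication_offset) pairs.
    Returns names ordered from earliest to latest election attempt."""
    return [name for name, _ in sorted(slaves, key=lambda s: -s[1])]

slaves = [("7004", 1000), ("7005", 5000), ("7006", 3000)]
print(election_order(slaves))  # ['7005', '7006', '7004']
```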
How JedisCluster works
When JedisCluster initializes, it picks a node, builds the hashslot -> node mapping table, and creates a JedisPool connection pool for each node.
For every operation, JedisCluster first computes the key's hashslot locally and looks up the owning node in its local table.
If that node still holds the hashslot, the operation succeeds; if the slot has moved (after a reshard, for example), the node returns a MOVED redirection.
When JedisCluster sees a MOVED reply, it uses the redirection metadata to refresh its local hashslot -> node table.
It then retries until it reaches the right node; after 5 redirections it gives up and throws JedisClusterMaxRedirectionException.
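The redirect-following loop can be modeled with fake in-memory nodes (this is a sketch of the idea, not Jedis's real code; all class and function names here are invented for illustration):

```python
# Toy model of a MOVED-following client: it keeps a local slot -> node
# table; when a node answers with a MOVED redirection, the client updates
# the table and retries, giving up after 5 redirections like JedisCluster.

class MovedError(Exception):
    def __init__(self, slot, node):
        self.slot, self.node = slot, node

class FakeNode:
    def __init__(self, name, owned_slots, cluster):
        self.name, self.owned_slots, self.cluster = name, owned_slots, cluster
    def get(self, slot, key):
        if slot in self.owned_slots:
            return f"value-of-{key}"
        # not ours: tell the client who really owns the slot
        raise MovedError(slot, self.cluster.owner(slot))

class FakeCluster:
    def __init__(self):
        self.nodes = {}
    def owner(self, slot):
        return next(n for n in self.nodes.values() if slot in n.owned_slots)

def cluster_get(slot_table, slot, key, max_redirects=5):
    for _ in range(max_redirects):
        try:
            return slot_table[slot].get(slot, key)
        except MovedError as e:
            slot_table[slot] = e.node   # refresh the local mapping, then retry
    raise RuntimeError("too many redirections")

cluster = FakeCluster()
n1 = FakeNode("n1", {1, 2}, cluster)
n2 = FakeNode("n2", {3, 4}, cluster)
cluster.nodes = {"n1": n1, "n2": n2}
table = {3: n1}                    # stale: client thinks n1 owns slot 3
print(cluster_get(table, 3, "k"))  # value-of-k, after one MOVED redirect
print(table[3].name)               # n2: the local table was refreshed
```

The first call hits the stale entry, gets a MOVED pointing at n2, refreshes the table, and succeeds on the retry; later calls for slot 3 go straight to n2.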
set mykey1:{100} v1 and set mykey2:{100} v2 are guaranteed to put mykey1 and mykey2 on the same node, in the same slot, because only the hash tag inside {} is hashed.
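This follows from how Redis Cluster maps keys to slots: slot = CRC16(key) mod 16384, where CRC16 is the XModem variant and, if the key contains a non-empty {...} section, only the text inside the first such braces is hashed. The cluster spec gives CRC16("123456789") = 0x31C3 as a test vector; the function names below are my own:

```python
# Key -> slot mapping used by Redis Cluster: slot = CRC16(key) % 16384.
# CRC16 is the XModem variant (poly 0x1021, init 0x0000); if the key has a
# hash tag (first non-empty {...}), only the tag contents are hashed.

def crc16(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:            # only a non-empty tag counts
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

print(hex(crc16(b"123456789")))        # 0x31c3, the spec's test vector
print(key_slot("mykey1:{100}") == key_slot("mykey2:{100}"))  # True
```

Both keys hash only the tag "100", so they land in the same slot, which is what makes multi-key operations on them possible in a cluster.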
Redis Cluster slaves do not serve reads directly; you must send READONLY to the slave first.
redis-cli -c enables automatic redirection of queries on MOVED/ASK.
Problems
gem install redis failed, complaining Ruby 2.2 or newer is required; fixed by upgrading Ruby with RVM:
tar -xvzf rvm-1.29.8.tar.gz
cd rvm-1.29.8
./install --auto-dotfiles
source /etc/profile.d/rvm.sh
rvm list known
rvm install 2.4.6
gem install redis
Can I set the above configuration? (type 'yes' to accept): yes
/usr/local/rvm/gems/ruby-2.4.6/gems/redis-4.1.1/lib/redis/client.rb:126:in `call': ERR Slot 3662 is already busy (Redis::CommandError)
from /usr/local/rvm/gems/ruby-2.4.6/gems/redis-4.1.1/lib/redis.rb:3272:in `block in cluster'
from /usr/local/rvm/gems/ruby-2.4.6/gems/redis-4.1.1/lib/redis.rb:52:in `block in synchronize'
from /usr/local/rvm/rubies/ruby-2.4.6/lib/ruby/2.4.0/monitor.rb:214:in `mon_synchronize'
from /usr/local/rvm/gems/ruby-2.4.6/gems/redis-4.1.1/lib/redis.rb:52:in `synchronize'
from /usr/local/rvm/gems/ruby-2.4.6/gems/redis-4.1.1/lib/redis.rb:3271:in `cluster'
from /usr/local/bin/redis-trib.rb:212:in `flush_node_config'
from /usr/local/bin/redis-trib.rb:776:in `block in flush_nodes_config'
from /usr/local/bin/redis-trib.rb:775:in `each'
from /usr/local/bin/redis-trib.rb:775:in `flush_nodes_config'
from /usr/local/bin/redis-trib.rb:1296:in `create_cluster_cmd'
from /usr/local/bin/redis-trib.rb:1701:in `<main>'
[root@localhost init.d]# redis-cli -h 192.168.144.4 -p 7001
Connect to every node and run the following:
[root@localhost init.d]# redis-cli -h 192.168.144.4 -p 7002
192.168.144.4:7002> flushall
OK
192.168.144.4:7002> cluster reset
OK
192.168.144.4:7002>