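Problem description
An existing RabbitMQ cluster broke and the node would not start. Deleting /var/lib/rabbitmq/.erlang.cookie and trying to rebuild the cluster did not help; the service still failed to start.
# systemctl start rabbitmq-server.service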
Job for rabbitmq-server.service failed because the control process exited with error code. See "systemctl status rabbitmq-server.service" and "journalctl -xe" for details.
Analysis
Check the error log:
# journalctl -xe
-- Subject: Unit rabbitmq-server.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit rabbitmq-server.service has begun starting up.
Nov 24 14:26:20 controller03 rabbitmq-server[13522]: BOOT FAILED
Nov 24 14:26:20 controller03 rabbitmq-server[13522]: ===========
Nov 24 14:26:20 controller03 rabbitmq-server[13522]: Error description:
Nov 24 14:26:20 controller03 rabbitmq-server[13522]: {error,{inconsistent_cluster,"Node rabbit@controller03 thinks it's clustered with node rabbit@controller02, but rabbit@controller02 disagrees"}}
Nov 24 14:26:20 controller03 rabbitmq-server[13522]: Log files (may contain more information):
Nov 24 14:26:20 controller03 rabbitmq-server[13522]: /var/log/rabbitmq/rabbit@controller03.log
Nov 24 14:26:20 controller03 rabbitmq-server[13522]: /var/log/rabbitmq/rabbit@controller03-sasl.log
Nov 24 14:26:20 controller03 rabbitmq-server[13522]: Stack trace:
Nov 24 14:26:20 controller03 rabbitmq-server[13522]: [{rabbit_mnesia,check_cluster_consistency,0,
Nov 24 14:26:20 controller03 rabbitmq-server[13522]: [{file,"src/rabbit_mnesia.erl"},{line,598}]},
Nov 24 14:26:20 controller03 rabbitmq-server[13522]: {rabbit,'-boot/0-fun-0-',0,[{file,"src/rabbit.erl"},{line,275}]},
Nov 24 14:26:20 controller03 rabbitmq-server[13522]: {rabbit,start_it,1,[{file,"src/rabbit.erl"},{line,296}]},
Nov 24 14:26:20 controller03 rabbitmq-server[13522]: {init,start_it,1,[]},
Nov 24 14:26:20 controller03 rabbitmq-server[13522]: {init,start_em,1,[]}]
Nov 24 14:26:21 controller03 rabbitmq-server[13522]: {"init terminating in do_boot",{error,{inconsistent_cluster,"Node rabbit@controller03 thinks it's clustered with node rabbit@controller02, but rabbit@controller02 disagrees"}}}
Nov 24 14:26:21 controller03 rabbitmq-server[13522]: init terminating in do_boot ()
Nov 24 14:26:22 controller03 rabbitmq-server[13522]: Crash dump is being written to: erl_crash.dump...done
Nov 24 14:26:22 controller03 systemd[1]: rabbitmq-server.service: main process exited, code=exited, status=1/FAILURE
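The error description is:
{error,{inconsistent_cluster,"Node rabbit@controller03 thinks it's clustered with node rabbit@controller02, but rabbit@controller02 disagrees"}}
controller03 believes controller02 is one of its cluster peers, but controller02 does not consider itself clustered with controller03.
The likely cause is membership information left over from the old cluster, which makes the startup consistency check (rabbit_mnesia:check_cluster_consistency in the stack trace) fail. The official site confirms that leftover mnesia data produces this failure. Deleting .erlang.cookie does not clear it, because the cookie only authenticates nodes to each other and holds no membership data.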
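To pull the two disagreeing nodes out of a long journal, a small filter can help. This is only a sketch: `find_inconsistent_cluster` is a hypothetical helper name, and it assumes the message has exactly the wording shown in the log above.

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print
# "<local node> <disagreeing peer>" once for each distinct pair found in
# inconsistent_cluster errors. Assumes the message wording shown above.
find_inconsistent_cluster() {
    # "it.s" matches "it's" without needing a quote inside the sed script.
    sed -n 's/.*inconsistent_cluster,"Node \([^ ]*\) thinks it.s clustered with node \([^ ,]*\), but.*/\1 \2/p' |
        sort -u
}
```

A possible invocation is `journalctl -u rabbitmq-server.service --no-pager | find_inconsistent_cluster`. For the log above it prints `rabbit@controller03 rabbit@controller02`.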
Solution
1. Delete the existing mnesia data (this discards everything stored on this node, including its cluster membership)
# rm -rf /var/lib/rabbitmq/mnesia
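Because the deletion cannot be undone, a more cautious variant is to move the directory aside so the old data can still be inspected later. The sketch below assumes rabbitmq-server is stopped (it is, since it failed to boot); `set_aside_dir` and the `.bak.<timestamp>` suffix are hypothetical choices, not RabbitMQ conventions.

```shell
#!/bin/sh
# Hypothetical helper: move a directory aside instead of deleting it.
# $1 is the directory to set aside (for example /var/lib/rabbitmq/mnesia).
# Prints the backup path on success; returns non-zero if $1 is not a
# directory or the backup name is already taken.
set_aside_dir() {
    dir=$1
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    backup="${dir}.bak.$(date +%Y%m%d%H%M%S)"
    [ ! -e "$backup" ] || { echo "backup already exists: $backup" >&2; return 1; }
    mv -- "$dir" "$backup" && printf '%s\n' "$backup"
}
```

It would be called as `set_aside_dir /var/lib/rabbitmq/mnesia`; once the node is healthy again, the backup can be removed by hand.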
2. Restart the service; the node returns to a normal state
# systemctl restart rabbitmq-server.service
# rabbitmqctl cluster_status
Cluster status of node rabbit@controller03 ...
[{nodes,[{disc,[rabbit@controller03]}]},
{running_nodes,[rabbit@controller03]},
{cluster_name,<<"rabbit@controller03">>},
{partitions,[]},
{alarms,[{rabbit@controller03,[]}]}]
3. Join the cluster and check the status
# rabbitmqctl stop_app
Stopping node rabbit@controller03 ...
# rabbitmqctl join_cluster --ram rabbit@controller01
Clustering node rabbit@controller03 with rabbit@controller01 ...
# rabbitmqctl start_app
Starting node rabbit@controller03 ...
# rabbitmqctl cluster_status
Cluster status of node rabbit@controller03 ...
[{nodes,[{disc,[rabbit@controller01]},
{ram,[rabbit@controller03,rabbit@controller02]}]},
{running_nodes,[rabbit@controller01,rabbit@controller03]},
{cluster_name,<<"rabbit@controller01">>},
{partitions,[]},
{alarms,[{rabbit@controller01,[]},{rabbit@controller03,[]}]}]
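In the final status above, rabbit@controller02 is still listed as a RAM node but is missing from running_nodes, so it presumably needs the same treatment. The sketch below lists such members automatically. It assumes the Erlang-term output format shown above (other rabbitmqctl versions may print a different layout), node names that start with `rabbit@`, and a grep that supports `-o`; `stopped_members` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `rabbitmqctl cluster_status` output on stdin
# (Erlang-term format as shown above) and print every cluster member that
# does not appear in running_nodes, one per line.
stopped_members() {
    # Collapse the output onto one line so wrapped lists are handled.
    status=$(tr -d ' \n')
    # Everything between "{nodes," and "{running_nodes," lists the members.
    members=$(printf '%s\n' "$status" |
        sed 's/.*{nodes,\(.*\){running_nodes,.*/\1/' |
        grep -o 'rabbit@[A-Za-z0-9_.-]*' | sort -u)
    # The bracketed list right after "{running_nodes," holds the live nodes.
    running=$(printf '%s\n' "$status" |
        sed 's/.*{running_nodes,\[\([^]]*\)\].*/\1/' |
        grep -o 'rabbit@[A-Za-z0-9_.-]*' | sort -u)
    for node in $members; do
        printf '%s\n' "$running" | grep -qxF "$node" || printf '%s\n' "$node"
    done
}
```

A possible invocation is `rabbitmqctl cluster_status | stopped_members`; for the status above it prints `rabbit@controller02`.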