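Background
One dark and windy night, a burst of WeCom (企业微信) alerts brought bad news: the server had lost power and shut down.
After the server came back up, many services failed to start automatically. The logs showed that the server's clock was wrong.
The server has the ntp service installed, and it is set to start at boot, so why did it not end up running? I started digging into the cause.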
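Problem analysis
First, I checked the startup state of the ntp service and found that ntpd had failed to start.
The service status showed that the ntp service had exited abnormally: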
sudo systemctl status ntp
The log showed the ntp process reporting errors:
tail -f /var/log/ntp.log
frequency error -1732 PPM exceeds tolerance 500 PPM
frequency error -1732 PPM exceeds tolerance 500 PPM
frequency error -1732 PPM exceeds tolerance 500 PPM
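To see how often this error occurs and with which values, the log can be filtered instead of read by eye. The following is a minimal sketch, assuming the log lines have exactly the "frequency error N PPM exceeds tolerance M PPM" shape shown above. The function name extract_freq_errors is hypothetical and not part of ntp.

```shell
#!/bin/sh
# extract_freq_errors: read ntpd log lines on stdin and print
# "<frequency error> <tolerance>" (both in PPM) for every matching line.
# Assumption: the message format is the one quoted in this article.
extract_freq_errors() {
    sed -n 's/.*frequency error \(-\{0,1\}[0-9][0-9]*\) PPM exceeds tolerance \([0-9][0-9]*\) PPM.*/\1 \2/p'
}
```

For example, `extract_freq_errors < /var/log/ntp.log | sort | uniq -c` would group identical error values and count them.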
Root cause
Most operating systems and hardware of today incorporate a time-of-year (TOY) chip to maintain the time during periods when the power is off. When the machine is booted, the chip is used to initialize the operating system time. After the machine has synchronized to a NTP server, the operating system corrects the chip from time to time. In case there is no TOY chip or for some reason its time is more than 1000s from the server time, ntpd assumes something must be terribly wrong and the only reliable action is for the operator to intervene and set the clock by hand. This causes ntpd to exit with a panic message to the system log. The -g option overrides this check and the clock will be set to the server time regardless of the chip time. However, and to protect against broken hardware, such as when the CMOS battery fails or the clock counter becomes defective, once the clock has been set, an error greater than 1000s will cause ntpd to exit anyway.
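When the time offset is large, for example more than 1000 s (ntpd's default panic threshold), the ntp service exits on its own rather than stepping the clock, because a large time jump can affect services in unpredictable ways, such as breaking data consistency.
After a server has been in service for a long time, the CMOS battery may be drained or damaged, leaving the hardware clock (hwclock) with the wrong time.
At boot the server initializes the system time from the hardware clock, i.e. hwclock. The time reported by date therefore differed greatly from the NTP network time at boot, which is what ultimately made ntp fail to start.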
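The check that ntpd applies here can be illustrated with plain arithmetic on Unix timestamps. This is only a sketch under stated assumptions: the 1000 s value comes from the ntpd documentation quoted above, the function names clock_offset_abs and would_panic are hypothetical, and how the reference timestamp is obtained (a trusted NTP source, another machine) is left open.

```shell
#!/bin/sh
# Panic threshold in seconds, as described in the ntpd documentation quoted above.
PANIC_THRESHOLD=1000

# clock_offset_abs SYSTEM_EPOCH REFERENCE_EPOCH
# Prints the absolute difference, in seconds, between two Unix timestamps.
clock_offset_abs() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# would_panic SYSTEM_EPOCH REFERENCE_EPOCH
# Returns 0 (true) when the offset is larger than the panic threshold,
# i.e. the situation in which ntpd without -g would give up.
would_panic() {
    [ "$(clock_offset_abs "$1" "$2")" -gt "$PANIC_THRESHOLD" ]
}
```

A call such as `would_panic "$(date +%s)" "$reference_epoch"` (with reference_epoch supplied by the caller) then tells whether the system clock is far enough off to make ntpd exit.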
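Solution
Manual fix
Force a one-off NTP synchronization, then start the ntp service. Multiple ntpd processes cannot run at the same time, so make sure no ntpd is running in the background before starting one: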
sudo systemctl stop ntp
sudo /usr/sbin/ntpd -qg
sudo systemctl start ntp
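The three commands above can be wrapped in a small script that also checks for a leftover ntpd before forcing the sync. This is a rough sketch, not a tested production script: the function names run and force_ntp_sync and the DRY_RUN variable are hypothetical, and the guard assumes pgrep is installed.

```shell
#!/bin/sh
# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# force_ntp_sync: stop the service, verify no ntpd is left running,
# set the clock once with ntpd -qg, then start the service again.
force_ntp_sync() {
    run sudo systemctl stop ntp || return 1
    # As noted above, two ntpd processes cannot run at once, so abort if
    # one is still alive. The check is skipped in dry-run mode.
    if [ "${DRY_RUN:-0}" != "1" ] && pgrep -x ntpd >/dev/null 2>&1; then
        echo "ntpd is still running; aborting" >&2
        return 1
    fi
    run sudo /usr/sbin/ntpd -qg || return 1
    run sudo systemctl start ntp
}
```

Setting `DRY_RUN=1` before calling `force_ntp_sync` prints the commands without executing them, which is a cheap way to review the sequence before running it on a real server.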
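Long-term fix
Force one NTP synchronization every time the server boots.
Right after boot, business services have usually not started yet, so forcing a single NTP time synchronization at that point is a reasonable choice, and it also keeps the ntp service from exiting because of a large time offset.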
[Unit]
Description=ntp force synchronization once
Before=ntp.service
Wants=network-online.target
After=network.target network-online.target
[Service]
Type=oneshot
ExecStart=/usr/sbin/ntpd -qg
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
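The article does not say how the unit is installed, so here is one possible way, as a sketch. The file name ntp-force-sync.service and the helper write_force_sync_unit are assumptions of mine. The unit content is the one listed above.

```shell
#!/bin/sh
# write_force_sync_unit TARGET_PATH
# Writes the oneshot unit from this article to TARGET_PATH.
# The file name used by the caller is a hypothetical choice.
write_force_sync_unit() {
    cat > "$1" <<'EOF'
[Unit]
Description=ntp force synchronization once
Before=ntp.service
Wants=network-online.target
After=network.target network-online.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/ntpd -qg
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
}
```

After writing the file, for instance with `write_force_sync_unit /tmp/ntp-force-sync.service`, it would be copied to /etc/systemd/system/ as root, followed by `sudo systemctl daemon-reload` and `sudo systemctl enable ntp-force-sync.service` so that the forced sync runs before ntp.service on the next boot.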
References
- https://serverfault.com/questions/187446/ntp-service-on-linux-not-running-after-reboot
- https://access.redhat.com/solutions/35640
- http://doc.ntp.org/4.1.0/ntpd.htm