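Nightingale (夜莺): Didi's Open-Source Operations Monitoring System
Nightingale is a new-generation monitoring system developed in China. It supports both cloud-native environments and traditional physical-machine and virtual-machine environments. According to the project, it can be set up in 10 minutes and learned in about an hour, it has been validated against massive data volumes in Didi's production environment, and it aims to become a benchmark among Chinese-built monitoring systems.
The new Nightingale released v1 on March 20, 2020 and is now at v5.0. Starting with this version, it integrates with the Prometheus, VictoriaMetrics, Grafana, and Telegraf ecosystems, with the goal of being the easiest-to-use open-source operations monitoring system in China.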
This article was written with reference to the following links:
https://n9e.gitee.io/quickstart/standalone/
https://n9e.gitee.io/quickstart/telegraf/
https://blog.csdn.net/smallbird108/article/details/122497200
Preparing the component installation packages
1. https://downloads.mysql.com/archives/community/
2. https://github.com/prometheus/prometheus/releases/download/v2.33.1/prometheus-2.33.1.linux-amd64.tar.gz
3. https://dl.influxdata.com/telegraf/releases/telegraf-1.21.3-1.x86_64.rpm
4. https://github.com/n9e/fe-v5/releases
n9e-5.3.3.tar.gz
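The Prometheus and Telegraf links above follow a fixed pattern, so the version number can be factored out when fetching a different release. This is a minimal sketch that assumes the URL patterns stay exactly as listed in this article; `prometheus_url` and `telegraf_rpm_url` are hypothetical helper names, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helpers that rebuild the download URLs listed above
# from a version number. The URL patterns are copied from this article.

prometheus_url() {
  # $1: Prometheus version, e.g. 2.33.1
  printf 'https://github.com/prometheus/prometheus/releases/download/v%s/prometheus-%s.linux-amd64.tar.gz\n' "$1" "$1"
}

telegraf_rpm_url() {
  # $1: Telegraf version, e.g. 1.21.3
  printf 'https://dl.influxdata.com/telegraf/releases/telegraf-%s-1.x86_64.rpm\n' "$1"
}

# Example (not executed here): download both packages with curl.
# curl -fLO "$(prometheus_url 2.33.1)"
# curl -fLO "$(telegraf_rpm_url 1.21.3)"
```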
I. Install MySQL
rpm -ivh mysql-community-common-5.7.36-1.el7.x86_64.rpm
rpm -ivh mysql-community-libs-5.7.36-1.el7.x86_64.rpm
rpm -ivh mysql-community-client-5.7.36-1.el7.x86_64.rpm
rpm -ivh mysql-community-server-5.7.36-1.el7.x86_64.rpm
systemctl start mysqld
netstat -anp | grep 3306
systemctl enable mysqld
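# Look up the initial temporary password
grep 'temporary password' /var/log/mysqld.log
# Change the password (run the following statements in the mysql client after logging in with the temporary password)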
set password for root@localhost=password('MySQL_2022');
grant all privileges on *.* to root@'%' identified by 'MySQL_2022';
flush privileges;
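The grep above prints the whole log line, and the password is its last field. This is a minimal sketch of pulling out just that field, assuming the MySQL 5.7 log line ends with `root@localhost: <password>`; `extract_temp_password` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the most recent temporary root password found in a MySQL 5.7 log.
# Assumption: the matching line ends with "... root@localhost: <password>",
# so the password is the last whitespace-separated field.
extract_temp_password() {
  # $1: path to the log file (this article uses /var/log/mysqld.log)
  grep 'temporary password' "$1" | tail -n 1 | awk '{print $NF}'
}

# Example (not executed here): log in with the extracted password.
# mysql -uroot -p"$(extract_temp_password /var/log/mysqld.log)"
```

`tail -n 1` keeps only the newest entry in case the server was initialized more than once.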
II. Install Prometheus
mkdir -p /opt/prometheus
tar xf prometheus-2.33.1.linux-amd64.tar.gz
cp -far prometheus-2.33.1.linux-amd64/* /opt/prometheus/
cd /opt/prometheus
chown -R root:root *
# Create the systemd service unit for Prometheus
cat <<EOF >/etc/systemd/system/prometheus.service
[Unit]
Description="prometheus"
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --storage.tsdb.path=/opt/prometheus/data --web.enable-lifecycle --enable-feature=remote-write-receiver --query.lookback-delta=2m
Restart=on-failure
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=prometheus
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable prometheus
systemctl restart prometheus
systemctl status prometheus
Note that Prometheus must be started with --enable-feature=remote-write-receiver, so that it accepts data pushed to it via remote write.
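Because this flag is easy to drop when editing the unit file, a quick check can help. This is a minimal sketch; `has_remote_write_receiver` is a hypothetical helper name, and the readiness URL in the comments assumes Prometheus is listening on its default port 9090.

```shell
#!/bin/sh
# Return success if the given systemd unit file starts Prometheus with
# the remote-write receiver feature flag.
has_remote_write_receiver() {
  # $1: path to the unit file
  # "--" ends grep's own options so the pattern may begin with dashes.
  grep -q -- '--enable-feature=remote-write-receiver' "$1"
}

# Example (not executed here):
# has_remote_write_receiver /etc/systemd/system/prometheus.service \
#   && echo "remote write receiver enabled"
# curl -s http://127.0.0.1:9090/-/ready   # Prometheus readiness endpoint
```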
III. Install Redis
Setting a password for Redis is recommended.
curl -o /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
yum install -y redis
systemctl enable redis
# Edit the config and set a password with the requirepass directive
vim /etc/redis.conf
systemctl restart redis
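The password can also be set without opening an editor. This is a minimal sketch, assuming GNU sed and a redis.conf that contains either a commented-out or an active `requirepass` line; `set_redis_password` is a hypothetical helper name and `Redis_2022` is a placeholder value, not something taken from the original setup.

```shell
#!/bin/sh
# Set (or add) the requirepass directive in a Redis config file.
# Caveat: the password must not contain "|" or "&", which are special
# in the sed replacement used here.
set_redis_password() {
  # $1: path to redis.conf, $2: new password
  if grep -q '^#\{0,1\}[[:space:]]*requirepass[[:space:]]' "$1"; then
    # Replace the existing line, whether commented out or active.
    sed -i "s|^#\{0,1\}[[:space:]]*requirepass[[:space:]].*|requirepass $2|" "$1"
  else
    # No such line yet: append one.
    printf 'requirepass %s\n' "$2" >> "$1"
  fi
}

# Example (not executed here):
# set_redis_password /etc/redis.conf 'Redis_2022' && systemctl restart redis
```

Whatever password is chosen here has to match the Redis password entered in the n9e configuration files later.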
IV. Deploy n9e
mkdir /usr/local/n9e
tar -zxvf n9e-5.3.3.tar.gz -C /usr/local/n9e/
vim /usr/local/n9e/etc/server.conf
# In the config files, update the MySQL and Redis connection passwords and the IP addresses of the services n9e connects to
vim /usr/local/n9e/etc/webapi.conf
mysql -uroot -p'MySQL_2022' < /usr/local/n9e/docker/initsql/a-n9e.sql
mkdir /opt/n9e
cat <<EOF >/etc/systemd/system/n9e-server.service
[Unit]
Description="n9e-server"
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/n9e/n9e server
WorkingDirectory=/usr/local/n9e
Restart=on-failure
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=n9e-server
[Install]
WantedBy=multi-user.target
EOF
cat <<EOF >/etc/systemd/system/n9e-webapi.service
[Unit]
Description="n9e-webapi"
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/n9e/n9e webapi
WorkingDirectory=/usr/local/n9e
Restart=on-failure
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=n9e-webapi
[Install]
WantedBy=multi-user.target
EOF
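The two unit files differ only in the component name, so they can be generated from one template, which avoids copy-paste slips between them. This is a minimal sketch that reuses the paths from this article; `render_n9e_unit` is a hypothetical helper name.

```shell
#!/bin/sh
# Print a systemd unit for one n9e component to stdout.
# The install path /usr/local/n9e is the one used in this article.
render_n9e_unit() {
  # $1: component name, "server" or "webapi"
  cat <<EOF
[Unit]
Description="n9e-$1"
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/n9e/n9e $1
WorkingDirectory=/usr/local/n9e
Restart=on-failure
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=n9e-$1
[Install]
WantedBy=multi-user.target
EOF
}

# Example (not executed here):
# for c in server webapi; do
#   render_n9e_unit "$c" > "/etc/systemd/system/n9e-$c.service"
# done
```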
systemctl enable n9e-server.service
systemctl enable n9e-webapi.service
systemctl restart n9e-server.service n9e-webapi.service
systemctl status n9e-server.service
systemctl status n9e-webapi.service
firewall-cmd --permanent --zone=public --add-port=18000/tcp
firewall-cmd --permanent --zone=public --add-port=19000/tcp
firewall-cmd --reload
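After the services start, it is worth confirming that the two ports opened above are actually listening. This is a minimal sketch that parses the output of `netstat -lnt`, assuming the net-tools layout where column 4 is the local address; `has_listener` is a hypothetical helper name.

```shell
#!/bin/sh
# Read `netstat -lnt` output on stdin and succeed if some socket
# listens on the given TCP port.
has_listener() {
  # $1: TCP port number
  awk -v suffix=":$1" '
    # Column 4 is the local address, e.g. 0.0.0.0:18000 or :::18000.
    # Compare only the trailing ":<port>" so 8000 does not match 18000.
    {
      n = length($4) - length(suffix)
      if (n >= 0 && substr($4, n + 1) == suffix) found = 1
    }
    END { exit(found ? 0 : 1) }
  '
}

# Example (not executed here):
# for p in 18000 19000; do
#   netstat -lnt | has_listener "$p" && echo "port $p is listening"
# done
```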
V. Install the Telegraf collector on the monitored host
For example, pick one host to be monitored and use it as the client for testing.
rpm -ivh telegraf-1.21.3-1.x86_64.rpm
cat <<EOF > /etc/telegraf/telegraf.conf
[global_tags]
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
hostname = ""
omit_hostname = false
[[outputs.opentsdb]]
host = "http://192.168.31.127"
port = 19000
http_batch_size = 50
http_path = "/opentsdb/put"
debug = false
separator = "_"
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = true
[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.system]]
fielddrop = ["uptime_format"]
[[inputs.net]]
ignore_protocol_stats = true
EOF
systemctl restart telegraf.service
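To check the path from the collector to the server independently of Telegraf, a single hand-made data point can be pushed to the same endpoint. This is a minimal sketch that assumes the endpoint accepts the standard OpenTSDB HTTP JSON put format, which is what Telegraf's opentsdb output sends in HTTP mode; `opentsdb_point` and the metric name `demo_metric` are hypothetical.

```shell
#!/bin/sh
# Build a one-element OpenTSDB JSON payload.
# Arguments are inserted as-is, so they must not contain double quotes.
opentsdb_point() {
  # $1: metric name, $2: unix timestamp in seconds,
  # $3: numeric value, $4: value for the "host" tag
  printf '[{"metric":"%s","timestamp":%s,"value":%s,"tags":{"host":"%s"}}]\n' \
    "$1" "$2" "$3" "$4"
}

# Example (not executed here): push one point to the address configured
# in [[outputs.opentsdb]] above.
# opentsdb_point demo_metric "$(date +%s)" 1 "$(hostname)" |
#   curl -s -X POST -H 'Content-Type: application/json' \
#        --data-binary @- http://192.168.31.127:19000/opentsdb/put
#
# Telegraf can also be dry-run: --test gathers the inputs once and
# prints the metrics to stdout without sending them.
# telegraf --config /etc/telegraf/telegraf.conf --test
```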
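VI. Log in to the n9e web UI to view the monitoring metrics
The default username and password are root/root.2000
Telegraf is used as the collector here. This article only covers a basic introductory deployment; more features remain to be explored and put into practice.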