[Playing with Lighthouse] Hands-On: Building a Zabbix 6.0 LTS Monitoring System on CentOS 8

2022-05-05 11:55:23

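0x00. Preface

  • Time really flies. Before I knew it, this year's essay contest had started. I first saw the contest announcement directly under "Columns": https://cloud.tencent.com/developer/article/1967215, and later it also appeared on the Lighthouse console banner and in the 云+社区 (Cloud+ Community) menu.
[Screenshot: Lighthouse console]
[Screenshot: 云+社区]
  • I skimmed the articles submitted so far and did not see one about Zabbix. I only came across it for the first time at work myself, so below I describe how to set it up and how to add devices to monitoring. Besides the default Linux host running the Zabbix server, I cover two more kinds of device: a Windows cloud server and a MikroTik router.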

0x01. Introduction to Zabbix

TL;DR

[If you already know Zabbix, you can skip this section, though reading it is still recommended.]

Official site: https://www.zabbix.com/cn

TAKE YOUR BUSINESS SERVICE MONITORING TO THE NEXT LEVEL

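[Screenshot: official site]

That's right: they are a vendor that specializes in monitoring, offering an enterprise-grade monitoring solution along with their own certification and training.

Zabbix is also completely open source and free. Unless you buy a support subscription, you pay nothing. Generous, isn't it?

Finally, a word on what monitoring is for. The features page describes it in detail: https://www.zabbix.com/cn/features. If you would rather not read the original, the points below summarize its screenshots: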

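1. First, collect data.
2. Then set up alerting to detect problems, even with the help of machine learning.
3. While alerting, it can also act directly and try to resolve the problem automatically (the good old restart).
4. Data visualization, which is indispensable, such as the familiar topology maps.
5. Flexible dashboards that you can build yourself.
6. It can also calculate SLA.
7. Almost every kind of infrastructure is supported out of the box.
8. Security, an important part that cannot be ignored.
9. The vendor claims that deployment takes only 5 minutes.
10. And finally: "scale without limits".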

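Product releases follow a clear roadmap: https://www.zabbix.com/cn/roadmap

See also the "Zabbix Life Cycle & Release Policy": https://www.zabbix.com/cn/life_cycle_and_release_policy

I won't go into more detail here; this should be enough background on Zabbix.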

0x02. Installing Zabbix

I originally planned to manage Docker containers with docker-compose, but in the end settled on installing directly on the host: https://www.zabbix.com/cn/download?zabbix=6.0&os_distribution=centos&os_version=8&db=postgresql&ws=nginx

  • Zabbix 6.0 LTS
  • CentOS 8
  • PostgreSQL
  • Nginx

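For the server I chose a 4C4G (4 vCPU, 4 GB RAM) Lighthouse instance, the highest-spec of the three cloud servers I currently have. Since only a few devices need monitoring for now, it has more than enough performance.

[Screenshot: cn-tx-bj7-c8]

OS version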

[root@cn-tx-bj7-c8 ~]# cat /etc/redhat-release
CentOS Linux release 8.5.2111

Install and start it. Along the way I changed the nginx port to 81 to avoid a conflict.

[root@cn-tx-bj7-c8 ~]# git clone https://github.com/zabbix/zabbix-docker
[root@cn-tx-bj7-c8 ~]# rpm -Uvh https://repo.zabbix.com/zabbix/6.0/rhel/8/x86_64/zabbix-release-6.0-1.el8.noarch.rpm
[root@cn-tx-bj7-c8 ~]# dnf clean all
[root@cn-tx-bj7-c8 ~]# dnf install zabbix-server-pgsql zabbix-web-pgsql zabbix-nginx-conf zabbix-sql-scripts zabbix-selinux-policy zabbix-agent -y
[root@cn-tx-bj7-c8 ~]# sudo -u postgres createuser --pwprompt zabbix
[root@cn-tx-bj7-c8 ~]# sudo -u postgres createdb -O zabbix zabbix
[root@cn-tx-bj7-c8 ~]# zcat /usr/share/doc/zabbix-sql-scripts/postgresql/server.sql.gz | sudo -u zabbix psql zabbix
[root@cn-tx-bj7-c8 ~]# vim /etc/zabbix/zabbix_server.conf
DBPassword=
[root@cn-tx-bj7-c8 ~]# vim /etc/nginx/conf.d/zabbix.conf
        listen          81;
        server_name     mastodon.yuangezhizao.cn;
[root@cn-tx-bj7-c8 ~]# systemctl restart zabbix-server zabbix-agent nginx php-fpm
[root@cn-tx-bj7-c8 ~]# systemctl enable zabbix-server zabbix-agent nginx php-fpm
Created symlink /etc/systemd/system/multi-user.target.wants/zabbix-server.service → /usr/lib/systemd/system/zabbix-server.service.
Created symlink /etc/systemd/system/multi-user.target.wants/zabbix-agent.service → /usr/lib/systemd/system/zabbix-agent.service.
Created symlink /etc/systemd/system/multi-user.target.wants/php-fpm.service → /usr/lib/systemd/system/php-fpm.service.
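
The DBPassword edit made with vim in the block above can also be scripted. What follows is a minimal sketch, not the author's method: set_conf_value is a hypothetical helper (not part of Zabbix) that sets a Key=value line in a Zabbix-style configuration file, reusing the commented-out default entry if one exists. It assumes the key contains no regular-expression metacharacters.

```shell
#!/bin/sh
# Hypothetical helper: set "Key=value" in a Zabbix-style config file.
# If a (possibly commented-out) "Key=" line exists, the first one is replaced;
# otherwise the setting is appended at the end of the file.
set_conf_value() {
    local file="$1" tmp status
    tmp="$(mktemp)" || return 1
    # Key and value travel through the environment so that awk does not
    # interpret backslashes or "&" that may appear in a password.
    CONF_KEY="$2" CONF_VALUE="$3" awk '
        BEGIN { k = ENVIRON["CONF_KEY"]; v = ENVIRON["CONF_VALUE"]; done = 0 }
        !done && $0 ~ ("^[# \t]*" k "=") { print k "=" v; done = 1; next }
        { print }
        END { if (!done) print k "=" v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

On the server above it would be called as set_conf_value /etc/zabbix/zabbix_server.conf DBPassword 'your-password', followed by the systemctl restart already shown. Writing the result back with cat keeps the owner and permissions of the original file.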

The Zabbix service is now up and running:

[root@cn-tx-bj7-c8 ~]# systemctl status zabbix-server
● zabbix-server.service - Zabbix Server
   Loaded: loaded (/usr/lib/systemd/system/zabbix-server.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2022-04-23 22:19:14 CST; 1 weeks 3 days ago
 Main PID: 111949 (zabbix_server)
    Tasks: 48 (limit: 23718)
   Memory: 97.7M
   CGroup: /system.slice/zabbix-server.service
           ├─111949 /usr/sbin/zabbix_server -c /etc/zabbix/zabbix_server.conf
           ├─111958 /usr/sbin/zabbix_server: ha manager
           ├─111960 /usr/sbin/zabbix_server: service manager #1 [processed 0 events, updated 0 event tags, deleted 0 problems, synced 0 service updates, idle 5.005211 sec during 5.005282 sec]
           ├─111961 /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.132053 sec, idle 60 sec]
           ├─111966 /usr/sbin/zabbix_server: alert manager #1 [sent 0, failed 0 alerts, idle 5.003233 sec during 5.003329 sec]
           ├─111967 /usr/sbin/zabbix_server: alerter #1 started
           ├─111968 /usr/sbin/zabbix_server: alerter #2 started
           ├─111969 /usr/sbin/zabbix_server: alerter #3 started
           ├─111970 /usr/sbin/zabbix_server: preprocessing manager #1 [queued 0, processed 40 values, idle 5.032522 sec during 5.033149 sec]
           ├─111972 /usr/sbin/zabbix_server: preprocessing worker #1 started
           ├─111973 /usr/sbin/zabbix_server: preprocessing worker #2 started
           ├─111974 /usr/sbin/zabbix_server: preprocessing worker #3 started
           ├─111975 /usr/sbin/zabbix_server: lld manager #1 [processed 0 LLD rules, idle 5.004748sec during 5.004787 sec]
           ├─111976 /usr/sbin/zabbix_server: lld worker #1 [processed 1 LLD rules, idle 119.140748 sec during 119.156129 sec]
           ├─111977 /usr/sbin/zabbix_server: lld worker #2 [processed 1 LLD rules, idle 120.085312 sec during 120.098347 sec]
           ├─111978 /usr/sbin/zabbix_server: housekeeper [deleted 13209 hist/trends, 0 items/triggers, 0 events, 0 sessions, 0 alarms, 0 audit items, 0 records in 0.589759 sec, idle for 1 hour(s)]
           ├─111979 /usr/sbin/zabbix_server: timer #1 [updated 0 hosts, suppressed 0 events in 0.000510 sec, idle 59 sec]
           ├─111980 /usr/sbin/zabbix_server: http poller #1 [got 0 values in 0.000522 sec, idle 5 sec]
           ├─111981 /usr/sbin/zabbix_server: discoverer #1 [processed 0 rules in 0.000521 sec, idle 60 sec]
           ├─111982 /usr/sbin/zabbix_server: history syncer #1 [processed 0 values, 0 triggers in 0.000034 sec, idle 1 sec]
           ├─111983 /usr/sbin/zabbix_server: history syncer #2 [processed 0 values, 0 triggers in 0.000015 sec, idle 1 sec]
           ├─111984 /usr/sbin/zabbix_server: history syncer #3 [processed 0 values, 0 triggers in 0.000018 sec, idle 1 sec]
           ├─111985 /usr/sbin/zabbix_server: history syncer #4 [processed 3 values, 4 triggers in 0.002674 sec, idle 1 sec]
           ├─111987 /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.000822 sec, idle 3 sec]
           ├─111988 /usr/sbin/zabbix_server: proxy poller #1 [exchanged data with 0 proxies in 0.000017 sec, idle 5 sec]
           ├─111989 /usr/sbin/zabbix_server: self-monitoring [processed data in 0.000019 sec, idle 1 sec]
           ├─111990 /usr/sbin/zabbix_server: task manager [processed 0 task(s) in 0.000505 sec, idle 5 sec]
           ├─111991 /usr/sbin/zabbix_server: poller #1 [got 3 values in 0.003551 sec, idle 1 sec]
           ├─111992 /usr/sbin/zabbix_server: poller #2 [got 0 values in 0.000028 sec, idle 1 sec]
           ├─111994 /usr/sbin/zabbix_server: poller #3 [got 0 values in 0.000025 sec, idle 1 sec]
           ├─111996 /usr/sbin/zabbix_server: poller #4 [got 0 values in 0.000026 sec, idle 1 sec]
           ├─111997 /usr/sbin/zabbix_server: poller #5 [got 0 values in 0.000026 sec, idle 1 sec]
           ├─111998 /usr/sbin/zabbix_server: unreachable poller #1 [got 0 values in 0.000031 sec, idle 5 sec]
           ├─111999 /usr/sbin/zabbix_server: trapper #1 [processed data in 0.001137 sec, waiting for connection]
           ├─112000 /usr/sbin/zabbix_server: trapper #2 [processed data in 0.001014 sec, waiting for connection]
           ├─112001 /usr/sbin/zabbix_server: trapper #3 [processed data in 0.001059 sec, waiting for connection]
           ├─112002 /usr/sbin/zabbix_server: trapper #4 [processed data in 0.001034 sec, waiting for connection]
           ├─112003 /usr/sbin/zabbix_server: trapper #5 [processed data in 0.001393 sec, waiting for connection]
           ├─112004 /usr/sbin/zabbix_server: icmp pinger #1 [got 3 values in 2.232111 sec, idle 5 sec]
           ├─112005 /usr/sbin/zabbix_server: alert syncer [queued 0 alerts(s), flushed 0 result(s) in 0.000741 sec, idle 1 sec]
           ├─112007 /usr/sbin/zabbix_server: history poller #1 [got 0 values in 0.000025 sec, idle 5 sec]
           ├─112008 /usr/sbin/zabbix_server: history poller #2 [got 0 values in 0.000011 sec, idle 5 sec]
           ├─112009 /usr/sbin/zabbix_server: history poller #3 [got 0 values in 0.000021 sec, idle 5 sec]
           ├─112010 /usr/sbin/zabbix_server: history poller #4 [got 0 values in 0.000011 sec, idle 5 sec]
           ├─112011 /usr/sbin/zabbix_server: history poller #5 [got 0 values in 0.000023 sec, idle 4 sec]
           ├─112012 /usr/sbin/zabbix_server: availability manager #1 [queued 0, processed 0 values, idle 5.005267 sec during 5.005344 sec]
           ├─112013 /usr/sbin/zabbix_server: trigger housekeeper [deleted 0 problems records in 0.000465 sec, idle for 60 second(s)]
           └─112014 /usr/sbin/zabbix_server: odbc poller #1 [got 0 values in 0.000025 sec, idle 5 sec]

Apr 23 22:19:13 cn-tx-bj7-c8 systemd[1]: Starting Zabbix Server...
Apr 23 22:19:13 cn-tx-bj7-c8 zabbix_server[111947]: /usr/sbin/zabbix_server: /usr/pgsql-14/lib/libpq.so.5: no version information available (required by /usr/sbin/zabbix_server)
Apr 23 22:19:14 cn-tx-bj7-c8 systemd[1]: zabbix-server.service: Supervising process 111949 which is not our child. We'll most likely not notice when it exits.
Apr 23 22:19:14 cn-tx-bj7-c8 systemd[1]: Started Zabbix Server.

And the version is the latest 6.0 LTS:

[root@cn-tx-bj7-c8 ~]# zabbix_server -V
zabbix_server: /usr/pgsql-14/lib/libpq.so.5: no version information available (required by zabbix_server)
zabbix_server (Zabbix) 6.0.3
Revision 506e2b51e2 4 April 2022, compilation time: Apr  4 2022 17:03:26

Copyright (C) 2022 Zabbix SIA
License GPLv2 : GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it according to
the license. There is NO WARRANTY, to the extent permitted by law.

This product includes software developed by the OpenSSL Project
for use in the OpenSSL Toolkit (http://www.openssl.org/).

Compiled with OpenSSL 1.1.1g FIPS  21 Apr 2020
Running with OpenSSL 1.1.1k  FIPS 25 Mar 2021
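
If this version check needs to run in a script, the version number can be pulled out of the -V output. This is a minimal sketch: both function names are made up for illustration, and the pattern relies only on the output format shown above.

```shell
#!/bin/sh
# Read `zabbix_server -V` output on stdin and print the version, e.g. 6.0.3.
# Lines such as the libpq warning are skipped because they do not match.
zabbix_version_from_output() {
    sed -n -E 's/^zabbix_server \(Zabbix\) ([0-9]+\.[0-9]+\.[0-9]+).*$/\1/p' | head -n 1
}

# Succeed only when the given version belongs to the 6.0 LTS series.
is_zabbix_60() {
    case "$1" in
        6.0.*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

A possible use on the server: ver="$(zabbix_server -V 2>&1 | zabbix_version_from_output)" and then is_zabbix_60 "$ver" && echo "running 6.0 LTS".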

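Finally, open port 81 and start the initial setup (nothing here needs much explanation, so I simply posted all the screenshots).

[Screenshot: Next]
[Screenshot: Next]
[Screenshot: "the best language in the world"]
[Screenshot: configuring the PG database]
[Screenshot: other settings]
[Screenshot: final confirmation]
[Screenshot: installation complete]
[Screenshot: home page]

As you can see, the machine Zabbix is installed on is already set up for monitoring by default.

[Screenshot: Zabbix server]

Data for all kinds of metrics has already been collected as well.

Take one metric as an example: the graph of the cloud server's available memory over this month.

Of course, you can also look at the preconfigured "Graphs" and "Dashboards".

At this point the Zabbix monitoring system is up and running and 50% of this article is done. What remains is adding devices to monitoring.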

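0x03. Adding a Windows Cloud Server via Zabbix Agent

To add a Windows cloud server, Zabbix agent has to be installed on it. Here I chose the second-generation version written in Go (Zabbix agent 2). On Windows, just double-click the MSI and it is installed as a system service.

[Screenshot: Zabbix agent 2 v6.0.4]

The only thing you must fill in is the IP address of the Zabbix server. If you get it wrong, you can change it in the configuration file and restart the service.

Both machines sit in private networks (VPC) in the Beijing region and private interconnection (Cloud Connect Network) is configured, so the cloud server and the Lighthouse instance can reach each other over the private network.

I entered the private address here. A public address would also work, but outbound traffic from the Lighthouse instance (the Zabbix server) counts against its traffic quota, so why not use the private network?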
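
Since the point is to keep agent traffic on the private network, it can be worth checking that the address entered for the server really is a private (RFC 1918) one. The sketch below is illustrative only: is_private_ipv4 and render_agent_conf are hypothetical helpers, the address in the usage note is a placeholder, and Server, ServerActive and Hostname are the standard agent configuration parameters.

```shell
#!/bin/sh
# Loose check: succeed if the argument looks like a dotted IPv4 address
# in 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16. Only the first two
# octets are inspected; octet ranges (0-255) are not validated.
is_private_ipv4() {
    local ip="$1" a b rest
    case "$ip" in
        ''|*[!0-9.]*|*.*.*.*.*) return 1 ;;   # empty, bad characters, too many dots
        *.*.*.*) ;;                            # exactly four dot-separated parts
        *) return 1 ;;
    esac
    a="${ip%%.*}"; rest="${ip#*.}"; b="${rest%%.*}"
    [ -n "$a" ] && [ -n "$b" ] || return 1
    if [ "$a" -eq 10 ]; then return 0; fi
    if [ "$a" -eq 172 ] && [ "$b" -ge 16 ] && [ "$b" -le 31 ]; then return 0; fi
    if [ "$a" -eq 192 ] && [ "$b" -eq 168 ]; then return 0; fi
    return 1
}

# Print the three agent settings that tie an agent to its server.
# $1: Zabbix server address, $2: host name as registered in the frontend
render_agent_conf() {
    printf 'Server=%s\nServerActive=%s\nHostname=%s\n' "$1" "$1" "$2"
}
```

For example, is_private_ipv4 10.0.4.12 && render_agent_conf 10.0.4.12 cn-tx-bj3-w9d prints the lines to compare against the agent's configuration file.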

Then add it under "Configuration" > "Hosts" on the Zabbix server. You can link an official template to it, so there is no need to create items by hand.
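
The same host can presumably also be created through the Zabbix JSON-RPC API instead of the web form. The sketch below only builds the request body for host.create. The function name, token, group ID and template ID are placeholders to be looked up on your own server, and the values are inserted without JSON escaping, so it suits simple strings only. It follows the Zabbix 6.0 convention of passing the token in the auth field.

```shell
#!/bin/sh
# Hypothetical helper: print a host.create request for an agent-monitored host.
# $1 API token, $2 host name, $3 agent IP, $4 host group ID, $5 template ID
# Interface type 1 means "Zabbix agent"; 10050 is the default agent port.
host_create_payload() {
    printf '{"jsonrpc":"2.0","method":"host.create","params":{'
    printf '"host":"%s",' "$2"
    printf '"interfaces":[{"type":1,"main":1,"useip":1,"ip":"%s","dns":"","port":"10050"}],' "$3"
    printf '"groups":[{"groupid":"%s"}],' "$4"
    printf '"templates":[{"templateid":"%s"}]' "$5"
    printf '},"auth":"%s","id":1}\n' "$1"
}
```

The output could then be piped to curl -s -H 'Content-Type: application/json-rpc' -d @- against api_jsonrpc.php on the frontend (port 81 in this setup).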

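[Screenshot: cn-tx-bj3-w9d]

The official template created 129 items.

Again picking one metric: the graph of the cloud server's used memory over this month.

Here are two of the "Dashboards" as well:

[Screenshot: system performance]
[Screenshot: network interfaces]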

0x04. Adding a MikroTik Router via SNMP

First, a short introduction to MikroTik, from the official site: https://mikrotik.com/aboutus

MikroTik is a Latvian company which was founded in 1996 to develop routers and wireless ISP systems. MikroTik now provides hardware and software for Internet connectivity in most of the countries around the world. Our experience in using industry standard PC hardware and complete routing systems allowed us in 1997 to create the RouterOS software system that provides extensive stability, controls, and flexibility for all kinds of data interfaces and routing. In 2002 we decided to make our own hardware, and the RouterBOARD brand was born. We have resellers in most parts of the world, and customers in probably every country on the planet. Our company is located in Riga, the capital city of Latvia and has more than 280 employees.

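It is a Latvian company that mainly develops routers and wireless ISP systems. It not only makes hardware but also released the well-known RouterOS (ROS) router operating system.

[Screenshot: RB5009]

The router at home is the latest model, the RB5009; the exact model is RB5009UG+S+IN. Product page: https://mikrotik.com/product/rb5009ug_s_in#fndtn-testresults

[Screenshot: RB5009UG+S+IN]

It has one 10G SFP+ optical port and one 2.5 Gigabit Ethernet port; the remaining 7 ports are auto-negotiating 10/100/1000 Ethernet ports.

[Screenshot: top view]
[Screenshot: ports]

Back to the topic. For adding the router to Zabbix, see this article: https://techexpert.tips/zabbix/monitor-mikrotik-zabbix/

Step 1. First, enable SNMP on the router

Although the public community only has read-only access, for security it is recommended to enable the strictest SNMPv3 mode, authPriv (authentication plus encryption).

After editing in the web interface, you can view the current SNMP configuration from the terminal: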

[admin@MikroTik] > snmp print 
          enabled: yes
          contact: yuangezhizao <root@yuangezhizao.cn>
         location: PYDL
        engine-id: 
      src-address: ::
      trap-target: 
   trap-community: public
     trap-version: 2
  trap-generators: interfaces,start-trap,temp-exception
  trap-interfaces: all 

Then verify with snmpwalk, which shows that everything works:

pi@rpi-master:~ $ snmpwalk -v3 -c public -u public 192.168.25.254
iso.3.6.1.2.1.1.1.0 = STRING: "RouterOS RB5009UG S "
iso.3.6.1.2.1.1.2.0 = OID: iso.3.6.1.4.1.14988.1
iso.3.6.1.2.1.1.3.0 = Timeticks: (6138000) 17:03:00.00
iso.3.6.1.2.1.1.4.0 = STRING: "yuangezhizao <root@yuangezhizao.cn>"
iso.3.6.1.2.1.1.5.0 = STRING: "MikroTik"
iso.3.6.1.2.1.1.6.0 = STRING: "PYDL"
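
The walk above passes only a user name, so it presumably runs without authentication or encryption (and, as far as I know, -c is a v1/v2c community option that plays no role with -v3). Once an authPriv user is configured on the router, a check along the following lines should work. This is a sketch under assumptions: the user name and passphrases are placeholders, SHA and AES must match what is configured in RouterOS, and snmpv3_walk is a hypothetical wrapper around net-snmp's snmpwalk.

```shell
#!/bin/sh
# Hypothetical wrapper: walk a subtree over SNMPv3 at the authPriv level.
# $1 host, $2 user, $3 auth passphrase, $4 privacy passphrase,
# $5 OID (optional, defaults to the "system" subtree 1.3.6.1.2.1.1)
# Set SNMPWALK=echo to print the arguments instead of contacting a device.
snmpv3_walk() {
    local host="$1" user="$2" auth="$3" priv="$4" oid="${5:-1.3.6.1.2.1.1}"
    ${SNMPWALK:-snmpwalk} -v3 -l authPriv -u "$user" \
        -a SHA -A "$auth" -x AES -X "$priv" "$host" "$oid"
}
```

With placeholder credentials, the call would look like snmpv3_walk 192.168.25.254 monitor 'auth-passphrase' 'priv-passphrase'. If the passphrases are wrong the walk should fail, which is the point of testing at the authPriv level.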

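Step 2. Then add the host (the router) in Zabbix

Again, link an official template.

The official Zabbix templates really do include one for this MikroTik router; it is the second to last in the list.

As of now, the official template has created 206 items.

Picking one more metric: not memory this time, but the ICMP response time. There is a fairly long stretch with no response. At 05:23 today the ONU stick failed, so the home broadband could not dial up and was cut off from the Internet for several hours. I suspect the cause is a weak optical signal (a reading of around 26 to 27); it could die again without warning at any moment...

0x05. The Zabbix Ecosystem

That covers both additional ways of adding devices to monitoring: Zabbix agent 2 and SNMP.

Looking at the bigger picture, though, adding devices to monitoring is only the first step. What matters is making effective use of the collected data. The features page also describes how to detect anomalous data, raise alerts, and even take action automatically. All of that takes further time and effort; getting the devices into monitoring does not mean the job is done.

One more example: the "Problems" page shows the built-in alerting rules at work.

Take this entry, even though it is only of INFO severity: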

Interface ether1(): Ethernet has changed to lower speed than it was before

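That's right: at that point I had swapped the ONU stick back for the optical modem, so the speed of the ether1 port dropped from 2.5 Gbps to 1 Gbps.

0x06. Afterword

I had thought that covering only the Zabbix setup would not take long, but writing it up ended up taking nearly 4 hours.

Still, I came away with a deeper understanding of Zabbix, which is a gain in itself. Zabbix has a complete monitoring ecosystem.

As I said, adding devices to monitoring is only the first step and what comes after is what matters, but for reasons of length that will have to wait for later articles.

Finally, thanks to the 云+社区 (Cloud+ Community) for organizing this "Playing with Lighthouse" prize essay contest, and I wish the community all the best!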

2022-05-04

远哥制造
