Monitoring ZooKeeper with Zabbix 3.4
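Key points for monitoring ZooKeeper
System monitoring
This part is covered by ordinary Linux host monitoring plus a few kernel parameter changes on the server.
Memory usage: ZooKeeper should run entirely in memory and must not touch swap. The Java heap size must not exceed the available memory.
Swap usage: using swap degrades ZooKeeper performance; set vm.swappiness = 0.
Network bandwidth usage: if ZooKeeper performance drops, check network bandwidth usage and packet loss. A typical ZooKeeper workload is about 20% writes and 80% reads.
Disk usage: keep an eye on how full the ZooKeeper data directory is.
Disk I/O: ZooKeeper writes to disk asynchronously, so it does not generate heavy I/O requests. If ZooKeeper shares disks with other I/O-intensive services, watch the disk I/O.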
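The swap requirement above can be checked with a few lines of shell. This is a minimal sketch, not part of the original script: the function name swap_used_kb is hypothetical, and it assumes the usual /proc/meminfo layout ("SwapTotal:" and "SwapFree:" followed by a value in kB).

```shell
#!/bin/bash
# Hypothetical helper: read /proc/meminfo-style text on stdin and
# print the amount of swap currently in use, in kB.
swap_used_kb() {
    awk '$1 == "SwapTotal:" { total = $2 }
         $1 == "SwapFree:"  { free  = $2 }
         END { print total - free }'
}

# Typical use on a Linux host (left as comments so the block has no side effects):
#   swap_used_kb < /proc/meminfo     # anything above 0 means swap is in use
#   sysctl vm.swappiness             # show the current setting
#   sysctl -w vm.swappiness=0        # apply the value recommended above (needs root)
```

A non-zero result on a ZooKeeper host is a reasonable thing to alert on, since the section above asks for swap not to be used at all.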
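ZooKeeper monitoring
zk_avg/min/max_latency (average / minimum / maximum latency): time taken to respond to a client request. Alerting is recommended when this exceeds 10 ticks.
zk_outstanding_requests (queued requests): number of requests waiting in the queue. This value grows when ZooKeeper is pushed beyond its processing capacity; a suggested alert threshold is 10.
zk_packets_received (packets received): number of client request packets received.
zk_packets_sent (packets sent): number of packets sent to clients, mainly responses and notifications.
zk_max_file_descriptor_count (maximum file descriptors): maximum number of files allowed to be open, controlled by ulimit.
zk_open_file_descriptor_count (open file descriptors): number of files currently open. Alert when this exceeds 85% of the allowed maximum.
Mode: the role the server is running in. A server that has not joined a cluster is standalone; a server in a cluster is follower or leader.
zk_followers (follower count): reported only by the leader. The number of followers in the ensemble; the normal value is the number of ensemble members minus 1.
zk_pending_syncs (pending syncs): reported only by the leader. The number of pending syncs.
zk_znode_count (znode count): number of znodes.
zk_watch_count (watch count): number of watches.
Java Heap Size: of the ZooKeeper Java process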
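The 85% file descriptor rule above can be turned into a number that Zabbix can compare against a threshold. The following is a sketch under stated assumptions: the function name fd_usage_pct is hypothetical, and it assumes mntr prints one whitespace-separated "key value" pair per line.

```shell
#!/bin/bash
# Hypothetical helper: read `mntr` output on stdin and print the share of
# file descriptors in use as a whole percentage (truncated, not rounded).
# Prints -1 when the maximum is missing, so a bad reading is easy to spot.
fd_usage_pct() {
    awk '$1 == "zk_open_file_descriptor_count" { open = $2 }
         $1 == "zk_max_file_descriptor_count"  { max  = $2 }
         END {
             if (max > 0) printf "%d\n", open * 100 / max
             else print -1
         }'
}

# Typical use against a local server:
#   echo mntr | nc 127.0.0.1 2181 | fd_usage_pct
```

With this in place, the alert condition from the list above becomes "value greater than 85".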
Monitoring script
[root@lanzhu-linux-nginx-cn summer]# cat check_zookeeper.sh
#!/bin/bash
function imok {
    echo ruok | nc 127.0.0.1 2181 | wc -l
}
function zk_min_latency { echo mntr | nc 127.0.0.1 2181 | grep "
function zk_avg_latency { echo mntr | nc 127.0.0.1 2181 | grep "
function zk_max_latency { echo mntr | nc 127.0.0.1 2181 | grep "
function zk_outstanding_requests { echo mntr | nc 127.0.0.1 2181 | grep "
function zk_packets_received { echo mntr | nc 127.0.0.1 2181 | grep "
function zk_packets_sent { echo mntr | nc 127.0.0.1 2181 | grep "
function zk_znode_count { echo mntr | nc 127.0.0.1 2181 | grep "
function zk_watch_count { echo mntr | nc 127.0.0.1 2181 | grep "
# execute the function named by the first argument
$1
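The metric functions in the listing are cut off after grep ", so their exact filters are not recoverable here. As an illustration only, one way to pull a single value out of mntr output is shown below. The names mntr_value and zk_metric are hypothetical and are not taken from the original script; the host and port are the ones the script already uses.

```shell
#!/bin/bash
# Hypothetical helper: print the value of one key from `mntr` output on stdin.
# Matching the whole first field avoids partial matches that a plain grep
# could produce when one key name is a prefix of another.
mntr_value() {
    awk -v key="$1" '$1 == key { print $2 }'
}

# Hypothetical wrapper: query the local ZooKeeper and extract one metric.
zk_metric() {
    echo mntr | nc 127.0.0.1 2181 | mntr_value "$1"
}

# Typical use:
#   zk_metric zk_avg_latency
```

On newer ZooKeeper releases the four-letter-word commands may need to be enabled through the 4lw.commands.whitelist server setting before mntr and ruok answer; check this against the version in use.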
Add the following to the zabbix_agentd.conf configuration file
UserParameter=zookeeper.status[*],/alidata/summer/check_zookeeper.sh $1
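With this UserParameter, whatever appears between the brackets of the item key is passed to the script as $1, and the script then runs it as a command. A cautious variant restricts $1 to the function names the script defines. This is a sketch, and is_allowed_metric is a hypothetical name that does not appear in the original script.

```shell
#!/bin/bash
# Hypothetical guard: succeed only for the metric names defined in
# check_zookeeper.sh, so an unexpected item key cannot run another command.
is_allowed_metric() {
    case "$1" in
        imok|zk_min_latency|zk_avg_latency|zk_max_latency|\
        zk_outstanding_requests|zk_packets_received|zk_packets_sent|\
        zk_znode_count|zk_watch_count)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Possible last lines of the script, replacing the bare `$1`:
#   is_allowed_metric "$1" && "$1"
#
# The key can then be checked locally with the agent's test mode:
#   zabbix_agentd -t 'zookeeper.status[zk_avg_latency]'
```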
In the Zabbix web interface, import the zookeeper template (see the attachment)
<?xml version="1.0" encoding="UTF-8"?> <zabbix_export> <version>3.4</version> <date>2018-09-28T08:08:15Z</date> <groups> <group> <name>Templates</name> </group> </groups> <templates> <template> <template>Template App zookeeper</template> <name>Template App zookeeper</name> <description/> <groups> <group> <name>Templates</name> </group> </groups> <applications> <application> <name>zookeeper</name> </application> </applications> <items> <item> <name>zookeeper status running</name> <type>0</type> <snmp_community/> <snmp_oid/> <key>zookeeper.status[imok]</key> <delay>60</delay> <history>90d</history> <trends>365d</trends> <status>0</status> <value_type>3</value_type> <allowed_hosts/> <units/> <snmpv3_contextname/> <snmpv3_securityname/> <snmpv3_securitylevel>0</snmpv3_securitylevel> <snmpv3_authprotocol>0</snmpv3_authprotocol> <snmpv3_authpassphrase/> <snmpv3_privprotocol>0</snmpv3_privprotocol> <snmpv3_privpassphrase/> <params/> <ipmi_sensor/> <authtype>0</authtype> <username/> <password/> <publickey/> <privatekey/> <port/> <description/> <inventory_link>0</inventory_link> <applications> <application> <name>zookeeper</name> </application> </applications> <valuemap/> <logtimefmt/> <preprocessing/> <jmx_endpoint/> <master_item/> </item> <item> <name>zookeeper status zk_avg_latency</name> <type>0</type> <snmp_community/> <snmp_oid/> <key>zookeeper.status[zk_avg_latency]</key> <delay>60</delay> <history>90d</history> <trends>365d</trends> <status>0</status> <value_type>3</value_type> <allowed_hosts/> <units/> <snmpv3_contextname/> <snmpv3_securityname/> <snmpv3_securitylevel>0</snmpv3_securitylevel> <snmpv3_authprotocol>0</snmpv3_authprotocol> <snmpv3_authpassphrase/> <snmpv3_privprotocol>0</snmpv3_privprotocol> <snmpv3_privpassphrase/> <params/> <ipmi_sensor/> <authtype>0</authtype> <username/> <password/> <publickey/> <privatekey/> <port/> <description/> <inventory_link>0</inventory_link> <applications> <application> <name>zookeeper</name> </application> </applications> 
<valuemap/> <logtimefmt/> <preprocessing/> <jmx_endpoint/> <master_item/> </item> <item> <name>zookeeper status zk_max_latency</name> <type>0</type> <snmp_community/> <snmp_oid/> <key>zookeeper.status[zk_max_latency]</key> <delay>60</delay> <history>90d</history> <trends>365d</trends> <status>0</status> <value_type>3</value_type> <allowed_hosts/> <units/> <snmpv3_contextname/> <snmpv3_securityname/> <snmpv3_securitylevel>0</snmpv3_securitylevel> <snmpv3_authprotocol>0</snmpv3_authprotocol> <snmpv3_authpassphrase/> <snmpv3_privprotocol>0</snmpv3_privprotocol> <snmpv3_privpassphrase/> <params/> <ipmi_sensor/> <authtype>0</authtype> <username/> <password/> <publickey/> <privatekey/> <port/> <description/> <inventory_link>0</inventory_link> <applications> <application> <name>zookeeper</name> </application> </applications> <valuemap/> <logtimefmt/> <preprocessing/> <jmx_endpoint/> <master_item/> </item> <item> <name>zookeeper status zk_min_latency</name> <type>0</type> <snmp_community/> <snmp_oid/> <key>zookeeper.status[zk_min_latency]</key> <delay>60</delay> <history>90d</history> <trends>365d</trends> <status>0</status> <value_type>3</value_type> <allowed_hosts/> <units/> <snmpv3_contextname/> <snmpv3_securityname/> <snmpv3_securitylevel>0</snmpv3_securitylevel> <snmpv3_authprotocol>0</snmpv3_authprotocol> <snmpv3_authpassphrase/> <snmpv3_privprotocol>0</snmpv3_privprotocol> <snmpv3_privpassphrase/> <params/> <ipmi_sensor/> <authtype>0</authtype> <username/> <password/> <publickey/> <privatekey/> <port/> <description/> <inventory_link>0</inventory_link> <applications> <application> <name>zookeeper</name> </application> </applications> <valuemap/> <logtimefmt/> <preprocessing/> <jmx_endpoint/> <master_item/> </item> <item> <name>zookeeper status zk_outstanding_requests</name> <type>0</type> <snmp_community/> <snmp_oid/> <key>zookeeper.status[zk_outstanding_requests]</key> <delay>60</delay> <history>90d</history> <trends>365d</trends> <status>0</status> 
<value_type>3</value_type> <allowed_hosts/> <units/> <snmpv3_contextname/> <snmpv3_securityname/> <snmpv3_securitylevel>0</snmpv3_securitylevel> <snmpv3_authprotocol>0</snmpv3_authprotocol> <snmpv3_authpassphrase/> <snmpv3_privprotocol>0</snmpv3_privprotocol> <snmpv3_privpassphrase/> <params/> <ipmi_sensor/> <authtype>0</authtype> <username/> <password/> <publickey/> <privatekey/> <port/> <description/> <inventory_link>0</inventory_link> <applications> <application> <name>zookeeper</name> </application> </applications> <valuemap/> <logtimefmt/> <preprocessing/> <jmx_endpoint/> <master_item/> </item> <item> <name>zookeeper status zk_packets_received</name> <type>0</type> <snmp_community/> <snmp_oid/> <key>zookeeper.status[zk_packets_received]</key> <delay>60</delay> <history>90d</history> <trends>365d</trends> <status>0</status> <value_type>3</value_type> <allowed_hosts/> <units/> <snmpv3_contextname/> <snmpv3_securityname/> <snmpv3_securitylevel>0</snmpv3_securitylevel> <snmpv3_authprotocol>0</snmpv3_authprotocol> <snmpv3_authpassphrase/> <snmpv3_privprotocol>0</snmpv3_privprotocol> <snmpv3_privpassphrase/> <params/> <ipmi_sensor/> <authtype>0</authtype> <username/> <password/> <publickey/> <privatekey/> <port/> <description/> <inventory_link>0</inventory_link> <applications> <application> <name>zookeeper</name> </application> </applications> <valuemap/> <logtimefmt/> <preprocessing/> <jmx_endpoint/> <master_item/> </item> <item> <name>zk_packets_sent</name> <type>0</type> <snmp_community/> <snmp_oid/> <key>zookeeper.status[zk_packets_sent]</key> <delay>60</delay> <history>90d</history> <trends>365d</trends> <status>0</status> <value_type>3</value_type> <allowed_hosts/> <units/> <snmpv3_contextname/> <snmpv3_securityname/> <snmpv3_securitylevel>0</snmpv3_securitylevel> <snmpv3_authprotocol>0</snmpv3_authprotocol> <snmpv3_authpassphrase/> <snmpv3_privprotocol>0</snmpv3_privprotocol> <snmpv3_privpassphrase/> <params/> <ipmi_sensor/> <authtype>0</authtype> <username/> 
<password/> <publickey/> <privatekey/> <port/> <description/> <inventory_link>0</inventory_link> <applications> <application> <name>zookeeper</name> </application> </applications> <valuemap/> <logtimefmt/> <preprocessing/> <jmx_endpoint/> <master_item/> </item> <item> <name>zk_watch_count</name> <type>0</type> <snmp_community/> <snmp_oid/> <key>zookeeper.status[zk_watch_count]</key> <delay>60</delay> <history>90d</history> <trends>365d</trends> <status>0</status> <value_type>3</value_type> <allowed_hosts/> <units/> <snmpv3_contextname/> <snmpv3_securityname/> <snmpv3_securitylevel>0</snmpv3_securitylevel> <snmpv3_authprotocol>0</snmpv3_authprotocol> <snmpv3_authpassphrase/> <snmpv3_privprotocol>0</snmpv3_privprotocol> <snmpv3_privpassphrase/> <params/> <ipmi_sensor/> <authtype>0</authtype> <username/> <password/> <publickey/> <privatekey/> <port/> <description/> <inventory_link>0</inventory_link> <applications> <application> <name>zookeeper</name> </application> </applications> <valuemap/> <logtimefmt/> <preprocessing/> <jmx_endpoint/> <master_item/> </item> <item> <name>zookeeper status zk_znode_count</name> <type>0</type> <snmp_community/> <snmp_oid/> <key>zookeeper.status[zk_znode_count]</key> <delay>60</delay> <history>90d</history> <trends>365d</trends> <status>0</status> <value_type>3</value_type> <allowed_hosts/> <units/> <snmpv3_contextname/> <snmpv3_securityname/> <snmpv3_securitylevel>0</snmpv3_securitylevel> <snmpv3_authprotocol>0</snmpv3_authprotocol> <snmpv3_authpassphrase/> <snmpv3_privprotocol>0</snmpv3_privprotocol> <snmpv3_privpassphrase/> <params/> <ipmi_sensor/> <authtype>0</authtype> <username/> <password/> <publickey/> <privatekey/> <port/> <description/> <inventory_link>0</inventory_link> <applications> <application> <name>zookeeper</name> </application> </applications> <valuemap/> <logtimefmt/> <preprocessing/> <jmx_endpoint/> <master_item/> </item> </items> <discovery_rules/> <httptests/> <macros/> <templates/> <screens/> </template> 
</templates> <triggers> <trigger> <expression>{Template App zookeeper:zookeeper.status[imok].last()}<>0</expression> <recovery_mode>1</recovery_mode> <recovery_expression>{Template App zookeeper:zookeeper.status[imok].diff()}=0</recovery_expression> <name>php_39.108.161.17_zookeeper is down</name> <correlation_mode>0</correlation_mode> <correlation_tag/> <url/> <status>0</status> <priority>3</priority> <description/> <type>0</type> <manual_close>0</manual_close> <dependencies/> <tags/> </trigger> </triggers> <graphs> <graph> <name>zookeeper request time</name> <width>900</width> <height>200</height> <yaxismin>0.0000</yaxismin> <yaxismax>100.0000</yaxismax> <show_work_period>1</show_work_period> <show_triggers>1</show_triggers> <type>0</type> <show_legend>1</show_legend> <show_3d>0</show_3d> <percent_left>0.0000</percent_left> <percent_right>0.0000</percent_right> <ymin_type_1>0</ymin_type_1> <ymax_type_1>0</ymax_type_1> <ymin_item_1>0</ymin_item_1> <ymax_item_1>0</ymax_item_1> <graph_items> <graph_item> <sortorder>0</sortorder> <drawtype>0</drawtype> <color>1A7C11</color> <yaxisside>0</yaxisside> <calc_fnc>2</calc_fnc> <type>0</type> <item> <host>Template App zookeeper</host> <key>zookeeper.status[zk_avg_latency]</key> </item> </graph_item> <graph_item> <sortorder>1</sortorder> <drawtype>0</drawtype> <color>F63100</color> <yaxisside>0</yaxisside> <calc_fnc>2</calc_fnc> <type>0</type> <item> <host>Template App zookeeper</host> <key>zookeeper.status[zk_max_latency]</key> </item> </graph_item> <graph_item> <sortorder>2</sortorder> <drawtype>0</drawtype> <color>2774A4</color> <yaxisside>0</yaxisside> <calc_fnc>2</calc_fnc> <type>0</type> <item> <host>Template App zookeeper</host> <key>zookeeper.status[zk_min_latency]</key> </item> </graph_item> </graph_items> </graph> <graph> <name>zookeeper server status</name> <width>900</width> <height>200</height> <yaxismin>0.0000</yaxismin> <yaxismax>100.0000</yaxismax> <show_work_period>1</show_work_period> 
<show_triggers>1</show_triggers> <type>0</type> <show_legend>1</show_legend> <show_3d>0</show_3d> <percent_left>0.0000</percent_left> <percent_right>0.0000</percent_right> <ymin_type_1>0</ymin_type_1> <ymax_type_1>0</ymax_type_1> <ymin_item_1>0</ymin_item_1> <ymax_item_1>0</ymax_item_1> <graph_items> <graph_item> <sortorder>0</sortorder> <drawtype>0</drawtype> <color>1A7C11</color> <yaxisside>0</yaxisside> <calc_fnc>2</calc_fnc> <type>0</type> <item> <host>Template App zookeeper</host> <key>zookeeper.status[zk_packets_sent]</key> </item> </graph_item> <graph_item> <sortorder>1</sortorder> <drawtype>0</drawtype> <color>F63100</color> <yaxisside>0</yaxisside> <calc_fnc>2</calc_fnc> <type>0</type> <item> <host>Template App zookeeper</host> <key>zookeeper.status[zk_watch_count]</key> </item> </graph_item> <graph_item> <sortorder>2</sortorder> <drawtype>0</drawtype> <color>2774A4</color> <yaxisside>0</yaxisside> <calc_fnc>2</calc_fnc> <type>0</type> <item> <host>Template App zookeeper</host> <key>zookeeper.status[zk_outstanding_requests]</key> </item> </graph_item> <graph_item> <sortorder>3</sortorder> <drawtype>0</drawtype> <color>A54F10</color> <yaxisside>0</yaxisside> <calc_fnc>2</calc_fnc> <type>0</type> <item> <host>Template App zookeeper</host> <key>zookeeper.status[zk_packets_received]</key> </item> </graph_item> <graph_item> <sortorder>4</sortorder> <drawtype>0</drawtype> <color>FC6EA3</color> <yaxisside>0</yaxisside> <calc_fnc>2</calc_fnc> <type>0</type> <item> <host>Template App zookeeper</host> <key>zookeeper.status[zk_znode_count]</key> </item> </graph_item> </graph_items> </graph> <graph> <name>zookeeper status</name> <width>900</width> <height>200</height> <yaxismin>0.0000</yaxismin> <yaxismax>100.0000</yaxismax> <show_work_period>1</show_work_period> <show_triggers>1</show_triggers> <type>0</type> <show_legend>1</show_legend> <show_3d>0</show_3d> <percent_left>0.0000</percent_left> <percent_right>0.0000</percent_right> <ymin_type_1>0</ymin_type_1> 
<ymax_type_1>0</ymax_type_1> <ymin_item_1>0</ymin_item_1> <ymax_item_1>0</ymax_item_1> <graph_items> <graph_item> <sortorder>0</sortorder> <drawtype>1</drawtype> <color>1A7C11</color> <yaxisside>0</yaxisside> <calc_fnc>2</calc_fnc> <type>0</type> <item> <host>Template App zookeeper</host> <key>zookeeper.status[imok]</key> </item> </graph_item> </graph_items> </graph> </graphs> </zabbix_export>
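The trigger in the template treats any non-zero value of zookeeper.status[imok] as a problem, while the script's imok function reports a line count from wc -l. Since wc -l counts newline characters, it is worth confirming what that count is for a healthy and for an unreachable server on the version in use. As an alternative sketch, the reply text itself can be mapped to 0 (healthy) or 1 (problem), which matches the trigger expression. The function name zk_ok_flag is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read the reply to `ruok` on stdin and print
# 0 when it is exactly "imok", 1 otherwise (including an empty reply,
# which is what a refused or timed-out connection would produce).
zk_ok_flag() {
    reply=$(cat)
    if [ "$reply" = "imok" ]; then
        echo 0
    else
        echo 1
    fi
}

# Typical use as the body of the imok function:
#   echo ruok | nc 127.0.0.1 2181 | zk_ok_flag
```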