问题分析解决
昨天突然服务器重启了,最后导致的就是Zabbix的数据库MYSQL库表坏了,然后MYSQL就启动不了了。启动不了咋整,看log呗,报什么异常情况,查看error如下:
代码语言:javascript复制2017-09-21 14:41:18 4255 [Note] InnoDB: The InnoDB memory heap is disabled
2017-09-21 14:41:18 4255 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-09-21 14:41:18 4255 [Note] InnoDB: Compressed tables use zlib 1.2.3
2017-09-21 14:41:18 4255 [Note] InnoDB: CPU does not support crc32 instructions
2017-09-21 14:41:18 4255 [Note] InnoDB: Using Linux native AIO
2017-09-21 14:41:18 4255 [Note] InnoDB: Initializing buffer pool, size = 256.0M
2017-09-21 14:41:18 4255 [Note] InnoDB: Completed initialization of buffer pool
2017-09-21 14:41:18 4255 [Note] InnoDB: Highest supported file format is Barracuda.
2017-09-21 14:41:18 4255 [Note] InnoDB: Log scan progressed past the checkpoint lsn 491181006779
2017-09-21 14:41:18 4255 [Note] InnoDB: Database was not shutdown normally!
2017-09-21 14:41:18 4255 [Note] InnoDB: Starting crash recovery.
2017-09-21 14:41:18 4255 [Note] InnoDB: Reading tablespace information from the .ibd files...
2017-09-21 14:41:18 4255 [ERROR] InnoDB: Attempted to open a previously opened tablespace. Previous tablespace mysql/slave_master_info uses space ID: 4 at filepath: ./mysql/slave_master_info.ibd. Cannot open tablespace zabbix/groups which uses space ID: 4 at filepath: ./zabbix/groups.ibd
2017-09-21 14:41:18 7fa123271720 InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create
InnoDB: directories yourself, InnoDB does not create them.
InnoDB: Error: could not open single-table tablespace file ./zabbix/groups.ibd
InnoDB: We do not continue the crash recovery, because the table may become
InnoDB: corrupt if we cannot apply the log records in the InnoDB log to it.
InnoDB: To fix the problem and start mysqld:
InnoDB: 1) If there is a permission problem in the file and mysqld cannot
InnoDB: open the file, you should modify the permissions.
InnoDB: 2) If the table is not needed, or you can restore it from a backup,
InnoDB: then you can remove the .ibd file, and InnoDB will do a normal
InnoDB: crash recovery and ignore that table.
InnoDB: 3) If the file system or the disk is broken, and you cannot remove
InnoDB: the .ibd file, you can set innodb_force_recovery > 0 in my.cnf
InnoDB: and force InnoDB to continue crash recovery here.
170921 14:41:18 mysqld_safe mysqld from pid file /data/appData/mysql/zabbix_server.pid ended
170921 14:43:16 mysqld_safe Starting mysqld daemon with databases from /data/appData/mysql
2017-09-21 14:43:16 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-09-21 14:43:16 4985 [Note] Plugin 'FEDERATED' is disabled.
2017-09-21 14:43:16 7fe0b7d51720 InnoDB: Warning: Using innodb_additional_mem_pool_size is DEPRECATED. This option may be removed in future releases, together with the option innodb_use_sys_malloc and with the InnoDB's internal memory allocator.
从log中可以看出来,一些InnoDB 表的表空间都有问题,log还明确的告诉你了you can set innodb_force_recovery > 0 in my.cnf
and force InnoDB to continue crash recovery here.
那就试试呗,在/etc/my.cnf
的[mysqld]
下增加 innodb_force_recovery = 1
试试,添加完成后,果然MYSQL启动ok了。
启动ok了,但是凭经验应该会发现好多表,是非ok状态的,具体情况如下:
代码语言:javascript复制mysql> check table groups;
--------------- ------- ---------- ---------------------------------------------------------
| Table | Op | Msg_type | Msg_text |
--------------- ------- ---------- ---------------------------------------------------------
| zabbix.groups | check | Warning | InnoDB: Tablespace is missing for table 'zabbix/groups' |
| zabbix.groups | check | Error | Table 'zabbix.groups' doesn't exist |
| zabbix.groups | check | status | Operation failed |
--------------- ------- ---------- ---------------------------------------------------------
3 rows in set (0.00 sec)
果不其然,就拿上面的groups表状态来说,就是缺少表空间,那咋办,看看repair能够修复不:
代码语言:javascript复制mysql> repair table groups;
--------------- -------- ---------- ---------------------------------------------------------
| Table | Op | Msg_type | Msg_text |
--------------- -------- ---------- ---------------------------------------------------------
| zabbix.groups | repair | Warning | InnoDB: Tablespace is missing for table 'zabbix/groups' |
| zabbix.groups | repair | Error | Table 'zabbix.groups' doesn't exist |
| zabbix.groups | repair | status | Operation failed |
--------------- -------- ---------- ---------------------------------------------------------
3 rows in set (0.00 sec)
但是不起作用啊,只能再次看看log然后分析问题了,log内容如下:
代码语言:javascript复制InnoDB: Error: could not open single-table tablespace file ./zabbix/groups.ibd
InnoDB: We do not continue the crash recovery, because the table may become
InnoDB: corrupt if we cannot apply the log records in the InnoDB log to it.
还是跟之前一样,没辙只能上Google找找资料看看了;从网上看到差不多的文章说需要设置如下:
代码语言:javascript复制innodb_force_recovery = 6
innodb_purge_thread = 1
先不管什么意思了,先配置上看看能不能修复表了,配置之后果然服务也启动起来了,表也修复好了,具体check table
结果如下:
zabbix.acknowledges check status OK
zabbix.actions check status OK
zabbix.alerts check status OK
zabbix.application_discovery check status OK
zabbix.application_prototype check status OK
zabbix.application_template check status OK
zabbix.applications check status OK
zabbix.auditlog check status OK
zabbix.auditlog_details check status OK
zabbix.autoreg_host check status OK
zabbix.conditions check status OK
zabbix.config check status OK
zabbix.corr_condition check status OK
zabbix.corr_condition_group check status OK
zabbix.corr_condition_tag check status OK
zabbix.corr_condition_tagpair check status OK
zabbix.corr_condition_tagvalue check status OK
zabbix.corr_operation check status OK
zabbix.correlation check status OK
zabbix.dbversion check status OK
zabbix.dchecks check status OK
zabbix.dhosts check status OK
zabbix.drules check status OK
zabbix.dservices check status OK
zabbix.escalations check status OK
zabbix.event_recovery check status OK
zabbix.event_tag check status OK
zabbix.events check status OK
zabbix.expressions check status OK
zabbix.functions check status OK
zabbix.globalmacro check status OK
zabbix.globalvars check status OK
zabbix.graph_discovery check status OK
zabbix.graph_theme check status OK
zabbix.graphs check status OK
zabbix.graphs_items check status OK
zabbix.group_discovery check status OK
zabbix.group_prototype check status OK
zabbix.groups check status OK
zabbix.history check status OK
zabbix.history_log check status OK
zabbix.history_str check status OK
zabbix.history_text check status OK
zabbix.history_uint check status OK
zabbix.host_discovery check status OK
zabbix.host_inventory check status OK
zabbix.hostmacro check status OK
zabbix.hosts check status OK
zabbix.hosts_groups check status OK
zabbix.hosts_templates check status OK
zabbix.housekeeper check status OK
zabbix.httpstep check status OK
zabbix.httpstepitem check status OK
zabbix.httptest check status OK
zabbix.httptestitem check status OK
zabbix.icon_map check status OK
zabbix.icon_mapping check status OK
zabbix.ids check status OK
zabbix.images check status OK
zabbix.interface check status OK
zabbix.interface_discovery check status OK
zabbix.item_application_prototype check status OK
zabbix.item_condition check status OK
zabbix.item_discovery check status OK
zabbix.items check status OK
zabbix.items_applications check status OK
zabbix.maintenances check status OK
zabbix.maintenances_groups check status OK
zabbix.maintenances_hosts check status OK
zabbix.maintenances_windows check status OK
zabbix.mappings check status OK
zabbix.media check status OK
zabbix.media_type check status OK
zabbix.opcommand check status OK
zabbix.opcommand_grp check status OK
zabbix.opcommand_hst check status OK
zabbix.opconditions check status OK
zabbix.operations check status OK
zabbix.opgroup check status OK
zabbix.opinventory check status OK
zabbix.opmessage check status OK
zabbix.opmessage_grp check status OK
zabbix.opmessage_usr check status OK
zabbix.optemplate check status OK
zabbix.problem check status OK
zabbix.problem_tag check status OK
zabbix.profiles check status OK
zabbix.proxy_autoreg_host check status OK
zabbix.proxy_dhistory check status OK
zabbix.proxy_history check status OK
zabbix.regexps check status OK
zabbix.rights check status OK
zabbix.screen_user check status OK
zabbix.screen_usrgrp check status OK
zabbix.screens check status OK
zabbix.screens_items check status OK
zabbix.scripts check status OK
zabbix.service_alarms check status OK
zabbix.services check status OK
zabbix.services_links check status OK
zabbix.services_times check status OK
zabbix.sessions check status OK
zabbix.slides check status OK
zabbix.slideshow_user check status OK
zabbix.slideshow_usrgrp check status OK
zabbix.slideshows check status OK
zabbix.sysmap_element_url check status OK
zabbix.sysmap_url check status OK
zabbix.sysmap_user check status OK
zabbix.sysmap_usrgrp check status OK
zabbix.sysmaps check status OK
zabbix.sysmaps_elements check status OK
zabbix.sysmaps_link_triggers check status OK
zabbix.sysmaps_links check status OK
zabbix.task check status OK
zabbix.task_close_problem check status OK
zabbix.timeperiods check status OK
zabbix.trends check status OK
zabbix.trends_uint check status OK
zabbix.trigger_depends check status OK
zabbix.trigger_discovery check status OK
zabbix.trigger_tag check status OK
zabbix.triggers check status OK
zabbix.users check status OK
zabbix.users_groups check status OK
zabbix.usrgrp check status OK
zabbix.valuemaps check status OK
统计了一下Zabbix库下的127张表,状态都是OK的,那目前来看是没有问题了,手动查询了groups、history_text等几个表查询都是没有问题的。
注: 在check table的时候,遇到大数据的表会比较慢,耐心等待即可!
看起来基本没有问题了,那咱就先把Zabbix Server启动起来看看吧,然后用tail命令MYSQL的error日志中还会有什么异常情况。
麻蛋,发现还有错误,日记如下:
代码语言:javascript复制2017-09-21 15:59:21 7f31b1d29700 InnoDB: Error: Table "mysql"."innodb_table_stats" not found.
2017-09-21 15:59:21 7f31b1d29700 InnoDB: Recalculation of persistent statistics requested for table "zabbix"."escalations" but the required persistent statistics storage is not present or is corrupted. Using transient stats instead.
2017-09-21 15:59:31 7f31b1d29700 InnoDB: Error: Table "mysql"."innodb_table_stats" not found.
2017-09-21 15:59:31 7f31b1d29700 InnoDB: Recalculation of persistent statistics requested for table "zabbix"."housekeeper" but the required persistent statistics storage is not present or is corrupted. Using transient stats instead.
2017-09-21 15:59:49 7f31b6933700 InnoDB: Error: table `mysql`.`innodb_table_stats` does not exist in the InnoDB internal
InnoDB: data dictionary though MySQL is trying to drop it.
InnoDB: Have you copied the .frm file of the table to the
InnoDB: MySQL database directory from another database?
InnoDB: You can look for further help from
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting.html
2017-09-21 15:31:05 7f31b68b1700 InnoDB: Error: Table "mysql"."innodb_index_stats" not found.
2017-09-21 15:31:05 7f31b68b1700 InnoDB: Error: Fetch of persistent statistics requested for table "zabbix"."problem_tag" but the required system tables mysql.innodb_table_stats and mysql.innodb_index_stats are not present or have unexpected structure. Using transient stats instead.
2017-09-21 16:00:04 8996 [Warning] InnoDB: Cannot open table mysql/slave_relay_log_info from the internal data dictionary of InnoDB though the .frm file for the table exists. See http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting.html for how you can resolve the problem.
2017-09-21 16:00:04 8996 [Warning] InnoDB: Cannot open table mysql/slave_master_info from the internal data dictionary of InnoDB though the .frm file for the table exists. See http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting.html for how you can resolve the problem.
2017-09-21 16:00:04 8996 [Warning] InnoDB: Cannot open table mysql/slave_worker_info from the internal data dictionary of InnoDB though the .frm file for the table exists. See http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting.html for how you can resolve the problem.
仔细看看都是一些系统表的报错innodb_table_stats
,innodb_index_stats
,slave_master_info
,slave_worker_info
等表的报错。
这种情况之前遇到过,需要清理系统表数据和删除表,然后导入mysql_system_tables.sql
即可,操作如下:
1、登录数据库,进入mysql库,执行如下SQL删除5张表
代码语言:javascript复制mysql> use mysql;
mysql> drop table if exists innodb_index_stats;
mysql> drop table if exists innodb_table_stats;
mysql> drop table if exists slave_master_info;
mysql> drop table if exists slave_relay_log_info;
mysql> drop table if exists slave_worker_info;
执行完后,可以用show tables查看一下,看表的数据是否已经比删除之前减少了,如果减少了,说明你成功了!
记住,一定要是drop table if exists
2、停止数据库,进入到数据库数据文件所在目录,删除上面5个表所对应的idb文件
代码语言:javascript复制/etc/init.d/mysqld stop
cd /data/appData/mysql/mysql
rm -rf innodb_index_stats.ibd innodb_table_stats.ibd slave_master_info.ibd slave_relay_log_info.ibd slave_worker_info.ibd
3、重新启动数据库,进入到mysql库,重建上面被删除的表结构 数据库的建表脚本在mysql软件的安装目录的share目录下或者mysql的安装包的script目录下,我的mysql软件的安装路径为/data/app/mysql-3307/
代码语言:javascript复制# /etc/init.d/mysqld start
# mysql -uxx -poo -P 3307
mysql> use mysql;
mysql> source /data/app/mysql-3307/share/mysql/mysql_system_tables.sql;
mysql> show tables;
---------------------------
| Tables_in_mysql |
---------------------------
| columns_priv |
| db |
| event |
| func |
| general_log |
| help_category |
| help_keyword |
| help_relation |
| help_topic |
| innodb_index_stats |
| innodb_table_stats |
| ndb_binlog_index |
| plugin |
| proc |
| procs_priv |
| proxies_priv |
| servers |
| slave_master_info |
| slave_relay_log_info |
| slave_worker_info |
| slow_log |
| tables_priv |
| time_zone |
| time_zone_leap_second |
| time_zone_name |
| time_zone_transition |
| time_zone_transition_type |
| user |
---------------------------
28 rows in set (0.00 sec)mysql> desc innodb_table_stats;
-------------------------- --------------------- ------ ----- ------------------- -----------------------------
| Field | Type | Null | Key | Default | Extra |
-------------------------- --------------------- ------ ----- ------------------- -----------------------------
| database_name | varchar(64) | NO | PRI | NULL | |
| table_name | varchar(64) | NO | PRI | NULL | |
| last_update | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| n_rows | bigint(20) unsigned | NO | | NULL | |
| clustered_index_size | bigint(20) unsigned | NO | | NULL | |
| sum_of_other_index_sizes | bigint(20) unsigned | NO | | NULL | |
-------------------------- --------------------- ------ ----- ------------------- -----------------------------
6 rows in set (0.00 sec)
说明表都正常了,再次查看mysql报错日志,就会发现没有了关于这系统表的报错日志,到这里就所有的故障和错误都处理好了。
分析总结
MYSQL参数之innodb_force_recovery
innodb_force_recovery影响整个InnoDB存储引擎的恢复状况。默认为0,表示当需要恢复时执行所有的.
innodb_force_recovery可以设置为1-6,大的数字包含前面所有数字的影响。当设置参数值大于0后,可以对表进行select,create,drop操作,但insert,update或者delete这类操作是不允许的。
- (SRV_FORCE_IGNORE_CORRUPT):忽略检查到的corrupt页。
- (SRV_FORCE_NO_BACKGROUND):阻止主线程的运行,如主线程需要执行full purge操作,会导致crash。
- (SRV_FORCE_NO_TRX_UNDO):不执行事务回滚操作。
- (SRV_FORCE_NO_IBUF_MERGE):不执行插入缓冲的合并操作。
- (SRV_FORCE_NO_UNDO_LOG_SCAN):不查看重做日志,InnoDB存储引擎会将未提交的事务视为已提交。
- (SRV_FORCE_NO_LOG_REDO):不执行前滚的操作。
官网介绍:https://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
MYSQL参数之innodb_purge_threads
innodb_purge_threads 将purge线程从master线程分离出来,提高cpu使用率提升存储引擎性能,innodb1.2之后可以设置多个purge线程。
这里的一个重要知识点就是 对 innodb_force_recovery 参数的理解了,要是遇到数据损坏甚至是其他的损坏。可能上面的方法不行了,需要尝试另一个方法:insert into tb select * from ta limit X;甚至是dump出去,再load回来。