导读
有时候需要使用mysql的数据文件做恢复, 或者其它某些异常导致mysqld启动的时候报错如下:
代码语言:txt复制2024-09-13T02:20:33.489262Z 0 [Warning] [MY-013244] [Server] --collation-server: 'utf8_general_ci' is a collation of the deprecated character set UTF8MB3. Please consider using UTF8MB4 with an appropriate collation instead.
2024-09-13T02:20:33.495902Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2024-09-13T02:20:33.832733Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2024-09-13T02:20:33.919585Z 0 [ERROR] [MY-012095] [InnoDB] [FATAL] Tablespace id is 2 in the data dictionary but in file ./db1/t20240912.ibd it is 49591!
2024-09-13T02:20:33.919622Z 0 [ERROR] [MY-013183] [InnoDB] Assertion failure: fil0fil.cc:2582:ib::fatal triggered thread 140154526787328
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
02:20:33 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x100000
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long) 0x2e) [0x1f9ffde]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(print_fatal_signal(int) 0x2eb) [0x103628b]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(my_server_abort() 0x5e) [0x103638e]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(my_abort() 0xa) [0x1f9a3aa]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long) 0x30c) [0x225c4bc]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(ib::fatal::~fatal() 0x98) [0x225ee18]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(Fil_shard::get_file_size(fil_node_t*, bool) 0x3fe) [0x239df2e]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(Fil_shard::open_file(fil_node_t*) 0x294) [0x23ad124]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(Fil_shard::prepare_file_for_io(fil_node_t*) 0x33) [0x23adfd3]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(Fil_shard::do_io(IORequest const&, bool, page_id_t const&, page_size_t const&, unsigned long, unsigned long, void*, void*) 0x1a9) [0x23aeb69]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(fil_io(IORequest const&, bool, page_id_t const&, page_size_t const&, unsigned long, unsigned long, void*, void*) 0x57) [0x23af5f7]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(buf_read_page_low(dberr_t*, bool, unsigned long, unsigned long, page_id_t const&, page_size_t const&, bool) 0x13e) [0x22ed5be]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(buf_read_page_background(page_id_t const&, page_size_t const&, bool) 0x26) [0x22ed966]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld() [0x22d1503]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(buf_dump_thread() 0xe5) [0x22d1c55]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(std::thread::_State_impl<std::thread::_Invoker<std::tuple<Detached_thread, void (*)()> > >::_M_run() 0xbc) [0x212061c]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld() [0x279e1ef]
分析
根据描述来看是 db1/t20240912.ibd 数据字典里面记录的Tablespace id
是2, 但实际文件记录的是49591. 然后就挂了
既然是表空间id不一致, 那解决办法就至少有3种了.
1: 修改数据字典
2: 修改ibd文件里面的表空间id
3: 使用alter table xxx import tablespace
的方式导入表
基于我们之前修改 lower_case_table_names 的经验来看, 修改ibd文件里面的信息更实际一点-_-
虽然之前讲过ibd的结构, 但这里再来回顾一下
也就是每页的第34-38字节 和 第1页的38-42 字节记录了表空间ID. 我们只需要修改这里就行.(记得同步校验字段)
验证
方法1(推荐)
我们先使用import tablespace
的方式导入, 该方式需要先discard表空间, 所以记得先备份表空间文件, 不然discard了就没得了. 当然现在数据库都起不来了, 我们直接mv走就是了
mv /data/mysql_3306/mysqldata/db1/t20240912.ibd /tmp
systemctl start mysqld_3306
应该会在日志里面发现如下Warning, 但我们并不在意它
代码语言:txt复制2024-09-13T06:23:30.463261Z 1 [Warning] [MY-012351] [InnoDB] Tablespace 2, name 'db1/t20240912', file './db1/t20240912.ibd' is missing!
然后我们再拷贝回去并导入数据库
代码语言:sql复制SHELL> mv /tmp/t20240912.ibd /data/mysql_3306/mysqldata/db1
SQL> alter table db1.t20240912 import tablespace;
方法2
方法1看起来没毛病, 我们先使用脚本看下t20240912.ibd
的表空间id是多少(2还是49591呢?) 脚本放在文末的
好家伙, 居然给我修改为了2... 汽油磁力!
我们也来使用脚本来修改ibd文件里面的表空间id为2吧. (先回退快照,或者重新模拟下故障)
代码语言:shell复制systemctl stop mysqld_3306
rm -rf /data/mysql_3306/mysqldata/db1/t20240912.ibd
python mysql_replace_tablespaceid.py t20240912.ibd 2 /data/mysql_3306/mysqldata/db1/t20240912.ibd
chown mysql:mysql /data/mysql_3306/mysqldata/db1/t20240912.ibd
systemctl start mysqld_3306
看起来没报错, 我们登录数据库验证下呢
说:该ibd文件的事务比我们系统的事务要高.... (使用旧的ibd文件在新环境恢复的事情被发现了). 官方竟然给我们来了这么一手. 汽油磁力!*2
error日志里面也是类似的信息(但更具体了):
代码语言:txt复制2024-09-13T06:41:20.444460Z 8 [ERROR] [MY-011971] [InnoDB] Tablespace 'db1/t20240912' Page [page id: space=2, page number=4] log sequence number 113002135714 is in the future! Current system log sequence number 18335077.
2024-09-13T06:41:20.444507Z 8 [ERROR] [MY-011972] [InnoDB] Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html for information about forcing recovery.
2024-09-13T06:41:20.444523Z 8 [ERROR] [MY-011971] [InnoDB] Tablespace 'db1/t20240912' Page [page id: space=2, page number=1] log sequence number 113002135714 is in the future! Current system log sequence number 18335077.
2024-09-13T06:41:20.444533Z 8 [ERROR] [MY-011972] [InnoDB] Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html for information about forcing recovery.
理论上通过拷贝redo,ibdata等文件是可以解决这个问题的. 但实际环境哪来那么多东西呢.(当然也可以使用ibd2sql之类的工具直接将ibd文件转为SQL语句). 也可以直接修改ibd文件里面的trx_id, 但这就太复杂了. 有没得简单点的方法呢???
按照官方的设置 innodb_force_recovery
应该可以(毕竟可以不看redo了都).
或者就是修改系统的LSN(18335077)为大于等于我们当前值(113002135714). 毕竟ibdata也是page 比如:
那我们小试一波.
代码语言:python代码运行次数:0复制import struct
filename = '/data/mysql_3306/mysqldata/ibdata1'
filename2 = '/tmp/ibdata1'
PAGE_SIZE = 16384
CURRENT_LSN = 18335077
LSN = 113002135714 100000
def create_crc32c_table():
poly = 0x82f63b78
table = []
for i in range(256):
crc = i
for _ in range(8):
if crc & 1:
crc = (crc >> 1) ^ poly
else:
crc >>= 1
table.append(crc)
return table
crc32_slice_table = create_crc32c_table()
def calculate_crc32c(data):
crc = 0xFFFFFFFF
for byte in bytearray(data): # for PY2
crc = crc32_slice_table[(crc ^ byte) & 0xFF] ^ (crc >> 8)
return crc ^ 0xFFFFFFFF
def replace_crc32(data):
c1 = calculate_crc32c(data[4:26])
c2 = calculate_crc32c(data[38:PAGE_SIZE-8])
cb = struct.pack('>L',(c1^c2)&(2**32-1))
data = cb data[4:PAGE_SIZE-8] cb data[PAGE_SIZE-4:]
return data
def find_xx_positions(s,x):
positions = []
xl = len(x)
start = 0
while True:
pos = s.find(x, start)
if pos == -1:
break
positions.append(pos)
start = pos xl
return positions
LSN_OLD = struct.pack('>Q',CURRENT_LSN)
LSN_NEW = struct.pack('>Q',LSN)
f2 = open(filename2,'wb')
with open(filename, 'rb') as f:
while True:
data = f.read(PAGE_SIZE)
if data == b'':
break
for x in find_xx_positions(data,LSN_OLD):
data = data[:x] LSN_NEW data[x 8:]
data = replace_crc32(data)
f2.write(data)
f2.flush()
f2.close()
可惜, 不行. 虽然报错看起来像是坏块, 但检查了不是坏块, 这回应该是内部的数据存在校验了. 或者是ibdata1根本没记录LSN, lsn只记录在redo里面
LSN修改
既然改不了系统的, 那就还是来修改我们自己的ibd文件的LSN咯... 根据上面的图, 我们知道1个page里面有多个lsn
那我们直接将lsn修改为0吧. 再改改代码
修改之后查询数据,并查看日志信息如下: LSN问题确实没得了, 但trx问题还是没处理啊....
虽然我们也可以修改trx_id, 但这涉及到行数据的解析了, 比较麻烦. 还是老老实实的使用import tablespace吧.
总结
数据恢复的时候还是先使用官方的命令,(不但帮忙修改了tablespace id, 甚至还修改了lsn&trx?). 如果尝试手动修改tablespace信息的话, 还需要考虑LSN
和TRX ID
之类的问题.老麻烦了
附源码
这里附的是修改ibd文件的tablespace id和lsn的源码, github上那个没得lsn的修改的.(也就两三行代码的差距)
代码语言:python代码运行次数:0复制#!/usr/bin/env python
# -*- coding: utf-8 -*-
# write by ddcw
# innodb 表空间中的 space_id的替换
# 表空间id位于34-38 大端字节序
PAGE_SIZE = 16384
import sys,struct,os
args = len(sys.argv)
if args == 2:
with open(sys.argv[1],'rb') as f:
data = f.read(PAGE_SIZE)
space_id = struct.unpack('>L',data[34:38])[0]
msg = "TABLESPACE ID: " str(space_id) 'n'
sys.stdout.write(msg)
sys.exit()
elif args == 4:
filename = sys.argv[1]
space_id = int(sys.argv[2])
filename2 = sys.argv[3]
if not os.path.exists(filename):
msg = filename " IS NOT EXISTS.n"
sys.stdout.write(msg)
sys.exit(1)
elif int(os.stat(filename).st_size % PAGE_SIZE) != 0:
msg = filename " Maybe not mysql's ibd filen"
sys.stdout.write(msg)
sys.exit(2)
if os.path.exists(filename2):
msg = filename2 " IS EXISTS. Please rename itn"
sys.stdout.write(msg)
sys.exit(3)
else:
msg = "nExample: npython " sys.argv[0] " test.ibdnpython " sys.argv[0] " test.ibd 123456 new_test.ibdnn"
sys.stdout.write(msg)
sys.exit(4)
def create_crc32c_table():
poly = 0x82f63b78
table = []
for i in range(256):
crc = i
for _ in range(8):
if crc & 1:
crc = (crc >> 1) ^ poly
else:
crc >>= 1
table.append(crc)
return table
crc32_slice_table = create_crc32c_table()
def calculate_crc32c(data):
crc = 0xFFFFFFFF
for byte in bytearray(data): # for PY2
crc = crc32_slice_table[(crc ^ byte) & 0xFF] ^ (crc >> 8)
return crc ^ 0xFFFFFFFF
def replace_lsn(data):
return data[:16] struct.pack('>Q',0) data[24:PAGE_SIZE-4:] struct.pack('>L',0)
def replace_crc32(data):
data = replace_lsn(data)
c1 = calculate_crc32c(data[4:26])
c2 = calculate_crc32c(data[38:PAGE_SIZE-8])
cb = struct.pack('>L',(c1^c2)&(2**32-1))
data = cb data[4:PAGE_SIZE-8] cb data[PAGE_SIZE-4:]
return data
f2 = open(filename2,'wb')
SPACE_ID = struct.pack('>L',space_id)
with open(filename, 'rb') as f:
# FSP 38-42 (SPACE_HEADER:4 is SPACE ID)
data = f.read(PAGE_SIZE)
data = data[:34] SPACE_ID SPACE_ID data[42:]
data = replace_crc32(data)
f2.write(data)
while True:
data = f.read(PAGE_SIZE)
if data == b'':
break
if data[34:38] != b'x00x00x00x00':
data = replace_crc32(data[:34] SPACE_ID data[38:])
f2.write(data)
f2.flush()
f2.close()
msg = 'Write to filename: ' filename2 'n'
sys.stdout.write(msg)