[MYSQL] MY-012095 表空间ID不对

2024-09-13 16:01:00 浏览数 (2)

导读

有时候需要使用mysql的数据文件做恢复, 或者其它某些异常导致mysqld启动的时候报错如下:

代码语言:txt复制
2024-09-13T02:20:33.489262Z 0 [Warning] [MY-013244] [Server] --collation-server: 'utf8_general_ci' is a collation of the deprecated character set UTF8MB3. Please consider using UTF8MB4 with an appropriate collation instead.
2024-09-13T02:20:33.495902Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2024-09-13T02:20:33.832733Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2024-09-13T02:20:33.919585Z 0 [ERROR] [MY-012095] [InnoDB] [FATAL] Tablespace id is 2 in the data dictionary but in file ./db1/t20240912.ibd it is 49591!
2024-09-13T02:20:33.919622Z 0 [ERROR] [MY-013183] [InnoDB] Assertion failure: fil0fil.cc:2582:ib::fatal triggered thread 140154526787328
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
02:20:33 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x100000
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long) 0x2e) [0x1f9ffde]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(print_fatal_signal(int) 0x2eb) [0x103628b]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(my_server_abort() 0x5e) [0x103638e]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(my_abort() 0xa) [0x1f9a3aa]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long) 0x30c) [0x225c4bc]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(ib::fatal::~fatal() 0x98) [0x225ee18]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(Fil_shard::get_file_size(fil_node_t*, bool) 0x3fe) [0x239df2e]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(Fil_shard::open_file(fil_node_t*) 0x294) [0x23ad124]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(Fil_shard::prepare_file_for_io(fil_node_t*) 0x33) [0x23adfd3]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(Fil_shard::do_io(IORequest const&, bool, page_id_t const&, page_size_t const&, unsigned long, unsigned long, void*, void*) 0x1a9) [0x23aeb69]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(fil_io(IORequest const&, bool, page_id_t const&, page_size_t const&, unsigned long, unsigned long, void*, void*) 0x57) [0x23af5f7]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(buf_read_page_low(dberr_t*, bool, unsigned long, unsigned long, page_id_t const&, page_size_t const&, bool) 0x13e) [0x22ed5be]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(buf_read_page_background(page_id_t const&, page_size_t const&, bool) 0x26) [0x22ed966]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld() [0x22d1503]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(buf_dump_thread() 0xe5) [0x22d1c55]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(std::thread::_State_impl<std::thread::_Invoker<std::tuple<Detached_thread, void (*)()> > >::_M_run() 0xbc) [0x212061c]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld() [0x279e1ef]

分析

根据描述来看是 db1/t20240912.ibd 数据字典里面记录的Tablespace id是2, 但实际文件记录的是49591. 然后就挂了

既然是表空间id不一致, 那解决办法就至少有3种了.

1: 修改数据字典

2: 修改ibd文件里面的表空间id

3: 使用alter table xxx import tablespace的方式导入表

基于我们之前修改 lower_case_table_names 的经验来看, 修改ibd文件里面的信息更实际一点-_-

虽然之前讲过ibd的结构, 但这里再来回顾一下

也就是每页的第34-38字节 和 第1页的38-42 字节记录了表空间ID. 我们只需要修改这里就行.(记得同步校验字段)

验证

方法1(推荐)

我们先使用import tablespace的方式导入, 该方式需要先discard表空间, 所以记得先备份表空间文件, 不然discard了就没得了. 当然现在数据库都起不来了, 我们直接mv走就是了

代码语言:shell复制
mv /data/mysql_3306/mysqldata/db1/t20240912.ibd /tmp
systemctl start mysqld_3306

应该会在日志里面发现如下Warning, 但我们并不在意它

代码语言:txt复制
2024-09-13T06:23:30.463261Z 1 [Warning] [MY-012351] [InnoDB] Tablespace 2, name 'db1/t20240912', file './db1/t20240912.ibd' is missing!

然后我们再拷贝回去并导入数据库

代码语言:sql复制
SHELL> mv /tmp/t20240912.ibd /data/mysql_3306/mysqldata/db1
SQL> alter table db1.t20240912 import tablespace;

方法2

方法1看起来没毛病, 我们先使用脚本看下t20240912.ibd的表空间id是多少(2还是49591呢?) 脚本放在文末的

好家伙, 居然给我修改为了2... 汽油磁力!

我们也来使用脚本来修改ibd文件里面的表空间id为2吧. (先回退快照,或者重新模拟下故障)

代码语言:shell复制
systemctl stop mysqld_3306
rm -rf /data/mysql_3306/mysqldata/db1/t20240912.ibd
python mysql_replace_tablespaceid.py t20240912.ibd 2 /data/mysql_3306/mysqldata/db1/t20240912.ibd
chown mysql:mysql /data/mysql_3306/mysqldata/db1/t20240912.ibd
systemctl start mysqld_3306

看起来没报错, 我们登录数据库验证下呢

说:该ibd文件的事务比我们系统的事务要高.... (使用旧的ibd文件在新环境恢复的事情被发现了). 官方竟然给我们来了这么一手. 汽油磁力!*2

error日志里面也是类似的信息(但更具体了):

代码语言:txt复制
2024-09-13T06:41:20.444460Z 8 [ERROR] [MY-011971] [InnoDB] Tablespace 'db1/t20240912' Page [page id: space=2, page number=4] log sequence number 113002135714 is in the future! Current system log sequence number 18335077.
2024-09-13T06:41:20.444507Z 8 [ERROR] [MY-011972] [InnoDB] Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html for information about forcing recovery.
2024-09-13T06:41:20.444523Z 8 [ERROR] [MY-011971] [InnoDB] Tablespace 'db1/t20240912' Page [page id: space=2, page number=1] log sequence number 113002135714 is in the future! Current system log sequence number 18335077.
2024-09-13T06:41:20.444533Z 8 [ERROR] [MY-011972] [InnoDB] Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html for information about forcing recovery.

理论上通过拷贝redo,ibdata等文件是可以解决这个问题的. 但实际环境哪来那么多东西呢.(当然也可以使用ibd2sql之类的工具直接将ibd文件转为SQL语句). 也可以直接修改ibd文件里面的trx_id, 但这就太复杂了. 有没得简单点的方法呢???

按照官方的设置 innodb_force_recovery应该可以(毕竟可以不看redo了都).

或者就是修改系统的LSN(18335077)为大于等于我们当前值(113002135714). 毕竟ibdata也是page 比如:

那我们小试一波.

代码语言:python代码运行次数:0复制
import struct
filename = '/data/mysql_3306/mysqldata/ibdata1'
filename2 = '/tmp/ibdata1'
PAGE_SIZE = 16384
CURRENT_LSN = 18335077
LSN = 113002135714   100000
def create_crc32c_table():
	poly = 0x82f63b78
	table = []
	for i in range(256):
		crc = i
		for _ in range(8):
			if crc & 1:
				crc = (crc >> 1) ^ poly
			else:
				crc >>= 1
		table.append(crc)
	return table

crc32_slice_table = create_crc32c_table()
def calculate_crc32c(data):
	crc = 0xFFFFFFFF
	for byte in bytearray(data): # for PY2
		crc = crc32_slice_table[(crc ^ byte) & 0xFF] ^ (crc >> 8)
	return crc ^ 0xFFFFFFFF

def replace_crc32(data):
	c1 = calculate_crc32c(data[4:26])
	c2 = calculate_crc32c(data[38:PAGE_SIZE-8])
	cb = struct.pack('>L',(c1^c2)&(2**32-1))
	data = cb   data[4:PAGE_SIZE-8]   cb   data[PAGE_SIZE-4:]
	return data

def find_xx_positions(s,x):
	positions = []
	xl = len(x)
	start = 0
	while True:
		pos = s.find(x, start)
		if pos == -1:
			break
		positions.append(pos)
		start = pos   xl
	return positions

LSN_OLD = struct.pack('>Q',CURRENT_LSN)
LSN_NEW = struct.pack('>Q',LSN)
f2 = open(filename2,'wb')
with open(filename, 'rb') as f:
	while True:
		data = f.read(PAGE_SIZE)
		if data == b'':
			break
		for x in find_xx_positions(data,LSN_OLD):
			data = data[:x]   LSN_NEW   data[x 8:]
			data = replace_crc32(data)
		f2.write(data)

f2.flush()
f2.close()

可惜, 不行. 虽然报错看起来像是坏块, 但检查了不是坏块, 这回应该是内部的数据存在校验了. 或者是ibdata1根本没记录LSN, lsn只记录在redo里面

LSN修改

既然改不了系统的, 那就还是来修改我们自己的ibd文件的LSN咯... 根据上面的图, 我们知道1个page里面有多个lsn

那我们直接将lsn修改为0吧. 再改改代码

修改之后查询数据,并查看日志信息如下: LSN问题确实没得了, 但trx问题还是没处理啊....

虽然我们也可以修改trx_id, 但这涉及到行数据的解析了, 比较麻烦. 还是老老实实的使用import tablespace吧.

总结

数据恢复的时候还是先使用官方的命令,(不但帮忙修改了tablespace id, 甚至还修改了lsn&trx?). 如果尝试手动修改tablespace信息的话, 还需要考虑LSNTRX ID之类的问题.老麻烦了

附源码

这里附的是修改ibd文件的tablespace id和lsn的源码, github上那个没得lsn的修改的.(也就两三行代码的差距)

代码语言:python代码运行次数:0复制
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# write by ddcw
# innodb 表空间中的 space_id的替换
# 表空间id位于34-38 大端字节序

PAGE_SIZE = 16384
import sys,struct,os
args = len(sys.argv)
if args == 2:
	with open(sys.argv[1],'rb') as f:
		data = f.read(PAGE_SIZE) 
		space_id = struct.unpack('>L',data[34:38])[0]
		msg = "TABLESPACE ID: "   str(space_id)   'n'
		sys.stdout.write(msg)
		sys.exit()
elif args == 4:
	filename = sys.argv[1]
	space_id = int(sys.argv[2])
	filename2 = sys.argv[3]
	if not os.path.exists(filename):
		msg = filename   " IS NOT EXISTS.n"
		sys.stdout.write(msg)
		sys.exit(1)
	elif int(os.stat(filename).st_size % PAGE_SIZE) != 0:
		msg = filename   " Maybe not mysql's ibd filen"
		sys.stdout.write(msg)
		sys.exit(2)
	if os.path.exists(filename2):
		msg = filename2   " IS EXISTS. Please rename itn"
		sys.stdout.write(msg)
		sys.exit(3)
else:
	msg = "nExample: npython "   sys.argv[0]   " test.ibdnpython "   sys.argv[0]   " test.ibd 123456 new_test.ibdnn"
	sys.stdout.write(msg)
	sys.exit(4)

def create_crc32c_table():
	poly = 0x82f63b78
	table = []
	for i in range(256):
		crc = i
		for _ in range(8):
			if crc & 1:
				crc = (crc >> 1) ^ poly
			else:
				crc >>= 1
		table.append(crc)
	return table

crc32_slice_table = create_crc32c_table()
def calculate_crc32c(data):
	crc = 0xFFFFFFFF
	for byte in bytearray(data): # for PY2
		crc = crc32_slice_table[(crc ^ byte) & 0xFF] ^ (crc >> 8)
	return crc ^ 0xFFFFFFFF

def replace_lsn(data):
	return data[:16]   struct.pack('>Q',0)   data[24:PAGE_SIZE-4:]   struct.pack('>L',0)

def replace_crc32(data):
	data = replace_lsn(data)
	c1 = calculate_crc32c(data[4:26])
	c2 = calculate_crc32c(data[38:PAGE_SIZE-8])
	cb = struct.pack('>L',(c1^c2)&(2**32-1))
	data = cb   data[4:PAGE_SIZE-8]   cb   data[PAGE_SIZE-4:]
	return data



f2 = open(filename2,'wb')
SPACE_ID = struct.pack('>L',space_id)
with open(filename, 'rb') as f:
	# FSP 38-42 (SPACE_HEADER:4 is SPACE ID) 
	data = f.read(PAGE_SIZE)
	data = data[:34]   SPACE_ID   SPACE_ID   data[42:]
	data = replace_crc32(data)
	f2.write(data)
	while True:
		data = f.read(PAGE_SIZE)
		if data == b'':
			break
		if data[34:38] != b'x00x00x00x00':
			data = replace_crc32(data[:34]   SPACE_ID   data[38:])
		f2.write(data)

f2.flush()
f2.close()
msg = 'Write to filename: '   filename2   'n'
sys.stdout.write(msg)

0 人点赞