A colleague reported that creating an index on a MySQL database failed with an error. The failure can be reproduced as follows:
CREATE INDEX t_reg_code_idx USING BTREE ON t(reg_code)
BLOB/TEXT column 'reg_code' used in key specification without a key length
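The message tells us that the statement tries to create a BTREE index on the reg_code column of table t, and that reg_code is a BLOB or TEXT column. The error says the key specification needs a key length. What does that mean?
This database runs MySQL 8.0. The official manual has a passage on Index Prefixes (quoted below), which states that an index on a BLOB or TEXT column must specify a prefix length. For InnoDB tables that use the REDUNDANT or COMPACT row format, the index prefix can be at most 767 bytes. For InnoDB tables that use the DYNAMIC or COMPRESSED row format, the limit is 3072 bytes. For MyISAM tables, the limit is 1000 bytes.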
https://dev.mysql.com/doc/refman/8.0/en/column-indexes.html
Index Prefixes
With col_name(N) syntax in an index specification for a string column, you can create an index that uses only the first N characters of the column. Indexing only a prefix of column values in this way can make the index file much smaller. When you index a BLOB or TEXT column, you must specify a prefix length for the index. For example:
CREATE TABLE test (blob_col BLOB, INDEX(blob_col(10)));
Prefixes can be up to 767 bytes long for InnoDB tables that use the REDUNDANT or COMPACT row format. The prefix length limit is 3072 bytes for InnoDB tables that use the DYNAMIC or COMPRESSED row format. For MyISAM tables, the prefix length limit is 1000 bytes.
Note: Prefix limits are measured in bytes, whereas the prefix length in CREATE TABLE, ALTER TABLE, and CREATE INDEX statements is interpreted as number of characters for nonbinary string types (CHAR, VARCHAR, TEXT) and number of bytes for binary string types (BINARY, VARBINARY, BLOB). Take this into account when specifying a prefix length for a nonbinary string column that uses a multibyte character set.
If a search term exceeds the index prefix length, the index is used to exclude non-matching rows, and the remaining rows are examined for possible matches.
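Applied to the failing statement at the start, the manual's rule means adding a prefix length to the column in the index definition. The following is a minimal sketch, not the fix actually used in the original case: the prefix length of 20 is an assumed value chosen only for illustration. The second query is a rough way to see how the byte limits translate into a maximum number of characters for a given character set.

```sql
-- Index only the first 20 characters of reg_code.
-- The length 20 is an assumption for illustration; pick it from the real data.
CREATE INDEX t_reg_code_idx USING BTREE ON t (reg_code(20));

-- Prefix lengths for TEXT columns are given in characters, but the limits are in bytes.
-- MAXLEN is the maximum number of bytes per character in each character set,
-- so dividing the byte limit by MAXLEN gives the largest prefix length that fits.
SELECT CHARACTER_SET_NAME,
       MAXLEN,
       FLOOR(767 / MAXLEN)  AS max_prefix_chars_767,
       FLOOR(3072 / MAXLEN) AS max_prefix_chars_3072
FROM information_schema.CHARACTER_SETS
WHERE CHARACTER_SET_NAME IN ('gbk', 'utf8mb4');
```

The character sets in the WHERE clause are examples only; replace them with the one the column actually uses.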
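The MySQL 5.7 manual describes the index prefix limits differently. As written, the prefix for an InnoDB table can reach 1000 bytes (based on other chapters of the manual and my own experiments, I believe this figure is wrong and should be 3072 bytes), but only if innodb_large_prefix is set. That parameter takes effect only for the DYNAMIC and COMPRESSED row formats, not for REDUNDANT or COMPACT. Otherwise the limit is 767 bytes.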
https://dev.mysql.com/doc/refman/5.7/en/column-indexes.html
Index Prefixes
With col_name(N) syntax in an index specification for a string column, you can create an index that uses only the first N characters of the column. Indexing only a prefix of column values in this way can make the index file much smaller. When you index a BLOB or TEXT column, you must specify a prefix length for the index. For example:
CREATE TABLE test (blob_col BLOB, INDEX(blob_col(10)));
Prefixes can be up to 1000 bytes long (767 bytes for InnoDB tables, unless you have innodb_large_prefix set).
Note: Prefix limits are measured in bytes, whereas the prefix length in CREATE TABLE, ALTER TABLE, and CREATE INDEX statements is interpreted as number of characters for nonbinary string types (CHAR, VARCHAR, TEXT) and number of bytes for binary string types (BINARY, VARBINARY, BLOB). Take this into account when specifying a prefix length for a nonbinary string column that uses a multibyte character set.
If a search term exceeds the index prefix length, the index is used to exclude non-matching rows, and the remaining rows are examined for possible matches.
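So the InnoDB index prefix length limits were adjusted between MySQL 5.7 and 8.0, but a limit still exists. This is one respect in which MySQL differs from Oracle and other databases.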
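Since the applicable limit depends on the server version, the innodb_large_prefix setting (5.7 only), and the table's row format, it can help to check all three before choosing a prefix length. The queries below are a sketch of such a check. They assume the table is named t, as in the failing statement, and that it lives in the current schema.

```sql
-- Server version, to know which manual applies.
SELECT VERSION();

-- MySQL 5.7 only: this variable was removed in 8.0,
-- where the statement returns an empty result set.
SHOW VARIABLES LIKE 'innodb_large_prefix';

-- Row format used when CREATE TABLE does not specify one.
SHOW VARIABLES LIKE 'innodb_default_row_format';

-- Actual engine and row format of table t in the current schema.
SELECT TABLE_NAME, ENGINE, ROW_FORMAT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 't';
```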
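We can verify the MySQL 8.0 prefix length limits with an experiment. For example, create an InnoDB table with the COMPACT row format and specify a prefix length of 10000. The error says the maximum key length is 767 bytes:
create table test01 (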
id int(30) not null auto_increment,
t_a text,
primary key(id),
index idx_t_a(t_a(10000))
) COLLATE='gbk_chinese_ci' ENGINE=InnoDB ROW_FORMAT=COMPACT;
SQL Error [1071] [42000]: Specified key was too long; max key length is 767 bytes
Now create an InnoDB table with the COMPRESSED row format and specify a prefix length of 10000. The error says the maximum key length is 3072 bytes:
create table test01 (
id int(30) not null auto_increment,
t_a text,
primary key(id),
index idx_t_a(t_a(10000))
) COLLATE='gbk_chinese_ci' ENGINE=InnoDB ROW_FORMAT=COMPRESSED;
SQL Error [1071] [42000]: Specified key was too long; max key length is 3072 bytes
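For contrast, here is a sketch of a variant that should stay within the COMPACT limit. It assumes gbk uses at most 2 bytes per character, so a prefix of 383 characters needs at most 766 bytes, just under 767. The table name test02 is hypothetical and not part of the original experiment.

```sql
-- Same layout as the experiment above, but with a prefix that fits:
-- 383 characters * 2 bytes (gbk maximum) = 766 bytes <= 767 bytes.
CREATE TABLE test02 (
  id INT NOT NULL AUTO_INCREMENT,
  t_a TEXT,
  PRIMARY KEY (id),
  INDEX idx_t_a (t_a(383))
) COLLATE='gbk_chinese_ci' ENGINE=InnoDB ROW_FORMAT=COMPACT;

-- The Sub_part column shows the prefix length (in characters) stored for idx_t_a.
SHOW INDEX FROM test02;
```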
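Setting the technical issue aside, I asked my colleague about the background of this operation. The original requirement was a vendor's ETL job that loads data from a source database into a target database. The source column is VARCHAR, while the target column was defined as TEXT, which indirectly caused this problem. One possible reason is that VARCHAR and TEXT can both store string data, so no distinction was made as long as the data fit. Another is convenience: rather than checking the string lengths defined in the source and target databases, the column was simply set to TEXT to guarantee that everything would fit.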
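Whatever the reason, a large-object type such as TEXT is generally not recommended as an indexed search column. Such columns often hold long values, so the index takes up more storage space, and the selectivity of the index can suffer as well.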
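One common way to judge how much selectivity a prefix index gives up is to compare the number of distinct prefixes with the number of distinct full values. The query below is a sketch of that check against table t. The candidate lengths 10 and 20 are assumptions for illustration, and the query scans the whole table, so it is better run against a copy or at a quiet time.

```sql
-- The closer distinct_prefix_N is to distinct_full, the less selectivity
-- is lost by indexing only the first N characters of reg_code.
SELECT COUNT(*)                           AS total_rows,
       COUNT(DISTINCT reg_code)           AS distinct_full,
       COUNT(DISTINCT LEFT(reg_code, 10)) AS distinct_prefix_10,
       COUNT(DISTINCT LEFT(reg_code, 20)) AS distinct_prefix_20
FROM t;
```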
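So although this looks like a technical problem on the surface, it actually stems from an unreasonable design. If we think a little more about soundness during application and database design, and avoid these so-called shortcuts, things will likely run more smoothly in practice and we get twice the result with half the effort.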
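If the design can be changed, one possible direction is to bring the target column back to a bounded VARCHAR that matches the source, after which an ordinary index needs no prefix. The sketch below assumes the stored values are short. The length 64 is a hypothetical value, and the real one should come from the source column definition and the first query.

```sql
-- Longest value currently stored, in characters.
SELECT MAX(CHAR_LENGTH(reg_code)) AS max_chars FROM t;

-- Convert the column to a bounded VARCHAR (64 is an assumed length).
-- MODIFY COLUMN redefines the column, so restate NOT NULL, DEFAULT,
-- or COMMENT here if the original column had them.
ALTER TABLE t MODIFY COLUMN reg_code VARCHAR(64);

-- With VARCHAR, the full column can be indexed without a prefix length.
CREATE INDEX t_reg_code_idx USING BTREE ON t (reg_code);
```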