Hive Schema Tool: Metastore Metadata Operations

2021-01-07 18:55:48

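Problems with the Hive Schema

Older Hive versions do not write a schema version number into the metastore. As a result, after upgrading to a newer version, Hive fails with: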

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

The log shows the following message:

Caused by: MetaException(message:Version information not found in metastore. )

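As a workaround, you can set hive.metastore.schema.verification=false. This relaxes the schema version check, so Hive only warns about the missing version and records the version in the metastore itself.

When the problem appears during a version upgrade, however, the Hive Schema Tool is the way to fix it.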
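To confirm that a failure is this missing-version problem and not some other metastore error, one option is to search the log for the message quoted above. The following is a minimal sketch; the function name and the log path are hypothetical, so point it at wherever your Hive log is actually written.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the given log file contains the
# "Version information not found in metastore" message shown above.
has_missing_version_error() {
  grep -q "Version information not found in metastore" "$1"
}

# LOG_FILE is an assumed placeholder path, not a Hive default.
LOG_FILE="${LOG_FILE:-/tmp/hive.log}"
if [ -f "$LOG_FILE" ] && has_missing_version_error "$LOG_FILE"; then
  echo "metastore schema version is missing; consider running schematool"
fi
```

If the message is present, the schematool commands described in the rest of this article apply.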

What Is the Hive Schema Tool

Hive provides the Hive Schema Tool for maintaining, repairing, and upgrading the metastore schema.

$ schematool -help
usage: schemaTool
 -dbType <databaseType>             Metastore database type
 -driver <driver>                   Driver name for connection
 -dryRun                            List SQL scripts (no execute)
 -help                              Print this message
 -info                              Show config and schema details
 -initSchema                        Schema initialization
 -initSchemaTo <initTo>             Schema initialization to a version
 -metaDbType <metaDatabaseType>     Used only if upgrading the system catalog for hive
 -passWord <password>               Override config file password
 -upgradeSchema                     Schema upgrade
 -upgradeSchemaFrom <upgradeFrom>   Schema upgrade from a version
 -url <url>                         Connection url to the database
 -userName <user>                   Override config file user name
 -verbose                           Only print SQL statements
(Additional catalog-related options added in the Hive 3.0.0 release (HIVE-19135) are below.)
 -createCatalog <catalog>           Create catalog with given name
 -catalogLocation <location>        Location of new catalog, required when adding a catalog
 -catalogDescription <description>  Description of new catalog
 -ifNotExists                       If passed then it is not an error to create an existing catalog
 -moveDatabase <database>           Move a database between catalogs.  All tables under it would still be under it as part of new catalog. Argument is the database name. Requires --fromCatalog and --toCatalog parameters as well
 -moveTable <table>                 Move a table to a different database.  Argument is the table name. Requires --fromCatalog, --toCatalog, --fromDatabase, and --toDatabase
 -toCatalog <catalog>               Catalog a moving database or table is going to.  This is required if you are moving a database or table.
 -fromCatalog <catalog>             Catalog a moving database or table is coming from.  This is required if you are moving a database or table.
 -toDatabase <database>             Database a moving table is going to.  This is required if you are moving a table.
 -fromDatabase <database>           Database a moving table is coming from.  This is required if you are moving a table.

The supported -dbType values are derby, mysql, postgres, oracle, and mssql.
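Since -dbType only accepts the values listed above, a wrapper script might validate the value before calling schematool. This is a sketch only; both function names are hypothetical, and the script prints the command instead of running it, so it works on a machine without Hive installed.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only for the dbType values listed above.
is_supported_db_type() {
  case "$1" in
    derby|mysql|postgres|oracle|mssql) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: prints the -info command for a dbType.
# It does not execute schematool.
build_info_command() {
  if ! is_supported_db_type "$1"; then
    echo "unsupported dbType: $1" >&2
    return 1
  fi
  echo "schematool -dbType $1 -info"
}

build_info_command mysql
```

Rejecting an unknown value up front gives a clearer message than letting the tool fail partway through a maintenance script.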

Using the Hive Schema Tool

The following examples come from the official Hive Schema Tool documentation.

  1. Initialize the metastore by creating the schema in a Derby database:
     schematool -dbType derby -initSchema
  2. Show the metastore schema information:
     schematool -dbType derby -info
  3. Upgrade the metastore schema to the current version; -upgradeSchemaFrom specifies the old Hive version:
     schematool -dbType derby -upgradeSchemaFrom 0.10.0
  4. List the scripts the upgrade to the current version would run, without executing them:
     schematool -dbType derby -upgradeSchemaFrom 0.7.0 -dryRun
  5. Move the database db1 from the hive catalog to the spark catalog:
     schematool -moveDatabase db1 -fromCatalog hive -toCatalog spark
  6. Move a Hive database and table to the spark catalog:
     # Create the target database newdb to receive what is moved over from Hive
     beeline ... -e "create database if not exists newdb";
     # Move the database
     schematool -moveDatabase newdb -fromCatalog hive -toCatalog spark
     # Move the table
     schematool -moveTable table1 -fromCatalog hive -toCatalog spark -fromDatabase db1 -toDatabase newdb
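For the missing-version case, the commands above could be combined into a cautious sequence: preview with -dryRun, run the upgrade, then confirm with -info. The sketch below only prints that plan; the function name upgrade_plan and the DB_TYPE and FROM_VERSION variables are hypothetical placeholders, and the flags are taken from the help output earlier in this article.

```shell
#!/bin/sh
# Assumed placeholders; set these to match your own deployment.
DB_TYPE="${DB_TYPE:-derby}"
FROM_VERSION="${FROM_VERSION:-0.10.0}"

# Hypothetical helper: prints the commands in the order one might run them.
# Nothing is executed against the metastore.
upgrade_plan() {
  echo "schematool -dbType $1 -upgradeSchemaFrom $2 -dryRun"
  echo "schematool -dbType $1 -upgradeSchemaFrom $2"
  echo "schematool -dbType $1 -info"
}

upgrade_plan "$DB_TYPE" "$FROM_VERSION"
```

Reviewing the printed plan, and backing up the metastore database first, is a reasonable precaution before running the real upgrade.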

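The Hive Schema Tool makes it easy to fix Hive metadata problems, and it also supports moving databases and tables into a Spark catalog, which makes it a handy tool for operations work.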
