Preface
This post is part of the "Troubleshooting Big Data Production Issues" series. It describes a problem our product ran into at a securities firm some time ago and how we resolved it; behind it lies a Hive bug that was only fixed in Hive 3.0. The details follow.
Problem Symptoms
On CDH 6.2.1, with Kerberos and Sentry enabled for Hive, running a DML insert overwrite statement to write data into a new partition of a partitioned table fails with an error. We also noticed:
- Inspecting the underlying HDFS shows that the corresponding partition directory and data files do exist;
- show partitions xxx does not list the new partition;
- A select statement cannot read the data under the new partition either;
- The error occurs whenever data is written into a new partition, regardless of whether static or dynamic partitioning is used (a minimal reproduction sketch follows this list).
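A minimal reproduction sketch, assuming a trivial test table; only the table name test0317 and the partition column ptdate come from the logs below, the rest of the DDL is an illustrative assumption:

    -- Illustrative table; only test0317 and ptdate appear in the original logs.
    CREATE TABLE IF NOT EXISTS test0317 (id STRING)
    PARTITIONED BY (ptdate STRING);

    -- Writing into a partition that does not yet exist fails with
    -- "Internal error processing fire_listener_event".
    INSERT OVERWRITE TABLE test0317 PARTITION (ptdate='1')
    SELECT '1';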
Problem Analysis
To narrow the problem down, we first tried several scenarios and observed the following:
- insert overwrite into an existing partition of a partitioned table does not fail (whether or not that partition already contains data);
- insert into a new partition of a partitioned table does not fail;
- For non-partitioned tables, neither insert overwrite nor insert into fails;
- After insert overwrite into a new partition has failed, running msck repair table test0317 repairs the relevant metadata in the Hive metastore; afterwards select, show partitions, and similar statements can see the partition and the data beneath it without any issue (see the sketch right after this list).
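A sketch of that repair-and-verify sequence; the partition value 1 matches the failing statement in the logs, the query itself is illustrative:

    -- Register partitions that exist on HDFS but are missing from the metastore.
    MSCK REPAIR TABLE test0317;
    -- Both of the following now see the new partition and its data.
    SHOW PARTITIONS test0317;
    SELECT * FROM test0317 WHERE ptdate = '1';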
To dig further, we examined the relevant logs. The client (beeline) log, the Hive metastore log, and the HiveServer2 log are shown below.
Client log (beeline)
INFO : Loading data to table apollo_ods_jzfix.test0317 partition (ptdate=1) from hdfs://dev-dw-nn01:8020/user/hive/warehouse/apollo_ods_jzfix.db/test0317/ptdate=1/.hive-staging_hive_2021-03-17_15-09-13_232_1543365768355672834-7333/-ext-10000
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.TApplicationException: Internal error processing fire_listener_event
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 1 Cumulative CPU: 2.56 sec HDFS Read: 5344 HDFS Write: 609 HDFS EC Read: 0 SUCCESS
INFO : Total MapReduce CPU Time Spent: 2 seconds 560 msec
INFO : Completed executing command(queryId=hive_20210317150913_9d734c54-f0cf-4dc7-9117-bc7f59c2cb61); Time taken: 17.758 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.TApplicationException: Internal error processing fire_listener_event (state=08S01,code=1)
Hive metastore log
2021-03-17 15:09:30,039 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-9-thread-122]: ugi=hive/dev-dw-nn01@GYDW.COM ip=10.2.91.100 cmd=get_table : db=apollo_ods_jzfix tbl=test0317
2021-03-17 15:09:30,044 INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-9-thread-122]: 121: get_partition : db=apollo_ods_jzfix tbl=test0317[1]
2021-03-17 15:09:30,044 INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-9-thread-122]: ugi=hive/dev-dw-nn01@GYDW.COM ip=10.2.91.100 cmd=get_partition : db=apollo_ods_jzfix tbl=test0317[1]
2021-03-17 15:09:30,053 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-9-thread-122]: NoSuchObjectException(message:partition values=[1])
at org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:2003)
at sun.reflect.GeneratedMethodAccessor97.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
at com.sun.proxy.$Proxy26.getPartition(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partition(HiveMetaStore.java:3553)
at org.apache.hadoop.hive.metastore.events.InsertEvent.<init>(InsertEvent.java:62)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.fire_listener_event(HiveMetaStore.java:6737)
at sun.reflect.GeneratedMethodAccessor100.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
at com.sun.proxy.$Proxy28.fire_listener_event(Unknown Source)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$fire_listener_event.getResult(ThriftHiveMetastore.java:14208)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$fire_listener_event.getResult(ThriftHiveMetastore.java:14193)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:594)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:589)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:589)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2021-03-17 15:09:30,054 ERROR org.apache.thrift.ProcessFunction: [pool-9-thread-122]: Internal error processing fire_listener_event
org.apache.hadoop.hive.metastore.api.NoSuchObjectException: partition values=[1]
at org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:2003) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at sun.reflect.GeneratedMethodAccessor97.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at com.sun.proxy.$Proxy26.getPartition(Unknown Source) ~[?:?]
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partition(HiveMetaStore.java:3553) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.metastore.events.InsertEvent.<init>(InsertEvent.java:62) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.fire_listener_event(HiveMetaStore.java:6737) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at sun.reflect.GeneratedMethodAccessor100.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at com.sun.proxy.$Proxy28.fire_listener_event(Unknown Source) ~[?:?]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$fire_listener_event.getResult(ThriftHiveMetastore.java:14208) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$fire_listener_event.getResult(ThriftHiveMetastore.java:14193) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:594) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:589) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181]
at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_181]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) [hadoop-common-3.0.0-cdh6.2.1.jar:?]
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:589) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
HiveServer2 log
2021-03-17 15:09:30,016 INFO org.apache.hadoop.hive.ql.exec.MoveTask: [HiveServer2-Background-Pool: Thread-183022]: Partition is: {ptdate=1}
2021-03-17 15:09:30,033 INFO org.apache.hadoop.hive.common.FileUtils: [HiveServer2-Background-Pool: Thread-183022]: Creating directory if it doesn't exist: hdfs://dev-dw-nn01:8020/user/hive/warehouse/apollo_ods_jzfix.db/test0317/ptdate=1
2021-03-17 15:09:30,057 WARN org.apache.hadoop.hive.metastore.RetryingMetaStoreClient: [HiveServer2-Background-Pool: Thread-183022]: MetaStoreClient lost connection. Attempting to reconnect.
org.apache.thrift.TApplicationException: Internal error processing fire_listener_event
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_fire_listener_event(ThriftHiveMetastore.java:4836) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.fire_listener_event(ThriftHiveMetastore.java:4823) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.fireListenerEvent(HiveMetaStoreClient.java:2531) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at sun.reflect.GeneratedMethodAccessor88.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at com.sun.proxy.$Proxy34.fireListenerEvent(Unknown Source) [?:?]
at sun.reflect.GeneratedMethodAccessor88.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2562) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at com.sun.proxy.$Proxy34.fireListenerEvent(Unknown Source) [?:?]
at org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:2431) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:1629) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1525) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1489) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:501) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181]
at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_181]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) [hadoop-common-3.0.0-cdh6.2.1.jar:?]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
2021-03-17 15:09:31,063 INFO hive.metastore: [HiveServer2-Background-Pool: Thread-183022]: Closed a connection to metastore, current connections: 9231
2021-03-17 15:09:31,063 INFO hive.metastore: [HiveServer2-Background-Pool: Thread-183022]: Trying to connect to metastore with URI thrift://dev-dw-dn01:9083
2021-03-17 15:09:31,065 INFO hive.metastore: [HiveServer2-Background-Pool: Thread-183022]: Opened a connection to metastore, current connections: 9232
2021-03-17 15:09:31,065 INFO hive.metastore: [HiveServer2-Background-Pool: Thread-183022]: Connected to metastore.
2021-03-17 15:09:31,196 ERROR org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-183022]: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.TApplicationException: Internal error processing fire_listener_event
2021-03-17 15:09:31,196 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-183022]: MapReduce Jobs Launched:
2021-03-17 15:09:31,197 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-183022]: Stage-Stage-1: Map: 1 Cumulative CPU: 2.56 sec HDFS Read: 5344 HDFS Write: 609 HDFS EC Read: 0 SUCCESS
2021-03-17 15:09:31,197 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-183022]: Total MapReduce CPU Time Spent: 2 seconds 560 msec
2021-03-17 15:09:31,197 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-183022]: Completed executing command(queryId=hive_20210317150913_9d734c54-f0cf-4dc7-9117-bc7f59c2cb61); Time taken: 17.758 seconds
2021-03-17 15:09:31,206 ERROR org.apache.hive.service.cli.operation.Operation: [HiveServer2-Background-Pool: Thread-183022]: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.TApplicationException: Internal error processing fire_listener_event
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:329) ~[hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:258) ~[hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92) ~[hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181]
at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_181]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) [hadoop-common-3.0.0-cdh6.2.1.jar:?]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: Internal error processing fire_listener_event
at org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:2433) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:1629) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1525) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1489) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:501) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) ~[hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
... 11 more
Caused by: org.apache.thrift.TApplicationException: Internal error processing fire_listener_event
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_fire_listener_event(ThriftHiveMetastore.java:4836) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.fire_listener_event(ThriftHiveMetastore.java:4823) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.fireListenerEvent(HiveMetaStoreClient.java:2531) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at sun.reflect.GeneratedMethodAccessor88.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at com.sun.proxy.$Proxy34.fireListenerEvent(Unknown Source) ~[?:?]
at sun.reflect.GeneratedMethodAccessor88.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2562) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at com.sun.proxy.$Proxy34.fireListenerEvent(Unknown Source) ~[?:?]
at org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:2431) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:1629) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1525) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1489) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:501) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) ~[hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
Temporary Workaround
From the experiments above and the logs, we were fairly confident this was a Hive bug triggered by certain configuration settings, although at the time we had not found a matching JIRA. To keep the project moving, we proposed the following workarounds:
- Workaround 1 (application side): after the insert overwrite (which fails), have the job script run msck repair table test0317 to repair the relevant metadata in the Hive metastore. The drawback is that this is hard to embed in the scheduling workflow: once the insert overwrite fails, the scheduler stops and will not run the downstream steps, so the fix cannot be kept out of the business code;
- Workaround 2 (application side): add a pre-step before the normal business workflow that creates, in advance, the new partitions the workflow will need, for example: alter table test0317 add partition (ptdate=10); (the general syntax is ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec;). A sketch of such a pre-step is shown below.
In the end, to keep the project on schedule, we adopted Workaround 2 as the temporary fix.
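A minimal sketch of that pre-step, assuming the partition values for the run are known in advance; the partition value 10 comes from the example above, everything else is illustrative:

    -- Pre-create the partition so the metastore already knows about it
    -- before the downstream INSERT OVERWRITE runs.
    ALTER TABLE test0317 ADD IF NOT EXISTS PARTITION (ptdate='10');
    -- The business job can then run unchanged, e.g.:
    -- INSERT OVERWRITE TABLE test0317 PARTITION (ptdate='10') SELECT ...;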
Solution
With the temporary workaround in place and the project on track, I also posted on the Cloudera community forum in order to fully understand the underlying technical details. The community's response was not especially prompt, and a reply only arrived a few days ago, as shown in the screenshot below:
[JIRA screenshot]
Based on Cloudera's reply, the JIRA it references, and a comparison against the source code, we confirmed that this is indeed a Hive bug, and that it was only fully fixed in Hive 3.0. In short:
When hive.metastore.dml.events is set to true and data is inserted into a partition that does not yet exist (for example insert overwrite into a new partition, or inserting into new partitions in dynamic partition mode), Hive's original handling is flawed: when firing the insert event, the metastore cannot find the partition, so it drops the connection and the statement fails.
So, besides Workaround 2 on the application side described above, the following fixes are also available:
- Set hive.metastore.dml.events to false (see the example after this list), or
- Upgrade to Hive 3.0.
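A sketch of the configuration-based fix. Whether a session-level SET is sufficient depends on where the setting is read, so the safer assumption on a CDH cluster is to change it cluster-wide (for example through the hive-site.xml safety valve in Cloudera Manager) and restart the affected Hive services:

    -- Check the current value from a beeline session.
    SET hive.metastore.dml.events;
    -- Stop firing DML insert listener events for this session.
    SET hive.metastore.dml.events=false;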