今天在hbase中创建快照的时候遇到了如下错误:
01.hbase(main):004:0> snapshot 'booking', 'booking-snapshot-20140912' 02. 03.ERROR: org.apache.Hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=booking-snapshot-20140912 table=booking type=FLUSH } had an error. Procedure booking-snapshot-20140912 { waiting=[hbase1.data.cn,60020,1407930968832, hbase45.data.cn,60020,1408609189376, hbase23.data.cn,60020,1407930978740, hbase37.data.cn,60020,1408608587411, hbase46.data.cn,60020,1408609190515, hbase6.data.cn,60020,1407930958926, hbase44.data.cn,60020,1408609188252, hbase7.data.cn,60020,1407930960021, hbase49.data.cn,60020,1408609193897, hbase47.data.cn,60020,1408609191647, hbase21.data.cn,60020,1407930976874, hbase39.data.cn,60020,1408608669063, hbase13.data.cn,60020,1407930966976, hbase15.data.cn,60020,1407930969235, hbase19.data.cn,60020,1407930973863, hbase16.data.cn,60020,1407930971152, hbase18.data.cn,60020,1407930972762, hbase43.data.cn,60020,1408609187126, hbase12.data.cn,60020,1407930966365, hbase10.data.cn,60020,1407930963512, hbase3.data.cn,60020,1407930955378, hbase11.data.cn,60020,1407930965112, hbase24.data.cn,60020,1407930979654, hbase2.data.cn,60020,1407930954308, hbase9.data.cn,60020,1407930962354, hbase38.data.cn,60020,1408608663894, hbase40.data.cn,60020,1408608674240, hbase41.data.cn,60020,1408609184867, hbase4.data.cn,60020,1407930956670, hbase36.data.cn,60020,1408608406292, hbase17.data.cn,60020,1407930972505, hbase35.data.cn,60020,1408607982898, hbase20.data.cn,60020,1407930974993, hbase48.data.cn,60020,1408609192763, hbase22.data.cn,60020,1407930978159, hbase8.data.cn,60020,1407930961333] done=[] } 04. at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:342) 05. at org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:2905) 06. at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40494) 07. at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012) 08. at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98) 09. at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73) 10. at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 11. at java.util.concurrent.FutureTask.run(FutureTask.java:262) 12. at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 13. at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 14. at java.lang.Thread.run(Thread.java:744) 15.Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via timer-java.util.Timer@69db0cb4:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! Source:Timeout caused Foreign Exception Start:1410453067992, End:1410453127992, diff:60000, max:60000 ms 16. at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83) 17. at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:320) 18. at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:332) 19. ... 10 more 20.Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! Source:Timeout caused Foreign Exception Start:1410453067992, End:1410453127992, diff:60000, max:60000 ms 21. at org.apache.hadoop.hbase.errorhandling.TimeoutExceptionInjector$1.run(TimeoutExceptionInjector.java:70) 22. at java.util.TimerThread.mainLoop(Timer.java:555) 23. at java.util.TimerThread.run(Timer.java:505)
出现这种问题的原因是因为和服务器通信超时导致的。所以需要将下面两个参数的默认值进行调整。
1、hbase.snapshot.region.timeout
2、hbase.snapshot.master.timeoutMillis
这两个值的默认值为60000,单位是毫秒,也即1min。如果通信时间超过该值,就会报上面的错误。