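Introduction:
Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and provides a simple SQL query capability by translating SQL statements into MapReduce jobs. Its main advantage is a low learning cost: simple MapReduce statistics can be produced quickly with SQL-like statements, without developing dedicated MapReduce applications, which makes it well suited to statistical analysis in a data warehouse.
1. Use cases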
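Hive is built on Hadoop, which is a static batch-processing system. Hadoop generally has high latency and incurs considerable overhead when jobs are submitted and scheduled. As a result, Hive cannot deliver low-latency queries on large datasets; for example, a Hive query over a few hundred MB of data typically takes minutes. Hive is therefore not suitable for applications that need low latency, such as online transaction processing (OLTP).
Query execution strictly follows the Hadoop MapReduce job model: Hive translates the user's HiveQL statement into MapReduce jobs through its interpreter and submits them to the Hadoop cluster, Hadoop monitors the job execution, and the result is then returned to the user. Hive was not designed for OLTP and provides neither real-time queries nor row-level data updates. It works best for batch jobs over large datasets, such as web log analysis.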
2. Download and install
For the prerequisite Hadoop installation, see "Detailed installation of Hadoop 2.3.0 on CentOS 6.4": http://www.linuxidc.com/Linux/2014-08/105915.htm
Download:
wget http://mirror.bit.edu.cn/apache/hive/hive-0.13.1/apache-hive-0.13.1-bin.tar.gz
Extract and install:
tar zxvf apache-hive-0.13.1-bin.tar.gz -C /home/hadoop/src/
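PS: Hive only needs to be installed on one node. In this example it is installed on the same virtual machine as the Hadoop name node.
3. Configure the Hive environment variables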
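The tarball unpacks into a directory named apache-hive-0.13.1-bin, while the remaining steps use /home/hadoop/src/hive-0.13.1. The original does not show how that path was obtained. The following is a minimal sketch, assuming a symbolic link (a rename with mv would work just as well); link_hive_home is a hypothetical helper, not part of Hive.

```shell
#!/bin/sh
# Hypothetical helper: expose the unpacked Hive directory under a shorter path.
# $1 = directory created by tar, $2 = path the later steps expect
link_hive_home() {
    src=$1
    dest=$2
    if [ ! -d "$src" ]; then
        echo "missing: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "already exists: $dest" >&2
        return 1
    fi
    ln -s "$src" "$dest"
}

# Example call with the paths used in this article (uncomment to run):
# link_hive_home /home/hadoop/src/apache-hive-0.13.1-bin /home/hadoop/src/hive-0.13.1
```

The helper refuses to overwrite an existing target, so running it twice is harmless.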
vim hive-env.sh
export HIVE_HOME=/home/hadoop/src/hive-0.13.1
export PATH=$PATH:$HIVE_HOME/bin
4. Configure the Hadoop and HBase parameters
vim hive-env.sh
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/home/hadoop/src/hadoop-2.3.0/
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/home/hadoop/src/hive-0.13.1/conf
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/home/hadoop/src/hive-0.13.1/lib
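A wrong path in hive-env.sh tends to surface only later as a confusing startup error, so it can help to confirm that the configured directories exist before starting Hive. This is a small sketch; check_dirs is a hypothetical helper, and the paths in the commented call are the ones from the settings above.

```shell
#!/bin/sh
# Hypothetical helper: report whether each configured directory exists.
# Returns 0 if all exist, 1 if at least one is missing.
check_dirs() {
    result=0
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            echo "ok      $dir"
        else
            echo "missing $dir"
            result=1
        fi
    done
    return "$result"
}

# Paths taken from the hive-env.sh settings above (uncomment to run):
# check_dirs /home/hadoop/src/hadoop-2.3.0 \
#            /home/hadoop/src/hive-0.13.1/conf \
#            /home/hadoop/src/hive-0.13.1/lib
```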
5. Verify the installation:
Start the Hive command line. If the hive> prompt appears, the installation succeeded.
[hadoop@name01 lib]$ hive --service cli
15/01/09 00:20:32 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in jar:file:/home/hadoop/src/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties
Create a table. If the create statement returns OK, the command succeeded and Hive is installed correctly.
hive> create table test(key string);
OK
Time taken: 8.749 seconds
hive>
6. Verify that Hive is usable
Start the Hive metastore service in the background:
[hadoop@name01 root]$ hive --service metastore &
Check the Hive processes running in the background:
[hadoop@name01 root]$ ps -eaf|grep hive
hadoop 4025 2460 1 22:52 pts/0 00:00:19 /usr/lib/jvm/jdk1.7.0_60/bin/java -Xmx256m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/src/hadoop-2.3.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/hadoop/src/hadoop-2.3.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/hadoop/src/hadoop-2.3.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /home/hadoop/src/hive-0.13.1/lib/hive-service-0.13.1.jar org.apache.hadoop.hive.metastore.HiveMetaStore
hadoop 4575 4547 0 23:14 pts/1 00:00:00 grep hive
[hadoop@name01 root]$
6.1 In the hive shell, create a table with two columns, using ',' as the field delimiter:
hive> create table test(key string);
OK
Time taken: 8.749 seconds
hive> create table tim_test(id int,name string) row format delimited fields terminated by ',';
OK
Time taken: 0.145 seconds
hive>
6.2 Prepare a txt file to load into the table and enter some values:
[hadoop@name01 hive-0.13.1]$ more tim_hive_test.txt
123,xinhua
456,dingxilu
789,fanyulu
903,fahuazhengroad
[hadoop@name01 hive-0.13.1]$
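Hive does not validate a file when it is loaded; a value that does not fit the column type simply reads back as NULL. A quick check of the file before loading can therefore save a round trip. The sketch below recreates the sample file and counts rows that do not match the tim_test layout (an integer id, a comma, a name); count_bad_rows is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: count lines that do not have exactly two
# comma-separated fields with a purely numeric first field.
count_bad_rows() {
    awk -F',' 'NF != 2 || $1 !~ /^[0-9]+$/ { bad++ } END { print bad + 0 }' "$1"
}

# Recreate the sample file from this article in the current directory.
cat > tim_hive_test.txt <<'EOF'
123,xinhua
456,dingxilu
789,fanyulu
903,fahuazhengroad
EOF

# Prints 0 for the sample file, since every row matches the layout.
count_bad_rows tim_hive_test.txt
```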
6.4 Open another Xshell session and start the Hive metastore on the server:
[hadoop@name01 root]$ hive --service metastore
Starting Hive Metastore Server
6.5 Open another Xshell session, start the Hive client, and load the data:
[hadoop@name01 hive-0.13.1]$ hive
Logging initialized using configuration in jar:file:/home/hadoop/src/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> load data local inpath '/home/hadoop/src/hive-0.13.1/tim_hive_test.txt' into table tim_test;
Copying data from file:/home/hadoop/src/hive-0.13.1/tim_hive_test.txt
Copying file: file:/home/hadoop/src/hive-0.13.1/tim_hive_test.txt
Loading data to table default.tim_test
[Warning] could not update stats.
OK
Time taken: 7.208 seconds
hive>
6.6 Verify that the load succeeded: the dfs listing of the warehouse directory shows tim_test
hive> dfs -ls /home/hadoop/hive/warehouse;
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2015-01-12 01:47 /home/hadoop/hive/warehouse/hive_hbase_mapping_table_1
drwxr-xr-x - hadoop supergroup 0 2015-01-12 02:11 /home/hadoop/hive/warehouse/tim_test
hive>
7. Errors encountered during installation and deployment:
Error 1:
[hadoop@name01 conf]$ hive --service metastore
Starting Hive Metastore Server
javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
The MySQL JDBC driver jar is missing. Copying it into Hive's lib directory fixes the error.
Error 2:
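Whether the driver is in place can be checked before starting the metastore. The sketch below assumes the jar follows the usual Connector/J naming, mysql-connector-java*.jar (the original does not give the file name); has_mysql_driver is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a MySQL connector jar is present in the
# given lib directory. The file name pattern is an assumption.
has_mysql_driver() {
    for jar in "$1"/mysql-connector-java*.jar; do
        [ -f "$jar" ] && return 0
    done
    return 1
}

# Example call with the lib path used in this article (uncomment to run):
# has_mysql_driver /home/hadoop/src/hive-0.13.1/lib \
#     || echo "copy the MySQL connector jar into lib first"
```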
[hadoop@name01 conf]$ hive --service metastore
Starting Hive Metastore Server
javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:mysql://192.168.52.130:3306/hive_remote?createDatabaseIfNotExist=true, username = root. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: null, message from server: "Host '192.168.52.128' is not allowed to connect to this MySQL server"
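First attempt: add the hadoop user to the mysql group (the telnet test afterwards still shows the host being rejected):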
[root@data02 mysql]# gpasswd -a hadoop mysql
Adding user hadoop to group mysql
[root@data02 mysql]#
^C[hadoop@name01 conf]$ telnet 192.168.52.130 3306
Trying 192.168.52.130...
Connected to 192.168.52.130.
Escape character is '^]'.
G
-------------------------------------------------------------------------------- Host '192.168.52.128' is not allowed to connect to this MySQL serverConnection closed by foreign host.
[hadoop@name01 conf]$
Fix: change the MySQL account
mysql> update user set user = 'hadoop' where user = 'root' and host='%';
Query OK, 1 row affected (0.04 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> flush privileges;
Query OK, 0 rows affected (0.09 sec)
mysql>
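Renaming the root account works, but it also changes the administrative login. A commonly used alternative is to grant a dedicated account access from the Hive host. The sketch below only prints the SQL so it can be reviewed before being piped into the mysql client. It assumes MySQL 5.x, where GRANT ... IDENTIFIED BY creates the account if needed (MySQL 8.0 needs a separate CREATE USER statement). grant_sql is a hypothetical helper and the password is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: print GRANT statements for a metastore account.
# $1 = database, $2 = user, $3 = client host, $4 = password
# Assumes MySQL 5.x syntax (GRANT ... IDENTIFIED BY).
grant_sql() {
    printf "GRANT ALL PRIVILEGES ON %s.* TO '%s'@'%s' IDENTIFIED BY '%s';\nFLUSH PRIVILEGES;\n" \
        "$1" "$2" "$3" "$4"
}

# Database and host are taken from the error message above; the user name
# and password are placeholders.
grant_sql hive_remote hadoop 192.168.52.128 change-me
```

If this approach is used, the user name and password in hive-site.xml must match the account that was granted access.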
Error 3:
[hadoop@name01 conf]$ hive --service metastore
Starting Hive Metastore Server
javax.jdo.JDOException: Exception thrown calling table.exists() for hive_remote.`SEQUENCE_TABLE`
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:596)
at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)
at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
……
NestedThrowablesStackTrace:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
Fix: on the remote MySQL server, change the database character set from utf8mb4 to utf8:
mysql> alter database hive_remote /*!40100 DEFAULT CHARACTER SET utf8 */;
Query OK, 1 row affected (0.03 sec)
mysql>
Then set up the Hive client on data01:
scp -r hive-0.13.1/ data01:/home/hadoop/src/
Error 3 (continued):
Start the metastore again and check the log output:
[hadoop@name01 conf]$ hive --service metastore
Starting Hive Metastore Server
The command stays here without further output, so check the log:
[hadoop@name01 hadoop]$ tail -f hive.log
2015-01-09 03:46:27,692 INFO [main]: metastore.ObjectStore (ObjectStore.java:setConf(229)) - Initialized ObjectStore
2015-01-09 03:46:27,892 WARN [main]: metastore.ObjectStore (ObjectStore.java:checkSchema(6295)) - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.0
2015-01-09 03:46:30,574 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(551)) - Added admin role in metastore
2015-01-09 03:46:30,582 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(560)) - Added public role in metastore
2015-01-09 03:46:31,168 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:addAdminUsers(588)) - No user is added in admin role, since config is empty
2015-01-09 03:46:31,473 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5178)) - Starting DB backed MetaStore Server
2015-01-09 03:46:31,481 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5190)) - Started the new metaserver on port [9083]...
2015-01-09 03:46:31,481 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5192)) - Options.minWorkerThreads = 200
2015-01-09 03:46:31,482 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5194)) - Options.maxWorkerThreads = 100000
2015-01-09 03:46:31,482 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5196)) - TCP keepalive = true
The log shows the metastore listening on port 9083. Add the following to hive-site.xml:
<property>
<name>hive.metastore.uris</name>
<value>thrift://192.168.52.128:9083</value>
</property>
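To confirm which metastore address a node will actually use, the value can be read back from hive-site.xml. The sketch below is a plain text scan with awk, not a real XML parser, and it assumes the <name> and <value> elements are laid out as in the snippet above; metastore_uri is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the value of hive.metastore.uris from a
# hive-site.xml file. Simple line scan; assumes <value> appears on the same
# line as the matching <name>, or on a later line.
metastore_uri() {
    awk '
        /<name>hive\.metastore\.uris<\/name>/ { found = 1 }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Example call with the conf path used in this article (uncomment to run):
# metastore_uri /home/hadoop/src/hive-0.13.1/conf/hive-site.xml
```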
Error 4:
2015-01-09 04:01:43,053 INFO [main]: metastore.ObjectStore (ObjectStore.java:setConf(229)) - Initialized ObjectStore
2015-01-09 04:01:43,540 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(551)) - Added admin role in metastore
2015-01-09 04:01:43,546 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(560)) - Added public role in metastore
2015-01-09 04:01:43,684 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:addAdminUsers(588)) - No user is added in admin role, since config is empty
2015-01-09 04:01:44,041 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5178)) - Starting DB backed MetaStore Server
2015-01-09 04:01:44,054 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5190)) - Started the new metaserver on port [9083]...
2015-01-09 04:01:44,054 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5192)) - Options.minWorkerThreads = 200
2015-01-09 04:01:44,054 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5194)) - Options.maxWorkerThreads = 100000
2015-01-09 04:01:44,054 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5196)) - TCP keepalive = true
2015-01-09 04:24:13,917 INFO [Thread-3]: metastore.HiveMetaStore (HiveMetaStore.java:run(5073)) - Shutting down hive metastore.
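Resolution:
After a long search I could not find the cause behind "No user is added in admin role, since config is empty". Note that the log records this line at INFO level, not as an error. If you have run into the same situation, please leave a comment so we can compare notes.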