Purpose of this document
- Record the steps for installing Impyla on CDH5
Cluster environment
- CDH5.16.2
- anaconda3
- python3.7
Component overview
- Impyla: a Python client for HiveServer2 implementations of distributed query engines (for example Impala and Hive).
Impyla dependencies
- six
- bitarray
- thriftpy
- thrift_sasl
- sasl
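Before going further, it can help to see which of these dependencies are already importable in the target interpreter. The following is a minimal sketch, not part of the original steps: the helper name `missing_modules` is made up for illustration, and it assumes each package is imported under the module name listed here (with the bit-array package imported as `bitarray`).

```python
import importlib.util

# Module names assumed to match the dependency list above.
IMPYLA_DEPENDENCIES = ["six", "bitarray", "thriftpy", "thrift_sasl", "sasl"]


def missing_modules(names):
    """Return the names that cannot be located by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(IMPYLA_DEPENDENCIES)
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All Impyla dependencies are importable")
```

Run it with the same interpreter that pip installs into (here the anaconda3 Python), otherwise the result describes a different environment.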
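Installing dependencies
- Run the following before installing thrift_sasl; otherwise the installation fails with an error about a missing sasl.h file
yum install gcc-c++ python-devel.x86_64 cyrus-sasl-devel.x86_64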
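To check whether the sasl.h header is in place before running pip, a small lookup like the one below can be used. This is only a sketch: the helper `find_header` is hypothetical, and the path `/usr/include/sasl/sasl.h` is an assumption about a typical CentOS/RHEL layout rather than something stated in this post.

```python
import os

# Assumed header location on a typical CentOS/RHEL system; adjust if yours differs.
CANDIDATE_HEADERS = ["/usr/include/sasl/sasl.h"]


def find_header(candidates):
    """Return the first path in `candidates` that exists as a file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


found = find_header(CANDIDATE_HEADERS)
if found:
    print("sasl.h found at", found)
else:
    print("sasl.h not found in the candidate paths; install cyrus-sasl-devel first")
```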
Install the remaining dependencies
pip install bitarray
pip install thriftpy
pip install six
# Pin thrift_sasl==0.2.1; otherwise connecting to Hive raises an error
pip install thrift_sasl==0.2.1
pip install sasl
Installing Impyla
- Python 3.7 does not work with the latest impyla release, so pin impyla to version 0.15a1
/usr/local/anaconda3/bin/pip install impyla==0.15a1
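After installation, the pinned versions can be compared with what is actually installed. The sketch below is an illustration under assumptions: `installed_version` and `PINS` are names invented here, the pins are copied from this post, and the lookup falls back to `pkg_resources` on Python 3.7, where `importlib.metadata` is not in the standard library.

```python
def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        from importlib import metadata  # standard library from Python 3.8
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            return None
    import pkg_resources  # fallback for Python 3.7, provided by setuptools
    try:
        return pkg_resources.get_distribution(dist_name).version
    except pkg_resources.DistributionNotFound:
        return None


# Versions pinned in this post.
PINS = {"impyla": "0.15a1", "thrift_sasl": "0.2.1"}

for name, wanted in PINS.items():
    print(name, "wanted:", wanted, "installed:", installed_version(name))
```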
Testing Impyla
Impala
- You need the IP address and port that Impala exposes for JDBC connections
from impala.dbapi import connect
conn = connect(host='192.168.xx.xx',port=25004)
print(conn)
cursor = conn.cursor()
cursor.execute('show databases')
results = cursor.fetchall()
print(results)
cursor.execute('SELECT distinct id FROM ods.test limit 10')
series_code = cursor.fetchall()
print(series_code)
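Impyla follows the Python DB-API 2.0 interface, so the tuples returned by `fetchall()` can be paired with the column names in `cursor.description`. The sketch below shows one way to do that; `rows_as_dicts` is a hypothetical helper, and the demo uses the standard-library `sqlite3` driver as a stand-in so it runs without a cluster. With Impala, the cursor from the test above would be passed in instead.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Pair each fetched row with the column names from cursor.description."""
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in for the Impala connection: sqlite3 is also a DB-API 2.0 driver.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE test (id INTEGER)")
cursor.executemany("INSERT INTO test VALUES (?)", [(1,), (2,), (2,)])
cursor.execute("SELECT DISTINCT id FROM test ORDER BY id LIMIT 10")
demo_rows = rows_as_dicts(cursor)
print(demo_rows)
cursor.close()
conn.close()
```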
Hive
from impala.dbapi import connect
conn = connect(host="192.168.xx.xx", port=25005, database="ods", auth_mechanism="PLAIN")
print(conn)
cursor = conn.cursor()
cursor.execute("show databases")
print(cursor.description)
results = cursor.fetchall()
print(results)
cursor.execute("select distinct series_code from ods.test")
print(cursor)
series_code = cursor.fetchall()
print(series_code)
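The two tests differ only in the connection arguments: the Hive connection passes `auth_mechanism="PLAIN"` and a database, while the Impala one does not. A small wrapper can capture that difference and also close the cursor and connection when the query finishes. This is a sketch under assumptions: `connect_kwargs` and `run_query` are names made up here, and `impala.dbapi` is imported lazily so the helpers load even where impyla is not installed.

```python
from contextlib import closing


def connect_kwargs(engine, host, port, database=None):
    """Build keyword arguments for impala.dbapi.connect, mirroring the two tests above."""
    if engine not in ("impala", "hive"):
        raise ValueError("engine must be 'impala' or 'hive'")
    kwargs = {"host": host, "port": port}
    if database is not None:
        kwargs["database"] = database
    if engine == "hive":
        # The Hive test above connects with PLAIN authentication.
        kwargs["auth_mechanism"] = "PLAIN"
    return kwargs


def run_query(engine, host, port, sql, database=None):
    """Run one statement and return all rows, closing cursor and connection afterwards."""
    from impala.dbapi import connect  # lazy import: requires impyla at call time only

    with closing(connect(**connect_kwargs(engine, host, port, database))) as conn:
        with closing(conn.cursor()) as cursor:
            cursor.execute(sql)
            return cursor.fetchall()


print(connect_kwargs("hive", "192.168.xx.xx", 25005, database="ods"))
```

A call such as `run_query("hive", "192.168.xx.xx", 25005, "show databases", database="ods")` would then replace the manual connect, execute, and fetch sequence; the host stays a placeholder, as in the tests above.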