1. Install docker-compose

```shell
pip install docker-compose
```
2. Install docker-hive

Following notes from https://www.likecs.com/show-152186.html, download docker-compose.yml and hadoop-hive.env from https://github.com/big-data-europe/docker-hive and put them under a docker-hive directory. In that directory, run docker-compose up -d to deploy the Hive-related containers.

Enter the hive-server container:

```shell
docker-compose exec hive-server bash
# or
docker exec -it docker-hive_hive-server_1 /bin/bash
```

Create the database in Hive:

```sql
CREATE DATABASE IF NOT EXISTS ai_data;
SHOW DATABASES;
```
The next step was to deploy, inside the container, a service that writes to Hive. It turned out the image ships with Python 3.4, which needs upgrading.
2.1 Upgrade the Python environment inside the image

Following https://blog.csdn.net/mameng1988/article/details/83782831, switch apt to a faster mirror to speed up the downloads that follow, then install the basic tooling:

```shell
apt update && apt upgrade
apt install vim git wget
apt install software-properties-common
apt-get install python-pip python3-pip
apt-get install libsasl2-dev
apt install python3-dev
```

Download OpenSSL 1.1.1 and build it following https://www.cnblogs.com/chengfo/p/16289666.html:

```shell
wget https://www.openssl.org/source/old/1.1.1/openssl-1.1.1.tar.gz --no-check-certificate
```

Then upgrade Python to 3.8 following https://blog.csdn.net/qq_38048756/article/details/121211362. Note that SSL support must be configured before building: edit Modules/Setup under the Python source tree (vim <python-source-dir>/Modules/Setup), fill in the OpenSSL install path, and uncomment the five relevant lines.
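For reference, the five lines in question are the socket/SSL block of Modules/Setup. After uncommenting, the block looks roughly like this (the SSL= path below is an assumption; point it at wherever OpenSSL 1.1.1 was actually installed):

```
_socket socketmodule.c

SSL=/usr/local/openssl
_ssl _ssl.c \
	-DUSE_SSL -I$(SSL)/include -I$(SSL)/include/openssl \
	-L$(SSL)/lib -lssl -lcrypto
```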
Switch pip to a mirror:

```shell
vim ~/.pip/pip.conf
```

```ini
[global]
index-url=http://mirrors.aliyun.com/pypi/simple
[install]
trusted-host=mirrors.aliyun.com
```
Create a virtual environment and install what the service needs:

```shell
pip install virtualenv -i http://mirrors.aliyun.com/pypi/simple --trusted-host mirrors.aliyun.com
python -m venv venv
python -m pip install xxx
git clone https://username:PASSWORD@git.XXX
apt-get install mysql-server mysql-client libmysqlclient-dev
```
3. Build the image

Commit the container prepared above as an image named ai_hive, tagged v3:

```shell
docker commit -m "ai hive" docker-hive_hive-server_1 ai_hive:v3
```

The point of this step is that later docker-compose up -d runs can use this image directly; without it, bringing the stack up again would discard everything done in step 2.
4. yml configuration

Change the yml configuration to use the freshly committed image. An alias can also be set with docker tag:

```shell
docker tag bde2020/hive:2.3.2-postgresql-metastore ai_hive
```
5. No Kafka messages arriving inside the hive-server container

The dockerized Kafka here is the one from https://github.com/bitnami/containers/blob/main/bitnami/kafka/README.md; https://blog.csdn.net/u013012063/article/details/120326757 was the reference for fixing the missing-messages problem.
The final configuration is pasted below.

docker-compose.yml:

```yaml
version: "3"

networks:
  app-tier:
    driver: bridge

services:
  namenode:
    image: bde2020/hadoop-namenode:2.0.0-hadoop2.7.4-java8
    volumes:
      - namenode:/hadoop/dfs/name
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop-hive.env
    ports:
      - "50070:50070"
    networks:
      - app-tier
  datanode:
    image: bde2020/hadoop-datanode:2.0.0-hadoop2.7.4-java8
    volumes:
      - datanode:/hadoop/dfs/data
    env_file:
      - ./hadoop-hive.env
    environment:
      SERVICE_PRECONDITION: "namenode:50070"
    ports:
      - "50075:50075"
    networks:
      - app-tier
  hive-server:
    #image: bde2020/hive:2.3.2-postgresql-metastore
    image: ai_hive:v3
    env_file:
      - ./hadoop-hive.env
    environment:
      HIVE_CORE_CONF_javax_jdo_option_ConnectionURL: "jdbc:postgresql://hive-metastore/metastore"
      SERVICE_PRECONDITION: "hive-metastore:9083"
    ports:
      - "10000:10000"
    networks:
      - app-tier
    links:
      - kafka
    depends_on:
      - kafka
      - zookeeper
  hive-metastore:
    image: bde2020/hive:2.3.2-postgresql-metastore
    env_file:
      - ./hadoop-hive.env
    command: /opt/hive/bin/hive --service metastore
    environment:
      SERVICE_PRECONDITION: "namenode:50070 datanode:50075 hive-metastore-postgresql:5432"
    ports:
      - "9083:9083"
    networks:
      - app-tier
  hive-metastore-postgresql:
    image: bde2020/hive-metastore-postgresql:2.3.0
    networks:
      - app-tier
  presto-coordinator:
    image: shawnzhu/prestodb:0.181
    ports:
      - "8080:8080"
    networks:
      - app-tier
  zookeeper:
    image: 'bitnami/zookeeper:latest'
    ports:
      - '2181:2181'
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
    networks:
      - app-tier
  kafka:
    image: 'bitnami/kafka:latest'
    ports:
      - '9092:9092'
      - '29092:29092'
    environment:
      - KAFKA_BROKER_ID=1
      #- KAFKA_CFG_LISTENERS=PLAINTEXT://:9092
      #- KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://127.0.0.1:9092
      - KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=CLIENT:PLAINTEXT,EXTERNAL:PLAINTEXT
      - KAFKA_CFG_LISTENERS=CLIENT://:9092,EXTERNAL://:29092
      - KAFKA_ADVERTISED_LISTENERS=CLIENT://kafka:9092,EXTERNAL://10.24.0.8:29092
      - KAFKA_INTER_BROKER_LISTENER_NAME=CLIENT
      - KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - ALLOW_PLAINTEXT_LISTENER=yes
    depends_on:
      - zookeeper
    networks:
      - app-tier

volumes:
  namenode:
  datanode:
```
Here 10.24.0.8 is the Kafka IP shown by running ping kafka inside the hive-server container. Clients outside the app-tier network have to use the EXTERNAL listener at 10.24.0.8:29092; inside the network, both kafka:9092 and kafka:29092 work.
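The CLIENT/EXTERNAL split above uses Kafka's advertised-listeners format: a comma-separated list of NAME://host:port entries. A small stdlib-only sketch (the function names are mine; the addresses come from the compose file above) of how a client could pick the right bootstrap address:

```python
def parse_advertised_listeners(value: str) -> dict:
    """Split Kafka's 'NAME://host:port,...' advertised-listeners string
    into a {listener_name: 'host:port'} mapping."""
    out = {}
    for entry in value.split(","):
        name, addr = entry.strip().split("://", 1)
        out[name] = addr
    return out


# Values from the compose file above.
listeners = parse_advertised_listeners("CLIENT://kafka:9092,EXTERNAL://10.24.0.8:29092")


def bootstrap_server(inside_network: bool) -> str:
    """Inside the app-tier network, use the CLIENT listener;
    from the host (or anywhere outside the network), use EXTERNAL."""
    return listeners["CLIENT"] if inside_network else listeners["EXTERNAL"]


print(bootstrap_server(True))   # kafka:9092
print(bootstrap_server(False))  # 10.24.0.8:29092
```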
hadoop-hive.env:

```properties
HIVE_SITE_CONF_javax_jdo_option_ConnectionURL=jdbc:postgresql://hive-metastore-postgresql/metastore
HIVE_SITE_CONF_javax_jdo_option_ConnectionDriverName=org.postgresql.Driver
HIVE_SITE_CONF_javax_jdo_option_ConnectionUserName=hive
HIVE_SITE_CONF_javax_jdo_option_ConnectionPassword=hive
HIVE_SITE_CONF_datanucleus_autoCreateSchema=false
HIVE_SITE_CONF_hive_metastore_uris=thrift://hive-metastore:9083
HDFS_CONF_dfs_namenode_datanode_registration_ip___hostname___check=false
CORE_CONF_fs_defaultFS=hdfs://namenode:8020
CORE_CONF_hadoop_http_staticuser_user=root
CORE_CONF_hadoop_proxyuser_hue_hosts=*
CORE_CONF_hadoop_proxyuser_hue_groups=*
HDFS_CONF_dfs_webhdfs_enabled=true
HDFS_CONF_dfs_permissions_enabled=false
YARN_CONF_yarn_log___aggregation___enable=true
YARN_CONF_yarn_resourcemanager_recovery_enabled=true
YARN_CONF_yarn_resourcemanager_store_class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
YARN_CONF_yarn_resourcemanager_fs_state___store_uri=/rmstate
YARN_CONF_yarn_nodemanager_remote___app___log___dir=/app-logs
YARN_CONF_yarn_log_server_url=http://historyserver:8188/applicationhistory/logs/
YARN_CONF_yarn_timeline___service_enabled=true
YARN_CONF_yarn_timeline___service_generic___application___history_enabled=true
YARN_CONF_yarn_resourcemanager_system___metrics___publisher_enabled=true
YARN_CONF_yarn_resourcemanager_hostname=resourcemanager
YARN_CONF_yarn_timeline___service_hostname=historyserver
YARN_CONF_yarn_resourcemanager_address=resourcemanager:8032
YARN_CONF_yarn_resourcemanager_scheduler_address=resourcemanager:8030
YARN_CONF_yarn_resourcemanager_resource__tracker_address=resourcemanager:8031
```
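As I understand the big-data-europe entrypoint scripts, each HIVE_SITE_CONF_* / CORE_CONF_* / HDFS_CONF_* / YARN_CONF_* variable above is rewritten into a property in the matching XML config file: "___" becomes "-", "__" becomes a literal "_", and a single "_" becomes ".". A stdlib-only sketch of that mapping (the function name is mine, and the rules are my reading of the entrypoint's perl substitution):

```python
def env_to_property(suffix: str) -> str:
    """Map a *_CONF_ env-var suffix to a Hadoop/Hive property name:
    '___' -> '-', '__' -> '_', '_' -> '.'."""
    s = suffix.replace("___", "-")
    s = s.replace("__", "\x00")   # protect literal underscores first
    s = s.replace("_", ".")
    return s.replace("\x00", "_")


print(env_to_property("javax_jdo_option_ConnectionURL"))
# javax.jdo.option.ConnectionURL
print(env_to_property("dfs_namenode_datanode_registration_ip___hostname___check"))
# dfs.namenode.datanode.registration.ip-hostname-check
```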
This took three days to sort out, so I'm writing it down; hopefully it saves you some detours.