The previous post gave a brief overview of Airflow's concepts and use cases. Today we will install Airflow with Docker and, by actually using it, take a deeper look at what Airflow can do.
Containerized Airflow Deployment
Host environment on Alibaba Cloud:
- OS: Ubuntu 20.04.3 LTS
- Kernel: Linux 5.4.0-91-generic
Installing Docker
Installation follows the official documentation[1]. On a clean system there is no need to uninstall older versions first. Since this is a cloud host, it is worth taking a snapshot beforehand in case a configuration mistake breaks the environment.
# Update the repo
sudo apt-get update
sudo apt-get install \
    ca-certificates \
    curl \
    gnupg \
    lsb-release
# Add the Docker GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
# Configure the Docker stable repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# List the installable docker-ce versions
root@bigdata1:~# apt-cache madison docker-ce
docker-ce | 5:20.10.12~3-0~ubuntu-focal | https://download.docker.com/linux/ubuntu focal/stable amd64 Packages
docker-ce | 5:20.10.11~3-0~ubuntu-focal | https://download.docker.com/linux/ubuntu focal/stable amd64 Packages
docker-ce | 5:20.10.10~3-0~ubuntu-focal | https://download.docker.com/linux/ubuntu focal/stable amd64 Packages
docker-ce | 5:20.10.9~3-0~ubuntu-focal | https://download.docker.com/linux/ubuntu focal/stable amd64 Packages
# Install command format
#sudo apt-get install docker-ce=<VERSION_STRING> docker-ce-cli=<VERSION_STRING> containerd.io
# Install a specific version
sudo apt-get install docker-ce=5:20.10.12~3-0~ubuntu-focal docker-ce-cli=5:20.10.12~3-0~ubuntu-focal containerd.io
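Once installed, an optional sanity check confirms that the daemon is running and can pull and run images:
# Check the installed version and run a throwaway test container
sudo docker --version
sudo docker run --rm hello-world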
Optimizing the Docker configuration
Edit /etc/docker/daemon.json. Note that JSON does not allow comments: the registry-mirrors entry is where you list registry accelerator endpoints (for example an Alibaba Cloud mirror); the masked URL below is a placeholder.
{
  "data-root": "/var/lib/docker",
  "exec-opts": [
    "native.cgroupdriver=systemd"
  ],
  "registry-mirrors": [
    "https://****.mirror.aliyuncs.com"
  ],
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
Configure Docker to start on boot
systemctl daemon-reload
systemctl enable --now docker.service
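If the daemon was already running when you wrote daemon.json, restart it and verify the new settings took effect; a quick check could look like this:
# Restart so the daemon.json changes take effect, then inspect the active settings
sudo systemctl restart docker
docker info | grep -iE 'cgroup driver|storage driver'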
Containerized Airflow installation
Database selection
According to the official documentation, MySQL 8 or PostgreSQL 9.6+ is recommended for the metadata database. The official docker-compose script[2] uses PostgreSQL, so we need to adjust the contents of docker-compose.yml accordingly, as shown below.
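A convenient starting point is to fetch the official compose file for the matching Airflow version and then apply the MySQL changes below. The URL assumes Airflow 2.2.3 (the image version used in this post); the directories and .env file are the ones the compose file expects:
# Download the official docker-compose.yaml for Airflow 2.2.3 as a baseline
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.2.3/docker-compose.yaml'
# Create the mounted directories and record the host UID Airflow should run as
mkdir -p ./dags ./logs ./plugins
echo -e "AIRFLOW_UID=$(id -u)" > .env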
---
version: '3'
x-airflow-common:
&airflow-common
# In order to add custom dependencies or upgrade provider packages you can use your extended image.
# Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml
# and uncomment the "build" line below, Then run `docker-compose build` to build the images.
image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.2.3}
# build: .
environment:
&airflow-common-env
AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: mysql+mysqldb://airflow:aaaa@mysql/airflow # switched to a MySQL connection string
    AIRFLOW__CELERY__RESULT_BACKEND: db+mysql://airflow:aaaa@mysql/airflow # switched to a MySQL result backend
    AIRFLOW__CELERY__BROKER_URL: redis://:xxxx@redis:6379/0 # Redis auth is enabled for safety; replace xxxx with your Redis password
AIRFLOW__CORE__FERNET_KEY: ''
AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
AIRFLOW__CORE__LOAD_EXAMPLES: 'true'
AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
_PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
volumes:
- ./dags:/opt/airflow/dags
- ./logs:/opt/airflow/logs
- ./plugins:/opt/airflow/plugins
user: "${AIRFLOW_UID:-50000}:0"
depends_on:
&airflow-common-depends-on
redis:
condition: service_healthy
    mysql: # dependency changed to the mysql service
condition: service_healthy
services:
mysql:
    image: mysql:8.0.27 # switched to a recent MySQL 8 image
    environment:
      MYSQL_ROOT_PASSWORD: bbbb # password for the MySQL root account
      MYSQL_USER: airflow
      MYSQL_PASSWORD: aaaa # password for the airflow user
      MYSQL_DATABASE: airflow
command:
      --default-authentication-plugin=mysql_native_password # use the native password auth plugin
      --collation-server=utf8mb4_general_ci # collation recommended by the official docs
      --character-set-server=utf8mb4 # character set recommended by the official docs
volumes:
      - /apps/airflow/mysqldata8:/var/lib/mysql # persist MySQL data
      - /apps/airflow/my.cnf:/etc/my.cnf # persist the MySQL config file
healthcheck:
test: mysql --user=$$MYSQL_USER --password=$$MYSQL_PASSWORD -e 'SHOW DATABASES;' # healthcheck command
interval: 5s
retries: 5
restart: always
redis:
image: redis:6.2
expose:
- 6379
    command: redis-server --requirepass xxxx # enable password authentication on redis-server
healthcheck:
test: ["CMD", "redis-cli","-a","xxxx","ping"] # redis使用密码进行healthcheck
interval: 5s
timeout: 30s
retries: 50
restart: always
airflow-webserver:
<<: *airflow-common
command: webserver
ports:
- 8080:8080
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
interval: 10s
timeout: 10s
retries: 5
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-scheduler:
<<: *airflow-common
command: scheduler
healthcheck:
test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
interval: 10s
timeout: 10s
retries: 5
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-worker:
<<: *airflow-common
command: celery worker
healthcheck:
test:
- "CMD-SHELL"
- 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
interval: 10s
timeout: 10s
retries: 5
environment:
<<: *airflow-common-env
# Required to handle warm shutdown of the celery workers properly
# See https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation
DUMB_INIT_SETSID: "0"
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-triggerer:
<<: *airflow-common
command: triggerer
healthcheck:
test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"']
interval: 10s
timeout: 10s
retries: 5
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-init:
<<: *airflow-common
entrypoint: /bin/bash
# yamllint disable rule:line-length
command:
- -c
- |
function ver() {
printf "dddd" $${1//./ }
}
airflow_version=$$(gosu airflow airflow version)
airflow_version_comparable=$$(ver $${airflow_version})
min_airflow_version=2.2.0
min_airflow_version_comparable=$$(ver $${min_airflow_version})
if (( airflow_version_comparable < min_airflow_version_comparable )); then
echo
echo -e "