1. Overview
1.1 Introduction
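In distributed architectures, microservices, and the Kubernetes ecosystem, tracing an application's request path is essential. This kind of tracing is usually discussed under the name APM (Application Performance Management). Put simply, distributed tracing records the complete behavior of every hop a request passes through, from the moment traffic reaches the front end to the database at the very back, and presents it visually for trace queries, dependency analysis, performance analysis, and topology views. A tracing system makes it much easier to locate problems, which is hard to achieve with conventional monitoring.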
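Commonly used tracing systems come in commercial and open-source flavors. The better-known ones (that I have looked into) are:
- Commercial
  - 听云 (Tingyun)
  - 博睿宏远 (Bonree)
- Open source
  - SkyWalking: China, open-sourced by an individual, now part of the Apache Software Foundation; its author was recently elected the first Chinese board director of Apache
  - Pinpoint: Korea, open-sourced by an individual
  - Zipkin: USA, open-sourced by Twitter
  - CAT: China, open-sourced by Meituan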
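Details on each of these systems are easy to find online; I will not comment on the commercial products here.
Among the open-source options, the last two are intrusive to business code. For a comparison of the first two, see the figure below.
Image: https://skywalking.apache.org/zh/2019-02-24-skywalking-pk-pinpoint/0081Kckwly1gkl4kjo1okj30in0q3gnb.jpg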
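1.2 Components
This article uses SkyWalking. Put simply, it consists of the following parts (as divided by the deployment used in this article):
- skywalking-oap-server: the backend service
- skywalking-ui: the UI front end
- skywalking-es-init: initializes the data in the ES cluster
- elasticsearch: stores SkyWalking's metric data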
2. Prerequisites
2.1 Prepare the Helm environment
Helm 3 needs only a single binary. The version I use is:
# helm version
version.BuildInfo{Version:"v3.5.2", GitCommit:"167aac70832d3a384f65f9745335e9fb40169dc2", GitTreeState:"dirty", GoVersion:"go1.15.7"}
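Since everything below relies on Helm 3, a small guard at the top of a deployment script can catch an old client early. This is only a sketch: `is_helm3` is a hypothetical helper name, and it assumes the version string has the `v3.x.y` form shown above (the form that `helm version --short` prints on Helm 3).

```shell
#!/bin/sh
# Return success when the given version string belongs to Helm 3.
# Accepts strings such as "v3.5.2" or "v3.5.2+g167aac7".
is_helm3() {
  case "$1" in
    v3.*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Example with the version shown above. On a real machine one could pass
# "$(helm version --short)" instead (assumes helm is on PATH).
if is_helm3 "v3.5.2"; then
  echo "Helm 3 detected"
else
  echo "Helm 3 is required" >&2
fi
```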
2.2 Create a dedicated namespace
Deploy SkyWalking in its own namespace:
# kubectl create ns monitoring
namespace/monitoring created
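2.3 Create secrets
This walkthrough records a deployment in an intranet environment: the local machine, acting as the Helm client, can reach the internet, while the Kubernetes cluster cannot. All images that SkyWalking uses therefore have to be served from a private registry inside the intranet.
2.3.1 Secret for pulling images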
# kubectl create secret docker-registry registry-pull-secret --docker-username=admin --docker-password=123456 --docker-email=admin@admin.com --docker-server=hub.ssgeek.com -n monitoring
secret/registry-pull-secret created
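To see what this secret holds, it helps to know that a docker-registry secret stores a small JSON document under the `.dockerconfigjson` key, whose `auth` field is the base64 encoding of `username:password`. The sketch below rebuilds roughly that document from the same demo credentials. `docker_config_json` is a hypothetical helper, it does no JSON escaping, and the exact field layout written by kubectl may differ slightly.

```shell
#!/bin/sh
# Build a JSON document shaped like the ".dockerconfigjson" payload of a
# docker-registry secret. Arguments: server, username, password, email.
# Note: values are inserted verbatim, so this sketch only suits simple strings.
docker_config_json() {
  auth="$(printf '%s:%s' "$2" "$3" | base64)"
  printf '{"auths":{"%s":{"username":"%s","password":"%s","email":"%s","auth":"%s"}}}\n' \
    "$1" "$2" "$3" "$4" "$auth"
}

# Same demo values as the kubectl command above.
docker_config_json hub.ssgeek.com admin 123456 admin@admin.com
```

Because the credentials are only base64-encoded, anyone who can read the secret can recover the registry password.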
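2.3.2 Secret for HTTPS access
This step is optional. My cluster runs cert-manager, which issues certificates automatically. The certificate is used by the SkyWalking UI ingress, and the corresponding part of the chart has to be adjusted later.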
# cat certificate.yaml
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: skywalking
  namespace: monitoring
spec:
  secretName: skywalking
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  duration: 2160h
  renewBefore: 360h
  keyEncoding: pkcs1
  dnsNames:
  - skywalking.ssgeek.com
# kubectl apply -f certificate.yaml
certificate.cert-manager.io/skywalking created
# kubectl get certificate,secret -n monitoring|grep skywalking
certificate.cert-manager.io/skywalking True skywalking 2m50s
secret/skywalking kubernetes.io/tls 3 2m49s
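The duration and renewBefore values in the certificate are written in hours, which is not the easiest unit to read. A quick conversion (`hours_to_days` is a hypothetical helper that only handles whole-hour values with an `h` suffix) shows what they mean in days:

```shell
#!/bin/sh
# Convert a duration written in whole hours (for example "2160h") to days.
hours_to_days() {
  hours="${1%h}"          # strip the trailing "h"
  echo $(( hours / 24 ))
}

echo "certificate lifetime: $(hours_to_days 2160h) days"
echo "renewal starts: $(hours_to_days 360h) days before expiry"
```

So the certificate is valid for 90 days and cert-manager starts renewing it 15 days before it expires.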
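2.3.3 Secret for SkyWalking UI access control
The SkyWalking UI has no access control by default. You can add it with the Nginx Ingress basic auth approach below, or use the external authentication approach from my earlier article on k8s Ingress Nginx OAuth2 with GitLab, which protects custom services without touching their code.
Important: there is a small pitfall with basic auth here (see [^1]). In my tests, the file that htpasswd generates to hold the username and password, created before the secret, must be named auth. Otherwise, after all the subsequent steps, the UI still returns 503. This differs from a conventional nginx basic auth setup. The name may be hard-coded in the source; I did not dig into the exact cause.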
# htpasswd -c auth skywalking
New password:
Re-type new password:
Adding password for user skywalking
# kubectl -n monitoring create secret generic ui-auth --from-file=auth
secret/ui-auth created
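The file name matters because `--from-file=auth` uses the file name as the key inside the secret, and the ingress-nginx basic auth example [^1] notes that the secret needs a key named auth. A pre-flight check like the following can catch a wrongly named file before the secret is created. It is only a sketch: `check_auth_file` is a hypothetical helper, and the line-format test is a loose sanity check rather than a full htpasswd validator.

```shell
#!/bin/sh
# Check that a file is suitable for "--from-file=<file>" with ingress-nginx
# basic auth: its basename must be "auth" (the name becomes the secret key),
# it must not be empty, and every non-empty line must look like "user:hash".
check_auth_file() {
  file="$1"
  [ -s "$file" ] || { echo "missing or empty file: $file" >&2; return 1; }
  [ "$(basename "$file")" = "auth" ] || { echo "file must be named auth" >&2; return 1; }
  if grep -v '^$' "$file" | grep -qv '^[^:][^:]*:..*$'; then
    echo "unexpected line format in $file" >&2
    return 1
  fi
  return 0
}

# Demonstration with a throwaway file. The hash is a placeholder, not real
# htpasswd output.
tmp="$(mktemp -d)"
printf 'skywalking:placeholder-hash\n' > "$tmp/auth"
check_auth_file "$tmp/auth" && echo "ok: file can be passed to --from-file"
rm -rf "$tmp"
```

If the file has to keep another name, kubectl also accepts the `--from-file=auth=<path>` form, which sets the key explicitly.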
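2.4 Store the images in the private registry
Store the images needed for the deployment in the internal registry. The SkyWalking version deployed is the latest at the time of writing. Each upstream image below is followed by its name in the private registry.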
apache/skywalking-ui:8.4.0
hub.ssgeek.com/skywalking/skywalking-ui:8.4.0
apache/skywalking-oap-server:8.4.0-es7
hub.ssgeek.com/skywalking/skywalking-oap-server:8.4.0-es7
busybox:1.30
hub.ssgeek.com/skywalking/busybox:1.30
docker.elastic.co/elasticsearch/elasticsearch:7.5.2
hub.ssgeek.com/skywalking/elasticsearch:7.5.2
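Mirroring the four images by hand is error-prone, so the pull, tag, and push steps can be scripted on the machine that has internet access. The sketch below assumes docker is installed and already logged in to hub.ssgeek.com. `mirror_name`, `mirror_image`, and the `DRY_RUN` switch are hypothetical names for this example. By default it only prints the commands; set `DRY_RUN=0` to actually run them.

```shell
#!/bin/sh
# Private registry prefix taken from the image list above.
REGISTRY="hub.ssgeek.com/skywalking"

# Map an upstream image reference to its private-registry name:
# keep only the part after the last "/" and prepend the registry prefix.
mirror_name() {
  printf '%s/%s\n' "$REGISTRY" "${1##*/}"
}

# Pull, retag, and push one image. With DRY_RUN=1 (the default here) the
# commands are printed instead of executed.
mirror_image() {
  src="$1"
  dst="$(mirror_name "$src")"
  for cmd in "docker pull $src" "docker tag $src $dst" "docker push $dst"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "$cmd"
    else
      $cmd || return 1
    fi
  done
}

for image in \
  apache/skywalking-ui:8.4.0 \
  apache/skywalking-oap-server:8.4.0-es7 \
  busybox:1.30 \
  docker.elastic.co/elasticsearch/elasticsearch:7.5.2
do
  mirror_image "$image"
done
```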
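3. Get the chart, update dependencies, and adjust values
Fetch the latest official chart and update its dependencies. The dependency update automatically downloads one subchart, the official elasticsearch chart. The downloaded package does not need to be unpacked or modified; all of its parameters are set globally through the parent chart's values.yaml.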
# git clone https://github.com/apache/skywalking-kubernetes.git
# cd skywalking-kubernetes/chart
# helm dep up skywalking
Hang tight while we grab the latest from your chart repositories...
Update Complete. ⎈Happy Helming!⎈
Saving 1 charts
Downloading elasticsearch from repo https://helm.elastic.co/
Deleting outdated charts
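Edit values.yaml. Only the parts I changed are listed below. elasticsearch has many more parameters and tuning options; a minimal configuration is used here, and the official documentation has more[^2].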
...
imagePullSecrets:
  - name: registry-pull-secret
initContainer:
  image: hub.ssgeek.com/skywalking/busybox
  tag: '1.30'
oap:
  name: oap
  # When 'dynamicConfigEnabled' set to true, enable oap dynamic configuration through k8s configmap,
  # Note: The default configmap data is empty, please refer to the detailed documentation (https://github.com/apache/skywalking/blob/master/docs/en/setup/backend/dynamic-config.md)
  # Sync period in seconds. Defaults to 60 seconds. env: SW_CONFIG_CONFIGMAP_PERIOD
  dynamicConfigEnabled: false
  image:
    repository: hub.ssgeek.com/skywalking/skywalking-oap-server
    tag: 8.4.0-es7 # Must be set explicitly
    pullPolicy: IfNotPresent
  storageType: elasticsearch7 # storage type: es7
  ...
  tolerations: []
  resources:
    limits:
      cpu: 2
      memory: 4Gi
    requests:
      cpu: 1
      memory: 1Gi
  ...
  env:
    # more env, please refer to https://hub.docker.com/r/apache/skywalking-oap-server
    # or https://github.com/apache/skywalking-docker/blob/master/6/6.4/oap/README.md#sw_telemetry
    SW_NAMESPACE: "skywalking" # sets the es index prefix to skywalking_ (the underscore is appended automatically)
...
ui:
  name: ui
  replicas: 1
  image:
    repository: hub.ssgeek.com/skywalking/skywalking-ui
    tag: 8.4.0 # Must be set explicitly
    pullPolicy: IfNotPresent
  # podAnnotations:
  #   example: oap-foo
  nodeAffinity: {}
  nodeSelector: {}
  tolerations: []
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx
      # annotations for basic auth
      nginx.ingress.kubernetes.io/auth-type: basic
      nginx.ingress.kubernetes.io/auth-secret: ui-auth
      nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required'
    path: /
    hosts:
      - skywalking.ssgeek.com
    tls:
      - secretName: skywalking
        hosts:
          - skywalking.ssgeek.com
...
elasticsearch:
  enabled: true
  config: # For users of an existing elasticsearch cluster,takes effect when `elasticsearch.enabled` is false
    port:
      http: 9200
    # host: elasticsearch # es service on kubernetes or host
    host: elasticsearch-logging.logging.svc
    user: "elastic" # [optional]
    password: "elastic" # [optional]
  clusterName: "elasticsearch"
  nodeGroup: "logging"
  # The service that non master groups will try to connect to when joining the cluster
  # This should be set to clusterName "-" nodeGroup for your master group
  masterService: "elasticsearch-logging"
  ...
  image: "hub.ssgeek.com/skywalking/elasticsearch"
  imageTag: "7.5.2"
  imagePullPolicy: "IfNotPresent"
  ...
  resources:
    requests:
      cpu: "100m"
      memory: "1Gi"
    limits:
      cpu: "1000m"
      memory: "2Gi"
  ...
  volumeClaimTemplate:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: "ceph-rbd"
    resources:
      requests:
        storage: 30Gi
  ...
  persistence:
    enabled: true
    annotations: {}
  ...
  imagePullSecrets:
    - name: registry-pull-secret
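Because the cluster cannot reach the internet, a single image reference left pointing at a public registry will leave a pod stuck pulling. A rough scan of the values file can flag such entries before installing. This is a sketch under assumptions: `external_images` is a hypothetical helper, and it only looks at single-line `repository:` and `image:` entries, so it is a sanity check, not a YAML parser.

```shell
#!/bin/sh
# Print "line: value" for every "repository:" or "image:" entry whose value
# does not start with the private registry host.
# Usage: external_images FILE REGISTRY
external_images() {
  awk -v reg="$2" '
    /^[ \t]*(repository|image):[ \t]*[^ \t#]/ {
      value = $2
      gsub(/["\047]/, "", value)     # drop surrounding quotes
      if (index(value, reg) != 1) print NR ": " value
    }
  ' "$1"
}

# Demonstration on a small sample file with two leftover public images.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
oap:
  image:
    repository: hub.ssgeek.com/skywalking/skywalking-oap-server
ui:
  image:
    repository: apache/skywalking-ui
elasticsearch:
  image: "docker.elastic.co/elasticsearch/elasticsearch"
EOF
external_images "$sample" hub.ssgeek.com
rm -f "$sample"
```

Against the real file this would be run as `external_images ./skywalking/values.yaml hub.ssgeek.com`; no output means every matched entry points at the private registry.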
4. Install SkyWalking with Helm
With the preparation done, SkyWalking can be deployed with a single helm command:
# helm install skywalking skywalking -n monitoring --values ./skywalking/values.yaml
NAME: skywalking
LAST DEPLOYED: Thu Mar 18 18:45:03 2021
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
NOTES:
************************************************************************
* *
* SkyWalking Helm Chart by SkyWalking Team *
* *
************************************************************************
Thank you for installing skywalking.
Your release is named skywalking.
Learn more, please visit https://skywalking.apache.org/
Get the UI URL by running these commands:
https://skywalking.ssgeek.com/
5. Verification
Watch the pod logs until create instance_jvm_thread_peak_count index template finished appears:
2021-03-18 10:48:32,242 - org.apache.skywalking.oap.server.core.storage.model.ModelInstaller -139765 [main] INFO [] - table: instance_jvm_thread_peak_count does not exist
2021-03-18 10:48:32,243 - org.apache.skywalking.oap.server.storage.plugin.elasticsearch.base.StorageEsInstaller -139766 [main] INFO [] - index skywalking_instance_jvm_thread_peak_count's columnTypeEsMapping builder str: {properties={service_id={type=keyword}, count={index=false, type=long}, time_bucket={type=long}, entity_id={type=keyword}, value={type=long}, summation={index=false, type=long}}}
2021-03-18 10:48:32,614 - org.apache.skywalking.oap.server.storage.plugin.elasticsearch.base.StorageEsInstaller -140137 [main] INFO [] - create instance_jvm_thread_peak_count index template finished, isAcknowledged: true
2021-03-18 10:48:33,319 - org.apache.skywalking.oap.server.storage.plugin.elasticsearch.base.StorageEsInstaller -140842 [main] INFO [] - create instance_jvm_thread_peak_count-20210318 index finished, isAcknowledged: true
......
2021-03-18 10:48:33,583 - org.eclipse.jetty.server.handler.ContextHandler -141106 [main] INFO [] - Started o.e.j.s.ServletContextHandler@12e4822b{/,null,AVAILABLE}
2021-03-18 10:48:33,597 - org.eclipse.jetty.server.AbstractConnector -141120 [main] INFO [] - Started ServerConnector@5cc9d3d0{HTTP/1.1, (http/1.1)}{0.0.0.0:12800}
2021-03-18 10:48:33,597 - org.eclipse.jetty.server.Server -141120 [main] INFO [] - Started @141185ms
2021-03-18 10:48:33,599 - org.apache.skywalking.oap.server.core.storage.PersistenceTimer -141122 [main] INFO [] - persistence timer start
2021-03-18 10:48:33,603 - org.apache.skywalking.oap.server.core.cache.CacheUpdateTimer -141126 [main] INFO [] - Cache updateServiceInventory timer start
2021-03-18 10:48:41,499 - org.apache.skywalking.oap.server.starter.OAPServerBootstrap -149022 [main] INFO [] - OAP starts up in init mode successfully, exit now...
Check the pod status:
# kubectl -n monitoring get pods
NAME READY STATUS RESTARTS AGE
elasticsearch-logging-0 1/1 Running 0 5m54s
elasticsearch-logging-1 1/1 Running 0 5m53s
elasticsearch-logging-2 1/1 Running 0 5m53s
skywalking-es-init-t7ndj 0/1 Completed 0 5m54s
skywalking-oap-57d7f454f5-8gbh5 1/1 Running 2 5m54s
skywalking-oap-57d7f454f5-vqh2d 1/1 Running 2 5m54s
skywalking-ui-698cdb4dbc-xxktt 1/1 Running 0 5m54s
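Reading the table by eye works for seven pods, but the same check can be scripted, for example in a deployment pipeline. The sketch below reads `kubectl get pods` output on stdin and succeeds only when every pod is either Running with all containers ready or Completed. `pods_healthy` is a hypothetical helper, and it assumes the default column layout shown above (NAME, READY, STATUS first).

```shell
#!/bin/sh
# Read "kubectl get pods" output on stdin. Exit 0 when every pod is Completed
# or Running with all containers ready; otherwise list the offenders and exit 1.
pods_healthy() {
  awk '
    NF == 0 { next }                      # ignore blank lines
    NR == 1 && $1 == "NAME" { next }      # skip the header row
    {
      split($2, ready, "/")
      ok = ($3 == "Completed") || ($3 == "Running" && ready[1] == ready[2])
      if (!ok) { print "not ready: " $1 " (" $3 ", " $2 ")"; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  '
}

# Demonstration with part of the output shown above.
pods_healthy <<'EOF'
NAME READY STATUS RESTARTS AGE
elasticsearch-logging-0 1/1 Running 0 5m54s
skywalking-es-init-t7ndj 0/1 Completed 0 5m54s
skywalking-oap-57d7f454f5-8gbh5 1/1 Running 2 5m54s
skywalking-ui-698cdb4dbc-xxktt 1/1 Running 0 5m54s
EOF
echo "exit status: $?"
```

On a live cluster the input would come from `kubectl -n monitoring get pods | pods_healthy`.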
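Open the web UI. After entering the username and password configured for basic auth, the SkyWalking main page loads successfully.
That completes the deployment of the SkyWalking server with Helm on Kubernetes in an intranet environment. If the environment has no internet access at all, package the modified chart and upload it to a private Helm repository such as Harbor. Both the chart and the images then come from the intranet, and the deployment needs no internet access.
I will follow up with the agent-side integration and practical usage once I have worked through them. Feel free to nudge me for updates ☺
References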
- [1] https://kubernetes.github.io/ingress-nginx/examples/auth/basic/
- [2] https://www.elastic.co/guide/en/elasticsearch/reference/current/settings.html https://artifacthub.io/packages/helm/bitnami/elasticsearch
- https://github.com/apache/skywalking
- https://github.com/apache/skywalking-kubernetes