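“This article shows how to create an operator that automatically supervises the number of pods for an application, and then deploys it on a Kubernetes/OpenShift cluster so that it actually runs.”
1. Install operator-sdk. On a Mac, install it directly with `brew`. For other platforms, see https://github.com/operator-framework/operator-sdk/blob/master/doc/user/install-operator-sdk.md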
$ brew install operator-sdk
$ operator-sdk version
operator-sdk version: "v0.12.0", commit: "2445fcda834ca4b7cf0d6c38fba6317fb219b469", go version: "go1.13.4 darwin/amd64"
Note ⚠️: Go 1.13 is recommended here.
2. Create a new operator project, for example learn-operator
operator-sdk currently supports Go, Ansible, and Helm based operators. This example uses the default, Go. The working directory is $GOPATH/src/github.com/example-inc
mac:example-inc jianzhang$ operator-sdk new learn-operator
INFO[0000] Creating new Go operator 'learn-operator'.
INFO[0000] Created go.mod
INFO[0000] Created tools.go
INFO[0000] Created cmd/manager/main.go
INFO[0000] Created build/Dockerfile
INFO[0000] Created build/bin/entrypoint
INFO[0000] Created build/bin/user_setup
INFO[0000] Created deploy/service_account.yaml
INFO[0000] Created deploy/role.yaml
INFO[0000] Created deploy/role_binding.yaml
INFO[0000] Created deploy/operator.yaml
INFO[0000] Created pkg/apis/apis.go
INFO[0000] Created pkg/controller/controller.go
INFO[0000] Created version/version.go
INFO[0000] Created .gitignore
INFO[0000] Validating project
INFO[0022] Project validation successful.
INFO[0022] Project creation complete.
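Looking at the directory structure, operator-sdk has already created the skeleton of the whole project, including the code that talks to Kubernetes or OpenShift. As application developers, we do not need a deep understanding of the underlying platform APIs and can focus on our own logic.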
mac:example-inc jianzhang$ tree learn-operator/
learn-operator/
├── build
│   ├── Dockerfile
│   └── bin
│       ├── entrypoint
│       └── user_setup
├── cmd
│   └── manager
│       └── main.go
├── deploy
│   ├── operator.yaml
│   ├── role.yaml
│   ├── role_binding.yaml
│   └── service_account.yaml
├── go.mod
├── go.sum
├── pkg
│   ├── apis
│   │   └── apis.go
│   └── controller
│       └── controller.go
├── tools.go
└── version
    └── version.go
9 directories, 14 files
The business logic only needs to care about two things:
- The custom API
// pkg/apis/apis.go
package apis
import (
	"k8s.io/apimachinery/pkg/runtime"
)

// AddToSchemes may be used to add all resources defined in the project to a Scheme
var AddToSchemes runtime.SchemeBuilder

// AddToScheme adds all Resources to the Scheme
func AddToScheme(s *runtime.Scheme) error {
	return AddToSchemes.AddToScheme(s)
}
- And its controller:
// pkg/controller/controller.go
package controller
import (
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// AddToManagerFuncs is a list of functions to add all Controllers to the Manager
var AddToManagerFuncs []func(manager.Manager) error

// AddToManager adds all Controllers to the Manager
func AddToManager(m manager.Manager) error {
	for _, f := range AddToManagerFuncs {
		if err := f(m); err != nil {
			return err
		}
	}
	return nil
}
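The registration pattern in controller.go can be tried out without a cluster. The following is a minimal sketch, not the generated code: `Manager` is a hypothetical stand-in for controller-runtime's `manager.Manager`, and `addLearn` and `addBroken` are invented for illustration. It only shows how functions appended to `AddToManagerFuncs` are run in order and how the first error aborts the setup.

```go
package main

import (
	"errors"
	"fmt"
)

// Manager is a hypothetical stand-in for controller-runtime's manager.Manager.
type Manager struct {
	Controllers []string // names of the controllers registered so far
}

// AddToManagerFuncs mirrors the slice declared in pkg/controller/controller.go.
var AddToManagerFuncs []func(*Manager) error

// AddToManager calls every registered function and stops at the first error.
func AddToManager(m *Manager) error {
	for _, f := range AddToManagerFuncs {
		if err := f(m); err != nil {
			return err
		}
	}
	return nil
}

// addLearn is a hypothetical registration function for a Learn controller.
func addLearn(m *Manager) error {
	m.Controllers = append(m.Controllers, "learn-controller")
	return nil
}

// addBroken is a hypothetical registration function that always fails.
func addBroken(m *Manager) error {
	return errors.New("cannot add controller")
}

func main() {
	// Each controller contributes one function to the shared slice.
	AddToManagerFuncs = append(AddToManagerFuncs, addLearn)

	m := &Manager{}
	if err := AddToManager(m); err != nil {
		fmt.Println("registration failed:", err)
		return
	}
	fmt.Println("registered:", m.Controllers)

	// A failing registration aborts the whole setup.
	AddToManagerFuncs = append(AddToManagerFuncs, addBroken)
	fmt.Println("second run error:", AddToManager(&Manager{}))
}
```

Keeping the list as a package-level slice means a new controller can register itself from its own file without controller.go being edited.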
3. Write the logic code
Use `add api` to create a new API resource. Pass `--kind` to name the new API; here it is `Learn`.
mac:learn-operator jianzhang$ operator-sdk add api --api-version=app.learn.com/v1 --kind=Learn
INFO[0000] Generating api version app.learn.com/v1 for kind Learn.
INFO[0000] Created pkg/apis/app/group.go
INFO[0033] Created pkg/apis/app/v1/learn_types.go
INFO[0033] Created pkg/apis/addtoscheme_app_v1.go
INFO[0033] Created pkg/apis/app/v1/register.go
INFO[0033] Created pkg/apis/app/v1/doc.go
INFO[0033] Created deploy/crds/app.learn.com_v1_learn_cr.yaml
INFO[0037] Created deploy/crds/app.learn.com_learns_crd.yaml
INFO[0037] Running deepcopy code-generation for Custom Resource group versions: [app:[v1], ]
INFO[0045] Code-generation complete.
INFO[0045] Running OpenAPI code-generation for Custom Resource group versions: [app:[v1], ]
INFO[0054] Created deploy/crds/app.learn.com_learns_crd.yaml
INFO[0054] Code-generation complete.
INFO[0054] API generation complete.
As you can see, operator-sdk has created the corresponding CR (custom resource).
# deploy/crds/app.learn.com_v1_learn_cr.yaml
apiVersion: app.learn.com/v1
kind: Learn
metadata:
  name: example-learn
spec:
  # Add fields here
  size: 3
Use `add controller` to create the corresponding controller:
mac:learn-operator jianzhang$ operator-sdk add controller --api-version=app.learn.com/v1 --kind=Learn
INFO[0000] Generating controller version app.learn.com/v1 for kind Learn.
INFO[0000] Created pkg/controller/learn/learn_controller.go
INFO[0000] Created pkg/controller/add_learn.go
INFO[0000] Controller generation complete.
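- Add the code
Define your own resource structure in the types file. The operator in this example watches the Learn resource and changes the number of pods according to the `size` field of that resource. The `LearnStatus` struct reports the current state.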
// pkg/apis/app/v1/learn_types.go
type LearnSpec struct {
	// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
	// Important: Run "operator-sdk generate k8s" to regenerate code after modifying this file
	// Add custom validation using kubebuilder tags: https://book-v1.book.kubebuilder.io/beyond_basics/generating_crd.html

	// Size is the size of the learn deployment
	Size int32 `json:"size"`
}

// LearnStatus defines the observed state of Learn
// +k8s:openapi-gen=true
type LearnStatus struct {
	// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
	// Important: Run "operator-sdk generate k8s" to regenerate code after modifying this file
	// Add custom validation using kubebuilder tags: https://book-v1.book.kubebuilder.io/beyond_basics/generating_crd.html

	// PodNames are the names of the learn pods
	PodNames []string `json:"podnames"`
}
For the full code, see: https://github.com/jianzhangbjz/learn-operator/blob/master/pkg/apis/app/v1/learn_types.go#L12-L28
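The `json` struct tags are what tie these Go fields to the `size` and `podnames` keys of the custom resource. The following is a minimal sketch using only the standard library: the `Learn` wrapper is a simplified stand-in (the generated type also carries Kubernetes metadata), and `decodeLearn` and `encodeStatus` are hypothetical helpers named here for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LearnSpec and LearnStatus copy the two fields defined in learn_types.go.
type LearnSpec struct {
	Size int32 `json:"size"`
}

type LearnStatus struct {
	PodNames []string `json:"podnames"`
}

// Learn is a simplified stand-in for the generated type (assumption:
// the real type also embeds Kubernetes type and object metadata).
type Learn struct {
	APIVersion string      `json:"apiVersion"`
	Kind       string      `json:"kind"`
	Spec       LearnSpec   `json:"spec"`
	Status     LearnStatus `json:"status"`
}

// decodeLearn parses a JSON document into a Learn value.
func decodeLearn(data []byte) (Learn, error) {
	var l Learn
	err := json.Unmarshal(data, &l)
	return l, err
}

// encodeStatus renders a status as JSON, using the tag names as keys.
func encodeStatus(s LearnStatus) (string, error) {
	out, err := json.Marshal(s)
	return string(out), err
}

func main() {
	raw := []byte(`{"apiVersion":"app.learn.com/v1","kind":"Learn","spec":{"size":3}}`)
	l, err := decodeLearn(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(l.Kind, "size:", l.Spec.Size)

	l.Status.PodNames = []string{"pod-a", "pod-b"}
	s, err := encodeStatus(l.Status)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(s)
}
```

Renaming a tag changes the key that users must write in the CR, so the tags are effectively part of the API.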
- The control logic:
// Ensure the deployment size is the same as the spec
size := learn.Spec.Size
if *found.Spec.Replicas != size {
	found.Spec.Replicas = &size
	err = r.client.Update(context.TODO(), found)
	if err != nil {
		reqLogger.Error(err, "Failed to update Deployment", "Deployment.Namespace", found.Namespace, "Deployment.Name", found.Name)
		return reconcile.Result{}, err
	}
	// Spec updated - return and requeue
	return reconcile.Result{Requeue: true}, nil
}

// Update the Learn status with the pod names
// List the pods for this learn's deployment
podList := &corev1.PodList{}
listOpts := []client.ListOption{
	client.InNamespace(learn.Namespace),
	client.MatchingLabels(labelsForLearn(learn.Name)),
}
if err = r.client.List(context.TODO(), podList, listOpts...); err != nil {
	reqLogger.Error(err, "Failed to list pods", "Learn.Namespace", learn.Namespace, "Learn.Name", learn.Name)
	return reconcile.Result{}, err
}
podNames := getPodNames(podList.Items)

// Update status.PodNames if needed
if !reflect.DeepEqual(podNames, learn.Status.PodNames) {
	learn.Status.PodNames = podNames
	err := r.client.Status().Update(context.TODO(), learn)
	if err != nil {
		reqLogger.Error(err, "Failed to update Learn status")
		return reconcile.Result{}, err
	}
}
For the full code, see: https://github.com/jianzhangbjz/learn-operator/blob/master/pkg/controller/learn/learn_controller.go
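The excerpt calls `labelsForLearn` and `getPodNames` without showing them. The following is a rough sketch of what such helpers might look like, not the repository's actual code: `Pod` is a stand-in for `corev1.Pod`, the label keys `app` and `learn_cr` are assumptions, and `selectPods` and `statusChanged` are hypothetical helpers that imitate the label-filtered list and the `reflect.DeepEqual` check.

```go
package main

import (
	"fmt"
	"reflect"
)

// Pod is a minimal stand-in for corev1.Pod (assumption, for illustration).
type Pod struct {
	Name   string
	Labels map[string]string
}

// labelsForLearn returns the selector labels for a Learn resource.
// The keys "app" and "learn_cr" are assumed, not taken from the source.
func labelsForLearn(name string) map[string]string {
	return map[string]string{"app": "learn", "learn_cr": name}
}

// selectPods keeps the pods that carry every label in selector,
// imitating a List call filtered by matching labels.
func selectPods(all []Pod, selector map[string]string) []Pod {
	var out []Pod
	for _, p := range all {
		match := true
		for k, v := range selector {
			if p.Labels[k] != v {
				match = false
				break
			}
		}
		if match {
			out = append(out, p)
		}
	}
	return out
}

// getPodNames collects the names of the given pods.
func getPodNames(pods []Pod) []string {
	var names []string
	for _, p := range pods {
		names = append(names, p.Name)
	}
	return names
}

// statusChanged reports whether the recorded status must be rewritten.
func statusChanged(observed, recorded []string) bool {
	return !reflect.DeepEqual(observed, recorded)
}

func main() {
	all := []Pod{
		{Name: "example-learn-1", Labels: labelsForLearn("example-learn")},
		{Name: "other-1", Labels: map[string]string{"app": "other"}},
		{Name: "example-learn-2", Labels: labelsForLearn("example-learn")},
	}
	names := getPodNames(selectPods(all, labelsForLearn("example-learn")))
	fmt.Println(names, statusChanged(names, nil))
}
```

Comparing before writing keeps the controller from issuing a status update on every reconcile when nothing has changed.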
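4. Build the operator image
The code is now written, and the next step is to run it. On a cloud platform, components run as containers, so we first need to build an image. The `build` subcommand packages the code into an image quickly. You can modify the `Dockerfile` for special requirements; the default configuration is used here. The build looks like this: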
mac:learn-operator jianzhang$ operator-sdk build quay.io/jiazha/learn-operator
INFO[0001] Building OCI image quay.io/jiazha/learn-operator
Sending build context to Docker daemon 40.14MB
Step 1/7 : FROM registry.access.redhat.com/ubi8/ubi-minimal:latest
latest: Pulling from ubi8/ubi-minimal
645c2831c08a: Pull complete
5e98065763a5: Pull complete
Digest: sha256:32fb8bae553bfba2891f535fa9238f79aafefb7eff603789ba8920f505654607
Status: Downloaded newer image for registry.access.redhat.com/ubi8/ubi-minimal:latest
---> 469119976c56
Step 2/7 : ENV OPERATOR=/usr/local/bin/learn-operator USER_UID=1001 USER_NAME=learn-operator
---> Running in 0238e3a3b78a
Removing intermediate container 0238e3a3b78a
---> a5a49d29df84
Step 3/7 : COPY build/_output/bin/learn-operator ${OPERATOR}
---> b9f310c13223
Step 4/7 : COPY build/bin /usr/local/bin
---> 085a9494584e
Step 5/7 : RUN /usr/local/bin/user_setup
---> Running in 564f938ba278
mkdir -p /root
chown 1001:0 /root
chmod ug+rwx /root
chmod g+rw /etc/passwd
rm /usr/local/bin/user_setup
Removing intermediate container 564f938ba278
---> 2ddceb6ddd43
Step 6/7 : ENTRYPOINT ["/usr/local/bin/entrypoint"]
---> Running in 50e82b9c4b58
Removing intermediate container 50e82b9c4b58
---> 01889797cc39
Step 7/7 : USER ${USER_UID}
---> Running in 9d9917ada91b
Removing intermediate container 9d9917ada91b
---> d34a0831ba52
Successfully built d34a0831ba52
Successfully tagged quay.io/jiazha/learn-operator:latest
INFO[0038] Operator build complete.
- Push the image to an image registry. Quay is used here.
Note that this is a public image registry. If you use a private one, you also need to configure your registry token on the cluster.
mac:learn-operator jianzhang$ docker push quay.io/jiazha/learn-operator
The push refers to repository [quay.io/jiazha/learn-operator]
89ed084dc713: Pushed
6c1790c8ff98: Pushed
198c24bacf4a: Pushed
a066f3d73913: Pushed
26b543be03e2: Pushed
latest: digest: sha256:1bc419f412b5fe6efeb310783095d94523d6e059c6e974ca444a287bab80dd0d size: 8377
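5. Deploy the operator
We deploy the operator to the cluster with YAML files, although Helm would work too. Operator-SDK has already generated all the deployment files, so we only need to set the image built above in the deployment file. The `sed -i ""` form used here is the macOS/BSD syntax; GNU sed on Linux takes `-i` without the empty argument.
$ sed -i "" 's|REPLACE_IMAGE|quay.io/jiazha/learn-operator|g' deploy/operator.yaml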
As you can see, before the deployment the cluster does not yet have a `learn` resource type:
mac:learn-operator jianzhang$ oc get learn
error: the server doesn't have a resource type "learn"
- Start the deployment:
mac:learn-operator jianzhang$ oc create -f deploy/role.yaml
role.rbac.authorization.k8s.io/learn-operator created
mac:learn-operator jianzhang$ oc create -f deploy/role_binding.yaml
rolebinding.rbac.authorization.k8s.io/learn-operator created
mac:learn-operator jianzhang$ oc create -f deploy/operator.yaml
deployment.apps/learn-operator created
mac:learn-operator jianzhang$ oc create -f deploy/crds/app.learn.com_learns_crd.yaml
customresourcedefinition.apiextensions.k8s.io/learns.app.learn.com created
The operator is now running, and the cluster now recognizes the `learn` resource type!
mac:learn-operator jianzhang$ oc get pods
NAME READY STATUS RESTARTS AGE
learn-operator-768d88c6d6-8g9lz 1/1 Running 0 10m
mac:learn-operator jianzhang$ oc get learn
No resources found.
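Being able to define your own API resources this easily is what makes Kubernetes so appealing! How to quickly set up your own Kubernetes or OpenShift cluster will be covered in a later post.
Now let's start using the `learn` resource!
We set the size of the resource to 2 and see what happens.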
# deploy/crds/app.learn.com_v1_learn_cr.yaml
apiVersion: app.learn.com/v1
kind: Learn
metadata:
  name: example-learn
spec:
  # Add fields here
  size: 2
mac:learn-operator jianzhang$ oc create -f deploy/crds/app.learn.com_v1_learn_cr.yaml
learn.app.learn.com/example-learn created
mac:learn-operator jianzhang$ oc get learn
NAME AGE
example-learn 2m12s
Look at the `example-learn` object: its `status` lists the names of two pods. Listing the pods confirms that these are the two newly created pods!
mac:learn-operator jianzhang$ oc get learn example-learn -o yaml
apiVersion: app.learn.com/v1
kind: Learn
metadata:
  creationTimestamp: "2019-11-09T14:20:07Z"
  generation: 1
  name: example-learn
  namespace: learn
  resourceVersion: "3098847"
  selfLink: /apis/app.learn.com/v1/namespaces/learn/learns/example-learn
  uid: ce6c8b2b-b5f1-4fba-8ded-649849920186
spec:
  size: 2
status:
  podnames:
  - example-learn-6764b9858-l9xpj
  - example-learn-6764b9858-tzdnv
mac:learn-operator jianzhang$ oc get pods
NAME READY STATUS RESTARTS AGE
example-learn-6764b9858-l9xpj 1/1 Running 0 2m42s
example-learn-6764b9858-tzdnv 1/1 Running 0 2m42s
learn-operator-768d88c6d6-cfl6n 1/1 Running 0 3m37s
What about changing `size` to 3? The number of pods grows to 3!
mac:learn-operator jianzhang$ oc edit learn example-learn
learn.app.learn.com/example-learn edited
mac:learn-operator jianzhang$ oc get pods
NAME READY STATUS RESTARTS AGE
example-learn-6764b9858-l9xpj 1/1 Running 0 12m
example-learn-6764b9858-pbpzd 0/1 ContainerCreating 0 9s
example-learn-6764b9858-tzdnv 1/1 Running 0 12m
learn-operator-768d88c6d6-cfl6n 1/1 Running 0 13m
mac:learn-operator jianzhang$ oc get learn example-learn -o yaml
apiVersion: app.learn.com/v1
kind: Learn
metadata:
  creationTimestamp: "2019-11-09T14:20:07Z"
  generation: 2
  name: example-learn
  namespace: learn
  resourceVersion: "3113493"
  selfLink: /apis/app.learn.com/v1/namespaces/learn/learns/example-learn
  uid: ce6c8b2b-b5f1-4fba-8ded-649849920186
spec:
  size: 3
status:
  podnames:
  - example-learn-6764b9858-l9xpj
  - example-learn-6764b9858-tzdnv
  - example-learn-6764b9858-pbpzd
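The scale-up from 2 to 3 follows from the replica check in the control logic shown earlier. The following is a minimal sketch of that comparison in isolation: `Deployment` is a stand-in for the real Deployment type, `reconcileSize` is a hypothetical name, and the nil guard is an addition that the original excerpt does not have.

```go
package main

import "fmt"

// Deployment is a minimal stand-in for the real Deployment type
// (assumption, for illustration); only the replica count is kept.
type Deployment struct {
	Replicas *int32
}

// reconcileSize sets the replica count to the desired size when they
// differ and reports whether it changed anything, which corresponds to
// the "update and requeue" branch of the control logic.
func reconcileSize(found *Deployment, size int32) bool {
	if found.Replicas == nil || *found.Replicas != size {
		found.Replicas = &size
		return true
	}
	return false
}

func main() {
	two := int32(2)
	d := &Deployment{Replicas: &two}

	// Keep reconciling until a pass makes no change.
	for pass := 1; ; pass++ {
		changed := reconcileSize(d, 3)
		fmt.Printf("pass %d: replicas=%d changed=%v\n", pass, *d.Replicas, changed)
		if !changed {
			break
		}
	}
}
```

Because the check compares desired and observed state instead of reacting to a specific event, running it again after the update is harmless: the second pass finds nothing to do.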
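What if I leave the `example-learn` object untouched and delete a pod directly? What would happen?
At this point the operator is running normally in the cluster. All of its code is available at: https://github.com/jianzhangbjz/learn-operator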