Writing Your Own Kubernetes Operator

Since Kubernetes 1.7, we can define our own CRDs (Custom Resource Definitions), which lets us use Kubernetes, already a powerful tool, far more flexibly.

CRDs are only a means of specifying a configuration, though: they are just object definitions. The cluster still needs controllers to monitor its state and reconcile the resources so that they match the configuration. This is where Operators come into play.

Operators are controllers working in association with custom resources to perform tasks that well… “human operators” have to take care of. Think about deploying a database cluster with a manageable number of nodes, managing upgrades and even performing backups. A custom resource would specify the version of the database, the number of nodes to deploy and the frequency of the backups as well as their target storage, and the controller would implement all the business logic needed to perform these operations. This is what the etcd-operator does, for example.


Yet, writing a CRD schema and its accompanying controller can be a daunting task. Thankfully, the Operator SDK is here to help. Let’s see how it can be used to build a simple operator that scales up and down pods running a particular Docker image. In other words, let’s code a basic Kubernetes ReplicaSet, implemented with our own CRD and controller!

Please note that at the time of writing this article, the Operator SDK was at version v0.4.0, so things may have changed since. Also, in this article we’ll build a Go operator, but keep in mind that operators can also be developed as Ansible playbooks or Helm charts.

Primary resources and secondary resources

Before diving into the Operator SDK and the source code, let’s take a step back and discuss operators a little more. As we’ve seen, an operator is a CRD combined with a controller: the CRD defines the resource, and the controller observes changes in the configuration or in the state of the cluster and acts on them so that the result matches what is expected. But there are actually two kinds of resources that the controller needs to monitor: the primary resources and the secondary resources.

In the case of a ReplicaSet, the primary resource is the ReplicaSet itself: it specifies the Docker image to run and the number of pods in the set. The secondary resources are the pods themselves. When a change occurs in the ReplicaSet spec (e.g., the version of the image was changed or the number of pods was updated) or in the pods (e.g., a pod was deleted), the controller is notified and reconciles the state of the cluster accordingly, either by rolling out a new version or by scaling the pods up or down to reach the expected count.
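The way changes to both kinds of resources end up triggering the same reconciliation can be sketched as follows. This is a minimal, self-contained illustration that uses no Kubernetes libraries: the Event and Request types and the requestFor function are hypothetical names invented for this sketch. In a real operator, controller-runtime provides an owner-based event handler that plays this role.

```go
package main

import "fmt"

// Request identifies the primary resource that should be reconciled.
type Request struct {
	Namespace string
	Name      string
}

// Event is a simplified change notification (hypothetical type for this sketch).
type Event struct {
	Kind      string // "PodSet" (primary) or "Pod" (secondary)
	Namespace string
	Name      string
	OwnerName string // name of the owning primary resource; empty if none
}

// requestFor maps an event to the primary resource that must be reconciled.
// The second return value is false when the event can be ignored.
func requestFor(e Event) (Request, bool) {
	switch e.Kind {
	case "PodSet":
		// A change to the primary resource reconciles that resource.
		return Request{Namespace: e.Namespace, Name: e.Name}, true
	case "Pod":
		// A change to a secondary resource reconciles its owner, if it has one.
		if e.OwnerName == "" {
			return Request{}, false
		}
		return Request{Namespace: e.Namespace, Name: e.OwnerName}, true
	}
	return Request{}, false
}

func main() {
	events := []Event{
		{Kind: "PodSet", Namespace: "default", Name: "example"},
		{Kind: "Pod", Namespace: "default", Name: "example-pod-1", OwnerName: "example"},
		{Kind: "Pod", Namespace: "default", Name: "stray-pod"},
	}
	for _, e := range events {
		if req, ok := requestFor(e); ok {
			fmt.Printf("%s %s -> reconcile %s/%s\n", e.Kind, e.Name, req.Namespace, req.Name)
		} else {
			fmt.Printf("%s %s -> ignored\n", e.Kind, e.Name)
		}
	}
}
```

Both the spec change and the pod change resolve to the same request, which is why a single reconcile function can handle either trigger.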

In the case of a DaemonSet, the primary resource is the DaemonSet itself, while the secondary resources are once again the pods, but also the nodes of the cluster. The difference from a ReplicaSet is that the controller also monitors the nodes of the cluster, adding pods as the cluster grows and removing them as it shrinks.
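To make the role of nodes as a secondary resource concrete, here is a small sketch of the decision a DaemonSet-like controller has to make: which nodes still need a pod, and which pods sit on nodes that no longer exist. It is a simplified model with plain Go types. The Pod type and the plan function are assumptions made for illustration and are not how Kubernetes implements DaemonSets.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a simplified pod record: its name and the node it runs on.
type Pod struct {
	Name string
	Node string
}

// plan compares the current nodes with the current pods and returns
// the nodes that lack a pod and the pods whose node is gone.
func plan(nodes []string, pods []Pod) (missing []string, orphaned []string) {
	nodeSet := make(map[string]bool, len(nodes))
	for _, n := range nodes {
		nodeSet[n] = true
	}
	covered := make(map[string]bool)
	for _, p := range pods {
		if nodeSet[p.Node] {
			covered[p.Node] = true
		} else {
			orphaned = append(orphaned, p.Name)
		}
	}
	for _, n := range nodes {
		if !covered[n] {
			missing = append(missing, n)
		}
	}
	sort.Strings(missing)
	sort.Strings(orphaned)
	return missing, orphaned
}

func main() {
	nodes := []string{"node-a", "node-b", "node-c"}
	pods := []Pod{
		{Name: "agent-1", Node: "node-a"},
		{Name: "agent-2", Node: "node-removed"},
	}
	missing, orphaned := plan(nodes, pods)
	fmt.Println("nodes needing a pod:", missing)
	fmt.Println("pods to remove:", orphaned)
}
```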

Building a PodSet operator

Enough with the theory. It’s now time to dive into the code and build a basic ReplicaSet-like operator called PodSet, which will take care of scaling up and down pods that run sleep 3600 in a busybox container. Nothing fancy here, but the focus of this article is building an operator, not diving too deep into the specifics of setting up a cluster of this or that application.

Installing the Operator SDK

The first thing to do is download and install the SDK:


Bootstrapping the Go project

Next, let’s use the operator-sdk command to bootstrap the project for our operator, podset-operator:

Once this is done, we have the base layout for our project. It contains not only the Go code to run the operator (cmd/manager/main.go), but also the Dockerfile to package the binary into a Docker image, and a set of YAML manifests to 1) deploy the Docker image and 2) create the Service Account that runs the controller, along with a role and role bindings to allow its operations (adding or removing pods, etc.).

Adding a CRD and a controller

Now, let’s create a CRD for the PodSet operator, as well as its associated controller. Since this will be the first version of the operator, it is good practice to set the version to v1alpha1:

These two commands generated another set of YAML files and Go code. The most noticeable changes are the following:

1. The new deploy/crds folder contains the Custom Resource Definition of the PodSet, along with an example Custom Resource:

The full name of the CRD is podsets.app.example.com, but there are also various names associated with it (PodSet, PodSetList, podsets and podset). They will be part of the extended cluster API and available in the kubectl command line, as we’ll see later when deploying and running the operator.

2. The pkg/apis/app/v1alpha1/podset_types.go file defines the structure of the PodSetSpec, which is the expected state of the PodSet and is specified by the user in the aforementioned deploy/crds/app_v1alpha1_podset_cr.yaml file. It also defines the structure of the PodSetStatus, which will be used to provide the observed state of the PodSet when the kubectl describe command is executed. More on this later.



3. The scaffold file pkg/controller/podset/podset_controller.go is where we will put the business logic of the controller.


The rest of the changes are the necessary plumbing to register the CRD and the controller.


Implementing the business logic of the controller


Out of the box, the controller code generated by the SDK creates a single pod if none with a given app label already exists. Although this is not exactly what we want here, it is nonetheless a good starting point since it shows how to use the k8s API, whether to list the pods or to create new ones.

The first thing we want to do is change the PodSetSpec and PodSetStatus Go structs by adding fields to store the number of replicas in the former and the names of the pods in the latter:
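As a minimal sketch of what such structs could look like, the program below declares them as plain Go types and prints their JSON form. The field names Replicas and PodNames, the JSON tags, and the encode helper are assumptions made for illustration. The generated podset_types.go also wraps these structs in a PodSet type carrying Kubernetes metadata, which is left out here to keep the example free of external dependencies.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PodSetSpec is the desired state of a PodSet.
// Replicas is an assumed field name for the number of pods to run.
type PodSetSpec struct {
	Replicas int32 `json:"replicas"`
}

// PodSetStatus is the observed state of a PodSet.
// PodNames is an assumed field name for the pods currently in the set.
type PodSetStatus struct {
	PodNames []string `json:"podNames"`
}

// encode renders a value as JSON, the form in which the API server stores it.
func encode(v interface{}) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	spec := PodSetSpec{Replicas: 3}
	status := PodSetStatus{PodNames: []string{"example-podset-pod-a", "example-podset-pod-b"}}

	for _, v := range []interface{}{spec, status} {
		out, err := encode(v)
		if err != nil {
			fmt.Println("encoding failed:", err)
			return
		}
		fmt.Println(out)
	}
}
```

The split mirrors the convention used by built-in resources: the user writes the spec, and only the controller writes the status.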


Each time we make a change to these structures, we need to run the operator-sdk generate k8s command to update the pkg/apis/app/v1alpha1/zz_generated.deepcopy.go file accordingly.

Then, we need to configure the primary and secondary resources that the controller will monitor in the namespace. For our PodSet operator, the primary resource is the PodSet resource and the secondary resources are the pods in the namespace. Luckily, we don’t have to do anything here, as this is already implemented in the generated code. Remember that by default, the controller operates on a PodSet resource and creates a pod.

Lastly, we need to implement the logic of scaling up and down the pods and updating the custom resource status with the names of the pods. All of this happens in the Reconcile function of the controller.


During the reconcile, the controller fetches the PodSet resource in the current namespace and compares the value of its Replica field with the actual number of Pods that match a specific set of labels (here, app and version) to decide whether pods need to be created or deleted.
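The comparison described above can be sketched as follows. This is a simplified, dependency-free model: the Pod type, the matches and replicaDiff functions, and the label values are all assumptions for illustration, whereas the real controller lists pods through the Kubernetes client with a label selector.

```go
package main

import "fmt"

// Pod is a simplified pod record with its labels.
type Pod struct {
	Name   string
	Labels map[string]string
}

// matches reports whether labels contain every key/value pair of selector.
func matches(labels, selector map[string]string) bool {
	for k, want := range selector {
		got, ok := labels[k]
		if !ok || got != want {
			return false
		}
	}
	return true
}

// replicaDiff returns how many pods must be created (positive) or
// deleted (negative) for the matching pods to reach the desired count.
func replicaDiff(desired int, pods []Pod, selector map[string]string) int {
	actual := 0
	for _, p := range pods {
		if matches(p.Labels, selector) {
			actual++
		}
	}
	return desired - actual
}

func main() {
	// Assumed label values; the article only names the label keys.
	selector := map[string]string{"app": "example-podset", "version": "v0.1"}
	pods := []Pod{
		{Name: "pod-a", Labels: map[string]string{"app": "example-podset", "version": "v0.1"}},
		{Name: "pod-b", Labels: map[string]string{"app": "example-podset", "version": "v0.0"}},
		{Name: "pod-c", Labels: map[string]string{"app": "other"}},
	}
	fmt.Println("pods to add (negative means remove):", replicaDiff(3, pods, selector))
}
```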


Instead of going into the details of the implementation of the controller’s Reconcile function (the code is available on GitHub), let’s focus on the key points to remember here:

  1. The Reconcile function is invoked each time the PodSet resource is changed or a change happens in the pods belonging to the PodSet.
  2. If pods need to be added or removed, the Reconcile function should only add or remove one pod at a time, return, and wait for the next invocation (since it will be called again after a pod is created or deleted).
  3. Make sure that the pods are “owned” by the PodSet primary resource using the controllerutil.SetControllerReference() function. Having this ownership in place means that when the PodSet resource is deleted, all its “child” pods are deleted as well.
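The three points above can be illustrated with a small in-memory simulation. It is only a sketch under stated assumptions: the Cluster and Pod types and the reconcile and deletePodSet methods are hypothetical and stand in for the API server, and the loop in main stands in for the repeated invocations that pod events would trigger in a real cluster.

```go
package main

import "fmt"

// Pod is a simplified pod that records the PodSet owning it.
type Pod struct {
	Name  string
	Owner string
}

// Cluster is an in-memory stand-in for the cluster state.
type Cluster struct {
	Replicas map[string]int // desired replica count per PodSet name
	Pods     []Pod
	nextID   int
}

// reconcile makes at most one change for the named PodSet and reports
// whether it changed anything, so the caller knows another pass is needed.
func (c *Cluster) reconcile(podSet string) bool {
	desired, exists := c.Replicas[podSet]
	if !exists {
		return false // the PodSet is gone; nothing to reconcile
	}
	var owned []int
	for i, p := range c.Pods {
		if p.Owner == podSet {
			owned = append(owned, i)
		}
	}
	switch {
	case len(owned) < desired:
		// Create one pod and record its owner.
		c.nextID++
		name := fmt.Sprintf("%s-pod-%d", podSet, c.nextID)
		c.Pods = append(c.Pods, Pod{Name: name, Owner: podSet})
		return true
	case len(owned) > desired:
		// Delete one owned pod.
		last := owned[len(owned)-1]
		c.Pods = append(c.Pods[:last], c.Pods[last+1:]...)
		return true
	}
	return false
}

// deletePodSet removes the PodSet and, mimicking owner-based garbage
// collection, every pod it owns.
func (c *Cluster) deletePodSet(podSet string) {
	delete(c.Replicas, podSet)
	kept := c.Pods[:0]
	for _, p := range c.Pods {
		if p.Owner != podSet {
			kept = append(kept, p)
		}
	}
	c.Pods = kept
}

func main() {
	c := &Cluster{Replicas: map[string]int{"example": 3}}
	passes := 0
	for c.reconcile("example") {
		passes++
	}
	fmt.Printf("scaled up in %d passes, %d pods running\n", passes, len(c.Pods))

	c.deletePodSet("example")
	fmt.Printf("after deleting the PodSet: %d pods running\n", len(c.Pods))
}
```

Changing one pod per pass keeps each invocation small and lets every pass start from freshly observed state rather than from a plan that may already be stale.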

Building and publishing the operator


Let’s use the Operator SDK to build the Docker image containing the controller, and then push it to an online registry. We’ll use Quay.io in this case, but other registries would work as well:

Also, we need to update the operator.yaml manifest to use the new Docker image available on Quay.io:
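One way to script this kind of update is a plain text substitution, sketched below in Go. Everything specific here is an assumption made for illustration: the REPLACE_IMAGE placeholder, the manifest fragment, the image name quay.io/example/podset-operator, and the setImage helper. Check the generated operator.yaml for the actual value of the image field before adapting it.

```go
package main

import (
	"fmt"
	"strings"
)

// setImage replaces every occurrence of placeholder in the manifest text
// with image, and fails if the placeholder is not present.
func setImage(manifest, placeholder, image string) (string, error) {
	if !strings.Contains(manifest, placeholder) {
		return "", fmt.Errorf("placeholder %q not found in manifest", placeholder)
	}
	return strings.ReplaceAll(manifest, placeholder, image), nil
}

func main() {
	// Assumed fragment of a Deployment manifest, not the generated file.
	manifest := "containers:\n  - name: podset-operator\n    image: REPLACE_IMAGE\n"

	updated, err := setImage(manifest, "REPLACE_IMAGE", "quay.io/example/podset-operator")
	if err != nil {
		fmt.Println("update failed:", err)
		return
	}
	fmt.Print(updated)
}
```

Failing when the placeholder is absent avoids silently deploying a manifest that still points at the wrong image.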