Kubernetes Controller Development in Practice: Building a Website Operator from Scratch

Administrator
2025-09-11

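In modern cloud-native development, using custom controllers to automate application deployment and management has become a mainstream approach. Building on the Website CRD introduced in the previous article, this article walks through building a complete Website Operator from scratch to automate the deployment and management of website applications.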

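I. Operator Overview and Practical Use Cases

An Operator is a way of encoding operational knowledge into Kubernetes: a custom controller manages the lifecycle of a complex application. In practice, many organizations need to manage a large number of website applications in a uniform way. Each site consists of the same standard components, such as a Deployment and a Service, but with slightly different configuration parameters. A Website Operator makes it possible to:

  1. Standardize how website applications are deployed
  2. Create and configure the related resources automatically
  3. Monitor and manage all website instances centrally
  4. Scale and upgrade versions in a single step

In scenarios such as an internal enterprise PaaS platform or a multi-tenant website hosting service, a Website Operator can significantly improve operational efficiency and standardization.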

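II. Preparing the Development Environment

Before developing the Operator, we need to set up the following tools:

1. Install Go

Operators are usually written in Go. This walkthrough assumes Go 1.19 or later; recent Kubebuilder releases may require a newer Go version, so check the requirements of the Kubebuilder version you install:

# Check the Go version
go version

# If Go is missing or too old, install or upgrade it first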

2. Install Kubebuilder

Kubebuilder is an SDK framework for building Kubernetes APIs. It provides code generation and project scaffolding tools:

# Download and install kubebuilder
curl -L -o kubebuilder https://go.kubebuilder.io/dl/latest/$(go env GOOS)/$(go env GOARCH)
chmod +x kubebuilder
sudo mv kubebuilder /usr/local/bin/

# Verify the installation
kubebuilder version


3. Install kubectl and Set Up a Kubernetes Cluster

Make sure kubectl is installed and can reach a Kubernetes cluster:

# Check the kubectl version
kubectl version

# Make sure the cluster is reachable
kubectl cluster-info

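III. Initializing the Operator Project

Use Kubebuilder to initialize the Website Operator project:

# Create the project directory
mkdir website-operator
cd website-operator

# Initialize the project
kubebuilder init --domain example.com --repo example.com/website-operator

# Create the API and the controller
kubebuilder create api --group web --version v1 --kind Website --resource --controller

These commands generate the basic project structure, including:

  • API type definitions
  • The controller implementation
  • CRD manifests
  • RBAC configuration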
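
The --group, --domain, --version and --kind flags determine the names that appear later in manifests and kubectl commands. The following is a minimal sketch of how those names fit together. The helper names APIVersion and CRDName are hypothetical, and the pluralization is a simplification (lowercased kind plus "s") that happens to hold for Website:

```go
package main

import (
	"fmt"
	"strings"
)

// APIVersion returns the apiVersion string used in manifests, built from the
// --group, --domain and --version values passed to kubebuilder.
func APIVersion(group, domain, version string) string {
	return group + "." + domain + "/" + version
}

// CRDName returns the CustomResourceDefinition name in the form
// "<plural>.<group>.<domain>". The plural is approximated as the lowercased
// kind plus "s", which is an assumption made for this illustration.
func CRDName(kind, group, domain string) string {
	plural := strings.ToLower(kind) + "s"
	return plural + "." + group + "." + domain
}

func main() {
	// Values taken from the kubebuilder commands in this section.
	fmt.Println(APIVersion("web", "example.com", "v1"))
	fmt.Println(CRDName("Website", "web", "example.com"))
}
```

The two results correspond to the apiVersion used in the sample Website manifest and to the CRD name queried with kubectl get crd later in the article.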

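IV. Defining the API Types

Based on the Website resource defined in the CRD introductory guide, we now flesh out the API type definitions.

1. Edit the API Types File

Edit api/v1/website_types.go, or start from the sample file website_types.go: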

/*
Copyright 2023 example.com.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package v1

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// EDIT THIS FILE!  THIS IS SCAFFOLDING FOR YOU TO OWN!
// NOTE: json tags are required.  Any new fields you add must have json tags for the fields to be serialized.

// WebsiteSpec defines the desired state of Website
type WebsiteSpec struct {
    // INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
    // Important: Run "make" to regenerate code after modifying this file

    // Domain is the website domain
    Domain string `json:"domain"`

    // Image is the website container image
    Image string `json:"image"`

    // Replicas is the number of website replicas
    // +kubebuilder:validation:Minimum=1
    // +kubebuilder:validation:Maximum=10
    Replicas int32 `json:"replicas"`

    // Port is the website service port
    // +kubebuilder:validation:Minimum=1
    // +kubebuilder:validation:Maximum=65535
    Port int32 `json:"port"`
}

// WebsiteStatus defines the observed state of Website
type WebsiteStatus struct {
    // INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
    // Important: Run "make" to regenerate code after modifying this file

    // AvailableReplicas is the number of available replicas
    AvailableReplicas int32 `json:"availableReplicas,omitempty"`

    // Phase is the website current phase
    Phase string `json:"phase,omitempty"`
}

//+kubebuilder:object:root=true
//+kubebuilder:subresource:status
//+kubebuilder:printcolumn:name="Domain",type=string,JSONPath=`.spec.domain`
//+kubebuilder:printcolumn:name="Replicas",type=integer,JSONPath=`.spec.replicas`
//+kubebuilder:printcolumn:name="Age",type=date,JSONPath=`.metadata.creationTimestamp`

// Website is the Schema for the websites API
type Website struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   WebsiteSpec   `json:"spec,omitempty"`
    Status WebsiteStatus `json:"status,omitempty"`
}

//+kubebuilder:object:root=true

// WebsiteList contains a list of Website
type WebsiteList struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ListMeta `json:"metadata,omitempty"`
    Items           []Website `json:"items"`
}

func init() {
    SchemeBuilder.Register(&Website{}, &WebsiteList{})
}
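
The +kubebuilder:validation markers above are turned into OpenAPI schema constraints in the generated CRD, and the API server enforces them when a Website is created or updated. To make the constraints concrete, here is a rough Go sketch that mirrors them. ValidateSpec is a hypothetical helper for illustration only; nothing in the generated project calls it, and the struct is a trimmed copy of WebsiteSpec without JSON tags:

```go
package main

import "fmt"

// WebsiteSpec mirrors the fields of the WebsiteSpec type defined above.
type WebsiteSpec struct {
	Domain   string
	Image    string
	Replicas int32
	Port     int32
}

// ValidateSpec returns one message per violated constraint. The bounds are
// the same as the Minimum/Maximum markers on Replicas and Port.
func ValidateSpec(s WebsiteSpec) []string {
	var problems []string
	if s.Replicas < 1 || s.Replicas > 10 {
		problems = append(problems, fmt.Sprintf("replicas must be between 1 and 10, got %d", s.Replicas))
	}
	if s.Port < 1 || s.Port > 65535 {
		problems = append(problems, fmt.Sprintf("port must be between 1 and 65535, got %d", s.Port))
	}
	return problems
}

func main() {
	valid := WebsiteSpec{Domain: "nginx.example.com", Image: "nginx:1.20", Replicas: 2, Port: 80}
	invalid := WebsiteSpec{Domain: "nginx.example.com", Image: "nginx:1.20", Replicas: 0, Port: 70000}

	fmt.Println("problems in valid spec:", len(ValidateSpec(valid)))
	for _, p := range ValidateSpec(invalid) {
		fmt.Println("rejected:", p)
	}
}
```

A helper like this can be handy in unit tests, but in the cluster the schema generated by make manifests is what actually rejects out-of-range values.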

2. Generate Code and Manifests

After modifying the API types, regenerate the related code and the CRD manifests:

# Generate code and CRD manifests
make generate
make manifests

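V. Implementing the Controller Logic

The controller is the core component of the Operator. It watches for changes to the custom resource and acts on them.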
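
Before looking at the real controller, it may help to see the reconcile idea in isolation: compare the desired state with what exists and create whatever is missing, so that running the same function again is harmless. The sketch below uses only the standard library. Cluster and DesiredState are hypothetical in-memory stand-ins, not Kubernetes types or clients:

```go
package main

import "fmt"

// DesiredState is a simplified stand-in for a Website spec.
type DesiredState struct {
	Name     string
	Replicas int32
}

// Cluster is an in-memory stand-in for the API server. It only records which
// Deployments and Services exist.
type Cluster struct {
	Deployments map[string]int32 // name -> replicas
	Services    map[string]bool  // name -> exists
}

// Reconcile creates the Deployment and Service for want if they are missing
// and returns a description of each action it took.
func Reconcile(c *Cluster, want DesiredState) []string {
	var actions []string
	if _, ok := c.Deployments[want.Name]; !ok {
		c.Deployments[want.Name] = want.Replicas
		actions = append(actions, "create deployment "+want.Name)
	}
	if !c.Services[want.Name] {
		c.Services[want.Name] = true
		actions = append(actions, "create service "+want.Name)
	}
	return actions
}

func main() {
	c := &Cluster{Deployments: map[string]int32{}, Services: map[string]bool{}}
	want := DesiredState{Name: "nginx-website", Replicas: 2}

	fmt.Println(Reconcile(c, want)) // first pass creates both objects
	fmt.Println(Reconcile(c, want)) // second pass finds nothing to do

	delete(c.Deployments, want.Name) // simulate a manual deletion
	fmt.Println(Reconcile(c, want))  // only the Deployment is recreated
}
```

The real Reconcile method follows the same shape, with r.Get and r.Create calls against the API server in place of the map lookups.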

1. Write the Controller Code

Edit internal/controller/website_controller.go, or start from the sample file website_controller.go:

/*
Copyright 2023 example.com.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package controller

import (
    "context"

    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/util/intstr"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/log"

    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/types"

    webv1 "example.com/website-operator/api/v1"
)

// WebsiteReconciler reconciles a Website object
type WebsiteReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

//+kubebuilder:rbac:groups=web.example.com,resources=websites,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=web.example.com,resources=websites/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=web.example.com,resources=websites/finalizers,verbs=update
//+kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete

// Reconcile is part of the main kubernetes reconciliation loop which aims to
// move the current state of the cluster closer to the desired state.
func (r *WebsiteReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := log.FromContext(ctx)
    
    log.Info("=== START RECONCILING WEBSITE ===", "Request.Namespace", req.Namespace, "Request.Name", req.Name)

    // Fetch the Website instance
    website := &webv1.Website{}
    err := r.Get(ctx, req.NamespacedName, website)
    if err != nil {
        if errors.IsNotFound(err) {
            // The resource no longer exists; it has most likely been deleted
            log.Info("Website resource not found. Ignoring since object must be deleted")
            return ctrl.Result{}, nil
        }
        // Any other error is returned so that the request is requeued
        log.Error(err, "Failed to get Website")
        return ctrl.Result{}, err
    }
    
    log.Info("Found Website resource", "Website.Name", website.Name, "Website.Namespace", website.Namespace)
    log.Info("Website Spec", "Spec", website.Spec)

    // Create the Deployment if it does not exist yet (an existing Deployment is left unchanged)
    deployment := &appsv1.Deployment{}
    deploymentName := types.NamespacedName{Name: website.Name, Namespace: website.Namespace}
    err = r.Get(ctx, deploymentName, deployment)
    if err != nil && errors.IsNotFound(err) {
        log.Info("Deployment not found, creating new Deployment", "Deployment.Name", website.Name, "Deployment.Namespace", website.Namespace)
        // The Deployment does not exist, so build a new one
        deployment = r.deploymentForWebsite(website)
        log.Info("Creating Deployment", "Deployment", deployment)
        err = r.Create(ctx, deployment)
        if err != nil {
            log.Error(err, "Failed to create new Deployment", "Deployment.Namespace", deployment.Namespace, "Deployment.Name", deployment.Name)
            return ctrl.Result{}, err
        }
        // The Deployment was created successfully
        log.Info("Deployment created successfully", "Deployment.Namespace", deployment.Namespace, "Deployment.Name", deployment.Name)
        return ctrl.Result{Requeue: true}, nil
    } else if err != nil {
        log.Error(err, "Failed to get Deployment")
        return ctrl.Result{}, err
    } else {
        log.Info("Deployment already exists", "Deployment.Name", deployment.Name, "Deployment.Namespace", deployment.Namespace)
    }

    // Create the Service if it does not exist yet (an existing Service is left unchanged)
    service := &corev1.Service{}
    serviceName := types.NamespacedName{Name: website.Name, Namespace: website.Namespace}
    err = r.Get(ctx, serviceName, service)
    if err != nil && errors.IsNotFound(err) {
        log.Info("Service not found, creating new Service", "Service.Name", website.Name, "Service.Namespace", website.Namespace)
        // The Service does not exist, so build a new one
        service = r.serviceForWebsite(website)
        log.Info("Creating Service", "Service", service)
        err = r.Create(ctx, service)
        if err != nil {
            log.Error(err, "Failed to create new Service", "Service.Namespace", service.Namespace, "Service.Name", service.Name)
            return ctrl.Result{}, err
        }
        // The Service was created successfully
        log.Info("Service created successfully", "Service.Namespace", service.Namespace, "Service.Name", service.Name)
        return ctrl.Result{Requeue: true}, nil
    } else if err != nil {
        log.Error(err, "Failed to get Service")
        return ctrl.Result{}, err
    } else {
        log.Info("Service already exists", "Service.Name", service.Name, "Service.Namespace", service.Namespace)
    }

    // Update the Website status
    availableReplicas := deployment.Status.AvailableReplicas
    log.Info("Checking Website status", "CurrentAvailableReplicas", website.Status.AvailableReplicas, "DeploymentAvailableReplicas", availableReplicas)
    if website.Status.AvailableReplicas != availableReplicas {
        log.Info("Updating Website status", "OldAvailableReplicas", website.Status.AvailableReplicas, "NewAvailableReplicas", availableReplicas)
        website.Status.AvailableReplicas = availableReplicas
        website.Status.Phase = "Running"
        err = r.Status().Update(ctx, website)
        if err != nil {
            log.Error(err, "Failed to update Website status")
            return ctrl.Result{}, err
        }
        log.Info("Website status updated successfully")
    } else {
        log.Info("Website status is up to date")
    }

    log.Info("=== FINISH RECONCILING WEBSITE ===", "Website.Name", website.Name, "Website.Namespace", website.Namespace)
    return ctrl.Result{}, nil
}

// deploymentForWebsite returns a website Deployment object
func (r *WebsiteReconciler) deploymentForWebsite(w *webv1.Website) *appsv1.Deployment {
    ls := labelsForWebsite(w.Name)
    replicas := w.Spec.Replicas

    dep := &appsv1.Deployment{
        ObjectMeta: metav1.ObjectMeta{
            Name:      w.Name,
            Namespace: w.Namespace,
        },
        Spec: appsv1.DeploymentSpec{
            Replicas: &replicas,
            Selector: &metav1.LabelSelector{
                MatchLabels: ls,
            },
            Template: corev1.PodTemplateSpec{
                ObjectMeta: metav1.ObjectMeta{
                    Labels: ls,
                },
                Spec: corev1.PodSpec{
                    Containers: []corev1.Container{{
                        Image: w.Spec.Image,
                        Name:  "website",
                        Ports: []corev1.ContainerPort{{
                            ContainerPort: w.Spec.Port,
                            Name:          "http",
                        }},
                    }},
                },
            },
        },
    }

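    // Make the Website the controller owner of the Deployment so that the
    // Deployment is garbage-collected when the Website is deleted.
    // The returned error is discarded here for brevity; production code should handle it.
    _ = ctrl.SetControllerReference(w, dep, r.Scheme)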
    return dep
}

// serviceForWebsite returns a website Service object
func (r *WebsiteReconciler) serviceForWebsite(w *webv1.Website) *corev1.Service {
    ls := labelsForWebsite(w.Name)

    svc := &corev1.Service{
        ObjectMeta: metav1.ObjectMeta{
            Name:      w.Name,
            Namespace: w.Namespace,
        },
        Spec: corev1.ServiceSpec{
            Selector: ls,
            Ports: []corev1.ServicePort{{
                Port:       w.Spec.Port,
                TargetPort: intstr.FromString("http"),
                Protocol:   corev1.ProtocolTCP,
            }},
            Type: corev1.ServiceTypeClusterIP,
        },
    }

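    // Make the Website the controller owner of the Service so that the
    // Service is garbage-collected when the Website is deleted.
    // The returned error is discarded here for brevity; production code should handle it.
    _ = ctrl.SetControllerReference(w, svc, r.Scheme)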
    return svc
}

// labelsForWebsite returns the labels for selecting the resources
// belonging to the given website name.
func labelsForWebsite(name string) map[string]string {
    return map[string]string{"app": name, "website": name}
}

// SetupWithManager sets up the controller with the Manager.
func (r *WebsiteReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&webv1.Website{}).
        Owns(&appsv1.Deployment{}).
        Owns(&corev1.Service{}).
        Complete(r)
}
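
Note that the controller above only creates the Deployment when it is missing. If spec.replicas or spec.image is changed on an existing Website, nothing propagates that change. One possible way to close the gap is to compare the desired values with the live object and update it when they differ. The sketch below shows only the comparison logic, using the standard library. DeploymentState is a hypothetical stand-in for the relevant fields of an appsv1.Deployment (deployment.Spec.Replicas and the image of the first container in deployment.Spec.Template.Spec.Containers), and SyncDeployment is an illustrative name:

```go
package main

import "fmt"

// WebsiteSpec holds the subset of the desired state that drift detection needs.
type WebsiteSpec struct {
	Image    string
	Replicas int32
}

// DeploymentState is a simplified stand-in for the Deployment fields the
// controller would compare. It is not a Kubernetes type.
type DeploymentState struct {
	Replicas *int32 // a pointer, like Deployment.Spec.Replicas
	Image    string // image of the first container
}

// SyncDeployment copies the desired values into dep and reports whether
// anything changed. A real controller would call r.Update when it returns true.
func SyncDeployment(spec WebsiteSpec, dep *DeploymentState) bool {
	changed := false
	if dep.Replicas == nil || *dep.Replicas != spec.Replicas {
		r := spec.Replicas // copy so the pointer does not alias the spec
		dep.Replicas = &r
		changed = true
	}
	if dep.Image != spec.Image {
		dep.Image = spec.Image
		changed = true
	}
	return changed
}

func main() {
	two := int32(2)
	dep := &DeploymentState{Replicas: &two, Image: "nginx:1.20"}
	spec := WebsiteSpec{Image: "nginx:1.21", Replicas: 3}

	fmt.Println("update needed:", SyncDeployment(spec, dep))
	fmt.Println("update needed on second pass:", SyncDeployment(spec, dep))
}
```

In the actual Reconcile method, this check would sit in the branch that currently only logs "Deployment already exists".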

2. Update Dependencies

# Update Go module dependencies
go mod tidy

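VI. Building and Deploying the Operator

1. Clean Up Unused Imports

Before building, make sure all unused imports are removed:

# Remove unused imports automatically (goimports is part of golang.org/x/tools)
goimports -w .

2. Build the Operator Image

The base image in the Dockerfile generated by kubebuilder may only be pullable from certain network environments, and it does not include debugging tools. You can use the Dockerfile provided by the author (Master Zhang) instead. The base images it relies on have been uploaded to a network drive; contact the author if you need them.
The author's Dockerfile also sets ENV GOPROXY=https://goproxy.cn,direct to speed up dependency downloads.

# Set the image name
export IMG=example.com/website-operator:v0.0.1
# Or point it at your own image registry
export IMG=k8s-harbor:30002/library/website-operator:v0.0.1

# Build the image
make docker-build IMG=$IMG

# Push the image to the registry
make docker-push IMG=$IMG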

3. Deploy the Operator

# Install the CRD
make install

# Deploy the controller
make deploy IMG=$IMG

# Remove the controller (run this only when you want to uninstall it)
make undeploy

4. Verify the Deployment

# Check that the CRD was created
kubectl get crd websites.web.example.com

# Check that the controller is running
kubectl get deployment -n website-operator-system

# Check the controller logs
kubectl logs -n website-operator-system -l control-plane=controller-manager -c manager

VII. Testing the Operator

1. Create a Website Resource

Create a test Website resource file, config/samples/web_v1_website.yaml:

apiVersion: web.example.com/v1
kind: Website
metadata:
  name: nginx-website
  namespace: default
spec:
  domain: "nginx.example.com"
  image: "nginx:1.20"
  replicas: 2
  port: 80

Apply the resource:

kubectl apply -f config/samples/web_v1_website.yaml

2. Verify That the Resources Were Created

# Check the Website resource
kubectl get websites
kubectl describe website nginx-website

# Check that the Deployment was created
kubectl get deployments
kubectl get deployment nginx-website

# Check that the Service was created
kubectl get services
kubectl get service nginx-website

# Check that the Pods are running
kubectl get pods
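
When reading the status shown by kubectl describe website, keep in mind that the controller in this article sets Phase to "Running" only when the available replica count changes, and it does so even when that count drops to zero. If a more descriptive phase is wanted, one option is to derive it from the desired and available replica counts. This is a sketch of that idea, not part of the article's controller: PhaseFor and the phase names "Pending" and "Progressing" are assumptions made for illustration.

```go
package main

import "fmt"

// PhaseFor derives a phase string from the desired replica count (spec.replicas)
// and the available replica count reported by the Deployment.
// "Pending" and "Progressing" are hypothetical names chosen for this sketch.
func PhaseFor(desired, available int32) string {
	switch {
	case available == 0:
		return "Pending"
	case available < desired:
		return "Progressing"
	default:
		return "Running"
	}
}

func main() {
	// Walk through a rollout of a Website with two replicas.
	for _, available := range []int32{0, 1, 2} {
		fmt.Printf("desired=2 available=%d phase=%s\n", available, PhaseFor(2, available))
	}
}
```

In the Reconcile method, the result could be assigned to website.Status.Phase before calling r.Status().Update.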

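3. Test Self-Healing

Manually delete the Deployment or Service created by the Operator and check whether the Operator recreates it:

# Delete the Deployment
kubectl delete deployment nginx-website

# Wait a few seconds, then check whether the Deployment has been recreated
kubectl get deployments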
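
If this check is automated, a fixed wait of a few seconds tends to be either too short or needlessly long. A small retry helper that re-checks a condition at an interval is usually more robust. The sketch below uses only the standard library. RetryUntil and ErrNotReady are hypothetical names, and the check function here is a fake that stands in for a real lookup of the Deployment:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrNotReady is returned when the condition is still false after all attempts.
var ErrNotReady = errors.New("condition not met after all attempts")

// RetryUntil calls check up to maxAttempts times, sleeping for interval
// between attempts. It stops early when check reports true or returns an error.
func RetryUntil(maxAttempts int, interval time.Duration, check func() (bool, error)) error {
	for i := 0; i < maxAttempts; i++ {
		done, err := check()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if i < maxAttempts-1 {
			time.Sleep(interval)
		}
	}
	return ErrNotReady
}

func main() {
	attempts := 0
	// Fake check: pretends the Deployment reappears on the third look.
	// A real check would query the cluster for the nginx-website Deployment.
	recreated := func() (bool, error) {
		attempts++
		return attempts >= 3, nil
	}

	err := RetryUntil(10, 10*time.Millisecond, recreated)
	fmt.Println("attempts:", attempts, "error:", err)
}
```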

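VIII. Troubleshooting

1. The Controller Does Not Create the Deployment or Service

Check the following:

  1. Is the controller running?

    kubectl get pods -n website-operator-system

  2. Inspect the controller logs:

    kubectl logs -n website-operator-system -l control-plane=controller-manager -c manager

  3. Check the RBAC permissions:

    kubectl get clusterrole website-operator-manager-role -o yaml

  4. Verify that the Website resource was created correctly:

    kubectl get websites
    kubectl describe website <website-name>


2. Insufficient Permissions

Make sure the controller is allowed to create Deployments and Services, and check the RBAC configuration.

3. Image Pull Failures

Check that the image name is correct and that the image exists in the registry.

IX. Summary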

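In this article we built a complete Website Operator from scratch. Along the way we:

  1. Defined the Website custom resource
  2. Implemented controller logic that automatically creates the Deployment and Service
  3. Added status updates that report the number of available replicas
  4. Set owner references so that the Deployment and Service are garbage-collected when the Website is deleted
  5. Deployed and tested the Operator

The Website Operator shows how the Kubernetes controller pattern can automate application management. With this approach, complex operational procedures can be wrapped in a simple declarative API, which improves both operational efficiency and standardization.

In a real production environment an Operator has to take more factors into account, such as security, observability, and upgrade strategy. We hope this article is a useful reference for readers building their own Operators.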
