Kubernetes-10: A Complete Go Application Walkthrough, from Code to Production K8s Deployment

2026-03-01 · about 4,995 words · estimated reading time: 10 minutes

1. Project Overview

1.1 The project: a Go Todo API service

A simple RESTful API service that includes:

HTTP API (Todo CRUD)
MySQL database
Redis cache
Prometheus metrics
Structured logging
Graceful shutdown

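1.2 Tech stack

Language:         Go 1.22+
Framework:        net/http (standard library)
Database:         MySQL 8.0 (GORM)
Cache:            Redis 7.x
Monitoring:       Prometheus metrics
Logging:          slog (structured logging, Go 1.21+)
Containers:       Docker (multi-stage build)
Orchestration:    Kubernetes 1.28+
Package manager:  Helm 3.x
CI/CD:            GitHub Actions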

1.3 Architecture

Internet
   ↓ HTTPS
Ingress (nginx-ingress)
   ↓
todo-api Service (ClusterIP, port 80)
   ↓
todo-api Pod × 3 (managed by HPA)
   ↓ reads/writes
MySQL StatefulSet      |  Redis Deployment
(persistent storage)   |  (in-memory cache)

2. Go Application Code

2.1 Project layout

todo-api/
├── cmd/
│   └── server/
│       └── main.go          # entry point
├── internal/
│   ├── config/
│   │   └── config.go        # configuration loading
│   ├── handler/
│   │   ├── handler.go       # HTTP handlers
│   │   └── middleware.go    # middleware
│   ├── model/
│   │   └── todo.go          # data model
│   ├── repository/
│   │   ├── mysql.go         # MySQL implementation
│   │   └── redis.go         # Redis cache
│   └── metrics/
│       └── metrics.go       # Prometheus metrics
├── Dockerfile
├── go.mod
└── go.sum

2.2 main.go

package main

import (
    "context"
    "log/slog"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"

    "github.com/myorg/todo-api/internal/config"
    "github.com/myorg/todo-api/internal/handler"
    "github.com/myorg/todo-api/internal/metrics"
    "github.com/myorg/todo-api/internal/repository"
)

func main() {
    // Initialize structured logging
    logLevel := slog.LevelInfo
    if os.Getenv("LOG_LEVEL") == "debug" {
        logLevel = slog.LevelDebug
    }
    logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
        Level: logLevel,
    }))
    slog.SetDefault(logger)

    // Load configuration
    cfg, err := config.Load()
    if err != nil {
        slog.Error("Failed to load config", "error", err)
        os.Exit(1)
    }

    slog.Info("Starting todo-api",
        "version", os.Getenv("APP_VERSION"),
        "port", cfg.Server.Port,
    )

    // Initialize the database
    db, err := repository.NewMySQL(cfg.Database)
    if err != nil {
        slog.Error("Failed to connect to MySQL", "error", err)
        os.Exit(1)
    }
    defer db.Close()

    // Initialize Redis
    cache, err := repository.NewRedis(cfg.Redis)
    if err != nil {
        slog.Error("Failed to connect to Redis", "error", err)
        os.Exit(1)
    }
    defer cache.Close()

    // Initialize Prometheus metrics
    reg := metrics.NewRegistry()

    // Set up routes
    mux := handler.NewRouter(db, cache, reg, cfg)

    // HTTP server
    server := &http.Server{
        Addr:         ":" + cfg.Server.Port,
        Handler:      mux,
        ReadTimeout:  cfg.Server.ReadTimeout,
        WriteTimeout: cfg.Server.WriteTimeout,
        IdleTimeout:  cfg.Server.IdleTimeout,
    }

    // Start the server in the background
    go func() {
        slog.Info("Server listening", "addr", server.Addr)
        if err := server.ListenAndServe(); err != http.ErrServerClosed {
            slog.Error("Server error", "error", err)
            os.Exit(1)
        }
    }()

    // Graceful shutdown: block until SIGTERM or SIGINT arrives
    quit := make(chan os.Signal, 1)
    signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)
    sig := <-quit

    slog.Info("Shutting down server", "signal", sig.String())

    ctx, cancel := context.WithTimeout(context.Background(), cfg.Server.ShutdownTimeout)
    defer cancel()

    if err := server.Shutdown(ctx); err != nil {
        slog.Error("Server shutdown error", "error", err)
    }

    slog.Info("Server stopped")
}
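
The shutdown sequence above can be tried in isolation. The following is a minimal, self-contained sketch and not part of the todo-api code: it uses net/http/httptest to start a throwaway server, sends one slow request, and calls Shutdown while that request is still in flight. The handler, the 200 ms delay, and the helper name drainInFlight are made up for illustration.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// drainInFlight starts a test server, sends one slow request, and calls
// Shutdown while that request is still being handled. It returns the body the
// client received, which shows whether the in-flight request was allowed to finish.
func drainInFlight() (string, error) {
	started := make(chan struct{})
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		close(started)                     // tell the caller the request has arrived
		time.Sleep(200 * time.Millisecond) // simulate slow work
		fmt.Fprint(w, "done")
	}))

	type result struct {
		body string
		err  error
	}
	resCh := make(chan result, 1)
	go func() {
		resp, err := http.Get(srv.URL)
		if err != nil {
			resCh <- result{err: err}
			return
		}
		defer resp.Body.Close()
		b, err := io.ReadAll(resp.Body)
		resCh <- result{body: string(b), err: err}
	}()

	<-started // the handler is now running

	// Same pattern as main.go: give in-flight requests a bounded time to finish.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	if err := srv.Config.Shutdown(ctx); err != nil {
		return "", err
	}
	res := <-resCh
	return res.body, res.err
}

func main() {
	body, err := drainInFlight()
	fmt.Println(body, err)
}
```

Shutdown stops accepting new connections and waits for active ones to go idle, so the client still receives the complete response. In the cluster, the 25s ShutdownTimeout from the config stays below the Deployment's terminationGracePeriodSeconds of 30, which leaves the process time to drain before the kubelet sends SIGKILL.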

2.3 config.go

package config

import (
    "fmt"
    "os"
    "time"

    "gopkg.in/yaml.v3"
)

type Config struct {
    Server   ServerConfig   `yaml:"server"`
    Database DatabaseConfig `yaml:"database"`
    Redis    RedisConfig    `yaml:"redis"`
}

type ServerConfig struct {
    Port            string        `yaml:"port"`
    ReadTimeout     time.Duration `yaml:"readTimeout"`
    WriteTimeout    time.Duration `yaml:"writeTimeout"`
    IdleTimeout     time.Duration `yaml:"idleTimeout"`
    ShutdownTimeout time.Duration `yaml:"shutdownTimeout"`
}

type DatabaseConfig struct {
    DSN          string        `yaml:"dsn"`
    MaxOpenConns int           `yaml:"maxOpenConns"`
    MaxIdleConns int           `yaml:"maxIdleConns"`
    ConnMaxLife  time.Duration `yaml:"connMaxLife"`
}

type RedisConfig struct {
    Addr     string `yaml:"addr"`
    Password string `yaml:"password"`
    DB       int    `yaml:"db"`
}

func Load() (*Config, error) {
    // Defaults
    cfg := &Config{
        Server: ServerConfig{
            Port:            "8080",
            ReadTimeout:     30 * time.Second,
            WriteTimeout:    30 * time.Second,
            IdleTimeout:     120 * time.Second,
            ShutdownTimeout: 25 * time.Second,
        },
        Database: DatabaseConfig{
            MaxOpenConns: 100,
            MaxIdleConns: 10,
            ConnMaxLife:  time.Hour,
        },
        Redis: RedisConfig{
            Addr: "redis:6379",
            DB:   0,
        },
    }

    // Load from file (mounted from a ConfigMap); a missing file leaves the defaults in place
    configPath := getEnv("CONFIG_PATH", "/etc/app/config.yaml")
    if data, err := os.ReadFile(configPath); err == nil {
        if err := yaml.Unmarshal(data, cfg); err != nil {
            return nil, fmt.Errorf("parse config: %w", err)
        }
    }

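    // Environment variables override the file (injected from Secrets)
    if dsn := os.Getenv("DATABASE_DSN"); dsn != "" {
        cfg.Database.DSN = dsn
    }
    if cfg.Database.DSN == "" {
        return nil, fmt.Errorf("DATABASE_DSN is required")
    }

    if redisPass := os.Getenv("REDIS_PASSWORD"); redisPass != "" {
        cfg.Redis.Password = redisPass
    }
    // REDIS_ADDR is set by the CI workflow; without this override it would be silently ignored
    if addr := os.Getenv("REDIS_ADDR"); addr != "" {
        cfg.Redis.Addr = addr
    }
    if port := getEnv("SERVER_PORT", ""); port != "" {
        cfg.Server.Port = port
    }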

    return cfg, nil
}

func getEnv(key, defaultVal string) string {
    if v := os.Getenv(key); v != "" {
        return v
    }
    return defaultVal
}
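
The layering in Load (defaults, then file, then environment) extends naturally to typed values. The sketch below is one possible shape for such helpers, using only the standard library. The names lookup, stringOr, and durationOr, and the SHUTDOWN_TIMEOUT variable, are hypothetical and do not appear in config.go.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// lookup abstracts os.Getenv so the helpers can be exercised without
// touching the real process environment.
type lookup func(key string) string

// stringOr returns the variable's value, or fallback when it is unset or empty.
func stringOr(get lookup, key, fallback string) string {
	if v := get(key); v != "" {
		return v
	}
	return fallback
}

// durationOr parses the variable as a Go duration such as "25s".
// An unset variable yields fallback; a malformed one yields an error.
func durationOr(get lookup, key string, fallback time.Duration) (time.Duration, error) {
	v := get(key)
	if v == "" {
		return fallback, nil
	}
	d, err := time.ParseDuration(v)
	if err != nil {
		return 0, fmt.Errorf("parse %s=%q: %w", key, v, err)
	}
	return d, nil
}

func main() {
	port := stringOr(os.Getenv, "SERVER_PORT", "8080")
	// SHUTDOWN_TIMEOUT is a hypothetical variable; config.Load does not read it.
	timeout, err := durationOr(os.Getenv, "SHUTDOWN_TIMEOUT", 25*time.Second)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Println("port:", port, "shutdown timeout:", timeout)
}
```

Returning an error for a malformed value, instead of quietly falling back to the default, makes a typo in a Deployment manifest fail at startup rather than surface later as an unexpected timeout.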

2.4 handler.go

package handler

import (
    "encoding/json"
    "log/slog"
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

type Router struct {
    mux *http.ServeMux
}

func NewRouter(db TodoRepository, cache CacheRepository, reg MetricsRegistry, cfg *Config) http.Handler {
    mux := http.NewServeMux()
    h := &TodoHandler{db: db, cache: cache}

    // Health check (liveness)
    mux.HandleFunc("GET /healthz", func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/json")
        json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
    })

    // Readiness check (verifies the database connection)
    mux.HandleFunc("GET /ready", func(w http.ResponseWriter, r *http.Request) {
        if err := db.Ping(r.Context()); err != nil {
            slog.Error("Database not ready", "error", err)
            http.Error(w, "not ready", http.StatusServiceUnavailable)
            return
        }
        w.Header().Set("Content-Type", "application/json")
        json.NewEncoder(w).Encode(map[string]string{"status": "ready"})
    })

    // Prometheus metrics
    mux.Handle("GET /metrics", promhttp.HandlerFor(reg.Registry(), promhttp.HandlerOpts{}))

    // Todo API
    mux.HandleFunc("GET /todos", h.ListTodos)
    mux.HandleFunc("POST /todos", h.CreateTodo)
    mux.HandleFunc("GET /todos/{id}", h.GetTodo)
    mux.HandleFunc("PUT /todos/{id}", h.UpdateTodo)
    mux.HandleFunc("DELETE /todos/{id}", h.DeleteTodo)

    // Middleware chain
    return Chain(mux,
        RequestIDMiddleware,
        LoggingMiddleware(reg),
        RecoveryMiddleware,
    )
}
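
NewRouter relies on a Chain helper from middleware.go, which is not shown in this article. A minimal sketch of how such a helper could look is given below, assuming the first middleware listed should be the outermost one. The Middleware type, the tag helper, and traceOrder are illustrative names, not the project's actual code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Middleware wraps an http.Handler with extra behavior.
type Middleware func(http.Handler) http.Handler

// Chain applies middlewares so that the first one listed is the outermost:
// it sees the request first and the response last.
func Chain(h http.Handler, mws ...Middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// tag returns a middleware that records its name when a request passes through.
func tag(name string, trace *[]string) Middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*trace = append(*trace, name)
			next.ServeHTTP(w, r)
		})
	}
}

// traceOrder sends one request through a three-layer chain and reports the call order.
func traceOrder() string {
	var trace []string
	final := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		trace = append(trace, "handler")
	})
	h := Chain(final,
		tag("request-id", &trace),
		tag("logging", &trace),
		tag("recovery", &trace),
	)
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/todos", nil))
	return strings.Join(trace, " > ")
}

func main() {
	fmt.Println(traceOrder())
}
```

Under this ordering the request ID is assigned before logging runs, and the recovery layer sits closest to the handler. If panics inside the logging layer should also be caught, the recovery middleware would have to move to the front of the list.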

2.5 metrics.go

package metrics

import (
    "strconv"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

type Registry struct {
    reg *prometheus.Registry

    httpRequestsTotal    *prometheus.CounterVec
    httpRequestDuration  *prometheus.HistogramVec
    dbQueryDuration      *prometheus.HistogramVec
    cacheHits            *prometheus.CounterVec
    activeConnections    prometheus.Gauge
}

func NewRegistry() *Registry {
    reg := prometheus.NewRegistry()
    reg.MustRegister(prometheus.NewGoCollector())
    reg.MustRegister(prometheus.NewProcessCollector(prometheus.ProcessCollectorOpts{}))

    r := &Registry{reg: reg}

    r.httpRequestsTotal = promauto.With(reg).NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total number of HTTP requests",
        },
        []string{"method", "path", "status"},
    )

    r.httpRequestDuration = promauto.With(reg).NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "HTTP request duration in seconds",
            Buckets: []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5},
        },
        []string{"method", "path"},
    )

    r.cacheHits = promauto.With(reg).NewCounterVec(
        prometheus.CounterOpts{
            Name: "cache_operations_total",
            Help: "Total number of cache operations",
        },
        []string{"operation", "result"},  // hit/miss
    )

    return r
}

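// Registry returns the underlying Prometheus registry; handler.NewRouter
// calls reg.Registry() to build the /metrics endpoint.
func (r *Registry) Registry() *prometheus.Registry {
    return r.reg
}

// ObserveHTTP records the metrics for one HTTP request
func (r *Registry) ObserveHTTP(method, path string, status int, duration time.Duration) {
    r.httpRequestsTotal.WithLabelValues(method, path, strconv.Itoa(status)).Inc()
    r.httpRequestDuration.WithLabelValues(method, path).Observe(duration.Seconds())
}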
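
ObserveHTTP needs the response status, which net/http does not expose after a handler returns. A common approach is to wrap the ResponseWriter. The sketch below shows that idea with the standard library only; statusRecorder, observeFunc, instrument, and recordedStatus are hypothetical names, and the article's own LoggingMiddleware may be built differently.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// statusRecorder wraps http.ResponseWriter and remembers the status code written.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (s *statusRecorder) WriteHeader(code int) {
	s.status = code
	s.ResponseWriter.WriteHeader(code)
}

// observeFunc has the same shape as Registry.ObserveHTTP in metrics.go.
type observeFunc func(method, path string, status int, d time.Duration)

// instrument times each request and reports it through observe.
func instrument(observe observeFunc, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Default to 200: handlers that never call WriteHeader send 200 implicitly.
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		start := time.Now()
		next.ServeHTTP(rec, r)
		observe(r.Method, r.URL.Path, rec.status, time.Since(start))
	})
}

// recordedStatus runs one request through instrument and returns the observed status.
// A handlerStatus of 0 means the handler writes a body without calling WriteHeader.
func recordedStatus(handlerStatus int) int {
	var got int
	observe := func(method, path string, status int, d time.Duration) { got = status }
	h := instrument(observe, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if handlerStatus != 0 {
			w.WriteHeader(handlerStatus)
		}
		w.Write([]byte("ok"))
	}))
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/todos", nil))
	return got
}

func main() {
	fmt.Println(recordedStatus(0), recordedStatus(http.StatusNotFound))
}
```

One caveat when wiring this to the real registry: passing r.URL.Path as the path label creates a separate time series for every /todos/{id} value, so a production version would likely report the route pattern instead of the raw path.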

3. Multi-stage Dockerfile

# ===== Stage 1: build =====
FROM golang:1.22-alpine AS builder

# Install required tools
RUN apk add --no-cache git ca-certificates tzdata

WORKDIR /build

# Copy go.mod first to take advantage of Docker layer caching
COPY go.mod go.sum ./
RUN go mod download

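# Copy the source
COPY . .

# Build a static binary with CGO disabled.
# VERSION is passed with --build-arg; it must be declared with ARG in this stage,
# otherwise ${VERSION} expands to an empty string.
ARG VERSION=dev
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
    go build \
    -ldflags="-s -w -X main.Version=${VERSION} -X main.BuildTime=$(date -u +%Y%m%d%H%M%S)" \
    -trimpath \
    -o /build/todo-api \
    ./cmd/server

# Sanity-check the binary (best effort: "|| true" keeps the build going if the flag is not handled)
RUN /build/todo-api --version || true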

# ===== Stage 2: runtime =====
FROM gcr.io/distroless/static-debian12:nonroot

# Copy the binary and required files from the build stage
COPY --from=builder /build/todo-api /todo-api
COPY --from=builder /usr/share/zoneinfo /usr/share/zoneinfo
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/

# Run as a non-root user
USER nonroot:nonroot

EXPOSE 8080 9090

ENTRYPOINT ["/todo-api"]
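
The -X linker flags in the build stage only take effect when the program declares matching package-level string variables, and the --version check only does something useful when the binary handles that flag. Neither appears in the main.go listing earlier, so the sketch below shows one way the two pieces might look. Version, BuildTime, versionString, and wantsVersion are assumed names chosen to match the flags in the Dockerfile.

```go
package main

import (
	"fmt"
	"os"
)

// Version and BuildTime are meant to be overwritten at link time, for example:
//
//	go build -ldflags "-X main.Version=v1.0.0 -X main.BuildTime=20260301000000"
//
// The -X flag sets package-level string variables, so they have to be declared here.
var (
	Version   = "dev"
	BuildTime = "unknown"
)

// versionString formats the build metadata for --version output and startup logs.
func versionString() string {
	return fmt.Sprintf("todo-api %s (built %s)", Version, BuildTime)
}

// wantsVersion reports whether the first command-line argument asks for the version.
func wantsVersion(args []string) bool {
	return len(args) > 1 && (args[1] == "--version" || args[1] == "-version")
}

func main() {
	if wantsVersion(os.Args) {
		fmt.Println(versionString())
		return
	}
	fmt.Println("starting server, version", Version)
}
```

With something like this in place, the startup log could report Version directly instead of reading APP_VERSION from the environment, which removes one value that has to be kept in sync by hand in the Deployment manifest.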

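# Build the image
docker build \
  --build-arg VERSION=$(git describe --tags --always) \
  -t myregistry.io/todo-api:$(git rev-parse --short HEAD) \
  .

# Check the image size (it should be small)
docker images myregistry.io/todo-api

# Test locally, using the tag that was just built.
# Inside the container, localhost is the container itself, so the DSN points at the Docker host instead.
docker run -p 8080:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e DATABASE_DSN="user:pass@tcp(host.docker.internal:3306)/tododb" \
  myregistry.io/todo-api:$(git rev-parse --short HEAD)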

4. K8s Manifests

4.1 Namespace

# k8s/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: todo-app
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted

4.2 Secrets

# k8s/secrets.yaml (in real production, use External Secrets Operator)
apiVersion: v1
kind: Secret
metadata:
  name: todo-api-secret
  namespace: todo-app
type: Opaque
stringData:
  DATABASE_DSN: "todo_user:strongpassword@tcp(mysql-service:3306)/tododb?parseTime=true&loc=Local"
  REDIS_PASSWORD: "redis-strong-password"
  MYSQL_ROOT_PASSWORD: "root-strong-password"
  MYSQL_PASSWORD: "strongpassword"

4.3 ConfigMap

# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: todo-api-config
  namespace: todo-app
data:
  config.yaml: |
    server:
      port: "8080"
      readTimeout: 30s
      writeTimeout: 30s
      idleTimeout: 120s
      shutdownTimeout: 25s
    database:
      maxOpenConns: 50
      maxIdleConns: 10
      connMaxLife: 1h
    redis:
      addr: redis-service:6379
      db: 0

4.4 MySQL StatefulSet

# k8s/mysql.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: todo-app
spec:
  serviceName: mysql-headless
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
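      # The todo-app namespace enforces the restricted Pod Security Standard,
      # so this Pod must also declare non-root, seccomp, and dropped capabilities.
      securityContext:
        runAsNonRoot: true
        runAsUser: 999
        fsGroup: 999
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: mysql
        image: mysql:8.0
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: [ALL]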
        args:
        - --default-authentication-plugin=mysql_native_password
        - --character-set-server=utf8mb4
        - --collation-server=utf8mb4_unicode_ci
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: todo-api-secret
              key: MYSQL_ROOT_PASSWORD
        - name: MYSQL_DATABASE
          value: tododb
        - name: MYSQL_USER
          value: todo_user
        - name: MYSQL_PASSWORD
          valueFrom:
            secretKeyRef:
              name: todo-api-secret
              key: MYSQL_PASSWORD
        ports:
        - containerPort: 3306
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "2"
            memory: "4Gi"
        livenessProbe:
          exec:
            command: ["mysqladmin", "ping", "-h", "localhost"]
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command: ["mysql", "-u", "root", "-p$(MYSQL_ROOT_PASSWORD)", "-e", "SELECT 1"]
          initialDelaySeconds: 10
          periodSeconds: 5
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql

  volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi
---
apiVersion: v1
kind: Service
metadata:
  name: mysql-service
  namespace: todo-app
spec:
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306
---
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
  namespace: todo-app
spec:
  clusterIP: None
  selector:
    app: mysql
  ports:
  - port: 3306

4.5 Redis Deployment

# k8s/redis.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: todo-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
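      # The todo-app namespace enforces the restricted Pod Security Standard,
      # so this Pod must also declare seccomp and dropped capabilities.
      securityContext:
        runAsNonRoot: true
        runAsUser: 999
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: redis
        image: redis:7.2-alpine
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: [ALL]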
        command: ["redis-server"]
        args:
        - "--requirepass"
        - "$(REDIS_PASSWORD)"
        - "--maxmemory"
        - "512mb"
        - "--maxmemory-policy"
        - "allkeys-lru"
        env:
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              name: todo-api-secret
              key: REDIS_PASSWORD
        ports:
        - containerPort: 6379
        resources:
          requests:
            cpu: "100m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "768Mi"
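        livenessProbe:
          exec:
            # The server runs with --requirepass, so an unauthenticated PING is rejected with NOAUTH
            command: ["redis-cli", "-a", "$(REDIS_PASSWORD)", "--no-auth-warning", "ping"]
          periodSeconds: 10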
---
apiVersion: v1
kind: Service
metadata:
  name: redis-service
  namespace: todo-app
spec:
  selector:
    app: redis
  ports:
  - port: 6379
    targetPort: 6379

4.6 Todo API Deployment

# k8s/todo-api.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: todo-api
  namespace: todo-app
  annotations:
    configmap.reloader.stakater.com/reload: "todo-api-config"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: todo-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  minReadySeconds: 10
  revisionHistoryLimit: 5

  template:
    metadata:
      labels:
        app: todo-api
        version: "1.0.0"
      annotations:
        prometheus.io/scrape: "true"
        # /metrics is registered on the main router in handler.go, so it is served on the HTTP port
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: todo-api-sa
      terminationGracePeriodSeconds: 30

      # Anti-affinity: spread Pods across different Nodes
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: todo-api
              topologyKey: kubernetes.io/hostname

      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
        runAsGroup: 65534
        fsGroup: 65534
        seccompProfile:
          type: RuntimeDefault

      initContainers:
      # Wait until MySQL is reachable
      - name: wait-for-mysql
        image: busybox:1.36
        command:
        - sh
        - -c
        - |
          until nc -z mysql-service 3306; do
            echo "Waiting for MySQL..."
            sleep 2
          done
          echo "MySQL is ready!"          
        securityContext:
          runAsNonRoot: true
          runAsUser: 65534
          allowPrivilegeEscalation: false
          capabilities:
            drop: [ALL]

      containers:
      - name: todo-api
        image: myregistry.io/todo-api:v1.0.0
        imagePullPolicy: Always
        ports:
        - name: http
          containerPort: 8080
        - name: metrics
          containerPort: 9090

        envFrom:
        - secretRef:
            name: todo-api-secret

        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: APP_VERSION
          value: "1.0.0"
        - name: CONFIG_PATH
          value: /etc/app/config.yaml

        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"

        livenessProbe:
          httpGet:
            path: /healthz
            port: http
          initialDelaySeconds: 10
          periodSeconds: 10
          failureThreshold: 3

        readinessProbe:
          httpGet:
            path: /ready
            port: http
          initialDelaySeconds: 5
          periodSeconds: 5
          failureThreshold: 3

        startupProbe:
          httpGet:
            path: /healthz
            port: http
          failureThreshold: 30
          periodSeconds: 10

        securityContext:
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          capabilities:
            drop: [ALL]

        volumeMounts:
        - name: app-config
          mountPath: /etc/app
          readOnly: true
        - name: tmp
          mountPath: /tmp

      volumes:
      - name: app-config
        configMap:
          name: todo-api-config
      - name: tmp
        emptyDir: {}
---
apiVersion: v1
kind: Service
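metadata:
  name: todo-api-service
  namespace: todo-app
  labels:
    app: todo-api    # the ServiceMonitor in section 7.1 selects Services by this label
spec:
  selector:
    app: todo-api
  ports:
  - name: http
    port: 80
    targetPort: 8080
  - name: metrics
    port: 9090
    targetPort: 8080    # /metrics is served by the main HTTP server (see handler.go)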
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: todo-api-hpa
  namespace: todo-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: todo-api
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: todo-api-ingress
  namespace: todo-app
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/limit-rps: "100"
spec:
  ingressClassName: nginx    # replaces the deprecated kubernetes.io/ingress.class annotation
  tls:
  - hosts:
    - api.todo.example.com
    secretName: todo-api-tls
  rules:
  - host: api.todo.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: todo-api-service
            port:
              number: 80
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: todo-api-sa
  namespace: todo-app
automountServiceAccountToken: false
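
The HorizontalPodAutoscaler in the manifest above targets 70% CPU and 80% memory utilization with 2 to 20 replicas. To make the numbers concrete, the sketch below applies the scaling rule from the Kubernetes documentation, desired = ceil(current × currentUtilization / targetUtilization), with the default 10% tolerance. It is a simplified model for reasoning about the manifest, not the controller's actual code, and the function names are made up.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the documented HPA rule for a single metric:
//
//	desired = ceil(current * currentUtil / targetUtil)
//
// Ratios within 10% of 1.0 (the default tolerance) leave the count unchanged,
// and the result is clamped to [minReplicas, maxReplicas].
func desiredReplicas(current int, currentUtil, targetUtil float64, minReplicas, maxReplicas int) int {
	ratio := currentUtil / targetUtil
	desired := current
	if math.Abs(ratio-1.0) > 0.1 {
		desired = int(math.Ceil(float64(current) * ratio))
	}
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

// todoAPIReplicas evaluates both metrics from the manifest (CPU target 70%,
// memory target 80%, 2 to 20 replicas) and keeps the larger proposal,
// which is how the HPA combines multiple metrics.
func todoAPIReplicas(current int, cpuUtil, memUtil float64) int {
	cpu := desiredReplicas(current, cpuUtil, 70, 2, 20)
	mem := desiredReplicas(current, memUtil, 80, 2, 20)
	if cpu > mem {
		return cpu
	}
	return mem
}

func main() {
	fmt.Println(todoAPIReplicas(3, 140, 40)) // CPU at twice the target
	fmt.Println(todoAPIReplicas(3, 72, 40))  // CPU within tolerance
	fmt.Println(todoAPIReplicas(3, 35, 160)) // memory drives the decision
}
```

Utilization is measured relative to the container's resource requests (100m CPU and 128Mi memory here), not its limits, so raising or lowering requests shifts the point at which scaling starts.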

5. Packaging with a Helm Chart

# Chart layout
charts/todo-api/
├── Chart.yaml
├── values.yaml
├── values-staging.yaml
├── values-production.yaml
└── templates/
    ├── _helpers.tpl
    ├── namespace.yaml
    ├── serviceaccount.yaml
    ├── configmap.yaml
    ├── secret.yaml       # test environments only; production uses External Secrets
    ├── deployment.yaml
    ├── service.yaml
    ├── ingress.yaml
    ├── hpa.yaml
    ├── networkpolicy.yaml
    ├── rbac.yaml
    └── NOTES.txt

# Install to staging
helm upgrade --install todo-api ./charts/todo-api \
  -f charts/todo-api/values.yaml \
  -f charts/todo-api/values-staging.yaml \
  --namespace todo-staging \
  --create-namespace \
  --set image.tag=$IMAGE_TAG \
  --wait \
  --timeout 5m

# Install to production
helm upgrade --install todo-api ./charts/todo-api \
  -f charts/todo-api/values.yaml \
  -f charts/todo-api/values-production.yaml \
  --namespace todo-production \
  --set image.tag=$IMAGE_TAG \
  --wait

6. CI/CD Pipeline

# .github/workflows/deploy.yaml
name: Build and Deploy

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}/todo-api

jobs:
  test:
    runs-on: ubuntu-latest
    services:
      mysql:
        image: mysql:8.0
        env:
          MYSQL_ROOT_PASSWORD: testpassword
          MYSQL_DATABASE: tododb_test
        ports: ["3306:3306"]
        options: --health-cmd="mysqladmin ping" --health-interval=10s

      redis:
        image: redis:7.2-alpine
        ports: ["6379:6379"]

    steps:
    - uses: actions/checkout@v4

    - uses: actions/setup-go@v5
      with:
        go-version: "1.22"
        cache: true

    - name: Run tests
      run: |
        go test ./... -v -race -coverprofile=coverage.out
        go tool cover -func=coverage.out        
      env:
        DATABASE_DSN: "root:testpassword@tcp(localhost:3306)/tododb_test?parseTime=true"
        REDIS_ADDR: "localhost:6379"

    - name: Run linter
      uses: golangci/golangci-lint-action@v4

  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.event_name == 'push'
    permissions:
      contents: read
      packages: write

    outputs:
      image-tag: ${{ steps.meta.outputs.tags }}
      image-digest: ${{ steps.build.outputs.digest }}

    steps:
    - uses: actions/checkout@v4

    - name: Log in to registry
      uses: docker/login-action@v3
      with:
        registry: ${{ env.REGISTRY }}
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}

    - name: Extract metadata
      id: meta
      uses: docker/metadata-action@v5
      with:
        images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
        tags: |
          type=sha,prefix=sha-,format=long
          type=ref,event=branch
          type=semver,pattern={{version}}          

    - name: Build and push
      id: build
      uses: docker/build-push-action@v5
      with:
        context: .
        push: true
        tags: ${{ steps.meta.outputs.tags }}
        labels: ${{ steps.meta.outputs.labels }}
        build-args: VERSION=${{ github.sha }}
        cache-from: type=gha
        cache-to: type=gha,mode=max

    - name: Scan for vulnerabilities
      uses: aquasecurity/trivy-action@master
      with:
        image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:sha-${{ github.sha }}
        format: sarif
        exit-code: "1"
        severity: CRITICAL,HIGH

  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    environment: staging

    steps:
    - uses: actions/checkout@v4

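    - name: Deploy to staging
      run: |
        # KUBECONFIG must be a file path. This assumes the secret stores the
        # kubeconfig file contents, so write them to a file first.
        printf '%s' "$KUBECONFIG_DATA" > "$RUNNER_TEMP/kubeconfig"
        chmod 600 "$RUNNER_TEMP/kubeconfig"
        echo "KUBECONFIG=$RUNNER_TEMP/kubeconfig" >> "$GITHUB_ENV"
        export KUBECONFIG="$RUNNER_TEMP/kubeconfig"
        helm upgrade --install todo-api ./charts/todo-api \
          -f charts/todo-api/values.yaml \
          -f charts/todo-api/values-staging.yaml \
          --namespace todo-staging \
          --create-namespace \
          --set image.tag=sha-${{ github.sha }} \
          --wait --timeout 5m
      env:
        KUBECONFIG_DATA: ${{ secrets.STAGING_KUBECONFIG }}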

    - name: Run smoke tests
      run: |
        kubectl wait --for=condition=ready pod -l app=todo-api \
          -n todo-staging --timeout=120s
        curl -f https://api.staging.todo.example.com/healthz        

  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production    # requires manual approval

    steps:
    - uses: actions/checkout@v4

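    - name: Deploy to production
      run: |
        # KUBECONFIG must be a file path. This assumes the secret stores the
        # kubeconfig file contents, so write them to a file first.
        printf '%s' "$KUBECONFIG_DATA" > "$RUNNER_TEMP/kubeconfig"
        chmod 600 "$RUNNER_TEMP/kubeconfig"
        echo "KUBECONFIG=$RUNNER_TEMP/kubeconfig" >> "$GITHUB_ENV"
        export KUBECONFIG="$RUNNER_TEMP/kubeconfig"
        helm upgrade --install todo-api ./charts/todo-api \
          -f charts/todo-api/values.yaml \
          -f charts/todo-api/values-production.yaml \
          --namespace todo-production \
          --set image.tag=sha-${{ github.sha }} \
          --wait --timeout 10m
      env:
        KUBECONFIG_DATA: ${{ secrets.PRODUCTION_KUBECONFIG }}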

    - name: Verify deployment
      run: |
        kubectl rollout status deployment/todo-api -n todo-production
        curl -f https://api.todo.example.com/healthz

7. Observability: Monitoring and Logging

7.1 Prometheus monitoring configuration

# k8s/servicemonitor.yaml (requires Prometheus Operator)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: todo-api
  namespace: todo-app
  labels:
    release: kube-prometheus-stack    # must match the Prometheus Operator's selector
spec:
  selector:
    matchLabels:
      app: todo-api
  endpoints:
  - port: metrics
    path: /metrics
    interval: 15s

7.2 Grafana Dashboard

Key metrics:

Request QPS: sum(rate(http_requests_total[5m]))
Error rate: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
P99 latency: histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
Pod count: kube_deployment_status_replicas{deployment="todo-api"}
CPU usage: rate(container_cpu_usage_seconds_total{pod=~"todo-api-.*"}[5m])
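
The P99 query estimates a quantile from cumulative histogram buckets rather than from raw samples. The sketch below illustrates the idea with linear interpolation inside the bucket that contains the target rank. It is a simplified model of what histogram_quantile does, written for this article; the bucket type, the sample counts, and the function names are invented for the example.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// bucket is one cumulative histogram bucket: the number of observations
// that were less than or equal to upperBound (the "le" label).
type bucket struct {
	upperBound float64
	count      float64
}

// quantile estimates the q-quantile (0 <= q <= 1) by linear interpolation
// inside the first bucket whose cumulative count reaches the target rank.
// The last bucket is expected to be the +Inf bucket.
func quantile(q float64, buckets []bucket) float64 {
	if len(buckets) < 2 {
		return math.NaN()
	}
	sort.Slice(buckets, func(i, j int) bool { return buckets[i].upperBound < buckets[j].upperBound })
	total := buckets[len(buckets)-1].count
	if total == 0 {
		return math.NaN()
	}
	rank := q * total
	i := sort.Search(len(buckets), func(i int) bool { return buckets[i].count >= rank })
	if i >= len(buckets)-1 {
		// The rank falls into the +Inf bucket: report the highest finite bound.
		return buckets[len(buckets)-2].upperBound
	}
	lower, below := 0.0, 0.0 // the first bucket is assumed to start at 0
	if i > 0 {
		lower, below = buckets[i-1].upperBound, buckets[i-1].count
	}
	inBucket := buckets[i].count - below
	if inBucket == 0 {
		return lower
	}
	return lower + (buckets[i].upperBound-lower)*(rank-below)/inBucket
}

// sampleBuckets returns made-up data: 100 requests, 90 of them under 100 ms.
func sampleBuckets() []bucket {
	return []bucket{
		{0.1, 90}, {0.25, 98}, {0.5, 100}, {math.Inf(1), 100},
	}
}

func main() {
	fmt.Printf("p50 ≈ %.3fs, p99 ≈ %.3fs\n",
		quantile(0.50, sampleBuckets()), quantile(0.99, sampleBuckets()))
}
```

With the sample data, the 99th request falls between the 0.25s and 0.5s bounds, so the estimate lands inside that range. The result can never be more precise than the bucket layout allows, which is why the Buckets list in metrics.go should bracket the latencies that matter for alerting.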

7.3 Log collection (Loki + Fluent Bit)

# DaemonSet: Fluent Bit (covered in part 5 of this series)
# Log format convention (the Go application emits JSON)

// Structured logs are easy to query in Loki
slog.Info("Request completed",
    "method", r.Method,
    "path", r.URL.Path,
    "status", statusCode,
    "duration_ms", duration.Milliseconds(),
    "pod", os.Getenv("POD_NAME"),
    "trace_id", traceID,
)

Example Loki queries:

# All 5xx errors
{namespace="todo-app", app="todo-api"} | json | status >= 500

# Slow requests (>1s)
{namespace="todo-app", app="todo-api"} | json | duration_ms > 1000

# Per-second rate of error (4xx/5xx) log lines for one path
rate({namespace="todo-app"} | json | path="/todos" | status >= 400 [5m])
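
The queries above only work if the field names in the log output match the ones used in the filters. A quick way to check is to write one record to a buffer and decode it. The sketch below does that with log/slog and encoding/json; logLine is a hypothetical helper, and the method, path, and values are sample data.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log/slog"
	"time"
)

// logLine emits one "Request completed" record through slog's JSON handler
// and returns the decoded JSON object, so the field names can be inspected.
func logLine(status int, d time.Duration) map[string]any {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	logger.Info("Request completed",
		"method", "GET",
		"path", "/todos",
		"status", status,
		"duration_ms", d.Milliseconds(),
	)
	var rec map[string]any
	if err := json.Unmarshal(buf.Bytes(), &rec); err != nil {
		panic(err)
	}
	return rec
}

func main() {
	rec := logLine(502, 1500*time.Millisecond)
	fmt.Println(rec["msg"], rec["status"], rec["duration_ms"])
}
```

The decoded record carries status and duration_ms as JSON numbers next to slog's built-in time, level, and msg keys, which are the names the `| json` stage extracts for the filters.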

8. Production Operations Tips

8.1 Common troubleshooting commands

# ===== Quick diagnosis =====

# Check whether the Pods are healthy
kubectl get pods -n todo-app -o wide

# Pod stuck in Pending?
kubectl describe pod <pod-name> -n todo-app
# Focus on Events. Common causes:
# - Insufficient cpu/memory → not enough resources; add Nodes or lower requests
# - No nodes available → a Node taint has no matching toleration
# - PVC not bound → the PVC cannot find a matching PV

# Pod in CrashLoopBackOff?
kubectl logs <pod-name> -n todo-app --previous
kubectl describe pod <pod-name> -n todo-app

# Open a shell in the container for debugging
# (only works for images that ship a shell, such as mysql or redis; the distroless
# todo-api image has none, so use an ephemeral debug container via kubectl debug there)
kubectl exec -it <pod-name> -n todo-app -- sh

# ===== Network troubleshooting =====

# Test Service connectivity (from inside a Pod)
kubectl run nettest --image=nicolaka/netshoot -it --rm --restart=Never \
  -n todo-app -- curl http://todo-api-service/healthz

# Check the Endpoints
kubectl get endpoints todo-api-service -n todo-app
# Empty Endpoints mean that no ready Pod matches the selector

# DNS resolution
kubectl run nettest --image=busybox -it --rm --restart=Never -- nslookup todo-api-service.todo-app

# ===== Performance troubleshooting =====
kubectl top pods -n todo-app
kubectl top nodes
kubectl describe hpa todo-api-hpa -n todo-app

# ===== Logs =====
# Follow the logs of all todo-api Pods
kubectl logs -f -l app=todo-api -n todo-app --max-log-requests=5

# Logs for a specific time range
kubectl logs <pod-name> -n todo-app --since=1h
kubectl logs <pod-name> -n todo-app --since-time="2026-03-16T10:00:00Z"

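8.2 Maintenance operations

# Temporarily take a Pod out of rotation (for debugging)
# Changing the Pod's label stops the Service from routing traffic to it;
# the ReplicaSet no longer counts the Pod and starts a replacement
kubectl label pod <pod-name> app=todo-api-debug --overwrite -n todo-app

# Emergency rollback
kubectl rollout undo deployment/todo-api -n todo-app
kubectl rollout status deployment/todo-api -n todo-app

# Emergency scale-out
kubectl scale deployment todo-api --replicas=10 -n todo-app

# Pin the replica count by setting the HPA's min and max to the same value (when you need manual control)
kubectl patch hpa todo-api-hpa -n todo-app -p '{"spec":{"minReplicas":5,"maxReplicas":5}}'

# Force-delete a stuck Pod
kubectl delete pod <pod-name> -n todo-app --force --grace-period=0

# Rank Pods by resource usage
kubectl top pods -n todo-app --sort-by=cpu
kubectl top pods -n todo-app --sort-by=memory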

8.3 PodDisruptionBudget (keeping the service available during upgrades and maintenance)

# k8s/pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: todo-api-pdb
  namespace: todo-app
spec:
  selector:
    matchLabels:
      app: todo-api
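  minAvailable: 2       # keep at least 2 Pods available
  # or:
  # maxUnavailable: 1   # allow at most 1 Pod to be unavailable

With a PDB in place, kubectl drain node-1 evicts a todo-api Pod only when doing so still leaves at least minAvailable Pods ready. Otherwise the eviction is refused and drain keeps retrying until replacement Pods become ready elsewhere, so the service stays available during node maintenance.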
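
The budget boils down to simple arithmetic: the number of voluntary evictions allowed at any moment is the count of healthy Pods minus minAvailable, never below zero. The sketch below spells that out; allowedDisruptions is an illustrative helper, not a Kubernetes API.

```go
package main

import "fmt"

// allowedDisruptions returns how many Pods may be evicted voluntarily right now
// under a PodDisruptionBudget with the given minAvailable.
func allowedDisruptions(healthy, minAvailable int) int {
	if healthy <= minAvailable {
		return 0
	}
	return healthy - minAvailable
}

func main() {
	for _, healthy := range []int{2, 3, 5} {
		fmt.Printf("healthy=%d minAvailable=2 -> evictions allowed: %d\n",
			healthy, allowedDisruptions(healthy, 2))
	}
}
```

One consequence for this project: the HPA may scale todo-api down to its minReplicas of 2, and at that point minAvailable: 2 allows zero evictions, so a node drain will wait indefinitely. Using maxUnavailable: 1, or keeping the HPA minimum above minAvailable, avoids that stall.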

9. Complete Project Layout

todo-api/
├── cmd/
│   └── server/
│       └── main.go
├── internal/
│   ├── config/config.go
│   ├── handler/
│   │   ├── handler.go
│   │   ├── middleware.go
│   │   └── todo.go
│   ├── model/todo.go
│   ├── repository/
│   │   ├── interface.go
│   │   ├── mysql.go
│   │   └── redis.go
│   └── metrics/metrics.go
├── k8s/                          # K8s manifests
│   ├── namespace.yaml
│   ├── configmap.yaml
│   ├── secrets.yaml
│   ├── mysql.yaml
│   ├── redis.yaml
│   ├── todo-api.yaml
│   ├── rbac.yaml
│   ├── networkpolicy.yaml
│   ├── pdb.yaml
│   └── servicemonitor.yaml
├── charts/                       # Helm Charts
│   └── todo-api/
│       ├── Chart.yaml
│       ├── values.yaml
│       ├── values-staging.yaml
│       ├── values-production.yaml
│       └── templates/
├── .github/
│   └── workflows/
│       └── deploy.yaml           # CI/CD
├── Dockerfile
├── go.mod
├── go.sum
├── Makefile
└── README.md

Makefile:

VERSION ?= $(shell git describe --tags --always)
IMAGE    = myregistry.io/todo-api
TAG      = $(shell git rev-parse --short HEAD)

.PHONY: build push test lint deploy-staging deploy-prod rollback logs status

build:
	docker build --build-arg VERSION=$(VERSION) -t $(IMAGE):$(TAG) .

push: build
	docker push $(IMAGE):$(TAG)

test:
	go test ./... -v -race

lint:
	golangci-lint run

deploy-staging: push
	helm upgrade --install todo-api ./charts/todo-api \
		-f charts/todo-api/values.yaml \
		-f charts/todo-api/values-staging.yaml \
		--namespace todo-staging --create-namespace \
		--set image.tag=$(TAG) --wait

deploy-prod: push
	helm upgrade --install todo-api ./charts/todo-api \
		-f charts/todo-api/values.yaml \
		-f charts/todo-api/values-production.yaml \
		--namespace todo-production \
		--set image.tag=$(TAG) --wait

rollback:
	helm rollback todo-api -n todo-production

logs:
	kubectl logs -f -l app=todo-api -n todo-production --max-log-requests=5

status:
	kubectl get pods,svc,ingress,hpa -n todo-production

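10. Summary and Next Steps

10.1 Series summary

Across these 10 articles, we worked systematically through:

Article	Core topics
01-introduction	What K8s is, why to use it, setting up a local environment
02-architecture	Control Plane components, Node components, how it all works
03-core-concepts	Pod, Namespace, Label/Selector, resource limits
04-workloads	Deployment/StatefulSet/DaemonSet/Job/CronJob/HPA
05-networking	The four Service types, Ingress, DNS, NetworkPolicy
06-storage	Volume/PV/PVC/StorageClass dynamic provisioning
07-config-secret	ConfigMap, Secret, hot-reloading configuration
08-rbac-security	RBAC, ServiceAccount, security contexts
09-helm	Chart development, template syntax, multi-environment management
10-go-app-deploy	Full-stack deployment of a Go application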

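10.2 Learning roadmap

Level 1: Advanced operations

Kustomize (lightweight multi-environment configuration management)
ArgoCD / Flux (GitOps continuous deployment)
Prometheus + Grafana (a complete monitoring stack)
EFK/PLG Stack (logging stack)
Velero (cluster backup and restore)

Level 2: Platform engineering

Istio / Linkerd (service mesh)
OPA / Kyverno (policy engines)
Cluster API (managing K8s with K8s)
Multi-cluster management (Fleet, OCM)
FinOps (K8s cost optimization)

Level 3: K8s development

client-go (the K8s Go client)
controller-runtime (controller framework)
kubebuilder (Operator development framework)
Admission Webhook (admission controllers)
CRD (custom resources)
The K8s Operator pattern

Level 4: Contributing to K8s

Read the K8s source code (k8s.io/kubernetes)
Join a SIG (Special Interest Group)
Contribute to CNCF open source projects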

10.3 Recommended resources

Official documentation:

Books:

"Kubernetes in Action" (the best K8s book)
"Cloud Native Go" (Go + K8s development)
"Programming Kubernetes" (K8s Operator development)

Hands-on platforms:

killercoda.com (free online K8s lab environment)
play.katacoda.com (Katacoda's public playground was shut down in 2022)
killer.sh (CKA/CKAD exam simulator)

Certifications:

CKA（Certified Kubernetes Administrator）
CKAD（Certified Kubernetes Application Developer）
CKS（Certified Kubernetes Security Specialist）

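That concludes the series. K8s is a vast ecosystem, but once you have the core concepts down, everything else is an extension built on that foundation.

For a Go developer, K8s is both the platform your applications run on and an open source ecosystem you can take part in deeply: K8s itself and much of its toolchain are written in Go.

I hope this series has been helpful!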
