
Kubernetes-10 Hands-On Go Application: From Code to Production K8s Deployment


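Part 10 of the series (the finale). This article combines everything from the previous nine parts into one complete hands-on project: building a Go microservice from scratch and deploying it to a production Kubernetes environment.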


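Contents

  1. Project Overview
  2. Go Application Code
  3. Multi-Stage Dockerfile Build
  4. K8s Resource Manifests
  5. Packaging with a Helm Chart
  6. CI/CD Pipeline
  7. Observability: Monitoring and Logging
  8. Production Operations Tips
  9. Complete Project Structure
  10. Summary and Learning Path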

1. Project Overview

1.1 The Project: a Go Todo API Service

A simple RESTful API service with:

  • An HTTP API (Todo CRUD)
  • A MySQL database
  • A Redis cache
  • Prometheus metrics
  • Structured logging
  • Graceful shutdown

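1.2 Tech Stack

Language:      Go 1.22+
Framework:     net/http (standard library; the method-based route patterns used below need Go 1.22+)
Database:      MySQL 8.0 (GORM)
Cache:         Redis 7.x
Monitoring:    Prometheus metrics
Logging:       slog (structured logging, Go 1.21+)
Container:     Docker (multi-stage build)
Orchestration: Kubernetes 1.28+
Packaging:     Helm 3.x
CI/CD:         GitHub Actions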

1.3 Architecture

Internet
   ↓ HTTPS
Ingress (nginx-ingress)
   ↓
todo-api Service (ClusterIP: 80)
   ↓
todo-api Pod × 3 (managed by the HPA)
   ↓ read/write
MySQL StatefulSet     |  Redis Deployment
(persistent storage)  |  (in-memory cache)

2. Go Application Code

2.1 Project Structure

todo-api/
├── cmd/
│   └── server/
│       └── main.go          # entry point
├── internal/
│   ├── config/
│   │   └── config.go        # config loading
│   ├── handler/
│   │   ├── handler.go       # HTTP handlers
│   │   └── middleware.go    # middleware
│   ├── model/
│   │   └── todo.go          # data model
│   ├── repository/
│   │   ├── mysql.go         # MySQL implementation
│   │   └── redis.go         # Redis cache
│   └── metrics/
│       └── metrics.go       # Prometheus metrics
├── Dockerfile
├── go.mod
└── go.sum

2.2 main.go

package main

import (
    "context"
    "errors"
    "log/slog"
    "net/http"
    "os"
    "os/signal"
    "syscall"

    "github.com/myorg/todo-api/internal/config"
    "github.com/myorg/todo-api/internal/handler"
    "github.com/myorg/todo-api/internal/metrics"
    "github.com/myorg/todo-api/internal/repository"
)

// Set at build time by the Dockerfile: -ldflags "-X main.Version=... -X main.BuildTime=..."
var (
    Version   = "dev"
    BuildTime = "unknown"
)

func main() {
    // "todo-api --version" prints build info and exits (used by the Dockerfile's sanity check)
    if len(os.Args) > 1 && os.Args[1] == "--version" {
        os.Stdout.WriteString(Version + " " + BuildTime + "\n")
        return
    }

    // Initialize structured JSON logging on stdout
    logLevel := slog.LevelInfo
    if os.Getenv("LOG_LEVEL") == "debug" {
        logLevel = slog.LevelDebug
    }
    logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
        Level: logLevel,
    }))
    slog.SetDefault(logger)

    // Load configuration
    cfg, err := config.Load()
    if err != nil {
        slog.Error("Failed to load config", "error", err)
        os.Exit(1)
    }

    slog.Info("Starting todo-api",
        "version", Version,
        "build_time", BuildTime,
        "port", cfg.Server.Port,
    )

    // Connect to MySQL
    db, err := repository.NewMySQL(cfg.Database)
    if err != nil {
        slog.Error("Failed to connect to MySQL", "error", err)
        os.Exit(1)
    }
    defer db.Close()

    // Connect to Redis
    cache, err := repository.NewRedis(cfg.Redis)
    if err != nil {
        slog.Error("Failed to connect to Redis", "error", err)
        os.Exit(1)
    }
    defer cache.Close()

    // Create the Prometheus metrics registry
    reg := metrics.NewRegistry()

    // Register routes
    mux := handler.NewRouter(db, cache, reg, cfg)

    // HTTP server
    server := &http.Server{
        Addr:         ":" + cfg.Server.Port,
        Handler:      mux,
        ReadTimeout:  cfg.Server.ReadTimeout,
        WriteTimeout: cfg.Server.WriteTimeout,
        IdleTimeout:  cfg.Server.IdleTimeout,
    }

    // Start the server in the background
    go func() {
        slog.Info("Server listening", "addr", server.Addr)
        if err := server.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
            slog.Error("Server error", "error", err)
            os.Exit(1)
        }
    }()

    // Graceful shutdown: block until SIGTERM (sent by K8s when the Pod terminates) or SIGINT
    quit := make(chan os.Signal, 1)
    signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)
    sig := <-quit

    slog.Info("Shutting down server", "signal", sig.String())

    // Stop accepting connections and let in-flight requests finish, for at most ShutdownTimeout
    // (25s by default, which must stay below terminationGracePeriodSeconds: 30)
    ctx, cancel := context.WithTimeout(context.Background(), cfg.Server.ShutdownTimeout)
    defer cancel()

    if err := server.Shutdown(ctx); err != nil {
        slog.Error("Server shutdown error", "error", err)
    }

    slog.Info("Server stopped")
}

2.3 config.go

package config

import (
    "errors"
    "fmt"
    "os"
    "time"

    "gopkg.in/yaml.v3"
)

type Config struct {
    Server   ServerConfig   `yaml:"server"`
    Database DatabaseConfig `yaml:"database"`
    Redis    RedisConfig    `yaml:"redis"`
}

type ServerConfig struct {
    Port            string        `yaml:"port"`
    ReadTimeout     time.Duration `yaml:"readTimeout"`
    WriteTimeout    time.Duration `yaml:"writeTimeout"`
    IdleTimeout     time.Duration `yaml:"idleTimeout"`
    ShutdownTimeout time.Duration `yaml:"shutdownTimeout"`
}

type DatabaseConfig struct {
    DSN          string        `yaml:"dsn"`
    MaxOpenConns int           `yaml:"maxOpenConns"`
    MaxIdleConns int           `yaml:"maxIdleConns"`
    ConnMaxLife  time.Duration `yaml:"connMaxLife"`
}

type RedisConfig struct {
    Addr     string `yaml:"addr"`
    Password string `yaml:"password"`
    DB       int    `yaml:"db"`
}

func Load() (*Config, error) {
    // Defaults
    cfg := &Config{
        Server: ServerConfig{
            Port:            "8080",
            ReadTimeout:     30 * time.Second,
            WriteTimeout:    30 * time.Second,
            IdleTimeout:     120 * time.Second,
            ShutdownTimeout: 25 * time.Second,
        },
        Database: DatabaseConfig{
            MaxOpenConns: 100,
            MaxIdleConns: 10,
            ConnMaxLife:  time.Hour,
        },
        Redis: RedisConfig{
            Addr: "redis:6379",
            DB:   0,
        },
    }

    // Load the file mounted from the ConfigMap. A missing file is fine (defaults apply),
    // any other read error is reported. yaml.v3 parses values such as "30s" into time.Duration.
    configPath := getEnv("CONFIG_PATH", "/etc/app/config.yaml")
    data, err := os.ReadFile(configPath)
    switch {
    case err == nil:
        if err := yaml.Unmarshal(data, cfg); err != nil {
            return nil, fmt.Errorf("parse config: %w", err)
        }
    case !errors.Is(err, os.ErrNotExist):
        return nil, fmt.Errorf("read config %s: %w", configPath, err)
    }

    // Environment variables override the file (secrets are injected from the Secret)
    if dsn := os.Getenv("DATABASE_DSN"); dsn != "" {
        cfg.Database.DSN = dsn
    }
    if cfg.Database.DSN == "" {
        return nil, fmt.Errorf("DATABASE_DSN is required")
    }

    // REDIS_ADDR is what the CI test job sets
    if redisAddr := os.Getenv("REDIS_ADDR"); redisAddr != "" {
        cfg.Redis.Addr = redisAddr
    }
    if redisPass := os.Getenv("REDIS_PASSWORD"); redisPass != "" {
        cfg.Redis.Password = redisPass
    }
    if port := getEnv("SERVER_PORT", ""); port != "" {
        cfg.Server.Port = port
    }

    return cfg, nil
}

func getEnv(key, defaultVal string) string {
    if v := os.Getenv(key); v != "" {
        return v
    }
    return defaultVal
}

2.4 handler.go

package handler

import (
    "encoding/json"
    "log/slog"
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"

    "github.com/myorg/todo-api/internal/config"
)

// NewRouter registers all routes and wraps them in the middleware chain.
// TodoRepository, CacheRepository, MetricsRegistry, TodoHandler, Chain and the
// middlewares are defined elsewhere in this package (not shown).
func NewRouter(db TodoRepository, cache CacheRepository, reg MetricsRegistry, cfg *config.Config) http.Handler {
    mux := http.NewServeMux()
    h := &TodoHandler{db: db, cache: cache}

    // Liveness: only reports that the process can serve HTTP. It must not depend on MySQL,
    // or a database outage would make K8s restart every Pod
    mux.HandleFunc("GET /healthz", func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/json")
        json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
    })

    // Readiness: checks the database connection; failing it only removes the Pod from Service endpoints
    mux.HandleFunc("GET /ready", func(w http.ResponseWriter, r *http.Request) {
        if err := db.Ping(r.Context()); err != nil {
            slog.Error("Database not ready", "error", err)
            http.Error(w, "not ready", http.StatusServiceUnavailable)
            return
        }
        w.Header().Set("Content-Type", "application/json")
        json.NewEncoder(w).Encode(map[string]string{"status": "ready"})
    })

    // Prometheus metrics, served on the same port (8080) as the API
    mux.Handle("GET /metrics", promhttp.HandlerFor(reg.Registry(), promhttp.HandlerOpts{}))

    // Todo API (method + path patterns require Go 1.22+)
    mux.HandleFunc("GET /todos", h.ListTodos)
    mux.HandleFunc("POST /todos", h.CreateTodo)
    mux.HandleFunc("GET /todos/{id}", h.GetTodo)
    mux.HandleFunc("PUT /todos/{id}", h.UpdateTodo)
    mux.HandleFunc("DELETE /todos/{id}", h.DeleteTodo)

    // Middleware chain
    return Chain(mux,
        RequestIDMiddleware,
        LoggingMiddleware(reg),
        RecoveryMiddleware,
    )
}

2.5 metrics.go

package metrics

import (
    "strconv"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/collectors"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

type Registry struct {
    reg *prometheus.Registry

    httpRequestsTotal   *prometheus.CounterVec
    httpRequestDuration *prometheus.HistogramVec
    cacheHits           *prometheus.CounterVec
}

func NewRegistry() *Registry {
    // A dedicated registry (instead of the global default) plus Go runtime and process metrics
    reg := prometheus.NewRegistry()
    reg.MustRegister(collectors.NewGoCollector())
    reg.MustRegister(collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}))

    r := &Registry{reg: reg}

    r.httpRequestsTotal = promauto.With(reg).NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total number of HTTP requests",
        },
        []string{"method", "path", "status"},
    )

    r.httpRequestDuration = promauto.With(reg).NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "HTTP request duration in seconds",
            Buckets: []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5},
        },
        []string{"method", "path"},
    )

    r.cacheHits = promauto.With(reg).NewCounterVec(
        prometheus.CounterOpts{
            Name: "cache_operations_total",
            Help: "Total number of cache operations",
        },
        []string{"operation", "result"}, // result: hit/miss
    )

    return r
}

// Registry exposes the underlying registry; handler.go serves it via promhttp.HandlerFor
func (r *Registry) Registry() *prometheus.Registry {
    return r.reg
}

// ObserveHTTP records HTTP request metrics
func (r *Registry) ObserveHTTP(method, path string, status int, duration time.Duration) {
    r.httpRequestsTotal.WithLabelValues(method, path, strconv.Itoa(status)).Inc()
    r.httpRequestDuration.WithLabelValues(method, path).Observe(duration.Seconds())
}

// ObserveCache records one cache operation, e.g. ObserveCache("get", "hit")
func (r *Registry) ObserveCache(operation, result string) {
    r.cacheHits.WithLabelValues(operation, result).Inc()
}

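3. Multi-Stage Dockerfile Build

# ===== Stage 1: build =====
FROM golang:1.22-alpine AS builder

# Passed via --build-arg VERSION=...; without this ARG line, ${VERSION} below expands to an empty string
ARG VERSION=dev

# Install required tools
RUN apk add --no-cache git ca-certificates tzdata

WORKDIR /build

# Copy go.mod/go.sum first so the dependency-download layer stays cached until they change
COPY go.mod go.sum ./
RUN go mod download

# Copy the source code
COPY . .

# Build a static binary (CGO disabled); -X only sets variables that package main actually declares
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
    go build \
    -ldflags="-s -w -X main.Version=${VERSION} -X main.BuildTime=$(date -u +%Y%m%d%H%M%S)" \
    -trimpath \
    -o /build/todo-api \
    ./cmd/server

# Sanity-check the binary (|| true keeps the build going if --version is not implemented)
RUN /build/todo-api --version || true

# ===== Stage 2: runtime =====
FROM gcr.io/distroless/static-debian12:nonroot

# Copy the binary and required files from the build stage
COPY --from=builder /build/todo-api /todo-api
COPY --from=builder /usr/share/zoneinfo /usr/share/zoneinfo
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/

# Run without root privileges
USER nonroot:nonroot

# The API, health checks and /metrics are all served on 8080
EXPOSE 8080

ENTRYPOINT ["/todo-api"]

# Build the image
docker build \
  --build-arg VERSION=$(git describe --tags --always) \
  -t myregistry.io/todo-api:$(git rev-parse --short HEAD) \
  .

# Check the image size (it should be small)
docker images myregistry.io/todo-api

# Test locally with the tag built above (no :latest tag was created). Inside the container,
# localhost is the container itself, so reach MySQL on the host via host.docker.internal
# (the app also needs a reachable Redis; its default address is redis:6379)
docker run -p 8080:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e DATABASE_DSN="user:pass@tcp(host.docker.internal:3306)/tododb?parseTime=true" \
  myregistry.io/todo-api:$(git rev-parse --short HEAD)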
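The Dockerfile's -X flags only take effect if package main declares matching string variables; for a name that does not exist, the linker silently does nothing. Below is a minimal sketch of the receiving side. Version and BuildTime match the names in the Dockerfile, and versionString is an illustrative helper.

```go
package main

import (
	"fmt"
	"os"
)

// Overridden at build time, for example:
//
//	go build -ldflags "-X main.Version=v1.2.3 -X main.BuildTime=20240101120000" ./cmd/server
//
// -X only works on package-level string variables that are uninitialized or
// initialized to a constant string, like these.
var (
	Version   = "dev"
	BuildTime = "unknown"
)

func versionString() string {
	return fmt.Sprintf("todo-api %s (built %s)", Version, BuildTime)
}

func main() {
	if len(os.Args) > 1 && os.Args[1] == "--version" {
		fmt.Println(versionString())
		return
	}
	fmt.Println("starting", versionString())
}
```

Built with the Dockerfile above, the binary reports the git describe output and the build timestamp. Built with a plain go build, it falls back to "dev" and "unknown".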

4. K8s Resource Manifests

4.1 Namespace

# k8s/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: todo-app
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted

4.2 Secrets

# k8s/secrets.yaml (in production, use the External Secrets Operator instead)
apiVersion: v1
kind: Secret
metadata:
  name: todo-api-secret
  namespace: todo-app
type: Opaque
stringData:
  DATABASE_DSN: "todo_user:strongpassword@tcp(mysql-service:3306)/tododb?parseTime=true&loc=Local"
  REDIS_PASSWORD: "redis-strong-password"
  MYSQL_ROOT_PASSWORD: "root-strong-password"
  MYSQL_PASSWORD: "strongpassword"

4.3 ConfigMap

# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: todo-api-config
  namespace: todo-app
data:
  config.yaml: |
    server:
      port: "8080"
      readTimeout: 30s
      writeTimeout: 30s
      idleTimeout: 120s
      shutdownTimeout: 25s
    database:
      maxOpenConns: 50
      maxIdleConns: 10
      connMaxLife: 1h
    redis:
      addr: redis-service:6379
      db: 0    

4.4 MySQL StatefulSet

# k8s/mysql.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: todo-app
spec:
  serviceName: mysql-headless
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
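      # The todo-app namespace enforces the "restricted" Pod Security Standard,
      # which rejects Pods that do not set these fields
      securityContext:
        runAsNonRoot: true
        runAsUser: 999
        fsGroup: 999
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: mysql
        image: mysql:8.0
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: [ALL]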
        args:
        - --default-authentication-plugin=mysql_native_password
        - --character-set-server=utf8mb4
        - --collation-server=utf8mb4_unicode_ci
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: todo-api-secret
              key: MYSQL_ROOT_PASSWORD
        - name: MYSQL_DATABASE
          value: tododb
        - name: MYSQL_USER
          value: todo_user
        - name: MYSQL_PASSWORD
          valueFrom:
            secretKeyRef:
              name: todo-api-secret
              key: MYSQL_PASSWORD
        ports:
        - containerPort: 3306
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "2"
            memory: "4Gi"
        livenessProbe:
          exec:
            command: ["mysqladmin", "ping", "-h", "localhost"]
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            # Probe commands get no $(VAR) expansion, so let the shell expand the password
            command: ["sh", "-c", "mysql -u root -p\"$MYSQL_ROOT_PASSWORD\" -e 'SELECT 1'"]
          initialDelaySeconds: 10
          periodSeconds: 5
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql

  volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi
---
apiVersion: v1
kind: Service
metadata:
  name: mysql-service
  namespace: todo-app
spec:
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306
---
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
  namespace: todo-app
spec:
  clusterIP: None
  selector:
    app: mysql
  ports:
  - port: 3306

4.5 Redis Deployment

# k8s/redis.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: todo-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
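      # Required by the namespace's "restricted" Pod Security Standard
      securityContext:
        runAsNonRoot: true
        runAsUser: 999
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: redis
        image: redis:7.2-alpine
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: [ALL]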
        command: ["redis-server"]
        args:
        - "--requirepass"
        - "$(REDIS_PASSWORD)"
        - "--maxmemory"
        - "512mb"
        - "--maxmemory-policy"
        - "allkeys-lru"
        env:
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              name: todo-api-secret
              key: REDIS_PASSWORD
        ports:
        - containerPort: 6379
        resources:
          requests:
            cpu: "100m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "768Mi"
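        livenessProbe:
          exec:
            # Authenticate; with requirepass set, an unauthenticated PING gets NOAUTH instead of PONG
            command: ["sh", "-c", "redis-cli -a \"$REDIS_PASSWORD\" --no-auth-warning ping"]
          periodSeconds: 10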
---
apiVersion: v1
kind: Service
metadata:
  name: redis-service
  namespace: todo-app
spec:
  selector:
    app: redis
  ports:
  - port: 6379
    targetPort: 6379

4.6 Todo API Deployment

# k8s/todo-api.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: todo-api
  namespace: todo-app
  annotations:
    configmap.reloader.stakater.com/reload: "todo-api-config"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: todo-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  minReadySeconds: 10
  revisionHistoryLimit: 5

  template:
    metadata:
      labels:
        app: todo-api
        version: "1.0.0"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"    # /metrics is served on the app's HTTP port
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: todo-api-sa
      terminationGracePeriodSeconds: 30

      # Anti-affinity: prefer (not require) spreading todo-api Pods across different Nodes
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: todo-api
              topologyKey: kubernetes.io/hostname

      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
        runAsGroup: 65534
        fsGroup: 65534
        seccompProfile:
          type: RuntimeDefault

      initContainers:
      # Wait until MySQL accepts TCP connections
      - name: wait-for-mysql
        image: busybox:1.36
        command:
        - sh
        - -c
        - |
          until nc -z mysql-service 3306; do
            echo "Waiting for MySQL..."
            sleep 2
          done
          echo "MySQL is ready!"          
        securityContext:
          runAsNonRoot: true
          runAsUser: 65534
          allowPrivilegeEscalation: false
          capabilities:
            drop: [ALL]

      containers:
      - name: todo-api
        image: myregistry.io/todo-api:v1.0.0
        imagePullPolicy: Always
        ports:
        - name: http
          containerPort: 8080    # serves the API, health checks and /metrics

        envFrom:
        - secretRef:
            name: todo-api-secret

        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: APP_VERSION
          value: "1.0.0"
        - name: CONFIG_PATH
          value: /etc/app/config.yaml

        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"

        livenessProbe:
          httpGet:
            path: /healthz
            port: http
          initialDelaySeconds: 10
          periodSeconds: 10
          failureThreshold: 3

        readinessProbe:
          httpGet:
            path: /ready
            port: http
          initialDelaySeconds: 5
          periodSeconds: 5
          failureThreshold: 3

        startupProbe:
          httpGet:
            path: /healthz
            port: http
          failureThreshold: 30
          periodSeconds: 10

        securityContext:
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          capabilities:
            drop: [ALL]

        volumeMounts:
        - name: app-config
          mountPath: /etc/app
          readOnly: true
        - name: tmp
          mountPath: /tmp

      volumes:
      - name: app-config
        configMap:
          name: todo-api-config
      - name: tmp
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: todo-api-service
  namespace: todo-app
spec:
  selector:
    app: todo-api
  ports:
  - name: http
    port: 80
    targetPort: 8080
  - name: metrics
    port: 9090
    targetPort: 8080    # the app exposes /metrics on its HTTP port
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: todo-api-hpa
  namespace: todo-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: todo-api
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: todo-api-ingress
  namespace: todo-app
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/limit-rps: "100"
spec:
  ingressClassName: nginx    # replaces the deprecated kubernetes.io/ingress.class annotation
  tls:
  - hosts:
    - api.todo.example.com
    secretName: todo-api-tls
  rules:
  - host: api.todo.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: todo-api-service
            port:
              number: 80
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: todo-api-sa
  namespace: todo-app
automountServiceAccountToken: false

5. Packaging with a Helm Chart

# Chart layout
charts/todo-api/
├── Chart.yaml
├── values.yaml
├── values-staging.yaml
├── values-production.yaml
└── templates/
    ├── _helpers.tpl
    ├── namespace.yaml
    ├── serviceaccount.yaml
    ├── configmap.yaml
    ├── secret.yaml       # test environments only; production uses External Secrets
    ├── deployment.yaml
    ├── service.yaml
    ├── ingress.yaml
    ├── hpa.yaml
    ├── networkpolicy.yaml
    ├── rbac.yaml
    └── NOTES.txt

# Install to staging
helm upgrade --install todo-api ./charts/todo-api \
  -f charts/todo-api/values.yaml \
  -f charts/todo-api/values-staging.yaml \
  --namespace todo-staging \
  --create-namespace \
  --set image.tag=$IMAGE_TAG \
  --wait \
  --timeout 5m

# Install to production
helm upgrade --install todo-api ./charts/todo-api \
  -f charts/todo-api/values.yaml \
  -f charts/todo-api/values-production.yaml \
  --namespace todo-production \
  --set image.tag=$IMAGE_TAG \
  --wait

6. CI/CD Pipeline

# .github/workflows/deploy.yaml
name: Build and Deploy

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}/todo-api

jobs:
  test:
    runs-on: ubuntu-latest
    services:
      mysql:
        image: mysql:8.0
        env:
          MYSQL_ROOT_PASSWORD: testpassword
          MYSQL_DATABASE: tododb_test
        ports: ["3306:3306"]
        options: --health-cmd="mysqladmin ping" --health-interval=10s

      redis:
        image: redis:7.2-alpine
        ports: ["6379:6379"]

    steps:
    - uses: actions/checkout@v4

    - uses: actions/setup-go@v5
      with:
        go-version: "1.22"
        cache: true

    - name: Run tests
      run: |
        go test ./... -v -race -coverprofile=coverage.out
        go tool cover -func=coverage.out        
      env:
        DATABASE_DSN: "root:testpassword@tcp(localhost:3306)/tododb_test?parseTime=true"
        REDIS_ADDR: "localhost:6379"

    - name: Run linter
      uses: golangci/golangci-lint-action@v4

  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.event_name == 'push'
    permissions:
      contents: read
      packages: write

    outputs:
      image-tag: ${{ steps.meta.outputs.tags }}
      image-digest: ${{ steps.build.outputs.digest }}

    steps:
    - uses: actions/checkout@v4

    - name: Log in to registry
      uses: docker/login-action@v3
      with:
        registry: ${{ env.REGISTRY }}
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}

    - name: Extract metadata
      id: meta
      uses: docker/metadata-action@v5
      with:
        images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
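        # format=long tags the full commit SHA, matching the sha-<full SHA> tag used by the scan and deploy steps
        tags: |
          type=sha,prefix=sha-,format=long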
          type=ref,event=branch
          type=semver,pattern={{version}}          

    - name: Build and push
      id: build
      uses: docker/build-push-action@v5
      with:
        context: .
        push: true
        tags: ${{ steps.meta.outputs.tags }}
        labels: ${{ steps.meta.outputs.labels }}
        build-args: VERSION=${{ github.sha }}
        cache-from: type=gha
        cache-to: type=gha,mode=max

    - name: Scan for vulnerabilities
      uses: aquasecurity/trivy-action@master
      with:
        image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:sha-${{ github.sha }}
        format: table    # with sarif, the action ignores the severity filter by default
        exit-code: "1"
        severity: CRITICAL,HIGH

  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    environment: staging

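    steps:
    - uses: actions/checkout@v4

    # KUBECONFIG must point to a file, so write the secret's contents to the default
    # kubeconfig path; helm and kubectl in later steps pick it up automatically
    - name: Configure kubeconfig
      run: |
        mkdir -p "$HOME/.kube"
        printf '%s\n' "$KUBECONFIG_DATA" > "$HOME/.kube/config"
        chmod 600 "$HOME/.kube/config"
      env:
        KUBECONFIG_DATA: ${{ secrets.STAGING_KUBECONFIG }}

    - name: Deploy to staging
      run: |
        helm upgrade --install todo-api ./charts/todo-api \
          -f charts/todo-api/values.yaml \
          -f charts/todo-api/values-staging.yaml \
          --namespace todo-staging \
          --create-namespace \
          --set image.tag=sha-${{ github.sha }} \
          --wait --timeout 5m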

    - name: Run smoke tests
      run: |
        kubectl wait --for=condition=ready pod -l app=todo-api \
          -n todo-staging --timeout=120s
        curl -f https://api.staging.todo.example.com/healthz        

  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production    # requires manual approval (configure required reviewers on the environment)

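    steps:
    - uses: actions/checkout@v4

    # Write the kubeconfig to a file, as in the staging job
    - name: Configure kubeconfig
      run: |
        mkdir -p "$HOME/.kube"
        printf '%s\n' "$KUBECONFIG_DATA" > "$HOME/.kube/config"
        chmod 600 "$HOME/.kube/config"
      env:
        KUBECONFIG_DATA: ${{ secrets.PRODUCTION_KUBECONFIG }}

    - name: Deploy to production
      run: |
        helm upgrade --install todo-api ./charts/todo-api \
          -f charts/todo-api/values.yaml \
          -f charts/todo-api/values-production.yaml \
          --namespace todo-production \
          --set image.tag=sha-${{ github.sha }} \
          --wait --timeout 10m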

    - name: Verify deployment
      run: |
        kubectl rollout status deployment/todo-api -n todo-production
        curl -f https://api.todo.example.com/healthz        

7. Observability: Monitoring and Logging

7.1 Prometheus Monitoring Configuration

# k8s/servicemonitor.yaml (requires the Prometheus Operator)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: todo-api
  namespace: todo-app
  labels:
    release: kube-prometheus-stack    # must match the serviceMonitorSelector of the Prometheus resource
spec:
  selector:
    matchLabels:
      app: todo-api
  endpoints:
  - port: metrics
    path: /metrics
    interval: 15s

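7.2 Grafana Dashboard

Key metrics (sum() aggregates across Pods and labels; without it, the error-rate division would pair each 5xx series with itself and always yield 1):

  • Request rate (QPS): sum(rate(http_requests_total[5m]))
  • Error rate: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
  • P99 latency: histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
  • Pod count: kube_deployment_status_replicas{deployment="todo-api"} (from kube-state-metrics)
  • CPU usage: sum by (pod) (rate(container_cpu_usage_seconds_total{pod=~"todo-api-.*", container!=""}[5m]))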

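7.3 Log Collection (Loki + Fluent Bit)

# DaemonSet: Fluent Bit (introduced in part 5 of this series)
# Log format convention: the Go app writes one JSON object per line
// Structured fields can be queried directly in Loki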
slog.Info("Request completed",
    "method", r.Method,
    "path", r.URL.Path,
    "status", statusCode,
    "duration_ms", duration.Milliseconds(),
    "pod", os.Getenv("POD_NAME"),
    "trace_id", traceID,
)

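Loki query examples:

# All 5xx errors
{namespace="todo-app", app="todo-api"} | json | status >= 500

# Slow requests (>1s)
{namespace="todo-app", app="todo-api"} | json | duration_ms > 1000

# Per-second rate of 4xx/5xx log lines on one path (divide by the rate of all its lines to get an error ratio)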
rate({namespace="todo-app"} | json | path="/todos" | status >= 400 [5m])

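8. Production Operations Tips

8.1 Common Troubleshooting Commands

# ===== Quick diagnosis =====

# Are the Pods running, and on which Nodes?
kubectl get pods -n todo-app -o wide

# Pod stuck in Pending?
kubectl describe pod <pod-name> -n todo-app
# Check the Events section; common causes:
# - Insufficient cpu/memory → not enough capacity: add Nodes or lower the requests
# - No nodes available → Node taints without a matching toleration
# - PVC not bound → no PV is available to bind the PVC

# Pod in CrashLoopBackOff?
kubectl logs <pod-name> -n todo-app --previous
kubectl describe pod <pod-name> -n todo-app

# Debug inside the Pod: the distroless image has no shell, so "kubectl exec ... -- sh" fails.
# Attach an ephemeral debug container that shares the app container's process namespace;
# --profile=restricted (recent kubectl versions) adds the securityContext the namespace requires
kubectl debug -it <pod-name> -n todo-app --image=busybox:1.36 --target=todo-api --profile=restricted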

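# ===== Network troubleshooting =====

# Test Service connectivity from inside the cluster. todo-app enforces the restricted
# Pod Security Standard, which rejects this ad-hoc Pod, so run it in another namespace
# and address the Service as <service>.<namespace>
kubectl run nettest --image=nicolaka/netshoot -it --rm --restart=Never \
  -- curl http://todo-api-service.todo-app/healthz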

# Check the Endpoints
kubectl get endpoints todo-api-service -n todo-app
# Empty Endpoints means no ready Pod matches the Service selector

# DNS resolution
kubectl run nettest --image=busybox -it --rm --restart=Never -- nslookup todo-api-service.todo-app

# ===== Performance troubleshooting (kubectl top requires metrics-server) =====
kubectl top pods -n todo-app
kubectl top nodes
kubectl describe hpa todo-api-hpa -n todo-app

# ===== Logs =====
# Stream logs from all todo-api Pods
kubectl logs -f -l app=todo-api -n todo-app --max-log-requests=5

# Logs from a given time window
kubectl logs <pod-name> -n todo-app --since=1h
kubectl logs <pod-name> -n todo-app --since-time="2026-03-16T10:00:00Z"

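8.2 Maintenance Operations

# Take a Pod out of rotation (for debugging)
# Relabeling removes the Pod from the Service's endpoints; the ReplicaSet also stops counting it
# and starts a replacement, so delete the debug Pod when you are done
kubectl label pod <pod-name> app=todo-api-debug --overwrite -n todo-app

# Emergency rollback (for Helm-managed releases, helm rollback keeps the release history consistent)
kubectl rollout undo deployment/todo-api -n todo-app
kubectl rollout status deployment/todo-api -n todo-app

# Emergency scale-out (while the HPA is active, it may scale back down after its stabilization window)
kubectl scale deployment todo-api --replicas=10 -n todo-app

# Pin the replica count through the HPA (when you need manual control)
kubectl patch hpa todo-api-hpa -n todo-app -p '{"spec":{"minReplicas":5,"maxReplicas":5}}'

# Force-delete a stuck Pod
kubectl delete pod <pod-name> -n todo-app --force --grace-period=0

# Rank Pods by resource usage
kubectl top pods -n todo-app --sort-by=cpu
kubectl top pods -n todo-app --sort-by=memory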
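When reading kubectl describe hpa or choosing a replica count by hand, it helps to know the HPA's core rule from the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). No change is made when the ratio is within a tolerance (0.1 by default). Each metric is evaluated separately, the largest proposal wins, and the result is clamped to minReplicas..maxReplicas. The sketch below is simplified and ignores stabilization windows, scaling policies and unready Pods.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the HPA scaling rule for a single metric.
// utilization and target are percentages (e.g. CPU 140 vs target 70).
func desiredReplicas(current int, utilization, target, tolerance float64) int {
	ratio := utilization / target
	if math.Abs(ratio-1.0) <= tolerance {
		return current // close enough to the target: no change
	}
	return int(math.Ceil(float64(current) * ratio))
}

// hpaDecision evaluates every metric, keeps the largest proposal and clamps it.
func hpaDecision(current, minR, maxR int, utilizations, targets []float64) int {
	want := 0
	for i := range utilizations {
		if d := desiredReplicas(current, utilizations[i], targets[i], 0.1); d > want {
			want = d
		}
	}
	return max(minR, min(maxR, want))
}

func main() {
	// 3 Pods at 140% CPU (target 70%) and 60% memory (target 80%).
	fmt.Println(hpaDecision(3, 2, 20, []float64{140, 60}, []float64{70, 80}))
}
```

In this example, CPU proposes ceil(3 × 2) = 6 and memory proposes ceil(3 × 0.75) = 3, so the HPA would pick 6.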

8.3 PodDisruptionBudget (availability during node upgrades and maintenance)

# k8s/pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: todo-api-pdb
  namespace: todo-app
spec:
  selector:
    matchLabels:
      app: todo-api
  minAvailable: 2       # keep at least 2 Pods available
  # or:
  # maxUnavailable: 1   # allow at most 1 Pod to be unavailable

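With this PDB in place, kubectl drain node-1 evicts Pods through the Eviction API. That API rejects any eviction that would leave fewer than 2 todo-api Pods available. The drain keeps retrying until replacements on other Nodes become ready, so the service stays up. PDBs only govern voluntary evictions such as drains; a Deployment's own rolling update is controlled by maxSurge/maxUnavailable instead. Also, if the HPA has scaled the Deployment down to its minReplicas of 2, minAvailable: 2 permits no evictions at all and the drain stalls. maxUnavailable: 1 avoids that.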
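The eviction check reduces to one number, the PDB's disruptionsAllowed: the currently healthy Pods minus the desired healthy count, never below zero. Below is a simplified sketch for integer settings. It ignores percentage values and Pods not yet counted as healthy, and disruptionsAllowed here is an illustrative function, not the controller's code.

```go
package main

import "fmt"

// disruptionsAllowed mirrors the PDB arithmetic for integer settings:
// with minAvailable, desiredHealthy = minAvailable; with maxUnavailable,
// desiredHealthy = expectedPods - maxUnavailable.
func disruptionsAllowed(healthy, expectedPods int, minAvailable, maxUnavailable *int) int {
	desired := 0
	switch {
	case minAvailable != nil:
		desired = *minAvailable
	case maxUnavailable != nil:
		desired = expectedPods - *maxUnavailable
	}
	return max(0, healthy-desired)
}

func main() {
	two, one := 2, 1
	for _, replicas := range []int{3, 2} {
		fmt.Printf("replicas=%d minAvailable=2 -> %d, maxUnavailable=1 -> %d\n",
			replicas,
			disruptionsAllowed(replicas, replicas, &two, nil),
			disruptionsAllowed(replicas, replicas, nil, &one))
	}
}
```

With 3 replicas, both settings allow one eviction at a time. With 2 replicas, minAvailable: 2 allows none, which would stall a kubectl drain indefinitely.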


9. Complete Project Structure

todo-api/
├── cmd/
│   └── server/
│       └── main.go
├── internal/
│   ├── config/config.go
│   ├── handler/
│   │   ├── handler.go
│   │   ├── middleware.go
│   │   └── todo.go
│   ├── model/todo.go
│   ├── repository/
│   │   ├── interface.go
│   │   ├── mysql.go
│   │   └── redis.go
│   └── metrics/metrics.go
├── k8s/                          # K8s resource manifests
│   ├── namespace.yaml
│   ├── configmap.yaml
│   ├── secrets.yaml
│   ├── mysql.yaml
│   ├── redis.yaml
│   ├── todo-api.yaml
│   ├── rbac.yaml
│   ├── networkpolicy.yaml
│   ├── pdb.yaml
│   └── servicemonitor.yaml
├── charts/                       # Helm Charts
│   └── todo-api/
│       ├── Chart.yaml
│       ├── values.yaml
│       ├── values-staging.yaml
│       ├── values-production.yaml
│       └── templates/
├── .github/
│   └── workflows/
│       └── deploy.yaml           # CI/CD
├── Dockerfile
├── go.mod
├── go.sum
├── Makefile
└── README.md

Makefile:

VERSION ?= $(shell git describe --tags --always)
IMAGE    = myregistry.io/todo-api
TAG      = $(shell git rev-parse --short HEAD)

.PHONY: build push test lint deploy-staging deploy-prod rollback logs status

build:
	docker build --build-arg VERSION=$(VERSION) -t $(IMAGE):$(TAG) .

push: build
	docker push $(IMAGE):$(TAG)

test:
	go test ./... -v -race

lint:
	golangci-lint run

deploy-staging: push
	helm upgrade --install todo-api ./charts/todo-api \
		-f charts/todo-api/values.yaml \
		-f charts/todo-api/values-staging.yaml \
		--namespace todo-staging --create-namespace \
		--set image.tag=$(TAG) --wait

deploy-prod: push
	helm upgrade --install todo-api ./charts/todo-api \
		-f charts/todo-api/values.yaml \
		-f charts/todo-api/values-production.yaml \
		--namespace todo-production \
		--set image.tag=$(TAG) --wait

rollback:
	helm rollback todo-api -n todo-production

logs:
	kubectl logs -f -l app=todo-api -n todo-production --max-log-requests=5

status:
	kubectl get pods,svc,ingress,hpa -n todo-production

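10. Summary and Learning Path

10.1 Series Recap

Across these 10 articles, we covered:

Article            Core topics
01-introduction    What K8s is, why use it, setting up a local environment
02-architecture    Control Plane components, Node components, how it all works
03-core-concepts   Pod, Namespace, Label/Selector, resource limits
04-workloads       Deployment/StatefulSet/DaemonSet/Job/CronJob/HPA
05-networking      The four Service types, Ingress, DNS, NetworkPolicy
06-storage         Volume/PV/PVC/StorageClass dynamic provisioning
07-config-secret   ConfigMap, Secret, configuration hot reload
08-rbac-security   RBAC, ServiceAccount, security contexts
09-helm            Chart development, template syntax, multi-environment management
10-go-app-deploy   End-to-end Go application deployment, hands-on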

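10.2 Learning Path

Level 1: Advanced Operations

  • Kustomize (lightweight multi-environment configuration management)
  • ArgoCD / Flux (GitOps continuous delivery)
  • Prometheus + Grafana (a complete monitoring stack)
  • EFK/PLG Stack (logging stack)
  • Velero (cluster backup and restore)

Level 2: Platform Engineering

  • Istio / Linkerd (service mesh)
  • OPA / Kyverno (policy engines)
  • Cluster API (managing K8s clusters with K8s)
  • Multi-cluster management (Fleet, OCM)
  • FinOps (K8s cost optimization)

Level 3: Kubernetes Development

  • client-go (the Kubernetes Go client)
  • controller-runtime (controller framework)
  • kubebuilder (Operator development framework)
  • Admission Webhooks (admission controllers)
  • CRDs (custom resources)
  • The Kubernetes Operator pattern

Level 4: Contributing to Kubernetes

  • Read the Kubernetes source code (k8s.io/kubernetes)
  • Join a SIG (Special Interest Group)
  • Contribute to CNCF open-source projects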

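10.3 Recommended Resources

Official documentation:

Books:

  • Kubernetes in Action (widely regarded as one of the best K8s books)
  • Cloud Native Go (Go + K8s development)
  • Programming Kubernetes (developing K8s Operators)

Practice platforms:

  • killercoda.com (free online K8s lab environments)
  • play.katacoda.com (note: Katacoda's public platform was shut down in 2022)
  • killer.sh (CKA/CKAD exam simulator)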

Certifications:

  • CKA(Certified Kubernetes Administrator)
  • CKAD(Certified Kubernetes Application Developer)
  • CKS(Certified Kubernetes Security Specialist)

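That wraps up the series. Kubernetes has a huge ecosystem, but once you have the core concepts down, most of what follows builds on those foundations.

For Go developers, Kubernetes is both the platform your applications run on and an open-source ecosystem you can take part in deeply: Kubernetes itself and much of its toolchain are written in Go.

I hope this series has been helpful!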