Kubernetes-10 Go 应用完整实战:从代码到生产 K8s 部署
系列第十篇(终篇)。本篇将前九篇的所有知识融合为一个完整的实战项目:从零开始构建一个 Go 微服务,并将其部署到 K8s 生产环境。
1. 项目概述
1.1 项目:Go Todo API 服务
一个简单的 RESTful API 服务,包含:
- HTTP API(Todo CRUD)
- MySQL 数据库
- Redis 缓存
- Prometheus 监控指标
- 结构化日志
- 优雅关闭
1.2 技术栈
语言: Go 1.22+
框架: net/http(标准库)
数据库: MySQL 8.0(GORM)
缓存: Redis 7.x
监控: Prometheus metrics
日志: slog(Go 1.21+ 结构化日志)
容器: Docker(多阶段构建)
编排: Kubernetes 1.28+
包管理: Helm 3.x
CI/CD: GitHub Actions
1.3 架构图
Internet
↓ HTTPS
Ingress(nginx-ingress)
↓
todo-api Service(ClusterIP: 80)
↓
todo-api Pod × 3(HPA 管理)
↓ 读写
MySQL StatefulSet | Redis Deployment
(持久化存储) | (内存缓存)
2. Go 应用代码
2.1 项目结构
todo-api/
├── cmd/
│ └── server/
│ └── main.go # 入口
├── internal/
│ ├── config/
│ │ └── config.go # 配置加载
│ ├── handler/
│ │ ├── handler.go # HTTP 处理器
│ │ └── middleware.go # 中间件
│ ├── model/
│ │ └── todo.go # 数据模型
│ ├── repository/
│ │ ├── mysql.go # MySQL 实现
│ │ └── redis.go # Redis 缓存
│ └── metrics/
│ └── metrics.go # Prometheus 指标
├── Dockerfile
├── go.mod
└── go.sum
2.2 main.go
package main
import (
"context"
"log/slog"
"net/http"
"os"
"os/signal"
"syscall"
"time"
"github.com/myorg/todo-api/internal/config"
"github.com/myorg/todo-api/internal/handler"
"github.com/myorg/todo-api/internal/metrics"
"github.com/myorg/todo-api/internal/repository"
)
func main() {
// 初始化结构化日志
logLevel := slog.LevelInfo
if os.Getenv("LOG_LEVEL") == "debug" {
logLevel = slog.LevelDebug
}
logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
Level: logLevel,
}))
slog.SetDefault(logger)
// 加载配置
cfg, err := config.Load()
if err != nil {
slog.Error("Failed to load config", "error", err)
os.Exit(1)
}
slog.Info("Starting todo-api",
"version", os.Getenv("APP_VERSION"),
"port", cfg.Server.Port,
)
// 初始化数据库
db, err := repository.NewMySQL(cfg.Database)
if err != nil {
slog.Error("Failed to connect to MySQL", "error", err)
os.Exit(1)
}
defer db.Close()
// 初始化 Redis
cache, err := repository.NewRedis(cfg.Redis)
if err != nil {
slog.Error("Failed to connect to Redis", "error", err)
os.Exit(1)
}
defer cache.Close()
// 初始化 Prometheus 指标
reg := metrics.NewRegistry()
// 设置路由
mux := handler.NewRouter(db, cache, reg, cfg)
// HTTP 服务器
server := &http.Server{
Addr: ":" + cfg.Server.Port,
Handler: mux,
ReadTimeout: cfg.Server.ReadTimeout,
WriteTimeout: cfg.Server.WriteTimeout,
IdleTimeout: cfg.Server.IdleTimeout,
}
// 启动服务器
go func() {
slog.Info("Server listening", "addr", server.Addr)
if err := server.ListenAndServe(); err != http.ErrServerClosed {
slog.Error("Server error", "error", err)
os.Exit(1)
}
}()
// 优雅关闭
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)
sig := <-quit
slog.Info("Shutting down server", "signal", sig.String())
ctx, cancel := context.WithTimeout(context.Background(), cfg.Server.ShutdownTimeout)
defer cancel()
if err := server.Shutdown(ctx); err != nil {
slog.Error("Server shutdown error", "error", err)
}
slog.Info("Server stopped")
}
2.3 config.go
package config
import (
"fmt"
"os"
"time"
"gopkg.in/yaml.v3"
)
type Config struct {
Server ServerConfig `yaml:"server"`
Database DatabaseConfig `yaml:"database"`
Redis RedisConfig `yaml:"redis"`
}
type ServerConfig struct {
Port string `yaml:"port"`
ReadTimeout time.Duration `yaml:"readTimeout"`
WriteTimeout time.Duration `yaml:"writeTimeout"`
IdleTimeout time.Duration `yaml:"idleTimeout"`
ShutdownTimeout time.Duration `yaml:"shutdownTimeout"`
}
type DatabaseConfig struct {
DSN string `yaml:"dsn"`
MaxOpenConns int `yaml:"maxOpenConns"`
MaxIdleConns int `yaml:"maxIdleConns"`
ConnMaxLife time.Duration `yaml:"connMaxLife"`
}
type RedisConfig struct {
Addr string `yaml:"addr"`
Password string `yaml:"password"`
DB int `yaml:"db"`
}
func Load() (*Config, error) {
// 默认配置
cfg := &Config{
Server: ServerConfig{
Port: "8080",
ReadTimeout: 30 * time.Second,
WriteTimeout: 30 * time.Second,
IdleTimeout: 120 * time.Second,
ShutdownTimeout: 25 * time.Second,
},
Database: DatabaseConfig{
MaxOpenConns: 100,
MaxIdleConns: 10,
ConnMaxLife: time.Hour,
},
Redis: RedisConfig{
Addr: "redis:6379",
DB: 0,
},
}
// 从文件加载(ConfigMap 挂载)
configPath := getEnv("CONFIG_PATH", "/etc/app/config.yaml")
if data, err := os.ReadFile(configPath); err == nil {
if err := yaml.Unmarshal(data, cfg); err != nil {
return nil, fmt.Errorf("parse config: %w", err)
}
}
// 环境变量覆盖(Secret 注入)
if dsn := os.Getenv("DATABASE_DSN"); dsn != "" {
cfg.Database.DSN = dsn
}
if cfg.Database.DSN == "" {
return nil, fmt.Errorf("DATABASE_DSN is required")
}
if redisPass := os.Getenv("REDIS_PASSWORD"); redisPass != "" {
cfg.Redis.Password = redisPass
}
if port := getEnv("SERVER_PORT", ""); port != "" {
cfg.Server.Port = port
}
return cfg, nil
}
func getEnv(key, defaultVal string) string {
if v := os.Getenv(key); v != "" {
return v
}
return defaultVal
}
2.4 handler.go
package handler
import (
"encoding/json"
"log/slog"
"net/http"
"github.com/prometheus/client_golang/prometheus/promhttp"
)
func NewRouter(db TodoRepository, cache CacheRepository, reg MetricsRegistry, cfg *Config) http.Handler {
mux := http.NewServeMux()
h := &TodoHandler{db: db, cache: cache}
// 健康检查
mux.HandleFunc("GET /healthz", func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
})
// 就绪检查(检查数据库连接)
mux.HandleFunc("GET /ready", func(w http.ResponseWriter, r *http.Request) {
if err := db.Ping(r.Context()); err != nil {
slog.Error("Database not ready", "error", err)
http.Error(w, "not ready", http.StatusServiceUnavailable)
return
}
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(map[string]string{"status": "ready"})
})
// Prometheus 指标
mux.Handle("GET /metrics", promhttp.HandlerFor(reg.Registry(), promhttp.HandlerOpts{}))
// Todo API
mux.HandleFunc("GET /todos", h.ListTodos)
mux.HandleFunc("POST /todos", h.CreateTodo)
mux.HandleFunc("GET /todos/{id}", h.GetTodo)
mux.HandleFunc("PUT /todos/{id}", h.UpdateTodo)
mux.HandleFunc("DELETE /todos/{id}", h.DeleteTodo)
// 中间件链
return Chain(mux,
RequestIDMiddleware,
LoggingMiddleware(reg),
RecoveryMiddleware,
)
}
2.5 metrics.go
package metrics
import (
"strconv"
"time"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
)
type Registry struct {
reg *prometheus.Registry
httpRequestsTotal *prometheus.CounterVec
httpRequestDuration *prometheus.HistogramVec
dbQueryDuration *prometheus.HistogramVec
cacheHits *prometheus.CounterVec
activeConnections prometheus.Gauge
}
func NewRegistry() *Registry {
reg := prometheus.NewRegistry()
reg.MustRegister(prometheus.NewGoCollector())
reg.MustRegister(prometheus.NewProcessCollector(prometheus.ProcessCollectorOpts{}))
r := &Registry{reg: reg}
r.httpRequestsTotal = promauto.With(reg).NewCounterVec(
prometheus.CounterOpts{
Name: "http_requests_total",
Help: "Total number of HTTP requests",
},
[]string{"method", "path", "status"},
)
r.httpRequestDuration = promauto.With(reg).NewHistogramVec(
prometheus.HistogramOpts{
Name: "http_request_duration_seconds",
Help: "HTTP request duration in seconds",
Buckets: []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5},
},
[]string{"method", "path"},
)
r.cacheHits = promauto.With(reg).NewCounterVec(
prometheus.CounterOpts{
Name: "cache_operations_total",
Help: "Total number of cache operations",
},
[]string{"operation", "result"}, // hit/miss
)
return r
}
// ObserveHTTP 记录 HTTP 请求指标
func (r *Registry) ObserveHTTP(method, path string, status int, duration time.Duration) {
r.httpRequestsTotal.WithLabelValues(method, path, strconv.Itoa(status)).Inc()
r.httpRequestDuration.WithLabelValues(method, path).Observe(duration.Seconds())
}
3. Dockerfile 多阶段构建
# ===== 阶段1:构建 =====
FROM golang:1.22-alpine AS builder
# 安装必要工具
RUN apk add --no-cache git ca-certificates tzdata
WORKDIR /build
# 优先复制 go.mod,利用 Docker 层缓存
COPY go.mod go.sum ./
RUN go mod download
# 复制源码
COPY . .
# 构建(禁用 CGO,静态二进制)
# 声明 ARG,接收 docker build --build-arg VERSION=... 传入的值
ARG VERSION=dev
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
go build \
-ldflags="-s -w -X main.Version=${VERSION} -X main.BuildTime=$(date -u +%Y%m%d%H%M%S)" \
-trimpath \
-o /build/todo-api \
./cmd/server
# 验证二进制
RUN /build/todo-api --version || true
# ===== 阶段2:运行 =====
FROM gcr.io/distroless/static-debian12:nonroot
# 从构建阶段复制二进制和必要文件
COPY --from=builder /build/todo-api /todo-api
COPY --from=builder /usr/share/zoneinfo /usr/share/zoneinfo
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
# 不需要 root
USER nonroot:nonroot
EXPOSE 8080
ENTRYPOINT ["/todo-api"]
# 构建镜像
docker build \
--build-arg VERSION=$(git describe --tags --always) \
-t myregistry.io/todo-api:$(git rev-parse --short HEAD) \
.
# 验证镜像大小(应该很小)
docker images myregistry.io/todo-api
# 本地测试
docker run -p 8080:8080 \
-e DATABASE_DSN="user:pass@tcp(localhost:3306)/tododb" \
myregistry.io/todo-api:latest
4. K8s 资源清单
4.1 命名空间
# k8s/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
name: todo-app
labels:
pod-security.kubernetes.io/enforce: restricted
pod-security.kubernetes.io/warn: restricted
4.2 密钥配置
# k8s/secrets.yaml(实际生产用 External Secrets Operator)
apiVersion: v1
kind: Secret
metadata:
name: todo-api-secret
namespace: todo-app
type: Opaque
stringData:
DATABASE_DSN: "todo_user:strongpassword@tcp(mysql-service:3306)/tododb?parseTime=true&loc=Local"
REDIS_PASSWORD: "redis-strong-password"
MYSQL_ROOT_PASSWORD: "root-strong-password"
MYSQL_PASSWORD: "strongpassword"
4.3 ConfigMap
# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: todo-api-config
namespace: todo-app
data:
config.yaml: |
server:
port: "8080"
readTimeout: 30s
writeTimeout: 30s
idleTimeout: 120s
shutdownTimeout: 25s
database:
maxOpenConns: 50
maxIdleConns: 10
connMaxLife: 1h
redis:
addr: redis-service:6379
db: 0
4.4 MySQL StatefulSet
# k8s/mysql.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: mysql
namespace: todo-app
spec:
serviceName: mysql-headless
replicas: 1
selector:
matchLabels:
app: mysql
template:
metadata:
labels:
app: mysql
spec:
securityContext:
runAsUser: 999
fsGroup: 999
containers:
- name: mysql
image: mysql:8.0
args:
- --default-authentication-plugin=mysql_native_password
- --character-set-server=utf8mb4
- --collation-server=utf8mb4_unicode_ci
env:
- name: MYSQL_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: todo-api-secret
key: MYSQL_ROOT_PASSWORD
- name: MYSQL_DATABASE
value: tododb
- name: MYSQL_USER
value: todo_user
- name: MYSQL_PASSWORD
valueFrom:
secretKeyRef:
name: todo-api-secret
key: MYSQL_PASSWORD
ports:
- containerPort: 3306
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "2"
memory: "4Gi"
livenessProbe:
exec:
command: ["mysqladmin", "ping", "-h", "localhost"]
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
exec:
command: ["mysql", "-u", "root", "-p$(MYSQL_ROOT_PASSWORD)", "-e", "SELECT 1"]
initialDelaySeconds: 10
periodSeconds: 5
volumeMounts:
- name: mysql-data
mountPath: /var/lib/mysql
volumeClaimTemplates:
- metadata:
name: mysql-data
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 20Gi
---
apiVersion: v1
kind: Service
metadata:
name: mysql-service
namespace: todo-app
spec:
selector:
app: mysql
ports:
- port: 3306
targetPort: 3306
---
apiVersion: v1
kind: Service
metadata:
name: mysql-headless
namespace: todo-app
spec:
clusterIP: None
selector:
app: mysql
ports:
- port: 3306
4.5 Redis Deployment
# k8s/redis.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: redis
namespace: todo-app
spec:
replicas: 1
selector:
matchLabels:
app: redis
template:
metadata:
labels:
app: redis
spec:
securityContext:
runAsNonRoot: true
runAsUser: 999
containers:
- name: redis
image: redis:7.2-alpine
command: ["redis-server"]
args:
- "--requirepass"
- "$(REDIS_PASSWORD)"
- "--maxmemory"
- "512mb"
- "--maxmemory-policy"
- "allkeys-lru"
env:
- name: REDIS_PASSWORD
valueFrom:
secretKeyRef:
name: todo-api-secret
key: REDIS_PASSWORD
ports:
- containerPort: 6379
resources:
requests:
cpu: "100m"
memory: "256Mi"
limits:
cpu: "500m"
memory: "768Mi"
livenessProbe:
exec:
command: ["redis-cli", "ping"]
periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
name: redis-service
namespace: todo-app
spec:
selector:
app: redis
ports:
- port: 6379
targetPort: 6379
4.6 Todo API Deployment
# k8s/todo-api.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: todo-api
namespace: todo-app
annotations:
configmap.reloader.stakater.com/reload: "todo-api-config"
spec:
replicas: 3
selector:
matchLabels:
app: todo-api
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
minReadySeconds: 10
revisionHistoryLimit: 5
template:
metadata:
labels:
app: todo-api
version: "1.0.0"
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
spec:
serviceAccountName: todo-api-sa
terminationGracePeriodSeconds: 30
# 反亲和性:Pod 分散到不同 Node
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
app: todo-api
topologyKey: kubernetes.io/hostname
securityContext:
runAsNonRoot: true
runAsUser: 65534
runAsGroup: 65534
fsGroup: 65534
seccompProfile:
type: RuntimeDefault
initContainers:
# 等待 MySQL 就绪
- name: wait-for-mysql
image: busybox:1.36
command:
- sh
- -c
- |
until nc -z mysql-service 3306; do
echo "Waiting for MySQL..."
sleep 2
done
echo "MySQL is ready!"
securityContext:
runAsNonRoot: true
runAsUser: 65534
allowPrivilegeEscalation: false
capabilities:
drop: [ALL]
containers:
- name: todo-api
image: myregistry.io/todo-api:v1.0.0
imagePullPolicy: Always
ports:
- name: http
containerPort: 8080
envFrom:
- secretRef:
name: todo-api-secret
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: APP_VERSION
value: "1.0.0"
- name: CONFIG_PATH
value: /etc/app/config.yaml
resources:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "500m"
memory: "512Mi"
livenessProbe:
httpGet:
path: /healthz
port: http
initialDelaySeconds: 10
periodSeconds: 10
failureThreshold: 3
readinessProbe:
httpGet:
path: /ready
port: http
initialDelaySeconds: 5
periodSeconds: 5
failureThreshold: 3
startupProbe:
httpGet:
path: /healthz
port: http
failureThreshold: 30
periodSeconds: 10
securityContext:
readOnlyRootFilesystem: true
allowPrivilegeEscalation: false
capabilities:
drop: [ALL]
volumeMounts:
- name: app-config
mountPath: /etc/app
readOnly: true
- name: tmp
mountPath: /tmp
volumes:
- name: app-config
configMap:
name: todo-api-config
- name: tmp
emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
name: todo-api-service
namespace: todo-app
labels:
app: todo-api
spec:
selector:
app: todo-api
ports:
- name: http
port: 80
targetPort: 8080
- name: metrics
port: 9090
targetPort: 8080
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: todo-api-hpa
namespace: todo-app
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: todo-api
minReplicas: 2
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleDown:
stabilizationWindowSeconds: 300
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: todo-api-ingress
namespace: todo-app
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/limit-rps: "100"
spec:
ingressClassName: nginx
tls:
- hosts:
- api.todo.example.com
secretName: todo-api-tls
rules:
- host: api.todo.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: todo-api-service
port:
number: 80
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: todo-api-sa
namespace: todo-app
automountServiceAccountToken: false
5. Helm Chart 打包
# 项目结构
charts/todo-api/
├── Chart.yaml
├── values.yaml
├── values-staging.yaml
├── values-production.yaml
└── templates/
├── _helpers.tpl
├── namespace.yaml
├── serviceaccount.yaml
├── configmap.yaml
├── secret.yaml # 只在测试环境用,生产用 External Secrets
├── deployment.yaml
├── service.yaml
├── ingress.yaml
├── hpa.yaml
├── networkpolicy.yaml
├── rbac.yaml
└── NOTES.txt
# 安装到 staging
helm upgrade --install todo-api ./charts/todo-api \
-f charts/todo-api/values.yaml \
-f charts/todo-api/values-staging.yaml \
--namespace todo-staging \
--create-namespace \
--set image.tag=$IMAGE_TAG \
--wait \
--timeout 5m
# 安装到 production
helm upgrade --install todo-api ./charts/todo-api \
-f charts/todo-api/values.yaml \
-f charts/todo-api/values-production.yaml \
--namespace todo-production \
--set image.tag=$IMAGE_TAG \
--wait
6. CI/CD 流水线
# .github/workflows/deploy.yaml
name: Build and Deploy
on:
push:
branches: [main]
pull_request:
branches: [main]
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}/todo-api
jobs:
test:
runs-on: ubuntu-latest
services:
mysql:
image: mysql:8.0
env:
MYSQL_ROOT_PASSWORD: testpassword
MYSQL_DATABASE: tododb_test
ports: ["3306:3306"]
options: --health-cmd="mysqladmin ping" --health-interval=10s
redis:
image: redis:7.2-alpine
ports: ["6379:6379"]
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version: "1.22"
cache: true
- name: Run tests
run: |
go test ./... -v -race -coverprofile=coverage.out
go tool cover -func=coverage.out
env:
DATABASE_DSN: "root:testpassword@tcp(localhost:3306)/tododb_test?parseTime=true"
REDIS_ADDR: "localhost:6379"
- name: Run linter
uses: golangci/golangci-lint-action@v4
build:
needs: test
runs-on: ubuntu-latest
if: github.event_name == 'push'
permissions:
contents: read
packages: write
outputs:
image-tag: ${{ steps.meta.outputs.tags }}
image-digest: ${{ steps.build.outputs.digest }}
steps:
- uses: actions/checkout@v4
- name: Log in to registry
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata
id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=sha,prefix=sha-,format=long
type=ref,event=branch
type=semver,pattern={{version}}
- name: Build and push
id: build
uses: docker/build-push-action@v5
with:
context: .
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
build-args: VERSION=${{ github.sha }}
cache-from: type=gha
cache-to: type=gha,mode=max
- name: Scan for vulnerabilities
uses: aquasecurity/trivy-action@master
with:
image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:sha-${{ github.sha }}
format: sarif
exit-code: "1"
severity: CRITICAL,HIGH
deploy-staging:
needs: build
runs-on: ubuntu-latest
environment: staging
steps:
- uses: actions/checkout@v4
- name: Configure kubeconfig
run: |
echo "${{ secrets.STAGING_KUBECONFIG }}" > "$RUNNER_TEMP/kubeconfig"
echo "KUBECONFIG=$RUNNER_TEMP/kubeconfig" >> "$GITHUB_ENV"
- name: Deploy to staging
run: |
helm upgrade --install todo-api ./charts/todo-api \
-f charts/todo-api/values.yaml \
-f charts/todo-api/values-staging.yaml \
--namespace todo-staging \
--create-namespace \
--set image.tag=sha-${{ github.sha }} \
--wait --timeout 5m
- name: Run smoke tests
run: |
kubectl wait --for=condition=ready pod -l app=todo-api \
-n todo-staging --timeout=120s
curl -f https://api.staging.todo.example.com/healthz
deploy-production:
needs: deploy-staging
runs-on: ubuntu-latest
environment: production # 需要手动审批
steps:
- uses: actions/checkout@v4
- name: Configure kubeconfig
run: |
echo "${{ secrets.PRODUCTION_KUBECONFIG }}" > "$RUNNER_TEMP/kubeconfig"
echo "KUBECONFIG=$RUNNER_TEMP/kubeconfig" >> "$GITHUB_ENV"
- name: Deploy to production
run: |
helm upgrade --install todo-api ./charts/todo-api \
-f charts/todo-api/values.yaml \
-f charts/todo-api/values-production.yaml \
--namespace todo-production \
--set image.tag=sha-${{ github.sha }} \
--wait --timeout 10m
- name: Verify deployment
run: |
kubectl rollout status deployment/todo-api -n todo-production
curl -f https://api.todo.example.com/healthz
7. 可观测性:监控与日志
7.1 Prometheus 监控配置
# k8s/servicemonitor.yaml(需要 Prometheus Operator)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: todo-api
namespace: todo-app
labels:
release: kube-prometheus-stack # 与 Prometheus Operator 的 selector 匹配
spec:
selector:
matchLabels:
app: todo-api
endpoints:
- port: metrics
path: /metrics
interval: 15s
7.2 Grafana Dashboard
关键指标:
- 请求 QPS:rate(http_requests_total[5m])
- 错误率:rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m])
- P99 延迟:histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
- Pod 数量:kube_deployment_status_replicas{deployment="todo-api"}
- CPU 使用:rate(container_cpu_usage_seconds_total{pod=~"todo-api-.*"}[5m])
7.3 日志收集(Loki + Fluent Bit)
# DaemonSet: Fluent Bit(系列第5篇已介绍)
# 日志格式约定(Go 应用输出 JSON)
// 结构化日志,方便 Loki 查询
slog.Info("Request completed",
"method", r.Method,
"path", r.URL.Path,
"status", statusCode,
"duration_ms", duration.Milliseconds(),
"pod", os.Getenv("POD_NAME"),
"trace_id", traceID,
)
Loki 查询示例:
# 查询所有 5xx 错误
{namespace="todo-app", app="todo-api"} | json | status >= 500
# 查询慢请求(>1s)
{namespace="todo-app", app="todo-api"} | json | duration_ms > 1000
# 统计某路径的错误率
rate({namespace="todo-app"} | json | path="/todos" | status >= 400 [5m])
8. 生产运维技巧
8.1 常用排查命令
# ===== 快速诊断 =====
# 查看 Pod 是否正常
kubectl get pods -n todo-app -o wide
# Pod 卡在 Pending?
kubectl describe pod <pod-name> -n todo-app
# 重点看 Events,常见原因:
# - Insufficient cpu/memory → 资源不足,扩容 Node 或降低 requests
# - No nodes available → 节点 Taint 没有匹配的 Toleration
# - PVC not bound → PVC 找不到 PV
# Pod CrashLoopBackOff?
kubectl logs <pod-name> -n todo-app --previous
kubectl describe pod <pod-name> -n todo-app
# 进入容器调试
kubectl exec -it <pod-name> -n todo-app -- sh
# ===== 网络排查 =====
# 测试 Service 连通性(在 Pod 内)
kubectl run nettest --image=nicolaka/netshoot -it --rm --restart=Never \
-n todo-app -- curl http://todo-api-service/healthz
# 检查 Endpoints
kubectl get endpoints todo-api-service -n todo-app
# 如果 Endpoints 为空,说明没有就绪的 Pod 匹配 selector
# DNS 解析
kubectl run nettest --image=busybox -it --rm --restart=Never -- nslookup todo-api-service.todo-app
# ===== 性能排查 =====
kubectl top pods -n todo-app
kubectl top nodes
kubectl describe hpa todo-api-hpa -n todo-app
# ===== 日志 =====
# 实时查看所有 todo-api Pod 的日志
kubectl logs -f -l app=todo-api -n todo-app --max-log-requests=5
# 查看某时间段的日志
kubectl logs <pod-name> -n todo-app --since=1h
kubectl logs <pod-name> -n todo-app --since-time="2026-03-16T10:00:00Z"
8.2 维护操作
# 临时禁用某个 Pod(调试)
# 修改 Pod 的 label,使 Service 不再路由流量到它
kubectl label pod <pod-name> app=todo-api-debug --overwrite -n todo-app
# 紧急回滚
kubectl rollout undo deployment/todo-api -n todo-app
kubectl rollout status deployment/todo-api -n todo-app
# 紧急扩容
kubectl scale deployment todo-api --replicas=10 -n todo-app
# 固定 HPA 副本数(临时手动控制时,将 min/max 设为相同值)
kubectl patch hpa todo-api-hpa -n todo-app -p '{"spec":{"minReplicas":5,"maxReplicas":5}}'
# 强制删除卡住的 Pod
kubectl delete pod <pod-name> -n todo-app --force --grace-period=0
# 查看资源使用排名
kubectl top pods -n todo-app --sort-by=cpu
kubectl top pods -n todo-app --sort-by=memory
8.3 PodDisruptionBudget(保证升级/维护时可用性)
# k8s/pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: todo-api-pdb
namespace: todo-app
spec:
selector:
matchLabels:
app: todo-api
minAvailable: 2 # 最少保持 2 个 Pod 可用
# 或:
# maxUnavailable: 1 # 最多 1 个 Pod 不可用
有了 PDB,kubectl drain node-1 驱逐 Pod 时会遵守预算:当可用副本将低于 minAvailable 时驱逐被阻塞,直到其他 Pod 就绪,从而保证服务不中断。
9. 完整项目结构
todo-api/
├── cmd/
│ └── server/
│ └── main.go
├── internal/
│ ├── config/config.go
│ ├── handler/
│ │ ├── handler.go
│ │ ├── middleware.go
│ │ └── todo.go
│ ├── model/todo.go
│ ├── repository/
│ │ ├── interface.go
│ │ ├── mysql.go
│ │ └── redis.go
│ └── metrics/metrics.go
├── k8s/ # K8s 资源清单
│ ├── namespace.yaml
│ ├── configmap.yaml
│ ├── secrets.yaml
│ ├── mysql.yaml
│ ├── redis.yaml
│ ├── todo-api.yaml
│ ├── rbac.yaml
│ ├── networkpolicy.yaml
│ ├── pdb.yaml
│ └── servicemonitor.yaml
├── charts/ # Helm Charts
│ └── todo-api/
│ ├── Chart.yaml
│ ├── values.yaml
│ ├── values-staging.yaml
│ ├── values-production.yaml
│ └── templates/
├── .github/
│ └── workflows/
│ └── deploy.yaml # CI/CD
├── Dockerfile
├── go.mod
├── go.sum
├── Makefile
└── README.md
Makefile:
VERSION ?= $(shell git describe --tags --always)
IMAGE = myregistry.io/todo-api
TAG = $(shell git rev-parse --short HEAD)
.PHONY: build push test lint deploy-staging deploy-prod rollback logs status
build:
docker build --build-arg VERSION=$(VERSION) -t $(IMAGE):$(TAG) .
push: build
docker push $(IMAGE):$(TAG)
test:
go test ./... -v -race
lint:
golangci-lint run
deploy-staging: push
helm upgrade --install todo-api ./charts/todo-api \
-f charts/todo-api/values.yaml \
-f charts/todo-api/values-staging.yaml \
--namespace todo-staging --create-namespace \
--set image.tag=$(TAG) --wait
deploy-prod: push
helm upgrade --install todo-api ./charts/todo-api \
-f charts/todo-api/values.yaml \
-f charts/todo-api/values-production.yaml \
--namespace todo-production \
--set image.tag=$(TAG) --wait
rollback:
helm rollback todo-api -n todo-production
logs:
kubectl logs -f -l app=todo-api -n todo-production --max-log-requests=5
status:
kubectl get pods,svc,ingress,hpa -n todo-production
10. 总结与进阶路线
10.1 系列总结
通过这 10 篇文章,我们系统学习了:
| 文章 | 核心知识 |
|---|---|
| 01-introduction | K8s 是什么、为什么用、本地环境搭建 |
| 02-architecture | Control Plane 组件、Node 组件、工作原理 |
| 03-core-concepts | Pod、Namespace、Label/Selector、资源限制 |
| 04-workloads | Deployment/StatefulSet/DaemonSet/Job/CronJob/HPA |
| 05-networking | Service 四种类型、Ingress、DNS、NetworkPolicy |
| 06-storage | Volume/PV/PVC/StorageClass 动态供给 |
| 07-config-secret | ConfigMap、Secret、配置热更新 |
| 08-rbac-security | RBAC、ServiceAccount、安全上下文 |
| 09-helm | Chart 开发、模板语法、多环境管理 |
| 10-go-app-deploy | Go 应用全栈部署实战 |
10.2 进阶路线
Level 1:运维进阶
- Kustomize(轻量级多环境配置管理)
- ArgoCD / Flux(GitOps 持续部署)
- Prometheus + Grafana(完整监控体系)
- EFK/PLG Stack(日志体系)
- Velero(集群备份与恢复)
Level 2:平台工程
- Istio / Linkerd(服务网格)
- OPA / Kyverno(策略引擎)
- Cluster API(K8s 管理 K8s)
- 多集群管理(Fleet、OCM)
- FinOps(K8s 成本优化)
Level 3:K8s 开发
- client-go(K8s Go 客户端)
- controller-runtime(Controller 框架)
- kubebuilder(Operator 开发框架)
- Admission Webhook(准入控制器)
- CRD(自定义资源)
- K8s Operator 模式
Level 4:K8s 贡献
- 阅读 K8s 源码(k8s.io/kubernetes)
- 参与 SIG(Special Interest Group)
- 参与 CNCF 开源项目
10.3 推荐资源
官方文档:
- kubernetes.io/docs(Kubernetes 官方文档)
- helm.sh/docs(Helm 官方文档)
书籍:
- 《Kubernetes in Action》(最好的 K8s 书)
- 《Cloud Native Go》(Go + K8s 开发)
- 《Programming Kubernetes》(K8s Operator 开发)
实践平台:
- killercoda.com(免费在线 K8s 实验环境)
- killer.sh(CKA/CKAD 考试模拟)
认证:
- CKA(Certified Kubernetes Administrator)
- CKAD(Certified Kubernetes Application Developer)
- CKS(Certified Kubernetes Security Specialist)
这个系列到此结束。K8s 是一个庞大的生态,但只要掌握了核心概念,后面的内容都是在核心基础上的扩展。
从 Go 开发者的角度,K8s 既是你的应用运行平台,也是你可以深度参与的开源生态(K8s 本身、大量工具链都是 Go 写的)。
希望这个系列对你有帮助!
xingliuhua