Standardized Deployment Workflow for Kubernetes 1.35 Worker Nodes
When building a highly available Kubernetes cluster, standardized Worker node configuration is key to cluster stability and maintainability. This article walks through the full deployment of a generic Worker node on Ubuntu/Debian, with containerd as the container runtime and Kubernetes v1.35, and provides a one-shot automation script suitable for bringing large numbers of nodes online quickly.
Environment
Target node list (GPU compute nodes):
192.168.7.71 k8s-worker-gpu01
192.168.7.72 k8s-worker-gpu02
192.168.7.73 k8s-worker-gpu03
192.168.7.74 k8s-worker-gpu04
192.168.7.75 k8s-worker-gpu05
Hardware: 64 CPU cores / 512 GB RAM / 256 GB SSD (writeback and discard enabled) + 8 TB HDD mounted at /var/lib (also with writeback and discard), XFS filesystem.
Note: GPU nodes (such as k8s-worker-gpu01) follow the same workflow; the role is only distinguished at the labeling step.
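The join and label steps address machines by hostname, so every node should be able to resolve the others; if the names are not in DNS, an /etc/hosts fragment built from the table above works:

```
192.168.7.71 k8s-worker-gpu01
192.168.7.72 k8s-worker-gpu02
192.168.7.73 k8s-worker-gpu03
192.168.7.74 k8s-worker-gpu04
192.168.7.75 k8s-worker-gpu05
```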
Deployment Steps
1. Disable swap
Kubernetes requires swap to be disabled by default; otherwise kubelet will refuse to start.
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab
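The sed rule comments out any fstab line containing " swap ". Its effect on a sample line (the UUID is made up):

```shell
# Sample fstab swap entry (hypothetical UUID), run through the same sed rule
line='UUID=abcd-1234 none swap sw 0 0'
echo "$line" | sed '/ swap / s/^/#/'
# → #UUID=abcd-1234 none swap sw 0 0
```

Note the rule also matches already-commented swap lines, so rerunning it stacks extra # characters (harmless, but worth knowing).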
2. Kernel modules and network configuration
Enable the overlay and br_netfilter modules to support container networking, and adjust the sysctl parameters:
# /etc/modules-load.d/k8s.conf
overlay
br_netfilter
# /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
Run sysctl --system to apply the settings.
Disable nouveau
Blacklist the in-tree nouveau driver so the NVIDIA driver can load (this takes effect after the initramfs is rebuilt and the node is rebooted):
cat <<EOF | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
blacklist nvidiafb
options nouveau modeset=0
EOF
sudo update-initramfs -u
3. Install the NVIDIA driver
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
apt update
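For reference, this is what the sed in the pipeline above does to each repository line (shown on a sample line in the format the upstream .list file uses):

```shell
# The sed injects the signed-by keyring option into each "deb https://..." line
echo 'deb https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /' | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g'
# → deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /
```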
# List the available driver versions
apt list -a nvidia-driver-*
...
nvidia-driver-580-open/noble-updates 580.95.05-0ubuntu0.24.04.3 amd64
nvidia-driver-580-open/noble-security 580.95.05-0ubuntu0.24.04.2 amd64
nvidia-driver-580-server-open/noble-updates 580.95.05-0ubuntu0.24.04.3 amd64
nvidia-driver-580-server-open/noble-security 580.95.05-0ubuntu0.24.04.2 amd64
nvidia-driver-580-server/noble-updates 580.95.05-0ubuntu0.24.04.3 amd64
nvidia-driver-580-server/noble-security 580.95.05-0ubuntu0.24.04.2 amd64
nvidia-driver-580/noble-updates 580.95.05-0ubuntu0.24.04.3 amd64
nvidia-driver-580/noble-security 580.95.05-0ubuntu0.24.04.2 amd64
# I went with the newest release, nvidia-driver-580-server
apt install nvidia-driver-580-server
Reboot after installation and verify that the driver loads with nvidia-smi.
4. Install the Containerd runtime
Install Containerd, runc, and the CNI plugins from pre-downloaded binary packages:
- Extract containerd-2.2.1-linux-amd64.tar.gz into /usr/local
- Install runc.amd64 as /usr/local/sbin/runc
- Extract the CNI plugins into /opt/cni/bin
Generate the default configuration and change the key setting:
# /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true # must be true to match kubelet's systemd cgroup driver
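Instead of editing by hand, the same change can be applied with sed against the generated default config (the demo below runs the substitution on a sample line):

```shell
# Toggle the cgroup driver setting; on a real node, run the sed against
# /etc/containerd/config.toml after `containerd config default`.
echo '            SystemdCgroup = false' | sed 's/SystemdCgroup = false/SystemdCgroup = true/'
# → "            SystemdCgroup = true" (indentation preserved)
```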
Register the systemd service and start it:
systemctl daemon-reload
systemctl enable --now containerd
5. Configure crictl
Install crictl and point it at the Containerd socket:
# /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
Verify with crictl info; the output should include "systemdCgroup": true.
6. Pre-pull images
To avoid Pod startup failures caused by registry connectivity issues, pull the required images in advance and retag them:
ctr -n k8s.io images pull registry.aliyuncs.com/google_containers/pause:3.10.1 --platform linux/amd64
ctr -n k8s.io images tag \
registry.aliyuncs.com/google_containers/pause:3.10.1 \
registry.k8s.io/pause:3.10.1
ctr -n k8s.io images pull "registry.aliyuncs.com/google_containers/kube-proxy:v1.35.0" --platform linux/$(uname -m)
ctr -n k8s.io images tag registry.aliyuncs.com/google_containers/kube-proxy:v1.35.0 registry.k8s.io/kube-proxy:v1.35.0
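The pull-and-tag pairs are a mechanical rename, so a small helper (hypothetical, not part of the original steps) keeps the two names from drifting apart:

```shell
# Derive the upstream registry.k8s.io name from the Aliyun mirror name
mirror_to_upstream() {
  printf '%s\n' "$1" | sed 's#^registry\.aliyuncs\.com/google_containers/#registry.k8s.io/#'
}

img=registry.aliyuncs.com/google_containers/pause:3.10.1
mirror_to_upstream "$img"
# → registry.k8s.io/pause:3.10.1
# On a node: ctr -n k8s.io images tag "$img" "$(mirror_to_upstream "$img")"
```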
7. Install kubelet and kubeadm
Install the pinned component versions from the Tsinghua University Kubernetes mirror:
# Add the GPG key
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.35/deb/Release.key | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# Configure the APT source
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://mirrors.tuna.tsinghua.edu.cn/kubernetes/core:/stable:/v1.35/deb/ /" > /etc/apt/sources.list.d/kubernetes.list
apt update
apt install -y kubelet=1.35.0-1.1 kubeadm=1.35.0-1.1
apt-mark hold kubelet kubeadm
8. Join the cluster
On a control plane node, run:
kubeadm token create --print-join-command
Run the printed command on the Worker node to complete registration; confirm on the control plane with kubectl get nodes.
9. Node role labels (optional)
To make roles easy to distinguish when scheduling, add custom role labels:
# General-purpose node
kubectl label node k8s-worker-general01 node-role.kubernetes.io/worker-general=""
# GPU node
kubectl label node k8s-worker-gpu01 node-role.kubernetes.io/worker-gpu=""
kubectl get nodes will then show a clear role for each node in the ROLES column.
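For the five GPU workers in the node list, the label commands can be generated in a loop and reviewed before being piped to sh (a sketch; it only prints the commands):

```shell
# Print the kubectl label command for each GPU worker from the table above
gpu_label_cmds() {
  for i in 01 02 03 04 05; do
    echo "kubectl label node k8s-worker-gpu$i node-role.kubernetes.io/worker-gpu=\"\""
  done
}
gpu_label_cmds
# Review the output, then: gpu_label_cmds | sh   (where kubectl has cluster access)
```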
Automation Script
The script below consolidates all of the steps above and initializes a Worker node environment in one run (the join step is not included; run it manually or pass it in via the KUBEADM_JOIN_CMD environment variable).
#!/bin/bash
set -e
# ==============================
# Set the following variables in advance (or pass them in on the command line)
# ==============================
K8S_VERSION="1.35.0"
K8S_DEB_VERSION="1.35.0-1.1"
ARCH="amd64"
DOWNLOAD_DIR="/home/gongdear/k8s1.35" # must exist and contain the required tarballs
# Optional: pass a known join command in via this environment variable
# KUBEADM_JOIN_CMD="kubeadm join ..."
# ==============================
# 1. Disable swap
# ==============================
echo "[INFO] Disabling swap..."
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab
# ==============================
# 2. Kernel modules and sysctl
# ==============================
echo "[INFO] Configuring kernel modules and sysctl..."
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system
# ==============================
# 3. Install containerd, runc, and the CNI plugins
# ==============================
echo "[INFO] Installing containerd, runc, and CNI plugins..."
cd "$DOWNLOAD_DIR"
# Extract containerd
tar Cxzvf /usr/local containerd-2.2.1-linux-${ARCH}.tar.gz
# Install runc
install -m 755 runc.${ARCH} /usr/local/sbin/runc
# Install the CNI plugins
mkdir -p /opt/cni/bin
tar Cxzvf /opt/cni/bin cni-plugins-linux-${ARCH}-v1.9.0.tgz
# Generate the default config.toml
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
# Set SystemdCgroup = true
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
# Install the systemd service unit
cp containerd.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable --now containerd
# ==============================
# 4. Install crictl
# ==============================
echo "[INFO] Installing crictl..."
tar zxvf crictl-v${K8S_VERSION}-linux-${ARCH}.tar.gz -C /usr/local/bin
cat <<EOF | sudo tee /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF
# ==============================
# 5. Pre-pull images
# ==============================
echo "[INFO] Pre-pulling and tagging pause image..."
ctr -n k8s.io images pull "registry.aliyuncs.com/google_containers/pause:3.10.1" --platform linux/$(uname -m)
ctr -n k8s.io images tag registry.aliyuncs.com/google_containers/pause:3.10.1 registry.k8s.io/pause:3.10.1
ctr -n k8s.io images pull "registry.aliyuncs.com/google_containers/kube-proxy:v1.35.0" --platform linux/$(uname -m)
ctr -n k8s.io images tag registry.aliyuncs.com/google_containers/kube-proxy:v1.35.0 registry.k8s.io/kube-proxy:v1.35.0
# ==============================
# 6. Add the Kubernetes APT repository (Tsinghua mirror)
# ==============================
echo "[INFO] Adding Kubernetes APT repository (Tsinghua mirror)..."
# Download and install the GPG key
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.35/deb/Release.key | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# Write the APT source entry
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://mirrors.tuna.tsinghua.edu.cn/kubernetes/core:/stable:/v1.35/deb/ /
EOF
apt update
# ==============================
# 7. Install kubelet and kubeadm (pinned version)
# ==============================
echo "[INFO] Installing kubelet and kubeadm v${K8S_VERSION}..."
apt install -y kubelet=${K8S_DEB_VERSION} kubeadm=${K8S_DEB_VERSION}
apt-mark hold kubelet kubeadm
# ==============================
# 8. Prompt the user to run kubeadm join
# ==============================
echo ""
echo "✅ Worker node preparation completed!"
echo ""
if [[ -n "${KUBEADM_JOIN_CMD}" ]]; then
echo "[INFO] Executing kubeadm join command from environment variable..."
eval "${KUBEADM_JOIN_CMD}"
else
echo "👉 Please run the following command on this node to join the cluster:"
echo ""
echo " kubeadm join <control-plane-host>:<port> --token <token> \\"
echo " --discovery-token-ca-cert-hash sha256:<hash>"
echo ""
echo "💡 You can get this command by running on control-plane:"
echo " kubeadm token create --print-join-command"
fi
echo ""
echo "After joining, label the node accordingly, e.g.:"
echo " kubectl label node \$(hostname) node-role.kubernetes.io/worker-general=\"\""
Usage notes
- Prerequisites: make sure every Worker node already has its hostname, network, and time synchronization configured.
- Offline package distribution: place the containerd, runc, CNI, and crictl binary packages together under $DOWNLOAD_DIR.
- Batch deployment: combined with Ansible or pdsh, 50+ nodes can be brought online within minutes.
- Security hardening: for production, additionally configure SELinux/AppArmor, audit logging, and least-privilege access.
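As a sketch of the batch-deployment idea, the loop below copies the script (saved under the hypothetical name setup-worker.sh) to each node over SSH and runs it; it defaults to a dry run that only prints the commands, and assumes root SSH keys are already distributed:

```shell
# Dry-run deployment loop over the worker IPs from the node list.
# Call deploy_all "" to actually execute instead of printing.
deploy_all() {
  local RUN="${1-echo}"   # "echo" = dry run
  local ip
  for ip in 192.168.7.71 192.168.7.72 192.168.7.73 192.168.7.74 192.168.7.75; do
    $RUN scp setup-worker.sh root@"$ip":/tmp/
    $RUN ssh root@"$ip" "bash /tmp/setup-worker.sh"
  done
}
deploy_all
```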
In practice, the lowest-friction option is to configure a proxy for containerd on every node:
# Create the drop-in directory
sudo mkdir -p /etc/systemd/system/containerd.service.d
# Create the proxy configuration
sudo tee /etc/systemd/system/containerd.service.d/proxy.conf <<EOF
[Service]
Environment="HTTP_PROXY=http://your.proxy.local:3128"
Environment="HTTPS_PROXY=http://your.proxy.local:3128"
Environment="NO_PROXY=localhost,127.0.0.1,.svc,.cluster.local,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
EOF
Then reload systemd and restart containerd; you can confirm the variables took effect with systemctl show containerd --property=Environment.
sudo systemctl daemon-reload
sudo systemctl restart containerd