Kubernetes Redis 4.0.14 Cluster Deployment and Troubleshooting Handbook
2025-08-13
📚 Kubernetes Redis 4.0.14 Cluster Deployment and Troubleshooting Handbook (verified edition)
Namespace: cka | Version: v9.0 (with verified commands)
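1️⃣ Part 1: Introduction
This document records the complete process of deploying a Redis 4.0.14 cluster on Kubernetes using the original configuration files (`pv`, `redis.conf`, and `redis-cluster.yaml`).
Key points:
- Creating a ConfigMap from `redis.conf`
- Mounting the configuration into the Redis Pods
- Cluster initialization failure: the `redis-cli` in the official `redis:4.0.14` image does not support `--cluster`
- Troubleshooting, verification, and the final solution
- A one-step initialization script
Every step was verified with real commands so that it can be reproduced.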
2️⃣ Part 2: Deployment
2.1 Prepare NFS storage (run on the NFS server)
mkdir -p /data/k8sdata/cka/redis{0..5}
chmod 777 /data/k8sdata/cka/redis{0..5}
2.2 Create the PersistentVolumes (PV)
✅ Use the original `pv/redis-cluster-pv.yaml`:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: redis-cluster-pv0
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 172.31.7.110
    path: /data/k8sdata/cka/redis0
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: redis-cluster-pv1
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 172.31.7.110
    path: /data/k8sdata/cka/redis1
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: redis-cluster-pv2
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 172.31.7.110
    path: /data/k8sdata/cka/redis2
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: redis-cluster-pv3
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 172.31.7.110
    path: /data/k8sdata/cka/redis3
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: redis-cluster-pv4
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 172.31.7.110
    path: /data/k8sdata/cka/redis4
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: redis-cluster-pv5
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 172.31.7.110
    path: /data/k8sdata/cka/redis5
Apply the PVs:
kubectl apply -f pv/redis-cluster-pv.yaml
kubectl get pv
✅ Confirm that all 6 PVs are in the `Available` state.
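The six PV definitions above differ only in their index. As a minimal sketch, the same manifest could be generated with a loop instead of being maintained by hand. The function name `gen_redis_pvs` is hypothetical; the default NFS server and base path are simply the values used in the manifest above.

```shell
# Hypothetical helper: print the six PersistentVolume manifests to stdout.
# The server address, base path, and size mirror the values in this document.
gen_redis_pvs() {
  local server="${1:-172.31.7.110}"
  local base="${2:-/data/k8sdata/cka}"
  local i
  for i in 0 1 2 3 4 5; do
    # Separate YAML documents with '---' (not needed before the first one)
    if [ "$i" -gt 0 ]; then
      echo "---"
    fi
    cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: redis-cluster-pv${i}
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: ${server}
    path: ${base}/redis${i}
EOF
  done
}
```

A possible usage is `gen_redis_pvs | kubectl apply -f -`, which should be equivalent to applying the hand-written file as long as the defaults match your environment.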
2.3 Create the Redis configuration file `redis.conf`
✅ Use the original `redis.conf`:
# redis.conf
port 6379
bind 0.0.0.0
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
save ""
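💡 Notes:
- `cluster-enabled yes`: enables cluster mode
- `cluster-config-file nodes.conf`: the file in which each node stores its cluster state
- `appendonly yes`: enables AOF persistence
- `save ""`: disables RDB snapshots, so AOF is the only persistence mechanism
- This file sets no `dir`, so Redis writes `nodes.conf` and the AOF file to its working directory (`/data` in the official image), not to the volume mounted at `/var/lib/redis` in section 2.5. Adding `dir /var/lib/redis` would place them on the PV.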
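Before building the ConfigMap, it can help to confirm that the file really contains the directives cluster mode depends on. The following is a minimal sketch; `check_redis_conf` is a hypothetical helper name, and the list of required directives is an assumption based on the notes above.

```shell
# Hypothetical helper: check that a redis.conf contains the directives
# this deployment relies on. Returns 0 if all are present, 1 otherwise.
check_redis_conf() {
  local conf="$1" missing=0 directive
  for directive in "cluster-enabled yes" "cluster-config-file" "appendonly yes"; do
    # Match the directive at the start of a line, allowing leading whitespace
    if ! grep -Eq "^[[:space:]]*${directive}([[:space:]]|\$)" "$conf"; then
      echo "missing: ${directive}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_redis_conf ./redis.conf && echo "redis.conf looks OK"` prints the confirmation only when every directive is found.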
2.4 Create the ConfigMap from `redis.conf`
kubectl create configmap redis-config \
--from-file=redis.conf=./redis.conf \
-n cka
✅ Verify the ConfigMap:
kubectl get configmap redis-config -n cka -o yaml
2.5 Deploy the Redis StatefulSet and Services (with the ConfigMap mounted)
✅ The updated `redis-cluster.yaml`, including the ConfigMap mount:
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: cka
  labels:
    app: redis
spec:
  selector:
    app: redis
    appCluster: redis-cluster
  ports:
    - name: redis
      port: 6379
      targetPort: 6379
    - name: cluster
      port: 16379
      targetPort: 16379
  clusterIP: None
---
apiVersion: v1
kind: Service
metadata:
  name: redis-access
  namespace: cka
  labels:
    app: redis
spec:
  selector:
    app: redis
    appCluster: redis-cluster
  ports:
    - name: redis-access
      protocol: TCP
      port: 6379
      targetPort: 6379
  type: ClusterIP
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  namespace: cka
spec:
  serviceName: redis
  replicas: 6
  selector:
    matchLabels:
      app: redis
      appCluster: redis-cluster
  template:
    metadata:
      labels:
        app: redis
        appCluster: redis-cluster
    spec:
      terminationGracePeriodSeconds: 20
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - redis
                topologyKey: kubernetes.io/hostname
      containers:
        - name: redis
          image: redis:4.0.14
          command:
            - "redis-server"
          args:
            - "/etc/redis/redis.conf"
          resources:
            requests:
              cpu: "500m"
              memory: "500Mi"
          ports:
            - containerPort: 6379
              name: redis
              protocol: TCP
            - containerPort: 16379
              name: cluster
              protocol: TCP
          volumeMounts:
            - name: conf
              mountPath: /etc/redis
            - name: data
              mountPath: /var/lib/redis
      volumes:
        - name: conf
          configMap:
            name: redis-config
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 5Gi
Apply the manifest:
kubectl apply -f redis-cluster.yaml
2.6 Wait for the Pods to become ready
kubectl get pods -n cka -o wide -w
✅ Wait until `redis-0` through `redis-5` are all in the `Running` state.
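Watching the Pod list works interactively, but a script needs a command that blocks and then exits. One option, sketched below, is `kubectl rollout status` on the StatefulSet. The wrapper name `wait_for_redis` and the 300-second default timeout are assumptions, not part of the original procedure.

```shell
# Hypothetical helper: block until the StatefulSet has rolled out all
# replicas or the timeout expires. Defaults mirror the names used here.
wait_for_redis() {
  local namespace="${1:-cka}" timeout="${2:-300s}"
  kubectl rollout status statefulset/redis -n "$namespace" --timeout="$timeout"
}
```

Calling `wait_for_redis cka 120s` returns a non-zero status if the rollout has not finished within two minutes, which makes it usable as a guard before the initialization step.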
3️⃣ Part 3: Initialization error (first attempt)
Try to initialize the cluster with the `redis-cli` from the `redis:4.0.14` image:
kubectl run -it --rm --restart=Never \
--namespace cka \
redis-admin \
--image=redis:4.0.14 \
-- redis-cli --cluster create \
redis-access.cka.svc.cluster.local:6379 \
redis-access.cka.svc.cluster.local:6379 \
redis-access.cka.svc.cluster.local:6379 \
redis-access.cka.svc.cluster.local:6379 \
redis-access.cka.svc.cluster.local:6379 \
redis-access.cka.svc.cluster.local:6379 \
--cluster-replicas 1 \
--cluster-yes
❌ Error message:
redis-cli: unknown option '--cluster'
🔴 Cause: the `redis-cli` in the `redis:4.0.14` image does not support the `--cluster` option.
4️⃣ Part 4: Troubleshooting and verification
🔎 Check 1: verify whether `redis:4.0.14` supports `--cluster`
docker run -it --rm redis:4.0.14 redis-cli --help | grep cluster
🔍 The output is empty.
docker run -it --rm redis:4.0.14 redis-cli --cluster help
❌ Output:
redis-cli: unknown option '--cluster'
✅ Conclusion: the official `redis:4.0.14` image indeed does not support `--cluster`.
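The `--cluster` option was introduced with Redis 5.0, so the check above can also be expressed as a version test. This is a minimal sketch: `supports_cluster_option` is a hypothetical helper that parses the string printed by `redis-cli --version`, assumed to have the form `redis-cli X.Y.Z`.

```shell
# Hypothetical helper: decide from a `redis-cli --version` string whether
# the --cluster option is available (it was introduced in Redis 5.0).
# Returns 0 if supported, 1 if not, 2 if the version cannot be parsed.
supports_cluster_option() {
  local version_line="$1" version major
  # Expected input looks like "redis-cli 4.0.14"; take the second field
  version="$(printf '%s\n' "$version_line" | awk '{print $2}')"
  major="${version%%.*}"
  # Reject anything that is not a plain number
  case "$major" in
    ''|*[!0-9]*) return 2 ;;
  esac
  [ "$major" -ge 5 ]
}
```

For example, `supports_cluster_option "$(docker run --rm redis:4.0.14 redis-cli --version)"` would report whether that image's client can run `--cluster create`.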
🔎 Check 2: try `redis-trib.rb`
docker run -it --rm redis:4.0.14 sh -c "ls /usr/local/bin | grep trib"
❌ The output is empty; `redis-trib.rb` does not exist in the image.
✅ Conclusion: the older tool cannot be used either.
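✅ Solution: use a newer `redis-cli` (`redis:6.2.6`) with Pod IPs
The Service name used in Part 3 resolves to a single address, so passing it six times does not identify six distinct nodes; each node has to be addressed by its own Pod IP.
Get the Pod IPs: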
root@101-master1:/opt/k8s-data/yaml/cka/redis-cluster# kubectl get pod -n cka -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cka deploy-devops-redis-6d9fd4dbcb-hbs6q 1/1 Running 0 176m 10.200.45.203 172.31.7.105 <none> <none>
cka cka-nginx-deployment-69db98d5ff-qc7mr 1/1 Running 0 9h 10.200.11.135 172.31.7.106 <none> <none>
cka cka-tomcat-app1-deployment-78c495f67d-6db72 1/1 Running 0 24h 10.200.45.199 172.31.7.105 <none> <none>
cka cka-tomcat-app1-deployment-78c495f67d-xbm5f 1/1 Running 0 24h 10.200.11.134 172.31.7.106 <none> <none>
cka redis-0 1/1 Running 0 30m 10.200.210.76 172.31.7.104 <none> <none>
cka redis-1 1/1 Running 0 29m 10.200.45.204 172.31.7.105 <none> <none>
cka redis-2 1/1 Running 0 24m 10.200.11.136 172.31.7.106 <none> <none>
cka redis-3 1/1 Running 0 24m 10.200.210.77 172.31.7.104 <none> <none>
cka redis-4 1/1 Running 0 24m 10.200.45.205 172.31.7.105 <none> <none>
cka redis-5 1/1 Running 0 24m 10.200.11.137 172.31.7.106 <none> <none>
cka zookeeper1-5666cd8f6f-b7flr 1/1 Running 0 4h28m 10.200.45.201 172.31.7.105 <none> <none>
cka zookeeper2-c4964cd66-vsrt6 1/1 Running 0 4h28m 10.200.45.202 172.31.7.105 <none> <none>
cka zookeeper3-55fc5c6847-xfw9g 1/1 Running 0 4h28m 10.200.210.75 172.31.7.104 <none> <none>
To print only the Redis Pod IPs, one per line:
kubectl get pods -n cka -l app=redis -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}'
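The IP list printed by the jsonpath query still has to be turned into `ip:port` arguments for `redis-cli --cluster create`. A minimal sketch of that step follows; `to_endpoints` is a hypothetical helper, and port 6379 is the value used throughout this document.

```shell
# Hypothetical helper: read one Pod IP per line on stdin and print
# "ip:port" endpoints on a single line, separated by spaces.
to_endpoints() {
  local port="${1:-6379}" ip out=""
  while IFS= read -r ip; do
    # Skip blank lines in the input
    [ -n "$ip" ] || continue
    out="${out:+$out }${ip}:${port}"
  done
  printf '%s\n' "$out"
}
```

Piping the jsonpath command above into `to_endpoints` would yield a line such as `10.200.210.76:6379 10.200.45.204:6379 ...`, matching the argument list used in the next step.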
Initialize the cluster with the newer `redis-cli`:
kubectl run -it --rm --restart=Never \
--namespace cka \
redis-admin \
--image=redis:6.2.6 \
-- redis-cli --cluster create \
10.200.210.76:6379 \
10.200.45.204:6379 \
10.200.11.136:6379 \
10.200.210.77:6379 \
10.200.45.205:6379 \
10.200.11.137:6379 \
--cluster-replicas 1 \
--cluster-yes
✅ Successful output:
root@101-master1:/opt/k8s-data/yaml/cka/redis-cluster# kubectl run -it --rm --restart=Never \
> --namespace cka \
> redis-admin \
> --image=redis:6.2.6 \
> -- redis-cli --cluster create \
> 10.200.210.76:6379 \
> 10.200.45.204:6379 \
> 10.200.11.136:6379 \
> 10.200.210.77:6379 \
> 10.200.45.205:6379 \
> 10.200.11.137:6379 \
> --cluster-replicas 1 \
> --cluster-yes
If you don't see a command prompt, try pressing enter.
..
>>> Performing Cluster Check (using node 10.200.210.76:6379)
M: 0d3db136c17df8b0681fe8e2436a4f99203f0507 10.200.210.76:6379
slots:[0-5460] (5461 slots) master
1 additional replica(s)
S: a6f78d2a82b2c56fdd8f61794a96e30f2398777f 10.200.11.137:6379
slots: (0 slots) slave
replicates 931716e199e90bc313c3480661af530d1f32bf08
S: 4a4baa12df1ada0d81e27b9a1ecc7aa2f9cec0ff 10.200.45.205:6379
slots: (0 slots) slave
replicates 0d3db136c17df8b0681fe8e2436a4f99203f0507
S: 6f33cd693707f3109704ca23151a4f1b7716e380 10.200.210.77:6379
slots: (0 slots) slave
replicates 7a8f9abcbc779faebb10e2b5de9f00be0d7ad482
M: 7a8f9abcbc779faebb10e2b5de9f00be0d7ad482 10.200.11.136:6379
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
M: 931716e199e90bc313c3480661af530d1f32bf08 10.200.45.204:6379
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
pod "redis-admin" deleted
root@101-master1:/opt/k8s-data/yaml/cka/redis-cluster# kubectl exec -n cka redis-0 -- redis-cli -c cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:55
cluster_stats_messages_pong_sent:58
cluster_stats_messages_sent:113
cluster_stats_messages_ping_received:53
cluster_stats_messages_pong_received:55
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:113
root@101-master1:/opt/k8s-data/yaml/cka/redis-cluster# kubectl exec -n cka redis-0 -- redis-cli -c cluster nodes
a6f78d2a82b2c56fdd8f61794a96e30f2398777f 10.200.11.137:6379@16379 slave 931716e199e90bc313c3480661af530d1f32bf08 0 1755030011982 6 connected
0d3db136c17df8b0681fe8e2436a4f99203f0507 10.200.210.76:6379@16379 myself,master - 0 1755030010000 1 connected 0-5460
4a4baa12df1ada0d81e27b9a1ecc7aa2f9cec0ff 10.200.45.205:6379@16379 slave 0d3db136c17df8b0681fe8e2436a4f99203f0507 0 1755030011000 5 connected
6f33cd693707f3109704ca23151a4f1b7716e380 10.200.210.77:6379@16379 slave 7a8f9abcbc779faebb10e2b5de9f00be0d7ad482 0 1755030011577 4 connected
7a8f9abcbc779faebb10e2b5de9f00be0d7ad482 10.200.11.136:6379@16379 master - 0 1755030010000 3 connected 10923-16383
931716e199e90bc313c3480661af530d1f32bf08 10.200.45.204:6379@16379 master - 0 1755030010971 2 connected 5461-10922
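📌 Troubleshooting summary
Problem | Cause | Solution |
---|---|---|
`redis-cli: unknown option '--cluster'` | The `redis-cli` in the `redis:4.0.14` image does not support it | Use the `redis-cli` from `redis:6.2.6` |
`redis-trib.rb` not found | Not shipped in the image | Do not use it |
"same host" error | A ClusterIP Service was used | Initialize with Pod IPs |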
5️⃣ Part 5: Verify the cluster state
5.1 Check the overall cluster state
kubectl exec -n cka redis-0 -- redis-cli -c cluster info
✅ Output:
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_known_nodes:6
cluster_size:3
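The two fields that matter most in this output are `cluster_state` and `cluster_slots_assigned`. As a rough sketch, they can be checked automatically; `check_cluster_info` is a hypothetical helper that reads the `cluster info` text on stdin and treats "state ok with all 16384 slots assigned" as healthy.

```shell
# Hypothetical helper: read `cluster info` output on stdin and succeed only
# if the cluster state is ok and all 16384 hash slots are assigned.
check_cluster_info() {
  local info state assigned
  # CLUSTER INFO lines end in CRLF; drop the carriage returns before parsing
  info="$(tr -d '\r')"
  state="$(printf '%s\n' "$info" | awk -F: '$1 == "cluster_state" {print $2}')"
  assigned="$(printf '%s\n' "$info" | awk -F: '$1 == "cluster_slots_assigned" {print $2}')"
  echo "state=${state} slots_assigned=${assigned}"
  [ "$state" = "ok" ] && [ "$assigned" = "16384" ]
}
```

A possible usage is `kubectl exec -n cka redis-0 -- redis-cli -c cluster info | check_cluster_info`, whose exit status can gate later steps.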
5.2 Inspect the node topology
kubectl exec -n cka redis-0 -- redis-cli -c cluster nodes
✅ Output (excerpt):
0d3db136c17df8b0681fe8e2436a4f99203f0507 10.200.210.76:6379@16379 myself,master - 0 1755030010000 1 connected 0-5460
931716e199e90bc313c3480661af530d1f32bf08 10.200.45.204:6379@16379 master - 0 1755030010971 2 connected 5461-10922
7a8f9abcbc779faebb10e2b5de9f00be0d7ad482 10.200.11.136:6379@16379 master - 0 1755030010000 3 connected 10923-16383
✅ Conclusion: 3 masters and 3 replicas; the cluster is healthy.
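The "3 masters, 3 replicas" conclusion can be read off the third column of the `cluster nodes` output, which holds each node's flags. A small sketch that counts the roles is shown below; `count_roles` is a hypothetical helper that reads the `cluster nodes` text on stdin.

```shell
# Hypothetical helper: count masters and replicas in `cluster nodes` output.
# The third field holds the flags, e.g. "myself,master" or "slave".
count_roles() {
  awk '
    $3 ~ /master/ { m++ }
    $3 ~ /slave/  { s++ }
    END { printf "masters=%d replicas=%d\n", m, s }
  '
}
```

Running `kubectl exec -n cka redis-0 -- redis-cli -c cluster nodes | count_roles` against the cluster shown in this document should report three of each.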
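✅ Summary
- **`redis.conf` is correctly mounted through the ConfigMap**
- The official `redis:4.0.14` image indeed does not support `--cluster`
- A newer `redis-cli` (e.g. `6.2.6`) must be used for cluster management
- Initialization must use Pod IPs to avoid Service-name resolution issues
🎉 This document is a complete, verified, executable deployment handbook that can be handed over to a team.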
6️⃣ Part 6: One-step initialization script
init-redis-cluster.sh
#!/bin/bash
# init-redis-cluster.sh
# One-step Redis cluster initialization (Pod IPs + a newer redis-cli)
# Namespace: cka
set -e

NAMESPACE="cka"
APP_LABEL="app=redis"
REPLICA_COUNT=1
EXPECTED_PODS=6

echo "🔍 Fetching the Redis Pods and their IP addresses..."
# jsonpath prints the values space-separated; read them into arrays
read -r -a POD_IPS <<< "$(kubectl get pods -n "$NAMESPACE" -l "$APP_LABEL" -o jsonpath='{.items[*].status.podIP}')"
read -r -a POD_NAMES <<< "$(kubectl get pods -n "$NAMESPACE" -l "$APP_LABEL" -o jsonpath='{.items[*].metadata.name}')"

# Abort unless exactly the expected number of Pods (with IPs) was found
if [ "${#POD_IPS[@]}" -ne "$EXPECTED_PODS" ]; then
  echo "❌ Error: expected $EXPECTED_PODS Pods, but found ${#POD_IPS[@]}"
  kubectl get pods -n "$NAMESPACE" -l "$APP_LABEL"
  exit 1
fi

echo "Pods: ${POD_NAMES[*]}"
echo "Pod IPs: ${POD_IPS[*]}"

echo "🚀 Initializing the Redis cluster with redis:6.2.6..."
# Run redis-cli from a temporary Pod; the arguments are passed directly, no shell wrapper needed
kubectl run -it --rm --restart=Never \
  --namespace "$NAMESPACE" \
  redis-admin \
  --image=redis:6.2.6 \
  -- redis-cli --cluster create \
  "${POD_IPS[0]}:6379" \
  "${POD_IPS[1]}:6379" \
  "${POD_IPS[2]}:6379" \
  "${POD_IPS[3]}:6379" \
  "${POD_IPS[4]}:6379" \
  "${POD_IPS[5]}:6379" \
  --cluster-replicas "$REPLICA_COUNT" \
  --cluster-yes

echo "✅ Redis cluster initialization complete!"
✅ Usage:
chmod +x init-redis-cluster.sh
./init-redis-cluster.sh
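After the script finishes, a quick write-and-read round trip confirms that the cluster actually serves requests. The sketch below is one way to do that; `smoke_test` and the key name `smoke:test` are hypothetical, and the key is left in place afterwards.

```shell
# Hypothetical helper: write a key through one node and read it back.
# `redis-cli -c` follows cluster redirects, so any node can be used.
smoke_test() {
  local namespace="${1:-cka}" pod="${2:-redis-0}" key="smoke:test" value
  kubectl exec -n "$namespace" "$pod" -- redis-cli -c set "$key" ok >/dev/null || return 1
  # Strip a possible trailing carriage return before comparing
  value="$(kubectl exec -n "$namespace" "$pod" -- redis-cli -c get "$key" | tr -d '\r')"
  [ "$value" = "ok" ]
}
```

For example, `smoke_test cka redis-0 && echo "cluster accepts reads and writes"` prints the message only if the value read back matches what was written.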
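✅ Summary
- The `redis-cli` in the official `redis:4.0.14` image does not support the `--cluster` option; this was the key obstacle during deployment.
- A newer `redis-cli` (e.g. from `redis:6.2.6`) must be used to manage the Redis 4.0.14 cluster.
- Initialization must use Pod IPs to avoid the "same host" error caused by the ClusterIP Service.
🎉 This document faithfully reproduces the troubleshooting process and can serve as a standard operating procedure.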