1. Overview
RocketMQ Operator manages RocketMQ service instances deployed on a Kubernetes cluster. It is built with the Operator SDK, which is part of the Operator Framework.
2. Quick Start
2.1 Deploy the RocketMQ Operator
1) Clone the project on the master node of your Kubernetes cluster:
$ git clone https://github.com/apache/rocketmq-operator.git
$ cd rocketmq-operator
2) To deploy the RocketMQ Operator on your Kubernetes cluster, run:
$ make deploy
If you get the error rocketmq-operator/bin/controller-gen: No such file or directory, run go version to check your Golang version; it should be Go 1.16. Then run go mod tidy before running make deploy again.
Install Go 1.16:
wget https://dl.google.com/go/go1.16.15.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.16.15.linux-amd64.tar.gz
echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.profile
source ~/.profile
Check the Go version:
root@k8s-master01:~/rocketmq-operator# go version
go version go1.16.15 linux/amd64
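To avoid the controller-gen error described above, a small guard can stop the build early when the wrong Go toolchain is on the PATH. This is a minimal sketch and is not part of the rocketmq-operator repository. The function names go_minor_of and check_go_for_deploy are hypothetical, and the only requirement taken from the text is Go 1.16.

```shell
# go_minor_of: extract "major.minor" from a "go version" line,
# e.g. "go version go1.16.15 linux/amd64" -> "1.16".
go_minor_of() {
  printf '%s\n' "$1" | sed -n 's/.*go\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1.\2/p'
}

# check_go_for_deploy: return 1 (with a message on stderr) unless the
# "go" binary on the PATH reports a 1.16.x version.
check_go_for_deploy() {
  local line minor
  line=$(go version 2>/dev/null) || { echo "go not found in PATH" >&2; return 1; }
  minor=$(go_minor_of "$line")
  if [ "$minor" != "1.16" ]; then
    echo "expected Go 1.16.x, found ${minor:-unknown}" >&2
    return 1
  fi
}
```

For example: check_go_for_deploy && go mod tidy && make deploy.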
Alternatively, you can deploy the RocketMQ Operator with Helm:
$ helm install rocketmq-operator charts/rocketmq-operator
3) Check the deployment status of the RocketMQ Operator with kubectl get pods, for example:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
rocketmq-operator-564b5d75d-jllzk 1/1 Running 0 108s
If the pod's image cannot be found, run the following command to build it locally. The image tag is set by the IMG parameter.
$ make docker-build IMG=apache/rocketmq-operator:0.4.0-snapshot
2.2 Define the RocketMQ Cluster
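RocketMQ Operator provides several custom resources (CRDs) that let users define the configuration of the RocketMQ component clusters to deploy, such as cluster size and the number of replica nodes. The component clusters are the Name Server cluster and the Broker cluster.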
The example directory of the cloned rocketmq-operator repository contains several sample files:
root@k8s-master01:~/rocketmq-operator/example# ls
rocketmq_v1alpha1_broker_cr.yaml rocketmq_v1alpha1_console_cr.yaml rocketmq_v1alpha1_nameservice_cr.yaml rocketmq_v1alpha1_topictransfer_cr.yaml
rocketmq_v1alpha1_cluster_service.yaml rocketmq_v1alpha1_controller_cr.yaml rocketmq_v1alpha1_rocketmq_cluster.yaml
1) Look at rocketmq_v1alpha1_nameservice_cr.yaml in the example directory. It is a sample configuration file for the NameService custom resource:
apiVersion: rocketmq.apache.org/v1alpha1
kind: NameService
metadata:
  name: name-service
spec:
  # size is the name service instance number of the name service cluster
  size: 1
  # nameServiceImage is the customized docker image repo of the RocketMQ name service
  nameServiceImage: apacherocketmq/rocketmq-nameserver:4.5.0-alpine-operator-0.3.0
  # imagePullPolicy is the image pull policy
  imagePullPolicy: Always
  # hostNetwork can be true or false
  hostNetwork: true
  # Set DNS policy for the pod.
  # Defaults to "ClusterFirst".
  # Valid values are 'ClusterFirstWithHostNet', 'ClusterFirst', 'Default' or 'None'.
  # DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy.
  # To have DNS options set along with hostNetwork, you have to specify DNS policy
  # explicitly to 'ClusterFirstWithHostNet'.
  dnsPolicy: ClusterFirstWithHostNet
  # resources describes the compute resource requirements and limits
  resources:
    requests:
      memory: "512Mi"
      cpu: "250m"
    limits:
      memory: "1024Mi"
      cpu: "500m"
  # storageMode can be EmptyDir, HostPath, StorageClass
  storageMode: HostPath
  # hostPath is the local path to store data
  hostPath: /data/rocketmq/nameserver
  # volumeClaimTemplates defines the storageClass
  volumeClaimTemplates:
    - metadata:
        name: namesrv-storage
        annotations:
          volume.beta.kubernetes.io/storage-class: rocketmq-storage
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 1Gi
It defines the size of the RocketMQ Name Server cluster (size), among other settings.
2) Check rocketmq_v1alpha1_broker_cr.yaml in the example directory. It is a sample file for the Broker custom resource:
root@k8s-master01:~/rocketmq-operator/example# cat rocketmq_v1alpha1_broker_cr.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: broker-config
data:
  BROKER_MEM: " -Xms2g -Xmx2g -Xmn1g "
  broker-common.conf: |
    # brokerClusterName, brokerName, brokerId are automatically generated by the operator and do not set it manually!!!
    deleteWhen=04
    fileReservedTime=48
    flushDiskType=ASYNC_FLUSH
    # set brokerRole to ASYNC_MASTER or SYNC_MASTER. DO NOT set to SLAVE because the replica instance will automatically be set!!!
    brokerRole=ASYNC_MASTER
---
apiVersion: rocketmq.apache.org/v1alpha1
kind: Broker
metadata:
  # name of broker cluster
  name: broker
spec:
  # size is the number of broker groups; each group contains a master broker and [replicaPerGroup] replica brokers.
  size: 1
  # nameServers is the [ip:port] list of name service
  nameServers: ""
  # replicaPerGroup is the number of replica brokers in each broker group
  replicaPerGroup: 1
  # brokerImage is the customized docker image repo of the RocketMQ broker
  brokerImage: apacherocketmq/rocketmq-broker:4.5.0-alpine-operator-0.3.0
  # imagePullPolicy is the image pull policy
  imagePullPolicy: Always
  # resources describes the compute resource requirements and limits
  resources:
    requests:
      memory: "2048Mi"
      cpu: "250m"
    limits:
      memory: "12288Mi"
      cpu: "500m"
  # allowRestart defines whether broker pods may be restarted
  allowRestart: true
  # storageMode can be EmptyDir, HostPath, StorageClass
  storageMode: HostPath
  # hostPath is the local path to store data
  hostPath: /data/rocketmq/broker
  # scalePodName is [Broker name]-[broker group number]-master-0
  scalePodName: broker-0-master-0
  # env defines custom env, e.g. BROKER_MEM
  env:
    - name: BROKER_MEM
      valueFrom:
        configMapKeyRef:
          name: broker-config
          key: BROKER_MEM
  # volumes defines the broker.conf
  volumes:
    - name: broker-config
      configMap:
        name: broker-config
        items:
          - key: broker-common.conf
            path: broker-common.conf
  # volumeClaimTemplates defines the storageClass
  volumeClaimTemplates:
    - metadata:
        name: broker-storage
        annotations:
          volume.beta.kubernetes.io/storage-class: rocketmq-storage
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 8Gi
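It defines parameters such as the size of the RocketMQ Broker cluster and the [IP:port] list of the Name Server cluster that the Brokers use when the Broker cluster is initialized.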
Note: size is the number of Broker groups. Each Broker group contains one master Broker and zero or more replica Brokers.
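Each Broker group has one master plus replicaPerGroup replicas, so the number of Broker pods is size × (1 + replicaPerGroup). The hypothetical helpers below are intended only for capacity planning. They compute that count and the total memory request. They assume every master and replica pod receives the same resources block from the Broker custom resource.

```shell
# broker_pod_count SIZE REPLICA_PER_GROUP
# Each of SIZE groups runs 1 master + REPLICA_PER_GROUP replicas.
broker_pod_count() {
  echo $(( $1 * (1 + $2) ))
}

# broker_memory_request_mi SIZE REPLICA_PER_GROUP PER_POD_MI
# Total memory request (in Mi), assuming every broker pod requests PER_POD_MI.
broker_memory_request_mi() {
  echo $(( $(broker_pod_count "$1" "$2") * $3 ))
}
```

With the example values (size: 1, replicaPerGroup: 1, requests.memory: 2048Mi), broker_memory_request_mi 1 1 2048 prints 4096. The cluster therefore needs roughly 4096Mi of schedulable memory for the Brokers alone.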
3) Check rocketmq_v1alpha1_console_cr.yaml in the example directory. It is a sample file for the Console custom resource:
root@k8s-master01:~/rocketmq-operator/example# cat rocketmq_v1alpha1_console_cr.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: console-config
data:
  application.yml: |
    server:
      port: 8080
      servlet:
        encoding:
          charset: UTF-8
          enabled: true
          force: true
    spring:
      application:
        name: rocketmq-dashboard
    logging:
      config: classpath:logback.xml
    rocketmq:
      config:
        isVIPChannel:
        timeoutMillis:
        dataPath: /tmp/rocketmq-console/data
        enableDashBoardCollect: true
        msgTrackTopicName:
        ticketKey: ticket
        loginRequired: false
        useTLS: false
    threadpool:
      config:
        coreSize: 10
        maxSize: 10
        keepAliveTime: 3000
        queueSize: 5000
  role-permission.yml: |
    rolePerms:
      ordinary:
        - /rocketmq/*.query
        - /ops/*.query
        - /dashboard/*.query
        - /topic/*.query
        - /topic/sendTopicMessage.do
        - /producer/*.query
        - /message/*.query
        - /messageTrace/*.query
        - /monitor/*.query
        - /consumer/*.query
        - /cluster/*.query
        - /dlqMessage/*.query
        - /dlqMessage/exportDlqMessage.do
        - /dlqMessage/batchResendDlqMessage.do
        - /acl/*.query
---
apiVersion: rocketmq.apache.org/v1alpha1
kind: Console
metadata:
  name: console
  namespace: default
spec:
  # nameServers is the [ip:port] list of name service
  nameServers: ""
  # consoleDeployment defines the console deployment
  consoleDeployment:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: rocketmq-console
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: rocketmq-console
      template:
        metadata:
          labels:
            app: rocketmq-console
        spec:
          containers:
            - name: console
              image: apacherocketmq/rocketmq-console:2.0.0
              args: ["--spring.config.location=/apps/data/console/config/"]
              ports:
                - containerPort: 8080
              volumeMounts:
                - mountPath: "/apps/data/console/config"
                  name: console-config
          volumes:
            - name: console-config
              configMap:
                name: console-config
This file defines the web management console (Console) for Apache RocketMQ.
2.3 Create the RocketMQ Cluster
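Note that, before you deploy, the example YAML files all use the HostPath storage mode by default. If your local Kubernetes cluster has a StorageClass, manually change spec.storageMode to StorageClass and specify the storage class in volumeClaimTemplates (via storageClassName). The examples currently reference a class named rocketmq-storage through the volume.beta.kubernetes.io/storage-class annotation.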
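This edit can be scripted. The sketch below is a hypothetical helper, use_storage_class, that filters an example file from stdin to stdout. It switches storageMode from HostPath to StorageClass and points the existing volume.beta.kubernetes.io/storage-class annotation at the class you pass in. Rewriting the annotation instead of adding storageClassName is an assumption about what your cluster accepts. List the available classes first with kubectl get storageclass.

```shell
# use_storage_class CLASS < input.yaml > output.yaml
# 1) storageMode: HostPath      -> storageMode: StorageClass (indentation kept)
# 2) the storage-class annotation -> the given CLASS
# Comment lines (starting with "#") are left untouched.
use_storage_class() {
  sed -e 's/^\([[:space:]]*storageMode:\)[[:space:]]*HostPath/\1 StorageClass/' \
      -e "s|^\([[:space:]]*volume\.beta\.kubernetes\.io/storage-class:\).*|\1 $1|"
}
```

For example, use_storage_class my-class < example/rocketmq_v1alpha1_broker_cr.yaml > /tmp/broker.yaml. Review /tmp/broker.yaml before applying it.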
1) Deploy the Name Server cluster by running:
$ kubectl apply -f example/rocketmq_v1alpha1_nameservice_cr.yaml
nameservice.rocketmq.apache.org/name-service created
Check the pod status in the Kubernetes cluster:
$ kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
name-service-0 1/1 Running 0 6m59s 192.168.160.202 k8s-node-02 <none> <none>
rocketmq-operator-54b66c5c6b-5jngk 1/1 Running 0 31m 10.244.66.2 k8s-node-02 <none> <none>
You can see that one Name Server pod (name-service-0) has started, along with its IP address. In rocketmq_v1alpha1_broker_cr.yaml and rocketmq_v1alpha1_console_cr.yaml, replace the default value of nameServers with the actual IP address of that Name Server pod followed by :9876.
If you changed the default size of the NameService and started several Name Servers, set nameServers to a semicolon-separated list such as IP1:9876;IP2:9876.
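Collecting the IPs and building this list by hand becomes error-prone once there are several Name Servers. The hypothetical helpers below show one way to automate it. They assume the pods are named name-service-0, name-service-1, and so on, as in the output above. Because the example sets hostNetwork: true, each pod IP is also its node's IP.

```shell
# join_name_servers IP... -> "IP1:9876;IP2:9876" (9876 is the Name Server port used above)
join_name_servers() {
  local out="" ip
  for ip in "$@"; do
    out="${out:+$out;}$ip:9876"   # add ";" only between entries
  done
  printf '%s\n' "$out"
}

# name_server_ips SIZE: print the pod IP of name-service-0 .. name-service-(SIZE-1),
# one per line. Assumes the StatefulSet-style pod names seen above.
name_server_ips() {
  local i=0
  while [ "$i" -lt "$1" ]; do
    kubectl get pod "name-service-$i" -o jsonpath='{.status.podIP}'
    echo
    i=$((i + 1))
  done
}
```

For example, join_name_servers $(name_server_ips 2) prints a value of the form IP1:9876;IP2:9876 that can go into nameServers.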
2) Deploy the Broker cluster by running:
$ kubectl apply -f example/rocketmq_v1alpha1_broker_cr.yaml
broker.rocketmq.apache.org/broker created
The Broker containers are then created automatically. When they are up, the pod status looks similar to this:
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
broker-0-master-0 1/1 Running 0 6m12s
broker-0-replica-1-0 1/1 Running 0 6m12s
name-service-0 1/1 Running 0 16m
rocketmq-operator-54b66c5c6b-5jngk 1/1 Running 0 41m
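To block until the Broker pods are ready (for example, in a setup script), you can derive the expected pod names and pass them to kubectl wait. The sketch assumes the naming pattern visible in the listing above: <name>-<group>-master-0 and <name>-<group>-replica-<n>-0. expected_broker_pods and wait_for_brokers are hypothetical helpers. kubectl wait fails immediately for a pod that has not been created yet, so run this after the Operator has created the pods.

```shell
# expected_broker_pods NAME SIZE REPLICA_PER_GROUP
# Print the pod names the pattern above predicts, one per line.
expected_broker_pods() {
  local name="$1" size="$2" replicas="$3" g=0 r
  while [ "$g" -lt "$size" ]; do
    echo "$name-$g-master-0"
    r=1
    while [ "$r" -le "$replicas" ]; do
      echo "$name-$g-replica-$r-0"
      r=$((r + 1))
    done
    g=$((g + 1))
  done
}

# wait_for_brokers NAME SIZE REPLICA_PER_GROUP
# Wait up to 300s per pod for the Ready condition; stop at the first failure.
wait_for_brokers() {
  local pod
  for pod in $(expected_broker_pods "$@"); do
    kubectl wait --for=condition=Ready "pod/$pod" --timeout=300s || return 1
  done
}
```

With the example Broker resource, the call would be wait_for_brokers broker 1 1.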
3) Deploy the console by running:
$ kubectl apply -f example/rocketmq_v1alpha1_console_cr.yaml
Check the deployment:
root@k8s-master01:~/rocketmq-operator/example# kubectl get pod
NAME READY STATUS RESTARTS AGE
broker-0-master-0 1/1 Running 0 49m
broker-0-replica-1-0 1/1 Running 0 49m
console-549d547cb6-cknm2 1/1 Running 0 30m
name-service-0 1/1 Running 0 60m
rocketmq-operator-54b66c5c6b-5jngk 1/1 Running 3 (17m ago) 84m
4) Deploy the Service and access the RocketMQ console:
$ kubectl apply -f example/rocketmq_v1alpha1_cluster_service.yaml
service/console-service created
Check the Service:
root@k8s-master01:~/rocketmq-operator# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
console-service NodePort 10.244.35.77 <none> 8080:30000/TCP 49s
You can then open the RocketMQ console at http://<node IP>:30000, where <node IP> is the IP address of a cluster node.
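Instead of reading the NodePort from the table, you can query it directly. This is a minimal sketch. It assumes the Service is named console-service as above, that its first port entry is the console port, and that the first node's InternalIP is reachable from your browser. The helper names are hypothetical.

```shell
# console_url HOST PORT -> http://HOST:PORT
console_url() {
  printf 'http://%s:%s\n' "$1" "$2"
}

# NodePort assigned to the first port of console-service.
console_node_port() {
  kubectl get svc console-service -o jsonpath='{.spec.ports[0].nodePort}'
}

# InternalIP of the first node returned by the API server.
first_node_ip() {
  kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}'
}
```

For example, console_url "$(first_node_ip)" "$(console_node_port)".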
Congratulations, you have created your RocketMQ cluster with the RocketMQ Operator!
3. Horizontal Scaling
3.1 Scaling the Name Server Cluster
If the current Name Server cluster size does not meet your needs, you can scale the Name Server cluster in or out with the RocketMQ Operator.
For example, to scale out your Name Server cluster, change the size field in the NameService custom resource file (e.g. rocketmq_v1alpha1_nameservice_cr.yaml) to the number of instances you want, for example from size: 1 to size: 2.
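Note: if your Broker custom resource file uses image version 4.5.0 or earlier, make sure that it sets allowRestart: true. This lets the Brokers register with the newly added Name Servers through a rolling restart, which does not interrupt the service the cluster provides. If allowRestart is false, change it to allowRestart: true and run kubectl apply -f example/rocketmq_v1alpha1_broker_cr.yaml to apply the change.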
After changing size, simply run:
kubectl apply -f example/rocketmq_v1alpha1_nameservice_cr.yaml
The new Name Servers are then deployed automatically. At the same time, the Operator notifies all Brokers to update their Name Server list so that they register with the new Name Servers.
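Note: with allowRestart: true, the Broker cluster restarts gradually in a rolling fashion to pick up the new parameter. Producers and consumers outside the cluster do not notice this process.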
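As an alternative to editing the file, you can patch size on the live object. The sketch below first checks that the Broker custom resource named broker has allowRestart set to true. This is a conservative guard, since the text only requires it for images 4.5.0 or earlier. It then applies a JSON merge patch. The function names are hypothetical. A later kubectl apply of the unchanged example file would set size back to the value in the file.

```shell
# size_patch N -> JSON merge patch that sets spec.size to N
size_patch() {
  printf '{"spec":{"size":%d}}\n' "$1"
}

# scale_name_service N: refuse unless broker/broker allows restarts, then patch.
scale_name_service() {
  local allow
  allow=$(kubectl get broker.rocketmq.apache.org broker -o jsonpath='{.spec.allowRestart}') || return 1
  if [ "$allow" != "true" ]; then
    echo "broker/broker does not have spec.allowRestart: true; set it before scaling" >&2
    return 1
  fi
  kubectl patch nameservice.rocketmq.apache.org name-service --type merge -p "$(size_patch "$1")"
}
```

For example, scale_name_service 2.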
3.2 Scaling the Broker Cluster
3.2.1 Scaling Out a Broker Cluster Without Ordered Messages
As your business grows, the original Broker cluster may no longer meet your production needs. You can scale out the Broker cluster with the RocketMQ Operator:
- Change size in the Broker custom resource file to the Broker cluster size you want, for example from size: 1 to size: 2.
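- Configure the source Broker pod. The source Broker is the pod whose Broker metadata (including Topic and subscription information) is synchronized to the newly added Brokers. The source Broker pod setting defaults to: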
...
# scalePodName is broker-[broker group number]-master-0
scalePodName: broker-0-master-0
...
This means the metadata of broker-0-master-0 is synchronized to all newly added Brokers.
- Apply the modified Broker custom resource file:
kubectl apply -f example/rocketmq_v1alpha1_broker_cr.yaml
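The Operator then creates the new Broker groups automatically. Before each new Broker starts, the Operator copies the metadata files to it, so every new Broker loads the source Broker's metadata (Topics and subscription information) on startup.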
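The same change can be made with a patch. The hypothetical helpers below do two things. One builds the scalePodName value from the pattern in the YAML comment ([Broker name]-[broker group number]-master-0). The other builds a merge patch that sets size and scalePodName together.

```shell
# scale_pod_name BROKER_NAME GROUP -> "<name>-<group>-master-0"
scale_pod_name() {
  printf '%s-%s-master-0\n' "$1" "$2"
}

# broker_scale_patch SIZE SOURCE_POD -> JSON merge patch for the Broker resource
broker_scale_patch() {
  printf '{"spec":{"size":%d,"scalePodName":"%s"}}\n' "$1" "$2"
}
```

For example: kubectl patch broker.rocketmq.apache.org broker --type merge -p "$(broker_scale_patch 2 "$(scale_pod_name broker 0)")". As with the Name Server patch, a later kubectl apply of the unchanged file would revert the change.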
4. Topic Migration
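Topic migration means moving a Topic's workload from a source cluster to a target cluster without affecting the business. Users typically do this to decommission the source cluster or to reduce its workload. Topic migration usually consists of the following 7 steps: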
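- Add all consumer groups of the Topic to be migrated to the target cluster.
- Add the Topic to be migrated to the target cluster.
- Stop new messages from being written to the Topic in the source cluster.
- Check the consumption progress of all related consumer groups until all messages in the source cluster have been consumed and nothing is backlogged.
- Once all related messages in the source cluster have been consumed, delete the Topic from the source cluster.
- Delete all related consumer groups from the source cluster.
- Create the RETRY Topic in the target cluster.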
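For reference, the seven manual steps map roughly onto mqadmin subcommands. The sketch below only prints a plan and runs nothing, and it is not how the Operator implements TopicTransfer. Several details are assumptions to verify against mqadmin in your RocketMQ version: the cluster names, the perm value 4 (read-only) used to stop writes, and the %RETRY%<group> retry-topic name, which follows the usual RocketMQ convention.

```shell
# print_topic_migration_plan TOPIC SRC_CLUSTER DST_CLUSTER NAMESRV GROUP...
# Hypothetical dry-run planner: echoes one mqadmin command per step.
print_topic_migration_plan() {
  local topic="$1" src="$2" dst="$3" ns="$4" mq="sh bin/mqadmin" g
  shift 4
  # 1. consumer groups on the target cluster
  for g in "$@"; do echo "$mq updateSubGroup -c $dst -g $g -n '$ns'"; done
  # 2. the Topic on the target cluster
  echo "$mq updateTopic -c $dst -t $topic -n '$ns'"
  # 3. make the source Topic read-only (perm 4) so no new messages are written
  echo "$mq updateTopicPerm -c $src -t $topic -p 4 -n '$ns'"
  # 4. repeat until every group shows no backlog
  for g in "$@"; do echo "$mq consumerProgress -g $g -n '$ns'"; done
  # 5. delete the Topic from the source cluster
  echo "$mq deleteTopic -c $src -t $topic -n '$ns'"
  # 6. delete the consumer groups from the source cluster
  for g in "$@"; do echo "$mq deleteSubGroup -c $src -g $g -n '$ns'"; done
  # 7. retry topics on the target cluster
  for g in "$@"; do echo "$mq updateTopic -c $dst -t %RETRY%$g -n '$ns'"; done
}
```

For example: print_topic_migration_plan TopicTest broker-0 broker-1 <name-server-ip>:9876 <consumer-group>.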
The TopicTransfer custom resource provided by the Operator can carry out Topic migration for you automatically. Simply configure the custom resource file example/rocketmq_v1alpha1_topictransfer_cr.yaml:
apiVersion: rocketmq.apache.org/v1alpha1
kind: TopicTransfer
metadata:
  name: topictransfer
spec:
  # topic defines which topic to be transferred
  topic: TopicTest
  # sourceCluster define the source cluster
  sourceCluster: broker-0
  # targetCluster defines the target cluster
  targetCluster: broker-1
Then apply the TopicTransfer custom resource:
$ kubectl apply -f example/rocketmq_v1alpha1_topictransfer_cr.yaml
The Operator then carries out the Topic migration automatically.
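If an error occurs during the migration, the Operator automatically rolls back all intermediate operations. This restores the cluster to its state before the TopicTransfer custom resource was applied and keeps the Topic migration atomic.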
You can check and verify the state of the Topic migration through the Operator logs or with RocketMQ's admin tool:
$ kubectl logs -f [operator-pod-name]
$ sh bin/mqadmin consumerprogress -g [consumer-group] -n [name-server-ip]:9876
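To fill in [operator-pod-name] without copying it by hand, the hypothetical filter below reads kubectl get pods output from stdin. It picks the first pod whose name starts with rocketmq-operator-. That prefix matches the operator pods shown earlier, but it is an assumption if you installed the Operator under a different name.

```shell
# operator_pod_name: first pod name starting with "rocketmq-operator-" on stdin.
operator_pod_name() {
  awk '$1 ~ /^rocketmq-operator-/ { print $1; exit }'
}
```

For example: kubectl logs -f "$(kubectl get pods --no-headers | operator_pod_name)".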
5. Cleanup
To take the RocketMQ Broker cluster offline, run:
$ kubectl delete -f example/rocketmq_v1alpha1_broker_cr.yaml
To take the RocketMQ Name Server cluster offline, run:
$ kubectl delete -f example/rocketmq_v1alpha1_nameservice_cr.yaml
To clean up the entire RocketMQ cluster together with the Operator, run:
$ ./purge-operator.sh
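You can also remove the components one by one, including the console and its Service, which the individual commands above do not cover. The hypothetical sketch below deletes them in reverse order of creation. Setting DRY_RUN=1 prints the commands instead of running them, and --ignore-not-found makes it safe to rerun. Kubernetes does not delete HostPath directories, so data under /data/rocketmq may remain on the nodes.

```shell
# maybe_run CMD...: print CMD when DRY_RUN=1, otherwise execute it.
maybe_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# teardown_rocketmq: delete Service, Console, Broker, then NameService
# (reverse of the creation order in section 2.3), from the repo root.
teardown_rocketmq() {
  local f
  for f in rocketmq_v1alpha1_cluster_service.yaml \
           rocketmq_v1alpha1_console_cr.yaml \
           rocketmq_v1alpha1_broker_cr.yaml \
           rocketmq_v1alpha1_nameservice_cr.yaml; do
    maybe_run kubectl delete -f "example/$f" --ignore-not-found
  done
}
```

For example, DRY_RUN=1 teardown_rocketmq shows what would be deleted.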