
Amazon Managed Streaming for Apache Kafka: Developer Guide

Table of Contents
Welcome ........................................................................................................................................... 1
What is Amazon MSK? ................................................................................................................ 1
Setting up ......................................................................................................................................... 4
Sign up for Amazon ................................................................................................................... 4
Download libraries and tools ....................................................................................................... 4
Getting started .................................................................................................................................. 5
Step 1: Create a cluster .............................................................................................................. 5
Step 2: Create a client machine ................................................................................................... 5
Step 3: Create a topic ................................................................................................................. 6
Step 4: Produce and consume data .............................................................................................. 7
Step 5: View metrics .................................................................................................................. 8
Step 6: Delete the resources ........................................................................................................ 8
How it works ................................................................................................................................... 10
Creating a cluster ..................................................................................................................... 10
Broker types .................................................................................................................... 10
Creating a cluster using the Amazon Web Services Management Console ................................. 11
Creating a cluster using the Amazon CLI ............................................................................. 12
Creating a cluster with a custom MSK configuration using the Amazon CLI ............................... 13
Creating a cluster using the API ......................................................................................... 13
Deleting a cluster ..................................................................................................................... 13
Deleting a cluster using the Amazon Web Services Management Console ................................. 14
Deleting a cluster using the Amazon CLI ............................................................................. 14
Deleting a cluster using the API ......................................................................................... 14
Getting the Apache ZooKeeper connection string ......................................................................... 14
Getting the Apache ZooKeeper connection string using the Amazon Web Services Management Console ..... 14
Getting the Apache ZooKeeper connection string using the Amazon CLI .................................. 14
Getting the Apache ZooKeeper connection string using the API .............................................. 15
Getting the bootstrap brokers .................................................................................................... 16
Getting the bootstrap brokers using the Amazon Web Services Management Console ................ 16
Getting the bootstrap brokers using the Amazon CLI ............................................................ 16
Getting the bootstrap brokers using the API ........................................................................ 16
Listing clusters ......................................................................................................................... 17
Listing clusters using the Amazon Web Services Management Console .................................... 17
Listing clusters using the Amazon CLI ................................................................................. 17
Listing clusters using the API ............................................................................................. 17
Provisioning storage throughput ................................................................................................ 17
Throughput bottlenecks .................................................................................................... 17
Measuring storage throughput ........................................................................................... 18
Configuration update ....................................................................................................... 18
Provisioning storage throughput using the Amazon Web Services Management Console ............. 18
Provisioning storage throughput using the Amazon CLI ......................................................... 19
Provisioning storage throughput using the API ..................................................................... 20
Scaling up broker storage .......................................................................................................... 20
Automatic scaling ............................................................................................................. 20
Manual scaling ................................................................................................................. 22
Updating the broker type .......................................................................................................... 22
Updating the broker type using the Amazon Web Services Management Console ...................... 23
Updating the broker type using the Amazon CLI .................................................................. 23
Updating the broker type using the API .............................................................................. 24
Updating the configuration of a cluster ....................................................................................... 24
Updating the configuration of a cluster using the Amazon CLI ................................................ 25
Updating the configuration of a cluster using the API ........................................................... 26
Expanding a cluster .................................................................................................................. 26


Expanding a cluster using the Amazon Web Services Management Console .............................. 26
Expanding a cluster using the Amazon CLI .......................................................................... 27
Expanding a cluster using the API ...................................................................................... 28
Updating security ..................................................................................................................... 28
Updating a cluster's security settings using the Amazon Web Services Management Console ....... 28
Updating a cluster's security settings using the Amazon CLI ................................................... 29
Updating a cluster's security settings using the API ............................................................... 30
Rebooting a broker for a cluster ................................................................................................ 30
Rebooting a broker using the Amazon Web Services Management Console ............................... 30
Rebooting a broker using the Amazon CLI ........................................................................... 30
Rebooting a broker using the API ....................................................................................... 30
Tagging a cluster ...................................................................................................................... 31
Tag basics ....................................................................................................................... 32
Tracking costs using tagging .............................................................................................. 32
Tag restrictions ................................................................................................................ 32
Tagging resources using the Amazon MSK API ..................................................................... 33
Configuration ................................................................................................................................... 34
Custom configurations .............................................................................................................. 34
Dynamic configuration ...................................................................................................... 39
Topic-level configuration ................................................................................................... 40
States ............................................................................................................................. 40
Default configuration ................................................................................................................ 40
Configuration operations ........................................................................................................... 42
Create configuration ......................................................................................................... 42
To update an MSK configuration ........................................................................................ 43
To delete an MSK configuration ......................................................................................... 44
To describe an MSK configuration ...................................................................................... 44
To describe an MSK configuration revision ........................................................................... 44
To list all MSK configurations in your account for the current Region ....................................... 45
MSK Serverless ................................................................................................................................ 47
Getting started tutorial ............................................................................................................. 47
Step 1: Create a cluster ..................................................................................................... 48
Step 2: Create an IAM role ................................................................................................ 49
Step 3: Create a client machine .......................................................................................... 50
Step 4: Create a topic ....................................................................................................... 51
Step 5: Produce and consume data ..................................................................................... 52
Step 6: Delete resources .................................................................................................... 52
Configuration ........................................................................................................................... 53
Monitoring ............................................................................................................................... 53
Cluster states ................................................................................................................................... 55
Security ........................................................................................................................................... 57
Data protection ........................................................................................................................ 57
Encryption ....................................................................................................................... 58
How do I get started with encryption? ................................................................................ 59
Authentication and authorization for Amazon MSK APIs ................................................................ 61
How Amazon MSK works with IAM ..................................................................................... 61
Identity-based policy examples .......................................................................................... 64
Service-linked roles .......................................................................................................... 67
Amazon managed policies ................................................................................................. 68
Troubleshooting ............................................................................................................... 72
Authentication and authorization for Apache Kafka APIs ............................................................... 73
IAM access control ............................................................................................................ 73
Mutual TLS authentication ................................................................................................. 81
SASL/SCRAM authentication .............................................................................................. 85
Apache Kafka ACLs ........................................................................................................... 88
Changing security groups .......................................................................................................... 89
Controlling access to Apache ZooKeeper ..................................................................................... 90


To place your Apache ZooKeeper nodes in a separate security group ....................................... 90


Using TLS security with Apache ZooKeeper .......................................................................... 91
Logging ................................................................................................................................... 92
Broker logs ...................................................................................................................... 92
CloudTrail events .............................................................................................................. 94
Compliance validation ............................................................................................................... 97
Resilience ................................................................................................................................ 97
Infrastructure security ............................................................................................................... 97
Connecting to an MSK cluster ............................................................................................................ 99
Public access ............................................................................................................................ 99
Access from within Amazon ..................................................................................................... 101
Amazon VPC peering ...................................................................................................... 102
Amazon Direct Connect ................................................................................................... 102
Amazon Transit Gateway ................................................................................................. 102
VPN connections ............................................................................................................ 102
REST proxies .................................................................................................................. 102
Multiple Region multi-VPC connectivity ............................................................................. 102
EC2-Classic .................................................................................................................... 102
Port information .................................................................................................................... 103
Migration ....................................................................................................................................... 104
Migrating your Apache Kafka cluster to Amazon MSK .................................................................. 104
Migrating from one Amazon MSK cluster to another ................................................................... 105
MirrorMaker 1.0 best practices ................................................................................................. 105
MirrorMaker 2.* advantages ..................................................................................................... 106
Monitoring a cluster ........................................................................................................................ 107
Amazon MSK metrics for monitoring with CloudWatch ................................................................ 107
DEFAULT Level monitoring .............................................................................................. 107
PER_BROKER Level monitoring ......................................................................................... 112
PER_TOPIC_PER_BROKER Level monitoring ....................................................................... 115
PER_TOPIC_PER_PARTITION Level monitoring ................................................................. 115
Viewing Amazon MSK metrics using CloudWatch ........................................................................ 116
Consumer-lag monitoring ........................................................................................................ 116
Open monitoring with Prometheus ........................................................................................... 117
Creating an Amazon MSK cluster with open monitoring enabled ........................................... 117
Enabling open monitoring for an existing Amazon MSK cluster ............................................. 117
Setting up a Prometheus host on an Amazon EC2 instance .................................................. 118
Prometheus metrics ........................................................................................................ 119
Storing Prometheus metrics in Amazon Managed Service for Prometheus ....................... 119
Cruise Control ................................................................................................................................ 121
Quota ........................................................................................................................................... 123
Amazon MSK quota ................................................................................................................ 123
Quota for serverless clusters .................................................................................................... 123
MSK Connect quota ................................................................................................................ 124
Resources ...................................................................................................................................... 125
Apache Kafka versions .................................................................................................................... 126
Supported Apache Kafka versions ............................................................................................. 126
Apache Kafka version 3.2.0 .............................................................................................. 126
Apache Kafka version 3.1.1 .............................................................................................. 126
Apache Kafka version 2.8.1 .............................................................................................. 127
Apache Kafka version 2.8.0 .............................................................................................. 127
Apache Kafka version 2.7.2 .............................................................................................. 127
Apache Kafka version 2.7.1 .............................................................................................. 127
Apache Kafka version 2.6.3 .............................................................................................. 127
Apache Kafka version 2.6.2 [recommended] ....................................................................... 127
Apache Kafka version 2.7.0 .............................................................................................. 127
Apache Kafka version 2.6.1 .............................................................................................. 127
Apache Kafka version 2.6.0 .............................................................................................. 127


Apache Kafka version 2.5.1 .............................................................................................. 127


Amazon MSK bug-fix version 2.4.1.1 ................................................................................. 128
Apache Kafka version 2.4.1 (use 2.4.1.1 instead) ................................................................. 128
Apache Kafka version 2.3.1 .............................................................................................. 129
Apache Kafka version 2.2.1 .............................................................................................. 129
Apache Kafka version 1.1.1 (for existing clusters only) ......................................................... 129
Updating the Apache Kafka version .......................................................................................... 129
Troubleshooting ............................................................................................................................. 132
Consumer group stuck in PreparingRebalance state ............................................................... 132
Static membership protocol ............................................................................................. 132
Identify and reboot ......................................................................................................... 133
Error delivering broker logs to Amazon CloudWatch Logs ............................................................ 133
No default security group ........................................................................................................ 133
Cluster appears stuck in the CREATING state .............................................................................. 134
Cluster state goes from CREATING to FAILED ............................................................................. 134
Cluster state is ACTIVE but producers cannot send data or consumers cannot receive data ................ 134
Amazon CLI doesn't recognize Amazon MSK .............................................................................. 134
Partitions go offline or replicas are out of sync .......................................................................... 134
Disk space is running low ........................................................................................................ 134
Memory running low .............................................................................................................. 134
Producer gets NotLeaderForPartitionException ........................................................................... 135
Under-replicated partitions (URP) greater than zero .................................................................... 135
Cluster has topics called __amazon_msk_canary and __amazon_msk_canary_state .......................... 135
Partition replication fails ......................................................................................................... 135
Unable to access cluster that has public access turned on ............................................................ 135
Unable to access cluster from within Amazon: Networking issues .................................................. 136
Amazon EC2 client and MSK cluster in the same VPC .......................................................... 136
Amazon EC2 client and MSK cluster in different VPCs .......................................................... 137
On-premises client .......................................................................................................... 137
Amazon Direct Connect ................................................................................................... 137
Failed authentication: Too many connects .................................................................................. 137
MSK Serverless: Cluster creation fails ........................................................................................ 137
Best practices ................................................................................................................................. 138
Right-size your cluster: Number of partitions per broker ............................................................. 138
Right-size your cluster: Number of brokers per cluster ................................................................ 138
Build highly available clusters .................................................................................................. 139
Monitor CPU usage ................................................................................................................. 139
Monitor disk space .................................................................................................................. 140
Adjust data retention parameters ............................................................................................. 140
Monitor Apache Kafka memory ................................................................................................ 141
Don't add non-MSK brokers ..................................................................................................... 141
Enable in-transit encryption ..................................................................................................... 141
Reassign partitions ................................................................................................................. 141
Amazon glossary ............................................................................................................................ 142


Welcome to the Amazon MSK Developer Guide
Welcome to the Amazon MSK Developer Guide. The following topics can help you get started using this
guide, based on what you're trying to do.

• Create an Amazon MSK cluster by following the Getting started using Amazon MSK (p. 5) tutorial.
• Dive deeper into the functionality of Amazon MSK in Amazon MSK: How it works (p. 10).
• Run Apache Kafka without having to manage and scale cluster capacity with MSK Serverless (p. 47).

For highlights, product details, and pricing, see the service page for Amazon MSK.

What is Amazon MSK?


Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that enables
you to build and run applications that use Apache Kafka to process streaming data. Amazon MSK
provides the control-plane operations, such as those for creating, updating, and deleting clusters. It
lets you use Apache Kafka data-plane operations, such as those for producing and consuming data. It
runs open-source versions of Apache Kafka. This means existing applications, tooling, and plugins from
partners and the Apache Kafka community are supported without requiring changes to application code.
You can use Amazon MSK to create clusters that use any of the Apache Kafka versions listed under the
section called “Supported Apache Kafka versions” (p. 126).

The following diagram provides an overview of how Amazon MSK works.


The diagram demonstrates the interaction between the following components:

• Broker nodes — When creating an Amazon MSK cluster, you specify how many broker nodes you want
Amazon MSK to create in each Availability Zone. In the example cluster shown in this diagram, there's
one broker per Availability Zone. Each Availability Zone has its own virtual private cloud (VPC) subnet.
• ZooKeeper nodes — Amazon MSK also creates the Apache ZooKeeper nodes for you. Apache
ZooKeeper is an open-source server that enables highly reliable distributed coordination.
• Producers, consumers, and topic creators — Amazon MSK lets you use Apache Kafka data-plane
operations to create topics and to produce and consume data.


• Cluster operations — You can use the Amazon Web Services Management Console, the Amazon
Command Line Interface (Amazon CLI), or the APIs in the SDK to perform control-plane operations. For
example, you can create or delete an Amazon MSK cluster, list all the clusters in an account, view the
properties of a cluster, and update the number and type of brokers in a cluster.

Amazon MSK detects and automatically recovers from the most common failure scenarios for clusters so
that your producer and consumer applications can continue their write and read operations with minimal
impact. When Amazon MSK detects a broker failure, it mitigates the failure or replaces the unhealthy
or unreachable broker with a new one. In addition, where possible, it reuses the storage from the older
broker to reduce the data that Apache Kafka needs to replicate. Your availability impact is limited to the
time required for Amazon MSK to complete the detection and recovery. After a recovery, your producer
and consumer apps can continue to communicate with the same broker IP addresses that they used
before the failure.


Setting up Amazon MSK


Before you use Amazon MSK for the first time, complete the following tasks.

Tasks
• Sign up for Amazon (p. 4)
• Download libraries and tools (p. 4)

Sign up for Amazon


When you sign up for Amazon, your Amazon Web Services account is automatically signed up for all
services in Amazon, including Amazon MSK. You are charged only for the services that you use.

If you have an Amazon account already, skip to the next task. If you don't have an Amazon account, use
the following procedure to create one.

To sign up for an Amazon Web Services account

1. Open https://portal.amazonaws.cn/billing/signup.
2. Follow the online instructions.

Part of the sign-up procedure involves receiving a phone call and entering a verification code on the
phone keypad.

Download libraries and tools


The following libraries and tools can help you work with Amazon MSK:

• The Amazon Command Line Interface (Amazon CLI) supports Amazon MSK. The Amazon CLI enables
you to control multiple Amazon Web Services from the command line and automate them through
scripts. Upgrade your Amazon CLI to the latest version to ensure that it has support for the Amazon
MSK features that are documented in this user guide. For detailed instructions on how to upgrade the
Amazon CLI, see Installing the Amazon Command Line Interface. After you install the Amazon CLI, you
must configure it. For information on how to configure the Amazon CLI, see aws configure. A short setup sketch follows this list.
• The Amazon Managed Streaming for Kafka API Reference documents the API operations that Amazon
MSK supports.
• The Amazon Web Services SDKs for Go, Java, JavaScript, .NET, Node.js, PHP, Python, and Ruby include
Amazon MSK support and samples.
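
The following is a minimal sketch of that setup, assuming you already have credentials on hand; the aws kafka help call simply confirms that your CLI build includes the Amazon MSK commands.

# Check the installed CLI version.
aws --version

# Configure credentials and a default Region (interactive prompts).
aws configure

# Confirm that this CLI build recognizes the Amazon MSK commands.
aws kafka help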


Getting started using Amazon MSK


This tutorial shows you an example of how you can create an MSK cluster, produce and consume data,
and monitor the health of your cluster using metrics. This example doesn't represent all the options you
can choose when you create an MSK cluster. In different parts of this tutorial, we choose default options
for simplicity. This doesn't mean that they're the only options that work for setting up an MSK cluster or
client instances.

Topics
• Step 1: Create an Amazon MSK cluster (p. 5)
• Step 2: Create a client machine (p. 5)
• Step 3: Create a topic (p. 6)
• Step 4: Produce and consume data (p. 7)
• Step 5: Use Amazon CloudWatch to view Amazon MSK metrics (p. 8)
• Step 6: Delete the Amazon resources created for this tutorial (p. 8)

Step 1: Create an Amazon MSK cluster


In this step of Getting Started Using Amazon MSK (p. 5), you create an Amazon MSK cluster.

To create an Amazon MSK cluster using the Amazon Web Services Management Console

1. Sign in to the Amazon Web Services Management Console, and open the Amazon MSK console at
https://console.amazonaws.cn/msk/home?region=us-east-1#/home/.
2. Choose Create cluster.
3. For Creation method, leave the Quick create option selected. The Quick create option lets you
create a cluster with default settings.
4. For Cluster name, enter a descriptive name for your cluster. For example, MSKTutorialCluster.
5. For General cluster properties, choose Provisioned as the Cluster type.
6. From the table under All cluster settings, copy the values of the following settings and save them
because you need them later in this tutorial:

• VPC
• Subnets
• Security groups associated with VPC
7. Choose Create cluster.
8. Check the cluster Status on the Cluster summary page. The status changes from Creating to Active
as Amazon MSK provisions the cluster. When the status is Active, you can connect to the cluster. For
more information about cluster status, see Cluster states (p. 55).

Next Step

Step 2: Create a client machine (p. 5)

Step 2: Create a client machine


In this step of Getting Started Using Amazon MSK (p. 5), you create a client machine. You use this client machine to create a topic and to produce and consume data. For simplicity, you'll create this client machine in the VPC that is associated with the MSK cluster so that the client can easily connect to the cluster.

To create a client machine

1. Open the Amazon EC2 console at https://console.amazonaws.cn/ec2/.


2. Choose Launch instances.
3. Enter a Name for your client machine, such as MSKTutorialClient.
4. Leave Amazon Linux 2 AMI (HVM) - Kernel 5.10, SSD Volume Type selected for Amazon Machine
Image (AMI) type.
5. Leave the t2.micro instance type selected.
6. Under Key pair (login), choose Create a new key pair. Enter MSKKeyPair for Key pair name, and
then choose Download Key Pair. Alternatively, you can use an existing key pair.
7. Choose Launch instance.
8. Choose View Instances. Then, in the Security Groups column, choose the security group that is
associated with your new instance. Copy the ID of the security group, and save it for later.
9. Open the Amazon VPC console at https://console.amazonaws.cn/vpc/.
10. In the navigation pane, choose Security Groups. Find the security group whose ID you saved in the
section called “Step 1: Create a cluster” (p. 5).
11. In the Inbound Rules tab, choose Edit inbound rules.
12. Choose Add rule.
13. In the new rule, choose All traffic in the Type column. In the second field in the Source column,
select the security group of your client machine. This is the group whose name you saved after you
launched the client machine instance.
14. Choose Save rules. Now the cluster's security group can accept traffic that comes from the client
machine's security group.
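
If you prefer to script this step, the following sketch adds an equivalent inbound rule with the Amazon CLI. The two security group IDs are placeholders for the cluster's group and the client machine's group that you saved earlier.

# Allow all traffic from the client machine's security group
# into the cluster's security group (IDs are placeholders).
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789cluster \
    --protocol all \
    --source-group sg-0123456789client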

Next Step

Step 3: Create a topic (p. 6)

Step 3: Create a topic


In this step of Getting Started Using Amazon MSK (p. 5), you install Apache Kafka client libraries and
tools on the client machine, and then you create a topic.

To create a topic on the client machine

1. Open the Amazon EC2 console at https://console.amazonaws.cn/ec2/.


2. In the navigation pane, choose Instances. Then select the check box beside the name of the client
machine that you created in Step 2: Create a client machine (p. 5).
3. Choose Actions, and then choose Connect. Follow the instructions in the console to connect to your
client machine.
4. Install Java on the client machine by running the following command:

sudo yum install java-1.8.0

5. Run the following command to download Apache Kafka.

wget https://archive.apache.org/dist/kafka/2.6.2/kafka_2.12-2.6.2.tgz


Note
If you want to use a mirror site other than the one used in this command, you can choose a
different one on the Apache website.
6. Run the following command in the directory where you downloaded the TAR file in the previous
step.

tar -xzf kafka_2.12-2.6.2.tgz

7. Go to the kafka_2.12-2.6.2 directory.


8. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
9. Wait for the status of your cluster to become Active. This might take several minutes. After the
status becomes Active, choose the cluster name. This takes you to a page containing the cluster
summary.
10. Choose View client information.
11. Copy the connection string for plaintext authentication.
12. Run the following command, replacing BootstrapServerString with the connection string that
you obtained in the previous step.

<path-to-your-kafka-installation>/bin/kafka-topics.sh --create --bootstrap-server BootstrapServerString --replication-factor 3 --partitions 1 --topic MSKTutorialTopic

If the command succeeds, you see the following message: Created topic MSKTutorialTopic.
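
To verify that the topic exists, you can list the topics on the cluster with the same tooling; this uses the standard kafka-topics.sh --list option, with BootstrapServerString as before.

<path-to-your-kafka-installation>/bin/kafka-topics.sh --list --bootstrap-server BootstrapServerString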

Next Step

Step 4: Produce and consume data (p. 7)

Step 4: Produce and consume data


In this step of Getting Started Using Amazon MSK (p. 5), you produce and consume data.

To produce and consume messages

1. Go to the bin folder of the Apache Kafka installation on the client machine, and create a text file
named client.properties with the following contents.

security.protocol=PLAINTEXT

2. Run the following command to start a console producer. Replace BootstrapServerString with the plaintext connection string that you obtained in the section called “Step 3: Create a topic” (p. 6). For instructions on how to retrieve this connection string, see Getting the bootstrap brokers for an Amazon MSK cluster (p. 16).

<path-to-your-kafka-installation>/bin/kafka-console-producer.sh --broker-list BootstrapServerString --producer.config client.properties --topic MSKTutorialTopic

3. Enter any message that you want, and press Enter. Repeat this step two or three times. Every
time you enter a line and press Enter, that line is sent to your Apache Kafka cluster as a separate
message.
4. Keep the connection to the client machine open, and then open a second, separate connection to
that machine in a new window.


5. In the following command, replace BootstrapServerString with the plaintext connection string
that you saved earlier. Then, to create a console consumer, run the following command with your
second connection to the client machine.

<path-to-your-kafka-installation>/bin/kafka-console-consumer.sh --bootstrap-server BootstrapServerString --consumer.config client.properties --topic MSKTutorialTopic --from-beginning

You start seeing the messages you entered earlier when you used the console producer command.
6. Enter more messages in the producer window, and watch them appear in the consumer window.

Next Step

Step 5: Use Amazon CloudWatch to view Amazon MSK metrics (p. 8)

Step 5: Use Amazon CloudWatch to view Amazon MSK metrics

In this step of Getting Started Using Amazon MSK (p. 5), you look at the Amazon MSK metrics in Amazon CloudWatch.

To view Amazon MSK metrics in CloudWatch

1. Open the CloudWatch console at https://console.amazonaws.cn/cloudwatch/.


2. In the navigation pane, choose Metrics.
3. Choose the All metrics tab, and then choose AWS/Kafka.
4. To view broker-level metrics, choose Broker ID, Cluster Name. For cluster-level metrics, choose
Cluster Name.
5. (Optional) In the graph pane, select a statistic and a time period, and then create a CloudWatch
alarm using these settings.
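
You can also retrieve these metrics from the command line. The following sketch queries one DEFAULT-level, cluster-level metric (ActiveControllerCount) with the Amazon CLI; the cluster name and time window are placeholder values.

aws cloudwatch get-metric-statistics \
    --namespace AWS/Kafka \
    --metric-name ActiveControllerCount \
    --dimensions Name="Cluster Name",Value=MSKTutorialCluster \
    --start-time 2022-06-01T00:00:00Z \
    --end-time 2022-06-01T01:00:00Z \
    --period 300 \
    --statistics Maximum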

Next Step

Step 6: Delete the Amazon resources created for this tutorial (p. 8)

Step 6: Delete the Amazon resources created for this tutorial

In the final step of Getting Started Using Amazon MSK (p. 5), you delete the MSK cluster and the client machine that you created for this tutorial.

To delete the resources using the Amazon Web Services Management Console

1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.


2. Choose the name of your cluster. For example, MSKTutorialCluster.
3. Choose Actions, then choose Delete.
4. Open the Amazon EC2 console at https://console.amazonaws.cn/ec2/.


5. Choose the instance that you created for your client machine, for example, MSKTutorialClient.
6. Choose Instance state, then choose Terminate instance.
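
If you prefer the CLI, the equivalent cleanup looks like the following sketch; the cluster ARN and instance ID are placeholders for the values of the resources you created.

# Delete the MSK cluster.
aws kafka delete-cluster --cluster-arn <your-cluster-ARN>

# Terminate the client machine.
aws ec2 terminate-instances --instance-ids <your-instance-id>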


Amazon MSK: How it works


An Amazon MSK cluster is the primary Amazon MSK resource that you can create in your account. The
topics in this section describe how to perform common Amazon MSK operations. For a list of all the
operations that you can perform on an MSK cluster, see the following:

• The Amazon Web Services Management Console
• The Amazon MSK API Reference
• The Amazon MSK CLI Command Reference

Topics
• Creating an Amazon MSK cluster (p. 10)
• Deleting an Amazon MSK cluster (p. 13)
• Getting the Apache ZooKeeper connection string for an Amazon MSK cluster (p. 14)
• Getting the bootstrap brokers for an Amazon MSK cluster (p. 16)
• Listing Amazon MSK clusters (p. 17)
• Provisioning storage throughput (p. 17)
• Scaling up broker storage (p. 20)
• Updating the broker type (p. 22)
• Updating the configuration of an Amazon MSK cluster (p. 24)
• Expanding an Amazon MSK cluster (p. 26)
• Updating a cluster's security settings (p. 28)
• Rebooting a broker for an Amazon MSK cluster (p. 30)
• Tagging an Amazon MSK cluster (p. 31)

Creating an Amazon MSK cluster


Important
You can't change the VPC for an Amazon MSK cluster after you create the cluster.

Before you can create an Amazon MSK cluster, you need to have an Amazon Virtual Private Cloud (VPC)
and set up subnets within that VPC.

You need two subnets in two different Availability Zones in the US West (N. California) Region. In all
other Regions where Amazon MSK is available, you can specify either two or three subnets. Your subnets
must all be in different Availability Zones. When you create a cluster, Amazon MSK distributes the broker
nodes evenly over the subnets that you specify.

Broker types
When you create an Amazon MSK cluster, you specify the type of brokers that you want it to have.
Amazon MSK supports the following broker types:

• kafka.t3.small
• kafka.m5.large, kafka.m5.xlarge, kafka.m5.2xlarge, kafka.m5.4xlarge, kafka.m5.8xlarge,
kafka.m5.12xlarge, kafka.m5.16xlarge, kafka.m5.24xlarge

M5 brokers have higher baseline throughput performance than T3 brokers and are recommended for
production workloads. M5 brokers can also have more partitions per broker than T3 brokers. Use M5
brokers if you are running larger production-grade workloads or require a greater number of partitions.
To learn more about M5 instance types, see Amazon EC2 M5 Instances.

T3 brokers have the ability to use CPU credits to temporarily burst performance. Use T3 brokers for
low-cost development, if you are testing small to medium streaming workloads, or if you have low-
throughput streaming workloads that experience temporary spikes in throughput. We recommend
that you run a proof-of-concept test to determine if T3 brokers are sufficient for production or critical
workloads. To learn more about T3 instance types, see Amazon EC2 T3 Instances.

For more information on how to choose broker types, see Best practices (p. 138).

Creating a cluster using the Amazon Web Services Management Console

1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. Choose Create cluster.
3. Specify a name for the cluster.
4. In the VPC list, choose the VPC you want to use for the cluster. You can also specify which version of
Apache Kafka you want Amazon MSK to use to create the cluster.
5. Specify two subnets if you're using one of the following Regions: South America (São Paulo), Canada
(Central), and US West (N. California). In other Regions where Amazon MSK is available, you can
specify either two or three subnets. The subnets that you specify must be in different Availability
Zones.
6. Choose the kind of configuration you want. For information about MSK configurations, see
Configuration (p. 34).
7. Specify the type and number of brokers you want MSK to create in each Availability Zone. The
minimum is one broker per Availability Zone and the maximum is 30 brokers per cluster.
8. (Optional) Assign tags to your cluster. Tags are optional. For more information, see the section called
“Tagging a cluster” (p. 31).
9. You can adjust the storage volume per broker. After you create the cluster, you can increase the
storage volume per broker but you can't decrease it.
10. Choose the settings you want for encrypting data in transit. By default, MSK encrypts data as it
transits between brokers within a cluster. If you don't want to encrypt data as it transits between
brokers, clear the check box labeled Enable encryption within the cluster.
11. Choose one of the three settings for encrypting data as it transits between clients and brokers. For
more information, see the section called “Encryption in transit” (p. 58).
12. Choose the kind of KMS key that you want to use for encrypting data at rest. For more information,
see the section called “Encryption at rest” (p. 58).
13. If you want to authenticate the identity of clients, choose Enable TLS client authentication by
selecting the box next to it. For more information about authentication, see the section called
“Mutual TLS authentication” (p. 81).
14. Choose the monitoring level you want. This determines the set of metrics you get. For more
information, see Monitoring a cluster (p. 107).
15. (Optional) Choose Advanced settings, and then choose Customize settings. You can specify one or
more security groups that you want to give access to your cluster (for example, the security groups
of client machines). If you specify security groups that were shared with you, you must ensure
that you have permissions to them. Specifically, you need the ec2:DescribeSecurityGroups
permission. For an example, see Amazon EC2: Allows Managing EC2 Security Groups Associated With
a Specific VPC, Programmatically and in the Console.
16. Choose Create cluster.


17. Check the cluster Status on the Cluster summary page. The status changes from Creating to Active
as Amazon MSK provisions the cluster. When the status is Active, you can connect to the cluster. For
more information about cluster status, see Cluster states (p. 55).

Creating a cluster using the Amazon CLI


1. Copy the following JSON and save it to a file. Name the file brokernodegroupinfo.json. Replace
the subnet IDs in the JSON with the values that correspond to your subnets. These subnets must
be in different Availability Zones. Replace "Security-Group-ID" with the ID of one or more
security groups of the client VPC. Clients associated with these security groups get access to the
cluster. If you specify security groups that were shared with you, you must ensure that you have
permissions to them. Specifically, you need the ec2:DescribeSecurityGroups permission. For
an example, see Amazon EC2: Allows Managing EC2 Security Groups Associated With a Specific VPC,
Programmatically and in the Console. Finally, save the updated JSON file on the computer where
you have the Amazon CLI installed.

{
    "InstanceType": "kafka.m5.large",
    "ClientSubnets": [
        "Subnet-1-ID",
        "Subnet-2-ID"
    ],
    "SecurityGroups": [
        "Security-Group-ID"
    ]
}

Important
Specify exactly two subnets if you are using one of the following Regions: South America
(São Paulo), Canada (Central), and US West (N. California). For other Regions where Amazon
MSK is available, you can specify either two or three subnets. The subnets that you specify
must be in distinct Availability Zones. When you create a cluster, Amazon MSK distributes
the broker nodes evenly across the subnets that you specify.
2. Run the following Amazon CLI command in the directory where you saved the
brokernodegroupinfo.json file, replacing "Your-Cluster-Name" with a name of your
choice. For "Monitoring-Level", you can specify one of the following three values: DEFAULT,
PER_BROKER, or PER_TOPIC_PER_BROKER. For information about these three different levels of
monitoring, see Monitoring a cluster (p. 107). The enhanced-monitoring parameter is optional. If you don't
specify it in the create-cluster command, you get the DEFAULT level of monitoring.

aws kafka create-cluster --cluster-name "Your-Cluster-Name" --broker-node-group-info file://brokernodegroupinfo.json --kafka-version "2.2.1" --number-of-broker-nodes 3 --enhanced-monitoring "Monitoring-Level"

The output of the command looks like the following JSON:

{
    "ClusterArn": "...",
    "ClusterName": "AWSKafkaTutorialCluster",
    "State": "CREATING"
}

Note
The create-cluster command might return an error stating that one or more subnets
belong to unsupported Availability Zones. When this happens, the error indicates which
Availability Zones are unsupported. Create subnets that don't use the unsupported
Availability Zones and try the create-cluster command again.
3. Save the value of the ClusterArn key because you need it to perform other actions on your cluster.
4. Run the following command to check your cluster STATE. The STATE value changes from CREATING
to ACTIVE as Amazon MSK provisions the cluster. When the state is ACTIVE, you can connect to the
cluster. For more information about cluster status, see Cluster states (p. 55).

aws kafka describe-cluster --cluster-arn <your-cluster-ARN>
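
If you're scripting cluster creation, you can poll for the ACTIVE state instead of re-running the command by hand. This is a minimal sketch using the CLI's standard --query option; the ARN is a placeholder.

# Poll every 30 seconds until the cluster reaches ACTIVE.
while [ "$(aws kafka describe-cluster --cluster-arn <your-cluster-ARN> \
    --query 'ClusterInfo.State' --output text)" != "ACTIVE" ]; do
    sleep 30
done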

Creating a cluster with a custom MSK configuration using the Amazon CLI

For information about custom MSK configurations and how to create them, see Configuration (p. 34).

1. Save the following JSON to a file, replacing configuration-arn with the ARN of the configuration
that you want to use to create the cluster.

{
    "Arn": configuration-arn,
    "Revision": 1
}

2. Run the create-cluster command and use the configuration-info option to point to the
JSON file you saved in the previous step. The following is an example.

aws kafka create-cluster --cluster-name ExampleClusterName --broker-node-group-info file://brokernodegroupinfo.json --kafka-version "1.1.1" --number-of-broker-nodes 3 --enhanced-monitoring PER_TOPIC_PER_BROKER --configuration-info file://configuration.json

The following is an example of a successful response after running this command.

{
    "ClusterArn": "arn:aws:kafka:us-east-1:123456789012:cluster/CustomConfigExampleCluster/abcd1234-abcd-dcba-4321-a1b2abcd9f9f-2",
    "ClusterName": "CustomConfigExampleCluster",
    "State": "CREATING"
}

Creating a cluster using the API


To create a cluster using the API, see CreateCluster.

Deleting an Amazon MSK cluster


Note
If your cluster has an auto-scaling policy, we recommend that you remove the policy before you
delete the cluster. For more information, see Automatic scaling (p. 20).


Deleting a cluster using the Amazon Web Services Management Console

1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. Choose the MSK cluster that you want to delete by selecting the check box next to it.
3. Choose Delete, and then confirm deletion.

Deleting a cluster using the Amazon CLI


Run the following command, replacing ClusterArn with the Amazon Resource Name (ARN) that you
obtained when you created your cluster. If you don't have the ARN for your cluster, you can find it by
listing all clusters. For more information, see the section called “Listing clusters” (p. 17).

aws kafka delete-cluster --cluster-arn ClusterArn

Deleting a cluster using the API


To delete a cluster using the API, see DeleteCluster.

Getting the Apache ZooKeeper connection string for an Amazon MSK cluster

Getting the Apache ZooKeeper connection string using the Amazon Web Services Management Console

1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. The table shows all the clusters for the current region under this account. Choose the name of a
cluster to view its description.
3. On the Cluster summary page, choose View client information. This shows you the bootstrap
brokers, as well as the Apache ZooKeeper connection string.

Getting the Apache ZooKeeper connection string using the Amazon CLI

1. If you don't know the Amazon Resource Name (ARN) of your cluster, you can find it by listing all the
clusters in your account. For more information, see the section called “Listing clusters” (p. 17).
2. To get the Apache ZooKeeper connection string, along with other information about your cluster,
run the following command, replacing ClusterArn with the ARN of your cluster.

aws kafka describe-cluster --cluster-arn ClusterArn

The output of this describe-cluster command looks like the following JSON example.

{
    "ClusterInfo": {
        "BrokerNodeGroupInfo": {
            "BrokerAZDistribution": "DEFAULT",
            "ClientSubnets": [
                "subnet-0123456789abcdef0",
                "subnet-2468013579abcdef1",
                "subnet-1357902468abcdef2"
            ],
            "InstanceType": "kafka.m5.large",
            "StorageInfo": {
                "EbsStorageInfo": {
                    "VolumeSize": 1000
                }
            }
        },
        "ClusterArn": "arn:aws:kafka:us-east-1:111122223333:cluster/testcluster/12345678-abcd-4567-2345-abcdef123456-2",
        "ClusterName": "testcluster",
        "CreationTime": "2018-12-02T17:38:36.75Z",
        "CurrentBrokerSoftwareInfo": {
            "KafkaVersion": "2.2.1"
        },
        "CurrentVersion": "K13V1IB3VIYZZH",
        "EncryptionInfo": {
            "EncryptionAtRest": {
                "DataVolumeKMSKeyId": "arn:aws:kms:us-east-1:555555555555:key/12345678-abcd-2345-ef01-abcdef123456"
            }
        },
        "EnhancedMonitoring": "DEFAULT",
        "NumberOfBrokerNodes": 3,
        "State": "ACTIVE",
        "ZookeeperConnectString": "10.0.1.101:2018,10.0.2.101:2018,10.0.3.101:2018"
    }
}

The previous JSON example shows the ZookeeperConnectString key in the output of the
describe-cluster command. Copy the value corresponding to this key and save it for when you
need to create a topic on your cluster.
Important
Your Amazon MSK cluster must be in the ACTIVE state for you to be able to obtain the
Apache ZooKeeper connection string. When a cluster is still in the CREATING state, the
output of the describe-cluster command doesn't include ZookeeperConnectString.
If this is the case, wait a few minutes until your cluster reaches the ACTIVE state,
and then run the describe-cluster command again.
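If you are automating this step, a JMESPath query can return just the connection string once the cluster is ACTIVE; the following is a minimal sketch, using a placeholder ARN.

aws kafka describe-cluster --cluster-arn ClusterArn --query 'ClusterInfo.ZookeeperConnectString' --output text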

Getting the Apache ZooKeeper connection string using the API
To get the Apache ZooKeeper connection string using the API, see DescribeCluster.


Getting the bootstrap brokers for an Amazon MSK cluster

Getting the bootstrap brokers using the Amazon Web Services Management Console
The term bootstrap brokers refers to a list of brokers that an Apache Kafka client can use as a starting
point to connect to the cluster. This list doesn't necessarily include all of the brokers in a cluster.

1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.


2. The table shows all the clusters for the current region under this account. Choose the name of a
cluster to view its description.
3. On the Cluster summary page, choose View client information. This shows you the bootstrap
brokers, as well as the Apache ZooKeeper connection string.

Getting the bootstrap brokers using the Amazon CLI


Run the following command, replacing ClusterArn with the Amazon Resource Name (ARN) that you
obtained when you created your cluster. If you don't have the ARN for your cluster, you can find it by
listing all clusters. For more information, see the section called “Listing clusters” (p. 17).

aws kafka get-bootstrap-brokers --cluster-arn ClusterArn

For an MSK cluster that uses the section called “IAM access control” (p. 73), the output of this
command looks like the following JSON example.

{
    "BootstrapBrokerStringSaslIam": "b-1.myTestCluster.123z8u.c2.kafka.us-west-1.amazonaws.com:9098,b-2.myTestCluster.123z8u.c2.kafka.us-west-1.amazonaws.com:9098"
}

The following example shows the bootstrap brokers for a cluster that has public access
turned on. Use the BootstrapBrokerStringPublicSaslIam for public access, and the
BootstrapBrokerStringSaslIam string for access from within Amazon.

{
    "BootstrapBrokerStringPublicSaslIam": "b-2-public.myTestCluster.v4ni96.c2.kafka-beta.us-east-1.amazonaws.com:9198,b-1-public.myTestCluster.v4ni96.c2.kafka-beta.us-east-1.amazonaws.com:9198,b-3-public.myTestCluster.v4ni96.c2.kafka-beta.us-east-1.amazonaws.com:9198",
    "BootstrapBrokerStringSaslIam": "b-2.myTestCluster.v4ni96.c2.kafka-beta.us-east-1.amazonaws.com:9098,b-1.myTestCluster.v4ni96.c2.kafka-beta.us-east-1.amazonaws.com:9098,b-3.myTestCluster.v4ni96.c2.kafka-beta.us-east-1.amazonaws.com:9098"
}

The bootstrap brokers string should contain three brokers from across the Availability Zones in which
your MSK cluster is deployed (unless only two brokers are available).
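Once you have the string, you can pass it straight to the Apache Kafka client tools. The following is a minimal sketch, assuming a TLS-enabled cluster, the Apache Kafka CLI tools on a client machine, and a client.properties file with matching security settings; which Bootstrap* field you query depends on your cluster's authentication settings.

BOOTSTRAP=$(aws kafka get-bootstrap-brokers --cluster-arn ClusterArn \
  --query 'BootstrapBrokerStringTls' --output text)
bin/kafka-topics.sh --bootstrap-server "$BOOTSTRAP" --command-config client.properties --list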

Getting the bootstrap brokers using the API


To get the bootstrap brokers using the API, see GetBootstrapBrokers.


Listing Amazon MSK clusters

Listing clusters using the Amazon Web Services Management Console
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. The table shows all the clusters for the current region under this account. Choose the name of a
cluster to view its details.

Listing clusters using the Amazon CLI


Run the following command.

aws kafka list-clusters
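If you manage many clusters, a JMESPath query can trim the output to the fields you need; a minimal sketch:

aws kafka list-clusters --query 'ClusterInfoList[*].[ClusterName,State,ClusterArn]' --output table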

Listing clusters using the API


To list clusters using the API, see ListClusters.

Provisioning storage throughput


Amazon MSK brokers persist data on storage volumes. Storage I/O is consumed when producers write
to the cluster, when data is replicated between brokers, and when consumers read data that isn't in
memory. The volume storage throughput is the rate at which data can be written into and read from a
storage volume. Provisioned storage throughput is the ability to specify that rate for the brokers in your
cluster.

You can specify the provisioned throughput rate in MiB per second for clusters whose brokers are of type
kafka.m5.4xlarge or larger and whose storage volume is 10 GiB or greater. You can specify provisioned
throughput during cluster creation, and you can enable or disable provisioned throughput for a cluster
that is in the ACTIVE state.

Throughput bottlenecks
There are multiple possible causes of bottlenecks in broker throughput: volume throughput, EC2-EBS
network throughput, and EC2 egress throughput. You can enable provisioned storage throughput to
adjust volume throughput, but broker throughput can still be limited by EC2-EBS network throughput
and EC2 egress throughput.

EC2 egress throughput is impacted by the number of consumer groups and the number of consumers
per consumer group. Also, both EC2-EBS network throughput and EC2 egress throughput are higher for
larger broker types, as shown in the following table.

Broker type          EC2-EBS network throughput (MBps)

kafka.m5.4xlarge     593.75
kafka.m5.8xlarge     850
kafka.m5.12xlarge    1187.5
kafka.m5.16xlarge    1700
kafka.m5.24xlarge    2375

Measuring storage throughput


You can use the VolumeReadBytes and VolumeWriteBytes metrics to measure the average storage
throughput of a cluster. The sum of these two metrics gives the average storage throughput in bytes.
To get the average storage throughput for a cluster, set these two metrics to SUM and the period to 1
minute, then use the following formula.

Average storage throughput in MiB/s = (Sum(VolumeReadBytes) + Sum(VolumeWriteBytes)) / (60 * 1024 * 1024)

For information about the VolumeReadBytes and VolumeWriteBytes metrics, see the section called
“PER_BROKER Level monitoring” (p. 112).
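The following is a minimal sketch of pulling one of these metrics with the Amazon CLI. It assumes PER_BROKER-level monitoring is turned on, the AWS/Kafka namespace, and the Cluster Name and Broker ID dimensions; the cluster name and time window are illustrative. Repeat the call for VolumeWriteBytes and add the two sums before applying the formula.

aws cloudwatch get-metric-statistics --namespace AWS/Kafka \
  --metric-name VolumeReadBytes \
  --dimensions Name="Cluster Name",Value="testcluster" Name="Broker ID",Value="1" \
  --statistics Sum --period 60 \
  --start-time 2019-06-20T00:00:00Z --end-time 2019-06-20T01:00:00Z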

Configuration update
You can update your Amazon MSK configuration either before or after you turn on provisioned
throughput. However, you won't see the desired throughput until you perform both actions: update the
num.replica.fetchers configuration parameter and turn on provisioned throughput.

In the default Amazon MSK configuration, num.replica.fetchers has a value of 2. To update your
num.replica.fetchers, you can use the suggested values from the following table. These values are
for guidance purposes. We recommend that you adjust these values based on your use case.

Broker type          num.replica.fetchers

kafka.m5.4xlarge     4
kafka.m5.8xlarge     8
kafka.m5.12xlarge    14
kafka.m5.16xlarge    16
kafka.m5.24xlarge    16

Your updated configuration may take up to 24 hours to take effect, and may take longer when a source
volume is not fully utilized. However, transitional volume performance at least equals the performance
of source storage volumes during the migration period. A fully utilized 1 TiB volume typically takes
about six hours to migrate to an updated configuration.
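The following is a minimal sketch of creating a custom configuration that raises num.replica.fetchers from the command line. The property file name, configuration name, and value are illustrative; with Amazon CLI version 2 you may need fileb:// instead of file://. For the full workflow, see Configuration (p. 34).

cat > fetcher.properties <<EOF
num.replica.fetchers=8
EOF

aws kafka create-configuration --name "ExampleFetcherConfig" \
  --kafka-versions "2.2.1" --server-properties file://fetcher.properties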

Provisioning storage throughput using the Amazon Web Services Management Console
1. Sign in to the Amazon Web Services Management Console, and open the Amazon MSK console at
https://console.amazonaws.cn/msk/home?region=us-east-1#/home/.
2. Choose Create cluster.


3. Choose Custom create.
4. Specify a name for the cluster.
5. In the Storage section, choose Enable.
6. Choose a value for storage throughput per broker.
7. Choose a VPC, zones and subnets, and a security group.
8. Choose Next.
9. At the bottom of the Security step, choose Next.
10. At the bottom of the Monitoring and tags step, choose Next.
11. Review the cluster settings, then choose Create cluster.

Provisioning storage throughput using the Amazon CLI
This section shows an example of how you can use the Amazon CLI to create a cluster with provisioned
throughput enabled.

1. Copy the following JSON and paste it into a file. Replace the subnet IDs and security group ID
placeholders with values from your account. Name the file cluster-creation.json and save it.

{
    "Provisioned": {
        "BrokerNodeGroupInfo": {
            "InstanceType": "kafka.m5.4xlarge",
            "ClientSubnets": [
                "Subnet-1-ID",
                "Subnet-2-ID"
            ],
            "SecurityGroups": [
                "Security-Group-ID"
            ],
            "StorageInfo": {
                "EbsStorageInfo": {
                    "VolumeSize": 10,
                    "ProvisionedThroughput": {
                        "Enabled": true,
                        "VolumeThroughput": 250
                    }
                }
            }
        },
        "EncryptionInfo": {
            "EncryptionInTransit": {
                "InCluster": false,
                "ClientBroker": "PLAINTEXT"
            }
        },
        "KafkaVersion": "2.2.1",
        "NumberOfBrokerNodes": 2
    },
    "ClusterName": "provisioned-throughput-example"
}

2. Run the following Amazon CLI command from the directory where you saved the JSON file in the
previous step.

aws kafka create-cluster-v2 --cli-input-json file://cluster-creation.json


Provisioning storage throughput using the API


To configure provisioned storage throughput while creating a cluster, use CreateClusterV2.

Scaling up broker storage


You can increase the amount of EBS storage per broker. You can't decrease the storage.

Storage volumes remain available during this scaling-up operation.


Important
When storage is scaled for an MSK cluster, the additional storage is made available right away.
However, the cluster requires a cool-down period after every storage scaling event. Amazon
MSK uses this cool-down period to optimize the cluster before it can be scaled again. This period
can range from a minimum of 6 hours to over 24 hours, depending on the cluster's storage size
and utilization and on traffic. This is applicable for both auto scaling events and manual scaling
using the UpdateBrokerStorage operation. For information about right-sizing your storage, see
Best practices (p. 138).

Topics
• Automatic scaling (p. 20)
• Manual scaling (p. 22)

Automatic scaling
To automatically expand your cluster's storage in response to increased usage, you can configure an
Application Auto-Scaling policy for Amazon MSK. In an auto-scaling policy, you set the target disk
utilization and the maximum scaling capacity.

Before you use automatic scaling for Amazon MSK, you should consider the following:

• Important: A storage scaling action can occur only once every six hours. We recommend
  that you start with a right-sized storage volume for your storage demands. For guidance
  on right-sizing your cluster, see Right-size your cluster: Number of brokers per
  cluster (p. 138).
• Amazon MSK does not reduce cluster storage in response to reduced usage. Amazon MSK does not
support decreasing the size of storage volumes. If you need to reduce the size of your cluster storage,
you must migrate your existing cluster to a cluster with smaller storage. For information about
migrating a cluster, see Migration (p. 104).
• Amazon MSK does not support automatic scaling in the Asia Pacific (Osaka) and Africa (Cape Town)
Regions.
• When you associate an auto-scaling policy with your cluster, Amazon EC2 Auto Scaling automatically
creates an Amazon CloudWatch alarm for target tracking. If you delete a cluster with an auto-scaling
policy, this CloudWatch alarm persists. To delete the CloudWatch alarm, you should remove an auto-
scaling policy from a cluster before you delete the cluster. To learn more about target tracking, see
Target tracking scaling policies for Amazon EC2 Auto Scaling in the Amazon EC2 Auto Scaling User
Guide.

Auto-scaling policy details


An auto-scaling policy defines the following parameters for your cluster:


• Storage Utilization Target: The storage utilization threshold that Amazon MSK uses to trigger an
auto-scaling operation. You can set the utilization target between 10% and 80% of the current storage
capacity. We recommend that you set the Storage Utilization Target between 50% and 60%.
• Maximum Storage Capacity: The maximum scaling limit that Amazon MSK can set for your broker
storage. You can set the maximum storage capacity up to 16 TiB per broker. For more information, see
Amazon MSK quota (p. 123).

When Amazon MSK detects that your Maximum Disk Utilization metric is equal to or greater than
the Storage Utilization Target setting, it increases your storage capacity by an amount equal to
the larger of two numbers: 10 GiB or 10% of current storage. For example, if you have 1000 GiB, that
amount is 100 GiB. The service checks your storage utilization every minute. Further scaling operations
continue to increase storage by an amount equal to the larger of two numbers: 10 GiB or 10% of current
storage.

To determine if auto-scaling operations have occurred, use the ListClusterOperations operation.

Setting up automatic scaling for your Amazon MSK cluster


You can use the Amazon MSK console, the Amazon MSK API, or Amazon CloudFormation to implement
automatic scaling for storage. CloudFormation support is available through Application Auto Scaling.
Note
You can't implement automatic scaling when you create a cluster. You must first create the
cluster, and then create and enable an auto-scaling policy for it. However, you can create the
policy while the Amazon MSK service creates your cluster.

Setting up automatic scaling using the Amazon Web Services Management Console
1. Sign in to the Amazon Web Services Management Console, and open the Amazon MSK console at
https://console.amazonaws.cn/msk/home?region=us-east-1#/home/.
2. In the list of clusters, choose your cluster. This takes you to a page that lists details about the cluster.
3. In the Auto scaling for storage section, choose Configure.
4. Create and name an auto-scaling policy. Specify the storage utilization target, the maximum storage
capacity, and the target metric.
5. Choose Save changes.

When you save and enable the new policy, the policy becomes active for the cluster. Amazon MSK then
expands the cluster's storage when the storage utilization target is reached.

Setting up automatic scaling using the CLI


1. Use the RegisterScalableTarget command to register a storage utilization target.
2. Use the PutScalingPolicy command to create an auto-expansion policy.
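The following is a minimal sketch of these two calls. The cluster ARN, capacity bounds (in GiB per broker), policy name, and 55 percent target are illustrative; kafka is the Application Auto Scaling service namespace for Amazon MSK, kafka:broker-storage:VolumeSize is the scalable dimension, and KafkaBrokerStorageUtilization is the predefined target-tracking metric.

aws application-autoscaling register-scalable-target \
  --service-namespace kafka \
  --resource-id "arn:aws:kafka:us-east-1:0123456789012:cluster/exampleName/abcd1234-0123-abcd-5678-1234abcd-1" \
  --scalable-dimension "kafka:broker-storage:VolumeSize" \
  --min-capacity 1000 --max-capacity 4000

aws application-autoscaling put-scaling-policy \
  --policy-name "msk-storage-scaling" \
  --service-namespace kafka \
  --resource-id "arn:aws:kafka:us-east-1:0123456789012:cluster/exampleName/abcd1234-0123-abcd-5678-1234abcd-1" \
  --scalable-dimension "kafka:broker-storage:VolumeSize" \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{"TargetValue": 55.0, "PredefinedMetricSpecification": {"PredefinedMetricType": "KafkaBrokerStorageUtilization"}}'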

Setting up automatic scaling using the API


1. Use the RegisterScalableTarget API to register a storage utilization target.
2. Use the PutScalingPolicy API to create an auto-expansion policy.


Manual scaling
To increase storage, wait for the cluster to be in the ACTIVE state. Storage scaling has a cool-down
period of at least six hours between events. Even though the operation makes additional storage
available right away, the service performs optimizations on your cluster that can take up to 24 hours or
more. The duration of these optimizations is proportional to your storage size.

Scaling up broker storage using the Amazon Web Services Management Console
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. Choose the MSK cluster for which you want to update broker storage.
3. In the Storage section, choose Edit.
4. Specify the storage volume you want. You can only increase the amount of storage; you can't
decrease it.
5. Choose Save changes.

Scaling up broker storage using the Amazon CLI


Run the following command, replacing ClusterArn with the Amazon Resource Name (ARN) that you
obtained when you created your cluster. If you don't have the ARN for your cluster, you can find it by
listing all clusters. For more information, see the section called “Listing clusters” (p. 17).

Replace Current-Cluster-Version with the current version of the cluster.


Important
Cluster versions aren't simple integers. To find the current version of the cluster, use the
DescribeCluster operation or the describe-cluster Amazon CLI command. An example version is
KTVPDKIKX0DER.

The Target-Volume-in-GiB parameter represents the amount of storage that you want each broker
to have. It is only possible to update the storage for all the brokers. You can't specify individual brokers
for which to update storage. The value you specify for Target-Volume-in-GiB must be a whole
number that is greater than 100 GiB. The storage per broker after the update operation can't exceed
16384 GiB.

aws kafka update-broker-storage --cluster-arn ClusterArn --current-version Current-Cluster-Version --target-broker-ebs-volume-info '{"KafkaBrokerNodeId": "All", "VolumeSizeGB": Target-Volume-in-GiB}'

Scaling up broker storage using the API


To update a broker storage using the API, see UpdateBrokerStorage.

Updating the broker type


You can scale your MSK cluster on demand by changing the type (the size or family) of your brokers
without reassigning Apache Kafka partitions. Changing the type of your brokers gives you the flexibility
to adjust your MSK cluster's compute capacity based on changes in your workloads, without interrupting
your cluster I/O. Amazon MSK uses the same broker type for all the brokers in a given cluster. This
section describes how to update the broker type for your MSK cluster. The broker-type update happens
in a rolling fashion while the cluster is up and running. This means that Amazon MSK takes down one
broker at a time to perform the broker-type update. For information about how to make a cluster highly
available during a broker-type update, see the section called “Build highly available clusters” (p. 139).
To further reduce any potential impact on productivity, you can perform the broker-type update during a
period of low traffic.

During a broker-type update, you can continue to produce and consume data. However, you must wait
until the update is done before you can reboot brokers or invoke any of the update operations listed
under Amazon MSK operations.

If you want to update your cluster to a smaller broker type, we recommend that you try the update on a
test cluster first to see how it affects your scenario.
Important
You can't update a cluster to a smaller broker type if the number of partitions per broker
exceeds the maximum number specified in the section called “ Right-size your cluster: Number
of partitions per broker” (p. 138).

Updating the broker type using the Amazon Web Services Management Console
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. Choose the MSK cluster for which you want to update the broker type.
3. On the details page for the cluster, find the Brokers summary section, and choose Edit broker type.
4. Choose the broker type you want from the list.
5. Save changes.

Updating the broker type using the Amazon CLI


1. Run the following command, replacing ClusterArn with the Amazon Resource Name (ARN) that
you obtained when you created your cluster. If you don't have the ARN for your cluster, you can find
it by listing all clusters. For more information, see the section called “Listing clusters” (p. 17).

Replace Current-Cluster-Version with the current version of the cluster and TargetType with
the new type that you want the brokers to be. To learn more about broker types, see the section
called “Broker types” (p. 10).

aws kafka update-broker-type --cluster-arn ClusterArn --current-version Current-Cluster-Version --target-instance-type TargetType

The following is an example of how to use this command:

aws kafka update-broker-type --cluster-arn "arn:aws:kafka:us-


east-1:0123456789012:cluster/exampleName/abcd1234-0123-abcd-5678-1234abcd-1" --current-
version "K1X5R6FKA87" --target-instance-type kafka.m5.large

The output of this command looks like the following JSON example.

{
    "ClusterArn": "arn:aws:kafka:us-east-1:0123456789012:cluster/exampleName/abcd1234-0123-abcd-5678-1234abcd-1",
    "ClusterOperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-abcd-4f7f-1234-9876543210ef"
}

2. To get the result of the update-broker-type operation, run the following command, replacing
ClusterOperationArn with the ARN that you obtained in the output of the update-broker-type
command.

aws kafka describe-cluster-operation --cluster-operation-arn ClusterOperationArn

The output of this describe-cluster-operation command looks like the following JSON
example.

{
    "ClusterOperationInfo": {
        "ClientRequestId": "982168a3-939f-11e9-8a62-538df00285db",
        "ClusterArn": "arn:aws:kafka:us-east-1:0123456789012:cluster/exampleName/abcd1234-0123-abcd-5678-1234abcd-1",
        "CreationTime": "2021-01-09T02:24:22.198000+00:00",
        "OperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-abcd-4f7f-1234-9876543210ef",
        "OperationState": "UPDATE_COMPLETE",
        "OperationType": "UPDATE_BROKER_TYPE",
        "SourceClusterInfo": {
            "InstanceType": "t3.small"
        },
        "TargetClusterInfo": {
            "InstanceType": "m5.large"
        }
    }
}

If OperationState has the value UPDATE_IN_PROGRESS, wait a while, then run the
describe-cluster-operation command again.
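If you run these checks from a script, a simple wait loop avoids rerunning the command by hand. The following is a minimal sketch, assuming OPERATION_ARN holds the cluster operation ARN; the same pattern applies to the other cluster operations in this guide.

until [ "$(aws kafka describe-cluster-operation \
    --cluster-operation-arn "$OPERATION_ARN" \
    --query 'ClusterOperationInfo.OperationState' --output text)" = "UPDATE_COMPLETE" ]; do
  sleep 60
done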

Updating the broker type using the API


To update the broker type using the API, see UpdateBrokerType.

Updating the configuration of an Amazon MSK cluster
To update the configuration of a cluster, make sure that the cluster is in the ACTIVE state. You must also
ensure that the number of partitions per broker on your MSK cluster is under the limits described in the
section called “ Right-size your cluster: Number of partitions per broker” (p. 138). You can't update the
configuration of a cluster that exceeds these limits.

For information about MSK configuration, including how to create a custom configuration, which
properties you can update, and what happens when you update the configuration of an existing cluster,
see Configuration (p. 34).


Updating the configuration of a cluster using the Amazon CLI
1. Copy the following JSON and save it to a file. Name the file configuration-info.json. Replace
ConfigurationArn with the Amazon Resource Name (ARN) of the configuration that you want to
use to update the cluster. The ARN string must be in quotes in the following JSON.

Replace Configuration-Revision with the revision of the configuration that you want to use.
Configuration revisions are integers (whole numbers) that start at 1. This integer mustn't be in
quotes in the following JSON.

{
    "Arn": ConfigurationArn,
    "Revision": Configuration-Revision
}

2. Run the following command, replacing ClusterArn with the ARN that you obtained when you
created your cluster. If you don't have the ARN for your cluster, you can find it by listing all clusters.
For more information, see the section called “Listing clusters” (p. 17).

Replace Path-to-Config-Info-File with the path to your configuration info file. If you named
the file that you created in the previous step configuration-info.json and saved it in the
current directory, then Path-to-Config-Info-File is configuration-info.json.

Replace Current-Cluster-Version with the current version of the cluster.


Important
Cluster versions aren't simple integers. To find the current version of the cluster, use the
DescribeCluster operation or the describe-cluster Amazon CLI command. An example
version is KTVPDKIKX0DER.

aws kafka update-cluster-configuration --cluster-arn ClusterArn --configuration-info file://Path-to-Config-Info-File --current-version Current-Cluster-Version

The following is an example of how to use this command:

aws kafka update-cluster-configuration --cluster-arn "arn:aws:kafka:us-


east-1:0123456789012:cluster/exampleName/abcd1234-0123-abcd-5678-1234abcd-1" --
configuration-info file://c:\users\tester\msk\configuration-info.json --current-version
"K1X5R6FKA87"

The output of this update-cluster-configuration command looks like the following JSON
example.

{
    "ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2",
    "ClusterOperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-abcd-4f7f-1234-9876543210ef"
}

3. To get the result of the update-cluster-configuration operation, run the following command,
replacing ClusterOperationArn with the ARN that you obtained in the output of the
update-cluster-configuration command.

aws kafka describe-cluster-operation --cluster-operation-arn ClusterOperationArn

The output of this describe-cluster-operation command looks like the following JSON
example.

{
    "ClusterOperationInfo": {
        "ClientRequestId": "982168a3-939f-11e9-8a62-538df00285db",
        "ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2",
        "CreationTime": "2019-06-20T21:08:57.735Z",
        "OperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-abcd-4f7f-1234-9876543210ef",
        "OperationState": "UPDATE_COMPLETE",
        "OperationType": "UPDATE_CLUSTER_CONFIGURATION",
        "SourceClusterInfo": {},
        "TargetClusterInfo": {
            "ConfigurationInfo": {
                "Arn": "arn:aws:kafka:us-east-1:123456789012:configuration/ExampleConfigurationName/abcdabcd-abcd-1234-abcd-abcd123e8e8e-1",
                "Revision": 1
            }
        }
    }
}

In this output, OperationType is UPDATE_CLUSTER_CONFIGURATION. If OperationState has
the value UPDATE_IN_PROGRESS, wait a while, then run the describe-cluster-operation
command again.

Updating the configuration of a cluster using the API


To use the API to update the configuration of a cluster, see UpdateClusterConfiguration.

Expanding an Amazon MSK cluster


Use this Amazon MSK operation when you want to increase the number of brokers in your MSK cluster.
To expand a cluster, make sure that it is in the ACTIVE state.
Important
If you want to expand an MSK cluster, make sure to use this Amazon MSK operation. Don't try
to add brokers to a cluster without using this operation.

For information about how to rebalance partitions after you add brokers to a cluster, see the section
called “Reassign partitions” (p. 141).

Expanding a cluster using the Amazon Web Services Management Console
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. Choose the MSK cluster whose number of brokers you want to increase.
3. On the cluster details page, choose the Edit button next to the Cluster-Level Broker Details
heading.


4. Enter the number of brokers that you want the cluster to have per Availability Zone and then choose
Save changes.

Expanding a cluster using the Amazon CLI


1. Run the following command, replacing ClusterArn with the Amazon Resource Name (ARN) that
you obtained when you created your cluster. If you don't have the ARN for your cluster, you can find
it by listing all clusters. For more information, see the section called “Listing clusters” (p. 17).

Replace Current-Cluster-Version with the current version of the cluster.


Important
Cluster versions aren't simple integers. To find the current version of the cluster, use the
DescribeCluster operation or the describe-cluster Amazon CLI command. An example
version is KTVPDKIKX0DER.

The Target-Number-of-Brokers parameter represents the total number of broker nodes that
you want the cluster to have when this operation completes successfully. The value you specify for
Target-Number-of-Brokers must be a whole number that is greater than the current number of
brokers in the cluster. It must also be a multiple of the number of Availability Zones.

aws kafka update-broker-count --cluster-arn ClusterArn --current-version Current-Cluster-Version --target-number-of-broker-nodes Target-Number-of-Brokers

The output of this update-broker-count operation looks like the following JSON.

"ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/exampleClusterName/
abcdefab-1234-abcd-5678-cdef0123ab01-2",
"ClusterOperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-
operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-
abcd-4f7f-1234-9876543210ef"
}

2. To get the result of the update-broker-count operation, run the following command, replacing
ClusterOperationArn with the ARN that you obtained in the output of the update-broker-count
command.

aws kafka describe-cluster-operation --cluster-operation-arn ClusterOperationArn

The output of this describe-cluster-operation command looks like the following JSON
example.

{
    "ClusterOperationInfo": {
        "ClientRequestId": "c0b7af47-8591-45b5-9c0c-909a1a2c99ea",
        "ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2",
        "CreationTime": "2019-09-25T23:48:04.794Z",
        "OperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-abcd-4f7f-1234-9876543210ef",
        "OperationState": "UPDATE_COMPLETE",
        "OperationType": "INCREASE_BROKER_COUNT",
        "SourceClusterInfo": {
            "NumberOfBrokerNodes": 9
        },
        "TargetClusterInfo": {
            "NumberOfBrokerNodes": 12
        }
    }
}

In this output, OperationType is INCREASE_BROKER_COUNT. If OperationState has the value
UPDATE_IN_PROGRESS, wait a while, then run the describe-cluster-operation command
again.

Expanding a cluster using the API


To increase the number of brokers in a cluster using the API, see UpdateBrokerCount.

Updating a cluster's security settings


Use this Amazon MSK operation to update the authentication and client-broker encryption settings of
your MSK cluster. You can also update the private certificate authority (CA) used to sign certificates
for mutual TLS authentication. You can't change the in-cluster (broker-to-broker) encryption setting.

The cluster must be in the ACTIVE state for you to update its security settings.

If you turn on authentication using IAM, SASL, or TLS, you must also turn on encryption between clients
and brokers. The following table shows the possible combinations.

Authentication options    Client-broker encryption      Broker-broker encryption

Unauthenticated           TLS, PLAINTEXT,               Can be on or off.
                          TLS_PLAINTEXT
mTLS                      TLS, TLS_PLAINTEXT            Must be on.
SASL/SCRAM                TLS                           Must be on.
SASL/IAM                  TLS                           Must be on.

When client-broker encryption is set to TLS_PLAINTEXT and client-authentication is set to mTLS,
Amazon MSK creates two types of listeners for clients to connect to: one listener for clients to
connect using mTLS authentication with TLS encryption, and another for clients to connect without
authentication or encryption (plaintext).

For more information about security settings, see Security (p. 57).

Updating a cluster's security settings using the Amazon Web Services Management Console
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. Choose the MSK cluster that you want to update.
3. In the Security settings section, choose Edit.


4. Choose the authentication and encryption settings that you want for the cluster, then choose Save
changes.

Updating a cluster's security settings using the Amazon CLI
1. Create a JSON file that contains the encryption settings that you want the cluster to have. The
following is an example.
Note
You can only update the client-broker encryption setting. You can't update the in-cluster
(broker-to-broker) encryption setting.

{"EncryptionInTransit":{"ClientBroker": "TLS"}}

2. Create a JSON file that contains the authentication settings that you want the cluster to have. The
following is an example.

{"Sasl":{"Scram":{"Enabled":true}}}

3. Run the following Amazon CLI command:

aws kafka update-security --cluster-arn ClusterArn --current-version Current-Cluster-Version --client-authentication file://Path-to-Authentication-Settings-JSON-File --encryption-info file://Path-to-Encryption-Settings-JSON-File

The output of this update-security operation looks like the following JSON.

"ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/exampleClusterName/
abcdefab-1234-abcd-5678-cdef0123ab01-2",
"ClusterOperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-
operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-
abcd-4f7f-1234-9876543210ef"
}

4. To see the status of the update-security operation, run the following command, replacing
ClusterOperationArn with the ARN that you obtained in the output of the update-security
command.

aws kafka describe-cluster-operation --cluster-operation-arn ClusterOperationArn

The output of this describe-cluster-operation command looks like the following JSON
example.

{
    "ClusterOperationInfo": {
        "ClientRequestId": "c0b7af47-8591-45b5-9c0c-909a1a2c99ea",
        "ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2",
        "CreationTime": "2021-09-17T02:35:47.753000+00:00",
        "OperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-abcd-4f7f-1234-9876543210ef",
        "OperationState": "PENDING",
        "OperationType": "UPDATE_SECURITY",
        "SourceClusterInfo": {},
        "TargetClusterInfo": {}
    }
}

If OperationState has the value PENDING or UPDATE_IN_PROGRESS, wait a while, then run the
describe-cluster-operation command again.

Updating a cluster's security settings using the API


To update the security settings for a cluster using the API, see UpdateSecurity.
Note
The Amazon CLI and API operations for updating the security settings of a cluster are
idempotent. This means that if you invoke the security update operation and specify an
authentication or encryption setting that is the same setting that the cluster currently has, that
setting won't change.

Rebooting a broker for an Amazon MSK cluster


Use this Amazon MSK operation when you want to reboot a broker for your MSK cluster. To reboot a
broker for a cluster, make sure that the cluster is in the ACTIVE state.

The Amazon MSK service may reboot the brokers for your MSK cluster during system maintenance,
such as patching or version upgrades. Rebooting a broker manually lets you test the resilience of your
Kafka clients to determine how they respond to system maintenance.

Rebooting a broker using the Amazon Web Services Management Console
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. Choose the MSK cluster whose broker you want to reboot.
3. Scroll down to the Broker details section, and choose the broker you want to reboot.
4. Choose the Reboot broker button.

Rebooting a broker using the Amazon CLI


1. Run the following command, replacing ClusterArn with the Amazon Resource Name (ARN) that
you obtained when you created your cluster, and the BrokerId with the ID of the broker that you
want to reboot.
Note
The reboot-broker operation only supports rebooting one broker at a time.

If you don't have the ARN for your cluster, you can find it by listing all clusters. For more
information, see the section called “Listing clusters” (p. 17).

If you don't have the broker IDs for your cluster, you can find them by listing the broker nodes. For
more information, see list-nodes.

aws kafka reboot-broker --cluster-arn ClusterArn --broker-ids BrokerId


The output of this reboot-broker operation looks like the following JSON.

"ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/exampleClusterName/
abcdefab-1234-abcd-5678-cdef0123ab01-2",
"ClusterOperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-
operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-
abcd-4f7f-1234-9876543210ef"
}

2. To get the result of the reboot-broker operation, run the following command, replacing
ClusterOperationArn with the ARN that you obtained in the output of the reboot-broker
command.

aws kafka describe-cluster-operation --cluster-operation-arn ClusterOperationArn

The output of this describe-cluster-operation command looks like the following JSON
example.

{
    "ClusterOperationInfo": {
        "ClientRequestId": "c0b7af47-8591-45b5-9c0c-909a1a2c99ea",
        "ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2",
        "CreationTime": "2019-09-25T23:48:04.794Z",
        "OperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-abcd-4f7f-1234-9876543210ef",
        "OperationState": "REBOOT_IN_PROGRESS",
        "OperationType": "REBOOT_NODE",
        "SourceClusterInfo": {},
        "TargetClusterInfo": {}
    }
}

When the reboot operation is complete, the OperationState is REBOOT_COMPLETE.

Rebooting a broker using the API


To reboot a broker in a cluster using the API, see RebootBroker.

Tagging an Amazon MSK cluster


You can assign your own metadata in the form of tags to an Amazon MSK resource, such as an MSK
cluster. A tag is a key-value pair that you define for the resource. Using tags is a simple yet powerful way
to manage Amazon resources and organize data, including billing data.

Topics
• Tag basics (p. 32)
• Tracking costs using tagging (p. 32)
• Tag restrictions (p. 32)
• Tagging resources using the Amazon MSK API (p. 33)


Tag basics
You can use the Amazon MSK API to complete the following tasks:

• Add tags to an Amazon MSK resource.
• List the tags for an Amazon MSK resource.
• Remove tags from an Amazon MSK resource.

You can use tags to categorize your Amazon MSK resources. For example, you can categorize your
Amazon MSK clusters by purpose, owner, or environment. Because you define the key and value for
each tag, you can create a custom set of categories to meet your specific needs. For example, you might
define a set of tags that help you track clusters by owner and associated application.

The following are several examples of tags:

• Project: Project name
• Owner: Name
• Purpose: Load testing
• Environment: Production

Tracking costs using tagging


You can use tags to categorize and track your Amazon costs. When you apply tags to your Amazon
resources, including Amazon MSK clusters, your Amazon cost allocation report includes usage and costs
aggregated by tags. You can organize your costs across multiple services by applying tags that represent
business categories (such as cost centers, application names, or owners). For more information, see Use
Cost Allocation Tags for Custom Billing Reports in the Amazon Billing User Guide.

Tag restrictions
The following restrictions apply to tags in Amazon MSK.

Basic restrictions

• The maximum number of tags per resource is 50.
• Tag keys and values are case-sensitive.
• You can't change or edit tags for a deleted resource.

Tag key restrictions

• Each tag key must be unique. If you add a tag with a key that's already in use, your new tag overwrites
the existing key-value pair.
• You can't start a tag key with aws: because this prefix is reserved for use by Amazon. Amazon creates
tags that begin with this prefix on your behalf, but you can't edit or delete them.
• Tag keys must be between 1 and 128 Unicode characters in length.
• Tag keys must consist of the following characters: Unicode letters, digits, white space, and the
following special characters: _ . / = + - @.

Tag value restrictions

• Tag values must be between 0 and 255 Unicode characters in length.


• Tag values can be blank. Otherwise, they must consist of the following characters: Unicode letters,
digits, white space, and any of the following special characters: _ . / = + - @.

Tagging resources using the Amazon MSK API


You can use the following operations to tag or untag an Amazon MSK resource or to list the current set
of tags for a resource:

• ListTagsForResource
• TagResource
• UntagResource
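The Amazon CLI exposes the same three operations. The following is a minimal sketch; the ARN, tag keys, and tag values are illustrative.

aws kafka tag-resource --resource-arn ClusterArn --tags "Environment=Production,Owner=jdoe"
aws kafka list-tags-for-resource --resource-arn ClusterArn
aws kafka untag-resource --resource-arn ClusterArn --tag-keys "Owner"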


Amazon MSK configuration


Amazon MSK provides a default configuration for brokers, topics, and Apache ZooKeeper nodes. You
can also create custom configurations and use them to create new MSK clusters or to update existing
clusters. An MSK configuration consists of a set of properties and their corresponding values.

Topics
• Custom MSK configurations (p. 34)
• The default Amazon MSK configuration (p. 40)
• Amazon MSK configuration operations (p. 42)

Custom MSK configurations


Amazon MSK enables you to create a custom MSK configuration where you set the following properties.
Properties that you don't set explicitly get the values they have in the section called “Default
configuration” (p. 40). For more information about configuration properties, see Apache Kafka
Configuration.

Apache Kafka configuration properties that you can set

allow.everyone.if.no.acl.found
    If you want to set this property to false, first make sure you define Apache Kafka ACLs
    for your cluster. If you set this property to false without first defining Apache Kafka
    ACLs, you will lose access to the cluster. If that happens, you can update the
    configuration again and set this property to true in order to regain access to the
    cluster.

auto.create.topics.enable
    Enables topic autocreation on the server.

compression.type
    The final compression type for a given topic. You can set this property to the standard
    compression codecs (gzip, snappy, lz4, and zstd). It additionally accepts uncompressed,
    which is equivalent to no compression, and producer, which means retain the original
    compression codec set by the producer.

connections.max.idle.ms
    Idle connections timeout in milliseconds. The server socket processor threads close the
    connections that are idle for more than the value that you set for this property.

default.replication.factor
    The default replication factor for automatically created topics.

delete.topic.enable
    Enables the delete topic operation. If this config is turned off, you can't delete a
    topic through the admin tool.

group.initial.rebalance.delay.ms
    Amount of time the group coordinator waits for more consumers to join a new group
    before performing the first rebalance. A longer delay means potentially fewer
    rebalances, but increases the time until processing begins.

group.max.session.timeout.ms
    Maximum session timeout for registered consumers. Longer timeouts give consumers more
    time to process messages in between heartbeats at the cost of a longer time to detect
    failures.

group.min.session.timeout.ms
    Minimum session timeout for registered consumers. Shorter timeouts result in quicker
    failure detection at the cost of more frequent consumer heartbeating, which can
    overwhelm broker resources.

leader.imbalance.per.broker.percentage
    The ratio of leader imbalance allowed per broker. The controller triggers a leader
    balance if it goes above this value per broker. This value is specified in percentage.

log.cleaner.delete.retention.ms
    Amount of time that you want Apache Kafka to retain deleted records. The minimum value
    is 0.

log.cleaner.min.cleanable.ratio
    This configuration property can have values between 0 and 1. It determines how
    frequently the log compactor attempts to clean the log (assuming log compaction is
    enabled). By default, Apache Kafka avoids cleaning a log where more than 50% of the log
    has been compacted. This ratio bounds the maximum space wasted in the log by duplicates
    (at 50%, which means at most 50% of the log could be duplicates). A higher ratio means
    fewer, more efficient cleanings but more wasted space in the log.

log.cleanup.policy
    The default cleanup policy for segments beyond the retention window. A comma-separated
    list of valid policies. Valid policies are delete and compact.

log.flush.interval.messages
    Number of messages accumulated on a log partition before messages are flushed to disk.

log.flush.interval.ms
    Maximum time in ms that a message in any topic is kept in memory before flushed to
    disk. If not set, the value in log.flush.scheduler.interval.ms is used. The minimum
    value is 0.

log.message.timestamp.difference.max.ms
    The maximum difference allowed between the timestamp when a broker receives a message
    and the timestamp specified in the message. If
    log.message.timestamp.type=CreateTime, a message is rejected if the difference in
    timestamp exceeds this threshold. This configuration is ignored if
    log.message.timestamp.type=LogAppendTime.

log.message.timestamp.type
    Specifies whether the timestamp in the message is the message creation time or the log
    append time. The allowed values are CreateTime and LogAppendTime.

log.retention.bytes
    Maximum size of the log before deleting it.

log.retention.hours
    Number of hours to keep a log file before deleting it, tertiary to the log.retention.ms
    property.

log.retention.minutes
    Number of minutes to keep a log file before deleting it, secondary to the
    log.retention.ms property. If not set, the value in log.retention.hours is used.

log.retention.ms
    Number of milliseconds to keep a log file before deleting it. If not set, the value in
    log.retention.minutes is used.

log.roll.ms
    Maximum time before a new log segment is rolled out (in milliseconds). If you don't set
    this property, the value in log.roll.hours is used. The minimum possible value for this
    property is 1.

log.segment.bytes
    Maximum size of a single log file.

max.incremental.fetch.session.cache.slots
    Maximum number of incremental fetch sessions that are maintained.

message.max.bytes
    Largest record batch size allowed by Kafka. If this is increased and there are
    consumers older than 0.10.2, the consumers' fetch size must also be increased so that
    they can fetch record batches this large.

    In the latest message format version, records are always grouped into batches for
    efficiency. In previous message format versions, uncompressed records are not grouped
    into batches, and this limit only applies to a single record in that case.

    This can be set per topic with the topic-level max.message.bytes config.

min.insync.replicas
    When a producer sets acks to "all" (or "-1"), min.insync.replicas specifies the minimum
    number of replicas that must acknowledge a write for the write to be considered
    successful. If this minimum cannot be met, the producer raises an exception (either
    NotEnoughReplicas or NotEnoughReplicasAfterAppend).

    When used together, min.insync.replicas and acks enable you to enforce greater
    durability guarantees. A typical scenario would be to create a topic with a replication
    factor of 3, set min.insync.replicas to 2, and produce with acks of "all". This ensures
    that the producer raises an exception if a majority of replicas don't receive a write.

num.io.threads
    The number of threads that the server uses for processing requests, which may include
    disk I/O.

num.network.threads
    The number of threads that the server uses for receiving requests from the network and
    sending responses to it.

num.partitions
    Default number of log partitions per topic.

num.recovery.threads.per.data.dir
    The number of threads per data directory to be used for log recovery at startup and
    for flushing at shutdown.

num.replica.fetchers
    The number of fetcher threads used to replicate messages from a source broker.
    Increasing this value can increase the degree of I/O parallelism in the follower
    broker.

offsets.retention.minutes
    After a consumer group loses all its consumers (that is, it becomes empty), its offsets
    are kept for this retention period before getting discarded. For standalone consumers
    (that is, using manual assignment), offsets are expired after the time of the last
    commit plus this retention period.

offsets.topic.replication.factor
    The replication factor for the offsets topic (set higher to ensure availability).
    Internal topic creation fails until the cluster size meets this replication factor
    requirement.

replica.fetch.max.bytes
    Number of bytes of messages to attempt to fetch for each partition. This is not an
    absolute maximum. If the first record batch in the first non-empty partition of the
    fetch is larger than this value, the record batch is returned to ensure that progress
    can be made. The maximum record batch size accepted by the broker is defined via
    message.max.bytes (broker config) or max.message.bytes (topic config).

replica.fetch.response.max.bytes
    The maximum number of bytes expected for the entire fetch response. Records are fetched
    in batches, and if the first record batch in the first non-empty partition of the fetch
    is larger than this value, the record batch will still be returned to ensure that
    progress can be made. This isn't an absolute maximum. The message.max.bytes (broker
    config) or max.message.bytes (topic config) properties specify the maximum record batch
    size that the broker accepts.

replica.lag.time.max.ms
    If a follower hasn't sent any fetch requests or hasn't consumed up to the leader's log
    end offset for at least this number of milliseconds, the leader removes the follower
    from the ISR.

    MinValue: 10000

    MaxValue (inclusive): 30000

replica.selector.class
    The fully-qualified class name that implements ReplicaSelector. This is used by the
    broker to find the preferred read replica. If you are using Apache Kafka version 2.4.1
    or higher, and want to allow consumers to fetch from the closest replica, set this
    property to org.apache.kafka.common.replica.RackAwareReplicaSelector. For more
    information, see the section called “Apache Kafka version 2.4.1 (use 2.4.1.1
    instead)” (p. 128).

replica.socket.receive.buffer.bytes
    The socket receive buffer for network requests.

socket.receive.buffer.bytes
    The SO_RCVBUF buffer of the socket server sockets. The minimum value to which you can
    set this property is -1. If the value is -1, Amazon MSK uses the OS default.

socket.request.max.bytes
    The maximum number of bytes in a socket request.

socket.send.buffer.bytes
    The SO_SNDBUF buffer of the socket server sockets. The minimum value to which you can
    set this property is -1. If the value is -1, Amazon MSK uses the OS default.

transaction.max.timeout.ms
    Maximum timeout for transactions. If a client's requested transaction time exceeds
    this value, the broker returns an error in InitProducerIdRequest. This prevents a
    client from using too large a timeout, which can stall consumers reading from topics
    included in the transaction.

transaction.state.log.min.isr
    Overridden min.insync.replicas config for the transaction topic.

transaction.state.log.replication.factor
    The replication factor for the transaction topic. Set it to a higher value to increase
    availability. Internal topic creation fails until the cluster size meets this
    replication factor requirement.

transactional.id.expiration.ms
    The time in milliseconds that the transaction coordinator waits without receiving any
    transaction status updates for the current transaction before expiring its
    transactional ID. This setting also influences producer ID expiration: producer IDs
    are expired when this time elapses after the last write with the given producer ID.
    Producer IDs might expire sooner if the last write from the producer ID is deleted due
    to the topic's retention settings. The minimum value for this property is 1
    millisecond.

unclean.leader.election.enable
    Indicates whether to enable replicas not in the ISR set to be elected as leader as a
    last resort, even though doing so may result in data loss.

zookeeper.connection.timeout.ms
    Maximum time that the client waits to establish a connection to ZooKeeper. If not set,
    the value in zookeeper.session.timeout.ms is used.

zookeeper.session.timeout.ms
    The Apache ZooKeeper session timeout in milliseconds.

    MinValue: 6000

    MaxValue (inclusive): 18000

To learn how you can create a custom MSK configuration, list all configurations, or describe them, see
the section called “Configuration operations” (p. 42). To create an MSK cluster using a custom MSK
configuration or to update a cluster with a new custom configuration, see How it works (p. 10).

When you update your existing MSK cluster with a custom MSK configuration, Amazon MSK does rolling
restarts when necessary, using best practices to minimize customer downtime. For example, after
Amazon MSK restarts each broker, it tries to let the broker catch up on data that the broker might have
missed during the configuration update before it moves to the next broker.

Dynamic configuration
In addition to the configuration properties that Amazon MSK provides, you can dynamically set cluster-
and broker-level configuration properties that don't require a broker restart. You can dynamically set
configuration properties that aren't marked as read-only in the table under Broker Configs in the Apache
Kafka documentation. For information about dynamic configuration and example commands, see
Updating Broker Configs in the Apache Kafka documentation.
Note
You can set the advertised.listeners property, but not the listeners property.
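The following is a minimal sketch of a cluster-wide dynamic update using the Apache Kafka CLI tools, mirroring the examples in the Apache Kafka documentation. It assumes BOOTSTRAP_BROKERS holds your bootstrap broker string and that your client security settings match the cluster's; log.cleaner.threads is one of the dynamically updatable properties.

bin/kafka-configs.sh --bootstrap-server $BOOTSTRAP_BROKERS \
  --entity-type brokers --entity-default \
  --alter --add-config log.cleaner.threads=2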


Topic-level configuration
You can use Apache Kafka commands to set or modify topic-level configuration properties for new and
existing topics. For more information about topic-level configuration properties and examples on how to
set them, see Topic-Level Configs in the Apache Kafka documentation.
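For example, the following minimal sketch sets a three-day retention period (259200000 milliseconds) on one topic. It uses the Apache ZooKeeper connection string described earlier in this guide; newer Apache Kafka versions can use --bootstrap-server instead, and the topic name is illustrative.

bin/kafka-configs.sh --zookeeper $ZOOKEEPER_CONNECT \
  --entity-type topics --entity-name ExampleTopic \
  --alter --add-config retention.ms=259200000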

Configuration states
Amazon MSK configurations can be in the following states. To perform an operation on a configuration,
the configuration must be in the ACTIVE or DELETE_FAILED state:

• ACTIVE
• DELETING
• DELETE_FAILED

The default Amazon MSK configuration


When you create an MSK cluster without specifying a custom MSK configuration, Amazon MSK creates
and uses a default configuration with the values shown in the following table. For properties that aren't
in this table, Amazon MSK uses the defaults associated with your version of Apache Kafka. For a list of
these default values, see Apache Kafka Configuration.

Default configuration values

allow.everyone.if.no.acl.found
Default value: true
If no resource patterns match a specific resource, the resource has no associated ACLs. In this case, if this property is set to true, everyone is allowed to access the resource, not just the super users.

auto.create.topics.enable
Default value: false
Enables autocreation of a topic on the server.

auto.leader.rebalance.enable
Default value: true
Enables auto leader balancing. A background thread checks and triggers leader balance if required at regular intervals.

default.replication.factor
Default value: 3 for 3-AZ clusters, 2 for 2-AZ clusters
Default replication factor for automatically created topics.

min.insync.replicas
Default value: 2 for 3-AZ clusters, 1 for 2-AZ clusters
When a producer sets acks to "all" (or "-1"), min.insync.replicas specifies the minimum number of replicas that must acknowledge a write for the write to be considered successful. If this minimum can't be met, the producer raises an exception (either NotEnoughReplicas or NotEnoughReplicasAfterAppend). When used together, min.insync.replicas and acks enable you to enforce greater durability guarantees. A typical scenario would be to create a topic with a replication factor of 3, set min.insync.replicas to 2, and produce with acks of "all". This ensures that the producer raises an exception if a majority of replicas do not receive a write.

num.io.threads
Default value: 8
Number of threads that the server uses for processing requests, which may include disk I/O.

num.network.threads
Default value: 5
Number of threads that the server uses for receiving requests from the network and sending responses to the network.

num.partitions
Default value: 1
Default number of log partitions per topic.

num.replica.fetchers
Default value: 2
Number of fetcher threads used to replicate messages from a source broker. Increasing this value can increase the degree of I/O parallelism in the follower broker.

replica.lag.time.max.ms
Default value: 30000
If a follower hasn't sent any fetch requests or hasn't consumed up to the leader's log end offset for at least this number of milliseconds, the leader removes the follower from the ISR.

socket.receive.buffer.bytes
Default value: 102400
SO_RCVBUF buffer of the socket server sockets. If the value is -1, the OS default is used.

socket.request.max.bytes
Default value: 104857600
Maximum number of bytes in a socket request.

socket.send.buffer.bytes
Default value: 102400
SO_SNDBUF buffer of the socket server sockets. If the value is -1, the OS default is used.

unclean.leader.election.enable
Default value: true
Indicates whether to enable replicas not in the ISR set to be elected as leader as a last resort, even though doing so may result in data loss.

zookeeper.session.timeout.ms
Default value: 18000
The Apache ZooKeeper session timeout in milliseconds.

zookeeper.set.acl
Default value: false
Set client to use secure ACLs.

For information about how to specify custom configuration values, see the section called “Custom
configurations” (p. 34).

Amazon MSK configuration operations


This topic describes how to create custom MSK configurations and how to perform operations on
them. For information about how to use MSK configurations to create or update clusters, see How it
works (p. 10).

This topic contains the following sections:


• To create an MSK configuration (p. 42)
• To update an MSK configuration (p. 43)
• To delete an MSK configuration (p. 44)
• To describe an MSK configuration (p. 44)
• To describe an MSK configuration revision (p. 44)
• To list all MSK configurations in your account for the current Region (p. 45)

To create an MSK configuration


1. Create a file where you specify the configuration properties that you want to set and the values that
you want to assign to them. The following are the contents of an example configuration file.

auto.create.topics.enable = true

zookeeper.connection.timeout.ms = 1000

log.roll.ms = 604800000

2. Run the following Amazon CLI command, replacing config-file-path with the path to the file
where you saved your configuration in the previous step.
Note
The name that you choose for your configuration must match the following regex: "^[0-9A-
Za-z][0-9A-Za-z-]{0,}$".

aws kafka create-configuration --name "ExampleConfigurationName" --description "Example configuration description." --kafka-versions "1.1.1" --server-properties fileb://config-file-path


The following is an example of a successful response after running this command.

{
"Arn": "arn:aws:kafka:us-east-1:123456789012:configuration/SomeTest/abcdabcd-1234-
abcd-1234-abcd123e8e8e-1",
"CreationTime": "2019-05-21T19:37:40.626Z",
"LatestRevision": {
"CreationTime": "2019-05-21T19:37:40.626Z",
"Description": "Example configuration description.",
"Revision": 1
},
"Name": "ExampleConfigurationName"
}

3. The previous command returns an Amazon Resource Name (ARN) for the newly created
configuration. Save this ARN because you need it to refer to this configuration in other commands.
If you lose your configuration ARN, you can find it again by listing all the configurations in your
account.

To update an MSK configuration


1. Create a file where you specify the configuration properties that you want to update and the values
that you want to assign to them. The following are the contents of an example configuration file.

auto.create.topics.enable = true

zookeeper.connection.timeout.ms = 1000

min.insync.replicas = 2

2. Run the following Amazon CLI command, replacing config-file-path with the path to the file
where you saved your configuration in the previous step.

Replace configuration-arn with the ARN you obtained when you created the configuration. If you didn't save the ARN when you created the configuration, you can use the list-configurations command to list all the configurations in your account and find the configuration that you want in the list that appears in the response. The ARN of the configuration also appears in that list.

aws kafka update-configuration --arn configuration-arn --description "Example configuration revision description." --server-properties fileb://config-file-path

3. The following is an example of a successful response after running this command.

{
"Arn": "arn:aws:kafka:us-east-1:123456789012:configuration/SomeTest/abcdabcd-1234-
abcd-1234-abcd123e8e8e-1",
"LatestRevision": {
"CreationTime": "2020-08-27T19:37:40.626Z",
"Description": "Example configuration revision description.",
"Revision": 2
}
}


To delete an MSK configuration


The following procedure shows how to delete a configuration that isn't attached to a cluster. You can't
delete a configuration that's attached to a cluster.

1. To run this example, replace configuration-arn with the ARN you obtained when you created the configuration. If you didn't save the ARN when you created the configuration, you can use the list-configurations command to list all the configurations in your account and find the configuration that you want in the list that appears in the response. The ARN of the configuration also appears in that list.

aws kafka delete-configuration --arn configuration-arn

2. The following is an example of a successful response after running this command.

{
"arn": "arn:aws:kafka:us-east-1:123456789012:configuration/SomeTest/abcdabcd-1234-
abcd-1234-abcd123e8e8e-1",
"state": "DELETING"
}

To describe an MSK configuration


1. This command returns metadata about the configuration. To get a detailed description of the configuration, run the describe-configuration-revision command.

To run this example, replace configuration-arn with the ARN you obtained when you created the configuration. If you didn't save the ARN when you created the configuration, you can use the list-configurations command to list all the configurations in your account and find the configuration that you want in the list that appears in the response. The ARN of the configuration also appears in that list.

aws kafka describe-configuration --arn configuration-arn

2. The following is an example of a successful response after running this command.

{
"Arn": "arn:aws:kafka:us-east-1:123456789012:configuration/SomeTest/abcdabcd-
abcd-1234-abcd-abcd123e8e8e-1",
"CreationTime": "2019-05-21T00:54:23.591Z",
"Description": "Example configuration description.",
"KafkaVersions": [
"1.1.1"
],
"LatestRevision": {
"CreationTime": "2019-05-21T00:54:23.591Z",
"Description": "Example configuration description.",
"Revision": 1
},
"Name": "SomeTest"
}


To describe an MSK configuration revision


Describing an MSK configuration with the describe-configuration command gives you the metadata of the configuration. To see the contents of a configuration, use the describe-configuration-revision command instead.

• Run the following command, replacing configuration-arn with the ARN you obtained when you created the configuration. If you didn't save the ARN when you created the configuration, you can use the list-configurations command to list all the configurations in your account and find the configuration that you want in the list that appears in the response. The ARN of the configuration also appears in that list.

aws kafka describe-configuration-revision --arn configuration-arn --revision 1

The following is an example of a successful response after running this command.

{
"Arn": "arn:aws:kafka:us-east-1:123456789012:configuration/SomeTest/abcdabcd-
abcd-1234-abcd-abcd123e8e8e-1",
"CreationTime": "2019-05-21T00:54:23.591Z",
"Description": "Example configuration description.",
"Revision": 1,
"ServerProperties":
"YXV0by5jcmVhdGUudG9waWNzLmVuYWJsZSA9IHRydWUKCgp6b29rZWVwZXIuY29ubmVjdGlvbi50aW1lb3V0Lm1zID0gMTAwM
}

The value of ServerProperties is encoded using base64. If you use a base64 decoder (for example, https://www.base64decode.org/) to manually decode it, you get the contents of the original configuration file that you used to create the custom configuration. In this case, you get the following:

auto.create.topics.enable = true

zookeeper.connection.timeout.ms = 1000

log.roll.ms = 604800000
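If your client machine has the base64 utility (standard on most Linux distributions), you can also decode the properties in a single step, as in the following sketch. The --query and --output options are standard Amazon CLI features; replace configuration-arn as in the previous command.

aws kafka describe-configuration-revision --arn configuration-arn --revision 1 --query ServerProperties --output text | base64 --decode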

To list all MSK configurations in your account for the current Region
• Run the following command.

aws kafka list-configurations

The following is an example of a successful response after running this command.

{
"Configurations": [
{
"Arn": "arn:aws:kafka:us-east-1:123456789012:configuration/SomeTest/
abcdabcd-abcd-1234-abcd-abcd123e8e8e-1",
"CreationTime": "2019-05-21T00:54:23.591Z",
"Description": "Example configuration description.",
"KafkaVersions": [

45
Amazon Managed Streaming for
Apache Kafka Developer Guide
To list all MSK configurations in
your account for the current Region
"1.1.1"
],
"LatestRevision": {
"CreationTime": "2019-05-21T00:54:23.591Z",
"Description": "Example configuration description.",
"Revision": 1
},
"Name": "SomeTest"
},
{
"Arn": "arn:aws:kafka:us-east-1:123456789012:configuration/SomeTest/
abcdabcd-1234-abcd-1234-abcd123e8e8e-1",
"CreationTime": "2019-05-03T23:08:29.446Z",
"Description": "Example configuration description.",
"KafkaVersions": [
"1.1.1"
],
"LatestRevision": {
"CreationTime": "2019-05-03T23:08:29.446Z",
"Description": "Example configuration description.",
"Revision": 1
},
"Name": "ExampleConfigurationName"
}
]
}


MSK Serverless
Note
MSK Serverless is available in the US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Stockholm), and Europe (Ireland) Regions.

MSK Serverless is a cluster type for Amazon MSK that makes it possible for you to run Apache Kafka
without having to manage and scale cluster capacity. It automatically provisions and scales capacity
while managing the partitions in your topic, so you can stream data without thinking about right-sizing
or scaling clusters. MSK Serverless offers a throughput-based pricing model, so you pay only for what
you use. Consider using a serverless cluster if your applications need on-demand streaming capacity that
scales up and down automatically.

MSK Serverless is fully compatible with Apache Kafka, so you can use any compatible client applications
to produce and consume data. It also integrates with the following services:

• Amazon PrivateLink to provide private connectivity
• Amazon Identity and Access Management (IAM) for authentication and authorization
• Amazon Glue Schema Registry for schema management
• Amazon Kinesis Data Analytics for Apache Flink-based stream processing
• Amazon Lambda for event processing

MSK Serverless requires IAM access control for all clusters. For more information, see the section called
“IAM access control” (p. 73).

For information about the service quotas that apply to MSK Serverless, see the section called “Quota for serverless clusters” (p. 123).

To help you get started with serverless clusters, and to learn more about configuration and monitoring
options for serverless clusters, see the following.

Topics
• Getting started using MSK Serverless clusters (p. 47)
• Configuration for serverless clusters (p. 53)
• Monitoring serverless clusters (p. 53)

Getting started using MSK Serverless clusters


This tutorial shows you an example of how you can create an MSK Serverless cluster, create a client
machine that can access it, and use the client to create topics on the cluster and to write data to those
topics. This exercise doesn't represent all the options that you can choose when you create a serverless
cluster. In different parts of this exercise, we choose default options for simplicity. This doesn't mean that
they're the only options that work for setting up a serverless cluster. You can also use the Amazon CLI or
the Amazon MSK API. For more information, see the Amazon MSK API Reference 2.0.

Topics


• Step 1: Create an MSK Serverless cluster (p. 48)
• Step 2: Create an IAM role (p. 49)
• Step 3: Create a client machine (p. 50)
• Step 4: Create an Apache Kafka topic (p. 51)
• Step 5: Produce and consume data (p. 52)
• Step 6: Delete resources (p. 52)

Step 1: Create an MSK Serverless cluster


In this step, you perform two tasks. First, you create an MSK Serverless cluster with default settings.
Second, you gather information about the cluster. This is information that you need in later steps when
you create a client that can send data to the cluster.

To create a serverless cluster

1. Sign in to the Amazon Web Services Management Console, and open the Amazon MSK console at https://console.aws.amazon.com/msk/home.
2. Choose Create cluster.
3. For Creation method, leave the Quick create option selected. The Quick create option lets you
create a serverless cluster with default settings.
4. For Cluster name, enter a descriptive name, such as msk-serverless-tutorial-cluster.
5. For General cluster properties, choose Serverless as the Cluster type. Use the default values for the
remaining General cluster properties.
6. Note the table under All cluster settings. This table lists the default values for important settings
such as networking and availability, and indicates whether you can change each setting after you
create the cluster. To change a setting before you create the cluster, you should choose the Custom
create option under Creation method.
Note
You can connect clients from up to five different VPCs with MSK Serverless clusters. To help
client applications switch over to another Availability Zone in the event of an outage, you
must specify at least two subnets in each VPC.
7. Choose Create cluster.

To gather information about the cluster

1. In the Cluster summary section, choose View client information. This button remains grayed out
until Amazon MSK finishes creating the cluster. You might need to wait a few minutes until the
button becomes active so you can use it.
2. Copy the string under the label Endpoint. This is your bootstrap server string.
3. Choose the Properties tab.
4. Under the Networking settings section, copy the IDs of the subnets and the security group and save
them because you need this information later to create a client machine.
5. Choose any of the subnets. This opens the Amazon VPC Console. Find the ID of the Amazon VPC
that is associated with the subnet. Save this Amazon VPC ID for later use.

Next Step

Step 2: Create an IAM role (p. 49)


Step 2: Create an IAM role


In this step, you perform two tasks. The first task is to create an IAM policy that grants access to create
topics on the cluster and to send data to those topics. The second task is to create an IAM role and
associate this policy with it. In a later step, we create a client machine that assumes this role and uses it
to create a topic on the cluster and to send data to that topic.

To create an IAM policy that makes it possible to create topics and write to them

1. Open the IAM console at https://console.amazonaws.cn/iam/.
2. On the navigation pane, choose Policies.
3. Choose Create Policy.
4. Choose the JSON tab, then replace the JSON in the editor window with the following JSON.

Replace region with the code of the Amazon Web Services Region where you created your cluster.
Replace Account-ID with your account ID. Replace msk-serverless-tutorial-cluster with
the name of your serverless cluster.

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"kafka-cluster:Connect",
"kafka-cluster:AlterCluster",
"kafka-cluster:DescribeCluster"
],
"Resource": [
"arn:aws:kafka:region:Account-ID:cluster/msk-serverless-tutorial-
cluster/*"
]
},
{
"Effect": "Allow",
"Action": [
"kafka-cluster:*Topic*",
"kafka-cluster:WriteData",
"kafka-cluster:ReadData"
],
"Resource": [
"arn:aws:kafka:region:Account-ID:topic/msk-serverless-tutorial-cluster/
*"
]
},
{
"Effect": "Allow",
"Action": [
"kafka-cluster:AlterGroup",
"kafka-cluster:DescribeGroup"
],
"Resource": [
"arn:aws:kafka:region:Account-ID:group/msk-serverless-tutorial-cluster/
*"
]
}
]
}

For instructions on how to write secure policies, see the section called “IAM access
control” (p. 73).


5. Choose Next: Tags.
6. Choose Next: Review.
7. For the policy name, enter a descriptive name, such as msk-serverless-tutorial-policy.
8. Choose Create policy.

To create an IAM role and attach the policy to it

1. On the navigation pane, choose Roles.
2. Choose Create role.
3. Under Common use cases, choose EC2, then choose Next: Permissions.
4. In the search box, enter the name of the policy that you previously created for this tutorial. Then
select the box to the left of the policy.
5. Choose Next: Tags.
6. Choose Next: Review.
7. For the role name, enter a descriptive name, such as msk-serverless-tutorial-role.
8. Choose Create role.

Next Step

Step 3: Create a client machine (p. 50)

Step 3: Create a client machine


In this step, you perform two tasks. The first task is to create an Amazon EC2 instance to use as an Apache Kafka client machine. The second task is to install Java and Apache Kafka tools on the machine.

To create a client machine

1. Open the Amazon EC2 console at https://console.amazonaws.cn/ec2/.
2. Choose Launch instance.
3. Enter a descriptive Name for your client machine, such as msk-serverless-tutorial-client.
4. Leave the Amazon Linux 2 AMI (HVM) - Kernel 5.10, SSD Volume Type selected for Amazon
Machine Image (AMI) type.
5. Leave the t2.micro instance type selected.
6. Under Key pair (login), choose Create a new key pair. Enter MSKServerlessKeyPair for Key pair
name. Then choose Download Key Pair. Alternatively, you can use an existing key pair.
7. For Network settings, choose Edit.
8. Under VPC, enter the ID of the virtual private cloud (VPC) for your serverless cluster. This is the VPC whose ID you saved after you created the cluster.
9. For Subnet, choose the subnet whose ID you saved after you created the cluster.
10. For Firewall (security groups), select the security group associated with the cluster. This value works if that security group has an inbound rule that allows traffic from the security group to itself. With such a rule, members of the same security group can communicate with each other (see the example command after this procedure). For more information, see Security group rules in the Amazon VPC Developer Guide.
11. Expand the Advanced details section and choose the IAM role that you created in Step 2: Create an
IAM role (p. 49).
12. Choose Launch.


13. In the left navigation pane, choose Instances. Then choose the check box in the row that represents
your newly created Amazon EC2 instance. From this point forward, we call this instance the client
machine.
14. Choose Connect and follow the instructions to connect to the client machine.
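If the cluster's security group doesn't already have the self-referencing inbound rule mentioned in step 10, you can add one with a command like the following sketch. Both occurrences of sg-0123456789abcdef0 are placeholders for your actual security group ID; this rule allows all TCP traffic from the group to itself.

aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 0-65535 --source-group sg-0123456789abcdef0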

To set up Apache Kafka client tools on the client machine

1. To install Java, run the following command on the client machine:

sudo yum -y install java-11

2. To get the Apache Kafka tools that we need to create topics and send data, run the following
commands:

wget https://archive.apache.org/dist/kafka/2.8.1/kafka_2.12-2.8.1.tgz

tar -xzf kafka_2.12-2.8.1.tgz

3. Go to the kafka_2.12-2.8.1/libs directory, then run the following command to download the
Amazon MSK IAM JAR file. The Amazon MSK IAM JAR makes it possible for the client machine to
access the cluster.

wget https://github.com/aws/aws-msk-iam-auth/releases/download/v1.1.1/aws-msk-iam-auth-1.1.1-all.jar

4. Go to the kafka_2.12-2.8.1/bin directory. Copy the following property settings and paste them
into a new file. Name the file client.properties and save it.

security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler

Next Step

Step 4: Create an Apache Kafka topic (p. 51)

Step 4: Create an Apache Kafka topic


In this step, you use the previously created client machine to create a topic on the serverless cluster.

To create a topic and write data to it

1. In the following export command, replace my-endpoint with the bootstrap server string that you saved after you created the cluster. Then, go to the kafka_2.12-2.8.1/bin directory on the client machine and run the export command.

export BS=my-endpoint

2. Run the following command to create a topic called msk-serverless-tutorial.

<path-to-your-kafka-installation>/bin/kafka-topics.sh --bootstrap-server $BS --command-config client.properties --create --topic msk-serverless-tutorial --partitions 6


Next Step

Step 5: Produce and consume data (p. 52)

Step 5: Produce and consume data


In this step, you produce and consume data using the topic that you created in the previous step.

To produce and consume messages

1. Run the following command to create a console producer.

<path-to-your-kafka-installation>/bin/kafka-console-producer.sh --broker-list $BS --producer.config client.properties --topic msk-serverless-tutorial

2. Enter any message that you want, and press Enter. Repeat this step two or three times. Every time
you enter a line and press Enter, that line is sent to your cluster as a separate message.
3. Keep the connection to the client machine open, and then open a second, separate connection to
that machine in a new window.
4. Use your second connection to the client machine to create a console consumer with the following
command. Replace my-endpoint with the bootstrap server string that you saved after you created
the cluster.

<path-to-your-kafka-installation>/bin/kafka-console-consumer.sh --bootstrap-server my-endpoint --consumer.config client.properties --topic msk-serverless-tutorial --from-beginning

You start seeing the messages you entered earlier when you used the console producer command.
5. Enter more messages in the producer window, and watch them appear in the consumer window.

Next Step

Step 6: Delete resources (p. 52)

Step 6: Delete resources


In this step, you delete the resources that you created in this tutorial.

To delete the cluster

1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/home.
2. In the list of clusters, choose the cluster that you created for this tutorial.
3. For Actions, choose Delete cluster.
4. Enter delete in the field, then choose Delete.

To stop the client machine

1. Open the Amazon EC2 console at https://console.amazonaws.cn/ec2/.
2. In the list of Amazon EC2 instances, choose the client machine that you created for this tutorial.
3. Choose Instance state, then choose Terminate instance.
4. Choose Terminate.


To delete the IAM policy and role

1. Open the IAM console at https://console.amazonaws.cn/iam/.
2. On the navigation pane, choose Roles.
3. In the search box, enter the name of the IAM role that you created for this tutorial.
4. Choose the role. Then choose Delete role, and confirm the deletion.
5. On the navigation pane, choose Policies.
6. In the search box, enter the name of the policy that you created for this tutorial.
7. Choose the policy to open its summary page. On the policy's Summary page, choose Delete policy.
8. Choose Delete.

Configuration for serverless clusters


Amazon MSK sets broker configuration properties for serverless clusters. You can't change these broker
configuration property settings. However, you can set the following topic configuration properties.

cleanup.policy
Default value: Delete
Editable: yes, but only at topic creation time

compression.type
Default value: Producer
Editable: yes

max.message.bytes
Default value: 1048588
Editable: yes
Maximum allowed value: 8 MiB

message.timestamp.difference.max.ms
Default value: long.max
Editable: yes

message.timestamp.type
Default value: CreateTime
Editable: yes

retention.bytes
Default value: 250 GiB
Editable: yes
Maximum allowed value: 250 GiB

retention.ms
Default value: 1 day
Editable: yes
Maximum allowed value: 1 day

You can also use Apache Kafka commands to set or modify topic-level configuration properties for new
or existing topics. For more information about topic-level configuration properties and examples of how
to set them, see Topic-Level Configs in the official Apache Kafka documentation.
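For example, the following sketch lowers the retention period of the tutorial topic to 12 hours (43200000 milliseconds), which is within the one-day maximum shown in the preceding table. It assumes the $BS variable and the client.properties file from the getting-started tutorial.

<path-to-your-kafka-installation>/bin/kafka-configs.sh --bootstrap-server $BS --command-config client.properties --entity-type topics --entity-name msk-serverless-tutorial --alter --add-config retention.ms=43200000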

Monitoring serverless clusters


Amazon MSK integrates with Amazon CloudWatch so that you can collect, view, and analyze metrics
for your MSK Serverless cluster. The metrics shown in the following table are available for all serverless
clusters. As these metrics are published as individual data points for each partition in the topic, we
recommend viewing them as a 'SUM' statistic to get the topic-level view.

Amazon MSK publishes PerSec metrics to CloudWatch at a frequency of once per minute. This
means that the 'SUM' statistic for a one-minute period accurately represents per-second data for
PerSec metrics. To collect per-second data for a period of longer than one minute, use the following
CloudWatch math expression: m1 * 60/PERIOD(m1).
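As an illustration, the following metric-data-queries file is a sketch that applies that expression to BytesInPerSec over one-hour periods; the cluster and topic names are placeholders. You would pass it to the aws cloudwatch get-metric-data command with --metric-data-queries file://queries.json, along with --start-time and --end-time values.

[
    {
        "Id": "m1",
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/Kafka",
                "MetricName": "BytesInPerSec",
                "Dimensions": [
                    { "Name": "Cluster Name", "Value": "msk-serverless-tutorial-cluster" },
                    { "Name": "Topic", "Value": "msk-serverless-tutorial" }
                ]
            },
            "Period": 3600,
            "Stat": "Sum"
        },
        "ReturnData": false
    },
    {
        "Id": "bytesPerSecond",
        "Expression": "m1 * 60 / PERIOD(m1)",
        "Label": "BytesInPerSec, averaged per second"
    }
]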


Metrics available at the DEFAULT monitoring level

BytesInPerSec
When visible: after a producer writes to a topic
Dimensions: Cluster Name, Topic
Description: The number of bytes per second received from clients. This metric is available for each broker and also for each topic.

BytesOutPerSec
When visible: after a consumer group consumes from a topic
Dimensions: Cluster Name, Topic
Description: The number of bytes per second sent to clients. This metric is available for each broker and also for each topic.

FetchMessageConversionsPerSec
When visible: after a consumer group consumes from a topic
Dimensions: Cluster Name, Topic
Description: The number of fetch message conversions per second for the broker.

MaxEstimatedTimeLag
When visible: after a consumer group consumes from a topic
Dimensions: Cluster Name, Consumer Group, Topic
Description: A time estimate of the MaxOffsetLag metric.

MaxOffsetLag
When visible: after a consumer group consumes from a topic
Dimensions: Cluster Name, Consumer Group, Topic
Description: The maximum offset lag across all partitions in a topic.

MessagesInPerSec
When visible: after a producer writes to a topic
Dimensions: Cluster Name, Topic
Description: The number of incoming messages per second for the broker.

ProduceMessageConversionsPerSec
When visible: after a producer writes to a topic
Dimensions: Cluster Name, Topic
Description: The number of produce message conversions per second for the broker.

SumOffsetLag
When visible: after a consumer group consumes from a topic
Dimensions: Cluster Name, Consumer Group, Topic
Description: The aggregated offset lag for all the partitions in a topic.

To view MSK Serverless metrics

1. Sign in to the Amazon Web Services Management Console and open the CloudWatch console at https://console.amazonaws.cn/cloudwatch/.
2. In the navigation pane, under Metrics, choose All metrics.
3. In the metrics search box, enter the term kafka.
4. Choose AWS/Kafka / Cluster Name, Topic or AWS/Kafka / Cluster Name, Consumer Group, Topic
to see different metrics.
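Alternatively, you can list the available metrics from the command line. The following minimal sketch lists every metric in the AWS/Kafka namespace for the current Region; you can add --metric-name or --dimensions filters to narrow the results.

aws cloudwatch list-metrics --namespace AWS/Kafka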


Cluster states
The following table shows the possible states of a cluster and describes what they mean. It also describes
what actions you can and cannot perform when a cluster is in one of these states. To find out the state
of a cluster, you can visit the Amazon Web Services Management Console. You can also use the describe-
cluster-v2 command or the DescribeClusterV2 operation to describe the cluster. The description of a
cluster includes its state.
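For example, the following command sketch returns only the state of a cluster. Replace ClusterArn with the ARN of your cluster; the --query and --output options are optional conveniences.

aws kafka describe-cluster-v2 --cluster-arn ClusterArn --query ClusterInfo.State --output text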

ACTIVE
You can produce and consume data. You can also perform Amazon MSK API and Amazon CLI operations on the cluster.

CREATING
Amazon MSK is setting up the cluster. You must wait for the cluster to reach the ACTIVE state before you can use it to produce or consume data or to perform Amazon MSK API or Amazon CLI operations on it.

DELETING
The cluster is being deleted. You cannot use it to produce or consume data. You also cannot perform Amazon MSK API or Amazon CLI operations on it.

FAILED
The cluster creation or deletion process failed. You cannot use the cluster to produce or consume data. You can delete the cluster but cannot perform Amazon MSK API or Amazon CLI update operations on it.

HEALING
Amazon MSK is running an internal operation, like replacing an unhealthy broker. For example, the broker might be unresponsive. You can still use the cluster to produce and consume data. However, you cannot perform Amazon MSK API or Amazon CLI update operations on the cluster until it returns to the ACTIVE state.

MAINTENANCE
Amazon MSK is performing routine maintenance operations on the cluster. Such maintenance operations include security patching. You can still use the cluster to produce and consume data. However, you cannot perform Amazon MSK API or Amazon CLI update operations on the cluster until it returns to the ACTIVE state.

REBOOTING_BROKER
Amazon MSK is rebooting a broker. You can still use the cluster to produce and consume data. However, you cannot perform Amazon MSK API or Amazon CLI update operations on the cluster until it returns to the ACTIVE state.

UPDATING
A user-initiated Amazon MSK API or Amazon CLI operation is updating the cluster. You can still use the cluster to produce and consume data. However, you cannot perform any additional Amazon MSK API or Amazon CLI update operations on the cluster until it returns to the ACTIVE state.


Security in Amazon Managed Streaming for Apache Kafka
Cloud security at Amazon is the highest priority. As an Amazon customer, you benefit from a data
center and network architecture that is built to meet the requirements of the most security-sensitive
organizations.

Security is a shared responsibility between Amazon and you. The shared responsibility model describes
this as security of the cloud and security in the cloud:

• Security of the cloud – Amazon is responsible for protecting the infrastructure that runs Amazon
services in the Amazon Cloud. Amazon also provides you with services that you can use securely.
Third-party auditors regularly test and verify the effectiveness of our security as part of the Amazon
Compliance Programs. To learn about the compliance programs that apply to Amazon Managed
Streaming for Apache Kafka, see Amazon Web Services in Scope by Compliance Program.
• Security in the cloud – Your responsibility is determined by the Amazon service that you use. You are
also responsible for other factors including the sensitivity of your data, your company's requirements,
and applicable laws and regulations.

This documentation helps you understand how to apply the shared responsibility model when using
Amazon MSK. The following topics show you how to configure Amazon MSK to meet your security and
compliance objectives. You also learn how to use other Amazon Web Services that help you to monitor
and secure your Amazon MSK resources.

Topics
• Data protection in Amazon Managed Streaming for Apache Kafka (p. 57)
• Authentication and authorization for Amazon MSK APIs (p. 61)
• Authentication and authorization for Apache Kafka APIs (p. 73)
• Changing an Amazon MSK cluster's security group (p. 89)
• Controlling access to Apache ZooKeeper (p. 90)
• Logging (p. 92)
• Compliance validation for Amazon Managed Streaming for Apache Kafka (p. 97)
• Resilience in Amazon Managed Streaming for Apache Kafka (p. 97)
• Infrastructure security in Amazon Managed Streaming for Apache Kafka (p. 97)

Data protection in Amazon Managed Streaming for Apache Kafka
The Amazon shared responsibility model applies to data protection in Amazon Managed Streaming for
Apache Kafka. As described in this model, Amazon is responsible for protecting the global infrastructure
that runs all of the Amazon Web Services Cloud. You are responsible for maintaining control over
your content that is hosted on this infrastructure. This content includes the security configuration and
management tasks for the Amazon Web Services that you use. For more information about data privacy,
see the Data Privacy FAQ.


For data protection purposes, we recommend that you protect Amazon Web Services account credentials
and set up individual user accounts with Amazon Identity and Access Management (IAM). That way each
user is given only the permissions necessary to fulfill their job duties. We also recommend that you
secure your data in the following ways:

• Use multi-factor authentication (MFA) with each account.
• Use SSL/TLS to communicate with Amazon resources. We recommend TLS 1.2 or later.
• Set up API and user activity logging with Amazon CloudTrail.
• Use Amazon encryption solutions, along with all default security controls within Amazon services.
• Use advanced managed security services such as Amazon Macie, which assists in discovering and
securing personal data that is stored in Amazon S3.
• If you require FIPS 140-2 validated cryptographic modules when accessing Amazon through a
command line interface or an API, use a FIPS endpoint. For more information about the available FIPS
endpoints, see Federal Information Processing Standard (FIPS) 140-2.

We strongly recommend that you never put confidential or sensitive information, such as your
customers' email addresses, into tags or free-form fields such as a Name field. This includes when
you work with Amazon MSK or other Amazon services using the console, API, Amazon CLI, or Amazon
SDKs. Any data that you enter into tags or free-form fields used for names may be used for billing or
diagnostic logs. If you provide a URL to an external server, we strongly recommend that you do not
include credentials information in the URL to validate your request to that server.

Topics
• Amazon MSK encryption (p. 58)
• How do I get started with encryption? (p. 59)

Amazon MSK encryption


Amazon MSK provides data encryption options that you can use to meet strict data management
requirements. The certificates that Amazon MSK uses for encryption must be renewed every 13 months.
Amazon MSK automatically renews these certificates for all clusters. It sets the state of the cluster to
MAINTENANCE when it starts the certificate-update operation. It sets it back to ACTIVE when the update
is done. While a cluster is in the MAINTENANCE state, you can continue to produce and consume data, but
you can't perform any update operations on it.

Encryption at rest
Amazon MSK integrates with Amazon Key Management Service (KMS) to offer transparent server-side
encryption. Amazon MSK always encrypts your data at rest. When you create an MSK cluster, you can
specify the Amazon KMS key that you want Amazon MSK to use to encrypt your data at rest. If you don't
specify a KMS key, Amazon MSK creates an Amazon managed key for you and uses it on your behalf.
For more information about KMS keys, see Amazon KMS keys in the Amazon Key Management Service
Developer Guide.

Encryption in transit
Amazon MSK uses TLS 1.2. By default, it encrypts data in transit between the brokers of your MSK
cluster. You can override this default at the time you create the cluster.

For communication between clients and brokers, you must specify one of the following three settings:

• Only allow TLS encrypted data. This is the default setting.
• Allow both plaintext and TLS encrypted data.
• Only allow plaintext data.

Amazon MSK brokers use public Amazon Certificate Manager certificates. Therefore, any truststore that
trusts Amazon Trust Services also trusts the certificates of Amazon MSK brokers.

While we highly recommend enabling in-transit encryption, it can add CPU overhead and a few milliseconds of latency. Most use cases aren't sensitive to these differences, however, and the magnitude of the impact depends on the configuration of your cluster, clients, and usage profile.

How do I get started with encryption?


When creating an MSK cluster, you can specify encryption settings in JSON format. The following is an
example.

{
"EncryptionAtRest": {
"DataVolumeKMSKeyId": "arn:aws:kms:us-east-1:123456789012:key/abcdabcd-1234-
abcd-1234-abcd123e8e8e"
},
"EncryptionInTransit": {
"InCluster": true,
"ClientBroker": "TLS"
}
}

For DataVolumeKMSKeyId, you can specify a customer managed key or the Amazon managed key for
MSK in your account (alias/aws/kafka). If you don't specify EncryptionAtRest, Amazon MSK still
encrypts your data at rest under the Amazon managed key. To determine which key your cluster is using,
send a GET request or invoke the DescribeCluster API operation.
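For example, the following Amazon CLI sketch extracts the key directly from the cluster description. Replace ClusterArn with the ARN of your cluster; the --query path follows the shape of the DescribeCluster response.

aws kafka describe-cluster --cluster-arn ClusterArn --query ClusterInfo.EncryptionInfo.EncryptionAtRest.DataVolumeKMSKeyId --output text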

For EncryptionInTransit, the default value of InCluster is true, but you can set it to false if you
don't want Amazon MSK to encrypt your data as it passes between brokers.

To specify the encryption mode for data in transit between clients and brokers, set ClientBroker to
one of three values: TLS, TLS_PLAINTEXT, or PLAINTEXT.

To specify encryption settings when creating a cluster

1. Save the contents of the previous example in a file and give the file any name that you want. For
example, call it encryption-settings.json.
2. Run the create-cluster command and use the encryption-info option to point to the file
where you saved your configuration JSON. The following is an example.

aws kafka create-cluster --cluster-name "ExampleClusterName" --broker-node-group-info file://brokernodegroupinfo.json --encryption-info file://encryptioninfo.json --kafka-version "2.2.1" --number-of-broker-nodes 3

The following is an example of a successful response after running this command.

{
"ClusterArn": "arn:aws:kafka:us-east-1:123456789012:cluster/SecondTLSTest/
abcdabcd-1234-abcd-1234-abcd123e8e8e",
"ClusterName": "ExampleClusterName",
"State": "CREATING"
}


To test TLS encryption

1. Create a client machine following the guidance in the section called “Step 2: Create a client
machine” (p. 5).
2. Install Apache Kafka on the client machine.
3. Run the following command on a machine that has the Amazon CLI installed, replacing clusterARN
with the ARN of your cluster (a cluster created with ClientBroker set to TLS like the example in
the previous procedure).

aws kafka describe-cluster --cluster-arn clusterARN

In the result, look for the value of ZookeeperConnectString and save it because you need it in
the next step.
4. Run the following command on your client machine to create a topic. Replace
ZookeeperConnectString with the value you obtained for ZookeeperConnectString in the
previous step.

<path-to-your-kafka-installation>/bin/kafka-topics.sh --create --
zookeeper ZookeeperConnectString --replication-factor 3 --partitions 1 --topic
TLSTestTopic

5. In this example we use the JVM truststore to talk to the MSK cluster. To do this, first create a folder
named /tmp on the client machine. Then, go to the bin folder of the Apache Kafka installation, and
run the following command. (Your JVM path might be different.)

cp /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.201.b09-0.amzn2.x86_64/jre/lib/security/
cacerts /tmp/kafka.client.truststore.jks

6. While still in the bin folder of the Apache Kafka installation on the client machine, create a text file
named client.properties with the following contents.

security.protocol=SSL
ssl.truststore.location=/tmp/kafka.client.truststore.jks

7. Run the following command on a machine that has the Amazon CLI installed, replacing clusterARN
with the ARN of your cluster.

aws kafka get-bootstrap-brokers --cluster-arn clusterARN

A successful result looks like the following. Save this result because you need it for the next step.

{
"BootstrapBrokerStringTls": "a-1.example.g7oein.c2.kafka.us-
east-1.amazonaws.com:0123,a-3.example.g7oein.c2.kafka.us-
east-1.amazonaws.com:0123,a-2.example.g7oein.c2.kafka.us-east-1.amazonaws.com:0123"
}

8. Run the following command to create a console producer on your client machine. Replace
BootstrapBrokerStringTls with the value you obtained in the previous step. Leave this
producer command running.

<path-to-your-kafka-installation>/bin/kafka-console-producer.sh --broker-
list BootstrapBrokerStringTls --producer.config client.properties --topic TLSTestTopic

9. Open a new command window and connect to the same client machine. Then, run the following
command to create a console consumer.


<path-to-your-kafka-installation>/bin/kafka-console-consumer.sh --bootstrap-
server BootstrapBrokerStringTls --consumer.config client.properties --topic
TLSTestTopic

10. In the producer window, type a text message followed by a return, and look for the same message in
the consumer window. Amazon MSK encrypted this message in transit.

For more information about configuring Apache Kafka clients to work with encrypted data, see
Configuring Kafka Clients.

Authentication and authorization for Amazon MSK APIs
Amazon Identity and Access Management (IAM) is an Amazon Web Service that helps an administrator
securely control access to Amazon resources. IAM administrators control who can be authenticated
(signed in) and authorized (have permissions) to use Amazon MSK resources. IAM is an Amazon Web
Service that you can use with no additional charge.

This page describes how you can use IAM to control who can perform Amazon MSK operations on your
cluster. For information on how to control who can perform Apache Kafka operations on your cluster, see
the section called “Authentication and authorization for Apache Kafka APIs” (p. 73).

Topics
• How Amazon MSK works with IAM (p. 61)
• Amazon MSK identity-based policy examples (p. 64)
• Using service-linked roles for Amazon MSK (p. 67)
• Amazon managed policies for Amazon MSK (p. 68)
• Troubleshooting Amazon MSK identity and access (p. 72)

How Amazon MSK works with IAM


Before you use IAM to manage access to Amazon MSK, you should understand what IAM features are
available to use with Amazon MSK. To get a high-level view of how Amazon MSK and other Amazon
services work with IAM, see Amazon Services That Work with IAM in the IAM User Guide.

Topics
• Amazon MSK identity-based policies (p. 61)
• Amazon MSK resource-based policies (p. 64)
• Amazon managed policies (p. 64)
• Authorization based on Amazon MSK tags (p. 64)
• Amazon MSK IAM roles (p. 64)

Amazon MSK identity-based policies


With IAM identity-based policies, you can specify allowed or denied actions and resources as well as the conditions under which actions are allowed or denied. Amazon MSK supports specific actions, resources, and condition keys. To learn about all of the elements that you use in a JSON policy, see IAM JSON Policy Elements Reference in the IAM User Guide.

Actions
Administrators can use Amazon JSON policies to specify who has access to what. That is, which principal
can perform actions on what resources, and under what conditions.

The Action element of a JSON policy describes the actions that you can use to allow or deny access
in a policy. Policy actions usually have the same name as the associated Amazon API operation. There
are some exceptions, such as permission-only actions that don't have a matching API operation. There
are also some operations that require multiple actions in a policy. These additional actions are called
dependent actions.

Include actions in a policy to grant permissions to perform the associated operation.

Policy actions in Amazon MSK use the following prefix before the action: kafka:. For example, to
grant someone permission to describe an MSK cluster with the Amazon MSK DescribeCluster API
operation, you include the kafka:DescribeCluster action in their policy. Policy statements must
include either an Action or NotAction element. Amazon MSK defines its own set of actions that
describe tasks that you can perform with this service.

To specify multiple actions in a single statement, separate them with commas as follows:

"Action": ["kafka:action1", "kafka:action2"]

You can specify multiple actions using wildcards (*). For example, to specify all actions that begin with
the word Describe, include the following action:

"Action": "kafka:Describe*"

To see a list of Amazon MSK actions, see Actions, resources, and condition keys for Amazon Managed
Streaming for Apache Kafka in the IAM User Guide.

Resources
Administrators can use Amazon JSON policies to specify who has access to what. That is, which principal
can perform actions on what resources, and under what conditions.

The Resource JSON policy element specifies the object or objects to which the action applies.
Statements must include either a Resource or a NotResource element. As a best practice, specify
a resource using its Amazon Resource Name (ARN). You can do this for actions that support a specific
resource type, known as resource-level permissions.

For actions that don't support resource-level permissions, such as listing operations, use a wildcard (*) to
indicate that the statement applies to all resources.

"Resource": "*"

The Amazon MSK instance resource has the following ARN:

arn:${Partition}:kafka:${Region}:${Account}:cluster/${ClusterName}/${UUID}

For more information about the format of ARNs, see Amazon Resource Names (ARNs) and Amazon
Service Namespaces.


For example, to specify the CustomerMessages instance in your statement, use the following ARN:

"Resource": "arn:aws:kafka:us-east-1:123456789012:cluster/CustomerMessages/abcd1234-abcd-
dcba-4321-a1b2abcd9f9f-2"

To specify all instances that belong to a specific account, use the wildcard (*):

"Resource": "arn:aws:kafka:us-east-1:123456789012:cluster/*"

Some Amazon MSK actions, such as those for creating resources, cannot be performed on a specific
resource. In those cases, you must use the wildcard (*).

"Resource": "*"

To specify multiple resources in a single statement, separate the ARNs with commas.

"Resource": ["resource1", "resource2"]

To see a list of Amazon MSK resource types and their ARNs, see Resources Defined by Amazon Managed
Streaming for Apache Kafka in the IAM User Guide. To learn with which actions you can specify the ARN
of each resource, see Actions Defined by Amazon Managed Streaming for Apache Kafka.

Condition keys
Administrators can use Amazon JSON policies to specify who has access to what. That is, which principal
can perform actions on what resources, and under what conditions.

The Condition element (or Condition block) lets you specify conditions in which a statement is in
effect. The Condition element is optional. You can create conditional expressions that use condition
operators, such as equals or less than, to match the condition in the policy with values in the request.

If you specify multiple Condition elements in a statement, or multiple keys in a single Condition
element, Amazon evaluates them using a logical AND operation. If you specify multiple values for a single
condition key, Amazon evaluates the condition using a logical OR operation. All of the conditions must be
met before the statement's permissions are granted.
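The following hypothetical policy fragment illustrates this evaluation logic: the two condition keys are combined with a logical AND, while the two values listed for aws:ResourceTag/Owner are combined with a logical OR.

"Condition": {
    "StringEquals": {
        "aws:ResourceTag/Owner": ["alice", "bob"],
        "aws:PrincipalTag/team": "streaming"
    }
}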

You can also use placeholder variables when you specify conditions. For example, you can grant an IAM
user permission to access a resource only if it is tagged with their IAM user name. For more information,
see IAM policy elements: variables and tags in the IAM User Guide.

Amazon supports global condition keys and service-specific condition keys. Amazon MSK defines its own set of condition keys and also supports using some global condition keys. To see all Amazon global condition keys, see Amazon global condition context keys in the IAM User Guide.

To see a list of Amazon MSK condition keys, see Condition Keys for Amazon Managed Streaming for
Apache Kafka in the IAM User Guide. To learn with which actions and resources you can use a condition
key, see Actions Defined by Amazon Managed Streaming for Apache Kafka.

Examples

To view examples of Amazon MSK identity-based policies, see Amazon MSK identity-based policy
examples (p. 64).


Amazon MSK resource-based policies


Amazon MSK does not support resource-based policies.

Amazon managed policies

Authorization based on Amazon MSK tags


You can attach tags to Amazon MSK clusters. To control access based on tags, you provide tag
information in the condition element of a policy using the kafka:ResourceTag/key-name,
aws:RequestTag/key-name, or aws:TagKeys condition keys. For more information about tagging
Amazon MSK resources, see the section called “Tagging a cluster” (p. 31).

To view an example identity-based policy for limiting access to a cluster based on the tags on that
cluster, see Accessing Amazon MSK clusters based on tags (p. 66).

Amazon MSK IAM roles


An IAM role is an entity within your Amazon Web Services account that has specific permissions.

Using temporary credentials with Amazon MSK


You can use temporary credentials to sign in with federation, assume an IAM role, or assume a cross-account role. You obtain temporary security credentials by calling Amazon STS API operations such as AssumeRole or GetFederationToken.

Amazon MSK supports using temporary credentials.
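For example, the following sketch calls AssumeRole through the Amazon CLI; the role ARN and session name are placeholders. The temporary credentials in the response can then be exported as environment variables before you call kafka: API operations.

aws sts assume-role --role-arn arn:aws:iam::123456789012:role/example-msk-admin-role --role-session-name msk-admin-session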

Service-linked roles
Service-linked roles allow Amazon Web Services to access resources in other services to complete an
action on your behalf. Service-linked roles appear in your IAM account and are owned by the service. An
IAM administrator can view but not edit the permissions for service-linked roles.

Amazon MSK supports service-linked roles. For details about creating or managing Amazon MSK service-linked roles, see the section called “Service-linked roles” (p. 67).

Amazon MSK identity-based policy examples


By default, IAM users and roles don't have permission to execute Amazon MSK API actions. An IAM
administrator must create IAM policies that grant users and roles permission to perform specific API
operations on the specified resources they need. The administrator must then attach those policies to
the IAM users or groups that require those permissions.

To learn how to create an IAM identity-based policy using these example JSON policy documents, see
Creating Policies on the JSON Tab in the IAM User Guide.

Topics
• Policy best practices (p. 65)
• Allow users to view their own permissions (p. 65)
• Accessing one Amazon MSK cluster (p. 66)
• Accessing Amazon MSK clusters based on tags (p. 66)


Policy best practices


Identity-based policies determine whether someone can create, access, or delete Amazon MSK resources
in your account. These actions can incur costs for your Amazon Web Services account. When you create
or edit identity-based policies, follow these guidelines and recommendations:

• Get started with Amazon managed policies and move toward least-privilege permissions – To get
started granting permissions to your users and workloads, use the Amazon managed policies that grant
permissions for many common use cases. They are available in your Amazon Web Services account.
We recommend that you reduce permissions further by defining Amazon customer managed policies
that are specific to your use cases. For more information, see Amazon managed policies or Amazon
managed policies for job functions in the IAM User Guide.
• Apply least-privilege permissions – When you set permissions with IAM policies, grant only the
permissions required to perform a task. You do this by defining the actions that can be taken on
specific resources under specific conditions, also known as least-privilege permissions. For more
information about using IAM to apply permissions, see Policies and permissions in IAM in the IAM User
Guide.
• Use conditions in IAM policies to further restrict access – You can add a condition to your policies
to limit access to actions and resources. For example, you can write a policy condition to specify that
all requests must be sent using SSL. You can also use conditions to grant access to service actions if
they are used through a specific Amazon Web Service, such as Amazon CloudFormation. For more
information, see IAM JSON policy elements: Condition in the IAM User Guide.
• Use IAM Access Analyzer to validate your IAM policies to ensure secure and functional permissions
– IAM Access Analyzer validates new and existing policies so that the policies adhere to the IAM
policy language (JSON) and IAM best practices. IAM Access Analyzer provides more than 100 policy
checks and actionable recommendations to help you author secure and functional policies. For more
information, see IAM Access Analyzer policy validation in the IAM User Guide.
• Require multi-factor authentication (MFA) – If you have a scenario that requires IAM users or root
users in your account, turn on MFA for additional security. To require MFA when API operations are
called, add MFA conditions to your policies. For more information, see Configuring MFA-protected API
access in the IAM User Guide.

For more information about best practices in IAM, see Security best practices in IAM in the IAM User
Guide.

Allow users to view their own permissions


This example shows how you might create a policy that allows IAM users to view the inline and managed
policies that are attached to their user identity. This policy includes permissions to complete this action
on the console or programmatically using the Amazon CLI or Amazon API.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ViewOwnUserInfo",
            "Effect": "Allow",
            "Action": [
                "iam:GetUserPolicy",
                "iam:ListGroupsForUser",
                "iam:ListAttachedUserPolicies",
                "iam:ListUserPolicies",
                "iam:GetUser"
            ],
            "Resource": ["arn:aws-cn:iam::*:user/${aws:username}"]
        },
        {
            "Sid": "NavigateInConsole",
            "Effect": "Allow",
            "Action": [
                "iam:GetGroupPolicy",
                "iam:GetPolicyVersion",
                "iam:GetPolicy",
                "iam:ListAttachedGroupPolicies",
                "iam:ListGroupPolicies",
                "iam:ListPolicyVersions",
                "iam:ListPolicies",
                "iam:ListUsers"
            ],
            "Resource": "*"
        }
    ]
}

Accessing one Amazon MSK cluster


In this example, you want to grant an IAM user in your Amazon Web Services account access to one of
your clusters, purchaseQueriesCluster. This policy allows the user to describe the cluster, get its
bootstrap brokers, list its broker nodes, and update it.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "UpdateCluster",
            "Effect": "Allow",
            "Action": [
                "kafka:Describe*",
                "kafka:Get*",
                "kafka:List*",
                "kafka:Update*"
            ],
            "Resource": "arn:aws:kafka:us-east-1:012345678012:cluster/purchaseQueriesCluster/abcdefab-1234-abcd-5678-cdef0123ab01-2"
        }
    ]
}
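
To attach a policy like this one to a user, you could use the Amazon CLI's put-user-policy command. The following is a minimal sketch; the user name, policy name, and file name are illustrative placeholders, and the file is assumed to contain the JSON policy above.

aws iam put-user-policy \
    --user-name mateojackson \
    --policy-name MSKUpdatePurchaseQueries \
    --policy-document file://update-cluster-policy.json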

Accessing Amazon MSK clusters based on tags


You can use conditions in your identity-based policy to control access to Amazon MSK resources based on
tags. This example shows how you might create a policy that allows the user to describe the cluster, get
its bootstrap brokers, list its broker nodes, update it, and delete it. However, permission is granted only if
the cluster tag Owner has the value of that user's user name.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AccessClusterIfOwner",
            "Effect": "Allow",
            "Action": [
                "kafka:Describe*",
                "kafka:Get*",
                "kafka:List*",
                "kafka:Update*",
                "kafka:Delete*"
            ],
            "Resource": "arn:aws:kafka:us-east-1:012345678012:cluster/*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/Owner": "${aws:username}"
                }
            }
        }
    ]
}

You can attach this policy to the IAM users in your account. If a user named richard-roe attempts to
update an MSK cluster, the cluster must be tagged Owner=richard-roe or owner=richard-roe.
Otherwise, he is denied access. The condition tag key Owner matches both Owner and owner because
condition key names are not case-sensitive. For more information, see IAM JSON Policy Elements:
Condition in the IAM User Guide.
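
For this condition to match, the cluster must actually carry the tag. As a sketch of how you might apply the tag with the Amazon CLI (the cluster ARN shown is a placeholder):

aws kafka tag-resource \
    --resource-arn arn:aws:kafka:us-east-1:012345678012:cluster/purchaseQueriesCluster/abcdefab-1234-abcd-5678-cdef0123ab01-2 \
    --tags Owner=richard-roe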

Using service-linked roles for Amazon MSK


Amazon MSK uses Amazon Identity and Access Management (IAM) service-linked roles. A service-
linked role is a unique type of IAM role that is linked directly to Amazon MSK. Service-linked roles
are predefined by Amazon MSK and include all the permissions that the service requires to call other
Amazon services on your behalf.

A service-linked role makes setting up Amazon MSK easier because you do not have to manually add the
necessary permissions. Amazon MSK defines the permissions of its service-linked roles. Unless defined
otherwise, only Amazon MSK can assume its roles. The defined permissions include the trust policy and
the permissions policy, and that permissions policy cannot be attached to any other IAM entity.

For information about other services that support service-linked roles, see Amazon Web Services That
Work with IAM, and look for the services that have Yes in the Service-Linked Role column. Choose a Yes
with a link to view the service-linked role documentation for that service.

Topics
• Service-linked role permissions for Amazon MSK (p. 67)
• Creating a service-linked role for Amazon MSK (p. 68)
• Editing a service-linked role for Amazon MSK (p. 68)
• Supported Regions for Amazon MSK service-linked roles (p. 68)

Service-linked role permissions for Amazon MSK


Amazon MSK uses the service-linked role named AWSServiceRoleForKafka – Allows Amazon MSK to
access Amazon resources on your behalf.

The AWSServiceRoleForKafka service-linked role trusts the following services to assume the role:

• kafka.amazonaws.com

The role permissions policy allows Amazon MSK to complete the following actions on the specified
resources:

• Action: ec2:CreateNetworkInterface on *
• Action: ec2:DescribeNetworkInterfaces on *
• Action: ec2:CreateNetworkInterfacePermission on *
• Action: ec2:AttachNetworkInterface on *
• Action: ec2:DeleteNetworkInterface on *


• Action: ec2:DetachNetworkInterface on *
• Action: acm-pca:GetCertificateAuthorityCertificate on *
• Action: secretsmanager:ListSecrets on *
• Action: secretsmanager:GetResourcePolicy on secrets with the prefix AmazonMSK_ that you
create for Amazon MSK
• Action: secretsmanager:PutResourcePolicy on secrets with the prefix AmazonMSK_ that you
create for Amazon MSK
• Action: secretsmanager:DeleteResourcePolicy on secrets with the prefix AmazonMSK_ that you
create for Amazon MSK
• Action: secretsmanager:DescribeSecret on secrets with the prefix AmazonMSK_ that you create
for Amazon MSK

You must configure permissions to allow an IAM entity (such as a user, group, or role) to create, edit, or
delete a service-linked role. For more information, see Service-Linked Role Permissions in the IAM User
Guide.

Creating a service-linked role for Amazon MSK


You don't need to create a service-linked role manually. When you create an Amazon MSK cluster in the
Amazon Web Services Management Console, the Amazon CLI, or the Amazon API, Amazon MSK creates
the service-linked role for you.

If you delete this service-linked role, and then need to create it again, you can use the same process to
recreate the role in your account. When you create an Amazon MSK cluster, Amazon MSK creates the
service-linked role for you again.
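
If you ever need to recreate the role explicitly, without creating a cluster, the following Amazon CLI call is a sketch of one way to do it; creating a cluster normally makes this step unnecessary.

aws iam create-service-linked-role --aws-service-name kafka.amazonaws.com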

Editing a service-linked role for Amazon MSK


Amazon MSK does not allow you to edit the AWSServiceRoleForKafka service-linked role. After you
create a service-linked role, you cannot change the name of the role because various entities might
reference the role. However, you can edit the description of the role using IAM. For more information, see
Editing a Service-Linked Role in the IAM User Guide.

Supported Regions for Amazon MSK service-linked roles


Amazon MSK supports using service-linked roles in all of the Regions where the service is available. For
more information, see Amazon Regions and Endpoints.

Amazon managed policies for Amazon MSK


To add permissions to users, groups, and roles, it is easier to use Amazon managed policies than to write
policies yourself. It takes time and expertise to create IAM customer managed policies that provide your
team with only the permissions they need. To get started quickly, you can use our Amazon managed
policies. These policies cover common use cases and are available in your Amazon Web Services account.
For more information about Amazon managed policies, see Amazon managed policies in the IAM User
Guide.

Amazon Web Services maintains and updates Amazon managed policies. You can't change the permissions
in Amazon managed policies. Services occasionally add additional permissions to an Amazon managed
policy to support new features. This type of update affects all identities (users, groups, and roles) where
the policy is attached. Services are most likely to update an Amazon managed policy when a new feature
is launched or when new operations become available. Services do not remove permissions from an
Amazon managed policy, so policy updates won't break your existing permissions.


Additionally, Amazon supports managed policies for job functions that span multiple services. For
example, the ViewOnlyAccess Amazon managed policy provides read-only access to many Amazon
Web Services and resources. When a service launches a new feature, Amazon adds read-only permissions
for new operations and resources. For a list and descriptions of job function policies, see Amazon
managed policies for job functions in the IAM User Guide.

Amazon managed policy: AmazonMSKFullAccess


This policy grants administrative permissions that allow a principal full access to all Amazon MSK actions.
The permissions in this policy are grouped as follows:

• The Amazon MSK permissions allow all Amazon MSK actions.


• Some of the Amazon EC2 permissions in this policy are required to validate the passed resources in an
API request. This is to make sure Amazon MSK is able to successfully use the resources with a cluster.
The rest of the Amazon EC2 permissions in this policy allow Amazon MSK to create Amazon resources
that are needed to make it possible for you to connect to your clusters.
• The Amazon KMS permissions are used during API calls to validate the passed resources in a request.
They are required for Amazon MSK to be able to use the passed key with the Amazon MSK cluster.
• The CloudWatch Logs, Amazon S3, and Amazon Kinesis Data Firehose permissions are required for
Amazon MSK to be able to ensure that the log delivery destinations are reachable, and that they are
valid for broker log use.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kafka:*",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcs",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeRouteTables",
                "ec2:DescribeVpcEndpoints",
                "ec2:DescribeVpcAttribute",
                "kms:DescribeKey",
                "kms:CreateGrant",
                "logs:CreateLogDelivery",
                "logs:GetLogDelivery",
                "logs:UpdateLogDelivery",
                "logs:DeleteLogDelivery",
                "logs:ListLogDeliveries",
                "logs:PutResourcePolicy",
                "logs:DescribeResourcePolicies",
                "logs:DescribeLogGroups",
                "S3:GetBucketPolicy",
                "firehose:TagDeliveryStream"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateVpcEndpoint"
            ],
            "Resource": [
                "arn:*:ec2:*:*:vpc/*",
                "arn:*:ec2:*:*:subnet/*",
                "arn:*:ec2:*:*:security-group/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateVpcEndpoint"
            ],
            "Resource": [
                "arn:*:ec2:*:*:vpc-endpoint/*"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:RequestTag/AWSMSKManaged": "true"
                },
                "StringLike": {
                    "aws:RequestTag/ClusterArn": "*"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateTags"
            ],
            "Resource": "arn:*:ec2:*:*:vpc-endpoint/*",
            "Condition": {
                "StringEquals": {
                    "ec2:CreateAction": "CreateVpcEndpoint"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DeleteVpcEndpoints"
            ],
            "Resource": "arn:*:ec2:*:*:vpc-endpoint/*",
            "Condition": {
                "StringEquals": {
                    "ec2:ResourceTag/AWSMSKManaged": "true"
                },
                "StringLike": {
                    "ec2:ResourceTag/ClusterArn": "*"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": "iam:CreateServiceLinkedRole",
            "Resource": "arn:aws:iam::*:role/aws-service-role/kafka.amazonaws.com/AWSServiceRoleForKafka*",
            "Condition": {
                "StringLike": {
                    "iam:AWSServiceName": "kafka.amazonaws.com"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:AttachRolePolicy",
                "iam:PutRolePolicy"
            ],
            "Resource": "arn:aws:iam::*:role/aws-service-role/kafka.amazonaws.com/AWSServiceRoleForKafka*"
        },
        {
            "Effect": "Allow",
            "Action": "iam:CreateServiceLinkedRole",
            "Resource": "arn:aws:iam::*:role/aws-service-role/delivery.logs.amazonaws.com/AWSServiceRoleForLogDelivery*",
            "Condition": {
                "StringLike": {
                    "iam:AWSServiceName": "delivery.logs.amazonaws.com"
                }
            }
        }
    ]
}

Amazon managed policy: AmazonMSKReadOnlyAccess


This policy grants read-only permissions that allow users to view information in Amazon MSK. Principals
with this policy attached can't make any updates or delete existing resources, nor can they create new
Amazon MSK resources. For example, principals with these permissions can view the list of clusters and
configurations associated with their account, but cannot change the configuration or settings of any
clusters. The permissions in this policy are grouped as follows:

• The Amazon MSK permissions allow you to list Amazon MSK resources, describe them, and get
information about them.
• The Amazon EC2 permissions are used to describe the Amazon VPC, subnets, security groups, and ENIs
that are associated with a cluster.
• The Amazon KMS permission is used to describe the key that is associated with the cluster.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "kafka:Describe*",
                "kafka:List*",
                "kafka:Get*",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcs",
                "kms:DescribeKey"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
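
To grant these read-only permissions to an identity, you attach the managed policy. The following Amazon CLI call is a sketch; the role name is a placeholder, and the policy ARN's partition prefix (aws or aws-cn) depends on your Region.

aws iam attach-role-policy \
    --role-name MSKAuditor \
    --policy-arn arn:aws:iam::aws:policy/AmazonMSKReadOnlyAccess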

Amazon managed policy: KafkaServiceRolePolicy


You can't attach KafkaServiceRolePolicy to your IAM entities. This policy is attached to a service-linked
role that allows Amazon MSK to perform actions on your behalf. For more information, see the section
called “Service-linked roles” (p. 67).

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateNetworkInterface",
                "ec2:DescribeNetworkInterfaces",
                "ec2:CreateNetworkInterfacePermission",
                "ec2:AttachNetworkInterface",
                "ec2:DeleteNetworkInterface",
                "ec2:DetachNetworkInterface",
                "acm-pca:GetCertificateAuthorityCertificate",
                "secretsmanager:ListSecrets"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetResourcePolicy",
                "secretsmanager:PutResourcePolicy",
                "secretsmanager:DeleteResourcePolicy",
                "secretsmanager:DescribeSecret"
            ],
            "Resource": "*",
            "Condition": {
                "ArnLike": {
                    "secretsmanager:SecretId": "arn:*:secretsmanager:*:*:secret:AmazonMSK_*"
                }
            }
        }
    ]
}

Amazon MSK updates to Amazon managed policies


View details about updates to Amazon managed policies for Amazon MSK since this service began
tracking these changes.

Change: AmazonMSKFullAccess (p. 69) – Update to an existing policy
Description: Amazon MSK added new Amazon EC2 permissions to make it possible to connect to a cluster.
Date: November 30, 2021

Change: AmazonMSKFullAccess (p. 69) – Update to an existing policy
Description: Amazon MSK added a new permission to allow it to describe Amazon EC2 route tables.
Date: November 19, 2021

Change: Amazon MSK started tracking changes
Description: Amazon MSK started tracking changes for its Amazon managed policies.
Date: November 19, 2021

Troubleshooting Amazon MSK identity and access


Use the following information to help you diagnose and fix common issues that you might encounter
when working with Amazon MSK and IAM.

Topics
• I am not authorized to perform an action in Amazon MSK (p. 73)


I am not authorized to perform an action in Amazon MSK


If the Amazon Web Services Management Console tells you that you're not authorized to perform an
action, then you must contact your administrator for assistance. Your administrator is the person that
provided you with your user name and password.

The following example error occurs when the mateojackson IAM user tries to use the console to delete
a cluster but does not have kafka:DeleteCluster permissions.

User: arn:aws-cn:iam::123456789012:user/mateojackson is not authorized to perform:
kafka:DeleteCluster on resource: purchaseQueriesCluster

In this case, Mateo asks his administrator to update his policies to allow him to access the
purchaseQueriesCluster resource using the kafka:DeleteCluster action.

Authentication and authorization for Apache Kafka APIs

You can use IAM to authenticate clients and to allow or deny Apache Kafka actions. Alternatively, you can
use TLS or SASL/SCRAM to authenticate clients, and Apache Kafka ACLs to allow or deny actions.

For information on how to control who can perform Amazon MSK operations on your cluster, see the
section called “Authentication and authorization for Amazon MSK APIs” (p. 61).

Topics
• IAM access control (p. 73)
• Mutual TLS authentication (p. 81)
• Username and password authentication with Amazon Secrets Manager (p. 85)
• Apache Kafka ACLs (p. 88)

IAM access control


IAM access control for Amazon MSK enables you to handle both authentication and authorization for
your MSK cluster. This eliminates the need to use one mechanism for authentication and another for
authorization. For example, when a client tries to write to your cluster, Amazon MSK uses IAM to check
whether that client is an authenticated identity and also whether it is authorized to produce to your
cluster.

Amazon MSK logs access events so you can audit them. For more information, see the section called
“CloudTrail events” (p. 94).

To make IAM access control possible, Amazon MSK makes minor modifications to Apache Kafka source
code. These modifications won't cause a noticeable difference in your Apache Kafka experience.
Important
IAM access control doesn't apply to Apache ZooKeeper nodes. For information about how
you can control access to those nodes, see the section called “Controlling access to Apache
ZooKeeper” (p. 90).
Important
The allow.everyone.if.no.acl.found Apache Kafka setting has no effect if your cluster
uses IAM access control.


Important
You can invoke Apache Kafka ACL APIs for an MSK cluster that uses IAM access control. However,
Apache Kafka ACLs stored in Apache ZooKeeper have no effect on authorization for IAM roles.
You must use IAM policies to control access for IAM roles.

How IAM access control for Amazon MSK works


To use IAM access control for Amazon MSK, perform the following steps, which are described in detail in
the rest of this section.

• the section called “Create a cluster that uses IAM access control” (p. 74)
• the section called “Configure clients for IAM access control” (p. 74)
• the section called “Create authorization policies” (p. 75)
• the section called “Get the bootstrap brokers for IAM access control” (p. 76)

Create a cluster that uses IAM access control


This section explains how you can use the Amazon Web Services Management Console, the API, or the
Amazon CLI to create a cluster that uses IAM access control. For information about how to turn on IAM
access control for an existing cluster, see the section called “Updating security” (p. 28).

Use the Amazon Web Services Management Console to create a cluster that uses IAM access
control

1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.


2. Choose Create cluster.
3. Choose Create cluster with custom settings.
4. In the Authentication section, choose IAM access control.
5. Complete the rest of the workflow for creating a cluster.

Use the API or the Amazon CLI to create a cluster that uses IAM access control

• To create a cluster with IAM access control enabled, use the CreateCluster API or the create-cluster CLI command, and pass the following JSON for the ClientAuthentication parameter:
"ClientAuthentication": { "Sasl": { "Iam": { "Enabled": true } } }.

Configure clients for IAM access control


To enable clients to communicate with an MSK cluster that uses IAM access control, configure them as
described in the following steps.

1. Add the following to the client.properties file. Replace <PATH_TO_TRUST_STORE_FILE> with the fully-qualified path to the trust store file on the client.
Note
If you don't want to use a specific certificate, you can remove
ssl.truststore.location=<PATH_TO_TRUST_STORE_FILE> from
your client.properties file. When you don't specify a value for
ssl.truststore.location, the Java process uses the default certificate.

ssl.truststore.location=<PATH_TO_TRUST_STORE_FILE>
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler

To use a named profile that you created for Amazon credentials, include awsProfileName="your
profile name"; in your client configuration file. For information about named profiles, see
Named profiles in the Amazon CLI documentation.
2. Download the latest stable aws-msk-iam-auth JAR file, and place it in the class path. If you use
Maven, add the following dependency, adjusting the version number as needed:

<dependency>
<groupId>software.amazon.msk</groupId>
<artifactId>aws-msk-iam-auth</artifactId>
<version>1.0.0</version>
</dependency>

The Amazon MSK client plugin is open-sourced under the Apache 2.0 license.

Create authorization policies


Attach an authorization policy to the IAM role that corresponds to the client. In an authorization policy,
you specify which actions to allow or deny for the role. If your client is on an Amazon EC2 instance,
associate the authorization policy with the IAM role for that Amazon EC2 instance. Alternatively, you
can configure your client to use a named profile, and then you associate the authorization policy with
the role for that named profile. the section called “Configure clients for IAM access control” (p. 74)
describes how to configure a client to use a named profile.

For information about how to create an IAM policy, see Creating IAM policies.

The following is an example authorization policy for a cluster named MyTestCluster. To understand the
semantics of the Action and Resource elements, see the section called “Semantics of actions and
resources” (p. 76).
Important
Changes that you make to an IAM policy are reflected in the IAM APIs and the Amazon CLI
immediately. However, it can take noticeable time for the policy change to take effect. In most
cases, policy changes take effect in less than a minute. Network conditions may sometimes
increase the delay.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:Connect",
                "kafka-cluster:AlterCluster",
                "kafka-cluster:DescribeCluster"
            ],
            "Resource": [
                "arn:aws:kafka:us-east-1:0123456789012:cluster/MyTestCluster/abcd1234-0123-abcd-5678-1234abcd-1"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:*Topic*",
                "kafka-cluster:WriteData",
                "kafka-cluster:ReadData"
            ],
            "Resource": [
                "arn:aws:kafka:us-east-1:0123456789012:topic/MyTestCluster/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:AlterGroup",
                "kafka-cluster:DescribeGroup"
            ],
            "Resource": [
                "arn:aws:kafka:us-east-1:0123456789012:group/MyTestCluster/*"
            ]
        }
    ]
}

To learn how to create a policy with action elements that correspond to common Apache Kafka use
cases, like producing and consuming data, see the section called “Common use cases” (p. 80).

Get the bootstrap brokers for IAM access control


See the section called “Getting the bootstrap brokers” (p. 16).
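
As a sketch, the following Amazon CLI call returns the bootstrap brokers; for a cluster that uses IAM access control, clients use the value named BootstrapBrokerStringSaslIam from the response. Replace Cluster-ARN with the ARN of your cluster.

aws kafka get-bootstrap-brokers --cluster-arn Cluster-ARN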

Semantics of actions and resources


This section explains the semantics of the action and resource elements that you can use in an
IAM authorization policy. For an example policy, see the section called “Create authorization
policies” (p. 75).

Actions
The following table lists the actions that you can include in an authorization policy when you use IAM
access control for Amazon MSK. When you include in your authorization policy an action from the Action
column of the table, you must also include the corresponding actions from the Required actions column.

Action: kafka-cluster:Connect
Description: Grants permission to connect and authenticate to the cluster.
Required actions: None
Required resources: cluster
Applicable to serverless clusters: Yes

Action: kafka-cluster:DescribeCluster
Description: Grants permission to describe various aspects of the cluster, equivalent to Apache Kafka's DESCRIBE CLUSTER ACL.
Required actions: kafka-cluster:Connect
Required resources: cluster
Applicable to serverless clusters: Yes

Action: kafka-cluster:AlterCluster
Description: Grants permission to alter various aspects of the cluster, equivalent to Apache Kafka's ALTER CLUSTER ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeCluster
Required resources: cluster
Applicable to serverless clusters: No

Action: kafka-cluster:DescribeClusterDynamicConfiguration
Description: Grants permission to describe the dynamic configuration of a cluster, equivalent to Apache Kafka's DESCRIBE_CONFIGS CLUSTER ACL.
Required actions: kafka-cluster:Connect
Required resources: cluster
Applicable to serverless clusters: No

Action: kafka-cluster:AlterClusterDynamicConfiguration
Description: Grants permission to alter the dynamic configuration of a cluster, equivalent to Apache Kafka's ALTER_CONFIGS CLUSTER ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeClusterDynamicConfiguration
Required resources: cluster
Applicable to serverless clusters: No

Action: kafka-cluster:WriteDataIdempotently
Description: Grants permission to write data idempotently on a cluster, equivalent to Apache Kafka's IDEMPOTENT_WRITE CLUSTER ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:WriteData
Required resources: cluster
Applicable to serverless clusters: Yes

Action: kafka-cluster:CreateTopic
Description: Grants permission to create topics on a cluster, equivalent to Apache Kafka's CREATE CLUSTER/TOPIC ACL.
Required actions: kafka-cluster:Connect
Required resources: topic
Applicable to serverless clusters: Yes

Action: kafka-cluster:DescribeTopic
Description: Grants permission to describe topics on a cluster, equivalent to Apache Kafka's DESCRIBE TOPIC ACL.
Required actions: kafka-cluster:Connect
Required resources: topic
Applicable to serverless clusters: Yes

Action: kafka-cluster:AlterTopic
Description: Grants permission to alter topics on a cluster, equivalent to Apache Kafka's ALTER TOPIC ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopic
Required resources: topic
Applicable to serverless clusters: Yes

Action: kafka-cluster:DeleteTopic
Description: Grants permission to delete topics on a cluster, equivalent to Apache Kafka's DELETE TOPIC ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopic
Required resources: topic
Applicable to serverless clusters: Yes

Action: kafka-cluster:DescribeTopicDynamicConfiguration
Description: Grants permission to describe the dynamic configuration of topics on a cluster, equivalent to Apache Kafka's DESCRIBE_CONFIGS TOPIC ACL.
Required actions: kafka-cluster:Connect
Required resources: topic
Applicable to serverless clusters: Yes

Action: kafka-cluster:AlterTopicDynamicConfiguration
Description: Grants permission to alter the dynamic configuration of topics on a cluster, equivalent to Apache Kafka's ALTER_CONFIGS TOPIC ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopicDynamicConfiguration
Required resources: topic
Applicable to serverless clusters: Yes

Action: kafka-cluster:ReadData
Description: Grants permission to read data from topics on a cluster, equivalent to Apache Kafka's READ TOPIC ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopic, kafka-cluster:AlterGroup
Required resources: topic
Applicable to serverless clusters: Yes

Action: kafka-cluster:WriteData
Description: Grants permission to write data to topics on a cluster, equivalent to Apache Kafka's WRITE TOPIC ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopic
Required resources: topic
Applicable to serverless clusters: Yes

Action: kafka-cluster:DescribeGroup
Description: Grants permission to describe groups on a cluster, equivalent to Apache Kafka's DESCRIBE GROUP ACL.
Required actions: kafka-cluster:Connect
Required resources: group
Applicable to serverless clusters: Yes

Action: kafka-cluster:AlterGroup
Description: Grants permission to join groups on a cluster, equivalent to Apache Kafka's READ GROUP ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeGroup
Required resources: group
Applicable to serverless clusters: Yes

Action: kafka-cluster:DeleteGroup
Description: Grants permission to delete groups on a cluster, equivalent to Apache Kafka's DELETE GROUP ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeGroup
Required resources: group
Applicable to serverless clusters: Yes

Action: kafka-cluster:DescribeTransactionalId
Description: Grants permission to describe transactional IDs on a cluster, equivalent to Apache Kafka's DESCRIBE TRANSACTIONAL_ID ACL.
Required actions: kafka-cluster:Connect
Required resources: transactional-id
Applicable to serverless clusters: Yes

Action: kafka-cluster:AlterTransactionalId
Description: Grants permission to alter transactional IDs on a cluster, equivalent to Apache Kafka's WRITE TRANSACTIONAL_ID ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTransactionalId, kafka-cluster:WriteData
Required resources: transactional-id
Applicable to serverless clusters: Yes

You can use the asterisk (*) wildcard any number of times in an action after the colon. The following are
examples.

• kafka-cluster:*Topic stands for kafka-cluster:CreateTopic, kafka-cluster:DescribeTopic, kafka-cluster:AlterTopic, and kafka-cluster:DeleteTopic. It doesn't include kafka-cluster:DescribeTopicDynamicConfiguration or kafka-cluster:AlterTopicDynamicConfiguration.
• kafka-cluster:* stands for all permissions.
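
As an illustration, the following policy statement uses the first wildcard form; the account ID, cluster name, and Region are placeholders.

{
    "Effect": "Allow",
    "Action": "kafka-cluster:*Topic",
    "Resource": "arn:aws:kafka:us-east-1:0123456789012:topic/MyTestCluster/*"
}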

Resources
The following table shows the four types of resources that you can use in an authorization policy
when you use IAM access control for Amazon MSK. You can get the cluster Amazon Resource Name
(ARN) from the Amazon Web Services Management Console or by using the DescribeCluster API or the
describe-cluster Amazon CLI command. You can then use the cluster ARN to construct topic, group, and
transaction ID ARNs. To specify a resource in an authorization policy, use that resource's ARN.

Resource: Cluster
ARN format: arn:aws:kafka:region:account-id:cluster/cluster-name/cluster-uuid

Resource: Topic
ARN format: arn:aws:kafka:region:account-id:topic/cluster-name/cluster-uuid/topic-name

Resource: Group
ARN format: arn:aws:kafka:region:account-id:group/cluster-name/cluster-uuid/group-name

Resource: Transaction ID
ARN format: arn:aws:kafka:region:account-id:transactional-id/cluster-name/cluster-uuid/transactional-id

You can use the asterisk (*) wildcard any number of times anywhere in the part of the ARN that comes after :cluster/, :topic/, :group/, and :transactional-id/. The following are some examples of how you can use the asterisk (*) wildcard to refer to multiple resources:

• arn:aws:kafka:us-east-1:0123456789012:topic/MyTestCluster/*: all the topics in any cluster named MyTestCluster, regardless of the cluster's UUID.
• arn:aws:kafka:us-east-1:0123456789012:topic/MyTestCluster/abcd1234-0123-abcd-5678-1234abcd-1/*_test: all topics whose name ends with "_test" in the cluster whose name is MyTestCluster and whose UUID is abcd1234-0123-abcd-5678-1234abcd-1.
• arn:aws:kafka:us-east-1:0123456789012:transactional-id/MyTestCluster/*/5555abcd-1111-abcd-1234-abcd1234-1: all transactions whose transactional ID is 5555abcd-1111-abcd-1234-abcd1234-1, across all incarnations of a cluster named MyTestCluster in your account. This means that if you create a cluster named MyTestCluster, then delete it, and then create another cluster by the same name, you can use this resource ARN to represent the same transactional ID on both clusters. However, the deleted cluster isn't accessible.

Common use cases


The first column in the following table shows some common use cases. To authorize a client to carry out
a given use case, include the required actions for that use case in the client's authorization policy, and set
Effect to Allow.

For information about all the actions that are part of IAM access control for Amazon MSK, see the section
called “Semantics of actions and resources” (p. 76).
Note
Actions are denied by default. You must explicitly allow every action that you want to authorize
the client to perform.

Use case: Admin
Required actions: kafka-cluster:*

Use case: Create a topic
Required actions: kafka-cluster:Connect, kafka-cluster:CreateTopic

Use case: Produce data
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopic, kafka-cluster:WriteData

Use case: Consume data
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopic, kafka-cluster:DescribeGroup, kafka-cluster:AlterGroup, kafka-cluster:ReadData

Use case: Produce data idempotently
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopic, kafka-cluster:WriteData, kafka-cluster:WriteDataIdempotently

Use case: Produce data transactionally
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopic, kafka-cluster:WriteData, kafka-cluster:DescribeTransactionalId, kafka-cluster:AlterTransactionalId

Use case: Describe the configuration of a cluster
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeClusterDynamicConfiguration

Use case: Update the configuration of a cluster
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeClusterDynamicConfiguration, kafka-cluster:AlterClusterDynamicConfiguration

Use case: Describe the configuration of a topic
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopicDynamicConfiguration

Use case: Update the configuration of a topic
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopicDynamicConfiguration, kafka-cluster:AlterTopicDynamicConfiguration

Use case: Alter a topic
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopic, kafka-cluster:AlterTopic
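
Putting the "Consume data" row into practice, the following policy is a sketch for a consumer client; the account ID, cluster name, and cluster UUID are placeholders.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "kafka-cluster:Connect",
            "Resource": "arn:aws:kafka:us-east-1:0123456789012:cluster/MyTestCluster/abcd1234-0123-abcd-5678-1234abcd-1"
        },
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:DescribeTopic",
                "kafka-cluster:ReadData"
            ],
            "Resource": "arn:aws:kafka:us-east-1:0123456789012:topic/MyTestCluster/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:DescribeGroup",
                "kafka-cluster:AlterGroup"
            ],
            "Resource": "arn:aws:kafka:us-east-1:0123456789012:group/MyTestCluster/*"
        }
    ]
}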

Mutual TLS authentication


You can enable client authentication with TLS for connections from your applications to your Amazon MSK brokers and ZooKeeper nodes. To use client authentication, you need an ACM Private CA. The ACM Private CA can be either in the same Amazon Web Services account as your cluster, or in a different account. For information about private CAs, see Creating and Managing a Private CA.
Note
TLS authentication is not currently available in the Beijing and Ningxia Regions.

Amazon MSK doesn't support certificate revocation lists (CRLs). To control access to your cluster topics
or block compromised certificates, use Apache Kafka ACLs and Amazon security groups. For information
about using Apache Kafka ACLs, see the section called “Apache Kafka ACLs” (p. 88).

This topic contains the following sections:


• To create a cluster that supports client authentication (p. 82)
• To set up a client to use authentication (p. 83)
• To produce and consume messages using authentication (p. 84)

To create a cluster that supports client authentication


This procedure shows you how to enable client authentication using a CA that is hosted by ACM.
Note
We highly recommend using independent ACM PCAs for each MSK cluster when you use
mutual TLS to control access. Doing so will ensure that TLS certificates signed by PCAs only
authenticate with a single MSK cluster.

1. Create a file named clientauthinfo.json with the following contents. Replace Private-CA-
ARN with the ARN of your PCA.

{
    "Tls": {
        "CertificateAuthorityArnList": ["Private-CA-ARN"]
    }
}

2. Create a file named brokernodegroupinfo.json as described in the section called “Creating a cluster using the Amazon CLI” (p. 12).
3. Client authentication requires that you also enable encryption in transit between clients and brokers.
Create a file named encryptioninfo.json with the following contents. Replace KMS-Key-ARN
with the ARN of your KMS key. You can set ClientBroker to TLS or TLS_PLAINTEXT.

{
    "EncryptionAtRest": {
        "DataVolumeKMSKeyId": "KMS-Key-ARN"
    },
    "EncryptionInTransit": {
        "InCluster": true,
        "ClientBroker": "TLS"
    }
}

For more information about encryption, see the section called “Encryption” (p. 58).
4. On a machine where you have the Amazon CLI installed, run the following command to create a
cluster with authentication and in-transit encryption enabled. Save the cluster ARN provided in the
response.

aws kafka create-cluster --cluster-name "AuthenticationTest" \
    --broker-node-group-info file://brokernodegroupinfo.json \
    --encryption-info file://encryptioninfo.json \
    --client-authentication file://clientauthinfo.json \
    --kafka-version "2.2.1" --number-of-broker-nodes 3

To set up a client to use authentication


1. Create an Amazon EC2 instance to use as a client machine. For simplicity, create this instance in the
same VPC you used for the cluster. See the section called “Step 2: Create a client machine” (p. 5) for
an example of how to create such a client machine.
2. Create a topic. For an example, see the instructions under the section called “Step 3: Create a
topic” (p. 6).
3. On a machine where you have the Amazon CLI installed, run the following command to get the
bootstrap brokers of the cluster. Replace Cluster-ARN with the ARN of your cluster.

aws kafka get-bootstrap-brokers --cluster-arn Cluster-ARN

Save the string associated with BootstrapBrokerStringTls in the response.


4. On your client machine, run the following command to use the JVM trust store to create your client
trust store. If your JVM path is different, adjust the command accordingly.

cp /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.201.b09-0.amzn2.x86_64/jre/lib/security/cacerts kafka.client.truststore.jks

5. On your client machine, run the following command to create a private key for your client. Replace
Distinguished-Name, Example-Alias, Your-Store-Pass, and Your-Key-Pass with strings of
your choice.

keytool -genkey -keystore kafka.client.keystore.jks -validity 300 \
    -storepass Your-Store-Pass -keypass Your-Key-Pass \
    -dname "CN=Distinguished-Name" -alias Example-Alias -storetype pkcs12

6. On your client machine, run the following command to create a certificate request with the private
key you created in the previous step.

keytool -keystore kafka.client.keystore.jks -certreq -file client-cert-sign-request \
    -alias Example-Alias -storepass Your-Store-Pass -keypass Your-Key-Pass

7. Open the client-cert-sign-request file and ensure that it starts with -----BEGIN
CERTIFICATE REQUEST----- and ends with -----END CERTIFICATE REQUEST-----. If it
starts with -----BEGIN NEW CERTIFICATE REQUEST-----, delete the word NEW (and the single
space that follows it) from the beginning and the end of the file.
8. On a machine where you have the Amazon CLI installed, run the following command to sign your
certificate request. Replace Private-CA-ARN with the ARN of your PCA. You can change the
validity value if you want. Here we use 300 as an example.

aws acm-pca issue-certificate --certificate-authority-arn Private-CA-ARN \
    --csr fileb://client-cert-sign-request --signing-algorithm "SHA256WITHRSA" \
    --validity Value=300,Type="DAYS"

Save the certificate ARN provided in the response.


Note
To retrieve your client certificate, use the acm-pca get-certificate command and
specify your certificate ARN. For more information, see get-certificate in the Amazon CLI
Command Reference.


9. Run the following command to get the certificate that ACM signed for you. Replace Certificate-
ARN with the ARN you obtained from the response to the previous command.

aws acm-pca get-certificate --certificate-authority-arn Private-CA-ARN \
    --certificate-arn Certificate-ARN

10. From the JSON result of running the previous command, copy the strings associated with
Certificate and CertificateChain. Paste these two strings in a new file named signed-
certificate-from-acm. Paste the string associated with Certificate first, followed by the string
associated with CertificateChain. Replace the \n characters with new lines. The following is the
structure of the file after you paste the certificate and certificate chain in it.

-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----

11. Run the following command on the client machine to add this certificate to your keystore so you can
present it when you talk to the MSK brokers.

keytool -keystore kafka.client.keystore.jks -import -file signed-certificate-from-acm \
    -alias Example-Alias -storepass Your-Store-Pass -keypass Your-Key-Pass

12. Create a file named client.properties with the following contents. Adjust the truststore and
keystore locations to the paths where you saved kafka.client.truststore.jks.

security.protocol=SSL
ssl.truststore.location=/tmp/kafka_2.12-2.2.1/kafka.client.truststore.jks
ssl.keystore.location=/tmp/kafka_2.12-2.2.1/kafka.client.keystore.jks
ssl.keystore.password=Your-Store-Pass
ssl.key.password=Your-Key-Pass

To produce and consume messages using authentication


1. Run the following command to create a topic.

<path-to-your-kafka-installation>/bin/kafka-topics.sh --create --zookeeper ZooKeeper-Connection-String --replication-factor 3 --partitions 1 --topic ExampleTopic

2. Run the following command to start a console producer. The file named client.properties is
the one you created in the previous procedure.

<path-to-your-kafka-installation>/bin/kafka-console-producer.sh --broker-list BootstrapBroker-String --topic ExampleTopic --producer.config client.properties

3. In a new command window on your client machine, run the following command to start a console
consumer.

<path-to-your-kafka-installation>/bin/kafka-console-consumer.sh --bootstrap-server BootstrapBroker-String --topic ExampleTopic --consumer.config client.properties

4. Type messages in the producer window and watch them appear in the consumer window.


Username and password authentication with Amazon Secrets Manager

You can control access to your Amazon MSK clusters using usernames and passwords that are stored
and secured using Amazon Secrets Manager. Storing user credentials in Secrets Manager reduces the
overhead of cluster authentication such as auditing, updating, and rotating credentials. Secrets Manager
also lets you share user credentials across clusters.

This topic contains the following sections:


• How it works (p. 85)
• Setting up SASL/SCRAM authentication for an Amazon MSK cluster (p. 85)
• Working with users (p. 87)
• Limitations (p. 88)

How it works
Username and password authentication for Amazon MSK uses SASL/SCRAM (Simple Authentication and Security Layer / Salted Challenge Response Authentication Mechanism) authentication. To set up username and password authentication for a cluster, you create a Secret resource in Amazon Secrets Manager, and associate user names and passwords with that secret.

SASL/SCRAM is defined in RFC 5802. SCRAM uses secure hashing algorithms, and does not transmit plaintext passwords between client and server.
Note
When you set up SASL/SCRAM authentication for your cluster, Amazon MSK turns on TLS
encryption for all traffic between clients and brokers.

Setting up SASL/SCRAM authentication for an Amazon MSK cluster

To set up a secret in Amazon Secrets Manager, follow the Creating and Retrieving a Secret tutorial in the
Amazon Secrets Manager User Guide.

Note the following requirements when creating a secret for an Amazon MSK cluster:

• Choose Other type of secrets (e.g. API key) for the secret type.
• Your secret name must begin with the prefix AmazonMSK_.
• You must either use an existing custom Amazon KMS key or create a new custom Amazon KMS key for
your secret. Secrets Manager uses the default Amazon KMS key for a secret by default.
Important
A secret created with the default Amazon KMS key cannot be used with an Amazon MSK
cluster.
• Your user and password data must be in the following format to enter key-value pairs using the
Plaintext option.

{
    "username": "alice",
    "password": "alice-secret"
}


• Record the ARN (Amazon Resource Name) value for your secret.
  Important
  You can't associate a Secrets Manager secret with a cluster that exceeds the limits described in the section called “Right-size your cluster: Number of partitions per broker” (p. 138).
• If you use the Amazon CLI to create the secret, specify a key ID or ARN for the kms-key-id parameter.
Don't specify an alias.
• To associate the secret with your cluster, use either the Amazon MSK console, or the
BatchAssociateScramSecret operation.
Important
When you associate a secret with a cluster, Amazon MSK attaches a resource policy to the
secret that allows your cluster to access and read the secret values that you defined. You
should not modify this resource policy. Doing so can prevent your cluster from accessing your
secret.

The following example JSON input for the BatchAssociateScramSecret operation associates a
secret with a cluster:

{
    "clusterArn": "arn:aws:kafka:us-west-2:0123456789019:cluster/SalesCluster/abcd1234-abcd-cafe-abab-9876543210ab-4",
    "secretArnList": [
        "arn:aws:secretsmanager:us-west-2:0123456789019:secret:AmazonMSK_MyClusterSecret"
    ]
}
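
End to end, creating and associating the secret might look like the following Amazon CLI sketch; the secret name suffix, KMS key, and ARNs are placeholders.

# Create a secret encrypted with a customer managed KMS key (required; the default key won't work).
aws secretsmanager create-secret \
    --name AmazonMSK_alice \
    --secret-string '{"username": "alice", "password": "alice-secret"}' \
    --kms-key-id Your-KMS-Key-ARN

# Associate the secret with your cluster.
aws kafka batch-associate-scram-secret \
    --cluster-arn Cluster-ARN \
    --secret-arn-list Secret-ARN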

Connecting to your cluster with a username and password


After you create a secret and associate it with your cluster, you can connect your client to the cluster.
The following example steps demonstrate how to connect a client to a cluster that uses SASL/SCRAM
authentication, and how to produce to and consume from an example topic.

1. Retrieve your cluster details with the following command. Replace ClusterArn with the Amazon
Resource Name (ARN) of your cluster:

aws kafka describe-cluster --cluster-arn "ClusterArn"

From the JSON result of the command, save the value associated with the string named
ZookeeperConnectString.
2. To create an example topic, run the following command on your client machine. Replace
ZookeeperConnectString with the string you recorded in the previous step.

<path-to-your-kafka-installation>/bin/kafka-topics.sh --create --zookeeper ZookeeperConnectString --replication-factor 3 --partitions 1 --topic ExampleTopicName

3. On your client machine, create a JAAS configuration file that contains the user credentials stored
in your secret. For example, for the user alice, create a file called users_jaas.conf with the
following content.

KafkaClient {
    org.apache.kafka.common.security.scram.ScramLoginModule required
    username="alice"
    password="alice-secret";
};


4. Use the following command to export your JAAS config file as a KAFKA_OPTS environment
parameter.

export KAFKA_OPTS=-Djava.security.auth.login.config=<path-to-jaas-file>/users_jaas.conf

5. Create a file named kafka.client.truststore.jks in a ./tmp directory.


6. Use the following command to copy the JDK key store file from your JVM cacerts folder into the
kafka.client.truststore.jks file that you created in the previous step. Replace JDKFolder
with the name of the JDK folder on your instance. For example, your JDK folder might be named
java-1.8.0-openjdk-1.8.0.201.b09-0.amzn2.x86_64.

cp /usr/lib/jvm/JDKFolder/jre/lib/security/cacerts /tmp/kafka.client.truststore.jks

7. In the bin directory of your Apache Kafka installation, create a client properties file called
client_sasl.properties with the following contents. This file defines the SASL mechanism and
protocol.

security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
ssl.truststore.location=<path-to-keystore-file>/kafka.client.truststore.jks

8. Retrieve your bootstrap brokers string with the following command. Replace ClusterArn with the
Amazon Resource Name (ARN) of your cluster:

aws kafka get-bootstrap-brokers --cluster-arn ClusterArn

From the JSON result of the command, save the value associated with the string named
BootstrapBrokerStringSaslScram.
9. To produce to the example topic that you created, run the following command on your client
machine. Replace BootstrapBrokerStringSaslScram with the value that you retrieved in the
previous step.

<path-to-your-kafka-installation>/bin/kafka-console-producer.sh --broker-list BootstrapBrokerStringSaslScram --topic ExampleTopicName --producer.config client_sasl.properties

10. To consume from the topic you created, run the following command on your client machine. Replace
BootstrapBrokerStringSaslScram with the value that you obtained previously.

<path-to-your-kafka-installation>/bin/kafka-console-consumer.sh --bootstrap-server BootstrapBrokerStringSaslScram --topic ExampleTopicName --from-beginning --consumer.config client_sasl.properties

Working with users


Creating users: You create users in your secret as key-value pairs. When you use the Plaintext option in
the Secrets Manager console, you should specify username and password data in the following format.

{
    "username": "alice",
    "password": "alice-secret"
}


Revoking user access: To revoke a user's credentials to access a cluster, we recommend that you
first remove or enforce an ACL on the cluster, and then disassociate the secret. This is because of the
following:

• Removing a user does not close existing connections.


• Changes to your secret take up to 10 minutes to propagate.

For information about using an ACL with Amazon MSK, see Apache Kafka ACLs (p. 88).

We recommend that you restrict access to your zookeeper nodes to prevent users from modifying ACLs.
For more information, see Controlling access to Apache ZooKeeper (p. 90).

Limitations
Note the following limitations when using SCRAM secrets:

• Amazon MSK only supports SCRAM-SHA-512 authentication.


• An Amazon MSK cluster can have up to 1000 users.
• You must use an Amazon KMS key with your Secret. You cannot use a Secret that uses the default
Secrets Manager encryption key with Amazon MSK. For information about creating a KMS key, see
Creating symmetric encryption KMS keys.
• You can't use an asymmetric KMS key with Secrets Manager.
• You can associate up to 10 secrets with a cluster at a time using the BatchAssociateScramSecret
operation.
• The name of secrets associated with an Amazon MSK cluster must have the prefix AmazonMSK_.
• Secrets associated with an Amazon MSK cluster must be in the same Amazon Web Services account
and Amazon region as the cluster.

Apache Kafka ACLs


Apache Kafka has a pluggable authorizer and ships with an out-of-box authorizer implementation
that uses Apache ZooKeeper to store all ACLs. Amazon MSK enables this authorizer in the
server.properties file on the brokers. For Apache Kafka version 2.4.1, the authorizer is
AclAuthorizer. For earlier versions of Apache Kafka, it is SimpleAclAuthorizer.

Apache Kafka ACLs have the format "Principal P is [Allowed/Denied] Operation O From Host H on any
Resource R matching ResourcePattern RP". If RP doesn't match a specific resource R, then R has no
associated ACLs, and therefore no one other than super users is allowed to access R. To change this
Apache Kafka behavior, you set the property allow.everyone.if.no.acl.found to true. Amazon
MSK sets it to true by default. This means that with Amazon MSK clusters, if you don't explicitly set
ACLs on a resource, all principals can access this resource. If you enable ACLs on a resource, only the
authorized principals can access it. If you want to restrict access to a topic and authorize a client using
TLS mutual authentication, add ACLs using the Apache Kafka authorizer CLI. For more information about
adding, removing, and listing ACLs, see Kafka Authorization Command Line Interface.

In addition to the client, you also need to grant all your brokers access to your topics so that the brokers
can replicate messages from the primary partition. If the brokers don't have access to a topic, replication
for the topic fails.

To add or remove read and write access to a topic

1. Add your brokers to the ACL table to allow them to read from all topics that have ACLs in place. To
grant your brokers read access to a topic, run the following command on a client machine that can
communicate with the MSK cluster.


Replace ZooKeeper-Connection-String with your Apache ZooKeeper connection string. For information on how to get this string, see the section called “Getting the Apache ZooKeeper connection string” (p. 14).

Replace Distinguished-Name with the DNS of any of your cluster's bootstrap brokers, then replace the string before the first period in this distinguished name by an asterisk (*). For example, if one of your cluster's bootstrap brokers has the DNS b-6.mytestcluster.67281x.c4.kafka.us-east-1.amazonaws.com, replace Distinguished-Name in the following command with *.mytestcluster.67281x.c4.kafka.us-east-1.amazonaws.com. For information on how to get the bootstrap brokers, see the section called “Getting the bootstrap brokers” (p. 16).

<path-to-your-kafka-installation>/bin/kafka-acls.sh --authorizer-properties
zookeeper.connect=ZooKeeper-Connection-String --add --allow-principal
"User:CN=Distinguished-Name" --operation Read --group=* --topic Topic-Name

2. To grant read access to a topic, run the following command on your client machine. If you use
mutual TLS authentication, use the same Distinguished-Name you used when you created the
private key.

<path-to-your-kafka-installation>/bin/kafka-acls.sh --authorizer-properties
zookeeper.connect=ZooKeeper-Connection-String --add --allow-principal
"User:CN=Distinguished-Name" --operation Read --group=* --topic Topic-Name

To remove read access, you can run the same command, replacing --add with --remove.
3. To grant write access to a topic, run the following command on your client machine. If you use
mutual TLS authentication, use the same Distinguished-Name you used when you created the
private key.

<path-to-your-kafka-installation>/bin/kafka-acls.sh --authorizer-properties
zookeeper.connect=ZooKeeper-Connection-String --add --allow-principal
"User:CN=Distinguished-Name" --operation Write --topic Topic-Name

To remove write access, you can run the same command, replacing --add with --remove.

Changing an Amazon MSK cluster's security group


This page explains how to change the security group of an existing MSK cluster. You might need to
change a cluster's security group in order to provide access to a certain set of users or to limit access to
the cluster. For information about security groups, see Security groups for your VPC in the Amazon VPC
user guide.

1. Use the ListNodes API or the list-nodes command in the Amazon CLI to get a list of the brokers in
your cluster. The results of this operation include the IDs of the elastic network interfaces (ENIs) that
are associated with the brokers.
2. Sign in to the Amazon Web Services Management Console and open the Amazon EC2 console at https://console.amazonaws.cn/ec2/.
3. Using the dropdown list near the top-right corner of the screen, select the Region in which the
cluster is deployed.
4. In the left pane, under Network & Security, choose Network Interfaces.
5. Select the first ENI that you obtained in the first step. Choose the Actions menu at the top of the
screen, then choose Change Security Groups. Assign the new security group to this ENI. Repeat this
step for each of the ENIs that you obtained in the first step.


Note
Changes that you make to a cluster's security group using the Amazon EC2 console aren't
reflected in the MSK console under Network settings.
6. Configure the new security group's rules to ensure that your clients have access to the brokers. For
information about setting security group rules, see Adding, Removing, and Updating Rules in the
Amazon VPC user guide.

Important
If you change the security group that is associated with the brokers of a cluster, and then
add new brokers to that cluster, Amazon MSK associates the new brokers with the original
security group that was associated with the cluster when the cluster was created. However,
for a cluster to work correctly, all of its brokers must be associated with the same security
group. Therefore, if you add new brokers after changing the security group, you must follow the
previous procedure again and update the ENIs of the new brokers.
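
If you prefer to script the update instead of using the Amazon EC2 console, the following Amazon CLI sketch shows the same change; the ENI and security group IDs are placeholders, and you repeat the second command for each broker ENI.

# List the brokers; the output includes the ENI ID for each broker.
aws kafka list-nodes --cluster-arn Cluster-ARN

# Assign the new security group to a broker ENI (this replaces the ENI's current security groups).
aws ec2 modify-network-interface-attribute \
    --network-interface-id eni-0123456789abcdef0 \
    --groups sg-0123456789abcdef0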

Controlling access to Apache ZooKeeper


For security reasons you can limit access to the Apache ZooKeeper nodes that are part of your Amazon
MSK cluster. To limit access to the nodes, you can assign a separate security group to them. You can then
decide who gets access to that security group.

This topic contains the following sections:


• To place your Apache ZooKeeper nodes in a separate security group (p. 90)
• Using TLS security with Apache ZooKeeper (p. 91)

To place your Apache ZooKeeper nodes in a separate security group

1. Get the Apache ZooKeeper connection string for your cluster. To learn how, see the section called
“Getting the Apache ZooKeeper connection string” (p. 14). The connection string contains the DNS
names of your Apache ZooKeeper nodes.
2. Use a tool like host or ping to convert the DNS names you obtained in the previous step to IP
addresses. Save these IP addresses because you need them later in this procedure.
3. Sign in to the Amazon Web Services Management Console and open the Amazon EC2 console at https://console.amazonaws.cn/ec2/.
4. In the left pane, under NETWORK & SECURITY, choose Network Interfaces.
5. In the search field above the table of network interfaces, type the name of your cluster, then press
Enter. This limits the number of network interfaces that appear in the table to those interfaces that
are associated with your cluster.
6. Select the check box at the beginning of the row that corresponds to the first network interface in
the list.
7. In the details pane at the bottom of the page, look for the Primary private IPv4 IP. If this IP address
matches one of the IP addresses you obtained in the first step of this procedure, this means that this
network interface is assigned to an Apache ZooKeeper node that is part of your cluster. Otherwise,
deselect the check box next to this network interface, and select the next network interface in the
list. The order in which you select the network interfaces doesn't matter. In the next steps, you will
perform the same operations on all network interfaces that are assigned to Apache ZooKeeper
nodes, one by one.


8. When you select a network interface that corresponds to an Apache ZooKeeper node, choose the
Actions menu at the top of the page, then choose Change Security Groups. Assign a new security
group to this network interface. For information about creating security groups, see Creating a
Security Group in the Amazon VPC documentation.
9. Repeat the previous step to assign the same new security group to all the network interfaces that
are associated with the Apache ZooKeeper nodes of your cluster.
10. You can now choose who has access to this new security group. For information about setting
security group rules, see Adding, Removing, and Updating Rules in the Amazon VPC documentation.
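
For step 2, a minimal sketch using the host tool; the Apache ZooKeeper DNS name below is a placeholder taken from your connection string:

host z-1.mycluster.abc123.c2.kafka.us-east-1.amazonaws.com

The command prints the IP address that the DNS name resolves to.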

Using TLS security with Apache ZooKeeper


You can use TLS security for encryption in transit between your clients and your Apache ZooKeeper
nodes. To implement TLS security with your Apache ZooKeeper nodes, do the following:

• Clusters must use Apache Kafka version 2.5.1 or later to use TLS security with Apache ZooKeeper.
• Enable TLS security when you create or configure your cluster. Clusters created with Apache
Kafka version 2.5.1 or later with TLS enabled automatically use TLS security with Apache
ZooKeeper endpoints. For information about setting up TLS security, see How do I get started with
encryption? (p. 59).
• Retrieve the TLS Apache ZooKeeper endpoints using the DescribeCluster operation.
• Create an Apache ZooKeeper configuration file for use with the kafka-configs.sh and kafka-acls.sh
tools, or with the ZooKeeper shell. With each tool, you use the --zk-tls-config-file parameter to
specify your Apache ZooKeeper config.

The following example shows a typical Apache ZooKeeper configuration file:

zookeeper.ssl.client.enable=true
zookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
zookeeper.ssl.keystore.location=kafka.jks
zookeeper.ssl.keystore.password=test1234
zookeeper.ssl.truststore.location=truststore.jks
zookeeper.ssl.truststore.password=test1234

• For other commands (such as kafka-topics), you must use the KAFKA_OPTS environment variable
to configure Apache ZooKeeper parameters. The following example shows how to configure the
KAFKA_OPTS environment variable to pass Apache ZooKeeper parameters into other commands:

export KAFKA_OPTS="
-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
-Dzookeeper.client.secure=true
-Dzookeeper.ssl.trustStore.location=/home/ec2-user/kafka.client.truststore.jks
-Dzookeeper.ssl.trustStore.password=changeit"

After you configure the KAFKA_OPTS environment variable, you can use CLI commands normally. The
following example creates an Apache Kafka topic using the Apache ZooKeeper configuration from the
KAFKA_OPTS environment variable:

<path-to-your-kafka-installation>/bin/kafka-topics.sh --create --zookeeper ZooKeeperTLSConnectString --replication-factor 3 --partitions 1 --topic AWSKafkaTutorialTopic

Note
The names of the parameters you use in your Apache ZooKeeper configuration file and those
you use in your KAFKA_OPTS environment variable are not consistent. Pay attention to which
names you use with which parameters in your configuration file and KAFKA_OPTS environment
variable.

For more information about accessing your Apache ZooKeeper nodes with TLS, see KIP-515: Enable ZK
client to use the new TLS supported authentication.

Logging
You can deliver Apache Kafka broker logs to one or more of the following destination types: Amazon
CloudWatch Logs, Amazon S3, Amazon Kinesis Data Firehose. You can also log Amazon MSK API calls
with Amazon CloudTrail.

Broker logs
Broker logs enable you to troubleshoot your Apache Kafka applications and to analyze their
communications with your MSK cluster. You can configure your new or existing MSK cluster to deliver
INFO-level broker logs to one or more of the following types of destination resources: a CloudWatch log
group, an S3 bucket, a Kinesis Data Firehose delivery stream. Through Kinesis Data Firehose you can then
deliver the log data from your delivery stream to OpenSearch Service. You must create a destination
resource before you configure your cluster to deliver broker logs to it. Amazon MSK doesn't create these
destination resources for you if they don't already exist. For information about these three types of
destination resources and how to create them, see the following documentation:

• Amazon CloudWatch Logs
• Amazon S3
• Amazon Kinesis Data Firehose

Note
Amazon MSK does not support delivering broker logs to Kinesis Data Firehose in the Asia Pacific
(Osaka) Region.

Required permissions
To configure a destination for Amazon MSK broker logs, the IAM identity that you use for
Amazon MSK actions must have the permissions described in the Amazon managed policy:
AmazonMSKFullAccess (p. 69) policy.

To stream broker logs to an S3 bucket, you also need the s3:PutBucketPolicy permission. For
information about S3 bucket policies, see How Do I Add an S3 Bucket Policy? in the Amazon S3 Console
User Guide. For information about IAM policies in general, see Access Management in the IAM User
Guide.

Required KMS key policy for use with SSE-KMS buckets


If you enabled server-side encryption for your S3 bucket using Amazon KMS-managed keys (SSE-KMS)
with a customer managed key, add the following to the key policy for your KMS key so that Amazon MSK
can write broker log files to the bucket.

{
    "Sid": "Allow Amazon MSK to use the key.",
    "Effect": "Allow",
    "Principal": {
        "Service": [
            "delivery.logs.amazonaws.com"
        ]
    },
    "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
    ],
    "Resource": "*"
}

Configuring broker logs using the Amazon Web Services Management Console
If you are creating a new cluster, look for the Broker log delivery heading in the Monitoring section. You
can specify the destinations to which you want Amazon MSK to deliver your broker logs.

For an existing cluster, choose the cluster from your list of clusters, then choose the Properties tab.
Scroll down to the Log delivery section and then choose its Edit button. You can specify the destinations
to which you want Amazon MSK to deliver your broker logs.

Configuring broker logs using the Amazon CLI


When you use the create-cluster or the update-monitoring commands, you can optionally
specify the logging-info parameter and pass to it a JSON structure like the following example. In this
JSON, all three destination types are optional.

{
    "BrokerLogs": {
        "S3": {
            "Bucket": "ExampleBucketName",
            "Prefix": "ExamplePrefix",
            "Enabled": true
        },
        "Firehose": {
            "DeliveryStream": "ExampleDeliveryStreamName",
            "Enabled": true
        },
        "CloudWatchLogs": {
            "Enabled": true,
            "LogGroup": "ExampleLogGroupName"
        }
    }
}
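
For example, a sketch of enabling broker-log delivery on an existing cluster with the update-monitoring command. The cluster ARN, current version, and file name are placeholders, and the file contains a JSON structure like the one shown above:

aws kafka update-monitoring --cluster-arn ClusterArn --current-version Current-Cluster-Version --logging-info file://broker-logs.json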

Configuring broker logs using the API


You can specify the optional loggingInfo structure in the JSON that you pass to the CreateCluster or
UpdateMonitoring operations.
Note
By default, when broker logging is enabled, Amazon MSK logs INFO level logs to the specified
destinations. However, users of Apache Kafka 2.4.X and later can dynamically set the broker
log level to any of the log4j log levels. For information about dynamically setting the broker
log level, see KIP-412: Extend Admin API to support dynamic application log levels. If you
dynamically set the log level to DEBUG or TRACE, we recommend using Amazon S3 or Kinesis
Data Firehose as the log destination. If you use CloudWatch Logs as a log destination and you
dynamically enable DEBUG or TRACE level logging, Amazon MSK may continuously deliver a
sample of logs. This can significantly impact broker performance and should only be used when
the INFO log level is not verbose enough to determine the root cause of an issue.

Logging API calls with Amazon CloudTrail


Note
Amazon CloudTrail logs are available for Amazon MSK only when you use IAM access
control (p. 73).

Amazon MSK is integrated with Amazon CloudTrail, a service that provides a record of actions taken by
a user, role, or an Amazon service in Amazon MSK. CloudTrail captures API calls for Amazon MSK as
events. The calls captured include calls from the Amazon MSK console and code calls to the Amazon MSK
API operations. It also captures Apache Kafka actions such as creating and altering topics and groups.

If you create a trail, you can enable continuous delivery of CloudTrail events to an Amazon S3 bucket,
including events for Amazon MSK. If you don't configure a trail, you can still view the most recent
events in the CloudTrail console in Event history. Using the information collected by CloudTrail, you can
determine the request that was made to Amazon MSK or the Apache Kafka action, the IP address from
which the request was made, who made the request, when it was made, and additional details.

To learn more about CloudTrail, including how to configure and enable it, see the Amazon CloudTrail
User Guide.

Amazon MSK information in CloudTrail


CloudTrail is enabled on your Amazon Web Services account when you create the account. When
supported event activity occurs in an MSK cluster, that activity is recorded in a CloudTrail event along
with other Amazon service events in Event history. You can view, search, and download recent events
in your Amazon Web Services account. For more information, see Viewing Events with CloudTrail Event
History.

For an ongoing record of events in your Amazon Web Services account, including events for Amazon
MSK, create a trail. A trail enables CloudTrail to deliver log files to an Amazon S3 bucket. By default,
when you create a trail in the console, the trail applies to all Regions. The trail logs events from all
Regions in the Amazon partition and delivers the log files to the Amazon S3 bucket that you specify.
Additionally, you can configure other Amazon services to further analyze and act upon the event data
collected in CloudTrail logs. For more information, see the following:

• Overview for Creating a Trail
• CloudTrail Supported Services and Integrations
• Configuring Amazon SNS Notifications for CloudTrail
• Receiving CloudTrail Log Files from Multiple Regions and Receiving CloudTrail Log Files from Multiple
Accounts

Amazon MSK logs all Amazon MSK operations as events in CloudTrail log files. In addition, it logs the
following Apache Kafka actions.

• kafka-cluster:DescribeClusterDynamicConfiguration
• kafka-cluster:AlterClusterDynamicConfiguration
• kafka-cluster:CreateTopic
• kafka-cluster:DescribeTopicDynamicConfiguration
• kafka-cluster:AlterTopic
• kafka-cluster:AlterTopicDynamicConfiguration
• kafka-cluster:DeleteTopic

Every event or log entry contains information about who generated the request. The identity
information helps you determine the following:

• Whether the request was made with root or Amazon Identity and Access Management (IAM) user
credentials.
• Whether the request was made with temporary security credentials for a role or federated user.
• Whether the request was made by another Amazon service.

For more information, see the CloudTrail userIdentity Element.

Example: Amazon MSK log file entries


A trail is a configuration that enables delivery of events as log files to an Amazon S3 bucket that you
specify. CloudTrail log files contain one or more log entries. An event represents a single request from
any source and includes information about the requested action, the date and time of the action, request
parameters, and so on. CloudTrail log files aren't an ordered stack trace of the public API calls and
Apache Kafka actions, so they don't appear in any specific order.

The following example shows CloudTrail log entries that demonstrate the DescribeCluster and
DeleteCluster Amazon MSK actions.

{
    "Records": [
        {
            "eventVersion": "1.05",
            "userIdentity": {
                "type": "IAMUser",
                "principalId": "ABCDEF0123456789ABCDE",
                "arn": "arn:aws:iam::012345678901:user/Joe",
                "accountId": "012345678901",
                "accessKeyId": "AIDACKCEVSQ6C2EXAMPLE",
                "userName": "Joe"
            },
            "eventTime": "2018-12-12T02:29:24Z",
            "eventSource": "kafka.amazonaws.com",
            "eventName": "DescribeCluster",
            "awsRegion": "us-east-1",
            "sourceIPAddress": "192.0.2.0",
            "userAgent": "aws-cli/1.14.67 Python/3.6.0 Windows/10 botocore/1.9.20",
            "requestParameters": {
                "clusterArn": "arn%3Aaws%3Akafka%3Aus-east-1%3A012345678901%3Acluster%2Fexamplecluster%2F01234567-abcd-0123-abcd-abcd0123efa-2"
            },
            "responseElements": null,
            "requestID": "bd83f636-fdb5-abcd-0123-157e2fbf2bde",
            "eventID": "60052aba-0123-4511-bcde-3e18dbd42aa4",
            "readOnly": true,
            "eventType": "AwsApiCall",
            "recipientAccountId": "012345678901"
        },
        {
            "eventVersion": "1.05",
            "userIdentity": {
                "type": "IAMUser",
                "principalId": "ABCDEF0123456789ABCDE",
                "arn": "arn:aws:iam::012345678901:user/Joe",
                "accountId": "012345678901",
                "accessKeyId": "AIDACKCEVSQ6C2EXAMPLE",
                "userName": "Joe"
            },
            "eventTime": "2018-12-12T02:29:40Z",
            "eventSource": "kafka.amazonaws.com",
            "eventName": "DeleteCluster",
            "awsRegion": "us-east-1",
            "sourceIPAddress": "192.0.2.0",
            "userAgent": "aws-cli/1.14.67 Python/3.6.0 Windows/10 botocore/1.9.20",
            "requestParameters": {
                "clusterArn": "arn%3Aaws%3Akafka%3Aus-east-1%3A012345678901%3Acluster%2Fexamplecluster%2F01234567-abcd-0123-abcd-abcd0123efa-2"
            },
            "responseElements": {
                "clusterArn": "arn:aws:kafka:us-east-1:012345678901:cluster/examplecluster/01234567-abcd-0123-abcd-abcd0123efa-2",
                "state": "DELETING"
            },
            "requestID": "c6bfb3f7-abcd-0123-afa5-293519897703",
            "eventID": "8a7f1fcf-0123-abcd-9bdb-1ebf0663a75c",
            "readOnly": false,
            "eventType": "AwsApiCall",
            "recipientAccountId": "012345678901"
        }
    ]
}

The following example shows a CloudTrail log entry that demonstrates the
kafka-cluster:CreateTopic action.

{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "IAMUser",
        "principalId": "ABCDEFGH1IJKLMN2P34Q5",
        "arn": "arn:aws:iam::111122223333:user/Admin",
        "accountId": "111122223333",
        "accessKeyId": "CDEFAB1C2UUUUU3AB4TT",
        "userName": "Admin"
    },
    "eventTime": "2021-03-01T12:51:19Z",
    "eventSource": "kafka-cluster.amazonaws.com",
    "eventName": "CreateTopic",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "198.51.100.0/24",
    "userAgent": "aws-msk-iam-auth/unknown-version/aws-internal/3 aws-sdk-java/1.11.970 Linux/4.14.214-160.339.amzn2.x86_64 OpenJDK_64-Bit_Server_VM/25.272-b10 java/1.8.0_272 scala/2.12.8 vendor/Red_Hat,_Inc.",
    "requestParameters": {
        "kafkaAPI": "CreateTopics",
        "resourceARN": "arn:aws:kafka:us-east-1:111122223333:topic/IamAuthCluster/3ebafd8e-dae9-440d-85db-4ef52679674d-1/Topic9"
    },
    "responseElements": null,
    "requestID": "e7c5e49f-6aac-4c9a-a1d1-c2c46599f5e4",
    "eventID": "be1f93fd-4f14-4634-ab02-b5a79cb833d2",
    "readOnly": false,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "eventCategory": "Management",
    "recipientAccountId": "111122223333"
}


Compliance validation for Amazon Managed Streaming for Apache Kafka
Third-party auditors assess the security and compliance of Amazon Managed Streaming for Apache
Kafka as part of Amazon compliance programs. These include PCI and HIPAA BAA.

For a list of Amazon services in scope of specific compliance programs, see Amazon Web Services in
Scope by Compliance Program. For general information, see Amazon Compliance Programs.

You can download third-party audit reports using Amazon Artifact. For more information, see
Downloading Reports in Amazon Artifact.

Your compliance responsibility when using Amazon MSK is determined by the sensitivity of your data,
your company's compliance objectives, and applicable laws and regulations. Amazon provides the
following resources to help with compliance:

• Security and Compliance Quick Start Guides – These deployment guides discuss architectural
considerations and provide steps for deploying security- and compliance-focused baseline
environments on Amazon.
• Architecting for HIPAA Security and Compliance Whitepaper – This whitepaper describes how
companies can use Amazon to create HIPAA-compliant applications.
• Amazon Compliance Resources – This collection of workbooks and guides might apply to your industry
and location.
• Evaluating Resources with Rules in the Amazon Config Developer Guide – The Amazon Config service
assesses how well your resource configurations comply with internal practices, industry guidelines, and
regulations.
• Amazon Security Hub – This Amazon service provides a comprehensive view of your security state
within Amazon that helps you check your compliance with security industry standards and best
practices.

Resilience in Amazon Managed Streaming for Apache Kafka
The Amazon global infrastructure is built around Amazon Regions and Availability Zones. Amazon
Regions provide multiple physically separated and isolated Availability Zones, which are connected
with low-latency, high-throughput, and highly redundant networking. With Availability Zones, you can
design and operate applications and databases that automatically fail over between zones without
interruption. Availability Zones are more highly available, fault tolerant, and scalable than traditional
single or multiple data center infrastructures.

For more information about Amazon Regions and Availability Zones, see Amazon Global Infrastructure.

Infrastructure security in Amazon Managed Streaming for Apache Kafka
As a managed service, Amazon Managed Streaming for Apache Kafka is protected by the Amazon global
network security procedures that are described in the Amazon Web Services: Overview of Security
Processes whitepaper.


You use Amazon published API calls to access Amazon MSK through the network. Clients must support
Transport Layer Security (TLS) 1.0 or later. We recommend TLS 1.2 or later. Clients must also support
cipher suites with perfect forward secrecy (PFS) such as Ephemeral Diffie-Hellman (DHE) or Elliptic Curve
Ephemeral Diffie-Hellman (ECDHE). Most modern systems such as Java 7 and later support these modes.

Additionally, requests must be signed by using an access key ID and a secret access key that is associated
with an IAM principal. Or you can use the Amazon Security Token Service (Amazon STS) to generate
temporary security credentials to sign requests.


Connecting to an Amazon MSK cluster
By default, clients can access an MSK cluster only if they're in the same VPC as the cluster. To connect
to your MSK cluster from a client that's in the same VPC as the cluster, make sure the cluster's security
group has an inbound rule that accepts traffic from the client's security group. For information about
setting up these rules, see Security Group Rules. For an example of how to access a cluster from an
Amazon EC2 instance that's in the same VPC as the cluster, see Getting started (p. 5).

To connect to your MSK cluster from a client that's outside the cluster's VPC, see the following topics:

Topics
• Public access (p. 99)
• Access from within Amazon but outside cluster's VPC (p. 101)
• Port information (p. 103)

Public access
Amazon MSK gives you the option to turn on public access to the brokers of MSK clusters running
Apache Kafka 2.6.0 or later versions. For security reasons, you can't turn on public access while creating
an MSK cluster. However, you can update an existing cluster to make it publicly accessible. You can also
create a new cluster and then update it to make it publicly accessible.

You can turn on public access to an MSK cluster at no additional cost, but standard Amazon data transfer
costs apply for data transfer in and out of the cluster. For information about pricing, see Amazon EC2
On-Demand Pricing.

To turn on public access to a cluster, first ensure that the cluster meets all of the following conditions:

• The subnets that are associated with the cluster must be public. This means that the subnets must
have an associated route table with an internet gateway attached. For information about how to
create and attach an internet gateway, see Internet gateways in the Amazon VPC user guide.
• Unauthenticated access control must be off and at least one of the following access-control methods
must be on: SASL/IAM, SASL/SCRAM, mTLS. For information about how to update the access-control
method of a cluster, see the section called “Updating security” (p. 28).
• Encryption within the cluster must be turned on. The on setting is the default when creating a cluster.
It's not possible to turn on encryption within the cluster for a cluster that was created with it turned
off. It is therefore not possible to turn on public access for a cluster that was created with encryption
within the cluster turned off.
• Plaintext traffic between brokers and clients must be off. For information about how to turn it off if it's
on, see the section called “Updating security” (p. 28).
• If you are using the SASL/SCRAM or mTLS access-control methods, you must set Apache Kafka
ACLs for your cluster. After you set the Apache Kafka ACLs for your cluster, update the cluster's
configuration to set the property allow.everyone.if.no.acl.found to false for the cluster, as shown
in the example following this list. For information about how to update the configuration of a cluster,
see the section called “Configuration operations” (p. 42). If you are using IAM access control and want
to apply authorization policies or update your authorization policies, see the section called “IAM access
control” (p. 73). For information about Apache Kafka ACLs, see the section called “Apache Kafka
ACLs” (p. 88).
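
As a minimal sketch, the line you add to a custom cluster configuration for the last condition looks like the following; the rest of the configuration file is omitted:

allow.everyone.if.no.acl.found=false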


After you ensure that an MSK cluster meets the conditions listed above, you can use the Amazon Web
Services Management Console, the Amazon CLI, or the Amazon MSK API to turn on public access.
After you turn on public access to a cluster, you can get a public bootstrap-brokers string for it. For
information about getting the bootstrap brokers for a cluster, see the section called “Getting the
bootstrap brokers” (p. 16).
Important
In addition to turning on public access, ensure that the cluster's security groups have inbound
TCP rules that allow public access from your IP address. We recommend that you make these
rules as restrictive as possible. For information about security groups and inbound rules, see
Security groups for your VPC in the Amazon VPC User Guide. For port numbers, see the section
called “Port information” (p. 103). For instructions on how to change a cluster's security group,
see the section called “Changing security groups” (p. 89).
Note
If you use the following instructions to turn on public access and then still cannot access
the cluster, see the section called “Unable to access cluster that has public access turned
on” (p. 135).

Turning on public access using the console

1. Sign in to the Amazon Web Services Management Console, and open the Amazon MSK console at
https://console.amazonaws.cn/msk/home?region=us-east-1#/home/.
2. In the list of clusters, choose the cluster to which you want to turn on public access.
3. Choose the Properties tab, then find the Network settings section.
4. Choose Edit public access.

Turning on public access using the Amazon CLI

1. Run the following Amazon CLI command, replacing ClusterArn and Current-Cluster-Version
with the ARN and current version of the cluster. To find the current version of the cluster, use the
DescribeCluster operation or the describe-cluster Amazon CLI command. An example version is
KTVPDKIKX0DER.

aws kafka update-connectivity --cluster-arn ClusterArn --current-version Current-Cluster-Version \
    --connectivity-info '{"PublicAccess": {"Type": "SERVICE_PROVIDED_EIPS"}}'

The output of this update-connectivity command looks like the following JSON example.

{
    "ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2",
    "ClusterOperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-abcd-4f7f-1234-9876543210ef"
}

Note
To turn off public access, use a similar Amazon CLI command, but with the following
connectivity info instead:

'{"PublicAccess": {"Type": "DISABLED"}}'

2. To get the result of the update-connectivity operation, run the following command,
replacing ClusterOperationArn with the ARN that you obtained in the output of the update-
connectivity command.


aws kafka describe-cluster-operation --cluster-operation-arn ClusterOperationArn

The output of this describe-cluster-operation command looks like the following JSON
example.

{
    "ClusterOperationInfo": {
        "ClientRequestId": "982168a3-939f-11e9-8a62-538df00285db",
        "ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2",
        "CreationTime": "2019-06-20T21:08:57.735Z",
        "OperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-abcd-4f7f-1234-9876543210ef",
        "OperationState": "UPDATE_COMPLETE",
        "OperationType": "UPDATE_CONNECTIVITY",
        "SourceClusterInfo": {
            "ConnectivityInfo": {
                "PublicAccess": {
                    "Type": "DISABLED"
                }
            }
        },
        "TargetClusterInfo": {
            "ConnectivityInfo": {
                "PublicAccess": {
                    "Type": "SERVICE_PROVIDED_EIPS"
                }
            }
        }
    }
}

If OperationState has the value UPDATE_IN_PROGRESS, wait a while, then run the
describe-cluster-operation command again.

Turning on public access using the Amazon MSK API

• To use the API to turn public access to a cluster on or off, see UpdateConnectivity.

Note
For security reasons, Amazon MSK doesn't allow public access to Apache ZooKeeper nodes. For
information about how to control access to the Apache ZooKeeper nodes of your MSK cluster
from within Amazon, see the section called “Controlling access to Apache ZooKeeper” (p. 90).

Access from within Amazon but outside cluster's VPC
To connect to an MSK cluster from inside Amazon but outside the cluster's Amazon VPC, the following
options exist.


Amazon VPC peering


To connect to your MSK cluster from a VPC that's different from the cluster's VPC, you can create a
peering connection between the two VPCs. For information about VPC peering, see the Amazon VPC
Peering Guide.

Amazon Direct Connect


Amazon Direct Connect links your on-premises network to Amazon over a standard 1 gigabit or 10 gigabit
Ethernet fiber-optic cable. One end of the cable is connected to your router, the other to an Amazon
Direct Connect router. With this connection in place, you can create virtual interfaces directly to the
Amazon cloud and Amazon VPC, bypassing Internet service providers in your network path. For more
information, see Amazon Direct Connect.

Amazon Transit Gateway


Amazon Transit Gateway is a service that enables you to connect your VPCs and your on-premises
networks to a single gateway. For information about how to use Amazon Transit Gateway, see Amazon
Transit Gateway.

VPN connections
You can connect your MSK cluster's VPC to remote networks and users using the VPN connectivity
options described in the following topic: VPN Connections.

REST proxies
You can install a REST proxy on an instance running within your cluster's Amazon VPC. REST proxies
enable your producers and consumers to communicate with the cluster through HTTP API requests.

Multiple Region multi-VPC connectivity


The following document describes connectivity options for multiple VPCs that reside in different
Regions: Multiple Region Multi-VPC Connectivity.

EC2-Classic
Use the following procedure to connect to your cluster from an EC2-Classic instance.

1. Follow the guidance described in ClassicLink to connect your EC2-Classic instance to your cluster's
VPC.
2. Find and copy the private IP associated with your EC2-Classic instance.
3. Using the Amazon CLI, run the following command, replacing ClusterArn with the Amazon
Resource Name (ARN) for your MSK cluster.

aws kafka describe-cluster --cluster-arn "ClusterArn"

4. In the output of the describe-cluster command, look for SecurityGroups and save the ID of
the security group for your MSK cluster (see the example after this procedure).
5. Open the Amazon VPC console at https://console.amazonaws.cn/vpc/.
6. In the left pane, choose Security Groups.


7. Choose the security group whose ID you saved after you ran the describe-cluster command.
Select the box at the beginning of the row corresponding to this security group.
8. In the lower half of the page, choose Inbound Rules.
9. Choose Edit rules, then choose Add Rule.
10. For the Type field, choose All traffic in the drop-down list.
11. Leave the Source set to Custom and enter the private IP of your EC2-Classic instance, followed
immediately by /32 with no intervening spaces.
12. Choose Save rules.
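
For steps 3 and 4, you can combine the lookup into one command with a --query filter; a sketch, assuming the shape of the DescribeCluster response:

aws kafka describe-cluster --cluster-arn "ClusterArn" --query 'ClusterInfo.BrokerNodeGroupInfo.SecurityGroups'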

Port information
The following list provides the numbers of the ports that Amazon MSK uses to communicate with client
machines.

• To communicate with brokers in plaintext, use port 9092.


• To communicate with brokers by using TLS encryption, use port 9094 for access from within Amazon
and port 9194 for public access.
• To communicate with brokers by using SASL/SCRAM, use port 9096 for access from within Amazon
and port 9196 for public access.
• To communicate with brokers in a cluster that is set up to use the section called “IAM access
control” (p. 73), use port 9098 for access from within Amazon and port 9198 for public access.
• Apache ZooKeeper nodes use port 2181 by default. To communicate with Apache ZooKeeper by using
TLS encryption, use port 2182.
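
For example, a TLS bootstrap-brokers string pairs each broker DNS name with port 9094; the hostnames in the following sketch are placeholders:

b-1.mycluster.abc123.c2.kafka.us-east-1.amazonaws.com:9094,b-2.mycluster.abc123.c2.kafka.us-east-1.amazonaws.com:9094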


Migrating clusters using Apache Kafka's MirrorMaker
You can mirror or migrate your cluster using MirrorMaker, which is part of Apache Kafka. For example,
you can use it to migrate your Apache Kafka cluster to Amazon MSK or to migrate from one MSK cluster
to another. For information about how to use MirrorMaker, see Mirroring data between clusters in the
Apache Kafka documentation. We recommend setting up MirrorMaker in a highly available configuration.

An outline of the steps to follow when using MirrorMaker to migrate to an MSK cluster

1. Create the destination MSK cluster.
2. Start MirrorMaker from an Amazon EC2 instance within the same Amazon VPC as the destination
cluster.
3. Inspect the MirrorMaker lag.
4. After MirrorMaker catches up, redirect producers and consumers to the new cluster using the MSK
cluster bootstrap brokers.
5. Shut down MirrorMaker.

Migrating your Apache Kafka cluster to Amazon MSK
Suppose that you have an Apache Kafka cluster named CLUSTER_ONPREM. That cluster is populated
with topics and data. If you want to migrate that cluster to a newly created Amazon MSK cluster named
CLUSTER_AWSMSK, this procedure provides a high-level view of the steps that you need to follow.

To migrate your existing Apache Kafka cluster to Amazon MSK

1. In CLUSTER_AWSMSK, create all the topics that you want to migrate.

You can't use MirrorMaker for this step because it doesn't automatically re-create the topics that you
want to migrate with the right replication level. You can create the topics in Amazon MSK with the
same replication factors and numbers of partitions that they had in CLUSTER_ONPREM. You can also
create the topics with different replication factors and numbers of partitions.
2. Start MirrorMaker from an instance that has read access to CLUSTER_ONPREM and write access to
CLUSTER_AWSMSK.
3. Run the following command to mirror all topics:

<path-to-your-kafka-installation>/bin/kafka-mirror-maker.sh --consumer.config
config/mirrormaker-consumer.properties --producer.config config/mirrormaker-
producer.properties --whitelist '.*'

In this command, config/mirrormaker-consumer.properties points to a bootstrap broker
in CLUSTER_ONPREM; for example, bootstrap.servers=localhost:9092. And config/
mirrormaker-producer.properties points to a bootstrap broker in CLUSTER_AWSMSK; for
example, bootstrap.servers=10.0.0.237:9092,10.0.2.196:9092,10.0.1.233:9092.
4. Keep MirrorMaker running in the background, and continue to use CLUSTER_ONPREM. MirrorMaker
mirrors all new data.


5. Check the progress of mirroring by inspecting the lag between the last offset for each topic and the
current offset from which MirrorMaker is consuming.

Remember that MirrorMaker is simply using a consumer and a producer. So, you can check the lag
using the kafka-consumer-groups.sh tool (see the example command after this procedure). To find
the consumer group name, look inside the mirrormaker-consumer.properties file for the
group.id, and use its value. If there is no such key in the file, you can create it. For example, set
group.id=mirrormaker-consumer-group.
6. After MirrorMaker finishes mirroring all topics, stop all producers and consumers, and then
stop MirrorMaker. Then redirect the producers and consumers to the CLUSTER_AWSMSK cluster
by changing their producer and consumer bootstrap brokers values. Restart all producers and
consumers on CLUSTER_AWSMSK.
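
A minimal sketch of the lag check mentioned in step 5, using the example bootstrap broker and group ID from this procedure:

<path-to-your-kafka-installation>/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group mirrormaker-consumer-group

The LAG column in the output shows, for each partition, how far the MirrorMaker consumer is behind the last offset.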

Migrating from one Amazon MSK cluster to another
You can use Apache MirrorMaker to migrate an MSK cluster to another cluster. For example, you
can migrate from one version of Apache Kafka to another. For an example of how to use Amazon
CloudFormation to do this, see AWS::MSK::Cluster Examples (search for the example titled Create Two
MSK Clusters To Use With Apache MirrorMaker).

MirrorMaker 1.0 best practices


This list of best practices applies to MirrorMaker 1.0.

• Run MirrorMaker on the destination cluster. This way, if a network problem happens, the messages are
still available in the source cluster. If you run MirrorMaker on the source cluster and events are buffered
in the producer and there is a network issue, events might be lost.
• If encryption is required in transit, run it in the source cluster.
• For consumers, set enable.auto.commit=false
• For producers, set
• max.in.flight.requests.per.connection=1
• retries=Int.Max_Value
• acks=all
• max.block.ms = Long.Max_Value
• For a high producer throughput:
• Buffer messages and fill message batches — tune buffer.memory, batch.size, linger.ms
• Tune socket buffers — receive.buffer.bytes, send.buffer.bytes
• To avoid data loss, turn off auto commit at the source, so that MirrorMaker can control the commits,
which it typically does after it receives the ack from the destination cluster. If the producer has acks=all
and the destination cluster has min.insync.replicas set to more than 1, the messages are persisted on
more than one broker at the destination before the MirrorMaker consumer commits the offset at the
source.
• If order is important, you can set retries to 0. Alternatively, for a production environment, set max
inflight connections to 1 to ensure that the batches sent out are not committed out of order if a batch
fails in the middle. This way, each batch sent is retried until the next batch is sent out. If max.block.ms
is not set to the maximum value, and if the producer buffer is full, there can be data loss (depending
on some of the other settings). This can block and back-pressure the consumer.
• For high throughput
• Increase buffer.memory.
• Increase batch size.
• Tune linger.ms to allow the batches to fill. This also allows for better compression, less network
bandwidth usage, and less storage on the cluster. This results in increased retention.
• Monitor CPU and memory usage.
• For high consumer throughput
• Increase the number of threads/consumers per MirrorMaker process — num.streams.
• Increase the number of MirrorMaker processes across machines first before increasing threads to
allow for high availability.
• Increase the number of MirrorMaker processes first on the same machine and then on different
machines (with the same group ID).
• Isolate topics that have very high throughput and use separate MirrorMaker instances.
• For management and configuration
• Use Amazon CloudFormation and configuration management tools like Chef and Ansible.
• Use Amazon EFS mounts to keep all configuration files accessible from all Amazon EC2 instances.
• Use containers for easy scaling and management of MirrorMaker instances.
• Typically, it takes more than one consumer to saturate a producer in MirrorMaker. So, set up multiple
consumers. First, set them up on different machines to provide high availability. Then, scale individual
machines up to having a consumer for each partition, with consumers equally distributed across
machines.
• For high throughput ingestion and delivery, tune the receive and send buffers because their defaults
might be too low. For maximum performance, ensure that the total number of streams (num.streams)
matches all of the topic partitions that MirrorMaker is trying to copy to the destination cluster.
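
Pulling several of these recommendations together, the following is a minimal sketch of a pair of MirrorMaker 1.0 property files. The bootstrap servers and group ID reuse the example values from the migration procedure above; all values are illustrative starting points, not prescriptive settings:

# config/mirrormaker-consumer.properties
bootstrap.servers=localhost:9092
group.id=mirrormaker-consumer-group
# turn off auto commit so MirrorMaker controls the commits
enable.auto.commit=false

# config/mirrormaker-producer.properties
bootstrap.servers=10.0.0.237:9092,10.0.2.196:9092,10.0.1.233:9092
acks=all
max.in.flight.requests.per.connection=1
retries=2147483647
max.block.ms=9223372036854775807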

MirrorMaker 2.* advantages


• Makes use of the Apache Kafka Connect framework and ecosystem.
• Detects new topics and partitions.
• Automatically syncs topic configuration between clusters.
• Supports "active/active" cluster pairs, as well as any number of active clusters.
• Provides new metrics including end-to-end replication latency across multiple data centers and
clusters.
• Emits offsets required to migrate consumers between clusters and provides tooling for offset
translation.
• Supports a high-level configuration file for specifying multiple clusters and replication flows in one
place, compared to low-level producer/consumer properties for each MirrorMaker 1.* process.


Monitoring an Amazon MSK cluster


Amazon MSK gathers Apache Kafka metrics and sends them to Amazon CloudWatch where you can view
them. For more information about Apache Kafka metrics, including the ones that Amazon MSK surfaces,
see Monitoring in the Apache Kafka documentation.

You can also monitor your MSK cluster with Prometheus, an open-source monitoring application.
For information about Prometheus, see Overview in the Prometheus documentation. To learn
how to monitor your cluster with Prometheus, see the section called “Open monitoring with
Prometheus” (p. 117).

Topics
• Amazon MSK metrics for monitoring with CloudWatch (p. 107)
• Viewing Amazon MSK metrics using CloudWatch (p. 116)
• Consumer-lag monitoring (p. 116)
• Open monitoring with Prometheus (p. 117)

Amazon MSK metrics for monitoring with CloudWatch
Amazon MSK integrates with Amazon CloudWatch so that you can collect, view, and analyze
CloudWatch metrics for your Amazon MSK cluster. The metrics that you configure for your MSK
cluster are automatically collected and pushed to CloudWatch. You can set the monitoring level
for an MSK cluster to one of the following: DEFAULT, PER_BROKER, PER_TOPIC_PER_BROKER, or
PER_TOPIC_PER_PARTITION. The lists in the following sections show all the metrics that are
available starting at each monitoring level.

DEFAULT-level metrics are free. Pricing for other metrics is described on the Amazon CloudWatch
pricing page.

DEFAULT Level monitoring


The metrics described in the following list are available at the DEFAULT monitoring level. They are
free.

Metrics available at the DEFAULT monitoring level

• ActiveControllerCount (dimensions: Cluster Name; visible after the cluster gets to the ACTIVE state): Only one controller per cluster should be active at any given time.
• BurstBalance (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The remaining balance of input-output burst credits for EBS volumes in the cluster. Use it to investigate latency or decreased throughput. BurstBalance is not reported for EBS volumes when the baseline performance of a volume is higher than the maximum burst performance. For more information, see I/O Credits and burst performance.
• BytesInPerSec (dimensions: Cluster Name, Broker ID, Topic; visible after you create a topic): The number of bytes per second received from clients. This metric is available per broker and also per topic.
• BytesOutPerSec (dimensions: Cluster Name, Broker ID, Topic; visible after you create a topic): The number of bytes per second sent to clients. This metric is available per broker and also per topic.
• ClientConnectionCount (dimensions: Cluster Name, Broker ID, Client Authentication; visible after the cluster gets to the ACTIVE state): The number of active authenticated client connections.
• ConnectionCount (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The number of active authenticated, unauthenticated, and inter-broker connections.
• CPUCreditBalance (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): This metric can help you monitor CPU credit balance on the brokers. If your CPU usage is sustained above the baseline level of 20% utilization, you can run out of the CPU credit balance, which can have a negative impact on cluster performance. You can take steps to reduce CPU load. For example, you can reduce the number of client requests or update the broker type to an M5 broker type.
• CpuIdle (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The percentage of CPU idle time.
• CpuIoWait (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The percentage of CPU idle time during a pending disk operation.
• CpuSystem (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The percentage of CPU in kernel space.
• CpuUser (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The percentage of CPU in user space.
• GlobalPartitionCount (dimensions: Cluster Name; visible after the cluster gets to the ACTIVE state): The number of partitions across all topics in the cluster, excluding replicas. Because GlobalPartitionCount doesn't include replicas, the sum of the PartitionCount values can be higher than GlobalPartitionCount if the replication factor for a topic is greater than 1.
• GlobalTopicCount (dimensions: Cluster Name; visible after the cluster gets to the ACTIVE state): Total number of topics across all brokers in the cluster.
• EstimatedMaxTimeLag (dimensions: Consumer Group, Topic; visible after a consumer group consumes from a topic): Time estimate (in seconds) to drain MaxOffsetLag.
• KafkaAppLogsDiskUsed (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The percentage of disk space used for application logs.
• KafkaDataLogsDiskUsed (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The percentage of disk space used for data logs.
• KafkaDataLogsDiskUsed (dimensions: Cluster Name; visible after the cluster gets to the ACTIVE state): The percentage of disk space used for data logs.
• LeaderCount (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The total number of leaders of partitions per broker, not including replicas.
• MaxOffsetLag (dimensions: Consumer Group, Topic; visible after a consumer group consumes from a topic): The maximum offset lag across all partitions in a topic.
• MemoryBuffered (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The size in bytes of buffered memory for the broker.
• MemoryCached (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The size in bytes of cached memory for the broker.
• MemoryFree (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The size in bytes of memory that is free and available for the broker.
• HeapMemoryAfterGC (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The percentage of total heap memory in use after garbage collection.
• MemoryUsed (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The size in bytes of memory that is in use for the broker.
• MessagesInPerSec (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The number of incoming messages per second for the broker.
• NetworkRxDropped (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The number of dropped receive packages.
• NetworkRxErrors (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The number of network receive errors for the broker.
• NetworkRxPackets (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The number of packets received by the broker.
• NetworkTxDropped (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The number of dropped transmit packages.
• NetworkTxErrors (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The number of network transmit errors for the broker.
• NetworkTxPackets (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The number of packets transmitted by the broker.
• OfflinePartitionsCount (dimensions: Cluster Name; visible after the cluster gets to the ACTIVE state): Total number of partitions that are offline in the cluster.
• PartitionCount (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The total number of topic partitions per broker, including replicas.
• ProduceTotalTimeMsMean (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The mean produce time in milliseconds.
• RequestBytesMean (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The mean number of request bytes for the broker.
• RequestTime (dimensions: Cluster Name, Broker ID; visible after request throttling is applied): The average time in milliseconds spent in broker network and I/O threads to process requests.
• RootDiskUsed (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The percentage of the root disk used by the broker.
• SumOffsetLag (dimensions: Consumer Group, Topic; visible after a consumer group consumes from a topic): The aggregated offset lag for all the partitions in a topic.
• SwapFree (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The size in bytes of swap memory that is available for the broker.
• SwapUsed (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The size in bytes of swap memory that is in use for the broker.
• TrafficShaping (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): High-level metrics indicating the number of packets shaped (dropped or queued) due to exceeding network allocations. Finer detail is available with PER_BROKER metrics.
• UnderMinIsrPartitionCount (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The number of under minIsr partitions for the broker.
• UnderReplicatedPartitions (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The number of under-replicated partitions for the broker.
• ZooKeeperRequestLatencyMsMean (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): The mean latency in milliseconds for Apache ZooKeeper requests from the broker.
• ZooKeeperSessionState (dimensions: Cluster Name, Broker ID; visible after the cluster gets to the ACTIVE state): Connection status of the broker's ZooKeeper session, which may be one of the following: NOT_CONNECTED: '0.0', ASSOCIATING: '0.1', CONNECTING: '0.5', CONNECTEDREADONLY: '0.8', CONNECTED: '1.0', CLOSED: '5.0', AUTH_FAILED: '10.0'.

PER_BROKER Level monitoring


When you set the monitoring level to PER_BROKER, you get the metrics described in the following list
in addition to all the DEFAULT level metrics. You pay for the metrics in the following list, whereas
the DEFAULT level metrics continue to be free. The metrics in this list have the following dimensions:
Cluster Name, Broker ID.

Additional metrics that are available starting at the PER_BROKER monitoring level

• BwInAllowanceExceeded (visible after the cluster gets to the ACTIVE state): The number of packets shaped because the inbound aggregate bandwidth exceeded the maximum for the broker.
• BwOutAllowanceExceeded (visible after the cluster gets to the ACTIVE state): The number of packets shaped because the outbound aggregate bandwidth exceeded the maximum for the broker.
• ConnTrackAllowanceExceeded (visible after the cluster gets to the ACTIVE state): The number of packets shaped because the connection tracking exceeded the maximum for the broker. Connection tracking is related to security groups that track each connection established to ensure that return packets are delivered as expected.
• ConnectionCloseRate (visible after the cluster gets to the ACTIVE state): The number of connections closed per second per listener. This number is aggregated per listener and filtered for the client listeners.
• ConnectionCreationRate (visible after the cluster gets to the ACTIVE state): The number of new connections established per second per listener. This number is aggregated per listener and filtered for the client listeners.
• CpuCreditUsage (visible after the cluster gets to the ACTIVE state): This metric can help you monitor CPU credit usage on the instances. If your CPU usage is sustained above the baseline level of 20%, you can run out of the CPU credit balance, which can have a negative impact on cluster performance. You can monitor and alarm on this metric to take corrective actions.
• FetchConsumerLocalTimeMsMean (visible after there's a producer/consumer): The mean time in milliseconds that the consumer request is processed at the leader.
• FetchConsumerRequestQueueTimeMsMean (visible after there's a producer/consumer): The mean time in milliseconds that the consumer request waits in the request queue.
• FetchConsumerResponseQueueTimeMsMean (visible after there's a producer/consumer): The mean time in milliseconds that the consumer request waits in the response queue.
• FetchConsumerResponseSendTimeMsMean (visible after there's a producer/consumer): The mean time in milliseconds for the consumer to send a response.
• FetchConsumerTotalTimeMsMean (visible after there's a producer/consumer): The mean total time in milliseconds that consumers spend on fetching data from the broker.
• FetchFollowerLocalTimeMsMean (visible after there's a producer/consumer): The mean time in milliseconds that the follower request is processed at the leader.
• FetchFollowerRequestQueueTimeMsMean (visible after there's a producer/consumer): The mean time in milliseconds that the follower request waits in the request queue.
• FetchFollowerResponseQueueTimeMsMean (visible after there's a producer/consumer): The mean time in milliseconds that the follower request waits in the response queue.
• FetchFollowerResponseSendTimeMsMean (visible after there's a producer/consumer): The mean time in milliseconds for the follower to send a response.
• FetchFollowerTotalTimeMsMean (visible after there's a producer/consumer): The mean total time in milliseconds that followers spend on fetching data from the broker.
• FetchMessageConversionsPerSec (visible after you create a topic): The number of fetch message conversions per second for the broker.
• FetchThrottleByteRate (visible after bandwidth throttling is applied): The number of throttled bytes per second.
• FetchThrottleQueueSize (visible after bandwidth throttling is applied): The number of messages in the throttle queue.
• FetchThrottleTime (visible after bandwidth throttling is applied): The average fetch throttle time in milliseconds.
• NetworkProcessorAvgIdlePercent (visible after the cluster gets to the ACTIVE state): The average percentage of the time the network processors are idle.
• PpsAllowanceExceeded (visible after the cluster gets to the ACTIVE state): The number of packets shaped because the bidirectional PPS exceeded the maximum for the broker.
• ProduceLocalTimeMsMean (visible after the cluster gets to the ACTIVE state): The mean time in milliseconds that the request is processed at the leader.
• ProduceMessageConversionsPerSec (visible after you create a topic): The number of produce message conversions per second for the broker.
• ProduceMessageConversionsTimeMsMean (visible after the cluster gets to the ACTIVE state): The mean time in milliseconds spent on message format conversions.
• ProduceRequestQueueTimeMsMean (visible after the cluster gets to the ACTIVE state): The mean time in milliseconds that request messages spend in the queue.
• ProduceResponseQueueTimeMsMean (visible after the cluster gets to the ACTIVE state): The mean time in milliseconds that response messages spend in the queue.
• ProduceResponseSendTimeMsMean (visible after the cluster gets to the ACTIVE state): The mean time in milliseconds spent on sending response messages.
• ProduceThrottleByteRate (visible after bandwidth throttling is applied): The number of throttled bytes per second.
• ProduceThrottleQueueSize (visible after bandwidth throttling is applied): The number of messages in the throttle queue.
• ProduceThrottleTime (visible after bandwidth throttling is applied): The average produce throttle time in milliseconds.
• ProduceTotalTimeMsMean (visible after the cluster gets to the ACTIVE state): The mean produce time in milliseconds.
• ReplicationBytesInPerSec (visible after you create a topic): The number of bytes per second received from other brokers.
• ReplicationBytesOutPerSec (visible after you create a topic): The number of bytes per second sent to other brokers.
• RequestExemptFromThrottleTime (visible after request throttling is applied): The average time in milliseconds spent in broker network and I/O threads to process requests that are exempt from throttling.
• RequestHandlerAvgIdlePercent (visible after the cluster gets to the ACTIVE state): The average percentage of the time the request handler threads are idle.
• RequestThrottleQueueSize (visible after request throttling is applied): The number of messages in the throttle queue.
• RequestThrottleTime (visible after request throttling is applied): The average request throttle time in milliseconds.
• TcpConnections (visible after the cluster gets to the ACTIVE state): Shows the number of incoming and outgoing TCP segments with the SYN flag set.
• TrafficBytes (visible after the cluster gets to the ACTIVE state): Shows network traffic in overall bytes between clients (producers and consumers) and brokers. Traffic between brokers isn't reported.
• VolumeQueueLength (visible after the cluster gets to the ACTIVE state): The number of read and write operation requests waiting to be completed in a specified time period.
• VolumeReadBytes (visible after the cluster gets to the ACTIVE state): The number of bytes read in a specified time period.
• VolumeReadOps (visible after the cluster gets to the ACTIVE state): The number of read operations in a specified time period.
• VolumeTotalReadTime (visible after the cluster gets to the ACTIVE state): The total number of seconds spent by all read operations that completed in a specified time period.
• VolumeTotalWriteTime (visible after the cluster gets to the ACTIVE state): The total number of seconds spent by all write operations that completed in a specified time period.
• VolumeWriteBytes (visible after the cluster gets to the ACTIVE state): The number of bytes written in a specified time period.
• VolumeWriteOps (visible after the cluster gets to the ACTIVE state): The number of write operations in a specified time period.

PER_TOPIC_PER_BROKER Level monitoring


When you set the monitoring level to PER_TOPIC_PER_BROKER, you get the metrics described in the
following table, in addition to all the metrics from the PER_BROKER and DEFAULT levels. Only the
DEFAULT level metrics are free. The metrics in this table have the following dimensions: Cluster Name,
Broker ID, Topic.
Important
For an Amazon MSK cluster that uses Apache Kafka 2.4.1 or a newer version, the metrics in the
following table appear only after their values become nonzero for the first time. For example, to
see BytesInPerSec, one or more producers must first send data to the cluster.

Additional metrics that are available starting at the PER_TOPIC_PER_BROKER monitoring level

Name | When visible | Description

FetchMessageConversionsPerSec | After you create a topic. | The number of fetched messages converted per second.

MessagesInPerSec | After you create a topic. | The number of messages received per second.

ProduceMessageConversionsPerSec | After you create a topic. | The number of conversions per second for produced messages.

PER_TOPIC_PER_PARTITION Level monitoring


When you set the monitoring level to PER_TOPIC_PER_PARTITION, you get the metrics described in
the following table, in addition to all the metrics from the PER_TOPIC_PER_BROKER, PER_BROKER, and
DEFAULT levels. Only the DEFAULT level metrics are free. The metrics in this table have the following
dimensions: Consumer Group, Topic, Partition.

Additional metrics that are available starting at the PER_TOPIC_PER_PARTITION monitoring level

Name | When visible | Description

EstimatedTimeLag | After consumer group consumes from a topic. | Time estimate (in seconds) to drain the partition offset lag.

OffsetLag | After consumer group consumes from a topic. | Partition-level consumer lag in number of offsets.

Viewing Amazon MSK metrics using CloudWatch


You can monitor metrics for Amazon MSK using the CloudWatch console, the command line, or the
CloudWatch API. The following procedures show you how to access metrics using these different
methods.

To access metrics using the CloudWatch console

Sign in to the Amazon Web Services Management Console and open the CloudWatch console at https://
console.amazonaws.cn/cloudwatch/.

1. In the navigation pane, choose Metrics.


2. Choose the All metrics tab, and then choose Amazon/Kafka.
3. To view topic-level metrics, choose Topic, Broker ID, Cluster Name; for broker-level metrics, choose
Broker ID, Cluster Name; and for cluster-level metrics, choose Cluster Name.
4. (Optional) In the graph pane, select a statistic and a time period, and then create a CloudWatch
alarm using these settings.

To access metrics using the Amazon CLI

Use the list-metrics and get-metric-statistics commands.

To access metrics using the CloudWatch CLI

Use the mon-list-metrics and mon-get-stats commands.

To access metrics using the CloudWatch API

Use the ListMetrics and GetMetricStatistics operations.
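
For example, the following Amazon CLI command retrieves the average BytesInPerSec for one broker
over an hour. This is a minimal sketch: the cluster name, broker ID, and time range are placeholders, and
it assumes the AWS/Kafka namespace that CloudWatch uses for Amazon MSK metrics.

aws cloudwatch get-metric-statistics \
    --namespace "AWS/Kafka" \
    --metric-name BytesInPerSec \
    --dimensions Name="Cluster Name",Value="my-cluster" Name="Broker ID",Value="1" \
    --start-time 2022-06-01T00:00:00Z \
    --end-time 2022-06-01T01:00:00Z \
    --period 300 \
    --statistics Average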

Consumer-lag monitoring
Monitoring consumer lag allows you to identify slow or stuck consumers that aren't keeping up with
the latest data available in a topic. When necessary, you can then take remedial actions, such as scaling
or rebooting those consumers. To monitor consumer lag, you can use Amazon CloudWatch or open
monitoring with Prometheus.


Consumer lag metrics quantify the difference between the latest data written to your topics and the data
read by your applications. Amazon MSK provides the following consumer-lag metrics, which you can get
through Amazon CloudWatch or through open monitoring with Prometheus: EstimatedMaxTimeLag,
EstimatedTimeLag, MaxOffsetLag, OffsetLag, and SumOffsetLag. For information about these
metrics, see the section called “Amazon MSK metrics for monitoring with CloudWatch” (p. 107).

Amazon MSK supports consumer lag metrics for clusters with Apache Kafka 2.2.1 or a later version.
Note
To turn on consumer-lag monitoring for a cluster that was created before November 23, 2020,
ensure that the cluster is running Apache Kafka 2.2.1 or a later version, then create a support
case.

Open monitoring with Prometheus


You can monitor your MSK cluster with Prometheus, an open-source monitoring system for time-series
metric data. You can publish this data to Amazon Managed Service for Prometheus using Prometheus's
remote write feature. You can also use tools that are compatible with Prometheus-formatted metrics
or tools that integrate with Amazon MSK Open Monitoring, like Datadog, Lenses, New Relic, and Sumo
Logic. Open monitoring is available for free but charges apply for the transfer of data across Availability
Zones. For information about Prometheus, see the Prometheus documentation.

Creating an Amazon MSK cluster with open monitoring enabled
Using the Amazon Web Services Management Console

1. Sign in to the Amazon Web Services Management Console, and open the Amazon MSK console at
https://console.amazonaws.cn/msk/home?region=us-east-1#/home/.
2. In the Monitoring section, select the check box next to Enable open monitoring with Prometheus.
3. Provide the required information in all the sections of the page, and review all the available options.
4. Choose Create cluster.

Using the Amazon CLI

• Invoke the create-cluster command and specify its open-monitoring option. Enable the
JmxExporter, the NodeExporter, or both. If you specify open-monitoring, you can't disable both
exporters at the same time.
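
The following is a minimal sketch of such a call. The cluster name, broker-node-group file, Apache Kafka
version, and broker count are placeholders; the open-monitoring option is the point of the example.

aws kafka create-cluster \
    --cluster-name "my-cluster" \
    --broker-node-group-info file://brokernodegroupinfo.json \
    --kafka-version "2.8.1" \
    --number-of-broker-nodes 3 \
    --open-monitoring '{"Prometheus":{"JmxExporter":{"EnabledInBroker":true},"NodeExporter":{"EnabledInBroker":true}}}'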

Using the API

• Invoke the CreateCluster operation and specify OpenMonitoring. Enable the jmxExporter, the
nodeExporter, or both. If you specify OpenMonitoring, you can't disable both exporters at the same
time.

Enabling open monitoring for an existing Amazon MSK cluster
To enable open monitoring, make sure that the cluster is in the ACTIVE state.


Using the Amazon Web Services Management Console

1. Sign in to the Amazon Web Services Management Console, and open the Amazon MSK console at
https://console.amazonaws.cn/msk/home?region=us-east-1#/home/.
2. Choose the name of the cluster that you want to update. This takes you to a page that contains
details for the cluster.
3. On the Properties tab, scroll down to find the Monitoring section.
4. Choose Edit.
5. Select the check box next to Enable open monitoring with Prometheus.
6. Choose Save changes.

Using the Amazon CLI

• Invoke the update-monitoring command and specify its open-monitoring option. Enable the
JmxExporter, the NodeExporter, or both. If you specify open-monitoring, you can't disable both
exporters at the same time.

Using the API

• Invoke the UpdateMonitoring operation and specify OpenMonitoring. Enable the jmxExporter,
the nodeExporter, or both. If you specify OpenMonitoring, you can't disable both exporters at the
same time.

Setting up a Prometheus host on an Amazon EC2 instance
1. Download the Prometheus server from https://prometheus.io/download/#prometheus to your
Amazon EC2 instance.
2. Extract the downloaded file to a directory and go to that directory.
3. Create a file with the following contents and name it prometheus.yml.

# file: prometheus.yml
# my global config
global:
scrape_interval: 60s

# A scrape configuration containing exactly one endpoint to scrape:


# Here it's Prometheus itself.
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from
this config.
- job_name: 'prometheus'
static_configs:
# 9090 is the prometheus server port
- targets: ['localhost:9090']
- job_name: 'broker'
file_sd_configs:
- files:
- 'targets.json'

4. Use the ListNodes operation to get a list of your cluster's brokers.


5. Create a file named targets.json with the following JSON. Replace broker_dns_1,
broker_dns_2, and the rest of the broker DNS names with the DNS names you obtained for your
brokers in the previous step. Include all of the brokers you obtained in the previous step. Amazon
MSK uses port 11001 for the JMX Exporter and port 11002 for the Node Exporter.

[
{
"labels": {
"job": "jmx"
},
"targets": [
"broker_dns_1:11001",
"broker_dns_2:11001",
.
.
.
"broker_dns_N:11001"
]
},
{
"labels": {
"job": "node"
},
"targets": [
"broker_dns_1:11002",
"broker_dns_2:11002",
.
.
.
"broker_dns_N:11002"
]
}
]

6. To start the Prometheus server on your Amazon EC2 instance, run the following command
in the directory where you extracted the Prometheus files and saved prometheus.yml and
targets.json.

./prometheus

7. Find the IPv4 public IP address of the Amazon EC2 instance where you ran Prometheus in the
previous step. You need this public IP address in the following step.
8. To access the Prometheus web UI, open a browser that can access your Amazon EC2 instance, and go
to Prometheus-Instance-Public-IP:9090, where Prometheus-Instance-Public-IP is the
public IP address you got in the previous step.
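
To confirm that Prometheus is scraping both exporters, you can also query its HTTP API from the
instance. This assumes the default server port 9090 used in prometheus.yml above.

curl http://localhost:9090/api/v1/targets

Each broker should appear twice in the output, once with the jmx job label (port 11001) and once with
the node job label (port 11002), with health "up".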

Prometheus metrics
All metrics emitted by Apache Kafka to JMX are accessible using open monitoring with Prometheus. For
information about Apache Kafka metrics, see Monitoring in the Apache Kafka documentation. Along
with Apache Kafka metrics, consumer-lag metrics are also available at port 11001 under the JMX MBean
name kafka.consumer.group:type=ConsumerLagMetrics. You can also use the Prometheus Node
Exporter to get CPU and disk metrics for your brokers at port 11002.
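
You can verify that the exporters are reachable by requesting their metrics endpoints directly from a
client machine in the same VPC as the cluster. broker_dns_1 is a placeholder for one of the broker DNS
names described earlier.

curl broker_dns_1:11001/metrics
curl broker_dns_1:11002/metrics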

Storing Prometheus metrics in Amazon Managed Service for Prometheus
Amazon Managed Service for Prometheus is a Prometheus-compatible monitoring and alerting service
that you can use to monitor Amazon MSK clusters. It is a fully-managed service that automatically scales
the ingestion, storage, querying, and alerting of your metrics. It also integrates with Amazon security
services to give you fast and secure access to your data. You can use the open-source PromQL query
language to query your metrics and alert on them.

For more information, see Getting started with Amazon Managed Service for Prometheus.
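
As a minimal sketch, a remote_write block like the following in prometheus.yml sends the scraped
metrics to a workspace. The Region and workspace ID are placeholders, and the sigv4 block assumes a
Prometheus version with built-in SigV4 support (2.26 or later).

remote_write:
  - url: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-EXAMPLE/api/v1/remote_write
    sigv4:
      region: us-east-1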


Using LinkedIn's Cruise Control for Apache Kafka with Amazon MSK
You can use LinkedIn's Cruise Control to rebalance your Amazon MSK cluster, detect and fix anomalies,
and monitor the state and health of the cluster.

To download and build Cruise Control

1. Create an Amazon EC2 instance in the same Amazon VPC as the Amazon MSK cluster.
2. Install Prometheus on the Amazon EC2 instance that you created in the previous step. Note the
private IP and the port. The default port number is 9090. For information on how to configure
Prometheus to aggregate metrics for your cluster, see the section called “Open monitoring with
Prometheus” (p. 117).
3. Download Cruise Control on the Amazon EC2 instance. (Alternatively, you can use a separate
Amazon EC2 instance for Cruise Control if you prefer.) For a cluster that has Apache Kafka version
2.4.*, use the latest 2.4.* Cruise Control release. If your cluster has an Apache Kafka version that is
older than 2.4.*, use the latest 2.0.* Cruise Control release.
4. Decompress the Cruise Control file, then go to the decompressed folder.
5. Run the following command to install git.

sudo yum -y install git

6. Run the following command to initialize the local repo. Replace Your-Cruise-Control-Folder
with the name of your current folder (the folder that you obtained when you decompressed the
Cruise Control download).

git init && git add . && git commit -m "Init local repo." && git tag -a Your-Cruise-Control-Folder -m "Init local version."

7. Run the following command to build the source code.

./gradlew jar copyDependantLibs

To configure and run Cruise Control

1. Make the following updates to the config/cruisecontrol.properties file. Replace the
example bootstrap servers and Apache ZooKeeper connection string with the values for your cluster.
To get these strings for your cluster, you can see the cluster details in the console. Alternatively, you
can use the GetBootstrapBrokers and DescribeCluster API operations or their CLI equivalents.

# If using TLS encryption, use 9094; use 9092 if using plaintext
bootstrap.servers=b-1.test-cluster.2skv42.c1.kafka.us-east-1.amazonaws.com:9094,b-2.test-cluster.2skv42.c1.kafka.us-east-1.amazonaws.com:9094,b-3.test-cluster.2skv42.c1.kafka.us-east-1.amazonaws.com:9094
zookeeper.connect=z-1.test-cluster.2skv42.c1.kafka.us-east-1.amazonaws.com:2181,z-2.test-cluster.2skv42.c1.kafka.us-east-1.amazonaws.com:2181,z-3.test-cluster.2skv42.c1.kafka.us-east-1.amazonaws.com:2181

# SSL properties, needed if cluster is using TLS encryption
security.protocol=SSL
ssl.truststore.location=/home/ec2-user/kafka.client.truststore.jks

# Use the Prometheus Metric Sampler
metric.sampler.class=com.linkedin.kafka.cruisecontrol.monitor.sampling.prometheus.PrometheusMetricSampler

# Prometheus Metric Sampler specific configuration
prometheus.server.endpoint=1.2.3.4:9090 # Replace with your Prometheus IP and port

# Change the capacity config file and specify its path; details below
capacity.config.file=config/capacityCores.json

2. Edit the config/capacityCores.json file to specify the right disk size and CPU cores and
network in/out limits. You can use the DescribeCluster API operation (or its CLI equivalent) to obtain
the disk size. For CPU cores and network in/out limits, see Amazon EC2 Instance Types.

{
"brokerCapacities": [
{
"brokerId": "-1",
"capacity": {
"DISK": "10000",
"CPU": {
"num.cores": "2"
},
"NW_IN": "5000000",
"NW_OUT": "5000000"
},
"doc": "This is the default capacity. Capacity unit used for disk is in MB, cpu
is in number of cores, network throughput is in KB."
}
]
}

3. You can optionally install the Cruise Control UI. To download it, go to Setting Up Cruise Control
Frontend.
4. Run the following command to start Cruise Control. Consider using a tool like screen or tmux to
keep a long-running session open.

<path-to-your-kafka-installation>/bin/kafka-cruise-control-start.sh config/cruisecontrol.properties 9091

5. Use the Cruise Control APIs or the UI to make sure that Cruise Control has the cluster load data and
that it's making rebalancing suggestions. It might take several minutes to get a valid window of
metrics.
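
For example, you can query the Cruise Control REST API from the instance where it runs. These
endpoints belong to Cruise Control itself, and port 9091 matches the start command above.

curl -s "http://localhost:9091/kafkacruisecontrol/state"
curl -s "http://localhost:9091/kafkacruisecontrol/proposals"

The state response shows whether the monitor has collected enough metric windows; the proposals
response lists the rebalancing suggestions.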

Amazon MSK quota

Amazon MSK quota
• Up to 90 brokers per account and 30 brokers per cluster. To request higher quota, create a support
case.
• A minimum of 1 GiB of storage per broker.
• A maximum of 16384 GiB of storage per broker.
• A cluster that uses the section called “IAM access control” (p. 73) can have up
to 3000 TCP connections per broker at any given time. To increase this limit,
you can adjust the listener.name.client_iam.max.connections or the
listener.name.client_iam_public.max.connections configuration property using the Kafka
AlterConfig API or the kafka-configs.sh tool. It's important to note that increasing either property
to a high value can result in unavailability.
• Limits on TCP connections. A cluster that uses the section called “IAM access control” (p. 73) can accept
new connections at a rate of up to 20 TCP connections per broker per second for all broker types,
except for the type kafka.t3.small. Brokers of type kafka.t3.small are limited to 4 TCP connections
per broker per second. If you created your cluster after May 25, 2022, it also supports connection rate
bursts. If you want an older cluster to support connection rate bursts, you can create a support case.

To handle retries on failed connections, you can set the reconnect.backoff.ms configuration
parameter on the client side. For example, if you want a client to retry connections after 1 second, set
reconnect.backoff.ms to 1000. For more information, see reconnect.backoff.ms in the Apache
Kafka documentation.
• Up to 100 configurations per account. To request higher quota, create a support case.
• A maximum of 50 revisions per configuration.
• To update the configuration or the Apache Kafka version of an MSK cluster, first ensure the number
of partitions per broker is under the limits described in the section called “ Right-size your cluster:
Number of partitions per broker” (p. 138).

MSK Serverless quota


Dimension | Quota

Maximum ingress throughput | 200 MBps

Maximum egress throughput | 400 MBps

Maximum retention duration | 24 hours. To request a quota adjustment, create a support case.

Maximum number of client connections | 1000

Maximum connection attempts | 100 per second

Maximum message size | 8 MB

Maximum request size | 100 MB

Maximum request rate | 15,000 per second

Maximum fetch bytes per request | 55 MB

Maximum number of consumer groups | 500

Maximum number of partitions | 120

Maximum rate of partition creation and deletion | 120 in 5 minutes

Maximum ingress throughput per partition | 5 MBps

Maximum egress throughput per partition | 10 MBps

Maximum partition size | 250 GB

Maximum number of client VPCs per serverless cluster | 5

Maximum number of serverless clusters per account | 3

MSK Connect quota


• Up to 100 custom plugins.
• Up to 100 worker configurations.
• Up to 60 connect workers. If a connector is set up to have auto scaled capacity, MSK Connect uses the
maximum number of workers that the connector can scale to when it calculates the quota for the
account.
• Up to 10 workers per connector.

To request higher quota for MSK Connect, create a support case.


Amazon MSK resources


The term resources has two meanings in Amazon MSK, depending on the context. In the context of APIs,
a resource is a structure on which you can invoke an operation. For a list of these resources and the
operations that you can invoke on them, see Resources in the Amazon MSK API Reference. In the context
of the section called "IAM access control" (p. 73), a resource is an entity to which you can allow or deny
access, as defined in the section called "Resources" (p. 79).


Apache Kafka versions


When you create an Amazon MSK cluster, you specify which Apache Kafka version you want to have on it.
You can also update the Apache Kafka version of an existing cluster.

Topics
• Supported Apache Kafka versions (p. 126)
• Updating the Apache Kafka version (p. 129)

Supported Apache Kafka versions


Amazon Managed Streaming for Apache Kafka (Amazon MSK) supports the following Apache Kafka and
Amazon MSK versions.

Topics
• Apache Kafka version 3.2.0 (p. 126)
• Apache Kafka version 3.1.1 (p. 126)
• Apache Kafka version 2.8.1 (p. 127)
• Apache Kafka version 2.8.0 (p. 127)
• Apache Kafka version 2.7.2 (p. 127)
• Apache Kafka version 2.7.1 (p. 127)
• Apache Kafka version 2.6.3 (p. 127)
• Apache Kafka version 2.6.2 [recommended] (p. 127)
• Apache Kafka version 2.7.0 (p. 127)
• Apache Kafka version 2.6.1 (p. 127)
• Apache Kafka version 2.6.0 (p. 127)
• Apache Kafka version 2.5.1 (p. 127)
• Amazon MSK bug-fix version 2.4.1.1 (p. 128)
• Apache Kafka version 2.4.1 (use 2.4.1.1 instead) (p. 128)
• Apache Kafka version 2.3.1 (p. 129)
• Apache Kafka version 2.2.1 (p. 129)
• Apache Kafka version 1.1.1 (for existing clusters only) (p. 129)

Apache Kafka version 3.2.0


For information about Apache Kafka version 3.2.0, see its release notes on the Apache Kafka downloads
site.

Apache Kafka version 3.1.1


For information about Apache Kafka version 3.1.1, see its release notes on the Apache Kafka downloads
site.


Apache Kafka version 2.8.1


For information about Apache Kafka version 2.8.1, see its release notes on the Apache Kafka downloads
site.

Apache Kafka version 2.8.0


For information about Apache Kafka version 2.8.0, see its release notes on the Apache Kafka downloads
site.

Apache Kafka version 2.7.2


For information about Apache Kafka version 2.7.2, see its release notes on the Apache Kafka downloads
site.

Apache Kafka version 2.7.1


For information about Apache Kafka version 2.7.1, see its release notes on the Apache Kafka downloads
site.

Apache Kafka version 2.6.3


For information about Apache Kafka version 2.6.3, see its release notes on the Apache Kafka downloads
site.

Apache Kafka version 2.6.2 [recommended]


For information about Apache Kafka version 2.6.2, see its release notes on the Apache Kafka downloads
site.

Apache Kafka version 2.7.0


For information about Apache Kafka version 2.7.0, see its release notes on the Apache Kafka downloads
site.

Apache Kafka version 2.6.1


For information about Apache Kafka version 2.6.1, see its release notes on the Apache Kafka downloads
site.

Apache Kafka version 2.6.0


For information about Apache Kafka version 2.6.0, see its release notes on the Apache Kafka downloads
site.

Apache Kafka version 2.5.1


Apache Kafka version 2.5.1 includes several bug fixes and new features, including encryption in transit
for Apache ZooKeeper and administration clients. Amazon MSK provides TLS ZooKeeper endpoints,
which you can query with the DescribeCluster operation.

The output of the DescribeCluster operation includes the ZookeeperConnectStringTls node, which
lists the TLS ZooKeeper endpoints.


The following example shows the ZookeeperConnectStringTls node of the response for the
DescribeCluster operation:

"ZookeeperConnectStringTls": "z-3.awskafkatutorialc.abcd123.c3.kafka.us-
east-1.amazonaws.com:2182,z-2.awskafkatutorialc.abcd123.c3.kafka.us-
east-1.amazonaws.com:2182,z-1.awskafkatutorialc.abcd123.c3.kafka.us-
east-1.amazonaws.com:2182"

For information about using TLS encryption with Apache ZooKeeper, see Using TLS security with Apache
ZooKeeper (p. 91).

For more information about Apache Kafka version 2.5.1, see its release notes on the Apache Kafka
downloads site.

Amazon MSK bug-fix version 2.4.1.1


This release is an Amazon MSK-only bug-fix version of Apache Kafka version 2.4.1. This bug-fix release
contains a fix for KAFKA-9752, a rare issue that causes consumer groups to continuously rebalance and
remain in the PreparingRebalance state. This issue affects clusters running Apache Kafka versions
2.3.1 and 2.4.1. This release contains a community-produced fix that is available in Apache Kafka version
2.5.0.
Note
Amazon MSK clusters running version 2.4.1.1 are compatible with any Apache Kafka client that
is compatible with Apache Kafka version 2.4.1.

We recommend that you use MSK bug-fix version 2.4.1.1 for new Amazon MSK clusters if you prefer
to use Apache Kafka 2.4.1. You can update existing clusters running Apache Kafka version 2.4.1 to this
version to incorporate this fix. For information about upgrading an existing cluster, see Updating the
Apache Kafka version (p. 129).

To work around this issue without upgrading the cluster to version 2.4.1.1, see the Consumer group
stuck in PreparingRebalance state (p. 132) section of the Troubleshooting your Amazon MSK
cluster (p. 132) guide.

Apache Kafka version 2.4.1 (use 2.4.1.1 instead)


Note
You can no longer create an MSK cluster with Apache Kafka version 2.4.1. Instead, you can
use Amazon MSK bug-fix version 2.4.1.1 (p. 128) with clients compatible with Apache Kafka
version 2.4.1. And if you already have an MSK cluster with Apache Kafka version 2.4.1, we
recommend you update it to use Apache Kafka version 2.4.1.1 instead.

KIP-392 is one of the key Kafka Improvement Proposals that are included in the 2.4.1 release of
Apache Kafka. This improvement allows consumers to fetch from the closest replica. To use this
feature, set client.rack in the consumer properties to the ID of the consumer's Availability Zone.
An example AZ ID is use1-az1. Amazon MSK sets broker.rack to the IDs of the Availability
Zones of the brokers. You must also set the replica.selector.class configuration property to
org.apache.kafka.common.replica.RackAwareReplicaSelector, which is an implementation
of rack awareness provided by Apache Kafka.
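
As a sketch, the client-side setting goes in the consumer configuration and the broker-side setting goes
in the MSK cluster configuration. The AZ ID here is only an example; use the ID of the Availability Zone
where the consumer runs.

# Consumer properties (client side)
client.rack=use1-az1

# MSK cluster configuration (server side)
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector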

When you use this version of Apache Kafka, the metrics in the PER_TOPIC_PER_BROKER monitoring
level appear only after their values become nonzero for the first time. For more information about this,
see the section called “PER_TOPIC_PER_BROKER Level monitoring” (p. 115).

For information about how to find Availability Zone IDs, see AZ IDs for Your Resource in the Amazon
Resource Access Manager user guide.


For information about setting configuration properties, see Configuration (p. 34).

For more information about KIP-392, see Allow Consumers to Fetch from Closest Replica in the
Confluence pages.

For more information about Apache Kafka version 2.4.1, see its release notes on the Apache Kafka
downloads site.

Apache Kafka version 2.3.1


For information about Apache Kafka version 2.3.1, see its release notes on the Apache Kafka downloads
site.

Apache Kafka version 2.2.1


For information about Apache Kafka version 2.2.1, see its release notes on the Apache Kafka downloads
site.

Apache Kafka version 1.1.1 (for existing clusters only)


You can no longer create a new MSK cluster with Apache Kafka version 1.1.1. You can continue to use
existing clusters that are configured with Apache Kafka version 1.1.1. For information about Apache
Kafka version 1.1.1, see its release notes on the Apache Kafka downloads site.

Updating the Apache Kafka version


You can update an existing MSK cluster to a newer version of Apache Kafka. You can't update it to an
older version. When you update the Apache Kafka version of an MSK cluster, also check your client-side
software to make sure its version enables you to use the features of the cluster's new Apache Kafka
version. Amazon MSK only updates the server software. It doesn't update your clients.

For information about how to make a cluster highly available during an update, see the section called
“Build highly available clusters” (p. 139).
Important
You can't update the Apache Kafka version for an MSK cluster that exceeds the limits described
in the section called “ Right-size your cluster: Number of partitions per broker” (p. 138).

Updating the Apache Kafka version using the Amazon Web Services Management Console

1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.


2. Choose the MSK cluster on which you want to update the Apache Kafka version.
3. On the Properties tab, choose Upgrade in the Apache Kafka version section.

Updating the Apache Kafka version using the Amazon CLI

1. Run the following command, replacing ClusterArn with the Amazon Resource Name (ARN) that
you obtained when you created your cluster. If you don't have the ARN for your cluster, you can find
it by listing all clusters. For more information, see the section called “Listing clusters” (p. 17).

aws kafka get-compatible-kafka-versions --cluster-arn ClusterArn

The output of this command includes a list of the Apache Kafka versions to which you can update
the cluster. It looks like the following example.


{
"CompatibleKafkaVersions": [
{
"SourceVersion": "2.2.1",
"TargetVersions": [
"2.3.1",
"2.4.1",
"2.4.1.1",
"2.5.1"
]
}
]
}

2. Run the following command, replacing ClusterArn with the Amazon Resource Name (ARN) that
you obtained when you created your cluster. If you don't have the ARN for your cluster, you can find
it by listing all clusters. For more information, see the section called “Listing clusters” (p. 17).

Replace Current-Cluster-Version with the current version of the cluster. For TargetVersion
you can specify any of the target versions from the output of the previous command.
Important
Cluster versions aren't simple integers. To find the current version of the cluster, use the
DescribeCluster operation or the describe-cluster Amazon CLI command. An example
version is KTVPDKIKX0DER.

aws kafka update-cluster-kafka-version --cluster-arn ClusterArn --current-version Current-Cluster-Version --target-kafka-version TargetVersion

The output of the previous command looks like the following JSON.

"ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/exampleClusterName/
abcdefab-1234-abcd-5678-cdef0123ab01-2",
"ClusterOperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-
operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-
abcd-4f7f-1234-9876543210ef"
}

3. To get the result of the update-cluster-kafka-version operation, run the following command,
replacing ClusterOperationArn with the ARN that you obtained in the output of the update-
cluster-kafka-version command.

aws kafka describe-cluster-operation --cluster-operation-arn ClusterOperationArn

The output of this describe-cluster-operation command looks like the following JSON
example.

{
"ClusterOperationInfo": {
"ClientRequestId": "62cd41d2-1206-4ebf-85a8-dbb2ba0fe259",
"ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/exampleClusterName/
abcdefab-1234-abcd-5678-cdef0123ab01-2",
"CreationTime": "2021-03-11T20:34:59.648000+00:00",
"OperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-
operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-
abcd-4f7f-1234-9876543210ef",
"OperationState": "UPDATE_IN_PROGRESS",

130
Amazon Managed Streaming for
Apache Kafka Developer Guide
Updating the Apache Kafka version

"OperationSteps": [
{
"StepInfo": {
"StepStatus": "IN_PROGRESS"
},
"StepName": "INITIALIZE_UPDATE"
},
{
"StepInfo": {
"StepStatus": "PENDING"
},
"StepName": "UPDATE_APACHE_KAFKA_BINARIES"
},
{
"StepInfo": {
"StepStatus": "PENDING"
},
"StepName": "FINALIZE_UPDATE"
}
],
"OperationType": "UPDATE_CLUSTER_KAFKA_VERSION",
"SourceClusterInfo": {
"KafkaVersion": "2.4.1"
},
"TargetClusterInfo": {
"KafkaVersion": "2.6.1"
}
}
}

If OperationState has the value UPDATE_IN_PROGRESS, wait a while, then run the
describe-cluster-operation command again. When the operation is complete, the value of
OperationState becomes UPDATE_COMPLETE. Because the time required for Amazon MSK to
complete the operation varies, you might need to check repeatedly until the operation is complete.
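
If you script the update, a small polling loop like the following sketch can do the repeated checking.
It uses the same placeholder ClusterOperationArn and reads the
ClusterOperationInfo.OperationState field shown in the example output above.

while true; do
    STATE=$(aws kafka describe-cluster-operation \
        --cluster-operation-arn ClusterOperationArn \
        --query 'ClusterOperationInfo.OperationState' --output text)
    echo "$STATE"
    if [ "$STATE" = "UPDATE_COMPLETE" ]; then break; fi
    sleep 60
done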

Updating the Apache Kafka version using the API

1. Invoke the GetCompatibleKafkaVersions operation to get a list of the Apache Kafka versions to
which you can update the cluster.
2. Invoke the UpdateClusterKafkaVersion operation to update the cluster to one of the compatible
Apache Kafka versions.


Troubleshooting your Amazon MSK cluster
The following information can help you troubleshoot problems that you might have with your Amazon
MSK cluster. You can also post your issue to Amazon Web Services re:Post.

Topics
• Consumer group stuck in PreparingRebalance state (p. 132)
• Error delivering broker logs to Amazon CloudWatch Logs (p. 133)
• No default security group (p. 133)
• Cluster appears stuck in the CREATING state (p. 134)
• Cluster state goes from CREATING to FAILED (p. 134)
• Cluster state is ACTIVE but producers cannot send data or consumers cannot receive data (p. 134)
• Amazon CLI doesn't recognize Amazon MSK (p. 134)
• Partitions go offline or replicas are out of sync (p. 134)
• Disk space is running low (p. 134)
• Memory running low (p. 134)
• Producer gets NotLeaderForPartitionException (p. 135)
• Under-replicated partitions (URP) greater than zero (p. 135)
• Cluster has topics called __amazon_msk_canary and __amazon_msk_canary_state (p. 135)
• Partition replication fails (p. 135)
• Unable to access cluster that has public access turned on (p. 135)
• Unable to access cluster from within Amazon: Networking issues (p. 136)
• Failed authentication: Too many connects (p. 137)
• MSK Serverless: Cluster creation fails (p. 137)

Consumer group stuck in PreparingRebalance state
If one or more of your consumer groups is stuck in a perpetual rebalancing state, the cause might be
Apache Kafka issue KAFKA-9752, which affects Apache Kafka versions 2.3.1 and 2.4.1.

To resolve this issue, we recommend that you upgrade your cluster to Amazon MSK bug-fix version
2.4.1.1 (p. 128), which contains a fix for this issue. For information about updating an existing cluster to
Amazon MSK bug-fix version 2.4.1.1, see Updating the Apache Kafka version (p. 129).

The workarounds for solving this issue without upgrading the cluster to Amazon MSK bug-fix version
2.4.1.1 are to either set the Kafka clients to use Static membership protocol (p. 132), or to Identify and
reboot (p. 133) the coordinating broker node of the stuck consumer group.

Implementing static membership protocol


To implement Static Membership Protocol in your clients, do the following:


1. Set the group.instance.id property of your Kafka Consumers configuration to a static string
that identifies the consumer in the group.
2. Ensure that other instances of the configuration are updated to use the static string.
3. Deploy the changes to your Kafka Consumers.

Using Static Membership Protocol is more effective if the session timeout in the client configuration
is set to a duration that allows the consumer to recover without prematurely triggering a consumer
group rebalance. For example, if your consumer application can tolerate 5 minutes of unavailability, a
reasonable value for the session timeout would be 4 minutes instead of the default value of 10 seconds.
Note
Using Static Membership Protocol only reduces the probability of encountering this issue. You
may still encounter this issue even when using Static Membership Protocol.
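
A minimal consumer configuration sketch for the two settings discussed above might look like the
following. The instance ID is a hypothetical name; each consumer instance in the group needs its own
unique value, and 240000 ms corresponds to the 4-minute example.

# Hypothetical static member ID; must be unique per consumer instance
group.instance.id=my-consumer-instance-1
session.timeout.ms=240000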

Rebooting the coordinating broker node


To reboot the coordinating broker node, do the following:

1. Identify the group coordinator using the kafka-consumer-groups.sh command.


2. Restart the group coordinator of the stuck consumer group using the RebootBroker API action.
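
As a sketch, you can find the coordinator with the consumer groups tool and then reboot it with the
Amazon CLI. The group name, bootstrap brokers, cluster ARN, and broker ID are placeholders; read the
coordinator's broker ID from the tool's output before calling reboot-broker.

<path-to-your-kafka-installation>/bin/kafka-consumer-groups.sh --bootstrap-server bootstrap-brokers \
    --describe --group my-consumer-group --state

aws kafka reboot-broker --cluster-arn ClusterArn --broker-ids 2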

Error delivering broker logs to Amazon CloudWatch Logs
When you try to set up your cluster to send broker logs to Amazon CloudWatch Logs, you might get one
of two exceptions.

If you get an InvalidInput.LengthOfCloudWatchResourcePolicyLimitExceeded exception, try again
but use log groups that start with /aws/vendedlogs/. For more information, see Enabling Logging from
Certain Amazon Web Services.

If you get an InvalidInput.NumberOfCloudWatchResourcePoliciesLimitExceeded exception, choose
an existing Amazon CloudWatch Logs policy in your account, and append the following JSON to it.

{"Sid":"AWSLogDeliveryWrite","Effect":"Allow","Principal":
{"Service":"delivery.logs.amazonaws.com"},"Action":
["logs:CreateLogStream","logs:PutLogEvents"],"Resource":["*"]}

If you try to append the JSON above to an existing policy but get an error that says you've reached the
maximum length for the policy you picked, try to append the JSON to another one of your Amazon
CloudWatch Logs policies. After you append the JSON to an existing policy, try once again to set up
broker-log delivery to Amazon CloudWatch Logs.
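
As a sketch, you can inspect your existing policies and upload the edited document with the Amazon
CLI. The policy name is a placeholder, and updated-policy.json is assumed to contain the existing
statements plus the JSON above.

aws logs describe-resource-policies

aws logs put-resource-policy \
    --policy-name my-existing-policy \
    --policy-document file://updated-policy.json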

No default security group


If you try to create a cluster and get an error indicating that there's no default security group, it might be
because you are using a VPC that was shared with you. Ask your administrator to grant you permission
to describe the security groups on this VPC and try again. For an example of a policy that allows
this action, see Amazon EC2: Allows Managing EC2 Security Groups Associated With a Specific VPC,
Programmatically and in the Console.


Cluster appears stuck in the CREATING state


Sometimes cluster creation can take up to 30 minutes. Wait for 30 minutes and check the state of the
cluster again.

Cluster state goes from CREATING to FAILED


Try creating the cluster again.

Cluster state is ACTIVE but producers cannot send data or consumers cannot receive data
• If the cluster creation succeeds (the cluster state is ACTIVE), but you can't send or receive data, ensure
that your producer and consumer applications have access to the cluster. For more information, see the
guidance in the section called “Step 2: Create a client machine” (p. 5).

• If your producers and consumers have access to the cluster but still experience problems producing
and consuming data, the cause might be KAFKA-7697, which affects Apache Kafka version 2.1.0 and
can lead to a deadlock in one or more brokers. Consider migrating to Apache Kafka 2.2.1, which is not
affected by this bug. For information about how to migrate, see Migration (p. 104).

Amazon CLI doesn't recognize Amazon MSK


If you have the Amazon CLI installed, but it doesn't recognize the Amazon MSK commands, upgrade
your Amazon CLI to the latest version. For detailed instructions on how to upgrade the Amazon CLI, see
Installing the Amazon Command Line Interface. For information about how to use the Amazon CLI to run
Amazon MSK commands, see How it works (p. 10).

Partitions go offline or replicas are out of sync


These can be symptoms of low disk space. See the section called “Disk space is running low” (p. 134).

Disk space is running low


See the following best practices for managing disk space: the section called “Monitor disk
space” (p. 140) and the section called “Adjust data retention parameters” (p. 140).

Memory running low


If you see the MemoryUsed metric running high or MemoryFree running low, that doesn't mean there's a
problem. Apache Kafka is designed to use as much memory as possible, and it manages it optimally.


Producer gets NotLeaderForPartitionException


This is often a transient error. Set the producer's retries configuration parameter to a value that's
higher than its current value.

Under-replicated partitions (URP) greater than zero
The UnderReplicatedPartitions metric is an important one to monitor. In a healthy MSK cluster,
this metric has the value 0. If it's greater than zero, that might be for one of the following reasons.

• If UnderReplicatedPartitions is spiky, the issue might be that the cluster isn't provisioned at the
right size to handle incoming and outgoing traffic. See Best practices (p. 138).
• If UnderReplicatedPartitions is consistently greater than 0 including during low-traffic periods,
the issue might be that you've set restrictive ACLs that don't grant topic access to brokers. To replicate
partitions, brokers must be authorized to both READ and DESCRIBE topics. DESCRIBE is granted by
default with the READ authorization. For information about setting ACLs, see Authorization and ACLs
in the Apache Kafka documentation.

Cluster has topics called __amazon_msk_canary and __amazon_msk_canary_state
You might see that your MSK cluster has a topic with the name __amazon_msk_canary and another
with the name __amazon_msk_canary_state. These are internal topics that Amazon MSK creates and
uses for cluster health and diagnostic metrics. These topics are negligible in size and can't be deleted.

Partition replication fails


Ensure that you haven't set ACLs on CLUSTER_ACTIONS.

Unable to access cluster that has public access turned on
If your cluster has public access turned on, but you still cannot access it from the internet, follow these
steps:

1. Ensure that the cluster's security group's inbound rules allow your IP address and the cluster's port.
For a list of cluster port numbers, see the section called “Port information” (p. 103). Also ensure
that the security group's outbound rules allow outbound communications. For more information
about security groups and their inbound and outbound rules, see Security groups for your VPC in the
Amazon VPC User Guide.
2. Make sure that your IP address and the cluster's port are allowed in the inbound rules of the cluster's
VPC network ACL. Unlike security groups, network ACLs are stateless. This means that you must
configure both inbound and outbound rules. In the outbound rules, allow all traffic (port range:
0-65535) to your IP address. For more information, see Add and delete rules in the Amazon VPC
User Guide.
User Guide.
3. Make sure that you are using the public-access bootstrap-brokers string to access the cluster. An
MSK cluster that has public access turned on has two different bootstrap-brokers strings, one for
public access, and one for access from within Amazon. For more information, see the section called
“Getting the bootstrap brokers using the Amazon Web Services Management Console” (p. 16).

Unable to access cluster from within Amazon: Networking issues
If you have an Apache Kafka application that is unable to communicate successfully with an MSK cluster,
start by performing the following connectivity test.

1. Use any of the methods described in the section called “Getting the bootstrap brokers” (p. 16) to get
the addresses of the bootstrap brokers.
2. In the following command replace bootstrap-broker with one of the broker addresses that you
obtained in the previous step. Replace port-number with 9094 if the cluster is set up to use TLS
authentication. If the cluster doesn't use TLS authentication, replace port-number with 9092. Run
the command from the client machine.

telnet bootstrap-broker port-number

3. Repeat the previous command for all the bootstrap brokers.


4. Use any of the methods described in the section called “Getting the Apache ZooKeeper connection
string” (p. 14) to get the addresses of the cluster's Apache ZooKeeper nodes.
5. On the client machine run the following command, replacing Apache-ZooKeeper-node with the
address of one of the Apache ZooKeeper nodes that you obtained in the previous step. The number
2181 is the port number. Repeat for all the Apache ZooKeeper nodes.

telnet Apache-ZooKeeper-node 2181

If the client machine is able to access the brokers and the Apache ZooKeeper nodes, this means there
are no connectivity issues. In this case, run the following command to check whether your Apache Kafka
client is set up correctly. To get bootstrap-brokers, use any of the methods described in the section
called “Getting the bootstrap brokers” (p. 16). Replace topic with the name of your topic.

<path-to-your-kafka-installation>/bin/kafka-console-producer.sh --broker-list bootstrap-brokers --producer.config client.properties --topic topic

If the previous command succeeds, this means that your client is set up correctly. If you're still unable to
produce and consume from an application, debug the problem at the application level.

If the client machine is unable to access the brokers and the Apache ZooKeeper nodes, see the following
subsections for guidance that is based on your client-machine setup.

Amazon EC2 client and MSK cluster in the same VPC


If the client machine is in the same VPC as the MSK cluster, make sure the cluster's security group has an
inbound rule that accepts traffic from the client machine's security group. For information about setting
up these rules, see Security Group Rules. For an example of how to access a cluster from an Amazon EC2
instance that's in the same VPC as the cluster, see Getting started (p. 5).


Amazon EC2 client and MSK cluster in different VPCs


If the client machine and the cluster are in two different VPCs, ensure the following:

• The two VPCs are peered.


• The status of the peering connection is active.
• The route tables of the two VPCs are set up correctly.

For information about VPC peering, see Working with VPC Peering Connections.

On-premises client
In the case of an on-premises client that is set up to connect to the MSK cluster using Amazon VPN,
ensure the following:

• The VPN connection status is UP. For information about how to check the VPN connection status, see
How do I check the current status of my VPN tunnel?.
• The route table of the cluster's VPC contains the route for an on-premises CIDR whose target has the
format Virtual private gateway(vgw-xxxxxxxx).
• The MSK cluster's security group allows traffic on port 2181, port 9092 (if your cluster accepts
plaintext traffic), and port 9094 (if your cluster accepts TLS-encrypted traffic).

For more Amazon VPN troubleshooting guidance, see Troubleshooting Client VPN.

Amazon Direct Connect


If the client uses Amazon Direct Connect, see Troubleshooting Amazon Direct Connect.

If the previous troubleshooting guidance doesn't resolve the issue, ensure that no firewall is blocking
network traffic. For further debugging, use tools like tcpdump and Wireshark to analyze traffic and to
make sure that it is reaching the MSK cluster.

Failed authentication: Too many connects


The Failed authentication ... Too many connects error indicates that a broker is protecting
itself because one or more IAM clients are trying to connect to it at an aggressive rate. To help
brokers accept a higher rate of new IAM connections, you can increase the reconnect.backoff.ms
configuration parameter.

To learn more about the rate limits for new connections per broker, see the Amazon MSK quota (p. 123)
page.
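
As a sketch, the client-side setting looks like the following in the client's properties file. The values are
examples; reconnect.backoff.max.ms is the standard Apache Kafka setting that caps the exponential
backoff between retries.

reconnect.backoff.ms=1000
reconnect.backoff.max.ms=10000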

MSK Serverless: Cluster creation fails


If you try to create an MSK Serverless cluster and the workflow fails, you may not have permission
to create a VPC endpoint. Verify that your administrator has granted you permission to create a VPC
endpoint by allowing the ec2:CreateVpcEndpoint action.

For a complete list of permissions required to perform all Amazon MSK actions, see Amazon managed
policy: AmazonMSKFullAccess (p. 69).


Best practices
This topic outlines some best practices to follow when using Amazon MSK.

Right-size your cluster: Number of partitions per broker
The following table shows the maximum number of partitions (including leader and follower replicas)
that you can have per broker.

Broker type | Maximum number of partitions (including leader and follower replicas) per broker

kafka.t3.small | 300

kafka.m5.large or kafka.m5.xlarge | 1000

kafka.m5.2xlarge | 2000

kafka.m5.4xlarge, kafka.m5.8xlarge, kafka.m5.12xlarge, kafka.m5.16xlarge, or kafka.m5.24xlarge | 4000

If the number of partitions per broker exceeds the maximum value specified in the previous table, you
cannot perform any of the following operations on the cluster:

• Update the cluster configuration


• Update the Apache Kafka version for the cluster
• Update the cluster to a smaller broker type
• Associate an Amazon Secrets Manager secret with a cluster that has SASL/SCRAM authentication

For guidance on choosing the number of partitions, see Apache Kafka Supports 200K Partitions
Per Cluster. We also recommend that you perform your own testing to determine the right type for
your brokers. For more information about the different broker types, see the section called “Broker
types” (p. 10).

Right-size your cluster: Number of brokers per cluster
To determine the right number of brokers for your MSK cluster and understand costs, see the MSK
Sizing and Pricing spreadsheet. This spreadsheet provides an estimate for sizing an MSK cluster and
the associated costs of Amazon MSK compared to a similar, self-managed, EC2-based Apache Kafka
cluster. For more information about the input parameters in the spreadsheet, hover over the parameter


descriptions. Estimates provided by this sheet are conservative and provide a starting point for a new
cluster. Cluster performance, size, and costs are dependent on your use case and we recommend that you
verify them with actual testing.

To understand how the underlying infrastructure affects Apache Kafka performance, see Best practices
for right-sizing your Apache Kafka clusters to optimize performance and cost in the Amazon Big Data
Blog. The blog post provides information about how to size your clusters to meet your throughput,
availability, and latency requirements. It also provides answers to questions such as when you should
scale up versus scale out, and guidance on how to continuously verify the size of your production
clusters.

Build highly available clusters


Use the following recommendations so that your MSK cluster can be highly available during an update
(such as when you're updating the broker type or Apache Kafka version, for example) or when Amazon
MSK is replacing a broker.

• Set up a three-AZ cluster.


• Ensure that the replication factor (RF) is at least 3. Note that an RF of 1 can lead to offline partitions
during a rolling update, and an RF of 2 may lead to data loss.
• Set minimum in-sync replicas (minISR) to at most RF - 1. A minISR that is equal to the RF can prevent
producing to the cluster during a rolling update. A minISR of 2 allows three-way replicated topics to be
available when one replica is offline.
• Ensure client connection strings include at least one broker from each availability zone. Having
multiple brokers in a client's connection string allows for failover when a specific broker is offline for
an update. For information about how to get a connection string with multiple brokers, see the section
called “Getting the bootstrap brokers” (p. 16).

Monitor CPU usage


Amazon MSK strongly recommends that you maintain the total CPU utilization for your brokers (defined
as CPU User + CPU System) under 60%. When you have at least 40% of your cluster's total CPU
available, Apache Kafka can redistribute CPU load across brokers in the cluster when necessary. One
example of when this is necessary is when Amazon MSK detects and recovers from a broker fault; in
this case, Amazon MSK performs automatic maintenance, like patching. Another example is when a
user requests a broker-type change or version upgrade; in these two cases, Amazon MSK deploys rolling
workflows that take one broker offline at a time. When brokers with lead partitions go offline, Apache
Kafka reassigns partition leadership to redistribute work to other brokers in the cluster. By following
this best practice you can ensure you have enough CPU headroom in your cluster to tolerate operational
events like these.

You can use Amazon CloudWatch metric math to create a composite metric that is CPU User + CPU
System. Set an alarm that gets triggered when the composite metric reaches an average CPU utilization
of 60% (a CLI sketch follows the options below). When this alarm is triggered, scale the cluster using one
of the following options:

• Option 1 (recommended): Update your broker type to the next larger type. For example, if the current
type is kafka.m5.large, update the cluster to use kafka.m5.xlarge. Keep in mind that when
you update the broker type in the cluster, Amazon MSK takes brokers offline in a rolling fashion
and temporarily reassigns partition leadership to other brokers. A size update typically takes 10-15
minutes per broker.
• Option 2: If there are topics with all messages ingested from producers that use round-robin writes (in
other words, messages aren't keyed and ordering isn't important to consumers), expand your cluster by


adding brokers. Also add partitions to existing topics with the highest throughput. Next, use
kafka-topics.sh --describe to ensure that newly added partitions are assigned to the new brokers. The
main benefit of this option compared to the previous one is that you can manage resources and costs
more granularly. Additionally, you can use this option if CPU load significantly exceeds 60% because
this form of scaling doesn't typically result in increased load on existing brokers.
• Option 3: Expand your cluster by adding brokers, then reassign existing partitions by using the
partition reassignment tool named kafka-reassign-partitions.sh. However, if you use this
option, the cluster will need to spend resources to replicate data from broker to broker after partitions
are reassigned. Compared to the two previous options, this can significantly increase the load on the
cluster at first. As a result, Amazon MSK doesn't recommend using this option when CPU utilization
is above 70% because replication causes additional CPU load and network traffic. Amazon MSK only
recommends using this option if the two previous options aren't feasible.
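
The following Amazon CLI sketch shows one way to express the composite alarm described before the
options above, using metric math over the CpuUser and CpuSystem metrics for a single broker. The
alarm name, cluster name, and broker ID are placeholders.

aws cloudwatch put-metric-alarm \
    --alarm-name "msk-broker-1-cpu-over-60" \
    --evaluation-periods 3 \
    --comparison-operator GreaterThanThreshold \
    --threshold 60 \
    --metrics '[
      {"Id":"user","MetricStat":{"Metric":{"Namespace":"AWS/Kafka","MetricName":"CpuUser","Dimensions":[{"Name":"Cluster Name","Value":"my-cluster"},{"Name":"Broker ID","Value":"1"}]},"Period":300,"Stat":"Average"},"ReturnData":false},
      {"Id":"system","MetricStat":{"Metric":{"Namespace":"AWS/Kafka","MetricName":"CpuSystem","Dimensions":[{"Name":"Cluster Name","Value":"my-cluster"},{"Name":"Broker ID","Value":"1"}]},"Period":300,"Stat":"Average"},"ReturnData":false},
      {"Id":"total","Expression":"user+system","Label":"CpuUser + CpuSystem","ReturnData":true}
    ]'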

Other recommendations:

• Monitor total CPU utilization per broker as a proxy for load distribution. If brokers have consistently
uneven CPU utilization it might be a sign that load isn't evenly distributed within the cluster. Amazon
MSK recommends using Cruise Control to continuously manage load distribution via partition
assignment.
• Monitor produce and consume latency. Produce and consume latency can increase linearly with CPU
utilization.

Monitor disk space


To avoid running out of disk space for messages, create a CloudWatch alarm that watches the
KafkaDataLogsDiskUsed metric. When the value of this metric reaches or exceeds 85%, perform one
or more of the following actions:

• Use the section called “Automatic scaling” (p. 20). You can also manually increase broker storage as
described in the section called “Manual scaling” (p. 22).
• Reduce the message retention period or log size. For information on how to do that, see the section
called “Adjust data retention parameters” (p. 140).
• Delete unused topics.

For information on how to set up and use alarms, see Using Amazon CloudWatch Alarms. For a full list of
Amazon MSK metrics, see Monitoring a cluster (p. 107).
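
A minimal sketch of such an alarm with the Amazon CLI follows. The alarm name, cluster name, and
broker ID are placeholders, and you would typically create one alarm per broker.

aws cloudwatch put-metric-alarm \
    --alarm-name "msk-broker-1-disk-over-85" \
    --namespace "AWS/Kafka" \
    --metric-name KafkaDataLogsDiskUsed \
    --dimensions Name="Cluster Name",Value="my-cluster" Name="Broker ID",Value="1" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 3 \
    --threshold 85 \
    --comparison-operator GreaterThanOrEqualToThreshold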

Adjust data retention parameters


Consuming messages doesn't remove them from the log. To free up disk space regularly, you can
explicitly specify a retention time period, which is how long messages stay in the log. You can also
specify a retention log size. When either the retention time period or the retention log size are reached,
Apache Kafka starts removing inactive segments from the log.

To specify a retention policy at the cluster level, set one or more of the following
parameters: log.retention.hours, log.retention.minutes, log.retention.ms, or
log.retention.bytes. For more information, see the section called “Custom configurations” (p. 34).

You can also specify retention parameters at the topic level:

• To specify a retention time period per topic, use the following command.


kafka-configs.sh --zookeeper ZooKeeperConnectionString --alter --entity-type topics --entity-name TopicName --add-config retention.ms=DesiredRetentionTimePeriod

• To specify a retention log size per topic, use the following command.

kafka-configs.sh --zookeeper ZooKeeperConnectionString --alter --entity-type topics --entity-name TopicName --add-config retention.bytes=DesiredRetentionLogSize

The retention parameters that you specify at the topic level take precedence over cluster-level
parameters.

Monitor Apache Kafka memory


We recommend that you monitor the memory that Apache Kafka uses. Otherwise, the cluster may
become unavailable.

To determine how much memory Apache Kafka uses, you can monitor the HeapMemoryAfterGC metric.
HeapMemoryAfterGC is the percentage of total heap memory that is in use after garbage collection. We
recommend that you create a CloudWatch alarm that takes action when HeapMemoryAfterGC increases
above 60%.

The steps that you can take to decrease memory usage vary. They depend on the way that you
configure Apache Kafka. For example, if you use transactional message delivery, you can decrease the
transactional.id.expiration.ms value in your Apache Kafka configuration from 604800000 ms
to 86400000 ms (from 7 days to 1 day). This decreases the memory footprint of each transaction.
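
As a sketch, you could apply that change through a custom MSK configuration. The configuration name
is a placeholder, and you would still need to apply the new configuration to the cluster (for example,
with the update-cluster-configuration command) for it to take effect.

echo "transactional.id.expiration.ms=86400000" > config.txt

aws kafka create-configuration \
    --name "shorter-transaction-expiration" \
    --server-properties fileb://config.txt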

Don't add non-MSK brokers


If you use Apache ZooKeeper commands to add brokers, these brokers don't get added to your MSK
cluster, and your Apache ZooKeeper will contain incorrect information about the cluster. This might
result in data loss. For supported cluster operations, see How it works (p. 10).

Enable in-transit encryption


For information about encryption in transit and how to enable it, see the section called “Encryption in
transit” (p. 58).

Reassign partitions
To move partitions to different brokers on the same cluster, you can use the partition reassignment
tool named kafka-reassign-partitions.sh. For example, after you add new brokers to expand
a cluster, you can rebalance that cluster by reassigning partitions to the new brokers. For information
about how to add brokers to a cluster, see the section called “Expanding a cluster” (p. 26). For
information about the partition reassignment tool, see Expanding your cluster in the Apache Kafka
documentation.


Amazon glossary
For the latest Amazon terminology, see the Amazon glossary in the Amazon General Reference.
