
### Amazon Kinesis Data Streams and Firehose: A Comprehensive Guide
#### 1. Monitoring and Scaling with Amazon Kinesis Data Streams
When designing data pipelines, ensuring reliability and scalability at each stage is crucial. As data volume or velocity spikes, the system should adapt to maintain the data flow. For instance, the Kinesis Data Stream Scaling Utility can adjust the shard count based on changes in data volume and velocity.
##### 1.1 CloudWatch Metrics for Amazon Kinesis Data Streams
Amazon Kinesis Data Streams (KDS) and Amazon CloudWatch are closely integrated. With minimal effort, you can collect, view, and analyze metrics for data streams, producers, and consumers using CloudWatch. Stream-level metrics are enabled by default upon stream creation.
| Metric | Description |
| ---- | ---- |
| IncomingBytes and OutgoingBytes | Helps determine the correct number of shards in the stream |
| WriteProvisionedThroughputExceeded and ReadProvisionedThroughputExceeded | Monitors if producers and consumers exceed the stream's capacity |
| MillisBehindLatest | Indicates how far the consumer's GetRecords responses lag behind the head (most recent record) of the stream |
Stream metrics are automatically collected and sent to CloudWatch every minute. Default metrics have no additional cost, but enhanced metrics do. CloudWatch can monitor various stream metrics such as record throughput, consumer latency, and failures. These metrics can trigger dynamic scaling processes.
Here are some of the key metrics recorded in CloudWatch:
- **PutRecord.Bytes**: Total bytes put into the Amazon Kinesis stream over a specified time.
- **PutRecord.Latency**: Monitors the performance of the PutRecord operation over a specified time.
- **PutRecord.Success**: Counts the successful PutRecord operations over a specified time.
- **WriteProvisionedThroughputExceeded**: Number of records rejected due to exceeded write capacity.
- **GetRecords.IteratorAgeMilliseconds**: Monitors data processing flow performance. A value close to zero means consumers have caught up with the stream's data.
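As a sketch of how one of these metrics might be queried programmatically, the following helper builds the parameters for a CloudWatch `GetMetricStatistics` call on a stream's `IncomingBytes`. The stream name `my-stream` is a placeholder; Kinesis publishes stream-level metrics under the `AWS/Kinesis` namespace, dimensioned by `StreamName`.

```python
from datetime import datetime, timedelta, timezone

def incoming_bytes_query(stream_name, minutes=60):
    """Build GetMetricStatistics parameters for a stream's IncomingBytes.

    Kinesis sends stream metrics to CloudWatch every minute, so a
    60-second period captures each data point.
    """
    now = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Kinesis",
        "MetricName": "IncomingBytes",
        "Dimensions": [{"Name": "StreamName", "Value": stream_name}],
        "StartTime": now - timedelta(minutes=minutes),
        "EndTime": now,
        "Period": 60,            # one data point per minute
        "Statistics": ["Sum"],
    }

# With boto3, the query would be issued as:
# import boto3
# cloudwatch = boto3.client("cloudwatch")
# stats = cloudwatch.get_metric_statistics(**incoming_bytes_query("my-stream"))
```

Summing `IncomingBytes` over a window like this is one way to estimate whether the current shard count matches the actual write throughput.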
The following entities send relevant metrics to CloudWatch:
- **CloudWatch metrics**: Kinesis Data Streams sends detailed monitoring metrics for each stream and, optionally, at the shard level.
- **Kinesis Agent**: Sends custom metric data to monitor producer performance and stability.
- **API logging**: Kinesis Data Streams sends API event data to AWS CloudTrail.
- **The KCL**: The Kinesis Client Library sends custom metrics to monitor consumer performance and stability.
- **The KPL**: The Kinesis Producer Library sends custom metrics to monitor the producer application's performance and stability.
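Since these metrics can trigger scaling or alerting, a natural next step is wiring one to a CloudWatch alarm. The sketch below builds `PutMetricAlarm` parameters that fire when any writes are throttled; the SNS topic ARN is a hypothetical notification target, and the three-period evaluation window is an illustrative choice, not a recommended default.

```python
def throttle_alarm_params(stream_name, topic_arn):
    """Parameters for a CloudWatch alarm on write throttling.

    Fires when PutRecord/PutRecords calls are rejected for exceeding
    the stream's provisioned write capacity for three consecutive
    one-minute periods. topic_arn is a placeholder SNS topic.
    """
    return {
        "AlarmName": f"{stream_name}-write-throttled",
        "Namespace": "AWS/Kinesis",
        "MetricName": "WriteProvisionedThroughputExceeded",
        "Dimensions": [{"Name": "StreamName", "Value": stream_name}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 3,   # three consecutive throttled minutes
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],
    }

# Applied with boto3 as:
# boto3.client("cloudwatch").put_metric_alarm(
#     **throttle_alarm_params("my-stream", "arn:aws:sns:...:alerts"))
```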
##### 1.2 X-Ray Tracing with Amazon Kinesis Data Streams
As records flow through multiple components, tracing data from its origin to its destination is essential. Data lineage involves tracking the data's origin and flow between different data systems. AWS X-Ray provides visibility for tracing errors and monitoring performance. It can track and display data as it moves from the source to the processed destination, offering a visual map of errors with links to insights for finding root causes.
AWS X-Ray works by adding tracing markers to requests and logs. Applications can use the AWS X-Ray SDK to include custom tracing annotations for custom context data in tracing analytics.
##### 1.3 Scaling up with Amazon Kinesis Data Streams
Kinesis manages many aspects of data stream operation, including storage, security, replication, sharding, and monitoring. However, it doesn't offer out-of-the-box shard autoscaling based on data velocity. The Kinesis Scaling Utility (https://github.com/awslabs/amazon-kinesis-scaling-utils) is an open-source, Java-based tool that can automatically adjust the shard count as stream shards approach their capacity limits. It is useful for handling predictable data spikes, such as SmartCity's weekday morning and evening peak usage.
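The core scaling decision can be sketched as a simple utilization check against the per-shard write limit of 1 MB/s. This is a minimal illustration of a double/halve strategy, not the Kinesis Scaling Utility's actual algorithm, and the thresholds are illustrative.

```python
# Each shard accepts up to 1 MB/s of writes (1,000 records/s).
SHARD_WRITE_CAPACITY = 1_000_000  # bytes per second

def recommend_shard_count(current_shards, incoming_bytes_per_sec,
                          scale_up_at=0.8, scale_down_at=0.3):
    """Suggest a new shard count from observed write throughput.

    Doubles the shard count when utilization crosses scale_up_at and
    halves it when utilization drops below scale_down_at, mirroring
    the double/halve approach a scaling utility might take.
    """
    utilization = incoming_bytes_per_sec / (current_shards * SHARD_WRITE_CAPACITY)
    if utilization > scale_up_at:
        return current_shards * 2
    if utilization < scale_down_at and current_shards > 1:
        return max(1, current_shards // 2)
    return current_shards
```

In practice, the recommended count would then be applied with the Kinesis `UpdateShardCount` API (uniform scaling), which is what makes seasonal patterns like morning and evening peaks manageable without manual resharding.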
##### 1.4 Securing Amazon Kinesis Data Streams
When building data pipelines, security requirements for data and infrastructure are driven by business requirements. Key security practices include:
**Implementing least-privilege access**: Decide the necessary permissions for users and integrated services. For example, a producer may only need write access, while a consumer may only need read access. Implementing least-privilege access reduces risks such as malicious intent or accidental errors.
**Using IAM roles**: Instead of granting long-term credentials, use IAM roles for producer and consumer applications. Roles provide short-lived, automatically rotated temporary credentials that can be applied directly to EC2 instances or Lambda functions.
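To make the least-privilege split concrete, the sketch below expresses a write-only producer policy and a read-only consumer policy as IAM policy documents. The stream ARN, account ID, and region are placeholders.

```python
import json

# Placeholder ARN; substitute your account, region, and stream name.
STREAM_ARN = "arn:aws:kinesis:us-east-1:123456789012:stream/my-stream"

producer_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "ProducerWriteOnly",
        "Effect": "Allow",
        # Write-path actions only; no read or admin permissions.
        "Action": ["kinesis:PutRecord", "kinesis:PutRecords"],
        "Resource": STREAM_ARN,
    }],
}

consumer_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "ConsumerReadOnly",
        "Effect": "Allow",
        # Read-path actions only.
        "Action": ["kinesis:GetShardIterator", "kinesis:GetRecords",
                   "kinesis:DescribeStream", "kinesis:ListShards"],
        "Resource": STREAM_ARN,
    }],
}

print(json.dumps(producer_policy, indent=2))
```

Either document would typically be attached to the IAM role assumed by the producer's EC2 instance or the consumer's Lambda function, rather than to long-term user credentials.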