Cloud (AWS) Core Concepts
1. Operational Excellence
1.1. Infrastructure as Code (IaC)
1.1.1. IaC is the process of managing infrastructure through machine-readable configuration files
1.1.3. You can apply the same tools and processes to your infrastructure as you do to your code
1.1.4. Use services like CloudFormation and CDK to implement IaC on AWS
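As a minimal sketch of the idea above, the Python below builds a CloudFormation-style template as plain data and serializes it to JSON. The helper name `build_template` and the logical ID `ExampleBucket` are hypothetical, chosen only for illustration; `AWS::S3::Bucket` and `VersioningConfiguration` are real CloudFormation identifiers. Because the result is just text, it can be reviewed, versioned, and tested like any other code.

```python
import json


def build_template(enable_versioning: bool = True) -> dict:
    """Return a CloudFormation-style template describing one S3 bucket."""
    bucket_properties = {}
    if enable_versioning:
        bucket_properties["VersioningConfiguration"] = {"Status": "Enabled"}

    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative template with a single S3 bucket",
        "Resources": {
            # "ExampleBucket" is a hypothetical logical ID, not from the source.
            "ExampleBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": bucket_properties,
            }
        },
    }


# Serialize to the machine-readable form that would be checked into source control.
template_json = json.dumps(build_template(), indent=2, sort_keys=True)
print(template_json)
```

The same function can be called from a unit test, which is one way to apply ordinary code tooling to infrastructure definitions.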
1.2. Observability
1.2.1. Observability is the process of measuring the internal state of your system to achieve some
desired end state
1.2.3. You can collect metrics at the service, application, and account level
1.2.4. You can analyze metrics through services like CloudWatch Logs Insights, Athena, Elasticsearch Service, RDS, and Redshift
1.2.5. You can act on your metrics by creating monitors, alarms, and dashboards, and by tracking performance and business KPIs
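To make "acting on metrics" concrete, here is a simplified sketch of how a threshold alarm might be evaluated over collected datapoints. It is an illustration under stated assumptions, not CloudWatch's actual evaluation logic: the function `alarm_state` and the sample values are hypothetical, and only the state names (`OK`, `ALARM`, `INSUFFICIENT_DATA`) are borrowed from CloudWatch.

```python
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float, periods: int) -> str:
    """Return "ALARM" when the most recent `periods` datapoints all exceed `threshold`."""
    if periods < 1:
        raise ValueError("periods must be at least 1")
    if len(datapoints) < periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-periods:]
    return "ALARM" if all(value > threshold for value in recent) else "OK"


# Hypothetical CPU utilization samples (percent), one per minute.
cpu_percent = [42.0, 55.0, 83.0, 91.0, 88.0]
print(alarm_state(cpu_percent, threshold=80.0, periods=3))  # ALARM
```

Requiring several consecutive breaches before alarming is a common way to avoid reacting to a single noisy sample.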
1.3. Mental Model: When thinking about operational excellence in the cloud, it is useful to think of it in terms of automation.
2. Security
2.1. 1. Identity and Access Management (IAM)
2.1.1. IAM policies declare the access boundaries for entities within AWS
2.1.3. IAM policies can be used to enforce the principle of least privilege
2.1.4. IAM has many policy types - identity-based and resource-based are two examples
2.1.5. IAM decides whether to grant access by evaluating all policy types that apply to a given resource
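The evaluation described above can be sketched in a few lines. This is a deliberately simplified model, assuming a single account and ignoring conditions, principals, permission boundaries, and other policy types: statements from all applicable policies are pooled, an explicit `Deny` always wins, and access is denied by default when nothing allows it. The function `is_allowed` and the bucket ARNs are hypothetical, and shell-style wildcard matching stands in for IAM's own matching rules.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_allowed(action: str, resource: str, statements: Iterable[dict]) -> bool:
    """Simplified evaluation: explicit Deny wins; otherwise at least one Allow is required."""
    allowed = False
    for stmt in statements:
        matches = any(fnmatchcase(action, p) for p in stmt["Action"]) and any(
            fnmatchcase(resource, p) for p in stmt["Resource"]
        )
        if not matches:
            continue
        if stmt["Effect"] == "Deny":
            return False  # an explicit deny overrides any allow
        allowed = True
    return allowed  # False here means an implicit (default) deny


# Hypothetical identity-based policy attached to a user or role.
identity_policy = [
    {"Effect": "Allow", "Action": ["s3:Get*"], "Resource": ["arn:aws:s3:::example-bucket/*"]},
]
# Hypothetical resource-based policy attached to the bucket.
resource_policy = [
    {"Effect": "Deny", "Action": ["s3:*"], "Resource": ["arn:aws:s3:::example-bucket/secret/*"]},
]
all_statements = identity_policy + resource_policy

print(is_allowed("s3:GetObject", "arn:aws:s3:::example-bucket/report.csv", all_statements))      # True
print(is_allowed("s3:GetObject", "arn:aws:s3:::example-bucket/secret/key.pem", all_statements))  # False
print(is_allowed("s3:PutObject", "arn:aws:s3:::example-bucket/report.csv", all_statements))      # False
```

The third call shows least privilege in practice: nothing grants `s3:PutObject`, so the request is denied without any explicit rule.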
2.2. 2. Network Security
2.2.1. Network security involves mechanisms designed to safeguard the access and usability of the network and network-accessible resources
2.2.2. A zero trust approach to network security involves implementing defense in depth at all layers of your network
2.2.3. VPCs and WAFs allow you to apply security measures at the network level
2.2.4. Security groups allow you to apply security measures at the resource level
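As an illustration of resource-level filtering, the sketch below models security-group-style ingress rules: traffic is admitted only if some rule allows it, and everything else is dropped. The `IngressRule` class, the `ingress_allowed` function, and the addresses are hypothetical; real security groups are also stateful and support more rule sources than CIDR ranges.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class IngressRule:
    protocol: str
    from_port: int
    to_port: int
    cidr: str


def ingress_allowed(rules: Iterable[IngressRule], protocol: str, port: int, source_ip: str) -> bool:
    """Return True if any rule allows this traffic; with no match, traffic is denied."""
    source = ipaddress.ip_address(source_ip)
    for rule in rules:
        if (
            rule.protocol == protocol
            and rule.from_port <= port <= rule.to_port
            and source in ipaddress.ip_network(rule.cidr)
        ):
            return True
    return False


# Hypothetical rules: HTTPS from anywhere, SSH only from inside a private range.
rules = [
    IngressRule("tcp", 443, 443, "0.0.0.0/0"),
    IngressRule("tcp", 22, 22, "10.0.0.0/16"),
]

print(ingress_allowed(rules, "tcp", 443, "203.0.113.7"))  # True
print(ingress_allowed(rules, "tcp", 22, "203.0.113.7"))   # False
print(ingress_allowed(rules, "tcp", 22, "10.0.4.12"))     # True
```

Restricting SSH to an internal range while leaving HTTPS open is one small example of layering controls rather than relying on a single perimeter.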
2.3. 3. Data Encryption
2.3.1. Encryption is the process of encoding information in such a way that only parties with the correct key can decipher the information
2.3.2. Secure data by encrypting it in transit and at rest
2.3.3. All storage and database services on AWS provide encryption at rest and in transit
2.3.4. You can use AWS networking services like the ALB to enforce encryption in transit for your own services
2.3.5. You can use a customer master key (CMK) to unlock advanced functionality like creating audit trails, using your own custom keys, and automatic key rotation
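One way to see what "enforcing encryption in transit" can look like is a policy that rejects unencrypted requests. The text above names the ALB; the sketch below swaps in a different, commonly used mechanism, an S3 bucket policy keyed on the `aws:SecureTransport` condition, because it fits in a few lines. The function name and the bucket name `example-bucket` are hypothetical.

```python
import json


def deny_insecure_transport_policy(bucket_name: str) -> dict:
    """Build a bucket policy that denies any request not made over TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                # aws:SecureTransport is "false" for plain-HTTP requests.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


policy = deny_insecure_transport_policy("example-bucket")
print(json.dumps(policy, indent=2))
```

Because the statement is an explicit deny, it applies regardless of what other policies allow.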
2.4. Mental Model: When thinking about security in the cloud, it is useful to adopt the model of zero trust. In this model, all application components and services are considered discrete and potentially malicious entities. This includes the underlying network fabric, all agents that have access to your resources, and the software that runs inside your service.
3. Cost Optimization
3.1. 1. Pay For Use
3.1.1. AWS services are pay for use - you are charged for the capacity that you use
3.1.2. You can right-size your instances to save money where provisioned capacity doesn’t match your workload
3.1.3. You can use serverless technologies to ensure you only pay when customers use your service
3.1.4. You can use reservations to get discounts in exchange for an upfront commitment
3.1.5. You can use Spot Instances to get discounts when running fault-tolerant workloads
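The trade-off between paying per use and committing upfront comes down to simple arithmetic. The sketch below compares the two under made-up hourly rates (not real AWS prices): on-demand is billed only for hours used, while a reservation is billed for every hour in the period. All function names and numbers are assumptions for illustration.

```python
HOURS_PER_MONTH = 730  # common approximation of hours in a month


def on_demand_cost(hourly_rate: float, hours_used: float) -> float:
    """Pay only for the hours actually used."""
    return hourly_rate * hours_used


def reserved_cost(effective_hourly_rate: float, hours_in_period: float = HOURS_PER_MONTH) -> float:
    """Pay the discounted rate for every hour in the period, used or not."""
    return effective_hourly_rate * hours_in_period


def break_even_utilization(on_demand_rate: float, reserved_rate: float) -> float:
    """Fraction of the period above which the reservation becomes cheaper."""
    return reserved_rate / on_demand_rate


# Hypothetical rates in dollars per hour.
ON_DEMAND_RATE = 0.10
RESERVED_RATE = 0.06

print(f"break-even utilization: {break_even_utilization(ON_DEMAND_RATE, RESERVED_RATE):.0%}")
for hours in (200, 730):
    print(hours, round(on_demand_cost(ON_DEMAND_RATE, hours), 2), round(reserved_cost(RESERVED_RATE), 2))
```

Under these assumed rates, a workload that runs only part of the month stays cheaper on demand, while an always-on workload favors the reservation.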
3.2. 2. Cost Optimization Lifecycle
3.2.1. The cost optimization lifecycle is a continuous process to improve your cloud spend over time
3.2.2. The cost optimization lifecycle consists of reviewing, tracking, and optimizing your spend
3.2.3. Reviewing your spend involves using tools like Cost Explorer and the Cost and Usage Report to understand your spend
3.2.4. Tracking your spend involves the use of cost allocation tags and budgets to filter the data along
dimensions relevant to your business
3.2.5. Optimizing your spend involves using techniques from the previous section as part of an
overarching budget goal
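The tracking step can be sketched as grouping billing line items by a cost allocation tag and comparing each group against a budget. The line items, tag key `team`, and budget figures below are hypothetical, and the record shape is an assumption rather than the actual Cost and Usage Report format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def spend_by_tag(line_items: Iterable[dict], tag_key: str) -> Dict[str, float]:
    """Sum cost per value of the given tag; items without the tag go to "untagged"."""
    totals: Dict[str, float] = defaultdict(float)
    for item in line_items:
        tag_value = item.get("tags", {}).get(tag_key, "untagged")
        totals[tag_value] += item["cost"]
    return dict(totals)


def over_budget(totals: Dict[str, float], budgets: Dict[str, float]) -> List[str]:
    """Return the tag values whose spend exceeds their budget (no budget means no limit)."""
    return sorted(name for name, spent in totals.items() if spent > budgets.get(name, float("inf")))


# Hypothetical monthly line items.
line_items = [
    {"service": "EC2", "cost": 120.0, "tags": {"team": "checkout"}},
    {"service": "RDS", "cost": 45.5, "tags": {"team": "checkout"}},
    {"service": "EC2", "cost": 80.0, "tags": {"team": "search"}},
    {"service": "S3", "cost": 10.0, "tags": {}},
]
budgets = {"checkout": 150.0, "search": 100.0}

totals = spend_by_tag(line_items, "team")
print(totals)
print(over_budget(totals, budgets))  # ['checkout']
```

The "untagged" bucket is worth watching in its own right, since untagged spend cannot be attributed to any business dimension.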
3.3. Mental Model: When thinking about cost optimization in the cloud, it is useful to think of cloud spend in terms of OpEx instead of CapEx. OpEx is an ongoing pay-as-you-go model whereas CapEx is a one-time purchase model.
4. Performance Efficiency
4.1. 1. Selection
4.1.1. Implementing a workload on AWS involves selecting services across the compute, storage,
database and network categories
4.1.2. Within each category, you can select the right type of service based on your use case
4.1.3. Within each type, you can select the specific service based on your desired degree of
management
4.1.4. Within each service, you can select the specific configuration based on the specific
performance characteristics you want to achieve
4.2. 2. Scaling
4.2.1. Scaling vertically is simpler operationally but represents an availability risk and has lower limits
4.2.2. Scaling horizontally requires more overhead but comes with much better reliability and much higher limits
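The availability difference between the two approaches can be shown with a small calculation. The sketch below assumes identical instances with load spread evenly across them; the function names and the load figures are hypothetical.

```python
import math


def capacity_lost_on_single_failure(instance_count: int) -> float:
    """Fraction of total capacity lost when one of N identical instances fails."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return 1.0 / instance_count


def instances_needed(peak_load: float, capacity_per_instance: float, tolerated_failures: int = 1) -> int:
    """Instances required to serve peak load even after `tolerated_failures` instances fail."""
    return math.ceil(peak_load / capacity_per_instance) + tolerated_failures


print(capacity_lost_on_single_failure(1))  # 1.0  (one large instance: a failure takes everything down)
print(capacity_lost_on_single_failure(4))  # 0.25 (four smaller instances: a failure removes a quarter)

# Hypothetical workload: 1000 requests/s at peak, 300 requests/s per instance.
print(instances_needed(1000, 300))  # 5
```

The extra instance in the second function is the overhead that horizontal scaling trades for resilience.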
4.3. Mental Model: When thinking about performance efficiency in the cloud, it is useful to think of your services as cattle, not pets: interchangeable units that you replace when they fail rather than nurse back to health.
5. Reliability
You can think of blast radius as the maximum impact that might be sustained in the event of a system failure.
5.1. 1. Fault Isolation
5.1.1. Use fault isolation zones to limit the blast radius of service or infrastructure disruptions
5.1.2. Fault isolation at the resource and request level is built into the design of every AWS service -
this requires no additional actions on your part
5.1.3. Fault isolation at the AZ level is achieved by deploying your services across multiple AZs - this
can be done with minimal latency impact
5.1.4. Fault isolation at the region level is achieved by deploying your services across multiple regions
- this requires significant operational overhead
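A rough way to reason about the benefit of spreading a service across isolation zones is to estimate the chance that every zone is down at once. The sketch below assumes zone failures are independent and that any single surviving zone can carry the full load; both are simplifications, and the 99% per-zone figure is a made-up input, not an AWS number.

```python
def combined_availability(zone_availability: float, zones: int) -> float:
    """Availability of a service that stays up as long as at least one zone is up."""
    if zones < 1:
        raise ValueError("zones must be at least 1")
    probability_all_down = (1.0 - zone_availability) ** zones
    return 1.0 - probability_all_down


# Hypothetical per-zone availability of 99%.
for zones in (1, 2, 3):
    print(zones, f"{combined_availability(0.99, zones):.6f}")
```

Each added zone shrinks the probability of a total outage multiplicatively, which is why multi-AZ deployment is usually the first step before taking on the overhead of multiple regions.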
5.2. 2. Limits
5.2.1. Limits are constraints that can be applied to protect a service from excessive load
5.2.2. AWS service limits can be tracked and managed using the Service Quotas service
5.2.3. There are soft limits, which can be increased, and hard limits, which cannot
5.2.4. Monitor limits for services that you are using and plan your limit increases accordingly to avoid
service disruption
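Monitoring limits can be as simple as comparing current usage with each limit and flagging anything close to the ceiling. The sketch below uses hypothetical quota names and numbers held in plain dictionaries; in practice the values would come from a source such as Service Quotas, which this example does not call.

```python
from typing import Dict, List, Tuple


def quotas_needing_attention(
    usage: Dict[str, float], limits: Dict[str, float], warn_ratio: float = 0.8
) -> List[Tuple[str, float]]:
    """Return (quota name, utilization) pairs at or above `warn_ratio`, highest first."""
    flagged = []
    for name, limit in limits.items():
        ratio = usage.get(name, 0) / limit
        if ratio >= warn_ratio:
            flagged.append((name, ratio))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Hypothetical limits and current usage.
limits = {"running-instances": 20, "elastic-ips": 5, "vpcs": 5}
usage = {"running-instances": 18, "elastic-ips": 2, "vpcs": 4}

for name, ratio in quotas_needing_attention(usage, limits):
    print(f"{name}: {ratio:.0%} of limit used")
```

Flagging at a threshold below 100% leaves time to request an increase for a soft limit before the workload is affected.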
5.3. Mental Model: When thinking about reliability in the cloud, it is useful to think in terms of blast radius. To build reliable systems, you want to minimize the blast radius of any individual component.