GCP-Cloud Engineer
A Google Cloud project is an organizing entity for your Google Cloud resources.
---------- Colocation (physical servers)--------> virtualized data centre (VMs) ----------> container based architecture
Quotas:
Rate quota --> resets after a specific time
Allocation quota --> governs number of resources
Project:
Project ID, Project number - assigned by Google, globally unique, immutable
Project name - chosen by customer, mutable
Folder:
Folder can contain other folders, projects
Organization Node:
Org. policy administrator, Project Creator
Cloud SDK: all the tools below are under the bin directory
gsutil --> command line tool for Cloud Storage
gcloud --> command line tool for Google Cloud products/services
bq --> command line tool for BigQuery
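For reference, typical invocations of each tool (project, zone, bucket, and dataset names here are placeholders):

```shell
# Set defaults so you don't repeat --project/--zone on every command
gcloud config set project my-project-id
gcloud config set compute/zone us-central1-a
gcloud config list

gsutil ls gs://my-bucket   # list objects in a Cloud Storage bucket
bq ls                      # list BigQuery datasets in the default project
```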
Cloud Shell :
Command line access from browser
Debian based VM with 5 GB persistent home dir
[Link] -gcp-a-decision-tree
Compute Engine:
Billed per second, 1 minute minimum
Sustained use discounts / committed use discounts
Shielded VMs are hardened virtual machines that use Secure Boot, virtual Trusted Platform Module (vTPM)-enabled Measured Boot, and integrity monitoring.
Normally the boot disk defaults to being deleted automatically when the instance is
deleted. But sometimes you will want to override this behaviour. This feature is very
important because you cannot create an image from a boot disk when it is attached to a
running instance.
So you would need to disable Delete boot disk when instance is deleted to enable
creating a system image from the boot disk.
create VM (enable the "keep the disk" option) --> install Apache web server and configure it to start after VM boot/restart --> reset (stop and restart) the VM and verify the web server is up --> then delete the VM --> verify the disk is still available --> create an image from the disk --> delete the disk
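The flow above can be sketched with gcloud; instance, disk, and image names here are placeholders:

```shell
# Keep the boot disk when deleting the instance (equivalent to disabling
# "Delete boot disk when instance is deleted" in the console)
gcloud compute instances delete webserver --zone=us-central1-a --keep-disks=boot

# Create an image from the now-detached boot disk, then delete the disk
gcloud compute images create webserver-image \
    --source-disk=webserver --source-disk-zone=us-central1-a
gcloud compute disks delete webserver --zone=us-central1-a
```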
An instance template is an API resource that you can use to create VM instances and managed instance groups. Instance templates define the machine type, boot disk
image, subnet, labels, and other instance properties.
Managed instance group health checks proactively signal to delete and recreate instances that become unhealthy.
Autoscaling signals: CPU utilization, HTTP load balancing capacity, Cloud Pub/Sub queue, Cloud Monitoring metric
BigQuery data transfer:
Cloud Storage to BigQuery data transfer, free service, done using a single command
Cloud SQL:
To connect your web host (VM instance hosting your web app) to Cloud SQL, add a network (the IP or set of IPs of your host) in the Connections tab.
Try to put the web app host and the SQL instance in the same region and zone to achieve best performance.
Cloud Spanner:
SQL relational DBMS with joins and secondary indexes, HA, strong global consistency, DB size > 2 TB, high number of IOPS
Bigtable is based on the HBase API.
Firestore:
Automatic multi-region data replication, strong consistency guarantee, atomic batch operations, real
transaction support
Bigtable:
Powers google maps, gmail, google search, google analytics
In Google Cloud, load balancers can be proxied or pass-through. Proxied load balancers terminate
connections and proxy them to new connections internally. Pass-through load balancers pass the
connections directly to the backends.
3 types of roles: basic (Owner/Editor/Viewer), predefined, custom
Cloud Identity:
Organizations can define policies and manage their users and groups using Google Cloud Console
• Folders are used to group resources that share common IAM policies.
• Service accounts are specific to a set of operating requirements within a project.
• Permissions are associated with roles but not directly with folders.
• IAM roles are granted to identities, not folders.
VPC combines scalability of public cloud with privacy of private cloud (private data centre)
Google VPC networks are global and can have subnets in any google cloud region worldwide
VPC creation in auto mode -- subnets are created in each region automatically
The implied deny-all-ingress and allow-all-egress rules have the lowest priority (65535; a higher integer means lower priority)
Routes -
map an IP range to a destination
are created when a subnet is created
have destination mentioned in CIDR notation
must match with firewall rules to deliver the traffic
Every VPC network has two implied firewall rules that block all
incoming connections and allow all outgoing connections.
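A sketch of overriding the implied rules with an explicit one (network and rule names are placeholders):

```shell
# Allow inbound SSH; priority 1000 overrides the implied deny-all-ingress
# rule, which sits at the lowest priority (65535)
gcloud compute firewall-rules create allow-ssh \
    --network=mynetwork --direction=INGRESS --action=ALLOW \
    --rules=tcp:22 --source-ranges=0.0.0.0/0 --priority=1000
```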
You can SSH into a VM instance that does not have external IP address using IAP
(identity aware proxy) tunnel.
When instances do not have external IP addresses, they can only be reached by
other instances on the network via a managed VPN gateway or via a Cloud IAP
tunnel. Cloud IAP enables context-aware access to VMs via SSH and RDP without
bastion hosts. To learn more about this, see the blog post Cloud IAP enables context-
aware access to VMs via SSH and RDP without bastion hosts .
IAP uses your existing project roles and permissions when you connect to VM
instances. By default, instance owners are the only users that have the IAP Secured
Tunnel User role.
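A minimal sketch of connecting through IAP (VM name and zone are placeholders):

```shell
# SSH to a VM that has no external IP via an IAP TCP-forwarding tunnel
gcloud compute ssh my-internal-vm --zone=us-central1-a --tunnel-through-iap

# The connecting user needs the IAP-Secured Tunnel User role
# (roles/iap.tunnelResourceAccessor) on the instance or project
```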
The Cloud NAT gateway implements outbound NAT, but not inbound
NAT. In other words, hosts outside of your VPC network can only
respond to connections initiated by your instances; they cannot initiate
their own, new connections to your instances via NAT.
Constraints are the standard way to restrict where resources can be created; applying policies with constraints will enforce those constraints for all resources in the organization. If the policy were applied at the folder level, it would have to be applied for all folders, and that is not as efficient as applying it at the organization level.
Types of logs -- Cloud Audit Logs maintains three audit logs:
Admin Activity logs
Data Access logs
System Event logs
Four golden signals: Latency, Traffic, Saturation, Errors
Can create custom metrics (in addition to what GCP already provides):
Either use cloud monitoring API
OR
Use OpenCensus
A few resources generate a lot of logs, e.g. the logging agent on a VM instance or a cloud load balancer. We can exclude those from our logs view.
Common logs that we can exclude are:
Load balancer (e.g. 90% of entries)
VPC flow logs (filter by sampling percentage or CIDR range)
HTTP 200 OK messages
Google's free public DNS service - [Link]
Cloud Identity provides domain verification records, which are added to DNS settings for the domain.
Cloud DNS - 100% uptime SLA
Google Cloud alias IP ranges let you assign ranges of internal IP addresses as aliases to a
virtual machine's (VM) network interfaces. This is useful if you have multiple services
running on a VM and you want to assign each service a different IP address. Alias IP
ranges also work with GKE Pods.
If you have only one service running on a VM, you can reference it by using the
interface's primary IP address. If you have multiple services running on a VM, you might
want to assign each one a different internal IP address.
Rate quotas: resets at regular intervals, e.g. 1000 API calls per 100 seconds
Allocation quotas: e.g. 5 VPC networks per project
Google Cloud HTTP(S) load balancing is implemented at the edge of Google's network in Google's
points of presence (POP) around the world. User traffic directed to an HTTP(S) load balancer enters
the POP closest to the user and is then load-balanced over Google's global network to the closest
backend that has sufficient available capacity.
Security:
Migrate for Anthos: migrates existing VM-based workloads into containers running on GKE
Migrate for Compute Engine: brings your applications running on-prem or in a non-GCP cloud into VMs on Google Cloud
App Engine Standard:
Persistent storage with queries, sorting and transactions
Auto scaling and load balancing
Async task queues for performing work outside the scope of a request
Scheduled tasks for triggering events at specified times or regular intervals
Integration with other GCP services and API
Apigee Edge
Cloud Endpoints
The gcloud command often requires you to specify values such as a Region, Zone, or Project ID. Entering them
repeatedly increases the chance of making typing errors. If you use Cloud Shell frequently, you may want to set common
values in environment variables and use them instead of typing the actual values.
You can use environment variables like this in gcloud commands to reduce the opportunities for typos and so that you
won't have to remember a lot of detailed information
INFRACLASS_REGION=asia-east1
Add env variables (project id, name, region, zone etc.) in a config file
and add below command to .profile (bash profile) so these variables are
always loaded when you open up cloud shell:
source some_dir_name/config
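A minimal sketch of that setup (the file path, variable names, and values are illustrative):

```shell
# Write the config file with the values you use most often
mkdir -p "$HOME/infraclass"
cat > "$HOME/infraclass/config" <<'EOF'
INFRACLASS_REGION=us-central1
INFRACLASS_PROJECT_ID=my-project-id
EOF

# Add this line to ~/.profile once, so every new Cloud Shell session loads it:
#   source "$HOME/infraclass/config"

# Then use the variables instead of typing literal values:
. "$HOME/infraclass/config"
echo "$INFRACLASS_REGION"
# e.g. gcloud compute instances create my-vm --zone=${INFRACLASS_REGION}-a
```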
create a subnet
gcloud compute networks subnets create managementsubnet-us \
    --project=qwiklabs-gcp-03-9f64a1c6b0f0 --range=[Link]/20 \
    --stack-type=IPV4_ONLY --network=managementnet --region=us-central1
create a VM
gcloud compute instances create privatenet-us-vm --zone=us-central1-c \
    --machine-type=f1-micro --subnet=privatesubnet-us \
    --image-family=debian-10 --image-project=debian-cloud \
    --boot-disk-size=10GB --boot-disk-type=pd-standard \
    --boot-disk-device-name=privatenet-us-vm
Cloud VPN: connects your on-prem network to your GCP VPC network over the public internet using an IPSec tunnel
Supports both static and dynamic routes (need to configure cloud router which uses BGP)
Features:
Regional resource
Good for low volume data connections
Traffic encrypted by one VPN gateway and decrypted by the other VPN gateway
99.9% SLA
Supports: site-to-site VPN, static routes, dynamic routes (cloud router), IKEv1/IKEv2 ciphers
Check out Trifacta (Cloud Dataprep is built on it)
You can see the VMs created for the cluster (master and workers) and SSH into the master
Regional internal load balancers use Andromeda (Google's SDN network virtualization stack)
Regional network load balancers use Maglev (Google's NLB, large distributed software that
runs on commodity hardware)
Regional managed instance groups are preferred over zonal managed instance groups: your instances are not restricted to a single zone, and you do not need to manage multiple zonal instance groups.
Load balancers:
URL maps -- some URLs are mapped to a set of instances and some others to another set of
instances
Terraform:
Infrastructure automation tool
Repeatable deployment process
Focus on the application
Parallel deployment
Template-driven
create multiple VMs with the count meta-argument
Dependency graph
Implicit vs Explicit dependency
Declarative vs Imperative approach to infrastructure:
Imperative: "Give me 5 servers" (may lead to repeated creation of 5 servers)
Declarative: "I should have 5 servers" (always compares desired vs current state)
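A sketch of the count meta-argument (resource name, machine type, and image are illustrative; Terraform itself is not run here, the config is just written out):

```shell
# Write a minimal Terraform config: count creates N copies of a resource,
# and Terraform reconciles desired vs current state on every apply
cat > main.tf <<'EOF'
resource "google_compute_instance" "web" {
  count        = 3                     # declarative: "I should have 3 servers"
  name         = "web-${count.index}"  # web-0, web-1, web-2
  machine_type = "e2-micro"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }

  network_interface {
    network = "default"
  }
}
EOF
# terraform init && terraform plan   # the plan shows 3 instances to add
```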
K8S:
K8S 'watch loop' to bring the system to its desired state and maintain it there
If a pod contains more than 1 container, they are tightly coupled and share networking and storage space within the pod.
Each pod has a unique IP assigned.
Pods do not auto-heal. So, it is better to use controller objects (Deployment, StatefulSet, DaemonSet, Job)
e.g. you want 3 NGINX servers up and running all the time, instead of creating 3 pods, create a Deployment which
creates a ReplicaSet object which manages the desired state of 3 running NGINX pods.
You will work with Deployment objects directly much more often than ReplicaSet objects. But it's still helpful to
know about ReplicaSets, so that you can better understand how Deployments work. For example, one capability of
a Deployment is to allow a rolling upgrade of the Pods it manages. To perform the upgrade, the Deployment object
will create a second ReplicaSet object, and then increase the number of (upgraded) Pods in the second ReplicaSet
while it decreases the number in the first ReplicaSet.
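The rolling-upgrade mechanics above, sketched with kubectl (deployment and image names are placeholders):

```shell
# Trigger a rolling upgrade: the Deployment creates a second ReplicaSet
# for the new image and shifts Pods from the old ReplicaSet to the new one
kubectl set image deployment/nginx nginx=nginx:1.25
kubectl rollout status deployment/nginx
kubectl get replicasets   # old ReplicaSet scaled down, new one holds the Pods
```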
Deployment: ensures that a defined set of pods is running at any given time
Controller (e.g. Deployment Controller) is a k8s loop process that makes sure the observed
state of the cluster matches the desired state of the cluster (by creating/deploying necessary
pods, for example.)
Whenever a pod is added to a node, an emptyDir is created. It is stored in a local volume of the node, so even if a container crashes, the emptyDir is safe. But if a pod is removed, data stored in emptyDir is gone.
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: LoadBalancer
  sessionAffinity: ClientIP
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 60000
    targetPort: 80
A pod has an IP address and all containers within it share it. E.g. a legacy app is running in a container (port 8000) and uses an nginx reverse proxy in another container in the same pod. nginx forwards inbound requests to [Link]:8000
PersistentVolumes are storage that is available to a Kubernetes cluster. PersistentVolumeClaims enable Pods to access PersistentVolumes.
Without PersistentVolumeClaims Pods are mostly ephemeral, so you should use PersistentVolumeClaims for any data that you expect to
survive Pod scaling, updating, or migrating.
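A minimal sketch of the pattern (claim name, size, and mount path are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - mountPath: /var/data   # survives container restarts and rescheduling
      name: data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-claim
```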
GKE clusters are not exposed to the internet by default, but they can be reached from authorized networks through an external IP address or by other GCP services (e.g. logging, monitoring)
Common business metrics: ROI, earnings before interest and tax (EBIT), employee turnover, customer churn
Common software metrics: Pageviews, User registration, Click-throughs, Checkouts
Error budget -- SLOs imply a certain acceptable level of unreliability. This is a budget that can be allocated.
an error budget is the amount of error that your service can accumulate over a certain period of time before your users start being unhappy. You can think of it as the pain
tolerance for your users, but applied to a certain dimension of your service: availability, latency, and so forth.
example:
Choose SLI specification from the above menu, for example.
Availability -- the profile page should load successfully.
Latency -- the profile page should load quickly.
Here's an example. Imagine that you're measuring the availability of your home page. Availability is measured as the number of successful requests divided by all the valid requests the home page receives, expressed as a percentage. If you decide that the availability objective is 99.9%, the error budget is 0.1%. You can serve up to 0.1% of requests as errors (preferably a bit less than 0.1%), and users will happily continue using the service.
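The arithmetic above, as a quick calculation of how much downtime a 99.9% availability SLO allows (assuming a 30-day window):

```shell
# Error budget for a 99.9% availability SLO, expressed as minutes of
# full downtime over a 30-day month
awk 'BEGIN {
  slo     = 0.999
  minutes = 30 * 24 * 60        # 43200 minutes in the window
  printf "%.1f\n", (1 - slo) * minutes
}'
# prints 43.2
```

So a 99.9% monthly SLO leaves roughly 43 minutes of allowable full downtime per month.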
gcloud compute firewall-rules update <<name of firewall rule>> --enable-logging (OR --no-enable-logging)
GCE, GKE, and external systems need the Cloud Debugger Agent role in order to use the debugger
Also need access to the source code location
Cloud Trace:
Each trace is a collection of spans
A span wraps metrics about an application unit of work: a context, timing, and other metrics