Script_Google Cloud Infrastructure
I’m Aman, and I’m currently a Google Cloud Platform trainer at Koenig
Solutions. I specialize in helping individuals leverage the power of cloud computing for scalable,
secure, and innovative solutions.
Before stepping into cloud training, I spent a year working in the Generative AI space. During
that time, I had the opportunity to work on advanced model training projects with models like
Gemini, where I focused on optimizing them for software code understanding and generation.
This gave me deep insight into how AI can transform the way we build and interact with
software. So, that’s it for my introduction.
[0:00–5:00] Introduction
Today, we’ll explore essential cloud infrastructure components and how they power today’s AI
and machine learning workloads. Our agenda covers:
1. Compute Services
2. TPUs (Tensor Processing Units)
3. Storage Services
4. Database Services
Finally, we have TPUs, which are custom hardware accelerators built into Google Cloud; we will
discuss them in more detail later on.
Now let’s talk about each of them in more detail, starting with the compute services:
Google Cloud offers multiple compute services tailored to different application needs:
•Cloud Run: A fully managed platform for running stateless containers that automatically scales
to zero when not needed. ‘Stateless’ means that each container instance doesn’t retain data or
state between requests, which simplifies scaling and ensures consistency. Cloud Run also
abstracts away all infrastructure management; there’s no need to provision or manage servers.
It launches instances in response to HTTP requests and scales them down when idle, which
makes it highly cost-effective for unpredictable workloads. This makes it perfect for deploying
APIs, microservices, or webhook endpoints with variable or spiky traffic.
•App Engine (PaaS): A fully managed platform that abstracts away runtime management and
provides auto-scaling and integrated services; best suited for rapid application development.
•Cloud Functions (FaaS): Event-driven, fully serverless functions triggered by HTTP, Pub/Sub,
or Cloud Storage events; great for lightweight backend processes. For example, you can use a
Cloud Function to automatically resize images uploaded to Cloud Storage, respond to API
requests from a web app, or process Pub/Sub messages that trigger downstream workflows
such as sending notifications, transforming data, or logging events. This is ideal for event-
handling logic that doesn’t require maintaining a server or complex infrastructure, as the
sketch below shows.
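To make this concrete, here is a minimal Python sketch using the open-source
functions-framework library that backs Cloud Functions (and also runs in a Cloud Run
container). The function names, the query parameter, and the upload-logging logic are
illustrative assumptions, not part of the slides:

    import functions_framework

    @functions_framework.http
    def hello_http(request):
        # HTTP-triggered entry point: each invocation is stateless, so
        # nothing is retained between requests.
        name = request.args.get("name", "world")
        return f"Hello, {name}!"

    @functions_framework.cloud_event
    def on_upload(cloud_event):
        # Event-triggered entry point: fires when an object is finalized
        # in a Cloud Storage bucket; real processing logic would go here.
        data = cloud_event.data
        print(f"New object: gs://{data['bucket']}/{data['name']}")

You can serve either handler locally with the functions-framework CLI before deploying, which
is a convenient way to test event-handling logic without provisioning any infrastructure.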
[Enter]
Before TPUs, Google used GPUs and CPUs for ML workloads, but as services like Google
Translate and later deep learning models scaled up, those processors couldn’t keep up with the
performance and energy-efficiency demands. Google needed a custom solution that could
accelerate tensor-heavy operations, specifically the matrix multiplications used in training and
inference, while being cost- and power-efficient.
[Enter]
Google’s TPUs (Tensor Processing Units) are custom ASICs (Application-Specific Integrated
Circuits, chips designed for one particular computing task) optimized specifically for the
tensor operations central to ML workloads. TPUs provide:
•Up to 30 times faster processing than the contemporary CPUs and GPUs they were
benchmarked against, and up to 80 times better performance per watt.
•Seamless integration with ML frameworks like TensorFlow and PyTorch through the XLA
compiler. XLA (Accelerated Linear Algebra) is a domain-specific compiler for linear algebra
that optimizes TensorFlow and PyTorch models to run efficiently on Google’s TPUs. Instead of
interpreting each operation at runtime, XLA compiles entire subgraphs of the model into
optimized TPU-executable code, reducing execution overhead and boosting performance; the
short JAX sketch after the next paragraph shows the idea.
Use TPUs primarily for large-scale ML model training and intensive inference tasks like natural
language processing (BERT models) and vision tasks.
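Here is a minimal sketch of that compilation step using JAX, which targets the same XLA
compiler and runs on TPUs as well as CPUs and GPUs; the function and layer shapes are
illustrative assumptions:

    import jax
    import jax.numpy as jnp

    @jax.jit  # hands the whole function to XLA, which fuses the matmul,
              # bias add, and ReLU into one optimized executable
    def dense_layer(w, x, b):
        return jax.nn.relu(w @ x + b)

    key = jax.random.PRNGKey(0)
    w = jax.random.normal(key, (128, 256))
    x = jax.random.normal(key, (256,))
    b = jnp.zeros(128)

    y = dense_layer(w, x, b)  # first call compiles; later calls reuse
                              # the cached executable
    print(y.shape)  # (128,)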
Google Cloud separates storage from compute (decoupled storage), enabling independent
scalability and cost optimization:
•Cloud Storage (Unstructured Data): A highly durable and scalable object storage service
designed for images, videos, backups, and raw data.
•Standard: For frequently accessed data (hot storage), such as active datasets or media
for streaming.
•Nearline: Best for data accessed around once per month, such as monthly reports; a
sketch of creating a Nearline bucket follows this list.
•Firestore: A NoSQL document database designed for web and mobile apps. It offers
real-time updates and offline sync, with a flexible document schema and scalability from
regional to global; a short sketch follows below.
•Bigtable: A wide-column NoSQL database for very large, low-latency workloads, often
used for time-series data, IoT telemetry, and real-time analytics. It offers high write
throughput and integrates with tools like Dataflow and BigQuery.
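As promised above, here is a minimal sketch using the official google-cloud-storage Python
client; the bucket name, file names, and region are hypothetical, and credentials are assumed
to come from the environment:

    from google.cloud import storage

    client = storage.Client()

    # Create a bucket whose default storage class is Nearline, suited
    # to data accessed roughly once a month.
    bucket = client.bucket("example-monthly-reports")
    bucket.storage_class = "NEARLINE"
    client.create_bucket(bucket, location="us-central1")

    # Upload an object; it inherits the bucket's default storage class.
    blob = bucket.blob("reports/2024-06.csv")
    blob.upload_from_filename("monthly_report.csv")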
These services are designed to support diverse data models—SQL for structured relationships
and NoSQL for schema flexibility and scale—based on access patterns, scalability needs, and
consistency requirements.
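To make the document model concrete, here is a minimal sketch with the official
google-cloud-firestore client; the collection, document, and field names are hypothetical:

    from google.cloud import firestore

    db = firestore.Client()

    # Documents are schemaless nested maps, so fields can vary per document.
    db.collection("users").document("alice").set({
        "name": "Alice",
        "preferences": {"theme": "dark"},
        "last_login": firestore.SERVER_TIMESTAMP,
    })

    # Read the document back as a plain dict.
    snapshot = db.collection("users").document("alice").get()
    print(snapshot.to_dict())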
Your database choice depends on workload type (transactional vs. analytical), schema flexibility,
and scale. Transactional workloads in particular depend on the four ACID guarantees:
1. Atomicity: This property ensures that all operations within a transaction are treated as a
single, indivisible unit. Either all operations within a transaction are completed successfully, or
none of them are. If any operation fails, the entire transaction is rolled back to its original state,
ensuring data integrity.
2. Consistency: This property guarantees that the database always transitions from one valid
state to another during a transaction. It enforces the rules and constraints of the database,
ensuring that all operations within a transaction adhere to these rules. For example, a bank
transfer transaction must ensure that the amount debited from one account is also credited to
another, maintaining the integrity of the overall balance.
3. Isolation: This property ensures that concurrent transactions do not interfere with each other.
Each transaction is isolated from other transactions, preventing one transaction from seeing the
intermediate states of another transaction. This isolation helps maintain data consistency and
prevents corruption caused by concurrent access.
4. Durability: This property ensures that once a transaction is committed, the changes are
permanent and will survive even in the event of system failures, such as power outages or
hardware malfunctions. The committed changes are stored reliably in the database, ensuring
data persistence and integrity.
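To see these four properties in practice, here is a minimal sketch of a bank-transfer
transaction using Cloud Spanner, Google Cloud’s relational database with full ACID semantics.
The instance, database, table, and column names are hypothetical:

    from google.cloud import spanner

    client = spanner.Client()
    database = client.instance("example-instance").database("example-db")

    def transfer(transaction, from_id, to_id, amount):
        # Both updates belong to one transaction: either both commit
        # (atomicity) or both roll back, keeping the total balance
        # consistent.
        for account_id, delta in ((from_id, -amount), (to_id, amount)):
            transaction.execute_update(
                "UPDATE Accounts SET Balance = Balance + @amt "
                "WHERE Id = @id",
                params={"amt": delta, "id": account_id},
                param_types={"amt": spanner.param_types.INT64,
                             "id": spanner.param_types.STRING},
            )

    # run_in_transaction isolates the unit of work from concurrent
    # transactions, retries on conflicts, and durably commits only if
    # the whole function succeeds.
    database.run_in_transaction(transfer, "alice", "bob", 100)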
Considerations include SLAs (Service Level Agreements, formal commitments from service
providers specifying uptime, performance metrics, and penalties), consistency guarantees,
compliance needs, and cost.