Topic Cheatsheet for GCP’s Professional Machine Learning Engineer Beta Exam
Authors/contributors: David Chen, PhD
Credits & disclaimers can be found in the README section of the source repository. Some references are included as %comments or hyperlinks in the source file.
Abbreviations
Common abbreviations: ML, machine learning; DL, deep learning; AI, artificial intelligence; CV, computer vision; GC(P), Google Cloud (Platform); CI/CD, continuous integration / continuous delivery; SDK, software development kit; API, application programming interface; K8s, Kubernetes; GKE, Google Kubernetes Engine; MLE, maximum likelihood estimation; ROC, receiver operating characteristic curve; AU(RO)C, area under the (ROC) curve.

I. Preparation for ML

Understanding the "Data Science Steps for ML"
1. Data extraction
2. Exploratory data analysis
3. Data preparation for the ML task
4. Model training
5. Model evaluation
6. Model validation
7. Model serving
   • Microservices with REST API
   • Deployment on mobile devices
   • Batch predictions
8. Model monitoring

Defining an ML Problem

ML as a Solution to Business Problems
• (Re)define your business problems
• Consider whether the problem could be solved without ML
• Define/anticipate the utility of the ML output
• Identify data sources
• Pre-define "success" for solving the business challenge
  – Metric(s) used to define success
  – Key results (product or deliverables)
  – Incorrect or low-quality output (i.e. "unsuccessful" models)

Components of an ML Solution
• Define the predictive outcome
• Identify the problem type: supervised (classification or regression), unsupervised, reinforcement
• Identify the input feature format
• Feasibility and implementation

Data Preparation

Data Ingestion
• Obtaining & importing data for use or storage
• File input types
• Database maintenance, migration
• Streaming data (from IoT devices, databases, or end users)

Exploratory Data Analysis (EDA)
• Evaluation of data quality (domain- and organization-specific knowledge/information may be needed)
• Data visualization (descriptive statistics)
• Inferential statistics (e.g. t-tests to compare means, KS tests to compare distributions), as needed and at the scale needed

Feature Engineering
• Necessary (e.g. time series) or beneficial in many ML tasks
• Encoding structured data types
• Feature crosses: used to define a synthetic feature (e.g. the cross product x1 × x2) when the data cannot be linearly separated
• Feature selection, e.g. (see the sketch after this list)
  – Univariate statistical methods (e.g. χ² test, t-test/linear model)
  – Recursive Feature Elimination (RFE)
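A minimal sketch of the two feature-selection approaches above, using scikit-learn (the library and the breast-cancer demo dataset are illustrative choices, not part of the cheatsheet):

# Univariate selection (chi-squared) and Recursive Feature Elimination (RFE).
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)      # 30 non-negative features

# Univariate: keep the 10 features with the highest chi-squared score vs. the label.
X_uni = SelectKBest(score_func=chi2, k=10).fit_transform(X, y)

# RFE: repeatedly fit a model and drop the weakest features until 10 remain.
rfe = RFE(estimator=LogisticRegression(max_iter=5000), n_features_to_select=10)
X_rfe = rfe.fit_transform(X, y)

print(X_uni.shape, X_rfe.shape)                 # (569, 10) (569, 10)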
Special considerations
• Imbalanced class distributions
  – Need to be known, at a minimum
  – Affect the metrics to employ (e.g. the F1 score or AUC is superior to crude accuracy in imbalanced binary classification; see the sketch after this list)
  – Can affect optimization choices: modify the objective function; oversample the minority class(es)
• Data leakage
  – Certain features available in your training data might not be available in the unknowns to predict!
  – When training, be careful not to include raw or engineered features that are computed from the classification/regression label
Data Pipelines
• Should be designed & built in advance for at-scale applications
• Batching vs. streaming
  – Batching: use of data stored in data lakes, processed at periodic intervals
  – Streaming (data streams): use of data from live streams; a unique challenge due to the 3 Vs: Volume, Velocity (real-time), Variety (esp. unstructured data) (useful tool: Cloud Dataflow; see the sketch after this list)
• Monitoring
  – "Four Golden Signals" of your cloud-based service: latency, traffic, errors, saturation
  – Dashboards (Stackdriver Cloud Monitoring Dashboards API) can be a powerful tool for displaying multiple metrics
• Privacy, compliance, and legal issues: know what the restrictions are and plan ahead (e.g. privacy-preserving ML/AI, corrupted input, ...) (useful tool: Cloud IAM)
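A minimal Apache Beam sketch of a batch pipeline of the kind Cloud Dataflow runs; the bucket paths are placeholders, and submitting to Dataflow would additionally require runner/project/region options:

# The same Beam code serves batch or streaming sources; only the I/O and
# pipeline options change (e.g. runner="DataflowRunner" to run on Cloud Dataflow).
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

with beam.Pipeline(options=PipelineOptions()) as p:
    (p
     | "Read"   >> beam.io.ReadFromText("gs://my-bucket/raw/*.csv")      # placeholder path
     | "Parse"  >> beam.Map(lambda line: line.split(","))
     | "Filter" >> beam.Filter(lambda row: row[0] != "")
     | "Write"  >> beam.io.WriteToText("gs://my-bucket/clean/part"))     # placeholder path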
II. ML Model Development

Model Development At-a-Glance

Generic ML Workflow
1. Training
   • Choose a model framework
     – Supervised
     – Unsupervised
   • Consider transfer learning (if applicable)
   • Monitoring / tracking metrics
   • Strategies to handle overfitting (e.g. regularization, ensemble learning, dropout) & underfitting (increase model complexity)
   • Interpretability
2. Validation
   • Check overfitting & underfitting
   • Compare the trained model against a pre-defined baseline (e.g. a simple model or benchmark)
   • Unit tests
3. Scale-up & Serving
   • Unit tests
   • Cloud AI model explainability
   • Distributed training
   • Scalable model analysis

ML Models
Gradient descent is used to optimize the objective function of a machine-learning model:

Gradient descent   n           Resolution
Full-batch         all (N)     complete
Mini-batch         1 < n < N   intermediate
Stochastic         1           noisy approximation

An epoch is one full pass through the entire training dataset; the number of epochs is a hyperparameter to be defined/tuned by the user.
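A minimal NumPy sketch of the table above for a linear model with squared loss (the data are synthetic and the learning rate is arbitrary): batch_size = N gives full-batch, batch_size = 1 gives stochastic, and anything in between gives mini-batch gradient descent.

import numpy as np

rng = np.random.default_rng(0)
N, d = 1000, 3
X = rng.normal(size=(N, d))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=N)

def run_gd(batch_size, epochs=50, lr=0.05):     # epochs: the hyperparameter noted above
    w = np.zeros(d)
    for _ in range(epochs):                     # one epoch = one full pass over the N rows
        order = rng.permutation(N)
        for start in range(0, N, batch_size):
            idx = order[start:start + batch_size]
            grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w -= lr * grad
    return w

print("full-batch :", run_gd(batch_size=N))     # all estimates approach [2.0, -1.0, 0.5]
print("mini-batch :", run_gd(batch_size=32))
print("stochastic :", run_gd(batch_size=1))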
Supervised Learning (with related concepts)
• Naive Bayes (flavors: Gaussian, Bernoulli, Multinomial)
• Decision trees (concept of entropy)
• Support Vector Machine (SVM)
  – Linearly vs. non-linearly separable
  – Kernels (see the sketch after this list)
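A minimal scikit-learn sketch of the kernel point above (the library and the half-moons toy dataset are illustrative choices): on data that are not linearly separable, an RBF-kernel SVM outperforms a linear-kernel SVM.

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by a straight line.
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X_tr, y_tr)
    print(kernel, "test accuracy:", round(clf.score(X_te, y_te), 3))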
Unsupervised Learning
• Clustering
  – K-means
  – Hierarchical clustering
  – DBSCAN
• Dimensionality reduction
  – Principal Component Analysis (PCA)
  – t-SNE
• Gaussian Mixture Model (GMM), optimized by Expectation-Maximization (EM; see the sketch after this list):
  1. E step
  2. M step
  Repeat until convergence
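A minimal scikit-learn sketch (library and iris demo data assumed here, not prescribed by the cheatsheet): GaussianMixture runs the E and M steps internally until the log-likelihood converges or max_iter is reached.

from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

X, _ = load_iris(return_X_y=True)
gmm = GaussianMixture(n_components=3, max_iter=200, random_state=0).fit(X)

print("converged:", gmm.converged_, "after", gmm.n_iter_, "EM iterations")
print("component means:\n", gmm.means_.round(2))   # one mean vector per mixture component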
Overfitting

Bias-variance trade-off
• Characteristics of loss vs. iteration curves, plotted separately for
  – Training set
  – Validation and/or test set
• Underfitting vs. overfitting patterns

Ways to address overfitting
1. Get more high-quality, well-labeled training data
2. Regularization
   • L2 penalty
   • L1 (LASSO) penalty
   • Elastic net
3. Ensemble learning (see the sketch after this list)
   • Bagging
     – Random Forest: only a randomly chosen subset of 1 ≤ m < M features is considered at each split
     – Bagged trees: all M features are available at each split
   • Boosting (e.g. Gradient Boosted Trees/XGBoost)
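A minimal scikit-learn sketch of the bagging distinction above (one way to see it, using the library's max_features parameter; the dataset is an illustrative choice): max_features=None lets every split see all M features (bagged trees), while max_features="sqrt" restricts each split to a random subset (random forest).

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

bagged = RandomForestClassifier(n_estimators=200, max_features=None, random_state=0)
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)

print("bagged trees :", cross_val_score(bagged, X, y, cv=5).mean().round(3))
print("random forest:", cross_val_score(forest, X, y, cv=5).mean().round(3))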
Recommendation Systems

                          User info   Domain knowledge
Content-based                 ✓
Collaborative Filtering       ✓
Knowledge-based                             ✓

A hybrid recommendation system uses more than one of the above; although not always possible, it is generally the preferred solution.
Deep Learning

Subtypes of Neural Networks
• Feed-forward neural network
• Convolutional Neural Network (CNN) & computer vision
• Recurrent Neural Network (RNN)
  – Sequence data (speech/text, time series)
  – Vanishing gradient problem
  – Gated Recurrent Units (GRU)
  – Long short-term memory (LSTM)
  – Application to Natural Language Processing (NLP)
    ∗ Language models
    ∗ Embeddings
    ∗ Architectures (e.g. transformers)
• Autoencoders (see the sketch after this list)
  – General architecture
    ∗ Encoding layers
    ∗ Lower-dimensional representation (returned, or used as input for a subsequent autoencoder in a stack)
    ∗ Decoding layers
  – Flavors to address trivial solutions:
    ∗ Undercomplete autoencoders
    ∗ De-noising autoencoders
    ∗ Sparse autoencoders
  – Applications
    ∗ Data representation (feature engineering)
    ∗ Dimensionality reduction / data compression
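A minimal tf.keras sketch of an undercomplete autoencoder (the framework, layer sizes, and random placeholder data are assumptions for illustration): encoding layers compress a 784-dimensional input to a 32-dimensional code, and decoding layers reconstruct the input.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

inputs = tf.keras.Input(shape=(784,))
code = layers.Dense(128, activation="relu")(inputs)         # encoding layers
code = layers.Dense(32, activation="relu")(code)            # lower-dimensional representation
decoded = layers.Dense(128, activation="relu")(code)        # decoding layers
decoded = layers.Dense(784, activation="sigmoid")(decoded)

autoencoder = models.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")

X = np.random.rand(256, 784).astype("float32")              # placeholder data
autoencoder.fit(X, X, epochs=2, batch_size=64, verbose=0)   # target = input (reconstruction)

encoder = models.Model(inputs, code)                        # reuse the code as features
print(encoder.predict(X, verbose=0).shape)                  # (256, 32)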
III. Production-level ML with Cloud

MLOps: CI/CD in an ML System

                DevOps       Data Engineering   MLOps
Version ctrl.   Code         Code               Code, data, model
Pipeline        -            Data, ETL          Training, serving
Validation      Unit tests   Unit tests         Model validation
CI/CD           Production   Data pipeline      (both)

Tools for virtualization:
• Virtual Machines (VMs)
• Containers
  – Clusters
  – Pods
• Kubernetes (K8s)

GCP tools:
• BigQuery (see the query sketch after this list)
  – Google-managed data warehouse
  – Highly scalable, fast, optimized
  – Suitable for analysis & storage of structured data
  – Multi-processing enabled
• Cloud Dataprep
  – Managed cloud service for quick data exploration & transformation
  – Auto-scalable; eases the data-preparation process
• Cloud Dataflow: provides a serverless, parallel, distributed infrastructure for both batch & stream data processing, making use of Apache Beam
• Cloud ML APIs
  – Cloud Vision AI
  – Cloud Natural Language
  – Cloud Speech-to-Text
  – Cloud Video Intelligence
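A minimal sketch of querying BigQuery from Python with the google-cloud-bigquery client (authentication, a default project, and the public demo table referenced below are assumed):

from google.cloud import bigquery

client = bigquery.Client()               # uses application-default credentials/project
sql = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""
for row in client.query(sql).result():   # runs the query job and waits for it
    print(row.name, row.total)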
ML Pipeline Design
The ML code is only a small part of a production-level ML system.
• Identify components, parameters, triggers, compute needs
• Orchestration framework
  – Cloud Composer (based on an Apache Airflow deployment; see the DAG sketch below)
  – GCP App Engine
  – Cloud Storage
  – Cloud Kubernetes Engine
  – Cloud Logging & Monitoring
• Strategies beyond a single cloud:
  – Hybrid cloud: blend of public & private cloud for mixed computing, storage, & services, allowing for agility (i.e. quick adaptation during business digital transformation)
  – Multi-cloud: multiple clouds designated for different tasks (but unlike parallel computing, synchronization across different vendors is NOT essential)

Procedures during Implementing a Training Pipeline
• Perform data validation (e.g. via Cloud Dataprep)
• Decouple components with Cloud Build (fully serverless CI/CD platform supporting any language)
  – Adds a layer of technical abstraction
  – Separates content producers & end users
  – Ensures software components are not tightly dependent on one another
• Construct & test a parametrized pipeline definition in the SDK (e.g. gcloud ml-engine)
• Tune compute performance
• Store data & generated artifacts (e.g. binaries, tarballs) via Cloud Storage

GCP storage options:

                  Type        Transactions?   Complex queries?   Capacity
Cloud Datastore   NoSQL       ✓               ✗                  Terabytes+
Bigtable          NoSQL       (limited)       ✗                  Petabytes+
Cloud Storage     Blobstore   ✗               ✗                  Petabytes+
Cloud SQL         SQL         ✓               ✓                  Terabytes
Cloud Spanner     SQL         ✓               ✓                  Petabytes
BigQuery          SQL         ✗               ✓                  Petabytes+
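A minimal Apache Airflow DAG sketch of the kind of training pipeline Cloud Composer can orchestrate (the DAG id, schedule, and placeholder echo commands are invented for illustration; a real pipeline would call the relevant GCP services for validation, training, and deployment):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="ml_training_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@weekly",          # retraining cadence: a policy to tune
    catchup=False,
) as dag:
    validate = BashOperator(task_id="validate_data", bash_command="echo validate")
    train = BashOperator(task_id="train_model", bash_command="echo train")
    deploy = BashOperator(task_id="deploy_model", bash_command="echo deploy")

    validate >> train >> deploy           # dependencies define the pipeline order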
Considerations for Implementing the Serving Pipeline
• Model binary options
• Google Cloud serving options
• Testing for target performance
• Setup of trigger & pipeline schedule

Deployment with CI/CD (the final step in MLOps) goes along with:
• A/B testing: Google Optimize
• Canary testing, automated by GKE with Spinnaker

ML Solution Monitoring
Considerations in monitoring ML solutions:
1. Monitor the performance/quality of ML model predictions on an ongoing basis (via Cloud Monitoring (Compute Engine) with a metric model), then debug with Cloud Debugger
2. Use robust logging strategies (e.g. Cloud Logging, especially Stackdriver (aka Cloud Operations) with beautiful dashboards)
3. Establish continuous evaluation metrics

Troubleshoot ML Solutions:
• Permission issues (IAM)
• Training errors
• Serving errors
• ML system failures/biases (in production)

Tune performance of ML solutions in production
• Simplify (optimize) the input pipeline
  – Reduce data redundancy in the NLP model
  – Utilize Cloud Storage (e.g. object storage)
  – Simplification can take place at various points in the pipeline
• Identify an appropriate retraining policy
  – Under what circumstance(s)? How often? (e.g. when significant deviation or drift is identified; periodically)
  – How? (e.g. by batch vs. online learning; see the sketch below)
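A minimal scikit-learn sketch of the batch vs. online retraining choice (the library, SGDClassifier, and the random placeholder data are assumptions for illustration): either refit from scratch on all accumulated data, or update the deployed model incrementally as new data arrive.

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X_old, y_old = rng.normal(size=(1000, 5)), rng.integers(0, 2, size=1000)   # historical data
X_new, y_new = rng.normal(size=(100, 5)), rng.integers(0, 2, size=100)     # newly arrived data

# Batch retraining: fit a fresh model on everything accumulated so far.
batch_model = SGDClassifier().fit(np.vstack([X_old, X_new]),
                                  np.concatenate([y_old, y_new]))

# Online learning: keep the deployed model and update it with the new batch only.
online_model = SGDClassifier()
online_model.partial_fit(X_old, y_old, classes=np.array([0, 1]))
online_model.partial_fit(X_new, y_new)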