© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Spark access control on Amazon EMR with AWS Lake Formation
Anoop Johnson
Principal Software Engineer
Amazon Web Services
ANT223-R
What you will learn
• Background
• Authentication
• Query execution
• Demo
Lake Formation: Secure once, access in many ways
[Diagram: Amazon Athena, Amazon Redshift, Amazon EMR, AWS Glue, Amazon S3, Data catalog, Permissions, Lake Formation, Admin]
Control data access with grant and revoke
Permissions on tables and columns, not Amazon Simple Storage Service (Amazon S3)
View and audit data access
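The grant-and-revoke model on tables and columns can be sketched in Python. This is a minimal illustration, not the presenter's code: it only builds the request arguments in the shape the Lake Formation GrantPermissions and RevokePermissions operations accept. The helper name column_permission_request, the role ARN, and the database, table, and column names are all hypothetical.

```python
# Sketch: build a Lake Formation grant/revoke request for column-level access.
# The helper name, principal ARN, database, table, and columns are hypothetical.

def column_permission_request(principal_arn, database, table, columns,
                              permissions=("SELECT",)):
    """Return keyword arguments shaped for the Lake Formation
    GrantPermissions / RevokePermissions operations."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": list(permissions),
    }


request = column_permission_request(
    principal_arn="arn:aws:iam::111122223333:role/analyst",  # hypothetical role
    database="sales",
    table="orders",
    columns=["order_id", "order_date"],
)

# With AWS credentials and boto3 available, the same arguments could be sent as:
#   import boto3
#   lf = boto3.client("lakeformation")
#   lf.grant_permissions(**request)    # grant SELECT on the two columns
#   lf.revoke_permissions(**request)   # revoke the same permission
print(request["Resource"]["TableWithColumns"]["ColumnNames"])
```

Note that the request names a catalog table and its columns, never an S3 path, which is the point of the slide above.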
Why integrate Amazon EMR with Lake Formation?
• Fine-grained, column-level access to databases and tables
• Allows shared multi-tenant clusters to securely access data
• Uses the AWS Glue Data Catalog as the metadata store
• Federated single sign-on from your enterprise identity system
  • Active Directory (AD FS), Auth0, Okta, and many others
  • Uses Security Assertion Markup Language (SAML) 2.0
Query execution overview
[Diagram: Amazon S3, Lake Formation, Amazon EMR]
Amazon EMR authentication
[Diagram: Amazon EMR cluster, User, "Used for impersonation", Lake Formation]
Query execution under the hood
[Diagram: Amazon EMR worker node, Amazon S3, and Lake Formation, with numbered steps 1 through 6]
Amazon EMR: Supported applications
• AWS Glue Data Catalog
• Identity providers with support for SAML
• Applications
  • Spark SQL
  • Amazon EMR Notebooks and Zeppelin with Livy
Integrating Amazon EMR with Lake Formation
• Establish trust relationship between your corporate IdP and AWS
• Configure IAM roles for Lake Formation
• Configure Amazon EMR security features
• Launch an Amazon EMR Lake Formation-enabled cluster
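The "Configure Amazon EMR security features" step can be sketched as building an EMR security configuration document. Treat this as an assumption-laden illustration: the field names under LakeFormationConfiguration are given as best recalled from the Amazon EMR documentation for the SAML-based integration and should be checked against the current guide. The helper name, bucket path, and role ARNs are hypothetical, and a real configuration would likely need further settings (for example Kerberos authentication) that are omitted here.

```python
import json

# Sketch: assemble the Lake Formation part of an EMR security configuration.
# Field names are assumed from EMR documentation and should be verified;
# the helper name, S3 path, and role ARNs are hypothetical placeholders.

def lake_formation_security_configuration(idp_metadata_s3_path,
                                          emr_role_for_users_arn,
                                          lf_saml_role_arn):
    """Return the security configuration as a JSON string."""
    config = {
        "LakeFormationConfiguration": {
            # SAML metadata file exported from the corporate IdP
            "IdpMetadataS3Path": idp_metadata_s3_path,
            # IAM role for access to resources not governed by Lake Formation
            "EmrRoleForUsersARN": emr_role_for_users_arn,
            # IAM role trusted by the SAML identity provider for Lake Formation
            "LakeFormationRoleForSAMLPrincipalARN": lf_saml_role_arn,
        }
    }
    return json.dumps(config, indent=2)


security_configuration = lake_formation_security_configuration(
    idp_metadata_s3_path="s3://example-bucket/idp/metadata.xml",
    emr_role_for_users_arn="arn:aws:iam::111122223333:role/emr-users",
    lf_saml_role_arn="arn:aws:iam::111122223333:role/lf-saml",
)

# With boto3 and credentials, the JSON string could be registered with:
#   import boto3
#   boto3.client("emr").create_security_configuration(
#       Name="lake-formation-example",
#       SecurityConfiguration=security_configuration)
print(security_configuration)
```

The cluster launched in the final step would then reference this security configuration by name.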