Next Generation of Data Integration
with Azure Data Factory
Tom Kerkhove
Azure Consultant at Codit, MSFT Azure MVP
Hi!
Tom Kerkhove
• Azure Consultant at Codit
• Microsoft Azure MVP & Advisor
• Belgian Azure User Group (AZUG)
blog.tomkerkhove.be
@TomKerkhove
tomkerkhove
Azure Serverless
Azure Logic Apps, Azure Functions, Azure Event Grid, Azure Data Factory
Disclaimer
Azure Data Factory 2.0 Preview
https://bit.ly/adf-v1-vs-v2
➔ Managed data orchestration service
➔ Allows you to run pipelines
➔ Execute SSIS packages
➔ Support for hybrid scenarios
➔ Data movement-as-a-service with 70+ connectors
➔ Visual tooling & programmability
➔ .NET, Python, REST, ARM
What is Azure Data Factory?
[Diagram: a Pipeline is started by Trigger(s) and runs a chain of Activities]
➔ A pipeline represents a business process with multiple “steps”,
which are represented by activities, and is started by a trigger
➔ Activities represent the steps in a business process; each performs
a specific action
➔ Activities can be chained: whether the next activity runs depends on the
outcome of the previous step, which can be success, failure, skipped or completion
What is Azure Data Factory?
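Chaining activities on the outcome of a previous step can be sketched in plain Python. This is a minimal illustration, not Data Factory's implementation: the pipeline, its activity names and the `should_run` helper are hypothetical, and the dictionaries only mirror the general shape of a pipeline definition (`dependsOn` with `dependencyConditions`). It assumes "Completed" is satisfied by either success or failure.

```python
# Which outcomes of the previous activity satisfy each dependency condition.
# Assumption: "Completed" means the previous step finished, whether it
# succeeded or failed.
SATISFIED_BY = {
    "Succeeded": {"Succeeded"},
    "Failed": {"Failed"},
    "Skipped": {"Skipped"},
    "Completed": {"Succeeded", "Failed"},
}


def should_run(activity, outcomes):
    """Return True if every dependency of `activity` is satisfied.

    `outcomes` maps an activity name to its final status. Conditions listed
    on one dependency are treated as alternatives (assumption).
    """
    for dependency in activity.get("dependsOn", []):
        outcome = outcomes[dependency["activity"]]
        allowed = set()
        for condition in dependency["dependencyConditions"]:
            allowed |= SATISFIED_BY[condition]
        if outcome not in allowed:
            return False
    return True


# Hypothetical pipeline: copy data, then either transform it or send an alert.
pipeline = {
    "name": "ExamplePipeline",
    "properties": {
        "activities": [
            {"name": "CopyData", "type": "Copy"},
            {
                "name": "Transform",
                "type": "SqlServerStoredProcedure",
                "dependsOn": [
                    {"activity": "CopyData", "dependencyConditions": ["Succeeded"]}
                ],
            },
            {
                "name": "NotifyOnFailure",
                "type": "WebActivity",
                "dependsOn": [
                    {"activity": "CopyData", "dependencyConditions": ["Failed"]}
                ],
            },
        ]
    },
}

activities = {a["name"]: a for a in pipeline["properties"]["activities"]}
print(should_run(activities["Transform"], {"CopyData": "Succeeded"}))
print(should_run(activities["NotifyOnFailure"], {"CopyData": "Succeeded"}))
```

With a successful copy, only the transform step is eligible to run; with a failed copy, only the notification step is.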
➔ Different types of triggers
➔ On-Demand (Via REST API, .NET, etc.)
• Azure API Management can make this easier
➔ Scheduled / Wall-clock
➔ Tumbling Windows (aka “data slicing”)
➔ Event-based (New file is added to blob storage)
➔ Support for passing parameters
Triggers
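The idea behind tumbling windows ("data slicing") can be illustrated with a small standalone sketch: time is cut into fixed-size, contiguous, non-overlapping windows, and each window would start one pipeline run. The `tumbling_windows` helper and the dates are hypothetical; this only shows the slicing concept and is not how the service computes its windows.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, size):
    """Yield (window_start, window_end) pairs covering [start, end).

    Windows are fixed-size, contiguous and non-overlapping; a final
    partial window is not emitted.
    """
    window_start = start
    while window_start + size <= end:
        yield window_start, window_start + size
        window_start += size


windows = list(
    tumbling_windows(
        datetime(2018, 5, 1, 0, 0),
        datetime(2018, 5, 1, 3, 30),
        timedelta(hours=1),
    )
)
for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Each pair could then be passed to the pipeline as parameters, so that a run only processes the data belonging to its own slice.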
➔ Data Movement
➔ Azure, Databases, NoSQL, File, SaaS, Web, etc.
➔ Data Transformation
➔ Pig, Hive, Stored Procedure, U-SQL, ML, Spark, MapReduce, etc.
➔ Control Flow
➔ Web call, Lookup, Get Metadata, If, Wait, ForEach, Execute Pipeline, etc.
➔ Custom
➔ Run commands on an Azure Batch cluster
➔ Run R scripts on an HDInsight cluster
Activities
➔ An activity can produce or consume a data set. A data set is a
representation of a data structure in a data store that can be
used as a source or sink.
➔ Linked Services define how an activity can connect to an
external system. This external system can be a data store or
compute resource.
What is Azure Data Factory?
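How an activity, its data sets and their linked services refer to each other can be sketched with plain dictionaries. All names here are hypothetical, and the dictionaries only mirror the general shape of Data Factory JSON definitions (reference objects with `referenceName` and `type`); they are not a complete schema.

```python
# Hypothetical names throughout; the dictionaries only mirror the general
# shape of Data Factory JSON definitions and are not a complete schema.
linked_services = {
    "ExampleBlobStorage": {"type": "AzureStorage"},
    "ExampleSqlDatabase": {"type": "AzureSqlDatabase"},
}

datasets = {
    "RawProfiles": {
        "type": "AzureBlob",
        "linkedServiceName": {
            "referenceName": "ExampleBlobStorage",
            "type": "LinkedServiceReference",
        },
    },
    "ProfilesTable": {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "ExampleSqlDatabase",
            "type": "LinkedServiceReference",
        },
    },
}

copy_activity = {
    "name": "CopyProfiles",
    "type": "Copy",
    "inputs": [{"referenceName": "RawProfiles", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "ProfilesTable", "type": "DatasetReference"}],
}


def stores_for(activity, direction):
    """Return the linked service names behind an activity's data sets.

    `direction` is "inputs" (consumed data sets) or "outputs" (produced ones).
    """
    names = []
    for reference in activity.get(direction, []):
        data_set = datasets[reference["referenceName"]]
        service_name = data_set["linkedServiceName"]["referenceName"]
        if service_name not in linked_services:
            raise KeyError(f"Unknown linked service: {service_name}")
        names.append(service_name)
    return names


print("source:", stores_for(copy_activity, "inputs"))
print("sink:", stores_for(copy_activity, "outputs"))
```

The activity never holds connection details itself: it names data sets, and each data set names the linked service that knows how to reach the store.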
[Diagram: an Activity produces and consumes a Data Set; the Data Set represents data stored in a Linked Service]
➔ Compute infrastructure used by Data Factory
➔ Azure, Azure-SSIS or Self-Hosted (Any cloud or on-prem)
➔ Core capabilities
➔ Data movement
➔ Pipeline activity execution
➔ SSIS package execution
➔ Pipelines issue command & control; the integration runtime executes
➔ Data movement is from IR to IR
➔ All execution happens in sources & sinks
Integration Runtime (IR)
➔ Stores SSISDB in Azure SQL DB or Managed Instance
➔ Azure-SSIS integration runtime as compute-layer
➔ Compute part for running SSIS
➔ Managed cluster of Azure VMs
➔ Compute-layer
➔ Can be linked to VNET for hybrid scenarios
➔ Lift & shift packages to the cloud
Running SSIS packages in Azure
➔ Native support for Managed Service Identity (MSI)
➔ Native integration with Azure Key Vault
➔ Encrypted-in-transit via HTTPS
➔ Supports encryption-at-rest with data stores
Security
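The Key Vault integration can be pictured as a linked service that points at a secret instead of embedding it. The sketch below is an assumption about the general shape of such a definition (an `AzureKeyVaultSecret` reference in place of an inline `SecureString`); the linked service names, the secret name and both helper functions are hypothetical, and the exact properties that accept a Key Vault reference differ per connector.

```python
def key_vault_secret(vault_linked_service, secret_name):
    """Build a reference to a secret held in a Key Vault linked service."""
    return {
        "type": "AzureKeyVaultSecret",
        "store": {
            "referenceName": vault_linked_service,
            "type": "LinkedServiceReference",
        },
        "secretName": secret_name,
    }


def has_inline_secret(node):
    """Return True if any nested value is an inline SecureString."""
    if isinstance(node, dict):
        if node.get("type") == "SecureString":
            return True
        return any(has_inline_secret(value) for value in node.values())
    if isinstance(node, list):
        return any(has_inline_secret(value) for value in node)
    return False


# Secret embedded in the definition itself (placeholder value).
inline_version = {
    "name": "ExampleSqlDatabase",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "<connection string>",
            }
        },
    },
}

# Same linked service, but the secret is looked up in Key Vault at runtime.
vault_version = {
    "name": "ExampleSqlDatabase",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": key_vault_secret(
                "ExampleKeyVault", "sql-connection-string"
            )
        },
    },
}

print(has_inline_secret(inline_version))
print(has_inline_secret(vault_version))
```

A check like `has_inline_secret` could be run over definitions before they are committed to source control, so that only Key Vault references end up in the repository.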
Show it to me!
➔ Every user should be capable of requesting their data
Using Azure Serverless to become GDPR compliant
[Diagram: User Profile information, StackExchange Data Set, Kerkhove.tom@gmail.com]
➔ Visual monitoring in the portal
➔ Monitoring per pipeline run
➔ Detailed information per activity
➔ Azure Monitor integration
➔ Diagnostic Logs
➔ Metrics
➔ Alerts
Monitoring
➔ Serverless orchestration
➔ Pay for what you use
➔ Data-centric vs Application-centric workflows
➔ Work together seamlessly
How is this different from Logic Apps?
➔ Azure Data Factory is a great way to orchestrate data
processes and build data-integration pipelines
➔ Very powerful for data-centric workloads
➔ Unsung hero in the serverless space
➔ A perfect match with Azure Logic Apps
➔ Allows you to get to market very quickly with the built-in
connectors
Conclusion

Intelligent Cloud Conference 2018 - Next Generation of Data Integration with Azure Data Factory


Editor's Notes

  • #8 Come across as evangelistic, in an advising role. Impression to convey: has vision.
  • #23 Come across as evangelistic, in an advising role. Impression to convey: has vision.