Implementing Parameterization in ADF

Parameterization in ADF is achieved using pipeline parameters, dataset parameters, and activity parameters. Here's an example of parameterizing file names and paths to read data from Azure Blob Storage.

Example Scenario: Read Dynamic Files from Azure Blob Storage

1. Requirement: A pipeline should read files from an Azure Blob Storage container. The file name and folder path will be passed as parameters.

Steps to Implement

1. Create a Pipeline with Parameters:
o In ADF, create a new pipeline.
o Define parameters for FolderPath and FileName.

json
{
  "parameters": {
    "FolderPath": {
      "type": "string",
      "defaultValue": ""
    },
    "FileName": {
      "type": "string",
      "defaultValue": ""
    }
  }
}

2. Create a Dataset with Parameters:
o Create a dataset pointing to Azure Blob Storage.
o Add parameters for FolderPath and FileName in the dataset.
o Modify the dataset connection properties to use these parameters.
Example JSON for dataset parameters:

json
{
  "parameters": {
    "FolderPath": {
      "type": "string"
    },
    "FileName": {
      "type": "string"
    }
  },
  "typeProperties": {
    "fileName": "@dataset().FileName",
    "folderPath": "@dataset().FolderPath"
  }
}

3. Connect the Dataset to the Pipeline:
o In the pipeline, add an activity (e.g., Copy Data).
o Link the dataset to the activity.
o Map the pipeline parameters to the dataset parameters:
- For FolderPath: pass @pipeline().parameters.FolderPath.
- For FileName: pass @pipeline().parameters.FileName.
4. Trigger the Pipeline:
o Use the "Add Trigger" option to test the pipeline.
o Pass different values for FolderPath and FileName when triggering the pipeline (a Python sketch for triggering with parameters follows below).
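
For programmatic runs, the same parameters can be supplied from Python. This is a minimal sketch, assuming the azure-identity and azure-mgmt-datafactory packages and placeholder subscription, resource group, factory, and pipeline names:

python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Placeholder identifiers; replace with real values for your environment
subscription_id = "<subscription-id>"
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Trigger the pipeline, passing values for the FolderPath and FileName parameters
run = adf_client.pipelines.create_run(
    resource_group_name="<resource-group>",
    factory_name="<data-factory-name>",
    pipeline_name="<pipeline-name>",
    parameters={"FolderPath": "input-data", "FileName": "sales-data.json"},
)
print(run.run_id)  # run ID that can be monitored in ADF Monitor or via the SDK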

Example Configuration

- Pipeline Parameters:

json
{
  "FolderPath": "input-data",
  "FileName": "sales-data.json"
}

- Dataset Connection:
o FolderPath: input-data
o FileName: sales-data.json

Final Outcome

By passing parameters dynamically, the same pipeline can read different files, such as:

- input-data/sales-data.json
- input-data/inventory-data.csv

&&&&&&&&&&&&&&&

What is SCD2 (Slowly Changing Dimension Type 2)?

SCD2 (Slowly Changing Dimension Type 2) is a method used in data warehousing to track and maintain historical changes to dimensional data over time. It creates multiple records for a given entity in the dimension table, with each record representing a version of the data valid for a specific time period.

Key Features of SCD2:

1. Tracks Historical Data: Maintains a history of changes to data.
2. Versioning: Each change results in a new version of the dimension record.
3. Validity Range: Typically uses date fields (e.g., StartDate and EndDate) or a flag (e.g., IsCurrent) to indicate the validity of each version.

How SCD2 Works:


1. Initial Load:
o Load the dimension table with the first version of data.
2. Change Detection:
o Compare the incoming data (source) with the existing data (dimension table) to detect changes.
3. Insert New Records:
o For records that have changes, close the existing record by updating the EndDate or setting IsCurrent to false.
o Insert a new record with the updated data, new version number, and StartDate.
4. Unchanged Records:
o Retain existing records if no changes are detected.

Example Scenario: Employee Dimension

Initial State of Dimension Table:

EmployeeID   Name       Department   StartDate    EndDate      IsCurrent
101          John Doe   HR           2023-01-01   9999-12-31   true
102          Jane Doe   IT           2023-01-01   9999-12-31   true

Incoming Source Data (New Load):

EmployeeID   Name       Department
101          John Doe   Finance
102          Jane Doe   IT

Steps to Implement SCD2:

1. Compare Records:
o Compare EmployeeID and Department between the source and the dimension table.
o Detect that the department for EmployeeID 101 has changed.
2. Update Existing Record:
o For EmployeeID 101, update the EndDate to the current date (e.g., 2024-01-01) and set IsCurrent to false.
3. Insert New Record:
o Insert a new record for EmployeeID 101 with the updated department and StartDate as the current date.

Updated Dimension Table:

EmployeeID   Name       Department   StartDate    EndDate      IsCurrent
101          John Doe   HR           2023-01-01   2024-01-01   false
101          John Doe   Finance      2024-01-01   9999-12-31   true
102          Jane Doe   IT           2023-01-01   9999-12-31   true

Key Benefits of SCD2:

- Provides a complete history of changes.
- Supports complex historical analysis, such as tracking how values evolve over time.

SCD2 in Practice (Implementation in SQL):

sql
-- Update the current record to set the EndDate and IsCurrent flag
UPDATE DimensionTable
SET EndDate = GETDATE(),
    IsCurrent = 0
WHERE EmployeeID = @EmployeeID
  AND IsCurrent = 1;

-- Insert the new record
INSERT INTO DimensionTable (EmployeeID, Name, Department, StartDate, EndDate, IsCurrent)
VALUES (@EmployeeID, @Name, @Department, GETDATE(), '9999-12-31', 1);
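
On Databricks/Delta Lake, the same close-and-insert logic can be expressed with a Delta MERGE followed by an append. This is a minimal PySpark sketch rather than a complete solution, assuming a Delta dimension table and staging table at hypothetical paths with the Employee columns shown above (brand-new employees not yet in the dimension would need a separate insert step):

python
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

dim_path = "/mnt/dim_employee"                                   # hypothetical path
src = spark.read.format("delta").load("/mnt/staging_employee")   # hypothetical path
current = spark.read.format("delta").load(dim_path).filter("IsCurrent = true")

# 1. Detect rows whose Department changed for an employee already in the dimension
changed = (src.alias("s")
           .join(current.alias("c"), F.col("s.EmployeeID") == F.col("c.EmployeeID"))
           .where(F.col("s.Department") != F.col("c.Department"))
           .select(F.col("s.EmployeeID"), F.col("s.Name"), F.col("s.Department"))
           .cache())
changed.count()  # materialize before the dimension table is modified below

# 2. Close the current version of each changed employee
(DeltaTable.forPath(spark, dim_path).alias("t")
 .merge(changed.alias("u"), "t.EmployeeID = u.EmployeeID AND t.IsCurrent = true")
 .whenMatchedUpdate(set={"EndDate": "current_date()", "IsCurrent": "false"})
 .execute())

# 3. Append the new version with an open-ended EndDate
(changed
 .withColumn("StartDate", F.current_date())
 .withColumn("EndDate", F.lit("9999-12-31").cast("date"))
 .withColumn("IsCurrent", F.lit(True))
 .select("EmployeeID", "Name", "Department", "StartDate", "EndDate", "IsCurrent")
 .write.format("delta").mode("append").save(dim_path))
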
What is Incremental Load?

Incremental Load refers to the process of loading only the new or changed data
from a source system into a target system, rather than reloading the entire dataset.
It is a crucial process in ETL (Extract, Transform, Load) operations, as it optimizes
performance, reduces data transfer, and minimizes load times.

Types of Incremental Loads

1. Insert Only: Loads only new records.
2. Insert and Update: Loads new records and updates existing records.
3. Insert, Update, and Delete: Handles new records, updates existing records, and deletes records that no longer exist in the source.

How Incremental Load Works

1. Detect Changes:
o Use a mechanism to identify new or modified records in the source data. Common methods include:
- Timestamps: Compare a LastModified column with the last processed time.
- Watermarking: Use a high watermark (e.g., max date or ID) to track the last processed record.
- CDC (Change Data Capture): Use database features to capture data changes.
2. Extract Changes:
o Extract only the identified new or modified records from the source.
3. Load Changes:
o Insert new records into the target system.
o Update or delete existing records as needed.

Example: Incremental Load Using Timestamps

Scenario:
You have a source database table called Sales and need to incrementally load it
into a data warehouse table called SalesDW.

Source Table (Sales):

SalesID   CustomerID   Amount   LastModified
1         101          100      2023-12-01 10:00:00
2         102          200      2023-12-02 11:00:00
3         103          300      2023-12-03 12:00:00

Target Table (SalesDW):

SalesID   CustomerID   Amount   LastModified
1         101          100      2023-12-01 10:00:00

Steps to Implement Incremental Load:

1. Set a High Watermark:
o Determine the last LastModified timestamp processed (e.g., 2023-12-01 10:00:00).
2. Extract Changes:
o Query the source table to fetch records with a LastModified greater than the watermark.

sql
SELECT *
FROM Sales
WHERE LastModified > '2023-12-01 10:00:00';

3. Result:

SalesID   CustomerID   Amount   LastModified
2         102          200      2023-12-02 11:00:00
3         103          300      2023-12-03 12:00:00

4. Load Changes into Target:
o Insert New Records: Insert the extracted records into SalesDW.
o Update Existing Records: If a record already exists in the target, update it based on SalesID.

sql
-- Insert records that are new to the target (guard against re-inserting existing rows)
INSERT INTO SalesDW (SalesID, CustomerID, Amount, LastModified)
SELECT S.SalesID, S.CustomerID, S.Amount, S.LastModified
FROM Sales S
WHERE S.LastModified > '2023-12-01 10:00:00'
  AND NOT EXISTS (SELECT 1 FROM SalesDW D WHERE D.SalesID = S.SalesID);

-- Update records that already exist in the target (if needed)
UPDATE D
SET CustomerID = S.CustomerID,
    Amount = S.Amount,
    LastModified = S.LastModified
FROM SalesDW D
INNER JOIN Sales S ON D.SalesID = S.SalesID
WHERE S.LastModified > '2023-12-01 10:00:00';

5. Update the High Watermark:
o Set the new high watermark to the maximum LastModified timestamp processed (2023-12-03 12:00:00) and persist it so the next run starts from it (see the sketch below).
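
The watermark has to survive between runs, so it is usually kept in a small control table or file that the next run reads before extraction. A minimal Python sketch of that idea, assuming a hypothetical local watermark.json file as the store (a database control table works the same way):

python
import json
from pathlib import Path

WATERMARK_FILE = Path("watermark.json")  # hypothetical store; a control table is more common

def read_watermark(default="1900-01-01 00:00:00"):
    """Return the last processed LastModified value, or a default on the first run."""
    if WATERMARK_FILE.exists():
        return json.loads(WATERMARK_FILE.read_text())["last_modified"]
    return default

def save_watermark(value):
    """Persist the new high watermark after a successful load."""
    WATERMARK_FILE.write_text(json.dumps({"last_modified": value}))

# After the load above succeeds, record the highest timestamp that was processed
save_watermark("2023-12-03 12:00:00")
print(read_watermark())  # -> 2023-12-03 12:00:00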

Incremental Load in Tools

Azure Data Factory:

- Use Lookup or Filter activities to fetch incremental data.
- Use a parameterized pipeline with a watermark value passed as a parameter.
- Use a sink dataset to write the incremental data to the target.

Databricks with PySpark:

- Use max() on the LastModified column to determine the high watermark.
- Filter the DataFrame based on this value.

Example PySpark Code:

python
# Read source data
source_df = spark.read.format("delta").load("/source_path")

# Read target data
target_df = spark.read.format("delta").load("/target_path")

# Determine the high watermark (latest LastModified already in the target)
last_processed_time = target_df.agg({"LastModified": "max"}).collect()[0][0]

# Filter for new or modified records
incremental_df = source_df.filter(source_df.LastModified > last_processed_time)

# Append the new/changed records to the target
incremental_df.write.format("delta").mode("append").save("/target_path")

Benefits of Incremental Load

1. Efficiency: Reduces the volume of data processed.
2. Performance: Faster ETL processes, as only a subset of data is handled.
3. Scalability: Suitable for large datasets where full loads are impractical.

This approach is widely used in data warehousing, ETL tools, and cloud services to
ensure efficient and scalable data pipelines.
