DataStage Batch Sparse Lookup - CodeProject http://www.codeproject.com/Articles/717353/DataStage-Batch-Sparse-Lookup

DataStage Batch Sparse Lookup

dingjing, 28 Jan 2014 CPOL

Introduction
A DataStage sparse lookup is considered an expensive operation because it issues a round-trip database query for each
incoming row. It is appropriate when both of the following conditions are met.

1. The reference table is huge, i.e., millions of rows or more. If the reference table is small enough to fit
entirely in memory, a normal lookup is a better choice.
2. The number of input rows is less than 1% of the number of rows in the reference table. Otherwise, use a Join stage.
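
The two conditions above can be read as a simple decision rule. The sketch below is only an illustration: the function name `suggest_stage` and its parameters are hypothetical, and whether the reference table "fits in memory" is left as an input because the article gives no exact threshold.

```python
def suggest_stage(reference_rows, input_rows, reference_fits_in_memory):
    """Suggest a stage type from the two rules of thumb above (hypothetical helper)."""
    # Rule 1: a reference table that fits in memory favours a normal lookup.
    if reference_fits_in_memory:
        return "normal lookup"
    # Rule 2: sparse lookup only pays off when the input is under 1% of the
    # reference table. Integer arithmetic avoids floating-point rounding.
    if input_rows * 100 < reference_rows:
        return "sparse lookup"
    return "join"


print(suggest_stage(5_000_000, 10_000, False))
print(suggest_stage(5_000_000, 1_000_000, False))
print(suggest_stage(1_000, 10, True))
```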

Is it possible to speed up a sparse lookup by sending queries to the database in batches of 10, 20 or 50 rows? In other words,
instead of sending the following SQL to the database for each incoming row,

SELECT some_columns FROM my_table WHERE my_table.id_col = orchestrate.row_value

can we send one query for multiple rows like this?

SELECT some_columns FROM my_table WHERE my_table.id_col in (orchestrate.row_value_list)
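
To make the difference concrete outside DataStage, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the real database. The table contents and the `some_column` name are made up for illustration; only the shape of the two queries follows the text above.

```python
import sqlite3

# In-memory stand-in for the reference table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id_col INTEGER, some_column TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(i, f"value_{i}") for i in range(1, 101)],
)

keys = [3, 14, 15, 92]

# Regular sparse lookup: one query per incoming row.
per_row = []
for key in keys:
    per_row += conn.execute(
        "SELECT id_col, some_column FROM my_table WHERE id_col = ?", (key,)
    ).fetchall()

# Batched lookup: one query for all rows, with one placeholder per key.
placeholders = ",".join("?" * len(keys))
batched = conn.execute(
    f"SELECT id_col, some_column FROM my_table "
    f"WHERE id_col IN ({placeholders}) ORDER BY id_col",
    keys,
).fetchall()

print(len(keys), "queries vs 1 query; same rows:", sorted(per_row) == batched)
```

Both approaches return the same rows; the batched form simply needs one statement instead of one per key.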

Solution
To make the second query work, we need two tricks. First, we need to concatenate values from multiple rows into a
single string, separated by a delimiter (e.g. a comma). I am not sure how to do this in DataStage v8.1 or earlier; it is
not impossible, but rather complicated. Since v8.5, the Transformer stage has had a loop capability, which makes the task of
concatenating multiple rows much easier.

The second trick is to split the value list, on the database side, from a comma-delimited string into an array. If we simply plug the
original string into the second query, the database will interpret it literally as a single value. If only there were a standard SQL
function equivalent to string.split() in C# or Java! Each database has its own tricks for achieving the effect of
string.split(). I will use Oracle in my example implementation.
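
The "single value" problem is easy to reproduce. The sketch below again uses sqlite3 as a stand-in database (an assumption made purely for illustration; the article itself targets Oracle). Binding the whole comma-delimited string to one placeholder matches nothing, while binding the individual values matches every row. Here the split is done on the client side only to show the contrast; the article performs it inside the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id_col INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?)", [(1,), (2,), (3,)])

value_list = "1,2,3"

# The whole string is bound as ONE value, so it is compared literally.
literal = conn.execute(
    "SELECT id_col FROM my_table WHERE id_col IN (?)", (value_list,)
).fetchall()

# After splitting, each key gets its own placeholder.
values = [int(v) for v in value_list.split(",")]
placeholders = ",".join("?" * len(values))
split = conn.execute(
    f"SELECT id_col FROM my_table WHERE id_col IN ({placeholders}) ORDER BY id_col",
    values,
).fetchall()

print("literal:", literal)
print("split:  ", split)
```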

Implementation

Job Overview
The example implementation generates 50k rows using a Row Generator stage. Each row has a key column and a value
column. The rows are duplicated in Transformer_25. One copy is branched to Transformer_7, where multiple
rows of the keys are concatenated. The number of rows in each concatenation is set by a job parameter,
#BATCH_SIZE#. The concatenated keys are then sent to Lookup_0 for a sparse lookup against an Oracle table with
5 million rows. The lookup results are merged back into the original stream in Lookup_16.
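
The data flow of the job can be sketched in a few lines of Python. This is a simplified model, not the DataStage job itself: a dictionary stands in for the Oracle table, the function name `batch_sparse_lookup` is hypothetical, and the comments only indicate which stage each step loosely corresponds to.

```python
def batch_sparse_lookup(input_rows, reference, batch_size):
    """Model of the job: batch the keys, look each batch up once, merge back."""
    found = {}
    queries = 0
    for start in range(0, len(input_rows), batch_size):
        batch = input_rows[start:start + batch_size]
        # Roughly Transformer_7: concatenate the keys of one batch.
        key_list = ",".join(str(key) for key, _ in batch)
        # Roughly Lookup_0: one round trip per batch; the "database"
        # splits the list and returns the matching reference rows.
        queries += 1
        for key in (int(k) for k in key_list.split(",")):
            if key in reference:
                found[key] = reference[key]
    # Roughly Lookup_16: merge the results back into the original stream.
    merged = [(key, value, found.get(key)) for key, value in input_rows]
    return merged, queries


input_rows = [(i, f"value_{i}") for i in range(1, 11)]
reference = {i: f"ref_{i}" for i in range(1, 101)}
merged, queries = batch_sparse_lookup(input_rows, reference, batch_size=4)
print(queries, merged[0])
```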

Concatenate Multiple Rows in a Transformer Loop

Define the following stage variables and loop condition to concatenate multiple rows.

Variable    Data Type     Derivation
BatchCount  Integer       IF IsLast THEN 1 ELSE BatchCount + 1
IsLast      Bit           LastRow() or BatchCount = BATCH_SIZE
FinalList   String(4000)  IF IsLast THEN TempList : DecimalToString(DSLink21.NUM_KEY, "suppress_zero") ELSE ""
TempList    String(4000)  IF IsLast THEN "" ELSE TempList : DecimalToString(DSLink21.NUM_KEY, "suppress_zero") : ","

Loop condition: @ITERATION = 1 and IsLast
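
To see how these derivations interact, the following Python sketch replays them row by row. It assumes that stage variables are evaluated top to bottom for every input row, so BatchCount sees the IsLast value left over from the previous row, and that the variables start as 0, false and empty strings. The function name `batch_keys` is hypothetical.

```python
def batch_keys(keys, batch_size):
    """Replay the stage-variable derivations for a list of numeric keys."""
    keys = list(keys)
    batch_count = 0      # BatchCount (assumed initial value)
    is_last = False      # IsLast (assumed initial value)
    temp_list = ""       # TempList
    output = []
    for index, key in enumerate(keys):
        last_row = index == len(keys) - 1                 # LastRow()
        # BatchCount uses IsLast from the PREVIOUS row.
        batch_count = 1 if is_last else batch_count + 1
        is_last = last_row or batch_count == batch_size
        final_list = temp_list + str(key) if is_last else ""
        temp_list = "" if is_last else temp_list + str(key) + ","
        # Loop condition "@ITERATION = 1 and IsLast": emit one output row
        # per completed batch and nothing for the other input rows.
        if is_last:
            output.append(final_list)
    return output


print(batch_keys(range(1, 8), 3))
```

With seven keys and a batch size of 3, the final batch is shorter than the others; the LastRow() check is what flushes it.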

Batch Sparse Lookup

The concatenated keys need to be split in the Oracle connector. Splitting a comma-delimited string in Oracle can be
done using the regexp_substr() function and a hierarchical (CONNECT BY) query. For example,

SELECT regexp_substr('A,B,C', '[^,]+', 1, level) FROM dual
CONNECT BY level <= regexp_count('A,B,C', ',') + 1
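
For readers less familiar with Oracle, the query can be emulated in Python. The sketch below is an approximation of the query's behaviour under my reading of it, not Oracle code: the query returns one row per level, the number of levels is the delimiter count plus one, and each row holds the level-th run of non-comma characters, or NULL (None here) when there is no such run. The helper name `split_like_oracle` is hypothetical.

```python
import re


def split_like_oracle(value_list, delimiter=","):
    """Emulate regexp_substr(..., '[^,]+', 1, level) over all levels."""
    # Every maximal run of non-delimiter characters, in order.
    tokens = re.findall(f"[^{re.escape(delimiter)}]+", value_list)
    # CONNECT BY level <= regexp_count(value_list, ',') + 1
    levels = value_list.count(delimiter) + 1
    return [tokens[i] if i < len(tokens) else None for i in range(levels)]


print(split_like_oracle("A,B,C"))
print(split_like_oracle("A,,C"))
```

One consequence of the '[^,]+' pattern is that empty elements are skipped rather than kept in position. This does not matter for the key lists built in the Transformer, which never contain empty entries.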

This is how to set up the query in the Oracle connector stage.

Test run

A test run with BATCH_SIZE of 50 is shown below. DSLink4 indicated that 1,000 queries, instead of 50,000, were
sent to the database.
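
The query count follows directly from the batch size: each batch costs one round trip, and a final partial batch still needs its own query. A small sketch of the arithmetic (the function name `query_count` is hypothetical):

```python
def query_count(input_rows, batch_size):
    """Number of database round trips needed for the given batch size."""
    # Ceiling division: a partially filled last batch still costs one query.
    return (input_rows + batch_size - 1) // batch_size


print(query_count(50_000, 50))
print(query_count(50_000, 1))
```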

Performance Evaluation
Another job (shown below), which uses a regular sparse lookup, serves as the baseline for evaluating the performance of the batch sparse lookup.

The result of the comparison is summarized in the chart below. Batch sparse lookup can cut job running time by
~75%. The most effective batch size is between 20 and 50.

License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

dingjing
United States
