DataStage Presentation

The document provides an overview of key stages in IBM DataStage, including Copy, Sort, Filter, Funnel, and Transformer stages, each with specific functionalities for data processing. It highlights the importance of utilizing stable sorts, macros, and routines for efficient ETL job design and management. Best practices are also discussed to enhance data transformation and ensure consistent outputs.

Uploaded by

vmkkumar22

IBM DataStage - Key Stages

Overview
Professional Summary in 20 Slides
Copy Stage

• Processing stage with one input, multiple outputs


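• Copies the input dataset unchanged to each output link
• Useful for keeping a backup copy or feeding the same data to several downstream branches
• Force property: True keeps the stage in the job; False (default) lets DataStage optimize the Copy away when it has one input and one output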
Sort Stage - Overview

• Single input and output link


• Sorts data on specified keys (asc/desc)
• Allow Duplicates: True (default) retains all records with identical key values; False keeps only one of them
Sort Stage - Output Statistics

• Output Statistics option (off by default) makes the sort report statistics about the sorted data


• Includes record count, distinct keys, memory usage, time
• Supports debugging, validation, and performance monitoring
Sort Stage - Additional Options

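• Create Cluster Key Change Column: for input that is already sorted or grouped, adds a flag set to 1 on the first row of each group and 0 otherwise


• Create Key Change Column: adds a flag set to 1 on the first row of each new key value and 0 otherwise
• Stable Sort: preserves the input order of records with identical key values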
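As a rough illustration of the Key Change Column and Stable Sort options, the following Python sketch (not DataStage code) sorts rows on a key with a stable sort and appends a flag that is 1 on the first row of each key value and 0 otherwise. The function name `sort_with_key_change`, the `keyChange` field name, and the sample rows are assumptions made for the example.

```python
from operator import itemgetter


def sort_with_key_change(rows, key):
    """Stable-sort rows on `key` and add a keyChange flag (1 = first row of a new key value)."""
    # Python's sorted() is stable: rows with equal keys keep their input order.
    ordered = sorted(rows, key=itemgetter(key))
    result = []
    previous = object()  # sentinel that never equals a real key value
    for row in ordered:
        flagged = dict(row)
        flagged["keyChange"] = 1 if row[key] != previous else 0
        previous = row[key]
        result.append(flagged)
    return result


# Hypothetical sample data, not taken from the slides.
rows = [
    {"dept": "B", "name": "Ann"},
    {"dept": "A", "name": "Raj"},
    {"dept": "B", "name": "Lee"},
    {"dept": "A", "name": "Kim"},
]
for row in sort_with_key_change(rows, "dept"):
    print(row)
```

Within each `dept` value the names keep their input order (Raj before Kim, Ann before Lee), which is the guarantee a stable sort provides.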
Sort Utilities

• DataStage sort: Uses tsort operator (default)


• UNIX sort: Uses psort operator, depends on UNIX command
Filter Stage - Overview

• One input link, any number of output links, and an optional reject link


• Each output link has a Where clause, similar to SQL WHERE, that selects its rows
Filter Stage - Criteria Example

• Where clauses applied to the input data:


• 1. Salary > 30000
• 2. Number > 1
• 3. Number = 3
• Rows that match none of the clauses go to the reject link
Filter Stage - Output Control

• Output Rows Only Once:


• True – Outputs on first matched condition only
• False – Outputs to all matched conditions
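The routing behaviour of the Filter stage can be sketched in Python as follows. This is only an illustration of the logic, not DataStage code: the function name `filter_stage` and the sample rows are hypothetical, while the three conditions on `Salary` and `Number` are the ones from the criteria example.

```python
def filter_stage(rows, predicates, output_rows_only_once=False):
    """Route rows to one output list per predicate; unmatched rows go to the reject list."""
    outputs = [[] for _ in predicates]
    rejects = []
    for row in rows:
        matched = False
        for index, predicate in enumerate(predicates):
            if predicate(row):
                outputs[index].append(row)
                matched = True
                if output_rows_only_once:
                    break  # stop at the first matching condition
        if not matched:
            rejects.append(row)
    return outputs, rejects


# The three criteria from the slide, in order.
predicates = [
    lambda r: r["Salary"] > 30000,
    lambda r: r["Number"] > 1,
    lambda r: r["Number"] == 3,
]

# Hypothetical sample rows.
rows = [
    {"Salary": 40000, "Number": 3},  # matches all three conditions
    {"Salary": 20000, "Number": 2},  # matches only the second
    {"Salary": 10000, "Number": 1},  # matches none, so it is rejected
]

all_matches, rejected = filter_stage(rows, predicates, output_rows_only_once=False)
first_match_only, _ = filter_stage(rows, predicates, output_rows_only_once=True)
print([len(o) for o in all_matches], len(rejected))
print([len(o) for o in first_match_only])
```

With the flag off, the first sample row is written to all three outputs; with it on, the same row reaches only the first output.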
Funnel Stage - Overview

• Combines multiple datasets into one


• Requires identical column metadata on all input links
Funnel Stage - Types

• Continuous Funnel – no order guarantee


• Sort Funnel – ordered by key columns
• Sequence – copies all records from the first input, then all from the second, and so on
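The difference between the Sequence and Sort Funnel types can be sketched in Python. This is an illustration only: the function names and sample rows are hypothetical, and the sort funnel sketch assumes each input is already sorted on the key. A Continuous Funnel is not modelled because its output order depends on when rows arrive at run time.

```python
import heapq
from itertools import chain
from operator import itemgetter


def sequence_funnel(*inputs):
    """Emit every row of the first input, then the second, and so on."""
    return list(chain(*inputs))


def sort_funnel(key, *inputs):
    """Merge inputs that are each already sorted on `key`, keeping the output sorted."""
    return list(heapq.merge(*inputs, key=itemgetter(key)))


# Hypothetical inputs with identical structure, each sorted on "id".
input1 = [{"id": 1}, {"id": 4}]
input2 = [{"id": 2}, {"id": 3}]

print(sequence_funnel(input1, input2))
print(sort_funnel("id", input1, input2))
```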
Remove Duplicates Stage

• Removes duplicates from sorted dataset


• Requires sorting on key columns
• Supports single input and output
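Why the input must be sorted becomes clear from a small Python sketch of the idea: duplicates are detected by comparing adjacent rows, so rows with the same key have to arrive together. The function name `remove_duplicates`, the `retain` parameter, and the sample rows are assumptions for the illustration.

```python
from itertools import groupby
from operator import itemgetter


def remove_duplicates(sorted_rows, key, retain="first"):
    """Keep one row per key value from rows already sorted on `key`."""
    result = []
    # groupby only groups adjacent rows, which is why sorted input is required.
    for _, group in groupby(sorted_rows, key=itemgetter(key)):
        group_rows = list(group)
        result.append(group_rows[0] if retain == "first" else group_rows[-1])
    return result


# Hypothetical sample rows, already sorted on "id".
sorted_rows = [
    {"id": 1, "value": "a"},
    {"id": 1, "value": "b"},
    {"id": 2, "value": "c"},
]
print(remove_duplicates(sorted_rows, "id"))
print(remove_duplicates(sorted_rows, "id", retain="last"))
```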
Transformer Stage - Overview

• Performs row-level transformations


• Key components: Stage variables, Constraints, Column Derivations
Transformer - Looping Example

• Input: student_name, marks1, marks2, marks3


• Output: student_name, marks (three rows per input row)
• Loop with @ITERATION to generate rows
Transformer - Loop Condition

• Condition: @ITERATION <= 3


• Derivation: If @ITERATION = 1 Then marks1 Else If @ITERATION = 2 Then marks2 Else marks3
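The looping example can be mimicked in Python to show what the Transformer produces. This is a sketch of the logic rather than DataStage code: `loop_marks` and the sample row are hypothetical, and the local variable `iteration` plays the role of @ITERATION, which starts at 1.

```python
def loop_marks(row, max_iteration=3):
    """Emit one output row per iteration, mimicking a loop condition of iteration <= 3."""
    output = []
    iteration = 1  # stands in for @ITERATION
    while iteration <= max_iteration:
        # Same if/else choice as the column derivation on the slide.
        if iteration == 1:
            marks = row["marks1"]
        elif iteration == 2:
            marks = row["marks2"]
        else:
            marks = row["marks3"]
        output.append({"student_name": row["student_name"], "marks": marks})
        iteration += 1
    return output


# Hypothetical input row.
student = {"student_name": "Asha", "marks1": 70, "marks2": 85, "marks3": 90}
for out_row in loop_marks(student):
    print(out_row)
```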
DS Macros

• DSJobName: Running job name


• DSJobStartTimestamp: Job start time
• DSProjectName: Project name
• DSUserName: User triggering the job
DS Routines

• Reusable code blocks that can be called from a Transformer


• Server routines are written in DataStage BASIC; parallel routines in C or C++
• Encapsulate complex logic and increase reusability
DS Routine - Age Calculation

• Routine 'CalculateAge' takes DOB as input


• Returns age based on today’s date
• Used in Transformer to derive Age column
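The slides do not show the body of 'CalculateAge'. The logic such a routine would typically need can be sketched in Python as below; the signature, the optional `today` argument, and the sample dates are assumptions, and an actual DataStage routine would be written in BASIC or C/C++ instead.

```python
from datetime import date


def calculate_age(dob, today=None):
    """Return the age in whole years for a date of birth, as of `today`."""
    today = today or date.today()
    years = today.year - dob.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years


# Hypothetical date of birth, evaluated against a fixed date for repeatability.
print(calculate_age(date(1990, 6, 15), today=date(2024, 6, 14)))
```

Passing `today` explicitly keeps the result reproducible, which helps when validating the derived Age column against known test data.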
Job Parameters - Overview

• Supply values at run time, commonly for connection details


• Referenced in stage properties as #MyParameter#
• Examples: User Name, Password, Client Number
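The #MyParameter# notation works like a placeholder that is replaced with the parameter's value when the job runs. A minimal Python sketch of that substitution is shown below; the function `resolve_parameters`, the parameter names, and the values are all hypothetical and only illustrate the idea.

```python
import re


def resolve_parameters(text, parameters):
    """Replace each #Name# token with its value; an unknown name raises KeyError."""
    pattern = r"#([A-Za-z_][A-Za-z0-9_.]*)#"
    return re.sub(pattern, lambda match: str(parameters[match.group(1)]), text)


# Hypothetical parameter values supplied at run time.
parameters = {"UserName": "etl_user", "ClientNumber": "042"}
template = "user=#UserName#;client=#ClientNumber#"
print(resolve_parameters(template, parameters))
```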
Best Practices

• Use stable sort when the order of records with equal keys must be reproducible


• Use macros and routines for reusable logic
• Use Filter and Transformer stages to implement complex business rules
Conclusion

• DataStage stages enhance ETL job design


• Efficient data transformation and management
• Key stages: Copy, Sort, Filter, Funnel, Transformer