IBM DataStage - Key Stages
Overview
Professional Summary in 20 Slides
Copy Stage
• Processing stage with one input, multiple outputs
• Copies input dataset to multiple outputs
• Useful for backup or parallel operations
• Force option: True prevents the optimizer from removing the Copy stage; False allows removal
Sort Stage - Overview
• Single input and output link
• Sorts data on specified keys (asc/desc)
• Allow Duplicates: True retains records with identical key values; False keeps only one record per key
Sort Stage - Output Statistics
• Enables tracking of sorted data stats
• Includes record count, distinct keys, memory usage, time
• Supports debugging, validation, and performance monitoring
Sort Stage - Additional Options
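• Create Cluster Key Change Column: Flags the first record of each group when the input is already sorted or grouped
• Create Key Change Column: Flags the first record of each new sort key value (1 on a change, 0 otherwise)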
• Stable Sort: Preserves order of identical key values
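To make the key change and stable sort options concrete, here is a minimal Python sketch that mimics their effect. It is not DataStage code: the function name sort_with_key_change, the sample rows, and the output column name keyChange are assumptions chosen for illustration.

```python
from operator import itemgetter


def sort_with_key_change(rows, key):
    """Stable-sort rows on `key` and flag the first row of each key group."""
    # Python's sorted() is stable: rows with equal keys keep their input order,
    # which is the behavior the Stable Sort option describes.
    ordered = sorted(rows, key=itemgetter(key))
    result = []
    previous = object()  # sentinel that never equals a real key value
    for row in ordered:
        flagged = dict(row)
        # 1 when the key differs from the previous row, otherwise 0
        flagged["keyChange"] = 1 if row[key] != previous else 0
        previous = row[key]
        result.append(flagged)
    return result


# Hypothetical sample data
rows = [
    {"dept": "B", "name": "Ann"},
    {"dept": "A", "name": "Bob"},
    {"dept": "B", "name": "Cid"},
    {"dept": "A", "name": "Dee"},
]

for row in sort_with_key_change(rows, "dept"):
    print(row)
```

Within each dept group the names stay in input order, and only the first row of each group carries keyChange = 1.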
Sort Utilities
• DataStage sort: Uses tsort operator (default)
• UNIX sort: Uses psort operator, depends on UNIX command
Filter Stage - Overview
• One input link, one or more output links, and an optional reject link
• Filters data using WHERE-like conditions
Filter Stage - Criteria Example
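• Input data criteria (one Where clause per output link):
• Where clause 1: Salary > 30000
• Where clause 2: Number > 1
• Where clause 3: Number = 3
• Rows that match none of the criteria go to the reject link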
Filter Stage - Output Control
• Output Row Only Once:
• True – Outputs on first matched condition only
• False – Outputs to all matched conditions
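The interaction between the Where clauses and Output Row Only Once can be sketched in Python. This is a simulation under assumptions, not DataStage code: filter_stage and the sample rows are hypothetical, and the three predicates mirror the criteria example (Salary > 30000, Number > 1, Number = 3).

```python
def filter_stage(rows, predicates, output_row_only_once):
    """Route each row to the output link(s) whose predicate it satisfies."""
    outputs = [[] for _ in predicates]  # one list per output link
    rejects = []                        # rows that match no predicate
    for row in rows:
        matched = False
        for link, predicate in enumerate(predicates):
            if predicate(row):
                outputs[link].append(row)
                matched = True
                if output_row_only_once:
                    break  # stop at the first matching condition
        if not matched:
            rejects.append(row)
    return outputs, rejects


# Predicates mirroring the criteria example
predicates = [
    lambda r: r["Salary"] > 30000,
    lambda r: r["Number"] > 1,
    lambda r: r["Number"] == 3,
]

# Hypothetical sample data
rows = [
    {"Salary": 40000, "Number": 3},
    {"Salary": 20000, "Number": 3},
    {"Salary": 10000, "Number": 1},
]

once, rejected = filter_stage(rows, predicates, output_row_only_once=True)
every, _ = filter_stage(rows, predicates, output_row_only_once=False)
print([len(link) for link in once], len(rejected))
print([len(link) for link in every])
```

With the flag set to True the first row reaches only link 1; with False it reaches all three links because it satisfies every condition.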
Funnel Stage - Overview
• Combines multiple datasets into one
• Requires identical structure across inputs
Funnel Stage - Types
• Continuous Funnel – no order guarantee
• Sort Funnel – ordered by key columns
• Sequence – copies all from input1, then input2, etc.
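The difference between the Sequence and Sort Funnel types can be illustrated with a short Python sketch. It is only an analogy: sequence_funnel and sort_funnel are hypothetical names, and the sort variant assumes each input is already sorted on the key. Continuous Funnel is not shown because its output order is not guaranteed.

```python
import heapq
from itertools import chain
from operator import itemgetter


def sequence_funnel(*inputs):
    """Emit every row of the first input, then the second, and so on."""
    return list(chain(*inputs))


def sort_funnel(*inputs, key):
    """Merge inputs into one output ordered by `key`.

    Assumption: each input is already sorted on `key`.
    """
    return list(heapq.merge(*inputs, key=itemgetter(key)))


# Hypothetical inputs with identical structure
input1 = [{"id": 1}, {"id": 4}, {"id": 7}]
input2 = [{"id": 2}, {"id": 3}, {"id": 9}]

print([r["id"] for r in sequence_funnel(input1, input2)])
print([r["id"] for r in sort_funnel(input1, input2, key="id")])
```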
Remove Duplicates Stage
• Removes duplicates from sorted dataset
• Requires sorting on key columns
• Supports single input and output
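Why the input must be sorted becomes clear in a Python sketch of the same idea: duplicates are detected by comparing adjacent rows, so equal keys have to sit next to each other. The function remove_duplicates and its retain argument are hypothetical names for illustration, not DataStage code.

```python
from itertools import groupby
from operator import itemgetter


def remove_duplicates(sorted_rows, key, retain="first"):
    """Keep one row per key value from rows already sorted on `key`."""
    result = []
    # groupby only groups adjacent rows, hence the sorting requirement
    for _, group in groupby(sorted_rows, key=itemgetter(key)):
        group = list(group)
        result.append(group[0] if retain == "first" else group[-1])
    return result


# Hypothetical sample data, already sorted on "id"
rows = [
    {"id": 1, "v": "a"},
    {"id": 1, "v": "b"},
    {"id": 2, "v": "c"},
    {"id": 3, "v": "d"},
    {"id": 3, "v": "e"},
]

print(remove_duplicates(rows, "id"))
print(remove_duplicates(rows, "id", retain="last"))
```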
Transformer Stage - Overview
• Performs row-level transformations
• Key components: Stage variables, Constraints, Column Derivations
Transformer - Looping Example
• Input: student_name, marks1, marks2, marks3
• Output: student_name, marks (in 3 rows)
• Loop with @ITERATION to generate rows
Transformer - Loop Condition
• Condition: @ITERATION <= 3
• Derivation: if/else to choose marks1, marks2, marks3
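The looping logic above can be sketched in Python to show how one input row becomes three output rows. This is a simulation, not Transformer syntax: transformer_loop is a hypothetical name, and the local variable iteration stands in for @ITERATION.

```python
def transformer_loop(row, iterations=3):
    """Expand one wide input row into one output row per marks column."""
    output_rows = []
    iteration = 1  # stands in for @ITERATION, which starts at 1
    while iteration <= iterations:  # loop condition: @ITERATION <= 3
        # Derivation: if/else picks the marks column for this iteration
        if iteration == 1:
            marks = row["marks1"]
        elif iteration == 2:
            marks = row["marks2"]
        else:
            marks = row["marks3"]
        output_rows.append({"student_name": row["student_name"], "marks": marks})
        iteration += 1
    return output_rows


# Hypothetical input row
student = {"student_name": "Asha", "marks1": 70, "marks2": 85, "marks3": 90}

for out_row in transformer_loop(student):
    print(out_row)
```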
DS Macros
• DSJobName: Running job name
• DSJobStartTimestamp: Job start time
• DSProjectName: Project name
• DSUserName: User triggering the job
DS Routines
• Reusable code blocks in Transformer
• Server routines are written in DataStage BASIC; parallel routines in C or C++
• Encapsulate complex logic, increase reusability
DS Routine - Age Calculation
• Routine 'CalculateAge' takes DOB as input
• Returns age based on today’s date
• Used in Transformer to derive Age column
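The age calculation itself can be sketched in Python. The source does not show the body of 'CalculateAge', so this is an assumed version of the logic rather than the actual routine; calculate_age and its optional today argument are hypothetical.

```python
from datetime import date


def calculate_age(dob, today=None):
    """Return the age in whole years on `today` (defaults to the current date)."""
    if today is None:
        today = date.today()
    # Subtract one year if this year's birthday has not happened yet
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


# A fixed reference date keeps the example reproducible
print(calculate_age(date(1990, 6, 15), today=date(2024, 6, 14)))
print(calculate_age(date(1990, 6, 15), today=date(2024, 6, 15)))
```

Passing the reference date explicitly makes the function easy to test; in a job the default of today's date would normally apply.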
Job Parameters - Overview
• Supply values such as connection details at run time instead of hard-coding them
• Format: #MyParameter#
• Examples: User Name, Password, Client Number
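How #MyParameter# placeholders resolve at run time can be illustrated with a small Python sketch. It only imitates the substitution idea: resolve_parameters, the parameter names, and the sample values are all hypothetical, and real jobs resolve parameters inside DataStage itself.

```python
import re


def resolve_parameters(text, parameters):
    """Replace each #Name# placeholder with its value from `parameters`."""
    def substitute(match):
        name = match.group(1)
        # An undefined parameter raises KeyError instead of passing silently
        return str(parameters[name])

    return re.sub(r"#(\w+)#", substitute, text)


# Hypothetical parameter values supplied at run time
run_values = {"UserName": "etl_user", "ClientNumber": 42}

print(resolve_parameters("user=#UserName#;client=#ClientNumber#", run_values))
```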
Best Practices
• Use stable sort for consistent outputs
• Use macros and routines for reusable logic
• Use Filter and Transformer stages to implement complex business rules
Conclusion
• DataStage stages enhance ETL job design
• Efficient data transformation and management
• Key stages: Copy, Sort, Filter, Funnel, Remove Duplicates, Transformer