
INFORMATICA hand book

ORACLE STATEMENTS
Data Definition Language (DDL)
Create
Alter
Drop
Truncate
Data Manipulation Language (DML)
Insert
Update
Delete
Data Querying Language (DQL)
Select
Data Control Language (DCL)
Grant
Revoke
Transactional Control Language (TCL)
Commit
Rollback
Save point

Syntaxes:

CREATE OR REPLACE SYNONYM HZ_PARTIES FOR SCOTT.HZ_PARTIES;

CREATE DATABASE LINK CAASEDW CONNECT TO ITO_ASA IDENTIFIED BY exact123 USING 'CAASEDW';

Materialized View syntax:


CREATE MATERIALIZED VIEW EBIBDRO.HWMD_MTH_ALL_METRICS_CURR_VIEW
REFRESH COMPLETE
START WITH sysdate
NEXT TRUNC(SYSDATE+1)+ 4/24
WITH PRIMARY KEY
AS
select * from HWMD_MTH_ALL_METRICS_CURR_VW;
Another Method to refresh:
DBMS_MVIEW.REFRESH('MV_COMPLEX', 'C');

Case Statement:

Select NAME,
       (CASE
          WHEN (CLASS_CODE = 'Subscription') THEN ATTRIBUTE_CATEGORY
          ELSE TASK_TYPE
        END) TASK_TYPE,
       CURRENCY_CODE
From EMP;

Decode()
Select empname, Decode(address, 'HYD', 'Hyderabad',
       'Bang', 'Bangalore', address) as address from emp;


Procedure:
CREATE OR REPLACE PROCEDURE Update_bal (
  cust_id_IN IN NUMBER,
  amount_IN  IN NUMBER DEFAULT 1) AS
BEGIN
  UPDATE account_tbl SET amount = amount_IN WHERE cust_id = cust_id_IN;
END;
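It can then be executed from SQL*Plus, for example (the customer id and amount values are illustrative):

EXEC Update_bal(101);        -- relies on the default amount_IN of 1
EXEC Update_bal(101, 250);   -- passes both parameters
COMMIT;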
Trigger:

CREATE OR REPLACE TRIGGER EMP_AUR
AFTER UPDATE ON EMP          -- can also be defined as a BEFORE trigger
REFERENCING
  NEW AS NEW
  OLD AS OLD
FOR EACH ROW
DECLARE
BEGIN
  IF (:NEW.last_upd_tmst <> :OLD.last_upd_tmst) THEN
    -- Insert a record into the control table
    INSERT INTO emp_w VALUES ('wrk', SYSDATE);
  ELSE
    -- Call a procedure
    update_sysdate;
  END IF;
END;

ORACLE JOINS:

Equi join
Non-equi join
Self join
Natural join
Cross join
Outer join
 Left outer
 Right outer
 Full outer
Equi Join/Inner Join:

SQL> select empno,ename,job,dname,loc from emp e,dept d where e.deptno=d.deptno;


USING CLAUSE
SQL> select empno,ename,job ,dname,loc from emp e join dept d using(deptno);
ON CLAUSE
SQL> select empno,ename,job,dname,loc from emp e join dept d on(e.deptno=d.deptno);
Non-Equi Join
A join that uses an operator other than '=' in the join condition.


Ex: SQL> select empno,ename,job,dname,loc from emp e,dept d where e.deptno > d.deptno;
Self Join
Joining the table itself is called self join.
Ex: SQL> select e1.empno,e2.ename,e1.job,e2.deptno from emp e1,emp e2 where e1.empno=e2.mgr;
Natural Join
Natural join compares all the common columns.
Ex: SQL> select empno,ename,job,dname,loc from emp natural join dept;
Cross Join
This gives the Cartesian (cross) product.
Ex: SQL> select empno,ename,job,dname,loc from emp cross join dept;
Outer Join
Outer join gives the non-matching records along with matching records.
Left Outer Join
This displays all matching records, plus the rows from the left-hand table that have no match in the right-hand table.
Ex: SQL> select empno,ename,job,dname,loc from emp e left outer join dept d on(e.deptno=d.deptno);
Or
SQL> select empno,ename,job,dname,loc from emp e,dept d where
e.deptno=d.deptno(+);
Right Outer Join
This displays all matching records, plus the rows from the right-hand table that have no match in the left-hand table.
Ex: SQL> select empno,ename,job,dname,loc from emp e right outer join dept d on(e.deptno=d.deptno);
Or
SQL> select empno,ename,job,dname,loc from emp e,dept d where e.deptno(+) = d.deptno;
Full Outer Join
This displays all matching records and the non-matching records from both tables.
Ex: SQL> select empno,ename,job,dname,loc from emp e full outer join dept d on(e.deptno=d.deptno);
OR
SQL> select p.part_id, s.supplier_name
     from part p, supplier s
     where p.supplier_id = s.supplier_id (+)
     union
     select p.part_id, s.supplier_name
     from part p, supplier s
     where p.supplier_id (+) = s.supplier_id;

What’s the difference between View and Materialized View?

View:
Why Use Views?
• To restrict data access
• To make complex queries easy
• To provide data independence
A simple view is one that:
– Derives data from only one table
– Contains no functions or groups of data
– Can perform DML operations through the view.


A complex view is one that:


– Derives data from many tables
– Contains functions or groups of data
– Does not always allow DML operations through the view

A view has a logical existence, but a materialized view has a physical existence. Moreover, a materialized view can be indexed,
analyzed and so on; everything that we can do with a table can also be done with a materialized view.

We can keep aggregated data in a materialized view. We can schedule the MV to refresh, which a view cannot do. An MV can be created
based on multiple tables.

Materialized View:
In a DWH, materialized views are essential because doing aggregate calculations on the reporting side as per the business
requirement degrades report performance. To improve report performance, rather than doing the calculations and joins at
reporting time, we put the same logic in the MV; then we can select the data directly from the MV without any joins or
aggregations. We can also schedule the MV (materialized view) to refresh.
Inline view:
A select statement written in the FROM clause is called an inline view.

Ex:
Get department-wise max sal along with empname and empno.

Select a.empname, a.empno, b.sal, b.deptno
From EMP a, (Select max(sal) sal, deptno from EMP group by deptno) b
Where a.sal = b.sal
  and a.deptno = b.deptno;

What is the difference between view and materialized view?

 A view has a logical existence and does not contain data; a materialized view has a physical existence.
 A view is not a database object; a materialized view is a database object.
 We cannot perform DML operations on a view; we can perform DML operations on a materialized view.
 When we do select * from a view, it fetches the data from the base table; when we do select * from a materialized view, it fetches the data stored in the materialized view itself.
 A view cannot be scheduled to refresh; a materialized view can be scheduled to refresh.
 We can keep aggregated data in a materialized view.
 A materialized view can be created based on multiple tables.

What is the Difference between Delete, Truncate and Drop?


DELETE
The DELETE command is used to remove rows from a table. A WHERE clause can be used to only remove some rows. If no
WHERE condition is specified, all rows will be removed. After performing a DELETE operation you need to COMMIT or
ROLLBACK the transaction to make the change permanent or to undo it.
TRUNCATE
TRUNCATE removes all rows from a table. The operation cannot be rolled back. As such, TRUNCATE is faster and doesn't use
as much undo space as a DELETE.
DROP
The DROP command removes a table from the database. All the tables' rows, indexes and privileges will also be removed. The
operation cannot be rolled back.
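A short sketch contrasting the three commands (emp_copy is a hypothetical copy of the EMP table):

-- DELETE: removes selected rows and can be rolled back before COMMIT
DELETE FROM emp_copy WHERE deptno = 10;
ROLLBACK;

-- TRUNCATE: removes all rows; it is DDL and cannot be rolled back
TRUNCATE TABLE emp_copy;

-- DROP: removes the table itself along with its rows, indexes and privileges
DROP TABLE emp_copy;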

Difference between Rowid and Rownum?


ROWID
A globally unique identifier for a row in a database. It is created at the time the row is inserted into a table, and destroyed when
it is removed from the table. Its format is 'BBBBBBBB.RRRR.FFFF', where BBBBBBBB is the block number, RRRR is the slot (row) number,
and FFFF is the file number.

ROWNUM
For each row returned by a query, the ROWNUM pseudo column returns a number indicating the order in which Oracle selects
the row from a table or set of joined rows. The first row selected has a ROWNUM of 1, the second has 2, and so on.

You can use ROWNUM to limit the number of rows returned by a query, as in this example:

SELECT * FROM employees WHERE ROWNUM < 10;


SELECT * FROM employees WHERE ROWNUM > 10; /* Greater-than does not work on ROWNUM: this query returns no rows */

Rowid vs. Rownum:

 ROWID is an Oracle internal ID that is allocated every time a new record is inserted into a table; this ID is unique and cannot be changed by the user. ROWNUM is a row number returned by a select statement.
 ROWID is permanent; ROWNUM is temporary.
 ROWID is a globally unique identifier for a row in a database, created when the row is inserted into the table and destroyed when it is removed. The ROWNUM pseudocolumn returns a number indicating the order in which Oracle selects the row from a table or set of joined rows.
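Because ROWNUM is assigned before the outer filter is evaluated, ROWNUM > 10 on its own returns no rows; nesting the query is the usual workaround. A minimal sketch against the standard EMP table:

-- Rows 11 to 20 by salary: assign ROWNUM in the inner query first
SELECT *
FROM (SELECT e.*, ROWNUM rn
      FROM (SELECT * FROM emp ORDER BY sal DESC) e)
WHERE rn BETWEEN 11 AND 20;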

Order of where and having:


SELECT column, group_function
FROM table
[WHERE condition]
[GROUP BY group_by_expression]
[HAVING group_condition]
[ORDER BY column];

The WHERE clause cannot be used to restrict groups; use the HAVING clause to restrict groups.

Differences between where clause and having clause

 Both the WHERE and HAVING clauses can be used to filter data.
 The WHERE clause can be used without GROUP BY, whereas HAVING must be used together with GROUP BY.
 The WHERE clause applies to individual rows, whereas the HAVING clause tests a condition on the group rather than on individual rows.
 The WHERE clause is used to restrict rows; the HAVING clause is used to restrict groups.
 WHERE restricts a normal query; HAVING restricts group (aggregate) functions.
 In the WHERE clause every record is filtered individually; in the HAVING clause filtering is done on aggregated records (group functions).

MERGE Statement

You can use the MERGE command to perform insert and update in a single statement.
Ex: Merge into student1 s1
Using (select * from student2) s2
On (s1.no=s2.no)
When matched then
Update set marks = s2.marks
When not matched then
Insert (s1.no, s1.name, s1.marks) Values (s2.no, s2.name, s2.marks);
What is the difference between sub-query & co-related sub query?
A sub query is executed once for the parent statement
whereas the correlated sub query is executed once for each
row of the parent query.
Sub Query:
Example:
Select deptno, ename, sal from emp a where sal in (select sal from Grade where sal_grade='A' or sal_grade='B');
Co-Related Sub query:
Example:
Find all employees who earn more than the average salary in their department.
SELECT last_name, salary, department_id FROM employees A
WHERE salary > (SELECT AVG (salary)
                FROM employees B WHERE B.department_id = A.department_id
                GROUP BY B.department_id);
EXISTS:
The EXISTS operator tests for existence of rows in
the results set of the subquery.
Select dname from dept where exists
(select 1 from EMP
where dept.deptno= emp.deptno);

Sub-query vs. Co-related sub-query:

 A sub-query is executed once for the parent query, whereas a co-related sub-query is executed once for each row of the parent query.
 Example (sub-query): Select * from emp where deptno in (select deptno from dept);
 Example (co-related sub-query): Select e.* from emp e where sal >= (select avg(sal) from emp a where a.deptno = e.deptno group by a.deptno);

Indexes:
1. Bitmap indexes are most appropriate for columns having low distinct values—such as GENDER,
   MARITAL_STATUS, and RELATION. This assumption is not completely accurate, however. In reality, a
   bitmap index is always advisable for systems in which data is not frequently updated by many concurrent sessions.
   In fact, a bitmap index on a column with 100-percent unique values (a column that is a candidate for the
   primary key) is as efficient as a B-tree index.
2. When to create an index. You should create an index if:
    A column contains a wide range of values
    A column contains a large number of null values
    One or more columns are frequently used together in a WHERE clause or a join condition
    The table is large and most queries are expected to retrieve less than 2 to 4 percent of the rows
3. By default, if you create an index it is a B-tree index.
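A minimal sketch of the two index types mentioned above (the index names and the low-cardinality GENDER column are illustrative):

-- Default B-tree index on a column used in joins/WHERE clauses
CREATE INDEX emp_deptno_idx ON emp (deptno);

-- Bitmap index on a low-cardinality column (common in DWH/reporting tables)
CREATE BITMAP INDEX emp_gender_bix ON emp (gender);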

Why are hints required?

It is a perfectly valid question to ask why hints should be used. Oracle comes with an optimizer that promises to optimize a
query's execution plan. When this optimizer is really doing a good job, no hints should be required at all.

Sometimes, however, the characteristics of the data in the database change rapidly, so that the optimizer (or more
accurately, its statistics) is out of date. In this case, a hint could help.
You should first get the explain plan of your SQL and determine what changes can be made to make the code operate without
using hints if possible. However, hints such as ORDERED, LEADING, INDEX, FULL, and the various AJ and SJ hints can
take a wild optimizer and give you optimal performance.

Analyzing tables and updating statistics: the ANALYZE statement


The ANALYZE statement can be used to gather statistics for a specific table, index or cluster. The statistics can be computed
exactly, or estimated based on a specific number of rows, or a percentage of rows:
ANALYZE TABLE employees COMPUTE STATISTICS;

ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 15 PERCENT;

EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES');

Automatic Optimizer Statistics Collection


By default Oracle 10g automatically gathers optimizer statistics using a scheduled job called GATHER_STATS_JOB. By
default this job runs within a maintenance window between 10 P.M. and 6 A.M. on weeknights and all day on weekends. The job
calls the DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC internal procedure which gathers statistics for tables
with either empty or stale statistics, similar to the DBMS_STATS.GATHER_DATABASE_STATS procedure using the
GATHER AUTO option. The main difference is that the internal job prioritizes the work such that tables most urgently
requiring statistics updates are processed first.

Hint categories:
Hints can be categorized as follows:
 ALL_ROWS
One of the hints that 'invokes' the Cost based optimizer
ALL_ROWS is usually used for batch processing or data warehousing systems.
(/*+ ALL_ROWS */)
 FIRST_ROWS
One of the hints that 'invokes' the Cost based optimizer
FIRST_ROWS is usually used for OLTP systems.
(/*+ FIRST_ROWS */)
 CHOOSE
One of the hints that 'invokes' the Cost based optimizer
This hint lets the server choose between ALL_ROWS and FIRST_ROWS, based on the statistics gathered.
 Hints for Join Orders,
 Hints for Join Operations,
 Hints for Parallel Execution, (/*+ parallel(a,4) */) specify degree either 2 or 4 or 16
 Additional Hints
 HASH
Hashes one table (full scan) and creates a hash index for that table, then hashes the other table and uses the hash index to
find corresponding records. Therefore it is not suitable for < or > join conditions.
/*+ use_hash */

Use a hint to force the use of an index:

SELECT /*+ INDEX (TABLE_NAME INDEX_NAME) */ COL1, COL2 FROM TABLE_NAME;

SELECT /*+ USE_HASH(d) */ e.empno, d.dname FROM emp e, dept d WHERE e.deptno = d.deptno;

ORDERED - This hint forces tables to be joined in the order specified. If you know table X has fewer rows, then ordering it
first may speed execution in a join.
PARALLEL (table, instances) - This specifies that the operation is to be done in parallel.
If an index cannot be created, we can go for /*+ parallel(table, 8) */ for SELECT and UPDATE statements, for example when the
WHERE clause uses conditions such as LIKE, NOT IN, >, < or <>.
Explain Plan:
Explain plan tells us whether the query is properly using indexes or not and what the cost of each step is, e.g. whether it is doing a full table
scan or not; based on these statistics we can tune the query.
The explain plan process stores data in the PLAN_TABLE. This table can be located in the current schema or a shared schema
and is created using SQL*Plus as follows:
SQL> CONN sys/password AS SYSDBA
Connected
SQL> @$ORACLE_HOME/rdbms/admin/utlxplan.sql
SQL> GRANT ALL ON sys.plan_table TO public;

SQL> CREATE PUBLIC SYNONYM plan_table FOR sys.plan_table;
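Once PLAN_TABLE is in place, a typical way to generate and display a plan (using the DBMS_XPLAN package shipped with Oracle; the query itself is illustrative):

EXPLAIN PLAN FOR
SELECT e.empno, e.ename, d.dname
FROM emp e JOIN dept d ON e.deptno = d.deptno;

-- Display the most recent plan stored in PLAN_TABLE
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);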

What is your tuning approach if a SQL query is taking a long time? Or how do you tune a SQL query?
If a query is taking a long time, first I run the query through Explain Plan; the explain plan process stores data in the
PLAN_TABLE.
It gives us the execution plan of the query, i.e. whether the query is using the relevant indexes on the joining columns or whether
indexes to support the query are missing.
If the joining columns don't have indexes it will do a full table scan; if it is a full table scan the cost will be higher, so we
create indexes on the joining columns and run the query again, which should give better performance. We also need to analyze
the tables if the last analysis happened long ago. The ANALYZE statement can be used to gather statistics for a specific table,
index or cluster using
ANALYZE TABLE employees COMPUTE STATISTICS;
If we still have a performance issue then we use HINTS; a hint is nothing but a clue. We can use hints like:
 ALL_ROWS
One of the hints that 'invokes' the Cost based optimizer
ALL_ROWS is usually used for batch processing or data warehousing systems.
(/*+ ALL_ROWS */)
 FIRST_ROWS
One of the hints that 'invokes' the Cost based optimizer
FIRST_ROWS is usually used for OLTP systems.
(/*+ FIRST_ROWS */)
 CHOOSE
One of the hints that 'invokes' the Cost based optimizer
This hint lets the server choose between ALL_ROWS and FIRST_ROWS, based on the statistics gathered.
 HASH
Hashes one table (full scan) and creates a hash index for that table, then hashes the other table and uses the hash index to
find corresponding records. Therefore it is not suitable for < or > join conditions.
/*+ use_hash */
Hints are most useful to optimize the query performance.

What are the differences between stored procedures and triggers?


Stored procedure normally used for performing tasks


But the Trigger normally used for tracing and auditing logs.

Stored procedures should be called explicitly by the user in order to execute


But the Trigger should be called implicitly based on the events defined in the table.

Stored Procedure can run independently


But the Trigger should be part of any DML events on the table.

Stored procedure can be executed from the Trigger


But the Trigger cannot be executed from the Stored procedures.

Stored Procedures can have parameters.


But the Trigger cannot have any parameters.
Stored procedures are compiled collection of programs or SQL statements in the database.
Using stored procedure we can access and modify data present in many tables.
Also a stored procedure is not associated with any particular database object.
But triggers are event-driven special procedures which are attached to a specific database object say a table.
Stored procedures are not automatically run; they have to be called explicitly by the user. But triggers get executed automatically
when the event associated with them is fired.
Packages:
Packages provide a method of encapsulating related procedures, functions, and associated cursors and variables together as a
unit in the database.
A package is a group of related procedures and functions, together with the cursors and variables they use. For example, a
package can contain several procedures and functions that process data related to the same transactions.
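A minimal sketch of a package specification and body grouping two related routines around the account_tbl table used earlier (the package name and logic are illustrative):

CREATE OR REPLACE PACKAGE acct_pkg AS
  PROCEDURE update_bal (p_cust_id IN NUMBER, p_amount IN NUMBER);
  FUNCTION  get_bal    (p_cust_id IN NUMBER) RETURN NUMBER;
END acct_pkg;
/

CREATE OR REPLACE PACKAGE BODY acct_pkg AS
  -- Procedure: updates the balance for a customer
  PROCEDURE update_bal (p_cust_id IN NUMBER, p_amount IN NUMBER) IS
  BEGIN
    UPDATE account_tbl SET amount = p_amount WHERE cust_id = p_cust_id;
  END update_bal;

  -- Function: returns the current balance for a customer
  FUNCTION get_bal (p_cust_id IN NUMBER) RETURN NUMBER IS
    v_amount NUMBER;
  BEGIN
    SELECT amount INTO v_amount FROM account_tbl WHERE cust_id = p_cust_id;
    RETURN v_amount;
  END get_bal;
END acct_pkg;
/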

Triggers:
Oracle lets you define procedures called triggers that run implicitly when an INSERT, UPDATE, or DELETE statement is
issued against the associated table.
Triggers are similar to stored procedures. A trigger stored in the database can include SQL and PL/SQL.

Types of Triggers
This section describes the different types of triggers:
 Row Triggers and Statement Triggers
 BEFORE and AFTER Triggers
 INSTEAD OF Triggers
 Triggers on System Events and User Events
Row Triggers
A row trigger is fired each time the table is affected by the triggering statement. For example, if an UPDATE statement updates
multiple rows of a table, a row trigger is fired once for each row affected by the UPDATE statement. If a triggering statement
affects no rows, a row trigger is not run.
BEFORE and AFTER Triggers
When defining a trigger, you can specify the trigger timing--whether the trigger action is to be run before or after the triggering
statement. BEFORE and AFTER apply to both statement and row triggers.
BEFORE and AFTER triggers fired by DML statements can be defined only on tables, not on views.
Difference between Trigger and Procedure
 A trigger does not need to be executed manually; it is fired automatically, whereas a procedure needs to be executed manually.
 Triggers run implicitly when an INSERT, UPDATE, or DELETE statement is issued against the associated table.


Differences between stored procedure and functions
 A stored procedure may or may not return values; a function must return a value through its RETURN clause, and can return additional values using OUT arguments.
 A stored procedure is typically used to implement business logic; a function is typically used for calculations.
 A stored procedure is stored in the database in compiled (pseudo-code) form; a function call issued from SQL is parsed and resolved at runtime.
 A stored procedure can accept any number of IN, OUT and IN OUT arguments; a function mainly takes IN arguments and returns its result through the RETURN clause.
 Stored procedures are mainly used to process tasks; functions are mainly used to compute values.
 A stored procedure cannot be invoked from SQL statements such as SELECT; a function can be invoked from SQL statements, e.g. in a SELECT.
 A stored procedure can affect the state of the database using COMMIT; a function called from SQL cannot affect the state of the database.
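For comparison with the earlier Update_bal procedure, a minimal function sketch against the same illustrative account_tbl:

CREATE OR REPLACE FUNCTION get_balance (cust_id_IN IN NUMBER) RETURN NUMBER AS
  v_amount NUMBER;
BEGIN
  SELECT amount INTO v_amount FROM account_tbl WHERE cust_id = cust_id_IN;
  RETURN v_amount;
END;
/

-- A function can be invoked from a SQL statement; a procedure cannot
SELECT cust_id, get_balance(cust_id) FROM account_tbl;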

Data files Overview:


A tablespace in an Oracle database consists of one or more physical datafiles. A datafile can be associated with only one
tablespace and only one database.
Table Space:
Oracle stores data logically in tablespaces and physically in datafiles associated with the corresponding tablespace.
A database is divided into one or more logical storage units called tablespaces. Tablespaces are divided into logical units of
storage called segments.
Control File:
A control file contains information about the associated database that is required for access by an instance, both at startup and
during normal operation. Control file information can be modified only by Oracle; no database administrator or user can edit a
control file.

IMPORTANT QUERIES

1. Get duplicate rows from the table:


Select empno, count (*) from EMP group by empno having count (*)>1;
2. Remove duplicates in the table:
Delete from EMP where rowid not in (select max (rowid) from EMP group by empno);
3. The query below transposes columns into rows.
Name No Add1 Add2
abc 100 hyd bang
xyz 200 Mysore pune

Select name, no, add1 from A


UNION
Select name, no, add2 from A;

4. The query below transposes rows into columns.


select emp_id,
       max(decode(row_id, 0, address)) as address1,
       max(decode(row_id, 1, address)) as address2,
       max(decode(row_id, 2, address)) as address3
from (select emp_id, address, mod(rownum, 3) row_id from temp order by emp_id)
group by emp_id;

Other query:

select emp_id,
       max(decode(rank_id, 1, address)) as add1,
       max(decode(rank_id, 2, address)) as add2,
       max(decode(rank_id, 3, address)) as add3
from (select emp_id, address, rank() over (partition by emp_id order by emp_id, address) rank_id from temp)
group by emp_id;
5. Rank query:
Select empno, ename, sal, r from (select empno, ename, sal, rank () over (order by sal desc) r from EMP);
6. Dense rank query:
The DENSE_RANK function acts like the RANK function except that it assigns consecutive ranks:
Select empno, ename, sal, r from (select empno, ename, sal, dense_rank () over (order by sal desc) r from emp);
7. Top 5 salaries by using rank:
Select empno, ename, sal,r from (select empno,ename,sal,dense_rank() over (order by sal desc) r from emp) where r<=5;
Or
Select * from (select * from EMP order by sal desc) where rownum<=5;
8. 2nd highest sal:
Select empno, ename, sal, r from (select empno, ename, sal, dense_rank () over (order by sal desc) r from EMP) where r=2;
9. Top sal:
Select * from EMP where sal= (select max (sal) from EMP);
10. How to display alternative rows in a table?

SQL> select *from emp where (rowid, 0) in (select rowid,mod(rownum,2) from emp);
11. Hierarchical queries
Starting at the root, walk from the top down, and eliminate employee Higgins in the result, but
process the child rows.
SELECT department_id, employee_id, last_name, job_id, salary
FROM employees
WHERE last_name != 'Higgins'
START WITH manager_id IS NULL
CONNECT BY PRIOR employee_id = manager_id;


DWH CONCEPTS
What is BI?
Business Intelligence refers to a set of methods and techniques that are used by organizations for tactical and strategic decision
making. It leverages methods and technologies that focus on counts, statistics and business objectives to improve business
performance.
The objective of Business Intelligence is to better understand customers and improve customer service, make the supply and
distribution chain more efficient, and to identify and address business problems and opportunities quickly.
A warehouse is used for high-level data analysis. It is used for predictions, time-series analysis, financial
analysis, what-if simulations etc. Basically it is used for better decision making.

What is a Data Warehouse?

Data Warehouse is a "Subject-Oriented, Integrated, Time-Variant, Nonvolatile collection of data in support of decision
making".
In terms of design data warehouse and data mart are almost the same.
In general a Data Warehouse is used on an enterprise level and a Data Marts is used on a business division/department level.
Subject Oriented:
Data that gives information about a particular subject instead of about a company's ongoing operations.
Integrated:
Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
Time-variant:
All data in the data warehouse is identified with a particular time period.
Non-volatile:
Data is stable in a data warehouse. More data is added but data is never removed.
What is a DataMart?
A data mart is usually sponsored at the department level and developed with a specific subject in mind; a data mart is
a subset of a data warehouse with a focused objective.

What is the difference between a data warehouse and a data mart?

In terms of design data warehouse and data mart are almost the same.
In general a Data Warehouse is used on an enterprise level and a Data Marts is used on a business division/department level.
A data mart only contains data specific to a particular subject areas.

Difference between data mart and data warehouse

 A data mart is usually sponsored at the department level and developed with a specific issue or subject in mind; it is a data warehouse with a focused objective. A data warehouse is a "Subject-Oriented, Integrated, Time-Variant, Nonvolatile collection of data in support of decision making".
 A data mart is used at a business division/department level, whereas a data warehouse is used at an enterprise level.
 A data mart is a subset of data from a data warehouse, built for specific user groups. A data warehouse is an integrated consolidation of data from a variety of sources that is specially designed to support strategic and tactical decision making.
 By providing decision makers with only a subset of data from the data warehouse, privacy, performance and clarity objectives can be attained. The main objective of a data warehouse is to provide an integrated environment and a coherent picture of the business at a point in time.

What is a factless fact table?

A fact table that contains only primary keys from the dimension tables and does not contain any measures is called a
factless fact table.

What is a Schema?

A graphical representation of the data structure; it is the first phase in the implementation of a Universe.

What are the most important features of a data warehouse?

DRILL DOWN, DRILL ACROSS, graphs, pie charts, dashboards and TIME HANDLING.

Being able to drill down/drill across is the most basic requirement of an end user in a data warehouse. Drilling down most
directly addresses the natural end-user need to see more detail in a result. Drill down should be as generic as possible because
there is absolutely no good way to predict a user's drill-down path.

What does it mean by grain of the star schema?

In Data warehousing grain refers to the level of detail available in a given fact table as well as to the level of detail provided
by a star schema.
It is usually given as the number of records per key within the table. In general, the grain of the fact table is the grain of the star
schema.

What is a star schema?

A star schema is a data warehouse schema where there is only one "fact table" and many denormalized dimension tables.

The fact table contains primary keys from all the dimension tables and other columns of additive, numeric facts.


What is a snowflake schema?

Unlike the star schema, a snowflake schema contains normalized dimension tables in a tree-like structure with many nesting
levels.

Snowflake schema is easier to maintain but queries require more joins.

What is the difference between snowflake and star schema?

 The star schema is the simplest data warehouse schema; the snowflake schema is a more complex data warehouse model than a star schema.
 In a star schema each dimension is represented in a single table and there should not be any hierarchies between dimension tables; in a snowflake schema at least one hierarchy exists between dimension tables.
 Both contain a fact table surrounded by dimension tables: if the dimensions are de-normalized it is a star schema design, and if a dimension is normalized it is a snowflaked design.
 In a star schema only one join establishes the relationship between the fact table and any one dimension table; in a snowflake schema, because there are relationships between the dimension tables, many joins are needed to fetch the data.
 A star schema optimizes performance by keeping queries simple and providing fast response time (all the information about each level is stored in one row); snowflake schemas normalize dimensions to eliminate redundancy, and the result is more complex queries and reduced query performance.
 It is called a star schema because the diagram resembles a star; it is called a snowflake schema because the diagram resembles a snowflake.
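A minimal star-schema sketch with one fact table and two dimensions (all table and column names are illustrative):

CREATE TABLE dim_product (
  product_key    NUMBER PRIMARY KEY,
  product_name   VARCHAR2(100),
  product_family VARCHAR2(50)
);

CREATE TABLE dim_time (
  time_key      NUMBER PRIMARY KEY,
  calendar_date DATE,
  fiscal_month  VARCHAR2(10)
);

CREATE TABLE fact_sales (
  product_key NUMBER REFERENCES dim_product (product_key),
  time_key    NUMBER REFERENCES dim_time (time_key),
  qty_sold    NUMBER,   -- additive fact
  sales_amt   NUMBER    -- additive fact
);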

What is Fact and Dimension?

A "fact" is a numeric value that a business wishes to count or sum. A "dimension" is essentially an entry point for getting at the
facts. Dimensions are things of interest to the business.

A set of level properties that describe a specific aspect of a business, used for analyzing the factual measures.

What is a Fact Table?

A fact table in a dimensional model consists of one or more numeric facts of importance to a business. Examples of facts are
as follows:
 the number of products sold
 the value of products sold
 the number of products produced
 the number of service calls received

What is Factless Fact Table?

Factless fact table captures the many-to-many relationships between dimensions, but contains no numeric or textual facts. They
are often used to record events or coverage information.

Common examples of factless fact tables include:


 Identifying product promotion events (to determine promoted products that didn’t sell)
 Tracking student attendance or registration events
 Tracking insurance-related accident events

Types of facts?
There are three types of facts:

 Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table.
 Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but
not the others.
 Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact
table.

What is Granularity?

Principle: create fact tables with the most granular data possible to support analysis of the business process.

In data warehousing, grain refers to the level of detail available in a given fact table as well as to the level of detail provided by
a star schema.

It is usually given as the number of records per key within the table. In general, the grain of the fact table is the grain of the star
schema.

Facts: facts must be consistent with the grain; all facts are at a uniform grain.

 Watch for facts of mixed granularity, for example total sales for a day mixed with a monthly total.

Dimensions: each dimension associated with a fact table must take on a single value for each fact row.
 Each dimension attribute must take on one value.
 Outriggers are the exception, not the rule.

What is slowly Changing Dimension?


Slowly changing dimensions refers to the change in dimensional attributes over time.
An example of slowly changing dimension is a Resource dimension where attributes of a particular employee change over time
like their designation changes or dept changes etc.
What is Conformed Dimension?
Conformed Dimensions (CD): these dimensions are built once in your model and can be reused multiple times
with different fact tables. For example, consider a model containing multiple fact tables, representing different data marts.
Now look for a dimension that is common to these fact tables. In this example let's consider that the product dimension is
common and hence can be reused by creating shortcuts and joining the different fact tables. Some examples are the time
dimension, customer dimension and product dimension.

What is Junk Dimension?


A "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular
dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes. A good
example would be a trade fact in a company that brokers equity trades.
When you consolidate lots of small dimensions, instead of having hundreds of small dimensions that each hold only a few records
cluttering your database with these mini 'identifier' tables, all records from all these small dimension tables are loaded
into ONE dimension table, and we call this dimension table a junk dimension table (since we are storing all the junk in this one
table). For example, a company might have a handful of manufacturing plants, a handful of order types, and so on, and we
can consolidate them in one dimension table called a junk dimension table.
It's a dimension table which is used to keep junk attributes.

What is a Degenerated Dimension?


An item that is in the fact table but is stripped of its description, because the description belongs in a dimension table, is
referred to as a degenerated dimension. Since it looks like a dimension but is really in the fact table and has been stripped of its
description, it is called a degenerated dimension.
Degenerated dimension: a dimension which is located in the fact table is known as a degenerated dimension.

Dimensional Model:
A type of data modeling suited for data warehousing. In a dimensional model, there are two types of tables:
dimensional tables and fact tables. Dimensional table records information on each dimension, and fact table records
all the "fact", or measures.
Data modeling
There are three levels of data modeling. They are conceptual, logical, and physical.

The differences between a logical data model and physical data model are shown below.
Logical vs Physical Data Modeling
 A logical data model represents business information and defines business rules; a physical data model represents the physical implementation of the model in a database.
 Entity – Table
 Attribute – Column
 Primary Key – Primary Key Constraint
 Alternate Key – Unique Constraint or Unique Index
 Inversion Key Entry – Non-Unique Index
 Rule – Check Constraint, Default Value
 Relationship – Foreign Key
 Definition – Comment

Below is the simple data model (diagram not reproduced here).

Below is the source qualifier (SQ) for the project dimension (screenshot not reproduced here).

EDIII – Logical Design (entity-relationship diagram, not reproduced here): staging tables ACW_DF_FEES_STG, ACW_PCBA_APPROVAL_STG and ACW_DF_APPROVAL_STG feed the fact tables ACW_DF_FEES_F, ACW_PCBA_APPROVAL_F and ACW_DF_APPROVAL_F, which join to the dimensions ACW_ORGANIZATION_D, ACW_PRODUCTS_D, ACW_USERS_D, ACW_PART_TO_PID_D, ACW_SUPPLY_CHANNEL_D and the EDW_TIME_HIERARCHY.


What is the staging area and why do we need it in a DWH?

If the target and source databases are different and the target table volume is high (it contains millions of records), then
without a staging table we would need to design the Informatica mapping with a lookup to find out whether a record exists or not in the
target table; since the target has huge volumes it is costly to build the cache and it will hurt performance.

If we create staging tables in the target database we can simply do an outer join in the source qualifier to determine insert/update;
this approach gives good performance.

It avoids a full table scan to determine inserts/updates on the target.


We can also create indexes on the staging tables; since these tables are designed for a specific application this will not impact any
other schemas/users.

While processing flat files into the data warehouse we can also perform cleansing.


Data cleansing, also known as data scrubbing, is the process of ensuring that a set of data is correct and accurate. During data
cleansing, records are checked for accuracy and consistency.

 Since it is a one-to-one mapping from ODS to staging we do truncate and reload.

 We can create indexes in the staging area so that the source qualifier performs at its best.

 If we have the staging area there is no need to rely on Informatica transformations to know whether a record
exists or not; see the SQL sketch below.
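A hedged sketch of the source-qualifier override described above: outer-join the staging table to the target's key to flag each row as an insert or an update in a single pass (table and column names are illustrative):

SELECT stg.cust_id,
       stg.cust_name,
       stg.amount,
       -- 'I' when the key is not yet in the target, 'U' when it already exists
       CASE WHEN tgt.cust_id IS NULL THEN 'I' ELSE 'U' END AS load_flag
FROM   stg_customer stg
LEFT OUTER JOIN dwh_customer tgt
       ON stg.cust_id = tgt.cust_id;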

Data cleansing

Weeding out unnecessary or unwanted things (characters and spaces etc) from incoming data to make it more
meaningful and informative

Data merging
Data can be gathered from heterogeneous systems and put together

Data scrubbing
Data scrubbing is the process of fixing or eliminating individual pieces of data that are incorrect, incomplete or
duplicated before the data is passed to end user.

Data scrubbing is aimed at more than eliminating errors and redundancy. The goal is also to bring consistency to
various data sets that may have been created with different, incompatible business rules.

ODS (Operational Data Store):

My understanding of an ODS is that it is a replica of the OLTP system; the need for it is to reduce the burden on the production system
(OLTP) while fetching data for loading targets. Hence it is a common requirement for a warehouse.
So do we transfer data to the ODS from OLTP every day to keep it up to date?
OLTP is a sensitive database; it should not be hit with many concurrent SELECT statements, which may impact its performance, and if
something goes wrong while fetching data from OLTP into the data warehouse it will directly impact the business.
The ODS is a replication of OLTP.
The ODS is usually refreshed through scheduled Oracle jobs.
It enables management to gain a consistent picture of the business.

What is a surrogate key?


A surrogate key is a substitution for the natural primary key. It is a unique identifier or number ( normally created by a
database sequence generator ) for each record of a dimension table that can be used for the primary key to the table.

A surrogate key is useful because natural keys may change.


What is the difference between a primary key and a surrogate key?

A primary key is a special constraint on a column or set of columns. A primary key constraint ensures that the column(s) so
designated have no NULL values, and that every value is unique. Physically, a primary key is implemented by the database
system using a unique index, and all the columns in the primary key must have been declared NOT NULL. A table may have
only one primary key, but it may be composite (consist of more than one column).

A surrogate key is any column or set of columns that can be declared as the primary key instead of a "real" or natural key.
Sometimes there can be several natural keys that could be declared as the primary key, and these are all called candidate keys.
So a surrogate is a candidate key. A table could actually have more than one surrogate key, although this would be unusual. The
most common type of surrogate key is an incrementing integer, such as an auto increment column in MySQL, or a sequence in
Oracle, or an identity column in SQL Server.
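A minimal Oracle sketch of generating surrogate keys with a sequence (dim_customer and its columns are illustrative):

CREATE SEQUENCE dim_customer_seq START WITH 1 INCREMENT BY 1;

-- customer_key is the surrogate key; customer_id is the natural key from the source
INSERT INTO dim_customer (customer_key, customer_id, customer_name)
VALUES (dim_customer_seq.NEXTVAL, 'C1001', 'Mike Smith');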

ETL-INFORMATICA
Differences between connected lookup and unconnected lookup

 A connected lookup is connected to the pipeline and receives input values directly from the pipeline; an unconnected lookup is not connected to the pipeline and receives input values from the result of a :LKP expression in another transformation, via arguments.
 A connected lookup cannot be used more than once in a mapping; an unconnected lookup can be called more than once within the mapping.
 A connected lookup can return multiple columns from the same row; an unconnected lookup designates one return port (R) and returns one column from each row.
 A connected lookup can be configured to use a dynamic cache; an unconnected lookup cannot.
 A connected lookup passes multiple output values to another transformation (link the lookup/output ports to another transformation); an unconnected lookup passes one output value, through the lookup/output/return port, to the transformation calling the :LKP expression.
 A connected lookup can use a dynamic or a static cache; an unconnected lookup uses a static cache.
 A connected lookup supports user-defined default values; an unconnected lookup does not.
 For a connected lookup, the cache includes the lookup source columns in the lookup condition (as the index cache) and the lookup source columns that are output ports (as the data cache); for an unconnected lookup, the cache includes all lookup/output ports in the lookup condition and the lookup/return port.

Differences between dynamic lookup and static lookup

 In a dynamic lookup, the cache gets refreshed as soon as a record is inserted, updated or deleted in the lookup table. In a static lookup, the cache does not get refreshed even though records are inserted or updated in the lookup table; it refreshes only in the next session run.
 When we configure a lookup transformation to use a dynamic lookup cache, we can only use the equality operator in the lookup condition. The static cache is the default cache.
 With a dynamic cache, the NewLookupRow port is enabled automatically.
 The best example of where we need a dynamic cache is when the first record and the last record in the source are the same row but with a change in the address: the mapping has to insert the first record and update the target with the last record. With a static lookup, the first record goes to the lookup, does not find a match in the cache, so the lookup returns a null value and the router sends that record down the insert flow. Because the cache is not refreshed when the first record is inserted into the target table, when the last record reaches the lookup it again finds no match, returns null and goes down the insert flow through the router, although it is supposed to go down the update flow.

What is the difference between joiner and lookup?

 On multiple matches a joiner returns all matching records, whereas a lookup returns either the first record, the last record, any value or an error value.
 In a joiner we cannot configure a persistent cache, shared cache, uncached mode or dynamic cache, whereas in a lookup we can.
 We cannot override the query in a joiner, whereas in a lookup we can override the query to fetch data from multiple tables.
 We can perform an outer join in a joiner transformation; we cannot perform an outer join in a lookup transformation, but a lookup by default works like a left outer join.
 We cannot use relational operators (<, >, <= and so on) in a joiner condition, whereas in a lookup condition we can.

What is the difference between source qualifier and lookup?

 A source qualifier pushes all the matching records, whereas in a lookup we can restrict whether to return the first value, the last value or any value.
 In a source qualifier there is no concept of a cache, whereas a lookup is built around the cache concept.
 When both the source and the lookup table are in the same database we can use a source qualifier (join); when they exist in different databases we need to use a lookup.

Have you done any performance tuning in Informatica?

1) Yes. One of my mappings was taking 3-4 hours to process 40 million rows into a staging table. There were no
transformations inside the mapping; it was a 1-to-1 mapping, so there was nothing to optimize at the mapping level. I created
session partitions using key range on the effective date column. It improved performance a lot: rather than 4 hours it
ran in 30 minutes for the entire 40 million rows. Using partitions, the DTM creates multiple reader and writer threads.

2) There was one more scenario where I got very good performance at the mapping level. Rather than using a lookup
transformation, if we can do the outer join in the source qualifier query override, this gives good
performance when both the lookup table and the source are in the same database. If the lookup table has huge volumes then
creating the cache is costly.

3) Also, if we can optimize the mapping using fewer transformations, that always gives good performance.

4) If any mapping is taking a long time to execute, then first we need to look at the source and target statistics in the monitor
for the throughput, and also find out where exactly the bottleneck is by looking at the busy percentage in the session log; this tells us
which transformation is taking more time. If the source query is the bottleneck then it will show at the
end of the session log as "query issued to database", which means there is a performance issue in the source query and we
need to tune the query.

Informatica session log busy percentage

The session log shows a busy percentage for each thread; based on that we need to find out where the bottleneck is.

***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] ****

Thread [READER_1_1_1] created for [the read stage] of partition point [SQ_ACW_PCBA_APPROVAL_STG] has completed:
Total Run Time = [7.193083] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000]

Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point [SQ_ACW_PCBA_APPROVAL_STG] has
completed. The total run time was insufficient for any meaningful statistics.

Thread [WRITER_1_*_1] created for [the write stage] of partition point [ACW_PCBA_APPROVAL_F1,
ACW_PCBA_APPROVAL_F] has completed: Total Run Time = [0.806521] secs, Total Idle Time = [0.000000] secs, Busy
Percentage = [100.000000]

Suppose I have to load 40 lakh (4 million) records into the target table and the workflow is taking about 10-11 hours to finish. I've
already increased the cache size to 128 MB. There are no joiners, just lookups and expression transformations.

(1) If the lookups are uncached and have many records, try creating indexes on the columns used in the lookup condition, and try
increasing the lookup cache. If this doesn't increase the performance and the target has any indexes, disable them in the target pre-
load and re-enable them in the target post-load.

(2) Three things you can do with respect to it:

1. Increase the commit interval (by default it is 10000).


2. Use bulk mode instead of normal mode in case your target doesn't have primary keys, or use pre- and post-session SQL to
implement the same (depending on the business requirement).


3. Use key partitioning to load the data faster.

(3) If your target has key constraints and indexes, they slow the loading of data. To improve the session performance in this
case, drop the constraints and indexes before you run the session and rebuild them after completion of the session.

What is constraint-based loading in Informatica?

By setting the Constraint Based Loading property at the session level in the configuration tab we can load data into parent and child
relational tables (primary key - foreign key).

Generally what it does is load the data first into the parent table and then into the child table.

What is the use of Shortcuts in Informatica?

If we copy source definitions, target definitions or mapplets from a shared folder to any other folder, they become
shortcuts.

Let's assume we have imported some source and target definitions in a shared folder, and after that we are using those source and
target definitions in other folders, as shortcuts, in some mappings.

If any modifications occur in the backend (database) structure, like adding new columns or dropping existing columns, either in the
source or the target, and we re-import into the shared folder, those new changes are automatically reflected in all folders/mappings
wherever we used those source or target definitions.

Target Update Override

If we don't have a primary key on the target table, using the Target Update Override option we can perform updates. By default, the
Integration Service updates target tables based on key values. However, you can override the default UPDATE statement for
each target in a mapping. You might want to update the target based on non-key columns.

Overriding the WHERE Clause

You can override the WHERE clause to include non-key columns. For example, you might want to update records for
employees named Mike Smith only. To do this, you edit the WHERE clause as follows:

UPDATE T_SALES SET DATE_SHIPPED =:TU.DATE_SHIPPED,


TOTAL_SALES = :TU.TOTAL_SALES WHERE EMP_NAME = :TU.EMP_NAME and
EMP_NAME = 'MIKE SMITH'

If you modify the UPDATE portion of the statement, be sure to use :TU to specify ports.

SCD Type-II Effective-Date Approach


 We have one of the dimension in current project called resource dimension. Here we are maintaining the history to
keep track of SCD changes.
 To maintain the history in slowly changing dimension or resource dimension. We followed SCD Type-II Effective-
Date approach.
 My resource dimension structure has eff-start-date, eff-end-date, the surrogate key (s.k) and the source columns.
 Whenever I do an insert into the dimension I populate eff-start-date with sysdate, eff-end-date with a future date and
the s.k with a sequence number.
 If the record is already present in the dimension but there is a change in the source data, then what I need to do is
update the previous record's eff-end-date with sysdate and insert the source data as a new record; see the SQL sketch below.
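A hedged SQL equivalent of this effective-date logic (dim_resource, its columns, the sequence and the bind variables are illustrative; in the actual design this is implemented through lookup, expression, router and update-strategy transformations):

-- Expire the currently active row for a changed resource
UPDATE dim_resource
SET    eff_end_date = SYSDATE
WHERE  resource_id  = :src_resource_id
AND    eff_end_date = TO_DATE('31-12-9999', 'DD-MM-YYYY');

-- Insert the new version with an open-ended effective date range
INSERT INTO dim_resource
       (resource_sk, resource_id, designation, eff_start_date, eff_end_date)
VALUES (dim_resource_seq.NEXTVAL, :src_resource_id, :src_designation,
        SYSDATE, TO_DATE('31-12-9999', 'DD-MM-YYYY'));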


Informatica design to implement SCD Type-II effective-date approach


 Once we fetch the record from the source qualifier, we send it to a lookup to find out whether the record is present in
the target or not, based on the source primary key column.
 Once we find the match in the lookup, we take the SCD columns from the lookup and the source columns from the SQ into an
expression transformation.
 In lookup transformation we need to override the lookup override query to fetch Active records from the dimension
while building the cache.
 In expression transformation I can compare source with lookup return data.
 If the source and target data is same then I can make a flag as ‘S’.
 If the source and target data is different then I can make a flag as ‘U’.
 If source data does not exists in the target that means lookup returns null value. I can flag it as ‘I’.
 Based on the flag values in router I can route the data into insert and update flow.
 If flag=’I’ or ‘U’ I will pass it to insert flow.
 If flag=’U’ I will pass this record to eff-date update flow
 When we do insert we are passing the sequence value to s.k.
 Whenever we do update we are updating the eff-end-date column based on lookup return s.k value.

Complex Mapping
 We have a requirement for one of the order files. Every day the source system places a file with a timestamp in its name on the
Informatica server.
 We have to process the same day's file through Informatica.
 The source file directory contains files older than 30 days, with timestamps.
 For this requirement, if I hardcode the timestamp in the source file name it will process the same file every day.
 So what I did here is I created $InputFilename for source file name.
 Then I am going to use the parameter file to supply the values to session variables ($InputFilename).
 To update this parameter file I have created one more mapping.
 This mapping will update the parameter file with appended timestamp to file name.
 I make sure to run this parameter file update mapping before my actual mapping.

How to handle errors in informatica?


 We have a source with numerator and denominator values, and we need to calculate num/deno
when populating the target.
 If deno = 0 I should not load that record into the target table.
 We send those records to a flat file; after completion of the first session run, a shell script checks the file size.
 If the file size is greater than zero then it will send email notification to source system POC (point of contact) along
with deno zero record file and appropriate email subject and body.
 If file size<=0 that means there is no records in flat file. In this case shell script will not send any email notification.
 Or
 We are expecting a not null value for one of the source column.
 If it is null that means it is a error record.
 We can use the above approach for error handling.

Why do we need a source qualifier?


Simply put, it performs a select statement.
The select statement fetches the data in the form of rows.
The source qualifier selects the data from the source table.
It identifies the records from the source.

The parameter file supplies values to session-level variables and mapping-level variables.


Variables are of two types:


 Session level variables
 Mapping level variables
Session level variables:
Session parameters, like mapping parameters, represent values you might want to change between sessions, such as a database
connection or source file. Use session parameters in the session properties, and then define the parameters in a parameter file.
You can specify the parameter file for the session to use in the session properties. You can also specify it when you
use pmcmd to start the session. The Workflow Manager provides one built-in session parameter, $PMSessionLogFile. With
$PMSessionLogFile, you can change the name of the session log generated for the session. The Workflow Manager also allows
you to create user-defined session parameters.

Naming Conventions for User-Defined Session Parameters


Parameter Type Naming Convention
Database Connection $DBConnectionName
Source File $InputFileName
Target File $OutputFileName
Lookup File $LookupFileName
Reject File $BadFileName

Use session parameters to make sessions more flexible. For example, you have the same type of transactional data written to
two different databases, and you use the database connections TransDB1 and TransDB2 to connect to the databases. You want
to use the same mapping for both tables. Instead of creating two sessions for the same mapping, you can create a database
connection parameter, $DBConnectionSource, and use it as the source database connection for the session. When you create a
parameter file for the session, you set $DBConnectionSource to TransDB1 and run the session. After the session completes, you
set $DBConnectionSource to TransDB2 and run the session again.
You might use several session parameters together to make session management easier. For example, you might use source file
and database connection parameters to configure a session to read data from different source files and write the results to
different target databases. You can then use reject file parameters to write the session reject files to the target machine. You can
use the session log parameter, $PMSessionLogFile, to write to different session logs in the target machine, as well.
When you use session parameters, you must define the parameters in the parameter file. Session parameters do not have default
values. When the PowerCenter Server cannot find a value for a session parameter, it fails to initialize the session.
Mapping level variables are of two types:
 Variable
 Parameter

What is the difference between mapping level and session level variables?
Mapping-level variables always start with $$.
Session-level variables always start with $.

Flat File
A flat file is a collection of data stored in a file in a specific format.
Informatica supports two types of flat files:
 Delimited
 Fixed width
For a delimited file we need to specify the separator.
For a fixed-width file we must know the format first, i.e. how many characters to read for each column.
For a delimited file it is also necessary to know the structure, for example whether a header row is present.
If the file contains a header, the source definition must be set to skip the first row.

List file:


If you want to process multiple files with the same structure, you do not need multiple mappings and multiple sessions.
You can use one mapping and one session with the indirect (list file) option.
First create a list file containing all the file names; then use this list file as the source in the main mapping.

Parameter file Format:


It is a text file; the format is shown below. This file is usually placed on the UNIX box where the Informatica server is
installed.

[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_G
EHC_APO_BAAN_SALES_HIST_AUSTRI]
$InputFileName_BAAN_SALE_HIST=/interface/dev/etl/apo/srcfiles/HS_025_20070921
$DBConnection_Target=DMD2_GEMS_ETL
$$CountryCode=AT
$$CustomerNumber=120165
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_G
EHC_APO_BAAN_SALES_HIST_BELUM]
$DBConnection_Source=DEVL1C1_GEMS_ETL
$OutputFileName_BAAN_SALES=/interface/dev/etl/apo/trgfiles/HS_002_20070921
$$CountryCode=BE
$$CustomerNumber=101495

How do you perform incremental logic or Delta or CDC?


Incremental (delta, or CDC) processing means that if today we processed 100 records, tomorrow's run should extract only the
records newly inserted or updated since the previous run, based on the last-updated timestamp.

Approach_1: Using SETMAXVARIABLE()
1) Create a mapping variable ($$Pre_sess_max_upd) and assign an old date (01/01/1940) as its initial value.
2) Override the source qualifier query to fetch only rows with LAST_UPD_DATE >= $$Pre_sess_max_upd (the mapping variable); see the sketch after these steps.
3) In the expression, assign the maximum LAST_UPD_DATE value to $$Pre_sess_max_upd using SETMAXVARIABLE.
4) Because it is a variable, the maximum LAST_UPD_DATE value is stored in the repository, so in the next run the source
qualifier query fetches only the records updated or inserted after the previous run.
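
A sketch of the source qualifier override for this approach, with a hypothetical source table and column list; the Integration Service substitutes $$Pre_sess_max_upd before the query is run, and the expression port captures the maximum value with SETMAXVARIABLE($$Pre_sess_max_upd, LAST_UPD_DATE):

SELECT CUST_ID, CUST_NAME, LAST_UPD_DATE
FROM   SRC_CUSTOMER
WHERE  LAST_UPD_DATE >= TO_DATE('$$Pre_sess_max_upd', 'MM/DD/YYYY HH24:MI:SS')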
Approach_2: Using a parameter file
1 Create a mapping parameter ($$Pre_sess_start_tmst) and assign an old date (01/01/1940) as its initial value
in the parameter file.
2 Override the source qualifier query to fetch only rows with LAST_UPD_DATE >= $$Pre_sess_start_tmst (the mapping parameter).
3 Update the mapping parameter ($$Pre_sess_start_tmst) value in the parameter file using a shell script or another
mapping after the first session completes successfully.
4 Because it is a mapping parameter, the value must be updated in the parameter file after every
completion of the main session.

Approach_3: Using Oracle control tables


1 Create two control tables, cont_tbl_1 and cont_tbl_2, each with the structure
(session_st_time, wf_name).
2 Insert one record in each table with session_st_time = 1/1/1940 and the workflow name.
3 Create two stored procedures: the first updates cont_tbl_1 with the session start time, and its stored procedure
type property is set to Source Pre-load.
4 The second stored procedure has its type property set to Target Post-load; it updates the
session_st_time in cont_tbl_2 from cont_tbl_1.
5 Override the source qualifier query to fetch only rows with LAST_UPD_DATE >= (SELECT session_st_time FROM
cont_tbl_2 WHERE wf_name = 'Actual workflow name'); a sketch follows.
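
A sketch of the control tables and the resulting override; the table structure, column names, source table and workflow name are assumptions:

CREATE TABLE cont_tbl_1 (session_st_time DATE, wf_name VARCHAR2(100));
CREATE TABLE cont_tbl_2 (session_st_time DATE, wf_name VARCHAR2(100));

-- Source qualifier override driven by the control table
SELECT s.CUST_ID, s.CUST_NAME, s.LAST_UPD_DATE
FROM   SRC_CUSTOMER s
WHERE  s.LAST_UPD_DATE >= (SELECT c.session_st_time
                           FROM   cont_tbl_2 c
                           WHERE  c.wf_name = 'wf_actual_workflow_name')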

Approach_1: Using SETMAXVARIABLE() (worked example)


1) Create a mapping variable ($$INCREMENT_TS) and assign an old date (01/01/1940) as its initial value.
2) Override the source qualifier query to fetch only rows with LAST_UPD_DATE >= $$INCREMENT_TS (the mapping variable).
3) In the expression, assign the maximum LAST_UPD_DATE value to $$INCREMENT_TS using SETMAXVARIABLE.
4) Because it is a variable, the maximum LAST_UPD_DATE value is stored in the repository, so in the next run the source
qualifier query fetches only the records updated or inserted after the previous run.

Logic in the mapping variable:

Logic in the SQ override:

In the expression, assign the max last-update-date value to the variable using the SETMAXVARIABLE function.

Logic in the update strategy:

Approach_2: Using a parameter file (worked example)


Create a mapping parameter ($$LastUpdateDateTime) and assign an old date (01/01/1940) as its initial value in the
parameter file.
Override the source qualifier query to fetch only rows with LAST_UPD_DATE >= $$LastUpdateDateTime (the mapping parameter).
Update the mapping parameter ($$LastUpdateDateTime) value in the parameter file using a shell script or another mapping after
the first session completes successfully.
Because it is a mapping parameter, the value must be updated in the parameter file after every completion of the main
session.
Parameter file:

[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_
GEHC_APO_BAAN_SALES_HIST_AUSTRI]
$DBConnection_Source=DMD2_GEMS_ETL
$DBConnection_Target=DMD2_GEMS_ETL
$$LastUpdateDateTime=01/01/1940
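
One possible ksh sketch of the parameter-file update step, assuming a hypothetical parameter-file path and that the entry has the form shown above:

#!/bin/ksh
# Hypothetical sketch: stamp the current run date into the parameter file
PARM_FILE=/interface/dev/etl/apo/parmfiles/wf_weekly_hist.parm
RUN_TS=`date '+%m/%d/%Y'`

sed "s|^\$\$LastUpdateDateTime=.*|\$\$LastUpdateDateTime=$RUN_TS|" $PARM_FILE > $PARM_FILE.tmp && mv $PARM_FILE.tmp $PARM_FILE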

Updating the parameter file

Logic in the expression:

Main mapping:

SQL override in the SQ transformation:

Workflow design:

Informatica Tuning

The aim of performance tuning is to optimize session performance so that sessions run within the available load window for the
Informatica Server. The following guidelines help increase session performance.

The performance of the Informatica Server is related to network connections. Data generally moves across a network at less
than 1 MB per second, whereas a local disk moves data five to twenty times faster. Network connections therefore often affect
session performance.

1. Cache lookups if source table is under 500,000 rows and DON’T cache for tables over 500,000 rows.
2. Reduce the number of transformations. Don’t use an Expression Transformation to collect fields. Don’t use an Update
Transformation if only inserting. Insert mode is the default.
3. If a value is used in multiple ports, calculate the value once (in a variable) and reuse the result instead of recalculating
it for multiple ports.
4. Reuse objects where possible.
5. Delete unused ports particularly in the Source Qualifier and Lookups.
6. Use Operators in expressions over the use of functions.
7. Avoid using Stored Procedures, and call them only once during the mapping if possible.
8. Remember to turn off Verbose logging after you have finished debugging.
9. Use default values where possible instead of using IIF (ISNULL(X),,) in Expression port.
10. When overriding the Lookup SQL, always ensure to put a valid Order By statement in the SQL. This will cause the
database to perform the order rather than Informatica Server while building the Cache.
11. Improve session performance by using sorted data with the Joiner transformation. When the Joiner transformation is
configured to use sorted data, the Informatica Server improves performance by minimizing disk input and output.
12. Improve session performance by using sorted input with the Aggregator Transformation since it reduces the amount
of data cached during the session.
13. Improve session performance by using limited number of connected input/output or output ports to reduce the amount
of data the Aggregator transformation stores in the data cache.
14. Use a Filter transformation prior to Aggregator transformation to reduce unnecessary aggregation.
15. Performing a join in a database is faster than performing join in the session. Also use the Source Qualifier to perform
the join.
16. In Joiner transformations, designate the source with the fewer rows as the master source, since this reduces the search
time and also the cache size.
17. When using multiple conditions in a lookup conditions, specify the conditions with the equality operator first.
18. Improve session performance by caching small lookup tables.
19. If the lookup table is on the same database as the source table, instead of using a Lookup transformation, join the
tables in the Source Qualifier Transformation itself if possible.
20. If the lookup table does not change between sessions, configure the Lookup transformation to use a persistent lookup
cache. The Informatica Server saves and reuses cache files from session to session, eliminating the time required to
read the lookup table.
21. Use :LKP reference qualifier in expressions only when calling unconnected Lookup Transformations.
22. Informatica Server generates an ORDER BY statement for a cached lookup that contains all lookup ports. By
providing an override ORDER BY clause with fewer columns, session performance can be improved.
23. Eliminate unnecessary data type conversions from mappings.
24. Reduce the number of rows being cached by using the Lookup SQL Override option to add a WHERE clause to the
default SQL statement (a sketch combining this with items 10 and 22 follows this list).
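
For example, a lookup SQL override combining items 10, 22 and 24 might look like the sketch below; the table and column names are assumptions, and the trailing comment marker is commonly used to suppress the ORDER BY that the Integration Service would otherwise append:

SELECT ITEM_ID, ITEM_STATUS
FROM   DIM_ITEM
WHERE  ITEM_STATUS = 'ACTIVE'
ORDER BY ITEM_ID --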
Tuning

Tuning a PowerCenter 8 ETL environment is not that straightforward. A chain is only as strong as the weakest link. There are
four crucial domains that require attention: system, network, database and the PowerCenter 8 installation itself. It goes without
saying that without a well performing infrastructure the tuning of the PowerCenter 8 environment will not make much of a
difference.


As the first three domains are located in the realms of administrators, this article will only briefly touch these subjects and will
mainly focus on the items available to developers within PowerCenter 8.

Tuning is an iterative process: at each iteration the largest bottleneck is removed, gradually improving performance.
Bottlenecks can occur on the system, on the database (either source or target), or within the mapping or session ran by the
Integration Service. To identify bottlenecks, run test sessions, monitor the system usage and gather advanced performance
statistics while running. Examine the session log in detail as it provides valuable information concerning session performance.
From the perspective of a developer, search for performance problems in the following order:

 source / target
 mapping
 session
 system

If tuning the mapping and session still proves to be inadequate, the underlying system will need to be examined closer. This
extended examination needs to be done in close collaboration with the system administrators and database administrators
(DBA). They have several options to improve performance without invoking hardware changes. Examples are distributing
database files over different disks, improving network bandwidth and lightening the server workload by moving other
applications. However if none of this helps, only hardware upgrades will bring redemption to your performance problems.

Session logs

The PowerCenter session log provides very detailed information that can be used to establish a baseline and will identify
potential problems.

Very useful for the developer are the detailed thread statistics that will help benchmarking your actions. The thread statistics
will show if the bottlenecks occur while transforming data or while reading/writing. Always focus attention on the thread with
the highest busy percentage first. For every thread, detailed information on the run and idle time is presented. The busy
percentage is calculated as: (run time – idle time) / run time * 100.

Each session has a minimum of three threads:

 reader thread
 transformation thread
 writer thread

An example:

***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] *****
Thread [READER_1_1_1] created for [the read stage] of partition point [SQ_X_T_CT_F_SITE_WK_ENROLL] has
completed: Total Run Time = [31.984171] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000].
Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point [SQ_X_T_CT_F_SITE_WK_ENROLL] has
completed: Total Run Time = [0.624996] secs, Total Idle Time = [0.453115] secs, Busy Percentage = [27.501083].
Thread [WRITER_1_*_1] created for [the write stage] of partition point [T_CT_F_SITE_WK_BSC] has completed: Total Run
Time = [476.668825] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000].

In this particular case it is obvious that the database can be considered as the main bottleneck. Both reading and writing use
most of the execution time. The actual transformations only use a very small amount of time. If a reader or writer thread is


100% busy, consider partitioning the session. This will allow the mapping to open several connections to the database, each
reading/writing data from/to a partition thus improving data read/write speed.

Severity Timestamp Node Thread Message Code Message


INFO 23/Dec/2008 09:02:22 node_etl02 MANAGER PETL_24031

***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] *****
Thread [READER_1_1_1] created for [the read stage] of partition point [SQ_T_CT_F_SITE_WK_BSC] has completed. The
total run time was insufficient for any meaningful statistics.

Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point [SQ_T_CT_F_SITE_WK_BSC] has
completed: Total Run Time = [22.765478] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000].
Thread [WRITER_1_*_1] created for [the write stage] of partition point [T_CT_F_SITE_WK_BSC] has completed: Total Run
Time = [30.937302] secs, Total Idle Time = [20.345600] secs, Busy Percentage = [34.23602355].

In the example above, the transformation thread poses the largest bottleneck and needs to be dealt with first. The reader thread
finished so quickly no meaningful statistics were possible. The writer thread spends the majority of time in the idle state,
waiting for data to emerge from the transformation thread. Perhaps an unsorted aggregator is used, causing the Integration
Service to sort all data before releasing any aggregated record?

The number of threads can increase if the sessions will read/write to multiple targets, if sessions have multiple execution paths,
if partitioning is used …

Establishing a baseline

To be able to benchmark the increase or decrease of performance following an action it is important to establish a baseline to
compare with. It is good practice to log in detail every iteration in the tuning process. This log will enable the developer to
clearly identify the actions that enhanced or decreased performance and serve as later reference for future tuning efforts. Never
assume that an action will improve performance because the action is a best practice or worked before: always test and compare
with hard figures. The thread statistics are used to build this log.

This log file could, for example, list each tuning iteration with the action taken and the resulting thread statistics.

Optimally reading sources


Reading from sources breaks down into two distinct categories: reading relational sources and reading flat files. Sometimes, both
source types are combined in a single mapping.
A homogeneous join is a join between relational sources that combine data from a single origin: for example a number of Oracle
tables being joined.
A heterogeneous join is a join between sources that combine data from different origins: for example when Oracle data is joined
with a flat file.

Whatever source you are trying to read, always try to limit the incoming data stream maximally. Place filters as early as possible
in the mapping, preferably in the source qualifier. This will ensure only data needed by the Integration Services is picked up from
the database and transported over the network. If you suspect the performance of reading relational data is not optimal, replace
the relational source with a flat file source containing the same data. If there is a difference in performance, the path towards the
source database should be investigated more closely, such as execution plan of the query, database performance, network
performance, network package sizes,…

When using homogeneous relational sources, use a single source qualifier with a user defined join instead of a joiner
transformation. This will force the join being executed on the database instead of the PowerCenter 8 platform. If a joiner
transformation is used instead, all data is first picked up from the database server, then transported to the PowerCenter 8
platform, sorted and only as a last step joined by the Integration server.
Consider pre-sorting the data in the database; this makes further sorting for later aggregators, joiners, etc. by the Integration
Service unnecessary. Make sure the query executed on the database has a favourable execution plan. Use the explain plan
(Oracle) facility to verify the query's execution plan and check whether indexes are used optimally. Do not use synonyms or
database links unless really needed, as these will slow down the data stream. An example of pushing the join into the source
qualifier follows.
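
For illustration, a SQL override in the source qualifier that performs a homogeneous join and pre-sorts the data might look like this, using hypothetical Oracle tables:

SELECT o.ORDER_ID, o.ORDER_DATE, c.CUST_NAME
FROM   ORDERS o, CUSTOMERS c
WHERE  o.CUST_ID = c.CUST_ID
ORDER BY o.ORDER_ID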

In general it is good practice to always generate keys for primary key and foreign key fields. If no key is available or known a
dummy key should be used. In the reference table an extra dummy record should be inserted. This method will improve join
performance when using homogenous joins. In general three dummy rows should be included:

 999999 Not applicable


 999998 Not available
 999997 Missing

When using heterogeneous sources there is no alternative but to use a joiner transformation. To ease up matters for the
Integration service, ensure that all relational sources are sorted and joined in advance in the source qualifier. Flat file sources
need to be sorted before joining, using a sorter transformation. When joining the 2 sorted sources, check the sorted input property
at the joiner transformation. The sorted input option allows the joiner transformation to start passing records to subsequent
transformations as soon as the key value changes. Normal behaviour would be to hold passing data until all data is sorted and
processed in the joiner transformation.

By matching the session property Line Sequential Buffer Length to the size of exactly one record, overhead is minimized. If
possible, stage flat files in a staging table; joining and filtering can then be done in the database.

Optimally writing to targets

One of the most common performance issues in PowerCenter 8 is slow writing to target databases. This is usually caused by a
lack of database or network performance. You can test for this behaviour by replacing the relational target with a flat file target.
If performance increases considerably it is clear something is wrong with the relational target.
Indexes are usually the root cause of slow target behaviour. In Oracle, every index on a table will decrease the performance of an
insert statement by 25%. The more indexes are defined, the slower insert/update statements will be. Every time an update/insert
statement is executed, the indexes need to be updated as well. Try dropping the indexes on the target table. If this does not
increase performance, the network is likely causing the problem.

In general avoid having too many targets in a single mapping. Increasing the commit interval will decrease the amount of session
overhead. Three different commit types are available for targets:

 target based commit: fastest


 source base commit: in between


 user defined commit: slowest , avoid using user defined commit when not really necessary

There are two likely scenarios when writing to targets:

Session is only inserting data

PowerCenter has two methods for inserting data in a relational target: normal or bulk loads. Normal loads will generate DML-
statements. Bulk loads will bypass the database log and are available for DB2, Sybase, Oracle (SQL Loader), or Microsoft SQL
Server. This loading method offers a considerable performance gain but has two drawbacks: the recovery of a session will not be
possible as no rollback data is kept by the database, and the target table cannot have any indexes defined on it while bulk loading,
so drop and recreate the indexes before and after the session. For every case you will have to weigh whether dropping and
recreating the indexes while using a bulk load outperforms a classic insert statement with all indexes in place.

Remember to use a very large commit interval when using bulk loads with Oracle and Microsoft SQL Server to avoid unnecessary
overhead. Dropping and recreating indexes can be done by using pre- and post-session tasks or by calling a stored procedure within
the mapping.
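
A sketch of such pre- and post-session SQL, with a hypothetical target table and index name:

-- Pre-session SQL: drop the index before the bulk load
DROP INDEX idx_sales_fact_cust;

-- Post-session SQL: recreate it once the load has finished
CREATE INDEX idx_sales_fact_cust ON sales_fact (cust_id);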

Session is mainly updating a limited set of data

When the session is updating a set of records in a large table, the use of a primary key or unique index is absolutely necessary. Be
sure to check the explain plan and verify proper index usage. Sometimes it is faster to keep only the unique indexes while loading
data and to drop the non-unique indexes not needed by the session. These indexes can be recreated at the end of the session.

Clever mapping logic

Now data is being read and written in the most optimal way, it is time to focus our attention to the actual mapping. The basic idea
is simple: minimize the incoming data stream and create as little overhead as possible within the mapping. A first step in
achieving this goal is to reduce the number of transformations to a minimum by reusing common logic. Perhaps the use of the
same lookup in different pipes could be redesigned to only use the lookup once? By using clever caching strategies, cache can be
reused in the mapping or throughout the workflow. Especially with active transformations (transformations where the number of
records is being changed) the use of caching is extremely important. Active transformations that reduce the number of records
should be placed as early as possible in the mapping.

Data type conversions between transformations in the mapping are costly. Be sure to check if all explicit and implicit conversions
really are necessary. When the data from a source is passed directly to a target without any other actions to be done, connect the
source qualifier directly to the target without the use of other transformations.

Single pass reading allows multiple targets being populated using the data from the same source qualifier. Consider using single
pass reading if there are multiple sessions using the same source: the existing mapping logic can be combined by using multiple
pipelines. Common data manipulations for all pipelines should be done before splitting out the pipeline.

At times it is better not to create mappings at all: staging mappings could be replaced by snapshots or replication in the database.
Databases are specialized in these types of data transfer and are in general far more efficient at processing them than passing the
data through PowerCenter.

Transformation Mechanics

Every transformation has its specifics related to performance. In the section below the most important items are discussed.

A joiner transformation should be used to join heterogeneous data sources. Homogeneous sources should always be joined in the
database by using the user defined join in the source qualifier transformation.


If not sorted at database level, always use a sorter transformation to sort the data before entering the joiner transformation. Make
sure the sorter transformation has sufficient cache to enable a 1-pass sort. Not having sufficient cache will plummet performance.
The data could be sorted in the joiner, but there are three advantages of using the sorter transformation:

The use of sorted input enables the joiner transformation to start passing data to subsequent transformations before all data was
passed in the joiner transformation. Consequently, the transformations following the joiner transformation start receiving data
nearly immediately and do not have to wait until all the data was sorted and joined in the joiner transformation. This logic is only
valid when the source can be sorted in the database: for example when joining SQL-Server and Oracle. Both sources can be
sorted in the database, making additional sorting using sorters superfluous. When a sorter is needed, for example when joining
Oracle and a flat file, the sorter will have to wait until all data is read from the flat file before records can be passed to the joiner
transformation.

The sorting algorithm used in the sorter is faster than the algorithm used in joiners or aggregators.

The use of sorted input in the joiner transformation, allows for a smaller cache size, leaving more memory for other
transformations or sessions. Again, when a flat file is used, a sorter will be needed prior to the joiner transformation. Although
the joiner transformation uses less cache, the sorter cache will need to be sufficiently large to enable sorting all input records.

As outer joins are far more expensive than inner joins, try to avoid them as much as possible. The source containing fewer rows
should be designated as the master source. Join as early as possible in the pipeline as this limits the
number of pipes and decreases the amount of data being sent to other transformations.

Only use a filter transformation for non-relational sources. When using relational sources, filter in the source qualifier. Filter as
early as possible in the data stream. Try filtering by using numeric conditions. Numeric matching is considerably faster
than the matching of strings. Avoid complex logic in the filter condition. Be creative in rewriting complex expressions to the
shortest possible length. When multiple filters are needed, consider using a router transformation as this will simplify mapping
logic.

A lookup transformation is used to lookup values from another table. Clever use of lookup caches can make a huge difference in
performance.

By default, lookup transformations are cached. The selected lookup fields from the lookup table are read into memory and a
lookup cache file is built every time the lookup is called. To minimize the usage of lookup cache, only retrieve lookup ports that
are really needed.

However, to cache or not to cache a lookup really depends on the situation. An uncached lookup makes perfect sense if only a
small percentage of lookup rows will be used, for example if we only need 200 rows from a 10,000,000-row table. In this
particular case, building the lookup cache would require an extensive amount of time. A direct select to the database for every
lookup row will be much faster, on the condition that the lookup key in the database is indexed.


Sometimes a lookup is used multiple times in an execution path of a mapping or workflow. Re-caching the lookup every time
would be time consuming and unnecessary, as long as the lookup source table remains unchanged. The persistent
lookup cache property was created to handle this type of situation. Only when calling the lookup the first time, the lookup cache
file is refreshed. All following lookups reuse the persistent cache file. Using a persistent cache can improve performance
considerably because the Integration Service builds the memory cache from the cache files instead of the database.

Use dynamic lookup cache when the lookup source table is a target in the mapping and updated dynamically throughout the
mapping. Normal lookup caches are static: records inserted during the session are not available to the lookup cache.
When using a dynamic lookup cache, newly inserted or updated records are updated in the lookup cache immediately.

Ensure sufficient memory cache is available for the lookup. If not, the Integration Service will have to write to disk, slowing the session down.

By using the Additional Concurrent Pipelines property at session level, lookup caches will start building concurrently at the start
of the mapping. Normal behaviour would be that a lookup cache is created only when the lookup is called. Pre-building caches
versus building caches on demand can increase the total session performance considerably, but only when the pre-build lookups
will be used for sure in the session. Again, the performance gain of setting this property will depend on the particular situation.

An aggregator transformation is an active transformation, used to group and aggregate data. If the input was not sorted already,
always use a sorter transformation in front of the aggregator transformation. As with the joiner transformation the aggregator
transformation will accumulate data until the dataset is complete and only starts processing and sending output records from there
on. When sorted input is used, the aggregator will process and send output records as soon as the first set of records is complete.
This will allow for much faster processing and smaller caches. Use as few functions and complicated nested conditions as possible.
Especially avoid the use of complex expressions in the group by ports. If needed, use an expression transformation to build these
expressions in advance. When using change data capture, incremental aggregation will enhance performance considerably.

Sometimes, simple aggregations can be done by using an expression transformation that uses variables. In certain cases this could
be a valid alternative for an aggregation transformation.

Expression transformations are generally used to calculate variables. Try not to use complex nested conditions, use decode
instead. Functions are more expensive than operators; avoid using functions if the same can be achieved by using operators.
Implicit data type conversion is expensive. Try to convert data types as little as possible. Working with numbers is generally
faster than working with strings. Be creative in rewriting complex expressions to the shortest possible length.

The use of a sequence generator transformation versus a database sequence depends on the load method of the target table. If
using bulk loading, database sequences cannot be used. The sequence generator transformation can overcome this problem. Every
row is given a unique sequence number. Typically a number of values are cached for performance reasons.

There is however a big catch. Unused sequence numbers at the end of the session are lost. The next time the session is run, the
sequence generator will cache a new batch of numbers.

For example: a sequence generator caches 10000 values. 10000 rows are loaded, using the cached values. At row 10001, a new
batch of sequence values is cached: 10001 to 20000. However, the last row in the session is 10002, so all values between
10003 and 20000 are lost. Next time the session is run, the first inserted row will have a key of 20001.

To avoid these gaps use a sequence generator in combination with an unconnected lookup. First look up the latest key value in
the target table. Then use an expression that calls the sequence generator and adds the generated value to the key value that was
just retrieved. The sequence generator should restart numbering at every run. There are some advantages to this approach:

 a database sequence is emulated and bulk loads remain possible.


 if rows are deleted, gaps between key values will be caused by deletes and not by caching issues.
 the limits of a sequence will not be reached so quickly

As an added advantage, the use of this method will prevent migration problems with persistent values between repositories. This
method is easy to implement and does not imply a performance penalty while running the session.
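
A sketch of this pattern with hypothetical names: the unconnected lookup returns the current maximum key from the target, and an expression port adds the value produced by the sequence generator (which restarts at 1 every run):

-- Unconnected lookup SQL override against the target table
SELECT NVL(MAX(SALES_SK), 0) AS SALES_SK
FROM   SALES_FACT

-- Hypothetical expression port (Informatica expression syntax), where in_dummy is any input port:
-- :LKP.LKP_MAX_SALES_SK(in_dummy) + NEXTVAL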


Clean up error handling

Dealing with transformation errors in advance can save a lot of time while executing a session. By default, every error row is
written into a bad file, which is a text based file containing all error rows. On top of that the error data is logged into the session
log that, if sufficient transformation errors occur, will explode in size. Both cause a lot of extra overhead and slow down a session
considerably.

In general, it is better to capture all data quality issues that could cause transformation errors in advance and write flawed records
into an error table.

Collecting Advanced Performance data

To really get into a session and understand exactly what is happening within a session, even more detailed performance data than
available into the session log can be captured. This can be done by, at session level, enabling the checkbox ‘Collect performance
data'.

This option will allow the developer to see detailed transformation based statistics in the Workflow Monitor while running the
session. When finished a performance file is written to the session log folder. For every source qualifier and target definition
performance details are provided, along with counters that show performance information about each transformation.

A number of counters are of particular interest to the developer:

 errorrows: should always be zero, if errors occur, remove the cause


 readfromdisk/writetodisk: indicates not enough cache memory is available. Increase the cache size until this counter is
no longer shown.


Memory Optimization

Memory plays an important role when the Integration Service is running sessions. Optimizing cache sizes can really make a huge
difference in performance.

Buffer memory is used to hold source and target data while processing and is allocated when the session is initialized. DTM
Buffer is used to create the internal data structures. Buffer blocks are used to bring data in and out of the Integration Service.
Increasing the DTM buffer size will increase the amount of blocks available to the Integration Service. Ideally a buffer block can
contain 100 rows at the same time.
You can configure the amount of buffer memory yourself or you can configure the Integration Service to automatically calculate
buffer settings at run time. Instead of calculating all values manually or by trial and error, run the session once on auto and
retrieve the correct values from the session log:

Severity Timestamp Node Thread Message Code Message


INFO 12/30/2008 1:23:57 AM INFO8_ASPMIS003 MAPPING TM_6660 Total Buffer Pool size is 90000000 bytes and Block
size is 65536 bytes.


The Integration Service uses the index and data caches mostly for active transformations: aggregator, joiner, sorter, lookup,
rank, …

Configuring the correct amount of cache is really necessary as the Integration Server will write and read from disk if not properly
sized. The index cache should be about half of the data cache. Cache files should be stored on a fast drive and surely not on a
network share.
The easiest way of calculating the correct cache sizes is by keeping the defaults on auto and examining the session log. In the
session log a line is written for every lookup that looks like this:

Severity Timestamp Node Thread Message Code Message


INFO 12/30/2008 1:26:11 AM INFO8_ASPMIS003 LKPDP_2:TRANSF_1_1 DBG_21641 LKP_VORIG_RECORD: Index
cache size = [12000800], Data cache size = [24002560]
Copy these values into the session properties. Be sure to verify the performance counters to validate no disk reads/writes are
done.
Sorter transformations need special attention concerning cache sizes as well. If not enough cache is available, the sorter will
require a multi pass sort, dropping the session performance. If so, a warning will be displayed in the session log:

TRANSF_1_1_1> SORT_40427 Sorter Transformation [srt_PRESTATIE] required 4-pass sort (1-pass temp I/O: 19578880
bytes). You may try to set the cache size to 27 MB or higher for 1-pass in-memory sort.

The maximum amount of memory used by transformation caches is set by two properties:

 Maximum Memory Allowed for Automatic Memory Attributes


 Maximum Percentage of Total Memory Allowed for Automatic Memory Attributes

The smaller of the two is used. When the value is 0, the automatic memory attributes are disabled. If this value is set too low, an
error will occur if a lookup with a manually configured cache wants to allocate more memory than is available. Keep in mind that
sessions can run in parallel: every session will try to allocate RAM-memory.

Ensure plenty of RAM memory is available for the Integration Service. Do not assume that adding cache memory will always increase
performance; at a certain point optimum performance is reached and adding further memory will not be beneficial.

Further session optimization

High precision

The high precision mode will allow using decimals up to a precision of 28 digits. Using this kind of precision will result in a
performance penalty in reading and writing data. It is therefore recommended to disable high precision when not really needed.
When turned off, decimals are converted to doubles that have a precision up to 15 digits.

Concurrent sessions

Depending on the available hardware, sessions can be run concurrently instead of sequentially. At Integration Service level the
number of concurrent sessions can be set; this value defaults to 10. Depending on the number of CPUs on the PowerCenter
server and on the source and target databases, this value can be increased or decreased. The next step is designing a workflow
that launches a number of sessions concurrently. By trial and error an optimal setting can be found.

Session logging

The amount of detail in a session log is determined by the tracing level. This level ranges from ‘Terse' to ‘Verbose Data'. For
debugging or testing purposes the ‘Verbose Data' option will trace every row that passes in the mapping in the session log. At
terse, only initialization information, error messages, and notification of rejected data are logged. It is quite clear the ‘Verbose
Data' option causes a severe performance penalty.


For lookups, use the ‘Additional Concurrent Pipelines for Lookup Creation' to start building lookups as soon as the session is
initialized. By the time the lookups are needed in the session, the cache creation hopefully is already finished.

Partitioning

If a transformation thread is 100% busy, consider adding a partition point in the segment. Pipeline partitioning will allow for
parallel execution within a single session. A session will have multiple threads for processing data concurrently. Processing data
in pipeline partitions can improve performance, but only if enough CPU's are available. As a rule of thumb, 1.5 CPU's should be
available per partition. Adding a partition point will increase the number of pipeline stages. This means a transformation will
logically be used a number of times, so remember to multiply the cache memory of transformations, session,.. by the number of
partitions. Partitioning can be specified on sources / targets and the mapping transformations.

Using partitioning requires the ‘Partition Option' in the PowerCenter license.

Pushdown Optimization

Pushdown optimization will push the transformation processing to the database level without extracting the data. This will reduce
the movement of data when source and target are in the same database instance. Possibly, database-specific processing
can be used to enhance performance even further. The metadata and lineage, however, are kept in PowerCenter.

Three different options are possible:

 Partial pushdown optimization to source: one or more transformations can be processed in the source
 Partial pushdown optimization to target: one or more transformations can be processed in the target
 Full pushdown optimization: all transformations can be processed in the database.
A number of transformations are not supported for pushdown: XML, Rank, Router, Normalizer, Update Strategy, …
Pushdown optimization can be used with sessions with multiple partitions, if the partition types are pass-through or key range
partitioning. You can configure a session for pushdown optimization in the session properties. Use the Pushdown Optimization
Viewer to examine the transformations that can be pushed to the database. Using pushdown requires the ‘Pushdown Optimization
Option' in the PowerCenter license.

Architecture

64-Bit PowerCenter versions will allow better memory usage as the 2GB limitation is removed. When PowerCenter 8 is run on a
grid, the workflows can be configured to use resources efficiently and maximize scalability. Within a grid, tasks are distributed to
nodes. To improve performance on a grid, the network bandwidth between the nodes is of importance as a lot of data is
transferred between the nodes. This data should always be stored on local disks for optimal performance. This includes the
caches and any source and target files. Of course even 64-bit computing will not help if the system is not properly set up. Make
sure plenty of disk space is available at the PowerCenter 8 server.


For optimal performance, consider running the Integration service in ASCII data movement mode when all sources and targets
use 7 or 8-bit ASCII as UNICODE can take up to 16 bits.

The repository database should be located on the PowerCenter machine. If not, the repository database should be physically
separated from any target or source database. This will prevent the same database machine from writing to a target while reading
from the repository. Always use native connections over ODBC connections as they are a lot faster. Maximize the use of parallel
operations on the database. The use of parallelism will cut execution times considerably. Remove any other application from the
PowerCenter server apart from the repository database installation.

Increase the database network packet size to further improve performance. For Oracle this can be done in the listener.ora and
tnsnames.ora. Each database vendor has some specific options that can be beneficial for performance. For Oracle, the use of the IPC
protocol over TCP will result in a performance gain of at least by a factor 2 to 6. Inter Process Control (IPC) will remove the
network layer between the client and Oracle database-server. This can only be used if the database is residing on the same
machine as the PowerCenter 8 server. Check the product documentation for further documentation.

By careful load monitoring of the target/source databases and the servers of PowerCenter and databases while running a session,
potential bottlenecks at database or system level can be identified. Perhaps the database memory is insufficient? Perhaps too
much swapping is occurring on the PowerCenter 8 server? Perhaps the CPUs are overloaded?
The tuning of servers and databases is just as important as delivering an optimized mapping and should not be ignored. Tuning a
system for a data warehouse poses different challenges than tuning a system for an OLTP application. Try to involve DBAs and
admins as soon as possible in this process so they fully understand the sensitivities involved with data warehousing.

Development Guidelines & UTP


The starting point of the development is the logical model created by the Data Architect. This logical model forms the foundation
for metadata, which will be continuously maintained throughout the Data Warehouse Development Life Cycle (DWDLC).
The logical model is formed from the requirements of the project. At the completion of the logical model, technical
documentation is produced defining the sources, targets, requisite business rule transformations, mappings and filters. This
documentation serves as the basis for the creation of the Extraction, Transformation and Loading processes that actually move
the data from the application sources into the Data Warehouse/Data Mart.

To start development on any data mart you should have the following things set up by the Informatica Load Administrator

 Informatica Folder. The development team in consultation with the BI Support Group can decide a three-letter
code for the project, which would be used to create the informatica folder as well as Unix directory structure.
 Informatica Userids for the developers
 Unix directory structure for the data mart.
 A schema XXXLOAD on DWDEV database.

Transformation Specifications

Before developing the mappings you need to prepare the specification document for the mappings you are going to develop. A
good template is placed in the templates folder. You can use your own template as long as it has at least as much detail as
this template.
While estimating the time required to develop mappings, the rule of thumb is as follows.
 Simple Mapping – 1 Person Day
 Medium Complexity Mapping – 3 Person Days
 Complex Mapping – 5 Person Days.
Usually the mapping for the fact table is most complex and should be allotted as much time for development as possible.


Data Loading from Flat Files


It’s an accepted best practice to always load a flat file into a staging table before any transformations are done on the data in the
flat file.
Always use LTRIM, RTRIM functions on string columns before loading data into a stage table.
You can also use UPPER function on string columns but before using it you need to ensure that the data is not case sensitive
(e.g. ABC is different from Abc)
If you are loading data from a delimited file then make sure the delimiter is not a character which could appear in the data itself.
Avoid using comma-separated files. Tilde (~) is a good delimiter to use.

Failure Notification
Once in production, your sessions and batches need to send a notification to the support team when they fail. You can do this
by configuring an email task at the session level.

Naming Conventions and usage of Transformations


Port Standards:
Input Ports – It will be necessary to change the name of input ports for lookups, expressions and filters where ports might have
the same name. If ports do have the same name, they will default to having a number after the name. Change this default
to a prefix of “in_”. This will allow you to keep track of input ports throughout your mappings.
Prefixed with: IN_

Variable Ports – Variable ports that are created within an expression transformation should be prefixed with a “v_”. This will
allow the developer to distinguish between input/output and variable ports. For more explanation of variable ports see the
section “VARIABLES”.
Prefixed with: V_

Output Ports – If derived data is created within a transformation and will be mapped to the target, make sure that it has the same
name as the target port that it will be mapped to.
Prefixed with: O_

Quick Reference

Object Type Syntax


Folder XXX_<Data Mart Name>
Mapping m_fXY_ZZZ_<Target Table Name>_x.x
Session s_fXY_ZZZ_<Target Table Name>_x.x
Batch b_<Meaningful name representing the sessions inside>
Source Definition <Source Table Name>
Target Definition <Target Table Name>
Aggregator AGG_<Purpose>
Expression EXP_<Purpose>
Filter FLT_<Purpose>
Joiner JNR_<Names of Joined Tables>
Lookup LKP_<Lookup Table Name>
Normalizer Norm_<Source Name>
Rank RNK_<Purpose>
Router RTR_<Purpose>


Sequence Generator SEQ_<Target Column Name>


Source Qualifier SQ_<Source Table Name>
Stored Procedure STP_<Database Name>_<Procedure Name>
Update Strategy UPD_<Target Table Name>_xxx
Mapplet MPP_<Purpose>
Input Transformation INP_<Description of Data being funneled in>
Output Transformation OUT_<Description of Data being funneled out>
Database Connections XXX_<Database Name>_<Schema Name>

Unit Test Cases (UTC):

The QA life cycle consists of the following testing regimens:
1. Unit Testing
2. Functional Testing
3. System Integration Testing
4. User Acceptance Testing

Unit testing: The testing, by development, of the application modules to verify each unit (module) itself meets the accepted
user requirements and design and development standards

Functional Testing: The testing of all the application’s modules individually to ensure the modules, as released from
development to QA, work together as designed and meet the accepted user requirements and system standards

System Integration Testing: Testing of all of the application modules in the same environment, database instance, network
and inter-related applications, as it would function in production. This includes security, volume and stress testing.

User Acceptance Testing(UAT): The testing of the entire application by the end-users ensuring the application functions as set
forth in the system requirements documents and that the system meets the business needs.

UTP Template:

Columns: Step # | Description | Test Conditions | Expected Results | Actual Results | Pass or Fail (P or F) | Tested By

SAP-CMS Interfaces

Step 1
Description: Check the total count of records fetched from the source tables against the total records in the PRCHG table for a particular session timestamp.
Test Conditions:
SOURCE: SELECT count(*) FROM XST_PRCHG_STG
TARGET: SELECT count(*) FROM _PRCHG
Expected Results: Both the source and target table record counts should match.
Actual Results: Same as expected. Pass or Fail: Pass. Tested By: Stev

Step 2
Description: Check all the target columns to verify they are populated correctly with source data.
Test Conditions:
SELECT PRCHG_ID, PRCHG_DESC, DEPT_NBR, EVNT_CTG_CDE, PRCHG_TYP_CDE, PRCHG_ST_CDE FROM T_PRCHG
MINUS
SELECT PRCHG_ID, PRCHG_DESC, DEPT_NBR, EVNT_CTG_CDE, PRCHG_TYP_CDE, PRCHG_ST_CDE FROM PRCHG
Expected Results: The MINUS query between the two tables should return zero records.
Actual Results: Same as expected. Pass or Fail: Pass. Tested By: Stev

Step 3
Description: Check the insert strategy used to load records into the target table.
Test Conditions: Identify one record from the source which is not in the target table, then run the session.
Expected Results: It should insert a record into the target table with the source data.
Actual Results: Same as expected. Pass or Fail: Pass. Tested By: Stev

Step 4
Description: Check the update strategy used to load records into the target table.
Test Conditions: Identify one record from the source which is already present in the target table with a different PRCHG_ST_CDE or PRCHG_TYP_CDE value, then run the session.
Expected Results: It should update the existing record in the target table with the source data.
Actual Results: Same as expected. Pass or Fail: Pass. Tested By: Stev


UNIX
How strong are you in UNIX?

1) I have the UNIX shell scripting knowledge that Informatica work typically requires, for example:

If we want to run workflows from UNIX, we use pmcmd.
Below is the script to run a workflow from UNIX:

cd /pmar/informatica/pc/pmserver/
/pmar/informatica/pc/pmserver/pmcmd startworkflow -u $INFA_USER -p $INFA_PASSWD -s $INFA_SERVER:$INFA_PORT -f $INFA_FOLDER -wait $1 >> $LOG_PATH/$LOG_FILE

2) If we need to process flat files with Informatica but those files exist on a remote server, then we have to write a
script to FTP them to the Informatica server before we start processing those files.
3) File watch means that if an indicator file is available in the specified location we start our Informatica jobs;
otherwise we send an email notification using the
mailx command saying that the previous jobs did not complete successfully (a minimal file-watch sketch follows this list).
4) Using a shell script, update the parameter file with the session start time and end time.
This is the kind of scripting knowledge I have. If any new UNIX requirement comes up, I can research the solution and
implement it.
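
A minimal ksh sketch of the file-watch logic from point 3, with hypothetical paths, workflow name and wait limit:

#!/bin/ksh
# Hypothetical sketch: wait up to 60 minutes for the indicator file, then start the workflow or alert
IND_FILE=/interface/dev/etl/orders/srcfiles/orders.done
i=0
while [[ ! -f $IND_FILE && $i -lt 60 ]]
do
    sleep 60
    i=`expr $i + 1`
done

if [[ -f $IND_FILE ]]; then
    pmcmd startworkflow -u $INFA_USER -p $INFA_PASSWD -s $INFA_SERVER:$INFA_PORT -f $INFA_FOLDER -wait wf_orders
else
    echo "Indicator file not found after 60 minutes" | mailx -s "Orders load not started" support_team@example.com
fi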

Basic Commands:

cat > file1 (cat can be used to create a non-zero-byte file; type the content and end with Ctrl+D)
cat file1 file2 > all ----- combines the two files into "all" (the file is created if it does not exist)
cat file1 >> file2 --- appends file1 to file2

o > will redirect output from standard out (screen) to file or printer or whatever you like.

o >> Filename will append at the end of a file called filename.

o < will redirect input to a process or command.

How to create zero byte file?

touch filename (touch is the command to create a zero-byte file)

How to find all processes that are running

ps -A

Crontab command
The crontab command is used to schedule jobs. You must be given permission to run this command by the UNIX administrator. Jobs are
scheduled using five time fields, as follows.


Minutes (0-59) Hour (0-23) Day of month (1-31) month (1-12) Day of week (0-6) (0 is Sunday)

For example, suppose you want to schedule a job which runs the script named backup_jobs in the /usr/local/bin directory on Sunday (day
0) at 22:25 on the 15th of the month. The entry in the crontab file will be as below; * represents all values.

25 22 15 * 0 /usr/local/bin/backup_jobs

The * here tells the system to run this in every month.


The syntax is:
crontab file. So create a file with the scheduled jobs as above and then type crontab filename; this will schedule the jobs.

The command below gives the total number of users logged in at this time:

who | wc -l

echo `who | wc -l` "are the total number of people logged in at this time."

Below cmd will display only directories

$ ls -l | grep '^d'

Pipes:

The pipe symbol "|" is used to direct the output of one command to the input of another.

Moving, renaming, and copying files:

cp file1 file2 copy a file

mv file1 newname move or rename a file

mv file1 ~/AAA/ move file1 into sub-directory AAA in your home directory.

rm file1 [file2 ...] remove or delete a file

To display hidden files

ls –a

Viewing and editing files:

cat filename Dump a file to the screen in ascii.

more filename View the file content one page at a time.

head filename Show the first few lines of a file.

head -5 filename Show the first 5 lines of a file.

tail filename Show the last few lines of a file.


tail -7 filename Show the last 7 lines of a file.

Searching for files :

find command

find -name aaa.txt Finds all the files named aaa.txt in the current directory or

any subdirectory tree.

find / -name vimrc Find all the files named 'vimrc' anywhere on the system.

find /usr/local/games -name "*xpilot*"

Find all files whose names contain the string 'xpilot' which

exist within the '/usr/local/games' directory tree.

You can find out what shell you are using by the command:

echo $SHELL

If file exists then send email with attachment.

if [[ -f $your_file ]]; then
    uuencode $your_file $your_file | mailx -s "$your_file exists..." your_email_address
fi

Below line is the first line of the script

#!/usr/bin/sh

Or

#!/bin/ksh

What does #! /bin/sh mean in a shell script?

It tells the system which interpreter to use for the script. As you know, the bash shell has some specific features that other shells do
not have, and vice versa; the same applies to perl, python and other languages.

In other words, it tells your shell which shell to use when executing the statements in your shell script.

Interactive History

In bash and tcsh (and some other shells) you can use the up-arrow key to recall your previous commands, edit
them, and re-execute them.

Basics of the vi editor

Opening a file


vi filename

Creating text

Edit modes: These keys enter editing modes and type in the text

of your document.

i Insert before current cursor position

I Insert at beginning of current line

a Insert (append) after current cursor position

A Append to end of line

r Replace 1 character

R Replace mode

<ESC> Terminate insertion or overwrite mode

Deletion of text

x Delete single character

dd Delete current line and put in buffer

:w Write the current file.

:w new.file Write the file to the name 'new.file'.

:w! existing.file Overwrite an existing file with the file currently being edited.

:wq Write the file and quit.

:q Quit.

:q! Quit with no changes.

Shell Script Scenario:


“How can we loop Informatica workflows when we have to run the same jobs multiple times?” A common scenario for this
is a history load. In order to minimize system load and achieve better performance, we can split the history load by
weekly or monthly time periods. In that case we have to run the same workflow “n” number of times.

Solution:

1. Creating a Workflow List file:

Create a Workflow list file with “.lst” extension. Add all the workflows you might want to run in the appropriate sequence.

File Format: <Informatica_Folder_name>, <Informatica_Workflow_name>


Example: wfl_ProcessName.lst
Folder_name, Workflow_Name1
Folder_name, Workflow_Name2
Folder_name, Workflow_Name3

2. Creating a Looping file:

Create a Data File with the Workflow list and Number of Loops (in other words number of re-runs needed for the Workflow list)
as a comma separated file.

File Format: <Workflow List file Without Extension>, <Number of loops>

Example: EDW_ETLLOOP.dat

wfl_ProcessName1, 5
wfl_ProcessName2, 10
wfl_ProcessName3, 2

3. Call Script W_WorkflowLooping.ksh:


This script is used to execute the workflow list the required number of times (loops). For example, to process a history load we may have to
run the same sequence of workflows “n” times.
This process will run the workflow list the required number of times.

An added feature of the script is an optional termination file, which can be created in the given directory to force
termination of the looping process. The advantage of the termination file is that users can stop the looping process in
case other processes are being affected by the looping jobs.
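
For instance, following the naming convention used in the script below (looping file name plus the 'TerminationInd.dat' suffix, placed in the loop directory), the termination file for the EDW_ETLLOOP example above could be created simply with touch:

touch /informatica/Loop_Dir/EDW_ETLLOOPTerminationInd.dat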

Processing Steps
 Read the Parameter values and assign them to Variables.
 Validate that Parameter is not null or empty string. If empty exit.
 If Data file exist read the List file and Number of Loops.
o If Job termination file exist then Exit
o Else
 Call W_CallWorkflow.sh script and pass <workflow list> as variable.
 Loop previous step till the ‘n’ number of loops.
 Remove the data file.

#!/bin/ksh
###########################################################################
# FileName: W_WorkflowLooping.ksh
# Parameters: Parm $1 = Looping File name (no extension)

# Description: Performs looping of given Workflow List. Optional File can


# be used to terminate the looping process.
# Warnings: Program name variable must be defined.
#
# Date: 15-Aug-2008

########################## MODIFICATIONS LOG ###########################


# Changed By Date Description
# ------------- -------------- --------------------
#Manish Kothari 08/15/2008 Initial Version
###########################################################################
# sets the Environment variables and functions used in the script.
###########################################################################


#. /scripts/env_variable.ksh

###########################################################################
# Defines program variables
###########################################################################

DATA_FILE=$1
DATA_FILE_EXTN='dat'
LOG_FILE_SUFF='Log.log'
TERM_FILE_SUFF='TerminationInd.dat'

###########################################################################
# Check if the Data File Name is passed as a Parameter
###########################################################################
if [ -z $DATA_FILE ]
then
echo "!!! W_WorkflowLooping: $DATE ERROR - Data File Name Parameter not provided..!!!"
exit 1
fi

DATA_FILE_NAME=$DATA_FILE.$DATA_FILE_EXTN
LOG_FILE_NAME=$DATA_FILE$LOG_FILE_SUFF
JOB_TERMINATION_IND_FILE_NAME=$DATA_FILE$TERM_FILE_SUFF

DATA_FILE=/informatica/Loop_Dir/$DATA_FILE_NAME
LOG_FILE=/informatica/Unix_Log/$LOG_FILE_NAME
JOB_TERMINATION_IND_FILE=/informatica/Loop_Dir/$JOB_TERMINATION_IND_FILE_NAME

###########################################################################
# Update the status and log file - script is starting.
###########################################################################
echo "***** Starting script $0 on `date`." >> $LOG_FILE

###########################################################################
# Check whether the data files exists
###########################################################################
if [ -s $DATA_FILE ]
then
while read member
do
wf_list_file_name=`echo $member | awk -F"," '{print $1}'`
loop_count=`echo $member | awk -F"," '{print $2}'`
while [ $loop_count -gt 0 ]
do
if [ -f $JOB_TERMINATION_IND_FILE ]
then
rm $JOB_TERMINATION_IND_FILE
# rm $DATA_FILE
echo "Indicator file for terminating the load found in /informatica/Loop_Dir/ on
`date`" >> $LOG_FILE
exit 0
fi


#############################################################################
# Executing the workflows
#############################################################################
/informatica/Scripts/W_CallWorkFlow.sh $wf_list_file_name
PMRETCODE=$?
if [ "$PMRETCODE" -ne 0 ]
then
echo "Error in $wf_name Load on `date`" >> $LOG_FILE
exit 1
fi
loop_count=`expr $loop_count - 1`
done
done <$DATA_FILE
else
echo "Source Parameter file $DATA_FILE is missing on `date`" >> $LOG_FILE
exit 1
fi

###########################################################################
# Updates the status and log file - script is ending.
###########################################################################
echo "***** Ending script $0 with no errors on `date`.\n" >> $LOG_FILE
rm $DATA_FILE
exit 0
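
A typical invocation, using the looping data file created in step 2 (passed without its .dat extension; the script location shown is only an assumption), would look like:

ksh /informatica/Scripts/W_WorkflowLooping.ksh EDW_ETLLOOP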

4. Call Script W_CallWorkFlow.sh:


This script is used to execute the workflows from the .lst file. In case of any error it creates a restart file and, when the process is
re-run, starts again from that point.

This script requires Workflow list file name (without extension) as a parameter.

Processing Steps:
 Read the Parameter values and assign them to Variables.
 Validate that Parameter is not null or an empty string. If empty exit.
 Validate that workflow list file exists and is not a zero byte. If yes then exit.
 Assigning a name to restart and Workflow List Log file.
 Read the folder name and Workflow name from the .lst file
o If Restart file is not a zero byte
 Then loop till the restarting workflow name matches the workflow list and then execute
the workflow with pmcmd command.
o Else
 Run the workflow with pmcmd command.
 If any error occurs create a restart file and exit.

 Loop Previous step till all the workflow from the lst file have been executed.

Ex: W_CallWorkFlow.sh <workflow list file name without .lst>

#!/bin/ksh
###########################################################################


# FileName: W_CallWorkFlow.sh
# Parameters: Parm $1 = Workflow List Filename (no extension)
#
# Purpose: Provides the ability to call the PMCMD command from the enterprise
# Scheduler or from Informatica Command Tasks.
#
# Warnings:
#
# Date: 08/28/2007
###########################################################################
########################### MODIFICATIONS LOG #############################
# Changed By Date Description
# ---------- -------- -----------
# Manish Kothari 08/15/2008 Initial Version
###########################################################################
#Include the environment file if any.
#. /scripts/xxx_env.ksh

###########################################################################
# Define Variables.
###########################################################################
DATE=`date '+ %Y-%m-%d %H:%M:%S'`
WORKFLOW_LIST_FILE=$1
WF_LIST_EXTN='lst'
WORKFLOW_LIST_DIR='informatica/WORKFLOW_LISTS/'
UNIXLOG_DIR='informatica/UNIXLOG_DIR'
INFA_REP='infarep:4400'
INFA_USER='USER_NAME'
INFA_PWD='INFA_PWD'
WF_LOG_FILE='informatica/LOGFILES_DIR'

###########################################################################
# Check if the WorkFlow List File Name is Passed as a Parameter
###########################################################################
if [ -z $WORKFLOW_LIST_FILE ]
then
echo "!!! W_CallWorkFlow: $DATE ERROR - Workflow List Parameter not provided..!!!"
exit 1
fi
WORKFLOW_LIST=$WORKFLOW_LIST_DIR/$WORKFLOW_LIST_FILE.$WF_LIST_EXTN

###########################################################################
# Make sure that the Workflow List File is a Valid File and is
# Not Zero Bytes
###########################################################################

if [ ! -s $WORKFLOW_LIST ]
then
echo "!!! W_CallWorkFlow: $DATE ERROR - Workflow List File does not exist or is Zero
Bytes!!!!"
exit 1
fi
###########################################################################


# Define the Variables that will be used in the Script


###########################################################################

RESTART_FILE=$UNIXLOG_DIR/$WORKFLOW_LIST_FILE.rst
WF_LOG_FILE=$UNIXLOG_DIR/$WORKFLOW_LIST_FILE.log
RESTART_WF_FLAG=1

while read WF_LIST_FILE_LINE


do
###########################################################################
# Get the INFA Folder and WF Name from the WorkStream File.
# This file is Comma Delimited and has a .lst extension
###########################################################################

INFA_FOLDER=`echo $WF_LIST_FILE_LINE|cut -f1 -d','`


WF_NAME=`echo $WF_LIST_FILE_LINE|cut -f2 -d','`

###########################################################################
# Check if a Re-Start File Exists. If it does it means that the script has
# started after failing on a previous run. Be Careful while modifying the
# contents of this file. The script, if restarted, will start running WF's from the point of failure (POF)
###########################################################################

if [ -s $RESTART_FILE ]
then
###########################################################################
# If re-start file exists use the WF Name in the Re-start file to determine
# which Failed workflow from a previous run needs to be re-started
# Already completed WF's in a Workstream will be skipped
###########################################################################

RESTART_WF=`cat $RESTART_FILE|cut -f2 -d','`


if [ $WF_NAME != $RESTART_WF ]
then
echo "!!! W_CallWorkFlow: $DATE RESTART DETECTED - Skipping $WF_NAME" >>
$WF_LOG_FILE
continue
else
if [ $RESTART_WF_FLAG -eq 1 ]
then
echo "!!! W_CallWorkFlow: $DATE RESTART DETECTED - Restarting at Workflow Name
$WF_NAME \n"
echo "!!! W_CallWorkFlow: $DATE RESTART DETECTED - Restarting at Workflow Name
$WF_NAME \n" >> $WF_LOG_FILE
RESTART_WF_FLAG=0
fi
fi
fi
echo "W_CallWorkFlow: $DATE STARTING execution of Workflows $WF_NAME in $INFA_FOLDER
using $WORKFLOW_LIST_FILE" >> $WF_LOG_FILE
echo "\n" >> $WF_LOG_FILE

#-------------------------------------------------------------------------


# Call Informatica pmcmd command with defined parameters.


#-------------------------------------------------------------------------
pmcmd startworkflow -u $INFA_USER -p $INFA_PWD -s $INFA_REP -f $INFA_FOLDER -wait $WF_NAME
PMRETCODE=$?

if [ "$PMRETCODE" -ne 0 ]
then
echo "!!! W_CallWorkFlow: $DATE ERROR encountered in $WF_NAME in $INFA_FOLDER \n" >>
$WF_LOG_FILE
echo "!!! W_CallWorkFlow: $DATE Restart file for this workstream is $RESTART_FILE \
n" >> $WF_LOG_FILE
###########################################################################
# In case a WorkFlow fails, the WF Name and the INFA Folder are logged into
# the re-start File. If the script starts again the WF mentioned in
# this file will be started
###########################################################################
echo "$INFA_FOLDER,$WF_NAME" > $RESTART_FILE
exit 1
fi
rm -f $RESTART_FILE
done < $WORKFLOW_LIST

if [ -f $RESTART_FILE ]
then
echo "!!! Problem either in Restart File or Workflow List. Please make sure WorkFlow Names
are correct in both Places" >> $WF_LOG_FILE
exit 1
fi

echo "************Ending Script W_CallWorkFlow.sh for the WorkStream


$WORKFLOW_LIST_FILE.lst************" >> $WF_LOG_FILE
echo "\n" >> $WF_LOG_FILE
echo "\n" >> $WF_LOG_FILE
exit 0

Calling a stored procedure from a command task or shell script


sqlplus -s username/password@connection_string <<END
execute UPD_WORKER_ATTR_FLAG;
exit;
END
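
To make the calling script fail when the procedure raises an error, the SQL*Plus WHENEVER directives can be combined with a check of the exit status. A minimal sketch, using the same (assumed) procedure name and connect string:

sqlplus -s username/password@connection_string <<END
whenever sqlerror exit sql.sqlcode
whenever oserror exit 9
execute UPD_WORKER_ATTR_FLAG;
exit;
END
if [ $? -ne 0 ]
then
echo "Stored procedure UPD_WORKER_ATTR_FLAG failed"
exit 1
fi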

INFORMATICA TRANSFORMATIONS

New features of INFORMATICA 9 compared to INFORMATICA 8.6

Informatica 9 empowers line-of-business managers and business analysts to identify bad data and fix it faster. Architecture-wise
there are no major differences between Informatica 8 and 9, but a number of new features have been added in PowerCenter 9.

New Client tools


Informatica 9 includes the Informatica Developer and Informatica Analyst client tools.
The Informatica Developer tool is eclipse-based and supports both data integration and data quality for enhanced productivity.
From here you can update/refine those same rules, and create composite data objects - e.g. Get customer details from a number of
different sources and aggregate these up to a Customer Data Object.
The Informatica Analyst tool is a browser-based tool for analysts, stewards and line of business managers. This tool supports
data profiling, specifying and validating rules (Scorecards), and monitoring data quality.

Informatica Administrator
The powercenter Administration Console has been renamed the Informatica Administrator.
The Informatica Administrator is now a core service in the Informatica Domain that is used to configure and manage all
Informatica Services, Security and other domain objects (such as connections) used by the new services.
The Informatica Administrator has a new interface. Some of the properties and configuration tasks from the powercenter
Administration Console have been moved to different locations in Informatica Administrator. The Informatica Administrator is
expanded to include new services and objects.
Cache Update in Lookup Transformation
You can update the lookup cache based on the results of an expression. When an expression is true, you can add to or update the
lookup cache. You can update the dynamic lookup cache with the results of an expression.
Database deadlock resilience
In previous releases, when the Integration Service encountered a database deadlock during a lookup, the session failed. Effective
in 9.0, the session will not fail. When a deadlock occurs, the Integration Service attempts to run the last statement in a lookup.
You can configure the number of retry attempts and time period between attempts.
Multiple rows return
Lookups can now be configured as an Active transformation to return Multiple Rows. We can configure the Lookup
transformation to return all rows that match a lookup condition. A Lookup transformation is an active transformation when it can
return more than one row for any given input row.
Limit the Session Log
You can limit the size of session logs for real-time sessions. You can limit the size by time or by file size. You can also limit the
number of log files for a session.
Auto-commit
We can enable auto-commit for each database connection. Each SQL statement in a query defines a transaction. A commit occurs
when the SQL statement completes or the next statement is executed, whichever comes first.
Passive transformation
We can configure the SQL transformation to run in passive mode instead of active mode. When the SQL transformation runs in
passive mode, the SQL transformation returns one output row for each input row.
Connection management
Database connections are centralized in the domain. We can create and view database connections in Informatica Administrator,
Informatica Developer, or Informatica Analyst. Create, view, edit, and grant permissions on database connections in Informatica
Administrator.
Monitoring
We can monitor profile jobs, scorecard jobs, preview jobs, mapping jobs, and SQL Data Services for each Data Integration
Service. View the status of each monitored object on the Monitoring tab of Informatica Administrator.
Deployment
We can deploy, enable, and configure deployment units in the Informatica Administrator. Deploy Deployment units to one or
more Data Integration Services. Create deployment units in Informatica Developer.
Model Repository Service
Application service that manages the Model repository. The Model repository is a relational database that stores the metadata for
projects created in Informatica Analyst and Informatica Designer. The Model repository also stores run-time and configuration
information for applications deployed to a Data Integration Service.


Data Integration Service


Application service that processes requests from Informatica Analyst and Informatica Developer to preview or run data profiles
and mappings. It also generates data previews for SQL data services and runs SQL queries against the virtual views in an SQL
data service. Create and enable a Data Integration Service on the Domain tab of Informatica Administrator.
XML Parser
The XML Parser transformation can validate an XML document against a schema. The XML Parser transformation routes
invalid XML to an error port. When the XML is not valid, the XML Parser transformation routes the XML and the error
messages to a separate output group that We can connect to a target.
Enforcement of licensing restrictions
Powercenter will enforce the licensing restrictions based on the number of CPUs and repositories.
Also Informatica 9 supports data integration for the cloud as well as on premise. You can integrate the data in cloud applications,
as well as run Informatica 9 on cloud infrastructure.

Informatica Transformations

A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations
that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data.
Transformations can be of two types:
 Active Transformation
An active transformation can change the number of rows that pass through the transformation, change the transaction
boundary, can change the row type. For example, Filter, Transaction Control and Update Strategy are active
transformations.
The key point is to note that Designer does not allow you to connect multiple active transformations or an active and a
passive transformation to the same downstream transformation or transformation input group because the Integration
Service may not be able to concatenate the rows passed by active transformations However, Sequence Generator
transformation(SGT) is an exception to this rule. A SGT does not receive data. It generates unique numeric values. As a
result, the Integration Service does not encounter problems concatenating rows passed by a SGT and an active
transformation.
 Passive Transformation.
A passive transformation does not change the number of rows that pass through it, maintains the transaction boundary,
and maintains the row type.
The key point is to note that Designer allows you to connect multiple transformations to the same downstream
transformation or transformation input group only if all transformations in the upstream branches are passive. The
transformation that originates the branch can be active or passive.
Transformations can be Connected or UnConnected to the data flow.
 Connected Transformation
Connected transformation is connected to other transformations or directly to target table in the mapping.
 UnConnected Transformation
An unconnected transformation is not connected to other transformations in the mapping. It is called within another
transformation, and returns a value to that transformation.

Aggregator Transformation
Aggregator transformation performs aggregate functions like average, sum, count etc. on multiple rows or groups. The Integration
Service performs these calculations as it reads and stores data group and row data in an aggregate cache. It is an Active &
Connected transformation.

Difference b/w Aggregator and Expression Transformation?


Expression transformation permits you to perform calculations on a row-by-row basis only, whereas in an Aggregator you can perform
calculations on groups of rows.
For example, an Aggregator transformation that counts rows per state might have ports such as State, State_Count, Previous_State and State_Counter.
Components: Aggregate Cache, Aggregate Expression, Group by port, Sorted input.
Aggregate Expressions: are allowed only in aggregate transformations. can include conditional clauses and non-aggregate
functions. can also include one aggregate function nested into another aggregate function.
Aggregate Functions: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE

Application Source Qualifier Transformation


Represents the rows that the Integration Service reads from an application, such as an ERP source, when it runs a session.It is an
Active & Connected transformation.
Custom Transformation


It works with procedures you create outside the designer interface to extend PowerCenter functionality. calls a procedure from a
shared library or DLL. It is active/passive & connected type.
You can use a Custom transformation to create transformations that require multiple input groups and multiple output groups.
Custom transformation allows you to develop the transformation logic in a procedure. Some of the PowerCenter transformations
are built using the Custom transformation. Rules that apply to Custom transformations, such as blocking rules, also apply to
transformations built using Custom transformations. PowerCenter provides two sets of functions called generated and API
functions. The Integration Service uses generated functions to interface with the procedure. When you create a Custom
transformation and generate the source code files, the Designer includes the generated functions in the files. Use the API
functions in the procedure code to develop the transformation logic.
Difference between Custom and External Procedure Transformation? In Custom T, input and output functions occur
separately.The Integration Service passes the input data to the procedure using an input function. The output function is a
separate function that you must enter in the procedure code to pass output data to the Integration Service. In contrast, in the
External Procedure transformation, an external procedure function does both input and output, and its parameters consist of all
the ports of the transformation.

Data Masking Transformation


Passive & Connected. It is used to change sensitive production data to realistic test data for non production environments. It
creates masked data for development, testing, training and data mining. Data relationship and referential integrity are maintained
in the masked data.
For example: It returns masked value that has a realistic format for SSN, Credit card number, birthdate, phone number, etc. But is
not a valid value. Masking types: Key Masking, Random Masking, Expression Masking, Special Mask format. Default is no
masking.

Expression Transformation
Passive & Connected. are used to perform non-aggregate functions, i.e to calculate values in a single row. Example: to calculate
discount of each product or to concatenate first and last names or to convert date to a string field.
You can create an Expression transformation in the Transformation Developer or the Mapping Designer. Components:
Transformation, Ports, Properties, Metadata Extensions.

External Procedure
Passive & Connected or Unconnected. It works with procedures you create outside of the Designer interface to extend
PowerCenter functionality. You can create complex functions within a DLL or in the COM layer of windows and bind it to
external procedure transformation. To get this kind of extensibility, use the Transformation Exchange (TX) dynamic invocation
interface built into PowerCenter. You must be an experienced programmer to use TX and use multi-threaded code in external
procedures.

Filter Transformation
Active & Connected. It allows rows that meet the specified filter condition and removes the rows that do not meet the condition.
For example, to find all the employees who are working in NewYork or to find out all the faculty member teaching Chemistry in
a state. The input ports for the filter must come from a single transformation. You cannot concatenate ports from more than one
transformation into the Filter transformation. Components: Transformation, Ports, Properties, Metadata Extensions.

HTTP Transformation
Passive & Connected. It allows you to connect to an HTTP server to use its services and applications. With an HTTP
transformation, the Integration Service connects to the HTTP server, and issues a request to retrieves data or posts data to the
target or downstream transformation in the mapping.
Authentication types: Basic, Digest and NTLM. Examples: GET, POST and SIMPLE POST.

Java Transformation
Active or Passive & Connected. It provides a simple native programming interface to define transformation functionality with the
Java programming language. You can use the Java transformation to quickly define simple or moderately complex
transformation functionality without advanced knowledge of the Java programming language or an external Java development
environment.

Joiner Transformation
Active & Connected. It is used to join data from two related heterogeneous sources residing in different locations or to join data
from the same source. In order to join two sources, there must be at least one or more pairs of matching column between the
sources and a must to specify one source as master and the other as detail. For example: to join a flat file and a relational source
or to join two flat files or to join a relational source and a XML source.
The Joiner transformation supports the following types of joins:
 Normal


Normal join discards all the rows of data from the master and detail source that do not match, based on the condition.
 Master Outer
Master outer join discards all the unmatched rows from the master source and keeps all the rows from the detail source
and the matching rows from the master source.
 Detail Outer
Detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards
the unmatched rows from the detail source.
 Full Outer
Full outer join keeps all rows of data from both the master and detail sources.
Limitations on the pipelines you connect to the Joiner transformation:
*You cannot use a Joiner transformation when either input pipeline contains an Update Strategy transformation.
*You cannot use a Joiner transformation if you connect a Sequence Generator transformation directly before the Joiner
transformation.

Lookup Transformation
Default Passive (can be configured active) & Connected or UnConnected. It is used to look up data in a flat file, relational table,
view, or synonym. It compares lookup transformation ports (input ports) to the source column values based on the lookup
condition. Later returned values can be passed to other transformations. You can create a lookup definition from a source
qualifier and can also use multiple Lookup transformations in a mapping.
You can perform the following tasks with a Lookup transformation:
*Get a related value. Retrieve a value from the lookup table based on a value in the source. For example, the source has an
employee ID. Retrieve the employee name from the lookup table.
*Perform a calculation. Retrieve a value from a lookup table and use it in a calculation. For example, retrieve a sales tax
percentage, calculate a tax, and return the tax to a target.
*Update slowly changing dimension tables. Determine whether rows exist in a target.
Lookup Components: Lookup source, Ports, Properties, Condition.
Types of Lookup:
1) Relational or flat file lookup.
2) Pipeline lookup.
3) Cached or uncached lookup.
4) connected or unconnected lookup.

Normalizer Transformation
Active & Connected. The Normalizer transformation processes multiple-occurring columns or multiple-occurring groups of
columns in each source row and returns a row for each instance of the multiple-occurring data. It is used mainly with COBOL
sources where most of the time data is stored in de-normalized format.
You can create following Normalizer transformation:
*VSAM Normalizer transformation. A non-reusable transformation that is a Source Qualifier transformation for a COBOL
source. VSAM stands for Virtual Storage Access Method, a file access method for IBM mainframe.
*Pipeline Normalizer transformation. A transformation that processes multiple-occurring data from relational tables or flat files.
This is default when you create a normalizer transformation.
Components: Transformation, Ports, Properties, Normalizer, Metadata Extensions.

Rank Transformation
Active & Connected. It is used to select the top or bottom rank of data. You can use it to return the largest or smallest numeric
value in a port or group or to return the strings at the top or the bottom of a session sort order. For example, to select top 10
Regions where the sales volume was very high or to select 10 lowest priced products. As an active transformation, it might
change the number of rows passed through it. Like if you pass 100 rows to the Rank transformation, but select to rank only the
top 10 rows, passing from the Rank transformation to another transformation. You can connect ports from only one
transformation to the Rank transformation. You can also create local variables and write non-aggregate expressions.

Router Transformation
Active & Connected. It is similar to filter transformation because both allow you to apply a condition to test data. The only
difference is, filter transformation drops the data that do not meet the condition whereas router has an option to capture the data
that do not meet the condition and route it to a default output group.
If you need to test the same input data based on multiple conditions, use a Router transformation in a mapping instead of creating
multiple Filter transformations to perform the same task. The Router transformation is more efficient.


Sequence Generator Transformation


Passive & Connected transformation. It is used to create unique primary key values or cycle through a sequential range of
numbers or to replace missing primary keys.
It has two output ports: NEXTVAL and CURRVAL. You cannot edit or delete these ports. Likewise, you cannot add ports to the
transformation. NEXTVAL port generates a sequence of numbers by connecting it to a transformation or target. CURRVAL is
the NEXTVAL value plus one or NEXTVAL plus the Increment By value.
You can make a Sequence Generator reusable, and use it in multiple mappings. You might reuse a Sequence Generator when you
perform multiple loads to a single target.
For non-reusable Sequence Generator transformations, Number of Cached Values is set to zero by default, and the Integration
Service does not cache values during the session.For non-reusable Sequence Generator transformations, setting Number of
Cached Values greater than zero can increase the number of times the Integration Service accesses the repository during the
session. It also causes sections of skipped values since unused cached values are discarded at the end of each session.
For reusable Sequence Generator transformations, you can reduce Number of Cached Values to minimize discarded values,
however it must be greater than one. When you reduce the Number of Cached Values, you might increase the number of times
the Integration Service accesses the repository to cache values during the session.

Sorter Transformation
Active & Connected transformation. It is used sort data either in ascending or descending order according to a specified sort key.
You can also configure the Sorter transformation for case-sensitive sorting, and specify whether the output rows should be
distinct. When you create a Sorter transformation in a mapping, you specify one or more ports as a sort key and configure each
sort key port to sort in ascending or descending order.

Source Qualifier Transformation


Active & Connected transformation. When adding a relational or a flat file source definition to a mapping, you need to connect it
to a Source Qualifier transformation. The Source Qualifier is used to join data originating from the same source database, filter
rows when the Integration Service reads source data, Specify an outer join rather than the default inner join and to specify sorted
ports.
It is also used to select only distinct values from the source and to create a custom query to issue a special SELECT statement for
the Integration Service to read source data .

SQL Transformation
Active/Passive & Connected transformation. The SQL transformation processes SQL queries midstream in a pipeline. You can
insert, delete, update, and retrieve rows from a database. You can pass the database connection information to the SQL
transformation as input data at run time. The transformation processes external SQL scripts or SQL queries that you create in an
SQL editor. The SQL transformation processes the query and returns rows and database errors.

Stored Procedure Transformation


Passive & Connected or UnConnected transformation. It is useful to automate time-consuming tasks and it is also used in error
handling, to drop and recreate indexes and to determine the space in database, a specialized calculation etc. The stored procedure
must exist in the database before creating a Stored Procedure transformation, and the stored procedure can exist in a source,
target, or any database with a valid connection to the Informatica Server. Stored Procedure is an executable script with SQL
statements and control statements, user-defined variables and conditional statements.

Transaction Control Transformation


Active & Connected. You can control commit and roll back of transactions based on a set of rows that pass through a Transaction
Control transformation. Transaction control can be defined within a mapping or within a session.
Components: Transformation, Ports, Properties, Metadata Extensions.

Union Transformation
Active & Connected. The Union transformation is a multiple input group transformation that you use to merge data from multiple
pipelines or pipeline branches into one pipeline branch. It merges data from multiple sources similar to the UNION ALL SQL
statement to combine the results from two or more SQL statements. Similar to the UNION ALL statement, the Union
transformation does not remove duplicate rows.
Rules
1) You can create multiple input groups, but only one output group.
2) All input groups and the output group must have matching ports. The precision, datatype, and scale must be identical across all
groups.
3) The Union transformation does not remove duplicate rows. To remove duplicate rows, you must add another transformation
such as a Router or Filter transformation.
4) You cannot use a Sequence Generator or Update Strategy transformation upstream from a Union transformation.


5) The Union transformation does not generate transactions.


Components: Transformation tab, Properties tab, Groups tab, Group Ports tab.

Unstructured Data Transformation


Active/Passive and connected. The Unstructured Data transformation is a transformation that processes unstructured and semi-
structured file formats, such as messaging formats, HTML pages and PDF documents. It also transforms structured formats such
as ACORD, HIPAA, HL7, EDI-X12, EDIFACT, AFP, and SWIFT.
Components: Transformation, Properties, UDT Settings, UDT Ports, Relational Hierarchy.

Update Strategy Transformation


Active & Connected transformation. It is used to update data in target table, either to maintain history of data or recent changes.
It flags rows for insert, update, delete or reject within a mapping.

XML Generator Transformation


Active & Connected transformation. It lets you create XML inside a pipeline. The XML Generator transformation accepts data
from multiple ports and writes XML through a single output port.

XML Parser Transformation


Active & Connected transformation. The XML Parser transformation lets you extract XML data from messaging systems, such
as TIBCO or MQ Series, and from other sources, such as files or databases. The XML Parser transformation functionality is
similar to the XML source functionality, except it parses the XML in the pipeline.

XML Source Qualifier Transformation


Active & Connected transformation. XML Source Qualifier is used only with an XML source definition. It represents the data
elements that the Informatica Server reads when it executes a session with XML sources. has one input or output port for every
column in the XML source.

External Procedure Transformation


Active & Connected/UnConnected transformation. Sometimes, the standard transformations such as Expression transformation
may not provide the functionality that you want. In such cases External procedure is useful to develop complex functions within a
dynamic link library (DLL) or UNIX shared library, instead of creating the necessary Expression transformations in a mapping.

Advanced External Procedure Transformation


Active & Connected transformation. It operates in conjunction with procedures, which are created outside of the Designer
interface to extend PowerCenter/PowerMart functionality. It is useful in creating external transformation applications, such as
sorting and aggregation, which require all input rows to be processed before emitting any output rows.

Informatica Lookups

Lookups are expensive in terms of resources and time.


A set of tips about how to setup lookup transformations would dramatically improve the main constrains such as time and
performance.
In this article you will learn about the following topics:
- Lookup cache
- Persistent lookup cache
- Unconnected lookup
- Order by clause within SQL

Lookup Cache
Problem:
For non-cached lookups, Informatica runs the lookup query against the database for each record coming from the
source. There is an impact in terms of time and resources: if there are 2 million rows from the Source Qualifier,
Informatica hits the database 2 million times with the same query.
Solution:
When a lookup is cached: Informatica queries the database, brings the whole set of rows to the Informatica server and
stores in a cache file. When this lookup is called next time, Informatica uses the file cached. As a result, Informatica
saves the time and the resources to hit the database again.
When to cache a lookup?
As a general rule, we will use lookup cache when the following condition is satisfied:
N>>M
N is the number of records from the source
M is the number of records retrieved from the lookup
Note: Remember to implement database index on the columns used in the lookup condition to provide better
performance in non-cached lookups.

Persistent Lookup Cache


Problem:
Informatica caches lookups by default. Let’s consider the following scenario: a lookup table is used many times in
different mappings. In each Lookup transformation, Informatica builds the same lookup cache table over and over
again. Do we need to build the lookup cache every time for each lookup?
Solution:
It is possible to build the cache file once instead of creating the same cache file N-times.
Just using persistent cache option will allow Informatica to save resources and time for something done before.
Check the following parameters in the transformation to use Persistent Lookup cache:
- Lookup caching enabled
- Lookup cache persistent


Figure 1: Cache Persistent Enabled

From now onwards, the same cache file will be used in all the consecutive runs, saving time building the cache file.
However, the lookup data might change and then the cache must be refreshed by either deleting the cache file or
checking the option “Re-cache from lookup source”.

Figure 2: Re-cache from Lookup Source Enabled

In case of using a lookup reusable in multiple mappings we will have 1 mapping with “Re-cache” option enabled while
others will remain with the “Re-cache” option disabled. Whenever the cache needs to be refreshed we just need to run
the first mapping.
Note: Take into account that it is necessary to ensure data integrity in long-running ETL processes when the underlying tables
change frequently. Furthermore, Informatica PowerCenter is not able to create cache files larger than 2 GB. If a cache
exceeds 2 GB, Informatica will create multiple cache files, and using multiple files decreases performance. Hence,
we might consider joining the lookup source table in the database instead.

Unconnected lookup
Problem:
Imagine the following mapping with 1,000,000 records retrieved from the Source Qualifier:


Figure 3: Connected Lookup Transformation


Suppose the condition is satisfied for only 10% of the million records. With a connected lookup,
the lookup is still executed for every row, so 900,000 of those executions are wasted because there is no match.
Solution:
It is possible to call the Lookup transformation only when the condition is satisfied. As a result, in our scenario the
transformation will be called and executed only 100,000 times out of 1M. The solution is to use an Expression
transformation that calls a Lookup transformation which is not connected to the data flow:

Figure 4: Unconnected Lookup Transformation

For instance, an Expression transformation will contain a port with the following expression:
IIF (ISNULL (COUNTRY),
:LKP.LKP_COUNTRY (EMPLOYEE_ID), COUNTRY)

If COUNTRY is null, then the lookup named LKP_COUNTRY is called with the parameter EMPLOYEE_ID.
The Lookup transformation has EMPLOYEE_ID as its input port and COUNTRY as the return port.

Order by clause within SQL


Informatica takes the time (and the effort) to bring back the data for every port in the Lookup transformation. Thereby,
it is recommended to get rid of the ports that are not used, to avoid additional processing.
It is also a best practice to apply the “ORDER BY” clause only to the columns which are being used in the lookup condition.
By default Informatica appends an ORDER BY on every column in the SELECT statement of the lookup query and uses it to
build its own index, so redundant or unnecessary columns cost extra time and space.
To suppress the generated ORDER BY, write your own ORDER BY on the condition columns and add a comment marker (“--”) at the end of the SQL override, so that the ORDER BY Informatica appends is commented out:

Figure 5: To Avoid ORDER BY in SQL Override

To sum up, it is possible to enhance Informatica lookups by using different set of configurations in order to increase
performance as well as save resources and time. However, before applying any of the mentioned features, an analysis
of the tables and the SQL queries involved needs to be done.


Performance Tuning Methodology


The session log contains a wealth of information; provided we know what we are looking for, timings for each of the activities can be derived.
For every session, Informatica creates
o One Reader Thread
o One Writer Thread
o One Transformation Thread

Thread Statistics

 Thread statistics reveal important information regarding how the session was run and how the reader, writer and
transformation threads were utilized.
 Busy Percentage of each thread are published by INFA at the end of the file in the session log.
 By adding partition points judiciously and running it again we can slowly zero in on the transformation
bottleneck.
 Number of threads cannot be increased for reader and writer, but through partition points we can increase the
number of transformation threads

MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point [SQ] has completed: Total Run
Time = [858.151535] secs, Total Idle Time = [842.536136] secs, Busy Percentage = [1.819655].

MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of partition point [SQ] has completed:
Total Run Time = [857.485609] secs, Total Idle Time = [0.485609] secs, Busy Percentage = [100].

MASTER> PETL_24022 Thread [WRITER_1_1_1] created for the write stage of partition point(s) [Target Tables] has completed:
Total Run Time = [1573.351240] secs, Total Idle Time = [1523.193522] secs, Busy Percentage = [3.187954].
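
A quick way to pull these figures out of a session log on the server is a simple grep on the busy-percentage lines (the log path and file name below are just an example):

grep "Busy Percentage" /informatica/SessLogs/s_m_load_example.log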

Bottleneck

 The transformation thread is 100 % busy meaning that the bottleneck for performance lies in the transformations.
 If the Reader or Writer threads show 100% then there is a bottleneck in reading data from source or writing data to
targets
 100% is only relative. Always FOCUS on the thread that is the greatest among the Reader, Transformation and
Writer.
 If the Busy percentage of all threads is less than 50% there may not be much of a bottleneck in the mapping.

IMPROVING MAPPING PERFORMANCE - TIPS

1. Aggregator Transformation
You can use the following guidelines to optimize the performance of an Aggregator transformation.
 Use Sorted Input to decrease the use of aggregate caches:
The Sorted Input option reduces the amount of data cached during the session and improves session performance. Use
this option with the Source Qualifier Number of Sorted Ports option to pass sorted data to the Aggregator transformation.


 Limit connected input/output or output ports:


Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator
transformation stores in the data cache.

 Filter before aggregating:


If you use a Filter transformation in the mapping, place the transformation before the Aggregator transformation to
reduce unnecessary aggregation.

2. Filter Transformation

The following tips can help filter performance:

 Use the Filter transformation early in the mapping:


To maximize session performance, keep the Filter transformation as close as possible to the sources in the
mapping. Rather than passing rows that you plan to discard through the mapping, you can filter out unwanted data early in
the flow of data from sources to targets.

 Use the Source Qualifier to filter:


The Source Qualifier transformation provides an alternate way to filter rows. Rather than filtering rows from
within a mapping, the Source Qualifier transformation filters rows when read from a source. The main difference is that the


source qualifier limits the row set extracted from a source, while the Filter transformation limits the row set sent to a target.
Since a source qualifier reduces the number of rows used throughout the mapping, it provides better performance.
However, the source qualifier only lets you filter rows from relational sources, while the Filter transformation
filters rows from any type of source. Also, note that since it runs in the database, you must make sure that the source
qualifier filter condition only uses standard SQL. The Filter transformation can define a condition using any statement or
transformation function that returns either a TRUE or FALSE value.

3. Joiner Transformation

The following tips can help improve session performance:

 Perform joins in a database:


Performing a join in a database is faster than performing a join in the session. In some cases, this is not possible,
such as joining tables from two different databases or flat file systems. If you want to perform a join in a database, you can
use the following options:
 Create a pre-session stored procedure to join the tables in a database before running the mapping.
 Use the Source Qualifier transformation to perform the join.

 Designate as the master source the source with the smaller number of records:
For optimal performance and disk storage, designate the master source as the source with the lower number of
rows. With a smaller master source, the data cache is smaller, and the search time is shorter.

4. LookUp Transformation

Use the following tips when you configure the Lookup transformation:

 Add an index to the columns used in a lookup condition:


If you have privileges to modify the database containing a lookup table, you can improve performance for both
cached and uncached lookups. This is important for very large lookup tables. Since the Informatica Server needs to query,
sort, and compare values in these columns, the index needs to include every column used in a lookup condition.

 Place conditions with an equality operator (=) first:


If a Lookup transformation specifies several conditions, you can improve lookup performance by placing all the
conditions that use the equality operator first in the list of conditions that appear under the Condition tab.

 Cache small lookup tables:


Improve session performance by caching small lookup tables. The result of the Lookup query and processing is
the same, regardless of whether you cache the lookup table or not.

 Join tables in the database:


If the lookup table is on the same database as the source table in your mapping and caching is not feasible, join the
tables in the source database rather than using a Lookup transformation.

 Unselect the cache lookup option in the Lookup transformation if there is no lookup override. This improves
session performance.

BEST PRACTICES OF DEVELOPING MAPPINGS IN INFORMATICA

1. Provide the join condition in Source Qualifier Transformation itself as far as possible. If it is compulsory, use a Joiner
Transformation.

2. Use functions in Source Qualifier transformation itself as far as possible (in SQL Override.)


3. Don’t bring all the columns into the Source Qualifier transformation. Take only the necessary columns and delete all
unwanted columns.

4. Too many joins in Source Qualifier can reduce the performance. Take the base table and the first level of parents into
one join condition, base table and next level of parents into another and so on. Similarly, there can be multiple data
flows, which can either insert or insert as well as update.

5. Better to use the sorted ports in Source Qualifier Transformation to avoid the Sorter transformation.

6. If it is compulsory, use Aggregator Transformation. Otherwise, calculate the aggregated values in Source Qualifier
Transformation in SQL override.

7. In case of Aggregators, ensure that proper Group By clause is used.

8. Do not Implement the Error Logic (if applicable) in Aggregator Transformation.

9. Data must be sorted on key columns before passing to Aggregator

10. Minimize aggregate function calls: SUM(A+B) will perform better than SUM(A)+SUM(B)

11. If you are using Aggregator & Lookup transformations, try to use Lookup after aggregation.

12. Don’t bring all the columns into the look up transformation. Take only the necessary columns and delete all unwanted
columns.

13. Using the more no of lookups reduces the performance. If there are 2 lookups, try to club them into one using SQL
override.

14. Use the Reusable Lookups on Dimensions for getting the Keys

15. Cache lookup rows if the number of rows in the lookup table is significantly less than the typical number of source
rows.

16. Share caches if several lookups are based on the same data set

17. Index the columns in the lookup condition if lookup is unavoidable

18. If you use a Filter transformation in mapping, keep it as close as possible to the sources in mapping and before the
Aggregator transformation

19. Avoid using Stored Procedure Transformation as far as possible.

20. Try to use proper index for the columns used in where conditions while searching.

21. Call the procedures in pre-session or post-session command.

22. Be careful while selecting the bulk load option. If bulk load is used, disable all constraints in pre-session and enable
them in post-session. Ensure that the mapping does not allow null, duplicates, etc...

23. As far as possible try to convert procedures (functions) into informatica transformations.

24. Do not create multiple groups in Router (Like Error, Insert, Update etc), Try to Utilize the Default Group.

25. Don't take two instances of the target table for insert/Update. Use Update Strategy Transformation to achieve the same.

26. In case of joiners ensure that smaller tables are used as master tables

27. Configure the sorted input to the Joiner transformation to improve the session performance.


28. Use operators instead of functions since the Informatica Server reads expressions written with operators faster than
those with functions. For example, use the || operator instead of the CONCAT () function.

29. Make sure data types should be Unique across mapping from Source to target

30. If you are using the bulk load option increase the commit interval

31. Check the source queries at the backend while developing mapping and store all queries used during development
separately so that it can be re-use during unit testing which saves time.

SESSION LOGS
Information that reside in a session log:
- Allocation of system shared memory
- Execution of Pre-session commands/ Post-session commands
- Session Initialization
- Creation of SQL commands for reader/writer threads
- Start/End timings for target loading
- Error encountered during session
- Load summary of Reader/Writer/ DTM statistics
Other Information
- By default, the server generates log files based on the server code page.
Thread Identifier
Ex: CMN_1039
Reader and Writer thread codes have 3 digits and Transformation codes have 4 digits. The numbers following a
thread name indicate the following:
(a) Target load order group number
(b) Source pipeline number
(c) Partition number
(d) Aggregate/ Rank boundary number
Log File Codes
Error Codes Description
BR - Related to reader process, including ERP, relational and flat file.
CMN - Related to database, memory allocation
DBGR - Related to debugger
EP- External Procedure
LM - Load Manager
TM - DTM
REP - Repository
WRT - Writer
Load Summary
(a) Inserted
(b) Updated
(c) Deleted
(d) Rejected
Statistics details
(a) Requested rows shows the no of rows the writer actually received for the specified operation
(b) Applied rows shows the number of rows the writer successfully applied to the target (Without Error)
(c) Rejected rows show the no of rows the writer could not apply to the target
(d) Affected rows shows the no of rows affected by the specified operation
Detailed transformation statistics
The server reports the following details for each transformation in the mapping
(a) Name of Transformation
(b) No of I/P rows and name of the Input source
(c) No of O/P rows and name of the output target
(d) No of rows dropped
Tracing Levels
Normal - Initialization and status information, Errors encountered, Transformation errors, rows skipped,
summarize session details (Not at the level of individual rows)
Terse - Initialization information as well as error messages, and notification of rejected data


Verbose Init - In addition to normal tracing, names of the index and data files used, and detailed transformation
statistics.
Verbose Data - In addition to Verbose Init, each row that passes into the mapping, along with detailed transformation statistics.
NOTE
When you enter tracing level in the session property sheet, you override tracing levels configured for
transformations in the mapping.
Session Failures and Recovering Sessions
Two types of errors occurs in the server
- Non-Fatal
- Fatal
(a) Non-Fatal Errors
It is an error that does not force the session to stop on its first occurrence. Establish the error threshold in the
session property sheet with the stop on option. When you enable this option, the server counts Non-Fatal errors
that occur in the reader, writer and transformations.
Reader errors can include alignment errors while running a session in Unicode mode.
Writer errors can include key constraint violations, loading NULL into the NOT-NULL field and database errors.
Transformation errors can include conversion errors and any condition set up as an ERROR, such as NULL
input.
(b) Fatal Errors
This occurs when the server can not access the source, target or repository. This can include loss of connection or target
database errors, such as lack of database space to load data.
If the session uses normalizer (or) sequence generator transformations, the server can not update the sequence values in
the repository, and a fatal error occurs.
(c) Others
Usages of ABORT function in mapping logic, to abort a session when the server encounters a transformation
error.
Stopping the server using pmcmd (or) Server Manager
Performing Recovery
- When the server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the row id of the last
row committed to the target database. The server then reads all sources again and starts processing from the next row id.
- By default, perform recovery is disabled in setup. Hence it won’t make entries in the OPB_SRVR_RECOVERY table.
- The recovery session moves through the states of a normal session schedule: waiting to run, initializing, running,
completed and failed. If the initial recovery fails, you can run recovery as many times as needed.
- The normal reject loading process can also be done in session recovery process.
- The performance of recovery might be low, if
o Mapping contain mapping variables
o Commit interval is high
Unrecoverable Sessions
Under certain circumstances, when a session does not complete, you need to truncate the target and run the
session from the beginning.
Commit Intervals
A commit interval is the interval at which the server commits data to relational targets during a session.
(a) Target based commit
- Server commits data based on the no of target rows and the key constraints on the target table. The commit
point also depends on the buffer block size and the commit interval.
- During a session, the server continues to fill the writer buffer, after it reaches the commit interval. When the
buffer block is full, the Informatica server issues a commit command. As a result, the amount of data
committed at the commit point generally exceeds the commit interval.
- The server commits data to each target based on primary –foreign key constraints.
(b) Source based commit
- Server commits data based on the number of source rows. The commit point is the commit interval you
configure in the session properties.
- During a session, the server commits data to the target based on the number of rows from an active source
in a single pipeline. The rows are referred to as source rows.
- A pipeline consists of a source qualifier and all the transformations and targets that receive data from
source qualifier.
- Although the Filter, Router and Update Strategy transformations are active transformations, the server does
not use them as active sources in a source based commit session.
- When a server runs a session, it identifies the active source for each pipeline in the mapping. The server
generates a commit row from the active source at every commit interval.
- When each target in the pipeline receives the commit rows the server performs the commit.
Reject Loading


During a session, the server creates a reject file for each target instance in the mapping. If the writer or the target
rejects data, the server writes the rejected row into the reject file.
You can correct those rejected data and re-load them to relational targets, using the reject loading utility. (You
cannot load rejected data into a flat file target)
Each time, you run a session, the server appends a rejected data to the reject file.
Locating the BadFiles
$PMBadFileDir
Filename.bad
When you run a partitioned session, the server creates a separate reject file for each partition.
Reading Rejected data
Ex: 3,D,1,D,D,0,D,1094345609,D,0,0.00
To help us in finding the reason for rejecting, there are two main things.
(a) Row indicator
Row indicator tells the writer, what to do with the row of wrong data.
Row indicator Meaning Rejected By
0 Insert Writer or target
1 Update Writer or target
2 Delete Writer or target
3 Reject Writer
If a row indicator is 3, the writer rejected the row because an update strategy expression marked it for reject.
(b) Column indicator
The column indicator is followed by the first column of data, and another column indicator. Column indicators appear after every
column of data and define the type of the data preceding them.
Column Indicator Meaning Writer Treats as
D Valid Data Good Data. The target accepts it unless a database error occurs, such as finding
duplicate key.
O Overflow Bad Data.
N Null Bad Data.
T Truncated Bad Data
NOTE
NULL columns appear in the reject file with commas marking their column.
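For illustration, a hedged reading of the sample row shown above (based only on the indicators; the column meanings are assumptions):
3,D,1,D,D,0,D,1094345609,D,0,0.00
The leading 3 is the row indicator, so the writer rejected the row because an update strategy expression marked it for reject; each D is a column indicator flagging the value that accompanies it as valid data. In other words, the data itself is good and only the update strategy caused the rejection.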
Correcting Reject File
Use the reject file and the session log to determine the cause for rejected data.
Keep in mind that correcting the reject file does not necessarily correct the source of the reject.
Correct the mapping and target database to eliminate some of the rejected data when you run the session again.
Trying to correct target-rejected rows before correcting writer-rejected rows is not recommended, since they may
contain misleading column indicators.
For example, a series of “N” indicators might lead you to believe the target database does not accept NULL values,
so you decide to change those NULL values to zero.
However, if those rows also had a 3 in the row indicator column, the rows were rejected by the writer because of an update
strategy expression, not because of a target database restriction.
If you try to load the corrected file to the target, the writer will again reject those rows, and they will contain inaccurate 0
values in place of NULL values.
Why writer can reject ?
- Data overflowed column constraints
- An update strategy expression
Why target database can Reject ?
- Data contains a NULL column
- Database errors, such as key violations
Steps for loading reject file:
- After correcting the rejected data, rename the rejected file to reject_file.in
- The reject loader uses the data movement mode configured for the server. It also uses the code page of the
server/OS. Hence do not change these in the middle of the reject loading.
- Use the reject loader utility
- Use the reject loader utility
Pmrejldr pmserver.cfg [folder name] [session name]
Other points
The server does not perform the following option, when using reject loader
(a) Source base commit
(b) Constraint based loading
(c) Truncated target table
(d) FTP targets
(e) External Loading


Multiple reject loaders


You can run the session several times and correct rejected data from the several sessions at once. You can correct and
load all of the reject files at once, or work on one or two reject files, load them, and work on the others at a later time.
External Loading
You can configure a session to use Sybase IQ, Teradata and Oracle external loaders to load session target files
into the respective databases.
The External Loader option can increase session performance, since these databases can load information directly
from files faster than they can run SQL commands to insert the same data into the database.
Method:
When a session uses an External Loader, the session creates a control file and a target flat file. The control file contains
information about the target flat file, such as the data format and loading instructions for the External Loader. The control
file has an extension of “*.ctl” and you can view the file in $PMTargetFileDir.
For using an External Loader:
The following must be done:
- configure an external loader connection in the server manager
- Configure the session to write to a target flat file local to the server.
- Choose an external loader connection for each target file in session property sheet.
Issues with External Loader:
- Disable constraints
- Performance issues
o Increase commit intervals
o Turn off database logging
- Code page requirements
- The server can use multiple External Loader within one session (Ex: you are having a session with the two
target files. One with Oracle External Loader and another with Sybase External Loader)
Other Information:
- The External Loader performance depends upon the platform of the server
- The server loads data at different stages of the session
- The server writes External Loader initialization and completion messages in the session log. However, details
about EL performance are generated in the EL log, which is stored in the same target file directory.
- If the session contains errors, the server continues the EL process. If the session fails, the server loads partial target
data using EL.
- The EL creates a reject file for data rejected by the database. The reject file has an extension of “*.ldrreject”.
- The EL saves the reject file in the target file directory
- You can load corrected data from the file, using database reject loader, and not through Informatica reject
load utility (For EL reject file only)
Configuring EL in session
- In the server manager, open the session property sheet
- Select File target, and then click flat file options
Caches
- The server creates index and data caches in memory for the Aggregator, Rank, Joiner and Lookup transformations in a
mapping.
- The server stores key values in index caches and output values in data caches; if the server requires more
memory, it stores overflow values in cache files.
- When the session completes, the server releases cache memory and, in most circumstances, deletes
the cache files.
Cache storage overflow:
Index cache and data cache contents by transformation:
Aggregator: the index cache stores group values as configured in the group-by ports; the data cache stores calculations based on the group-by ports.
Rank: the index cache stores group values as configured in the group-by ports; the data cache stores ranking information based on the group-by ports.
Joiner: the index cache stores index values for the master source table as configured in the join condition; the data cache stores master source rows.
Lookup: the index cache stores lookup condition information; the data cache stores lookup data that is not stored in the index cache.
Determining cache requirements


To calculate the cache size, you need to consider column and row requirements as well as processing overhead.
- The server requires processing overhead to cache data and index information.
- Column overhead includes a null indicator, and row overhead can include row-to-key information.
Steps:
- First, add the total column size in the cache to the row overhead.
- Multiply the result by the number of groups (or rows) in the cache; this gives the minimum cache requirement.
- For the maximum requirement, multiply the minimum requirement by 2.
Location:
- By default, the server stores the index and data files in the directory $PMCacheDir.
- The server names the index files PMAGG*.idx and the data files PMAGG*.dat. If the size exceeds 2 GB, you may find
multiple index and data files in the directory. The server appends a number to the end of the
filename (PMAGG*.idx1, PMAGG*.idx2, etc.).
Aggregator Caches
- When the server runs a session with an Aggregator transformation, it stores data in memory until it completes
the aggregation.
- When you partition a source, the server creates one memory cache and one disk cache for each partition. It routes
data from one partition to another based on the group key values of the transformation.
- The server uses memory to process an Aggregator transformation with sorted ports; it doesn’t use cache memory,
so you don’t need to configure cache memory for Aggregators that use sorted ports.
Index cache:
#Groups ((∑ column size) + 7)
Aggregate data cache:
#Groups ((∑ column size) + 7)
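As a hedged worked example of these formulas (the figures are assumptions, not taken from the handbook): suppose an Aggregator has 100,000 groups and the cached columns total 29 bytes per row. Then the minimum cache requirement is 100,000 * (29 + 7) = 3,600,000 bytes, and the maximum requirement is 2 * 3,600,000 = 7,200,000 bytes. The data cache is estimated the same way, using the sizes of the output/aggregate columns.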
Rank Cache
- When the server runs a session with a Rank transformation, it compares an input row with rows in the
data cache. If the input row out-ranks a stored row, the Informatica server replaces the stored row with the
input row.
- If the Rank transformation is configured to rank across multiple groups, the server ranks incrementally for
each group it finds.
Index Cache :
#Groups ((∑ column size) + 7)
Rank Data Cache:
#Group [(#Ranks * (∑ column size + 10)) + 20]
Joiner Cache:
- When the server runs a session with a Joiner transformation, it reads all rows from the master source and builds
memory caches based on the master rows.
- After building these caches, the server reads rows from the detail source and performs the joins.
- The server creates the index cache as it reads the master source into the data cache. The server uses the
index cache to test the join condition. When it finds a match, it retrieves row values from the data cache.
- To improve Joiner performance, the server aligns all data for the Joiner cache on an eight-byte boundary.
Index Cache :
#Master rows [(∑ column size) + 16)
Joiner Data Cache:
#Master row [(∑ column size) + 8]
Lookup cache:
- When the server runs a Lookup transformation, it builds a cache in memory when it processes the first
row of data in the transformation.
- The server builds the cache and queries it for each row that enters the transformation.
- If you partition the source pipeline, the server allocates the configured amount of memory for each partition.
If two Lookup transformations share the cache, the server does not allocate additional memory for the
second Lookup transformation.
- The server creates index and data cache files in the lookup cache directory and uses the server code page
to create the files.
Index Cache :
#Rows in lookup table [(∑ column size) + 16)
Lookup Data Cache:
#Rows in lookup table [(∑ column size) + 8]
Mapplets
When the server runs a session using a mapplet, it expands the mapplet. The server then runs the session as it
would any other session, passing data through each transformation in the mapplet.


If you use a reusable transformation in a mapplet, changes to these can invalidate the mapplet and every mapping
using the mapplet.
You can create a non-reusable instance of a reusable transformation.
Mapplet Objects:
(a) Input transformation
(b) Source qualifier
(c) Transformations, as you need
(d) Output transformation
Mapplet Won’t Support:
- Joiner
- Normalizer
- Pre/Post session stored procedure
- Target definitions
- XML source definitions
Types of Mapplets:
(a) Active Mapplets - Contains one or more active transformations
(b) Passive Mapplets - Contains only passive transformation
Copied mapplets are not an instance of original mapplets. If you make changes to the original, the copy does not inherit
your changes
You can use a single mapplet more than once in a mapping.
Ports
Default value for I/P port - NULL
Default value for O/P port - ERROR
Default value for variables - Does not support default values
Session Parameters
These parameters represent values you might want to change between sessions, such as a DB connection or source file.
We can use a session parameter in a session property sheet, then define the parameter in a session parameter file.
The user-defined session parameters are:
(a) DB Connection
(b) Source File directory
(c) Target file directory
(d) Reject file directory
Description:
Use session parameter to make sessions more flexible. For example, you have the same type of transactional data
written to two different databases, and you use the database connections TransDB1 and TransDB2 to connect to the
databases. You want to use the same mapping for both tables.
Instead of creating two sessions for the same mapping, you can create a database connection parameter, like
$DBConnectionSource, and use it as the source database connection for the session.
When you create a parameter file for the session, you set $DBConnectionSource to TransDB1 and run the session.
After it completes set the value to TransDB2 and run the session again.
NOTE:
You can use several parameter together to make session management easier.
Session parameters do not have default values; when the server cannot find a value for a session parameter, it fails to
initialize the session.
Session Parameter File
- A parameter file is created by text editor.
- In that, we can specify the folder and session name, then list the parameters and variables used in the
session and assign each value.
- Save the parameter file in any directory, load to the server
- We can define following values in a parameter
o Mapping parameter
o Mapping variables
o Session parameters
- You can include parameter and variable information for more than one session in a single parameter file by
creating separate sections, for each session with in the parameter file.
- You can override the parameter file for sessions contained in a batch by using a batch parameter file. A
batch parameter file has the same format as a session parameter file
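A minimal sketch of a session parameter file, assuming a folder named FINANCE and a session named s_m_load_emp (the folder, session, connection and path names are hypothetical):

[FINANCE.s_m_load_emp]
$DBConnectionSource=TransDB1
$InputFile1=/data/src/emp_daily.dat
$$LastRunTime=01/01/1900 00:00:00

To switch the source database, change $DBConnectionSource to TransDB2 and run the session again; the mapping and session definitions stay untouched.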
Locale
Informatica server can transform character data in two modes
(a) ASCII
a. Default mode
b. Passes 7-bit, US-ASCII character data


(b) UNICODE
a. Passes 8-bit, multibyte character data
b. It uses 2 bytes for each character to move data and performs additional checks at session level, to
ensure data integrity.
Code pages contains the encoding to specify characters in a set of one or more languages. We can select a code page,
based on the type of character data in the mappings.
Compatibility between code pages is essential for accurate data movement.
The various code page components are
- Operating system Locale settings
- Operating system code page
- Informatica server data movement mode
- Informatica server code page
- Informatica repository code page
Locale
(a) System Locale - System Default
(b) User locale - setting for date, time, display
(c) Input locale
Mapping Parameter and Variables
These represent values in mappings/mapplets.
If we declare mapping parameters and variables in a mapping, we can reuse the mapping by altering the parameter
and variable values of the mapping in the session.
This can reduce the overhead of creating multiple mappings when only certain attributes of a mapping need to be
changed.
Use a mapping parameter when you want to use the same value each time you run the session.
Unlike a mapping parameter, a mapping variable represents a value that can change through the session. The
server saves the value of a mapping variable to the repository at the end of each successful run and uses that value
the next time you run the session.
Mapping objects:
Source, Target, Transformation, Cubes, Dimension
Debugger
We can run the Debugger in two situations
(a) Before Session: After saving mapping, we can run some initial tests.
(b) After Session: real Debugging process
Metadata Reporter:
- Web based application that allows to run reports against repository metadata
- Reports including executed sessions, lookup table dependencies, mappings and source/target schemas.
Repository
Types of Repository
(a) Global Repository
a. This is the hub of the domain. Use the global repository to store common objects that multiple developers can use
through shortcuts. These may include operational or application source definitions, reusable
transformations, mapplets and mappings.
(b) Local Repository
a. A local repository is a repository within a domain that is not the global repository. Use the local repository for
development.
(c) Standard Repository
a. A repository that functions individually, unrelated and unconnected to other repository
NOTE:
- Once you create a global repository, you can not change it to a local repository
- However, you can promote the local to global repository
Batches
- Provide a way to group sessions for either serial or parallel execution by server
- Batches
o Sequential (Runs session one after another)
o Concurrent (Runs sessions at same time)

Nesting Batches
Each batch can contain any number of session/batches. We can nest batches several levels deep, defining batches within batches
Nested batches are useful when you want to control a complex series of sessions that must run sequentially or concurrently

Scheduling


When you place sessions in a batch, the batch schedule override that session schedule by default. However, we can configure a
batched session to run on its own schedule by selecting the “Use Absolute Time Session” Option.

Server Behavior
Server configured to run a batch overrides the server configuration to run sessions within the batch. If you have multiple servers,
all sessions within a batch run on the Informatica server that runs the batch.
The server marks a batch as failed if one of its sessions is configured to run if “Previous completes” and that
previous session fails.
Sequential Batch
If you have sessions with dependent source/target relationship, you can place them in a sequential batch, so that
Informatica server can run them is consecutive order.
They are two ways of running sessions, under this category
(a) Run the session, only if the previous completes successfully
(b) Always run the session (this is default)
Concurrent Batch
In this mode, the server starts all of the sessions within the batch, at same time
Concurrent batches take advantage of the resource of the Informatica server, reducing the time it takes to run the session
separately or in a sequential batch.
Concurrent batch in a Sequential batch
If you have concurrent batches with source-target dependencies that benefit from running those batches in a particular order, just
like sessions, place them into a sequential batch.
Stopping and aborting a session
- If the session you want to stop is a part of batch, you must stop the batch
- If the batch is part of nested batch, stop the outermost batch
- When you issue the stop command, the server stops reading data. It continues processing and writing data and committing data
to targets
- If the server cannot finish processing and committing data, you can issue the ABORT command. It is similar
to stop command, except it has a 60 second timeout. If the server cannot finish processing and committing
data within 60 seconds, it kills the DTM process and terminates the session.
Recovery:
- After a session being stopped/aborted, the session results can be recovered. When the recovery is
performed, the session continues from the point at which it stopped.
- If you do not recover the session, the server runs the entire session the next time.
- Hence, after stopping/aborting, you may need to manually delete targets before the session runs again.
NOTE: ABORT command and ABORT function, both are different.
When can a Session Fail
- Server cannot allocate enough system resources
- Session exceeds the maximum no of sessions the server can run concurrently
- Server cannot obtain an execute lock for the session (the session is already locked)
- Server unable to execute post-session shell commands or post-load stored procedures
- Server encounters database errors
- Server encounter Transformation row errors (Ex: NULL value in non-null fields)
- Network related errors
When Pre/Post Shell Commands are useful
- To delete a reject file
- To archive target files before session begins
Session Performance
- Minimum log (Terse)
- Partitioning source data.
- Performing ETL for each partition, in parallel. (For this, multiple CPUs are needed)
- Adding indexes.
- Changing commit Level.
- Using Filter trans to remove unwanted data movement.
- Increasing buffer memory, when large volume of data.
- Multiple lookups can reduce the performance. Verify the largest lookup table and tune the expressions.
- In session level, the causes are small cache size, low buffer memory and small commit interval.
- At system level,
o WIN NT/2000: use the Task Manager.
o UNIX: vmstat, iostat.
Hierarchy of optimization
- Target.
- Source.


- Mapping
- Session.
- System.
Optimizing Target Databases:
- Drop indexes /constraints
- Increase checkpoint intervals.
- Use bulk loading /external loading.
- Turn off recovery.
- Increase database network packet size.
Source level
- Optimize the query (for example, by tuning GROUP BY and ORDER BY clauses).
- Use conditional filters.
- Connect to RDBMS using IPC protocol.
Mapping
- Optimize data type conversions.
- Eliminate transformation errors.
- Optimize transformations/ expressions.
Session:
- concurrent batches.
- Partition sessions.
- Reduce error tracing.
- Remove staging area.
- Tune session parameters.
System:
- Improve network speed.
- Use multiple Informatica servers (pmservers) on separate systems.
- Reduce paging.

Improving Performance at Session level


Optimizing the Session
Once you optimize your source database, target database, and mapping, you can focus on optimizing the session. You can
perform the following tasks to improve overall performance:
- Run concurrent batches.
- Partition sessions.
- Reduce error tracing.
- Remove staging areas.
- Tune session parameters.


Example Walkthrough

1. Go to the Mappings tab, click Parameters and Variables, and create a NEW variable as below:

Name: $$LastRunTime, Type: Variable, Datatype: date/time, Precision: 19, Scale: 0, Aggregation: Max

Give an initial value, for example 1/1/1900.

2. IN EXP Transformation, Create Variable as below:

SetLastRunTime (date/time) = SETVARIABLE ($$LastRunTime, SESSSTARTTIME)

3. Go to SOURCE QUALIFIER Transformation,

Click Properties Tab, In Source Filter area, ENTER the following Expression.
UpdateDateTime (Any Date Column from source) >= '$$LastRunTime'
AND
UPDATEDATETIME < '$$$SESSSTARTTIME'

4. Handle Nulls in DATE

iif(isnull(AgedDate),to_date('1/1/1900','MM/DD/YYYY'),trunc(AgedDate,'DAY'))

5. LOOK UP AND UPDATE STRATEGY EXPRESSION

First, declare a Look Up condition in Look Up Transformation.


For example,

EMPID_IN (column coming from source) = EMPID (column in target table)

Second, drag and drop these two columns into UPDATE Strategy Transformation.

Check the Value coming from source (EMPID_IN) with the column in the target table (EMPID). If both are equal this means that
the record already exists in the target. So we need to update the record (DD_UPDATE). Else will insert the record coming from
source into the target (DD_INSERT). See below for UPDATE Strategy expression.

IIF ((EMPID_IN = EMPID), DD_UPDATE, DD_INSERT)

Note: Always the Update Strategy expression should be based on Primary keys in the target table.

6. EXPRESSION TRANSFORMATION

- IIF (ISNULL (ServiceOrderDateValue1), TO_DATE ('1/1/1900','MM/DD/YYYY'), TRUNC (ServiceOrderDateValue1,'DAY'))
- IIF (ISNULL (NpaNxxId1) OR LENGTH (RTRIM (NpaNxxId1))=0 OR TO_NUMBER (NpaNxxId1) <= 0, 'UNK', NpaNxxId1)
- IIF (ISNULL (InstallMethodId), 0, InstallMethodId)
- DATE_DIFF (TRUNC(O_ServiceOrderDateValue), TRUNC(O_ServiceOrderDateValue), 'DD')

7. FILTER CONDITION


To pass only NOT NULL AND NOT SPACES VALUES THROUGH TRANSFORMATION.

IIF ( ISNULL(LENGTH(RTRIM(LTRIM(ADSLTN)))),0,LENGTH(RTRIM(LTRIM(ADSLTN))))>0

SECOND FILTER CONDITION [Pass only NOT NULL FROM FILTER]

iif(isnull(USER_NAME),FALSE,TRUE)

SCENARIOS
1. Using indirect method we can load files with same structure. How to load file name in the database.
Input files
File1.txt
Andrew|PRES|Addline1|NJ|USA
Samy|NPRS|Addline1|NY|USA
File2.txt
Bharti|PRES|Addline1|KAR|INDIA
Ajay|PRES|Addline1|RAJ|INDIA
Bhawna|NPRS|Addline1|TN|INDIA
In database want to load the file name
File Name Name Type Address Line State Country
File1.txt Andrew PRES Addline1 NJ USA
File1.txt Samy NPRS Addline1 NY USA

File2.txt Bharti PRES Addline1 KAR INDIA


File2.txt Ajay PRES Addline1 RAJ INDIA
File2.txt Bhawna NPRS Addline1 TN INDIA
Ans:
This can be done by enabling the Add Currently Processed Flat File Name option.
Do this in the Source Analyzer while creating the source definition.


Then the CurrentlyProcessedFileName column will be added to the source definition.

2. How to separate the duplicate in 1 target and unique only to another target
1|Piyush|Patra|
2|Somendra|Mohanthy
3|Santhosh|bishoyi
1|Piyush|Patra|
2|Somendra|Mohanthy

O/P
File1
1|Piyush|Patra|
2|Somendra|Mohanthy
File2
3|Santhosh|bishoyi

Solution:


This can be done with the help of an Aggregator.

Group by the columns on which you want to decide duplicate or unique (the follow-on routing step is sketched after the port list below):
Port Expression Group by
ID Yes
FName Yes
LName Yes
Count count(id)
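A hedged sketch of the remaining step (group names are assumptions): pass Count along with the group-by ports to a Router with two groups and connect each group to its file target.

DUPLICATES : Count > 1   --> File1 (records that occurred more than once)
UNIQUE     : Count = 1   --> File2 (records that occurred exactly once)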

3. How to load n number of records equally to 4 targets


Sol:
You can do this using the sequence generator and router

Sequence Generator
Use this to generate the record number from 1 to 4
Set the following properties (the values are sketched after the Router description below):


Expression:
Use this to get the next value from Sequence generator
Router:
Use this to redirect output to 4 targets based on the group property
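A hedged sketch of the settings described above (values inferred from the description, not shown in the handbook):

Sequence Generator: Start Value = 1, End Value = 4, Increment By = 1, Cycle = enabled
Router groups (NEXTVAL comes in through the Expression):
  TGT1 : NEXTVAL = 1
  TGT2 : NEXTVAL = 2
  TGT3 : NEXTVAL = 3
  TGT4 : NEXTVAL = 4  (or use the default group)

Because Cycle is enabled, rows are numbered 1,2,3,4,1,2,3,4,... so the records are spread evenly across the four targets.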

4. How to insert First_Name,Last_Name for below scenario


Source
First_Name Last_Name
Vijay
Jaiswal


Kuldeep
Solanki
Lalsingh
Bharti
Poornaya
Cherukmala
Rajeev
TK
Target (o/p)
First_Name Last_Name
Vijay Jaiswal
Kuldeep Solanki

Lalsingh Bharti

Poornaya Cherukmala

Rajeev TK
Solution:
Option 1
You can assign a serial number to the source data, then group by the serial number and write to the target.

Expression
Use expression to assign the same serial number to first_name and last_name

Now data will be as below


Sl First_Name Last_Name
1 Vijay
1 Jaiswal
2 Kuldeep
2 Solanki


3 Lalsingh
3 Bharti
4 Poornaya
4 Cherukmala
5 Rajeev
5 TK
Aggregator
Group by Sl. Since the Aggregator ignores NULL values, use the MAX or MIN function on First_Name and Last_Name to get the combined name row.

5. Customer records entered in the OLTP system by different agent as below

First Name Last Name Address Entry_Date


Srini Reddy Cegedim, Bangalore 01-01-2011 10:05

Tarun Tanwar Capgemini, US 01-01-2011 10:15

Devashis Jain Symphony, Bangalore 01-01-2011 10:25

Srini Reddy Cegedim ,Bangalore 01-01-2011 11:20

In Data mart records are loaded on the same date 4:00 PM, records should be loaded as below
First Name Last Name Address Effective date End date
Srini Reddy Cegedim, Bangalore 01-01-2011 14:00:00 01-01-2011 14:00:10

Tarun Tanwar Capgemini, US 01-01-2011 14:00:02

Devashis Jain Symphony, Bangalore 01-01-2011 14:00:05

Srini Reddy Cegedim Pvt Ltd ,Bangalore 01-01-2011 14:00:10

Solution:
If you use a static lookup, then for Srini Reddy only one record will be loaded, because the lookup will return only one value:
First Name Last Name Address Effective date End date
Srini Reddy Cegedim, Bangalore 01-01-2011 14:00:00
Tarun Tanwar Capgemini, US 01-01-2011 14:00:02

Dev Jain Symphony, Bangalore 01-01-2011 14:00:05



This can be done using dynamic lookup cache


Configure lookup cache as Dynamic

First Name Last Name Address NewLookupRow


Srini Reddy Cegedim, Bangalore 1

Tarun Tanwar Capgemini, US 1


Dev Jain Symphony, Bangalore 1

Srini Reddy Cegedim Pvt Ltd ,Bangalore 2

Use a Router to route the rows for Insert and Insert-Update (a sketch of the router groups and update strategies follows).

For the Insert group, use an Update Strategy to insert.
For the Insert-Update group, use a Sequence Generator to create the surrogate key for the new version,
and use an Update Strategy to end-date the old record with the system date.
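A hedged sketch of the routing and update-strategy expressions, based on the NewLookupRow port of the dynamic lookup (group and port names are assumptions):

Router groups:
  INSERT_NEW        : NewLookupRow = 1   -- row not found in cache, insert as new
  INSERT_AND_EXPIRE : NewLookupRow = 2   -- row found but changed, insert new version and expire the old one

Update Strategy (insert branch) : DD_INSERT
Update Strategy (expire branch) : DD_UPDATE, setting End_Date = SESSSTARTTIME on the old version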

6. Needs to calculate the contribution to family income

Family ID Person ID Person Name Earning


100 1 Vijay 20000
100 2 Ajay 30000
200 3 Bharat 60000
200 4 Akash 60000
300 5 Sanjay 50000
O/P
Family ID Person ID Person Name %Contribution to family
100 1 Vijay 40
100 2 Ajay 60
200 3 Bharat 50
200 4 Akash 50
300 5 Sanjay 100
Solution:
This can be done with the help of a Joiner and an Aggregator.
Port Expression Group by
Family_id1 Yes
Sal
Tot_sal Sum(sal)


Use the joiner to join the records based on the family id

Port Master/Detail
Family_id Detail
Tot_Sal Detail
Person_id Master
Family_id Master
Person_name Master
Sal Master
Use the join condition:
Family_id1 = Family_id
Use the expression to get the calculation
Port Epression
Contribution (sal/Tot_sal)*100

7. To produce normalized output or Convert rows into columns

I/P
ID Month Sales
1 Jan 100
1 Feb 120
1 March 135
2 Jan 110
2 Feb 130
2 March 120
O/P
ID Jan Feb March
1 100 120 135
2 110 130 120


Ans:
Use the Aggregator: group by ID and use the FIRST function:
FIRST(SALES, MONTH='Jan')
FIRST(SALES, MONTH='Feb')
FIRST(SALES, MONTH='March')

8. Data Scenario
When multiple records come from the source (after joining with another table) for a single input row.

Sample Logic:

This logic is used to find a single valid crcust number from the source.
The source tables are TRAN, CUSTMR and CUSTOMER.
The crcust number is pulled from the CUSTMR table by a normal join to the TRAN table on the fields gtkey,
gtcmpy and gtbrch.
If multiple records come from the CUSTMR table for a single combination of gtkey, gtcmpy and gtbrch, then we do a
lookup on the CUSTOMER table based on the crcust number from CUSTMR, where outlet_status in CUSTOMER should
be '1' or spaces; if we get only one crcust number for a single combination, we can use that valid
crcust number directly.
If we get only one crcust number from the CUSTOMER table, we can process that crcust number; if we get
multiple crcust numbers, we have to use Filter, Sorter and Aggregator transformations to get the valid crcust
number, without applying MAX or MIN on the multiple records in the CUSTOMER table.

The following query that has used to retrieve the source records from the first source table. (AS400 Environment)
Select
tran.gtdvsn,
tran.gtamnt,
tran.gtpsdt,
tran.gtpsmt,
tran.gtpsyr,
tran.gtlact,
Substr (digits (tran.gtkey), 1, 7) as gtkey,
Digits (tran.gtcmpy) as gtcmpy,
tran.gtbrch
From
Npc.tran Tran
Where
tran.gtcmpy = 300
And Tran.Gtamnt = 115.5

Source data:

GTDVSN GTAMNT GTPSDT GTPSMT GTPSYR GTLACT GTKEY GTCMPY GTBRCH
101 115.50 1090210 2 109 422000-0001 0002463 300

The following source query has been used to retrieve the second table's source records (AS400 environment).
When there is a single input from the TRAN table, the CUSTMR table returns multiple records (normal join).
SELECT
CUSTMR.CRCUST as CRCUST,
CUSTMR.CRPCUS as CRPCUS,
SUBSTR (CUSTMR.CREXT1, 1, 3) as CREXT1,
CUSTMR.CRPBRC as CRPBRC,
A.COUNT
FROM
NMC.CUSTMR CUSTMR, (SELECT COUNT (*) COUNT,
CUSTMR.CRPCUS as CRPCUS,
SUBSTR (CUSTMR.CREXT1, 1, 3) as CREXT1,
CUSTMR.CRPBRC as CRPBRC
FROM
NMC.CUSTMR CUSTMR
WHERE
SUBSTR (CUSTMR.CREXT1, 1, 3) = '300'
GROUP BY
CUSTMR.CRPCUS,
SUBSTR (CUSTMR.CREXT1, 1, 3),
CUSTMR.CRPBRC) A
WHERE
CUSTMR.CRPCUS=A.CRPCUS and
SUBSTR (CUSTMR.CREXT1, 1, 3) =A.CREXT1 AND
CUSTMR.CRPBRC=A.CRPBRC AND
SUBSTR (CUSTMR.CREXT1, 1, 3) = '300'
AND CUSTMR.CRCUST IN ('0045907','0014150')

Source data from custmr table:

CRCUST CRPCUS CREXT1 CRPBRC COUNT
0014150 0002463 300 2
0045907 0002463 300 2

The below detail outer join on customer table has been used to get the valid crcust number when we are getting multiple crcust
numbers after the normal join.

The master table CUSTMR has joined with the detail table CUSTOMER based on CRCUST field (detail outer join).

SELECT
DISTINCT LPAD (AR_NUM, 7,'0') AS AR_NUM
FROM
CUSTOMER
WHERE
LPAD (AR_NUM, 7,'0') IN ('0014150','0045907')

Source data from customer table:

AR_NUM


0014150

The valid crcust number '0014150' will be processed and populated to the target.
We will now look at a source data set where multiple records are found in the CUSTOMER table, and where we should not use MAX or MIN on
the crcust number.
Source data set:

Source query for TRAN table:

SELECT
TRAN.GTDVSN,
TRAN.GTAMNT,
TRAN.GTPSDT,
TRAN.GTPSMT,
TRAN.GTPSYR,
TRAN.GTLACT,
SUBSTR (DIGITS (TRAN.GTKEY), 1, 7) AS GTKEY,
DIGITS (TRAN.GTCMPY) AS GTCMPY,
TRAN.GTBRCH
FROM
NMP.TRAN TRAN
WHERE
TRAN.GTCMPY = 300
AND TRAN.GTAMNT IN (1030.5, 558.75, 728)

Source data from TRAN table:

GTDVSN GTAMNT GTPSDT GTPSMT GTPSYR GTLACT GTKEY GTCMPY GTBRCH


101 1030.50 1090210 2 109 422000-0001 0006078 300
101 558.75 1090210 2 109 2550-1043001 0006078 300
101 728.00 1090210 2 109 2550-422000 0006078 300

Source query for CUSTMR table:

SELECT CUSTMR.CRCUST,
CUSTMR.CRPCUS as CRPCUS,
SUBSTR (CUSTMR.CREXT1, 1, 3) as CREXT1,
CUSTMR.CRPBRC as CRPBRC
FROM
NMC.CUSTMR CUSTMR
WHERE
SUBSTR (CUSTMR.CREXT1, 1, 3) = '300'
AND CUSTMR.CRPCUS= '0006078'

Source data from CUSTMR table:

CRCUST CRPCUS CREXT1 CRPBRC


0001877 0006078 300
0002392 0006078 300
0041271 0006078 300


Source query for CUSTOMER table:

SELECT
DISTINCT LPAD (AR_NUM, 7,'0') AS AR_NUM
FROM
CUSTOMER
WHERE
LPAD (AR_NUM, 7,'0') IN ('0001877','0002392','0041271')

Source data from CUSTOMER table:

AR_NUM
0001877
0041271

The crcust numbers '0001877' and '0041271' are valid among those three crcust numbers. But the mapping should populate only
one crcust number among these two valid crcust numbers.

The below logic has been used to select a one valid crcust number.

Filter transformation:

The inputs to the Filter transformation come from the Joiner transformation, which performs the detail outer join between the
master table CUSTMR and the detail table CUSTOMER based on the crcust number.


Filter condition:

COUNT_CRCUST=1 OR (COUNT_CRCUST<>1 AND NOT ISNULL (AR_NUM))

COUNT_CRCUST=1 represents the records from the CUSTMR table which have only one valid crcust number for a single
combination of the gtkey, gtcmpy and gtbrch fields; that crcust number may or may not be present in the CUSTOMER table.

COUNT_CRCUST<>1 AND NOT ISNULL (AR_NUM) represents the records which have more than one crcust number
for a single combination of the gtkey, gtcmpy and gtbrch fields and whose ar_num from the CUSTOMER table is not null
(meaning the multiple crcust numbers must also be present in the CUSTOMER table).

The Filter transformation is used to keep only the records that have valid crcust numbers.

Sorter Transformation:


The Sorter transformation is used to sort the records in descending order based on crcust number, crpcus, crext1 and crpbrc. In the
next Aggregator transformation we use MIN on the outlet_status field, which would otherwise lose the multiple records; that is why
the records are sorted first. Up to this Sorter transformation, the records from the CUSTOMER table have been processed for all
kinds of outlet_status.

Aggregator transformation:

The Aggregator transformation is used to eliminate the crcust numbers whose outlet_status is not in ('1' or spaces) and
to group the crcust numbers by the source fields crpcus, crext1 and crpbrc.


The outlet_status is transformed to '1' when the outlet_status is '1' or spaces; otherwise it is transformed to '9'. From this we
take MIN(outlet_status) to keep, for the target, only the records whose outlet_status is '1' or spaces, as sketched below.
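A hedged sketch of the expression and aggregate ports described above (port names are assumptions):

Expression:  o_outlet_flag = IIF(ISNULL(outlet_status) OR LTRIM(RTRIM(outlet_status)) = '' OR outlet_status = '1', '1', '9')
Aggregator:  group by crpcus, crext1, crpbrc;  min_outlet_flag = MIN(o_outlet_flag)

Groups whose min_outlet_flag is '1' carry a crcust number with a usable outlet_status and are passed on to the final join with TRAN.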

Then we can join these records with the source table TRAN to get the valid crcust number.


This Joiner transformation is used to join the records from the two inputs: the Aggregator transformation and the TRAN Source
Qualifier transformation.


Using this logic, we can find a single crcust number for a single combination of inputs from the source TRAN.

9. Data Scenario

When more than two consecutive Joiner transformations would have to be used in the mapping, and the source tables for those
joins belong to the same database.
Sample Logic:
The tables used in this logic are SALES, PURCXREF and SALES_INDEX. All of these tables belong to the
same database.


The fields required in the target from the source are historical_business_unit, item_code, order_qty, ship_date, grp_code and
event_order_ind.
This logic is applicable only for the records whose SALES.historical_business_unit is 'BH' or 'BW'.
Optionally join to PURCXREF where PURCXREF.ITEM_ID = SALES.ITEM_ID.
If the join to PURCXREF is successful, then perform the required join to the SALES_INDEX table where PURCXREF.ITEM_OD
= SALES_INDEX.ITEM_ID.
If the join to PURCXREF is not successful, then perform the required join where SALES_INDEX.ITEM_ID =
SALES.ITEM_ID.
The logic used for item_code is the concatenation of style_num, color_code, attribution_code and size_num
from SALES_INDEX, whether or not the join to PURCXREF is successful.
The logic used for order_qty is SUM (PURCXREF.ZCOMP_QTY *
SALES.ORIGINAL_ORDER_UNITS) when the join to PURCXREF is successful, and SUM
(SALES.ORIGINAL_ORDER_UNITS) when the join to PURCXREF is not successful.

The below sql query is used to reduce the number of joiners in the mapping.

Select
S.HISTORICAL_BUSINESS_UNIT,
(SI.STYLE_NUM || SI.COLOR_CODE || SI.ATTRIBUTION_CODE || SI.SIZE_NUM) AS ITEMCODE,
ROUND (SUM (CASE WHEN S.HISTORICAL_BUSINESS_UNIT IN ('CH','CW') AND
PCX.ZITEMID IS NOT NULL THEN
S.ORIGINAL_ORDER_UNITS * PCX.ZCOMP_QTY
ELSE
S.ORIGINAL_ORDER_UNITS
END),0) ORDER_QTY,
S. SHIP_DATE,
S.GRP_CDE,
S.EVENT_ORDER_IND
FROM
OPC.SALES S
LEFT OUTER JOIN (SELECT MAX (p.plant),
p.af_grid,
p.zcmp_grid,
p.zcomp_qty,
p.material,
p.component,
p.zitemid,
p.compitemid
FROM OPC.PURCXREF p
GROUP BY p.af_grid,
p.zcmp_grid,
p.zcomp_qty,
p.material,
p.component,
p.zitemid,
p.compitemid) PCX
ON S.ITEM_ID = PCX.ZITEMID
JOIN OPC.SALES INDEX SI
ON (CASE WHEN S.HISTORICAL_BUSINESS_UNIT IN ('BH','BW') THEN
CASE WHEN PCX.ZITEMID IS NULL THEN
S.ITEM_ID
ELSE PCX.COMPITEMID END
ELSE S.ITEM_ID END) = SI.ITEM_ID
GROUP BY
S.HISTORICAL_BUSINESS_UNIT,
S.REQUESTED_SHIP_DATE,
S.GRP_CDE,
S.EVENT_ORDER_IND,
SI.STYLE_NUM || SI.COLOR_CODE || SI.ATTRIBUTION_CODE || SI.SIZE_NUM


10. How to populate 1st record to 1st target ,2nd record to 2nd target ,3rd record to 3rd target and 4th record to 1st
target through informatica?
We can do this using a Sequence Generator by setting End Value = 3 and enabling the Cycle option. Then in the Router take 3 groups:
in the 1st group specify the condition seq next value = 1 and pass those records to the 1st target; similarly
in the 2nd group specify the condition seq next value = 2 and pass those records to the 2nd target;
in the 3rd group specify the condition seq next value = 3 and pass those records to the 3rd target.
Since we have enabled the Cycle option, after reaching the end value the Sequence Generator starts again from 1, so for the 4th record seq next
value is 1 and it goes to the 1st target.

11. How to do Dymanic File generation in Informatica?


We want to generate the separate file for every State (as per state, it should generate file).It has to generate 2 flat files and name
of the flat file is corresponding state name that is the requirement.
Below is my mapping.
Source (Table) -> SQ -> Target (FF)
Source:
State Transaction City
AP 2 HYD
AP 1 TPT
KA 5 BANG
KA 7 MYSORE
KA 3 HUBLI
This functionality was added in Informatica 8.5 onwards; in earlier versions it was not available.
We can achieve it with the use of a Transaction Control transformation and the special "FileName" port in the target flat file.
In order to generate the target file names from the mapping, we should make use of the special "FileName" port in the target file.
You can't create this special port from the usual New port button. There is a special button with label "F" on it to the right most
corner of the target flat file when viewed in "Target Designer".
When you have different sets of input data with different target files created, use the same instance, but with a Transaction
Control transformation which defines the boundary for the source sets.
In the target flat file there is an option on the Columns tab, i.e. FileName as column; when you click it, a non-editable
column gets created in the metadata of the target.
In the Transaction Control transformation give a condition such as IIF(NOT ISNULL(emp_no), TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION).
Map the Ename column to the target's FileName column.
Your mapping will be like this:
source -> sq -> transaction control -> target
Run it, and separate files will be created, named by Ename.
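For the per-State requirement in this scenario, a hedged sketch (port names are assumptions): sort the source by State, track the previous State in an Expression, commit whenever it changes, and map State to the FileName port.

Expression (ports evaluated top-down):
  v_changed (variable)    = IIF(State != v_prev_state OR ISNULL(v_prev_state), 1, 0)
  v_prev_state (variable) = State
  o_changed (output)      = v_changed

Transaction Control condition:
  IIF(o_changed = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)

Each commit starts a new file whose name is taken from the State value flowing into the FileName port.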

12. How to concatenate row data through informatica?


Source:
Ename EmpNo
stev 100
methew 100
john 101
tom 101
Target:
Ename EmpNo
Stev methew 100
John tom 101

Approach1: Using Dynamic Lookup on Target table:


If the record doesn't exist, insert it into the target. If it already exists, get the corresponding Ename value from the lookup,
concatenate it in an Expression with the current Ename value, and then update the target Ename column using an Update Strategy.
Approach 2: Using variable ports:


Sort the data in the Source Qualifier based on the EmpNo column, then use an Expression to store the previous record's
information using variable ports; after that use a Router to insert the record if it is the first occurrence, and if it has already
been inserted, update Ename in the target with the concatenated value of the previous name and the current name (a sketch of the variable ports follows).
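A hedged sketch of the variable-port logic for approach 2 (port names are assumptions; input sorted by EmpNo):

v_names (variable)      = IIF(EmpNo = v_prev_empno, v_names || ' ' || Ename, Ename)
v_prev_empno (variable) = EmpNo
o_names (output)        = v_names

The last row of each EmpNo group carries the full concatenation ('stev methew', 'john tom'), which can then be inserted or used to update the target.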

13. How to send Unique (Distinct) records into One target and duplicates into another tatget?
Source:
Ename EmpNo
stev 100
Stev 100
john 101
Mathew 102

Output:
Target_1:
Ename EmpNo
Stev 100
John 101
Mathew 102

Target_2:
Ename EmpNo
Stev 100

Approach 1: Using a dynamic lookup on the target table:

If the record doesn't exist, insert it into Target_1. If it already exists, send it to Target_2 using a Router.
Approach 2: Using variable ports:
Sort the data in the Source Qualifier based on the EmpNo column, then use an Expression to store the previous record's information
using variable ports; after that use a Router to route the data to the targets: if it is the first occurrence send it to the first target,
and if it has already been inserted send it to Target_2.

14. How to process multiple flat files to single target table through informatica if all files are same structure?
We can process all the flat files through one mapping and one session using a list file.
First we need to create the list file using a unix script for all the flat files; the extension of the list file is .LST.
The list file contains only the flat file names.
At session level we need to set the source file directory to the list file path,
the source file name to the list file name,
and the file type to Indirect.
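A minimal sketch (paths and file names are hypothetical):

cd /data/incoming && ls cust_*.dat > cust_files.LST

cust_files.LST then contains one file name per line, e.g.
cust_20240101.dat
cust_20240102.dat

Session settings: Source file directory = /data/incoming/, Source filename = cust_files.LST, Source filetype = Indirect.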

15. How to populate file name to target while loading multiple files using list file concept.
In Informatica 8.6, select the Add Currently Processed Flat File Name option in the Properties tab of the source definition,
after importing the source file definition in the Source Analyzer. It adds a new column, the currently processed file name, which
we can map to the target to populate the file name.

16. If we want to run 2 workflow one after another(how to set the dependence between wf’s)
- If both workflows exist in the same folder, we can create 2 worklets rather than creating 2 workflows.
- Finally we can call these 2 worklets in one workflow.
- There we can set the dependency.
- If the workflows exist in different folders or repositories, then we cannot create worklets.
- We can set the dependency between these two workflows using a shell script, which is one approach.
- The other approach is event wait and event raise.
If the workflows exist in different folders or different repositories, we can use the approaches below.
Using a shell script (see the sketch after this section)
- As soon as the first workflow completes, we create a zero-byte file (indicator file).
- If the indicator file is available in the particular location, we run the second workflow.
- If the indicator file is not available, we wait for 5 minutes and check for the indicator again. We continue this
loop 5 more times, i.e. 30 minutes in total.
- After 30 minutes, if the file still does not exist, we send out an email notification.
Event wait and event raise approach
We can put an Event-Wait task before the actual session in the workflow to wait for an indicator file; if the file is available it runs the session,
otherwise the Event-Wait waits indefinitely until the indicator file is available.
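A hedged sketch of the shell-script approach (service, folder, workflow, file and mail values are placeholders):

#!/bin/ksh
IND=/data/flags/wf_first_done.ind          # indicator file dropped by workflow 1
i=1
while [ $i -le 6 ]; do                     # 6 checks, 5 minutes apart = 30 minutes
  if [ -f "$IND" ]; then
    pmcmd startworkflow -sv INT_SVC -d DOMAIN -u infa_user -p infa_pwd -f MY_FOLDER wf_second
    exit $?
  fi
  sleep 300
  i=`expr $i + 1`
done
echo "Indicator file not found after 30 minutes" | mailx -s "wf_second not started" support@example.com
exit 1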

17. How to load cumulative salary in to target ?


Solution:
Using var ports in expression we can load cumulative salary into target.

18. I have 1000 records in my source table and the same in the target, but a new column "batchno" has been added in the target.
This column should hold 10 for the first 100 records, 20 for the next 100 records, 30 for the next 100 records, and so on.
How do we achieve this?
First, the target table should have a primary key (unique key); then use an Update Strategy transformation to update the target.
Use either a Sequence Generator or a variable port as a counter.
Option 1:
In the Sequence Generator:
give Start Value = 1,
End Value = 100,
and check the Cycle property.
Give the following logic in an Expression transformation:
O_SEQUENCE = NEXTVAL
v_COUNT = IIF(O_SEQUENCE = 1, v_COUNT + 10, v_COUNT)
O_BATCHNO = v_COUNT
Option 2: Sequence Generator without Cycle:
IIF(seq <= 100, 10, IIF(seq > 100 AND seq <= 200, 20, IIF(seq > 200 AND seq <= 300, 30, 40)))

19. How can we call a stored procedure at session level under Target post SQL
We can call procedures in an anonymous block within the pre-SQL/post-SQL using the following syntax:
begin procedure_name()\; end\;


20. Consider the input records to be


Trade_ID AMOUNT STATUS CURRENCY APPLIED_DATE
100 78.86889 TRUE USD 10/25/2013 5:13
100 6.864 TRUE USD 10/25/2013 7:37
100 865.97 TRUE USD 10/25/2013 10:01
100 -0.879 FALSE USD 10/25/2013 12:25
200 8.99 FALSE EUR 10/25/2013 14:49
200 78 TRUE EUR 10/25/2013 17:13
200 53.98 TRUE EUR 10/25/2013 19:37

The data is ordered based on applied_date. When the latest record's status is FALSE for a trade_id, the values of the
immediately preceding record should be passed to the target.
Output required:
200 53.98 TRUE EUR 10/25/2013 19:37
100 865.97 TRUE USD 10/25/2013 10:01
Solution:
Sort the rows by trade_id and applied_date in descending order (so that FIRST returns the latest row per group), then use an Aggregator grouping by trade_id:
O_AMOUNT = IIF(FIRST(STATUS) = 'FALSE', FIRST(AMOUNT, STATUS != 'FALSE'), FIRST(AMOUNT))
O_CURRENCY = (similarly for the other columns)

21. I have an input column of string datatype containing alphanumeric and special characters; I want only the numbers [0-9]
from it in my output port.

REPLACECHR(0, REG_REPLACE(COLUMN_NAME, '[^0-9]', '/'), '/', NULL)


The reason a REPLACECHR function is used after the regular expression is that REG_REPLACE does not
allow replacing the unwanted characters with NULL directly.

22. Input flatfile1 has 5 columns, flatfile2 has 3 columns (no common column) output should contain merged data(8
columns) How to achieve ?
Take the first file's 5 columns into an Expression transformation and add an output column, say 'A'. Create
a mapping variable, say '$$countA', of integer datatype. In port 'A' put the expression
SETCOUNTVARIABLE($$countA).

Take the second file's 3 columns into an Expression transformation and add an output column, say 'B'. Create a mapping variable,
say '$$countB', of integer datatype. In port 'B' put the expression SETCOUNTVARIABLE($$countB).

The above two steps are only meant for creating common fields with common data in the two pipelines.
Now join the two pipelines in a Joiner transformation on the
condition A = B.
Then connect the required 8 ports to the target.

23. How will we restrict values to 0-9, A-Z, a-z and special characters, i.e. allow only these characters and otherwise
reject the records? What function is used to restrict them?
IIF(REG_MATCH(in_String, '^[a-zA-Z][\w\.-]*[a-zA-Z0-9]@[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$'), 'Valid', 'Invalid')
This particular pattern is usually used to validate an email address; REG_MATCH with a suitable pattern is the function used to
restrict or reject records. A pattern for the restriction actually asked about is sketched below.
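A hedged sketch for the asked-about restriction (the exact set of allowed special characters is an assumption; adjust the character class as needed):

IIF(REG_MATCH(in_String, '^[A-Za-z0-9 .,_@#-]+$'), TRUE, FALSE)

Rows returning FALSE can be dropped in a Filter transformation or routed to a reject target.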

24. File SOURCE-1


NO NAME
1 SATISH
2 KARTHIK


3 SWATHI
4 KEERTHI

File SOURCE-2
NO NAME
1 SATISH
2 KARTHIK
5 SANTHOSE
6 VASU

TARGET
3 SWATHI
4 KEERTHI
5 SANTHOSE
6 VASU
Here, as the source metadata is the same, we can use a UNION transformation; after that use an AGGREGATOR transformation
grouping by NO (and NAME) with a count(NO) port, then keep a FILTER transformation with the condition count = 1, and connect it to the target.
The flow is as follows:
src ---> sq ---> union ---> agg ---> filter ---> trg

25. Below
Source
name sal
aaaa 2000
bbbb 3000
abcd 5000

Target
name sal
aaaa 2000
bbbb 3000
abcd 5000
total 10000 ---AS NEW ROW

SRC ==> EXP ==> AGGR ==> UNION ==> TRG
SRC ======================> UNION

In the AGGR take one output port for the name field with the constant value 'total' and SUM(sal) as sal; after that, UNION it with the
other pipeline.

26. We have 20 records in source system, when we run for the 1st time, it should load only 10 records into the target,
when you run for the second time it should load another 10 record which are not loaded. How do we do that? Can we
write a SQL query in source qualifier to do it.
If your source is relational, then you can write a SQL override.
SQ  ==> SELECT ROWNUM AS num, col1, col2, ... FROM table
FLT ==> (num >= $$MVar1 AND num <= $$MVar2)

NOTE: $$MVar1 and $$MVar2 are mapping variables with initial values 1 and 10 respectively.

After the first run completes, the variables are increased to 11 and 20 respectively.

27. I have a workflow like wf --> s1 --> s2 --> s3 --> s4: first s1 starts, then s2, then s3; session s3 has to run 3 times, and then
s4 starts. How to do this?
Option 1:
wf --> s1 --> s2 --> s3 --> Instance_of_s3 --> Instance_of_s3 --> s4
It can also be achieved through an Assignment task: set a variable var = var + 1 and check MOD(var, 3) = 0 in the link condition.
Option 2: Create s3 in another workflow and call it via the pmcmd command, or write a shell script that calls session s1, then s2 after its
successful completion, then calls s3 three times in a loop, and finally s4.

28. Following source


Name gender


Ramya female
Ram male
deesha female
david male
kumar male

I want the target

male female
ram ramya
david deesha
kumar

anybody give solution above question?


Create the mapping as below:

Src --> Router (male group)   --> EXP1 --> SRT1 --\
                                                   Joiner --> EXP3 --> Target
        Router (female group) --> EXP2 --> SRT2 --/

Router: filter on the basis of gender. All records with gender male will go into EXP1 and the records with gender female will go
into EXP2.
In EXP1 and EXP2, create a variable port (Seq_Var) with the value Seq_Var + 1 and an output port Seq_Out (value
Seq_Var).
Pass the name and sequence ports from EXP1 and EXP2 to SRT1 and SRT2 (sorters) respectively.
In SRT1 and SRT2, sort on the basis of the sequence.
In the Joiner, use 'Full Outer Join' and keep 'Sorted Input' checked in the properties.
Pass the male name values and female name values from the Joiner to EXP3 (gather all the data), then pass all the data to the target.

29. We have 30 wf with 3 batches means 1batch=10 wf, 2 batch=10-20 wf, 3batch=20-30wf. First you have to complete
batch1 after that batch2 and batch3 can have to run concurrently. How can you do in unix?
Write three shell scripts, one for each batch of workflows, using the pmcmd command to invoke the workflows. Write another
(wrapper) shell script that first runs the batch-1 script; after it completes successfully, start the batch-2 and batch-3 scripts
concurrently (for example in the background, or schedule them for the next minute using crontab), as sketched below.
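A hedged sketch of the wrapper script (script names are placeholders; each run_batchN.sh is assumed to start its 10 workflows with pmcmd):

#!/bin/ksh
./run_batch1.sh                 # batch 1: 10 workflows, run first
if [ $? -eq 0 ]; then
  ./run_batch2.sh &             # batch 2 and batch 3 run concurrently
  ./run_batch3.sh &
  wait                          # wait for both background batches to finish
else
  echo "Batch 1 failed, batches 2 and 3 not started"
  exit 1
fi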

30. How to display session logs for particular dates? If I want to view the session logs for one week from a particular
date, how can I do it without using unix?
Open the session properties of the session, go to the Properties tab, then the Log Options settings; for 'Save
session log by' select the option 'Session timestamp' (and set 'Save session log for these runs' high enough to retain a week of logs).
Each run then keeps its own timestamped log.

31. How to add source flat file header with output field names into target file?
When we use a flat file target, in the Mapping tab of the session click the flat file target; it has a Header Options property, in which
you select 'Output field names'.

32. What is Test load plan? Let us assume 10 records in source, how many rows will be loaded into target?
No rows will be loaded into the target, because this is only a test: with Enable Test Load checked, the server reads and transforms
data without writing to the targets, processing only the number of source rows set in the 'Number of rows to test' property at session level.

33. How do schedule a workflow in Informatica thrice in a day? Like run the workflow at 3am, 5am and 4pm?
In the Workflow Designer, go to Workflows > Schedulers; a workflow can have only one scheduler attached, so to run at 3 am, 5 am and
4 pm you typically schedule the workflow externally (e.g. pmcmd via cron) or maintain separate scheduled copies, each set to one of the required times.

34. Can we insert, update, delete in target tables with one update strategy tran.?
Yes, we can do insert, update, delete and also reject using a single Update Strategy.
The Update Strategy expression can be:
IIF(somevalue = xx, DD_INSERT, IIF(somevalue = yy, DD_UPDATE, IIF(somevalue = zz, DD_DELETE, DD_REJECT)))

35. If our source contains 1 terabyte of data, what things should we keep in mind while loading the data into the target?


1 TB is a huge amount of data, so a normal load type takes a long time. To overcome this it is better to use bulk
loading, and large commit intervals also give good performance. One more technique is external loading
for databases like Netezza and Teradata.

Use the partition option: it works for both relational and file source systems.

If the source is a file: instead of loading the huge amount of data at once, we can split the source file using unix, extract
the data from all the split files using the indirect file type, and load it into the target.
36. Can we insert duplicate data with dynamic look up cache, if yes than why and if no why?
Duplicate data cannot be inserted using a dynamic lookup cache, because the dynamic lookup cache makes the insert-or-update
decision based on the keys it finds in the target table, so a second row with the same key is treated as an update (or ignored), not a new insert.

37. If there are 3 workflows are running and if 1st workflow fails then how could we start 2nd workflow or if 2nd
workflow fails how could we start 3rd workflow?
Option 1: Use worklet for each workflow and create and workflow with these worklets and create
dependency.
Option2: Use a scheduling tools such as control M or Autosys to create dependency among workflows.

38. My source having the records like


Name Occurs
ram 3
sam 5
tom 8

and I want to load into target like ram record 3 times, sam record 5 times, tom record 8 times.

Option 1:
SQ -> SQL Transformation -> Target
In the SQL transformation execute: INSERT INTO target_table (c1) SELECT ?p_name? FROM dual CONNECT BY LEVEL <= ?p_occurs?
Option 2:
SQ -> Stored Procedure Transformation -> Target
Call a stored procedure within the pipeline, e.g. SP_INSERT(p_name, p_occurs); inside the procedure use p_occurs as the loop counter and insert the record on each iteration.
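A pure-SQL sketch of the same idea, assuming the source is an Oracle table named src with columns name and occurs (the table and column names are illustrative):

SELECT s.name
FROM   src s
JOIN   (SELECT LEVEL AS n
        FROM   dual
        CONNECT BY LEVEL <= (SELECT MAX(occurs) FROM src)) m
ON     m.n <= s.occurs
ORDER BY s.name, m.n;

Each source row is joined to as many generated numbers as its occurs value, so ram comes out 3 times, sam 5 times and tom 8 times.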

39. SOURCE:

Name id dept sal


1 a1 A 100
2 b1 B 200
3 c1 C 300
4 d1 D 400

TARGET:

Name id dept sal


1 a1 A 100
2 b1 B 200
3 WER1 567 300
4 d1 D 400

I have the source and target above. How do you validate the data and identify the differences between them?
There are many ways to check this:
1. Do a column-level check: rows where SRC.ID = TGT.ID but the other columns do not match on data.
2. Run MINUS in both directions (SRC MINUS TGT and TGT MINUS SRC) for that particular day's load.
3. Do a referential integrity check: any ID in SRC but not in TGT, or any ID in TGT but not in SRC.
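A minimal sketch of the MINUS and referential integrity checks, assuming the source and target are queryable tables named SRC and TGT with identical column lists:

SELECT name, id, dept, sal FROM src
MINUS
SELECT name, id, dept, sal FROM tgt;

SELECT name, id, dept, sal FROM tgt
MINUS
SELECT name, id, dept, sal FROM src;

SELECT id FROM src WHERE id NOT IN (SELECT id FROM tgt);

The first two queries return rows present in one table but not the other (here they would surface the mismatched row 3), and the last one lists IDs missing from the target.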

40. We have 10 records in the source; the good records go to a relational target and the bad records go to the target bad file. If there are any bad records you need to send an email; if there are no bad records, no email should be sent.
Option 1:
Pre-session command: remove the existing bad file from infa_shared/BadFiles/XXXXX.txt
In the post-session command of the session, call a shell script that checks the bad file for a non-zero record count and sends the email accordingly.


Option 2:
Add a link condition after the session, such as $Session.TgtFailedRows > 0, and route it to an Email task.

41. Why should we use a star schema in data warehouse design?

1) Query performance is high, because the fact table joins directly to denormalized dimensions, so fewer joins are needed.
2) The model is simpler and easier for users and BI tools to navigate than a snowflake schema (at the cost of some redundant storage in the dimensions).

42. How do we generate surrogate keys using a dynamic lookup? Can we use it for an SCD Type 2 mapping, and why?
Bring the surrogate key in as a lookup port and set its associated port to Sequence-ID.

When the Integration Service updates rows that already exist in the lookup cache, it uses the primary key (PK_PRIMARYKEY) values already present in the cache and the target table. For a customer it does not find in the cache, it uses the sequence ID to generate a new primary key, inserts that value into the lookup cache and returns it to the lookup/output port. Because the NewLookupRow port tells downstream transformations whether a row was newly inserted or already existed, a dynamic lookup can be used in SCD Type 2 mappings to decide whether to insert a new version or close out the existing one.

43. I have the source like


col1 col2
a l
b p
a m
a n
b q
x y
How to get the target data like below
col1 col2
a l,m,n
b p,q
x y
src -> sorter -> exp -> agg -> tgt

Sorter: sort on col1 (select col1 as the sort key).

Expression (port evaluation order matters, variables before the output port):
var1 = IIF(var2 = col1, var1 || ',' || col2, col2)
var2 = col1
out_col2 (output port) = var1

Aggregator: group by col1; with no aggregate function the last row of each group is returned, and that row carries the fully concatenated string.
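For comparison, the same result can be produced directly in Oracle SQL (11g R2 or later) with LISTAGG; a sketch, assuming the data sits in a table named src with columns col1 and col2:

SELECT col1,
       LISTAGG(col2, ',') WITHIN GROUP (ORDER BY col2) AS col2
FROM   src
GROUP BY col1;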

44. I have a relational source like this.


JAN FEB MAR APR
100 200 300 400
500 600 700 800
900 100 200 300

I need to convert these rows into columns to the target


MONTH TOTAL
JAN 1500
FEB 900
MAR 1200
APR 1500

Source Qualifier --> Normalizer --> Expression --> Aggregator --> Target

Take a Normalizer transformation and create a normalized port named "detail" with an occurrence of 4. Connect the four input ports from the Source Qualifier to the detail ports in the Normalizer. Next take an Expression transformation, create an output port named MONTH, and in the expression editor write the logic as


DECODE(GCID_DETAIL, 1, 'JAN', 2, 'FEB', 3, 'MAR', 'APR')

In the Aggregator, group by MONTH and take the SUM of the numeric value port to get the TOTAL.

45. Why is the Union transformation an active transformation?

Reason 1: In a Union transformation we combine the data from two or more sources. Assume Table-1 contains 10 rows and Table-2 contains 20 rows; combining them gives a total of 30 rows in a single output pipeline, so the transformation merges multiple input groups into one flow of rows.
Reason 2: The Union transformation is derived from the Custom transformation, which is an active transformation.

46. Write a SQL query for the following table:

city gender no
chennai male 40
chennai female 35
bangalore male 25
bangalore female 25
mumbai female 15

I want the required output

city male female


chennai 40 35
bangalore 25 25
mumbai 15

SELECT CITY,
       SUM(DECODE(UPPER(GENDER), 'MALE', NO, 0)) MALE,
       SUM(DECODE(UPPER(GENDER), 'FEMALE', NO, 0)) FEMALE
FROM   TABLE_NAME
GROUP BY CITY

47. Write a SQL query for the following source table:

jan feb mar apr


100 200 300 400
500 600 700 800
900 100 200 300
I want the output format like

month total
jan 1500
feb 900
mar 1200
apr 1500

You can achieve it using UNION ALL; here is the query:

SELECT 'JAN' AS MONTH, SUM(JAN) AS TOTAL FROM SRC_MONTHS
UNION ALL
SELECT 'FEB' AS MONTH, SUM(FEB) AS TOTAL FROM SRC_MONTHS
UNION ALL
SELECT 'MAR' AS MONTH, SUM(MAR) AS TOTAL FROM SRC_MONTHS
UNION ALL
SELECT 'APR' AS MONTH, SUM(APR) AS TOTAL FROM SRC_MONTHS
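On Oracle 11g and later the same unpivot can also be written with the UNPIVOT clause; a sketch, again assuming the source table is SRC_MONTHS with columns JAN, FEB, MAR and APR:

SELECT month, SUM(val) AS total
FROM   src_months
UNPIVOT (val FOR month IN (jan AS 'JAN', feb AS 'FEB', mar AS 'MAR', apr AS 'APR'))
GROUP BY month;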

48. Write a SQL query for the following table:

amount  year  quarter
1000    2003  first
2000    2003  second
3000    2003  third
4000    2003  fourth
5000    2004  first


6000    2004  second
7000    2004  third
8000    2004  fourth

I want the output


year q1_amount q2_amount q3_amount q4_amount
2003 1000 2000 3000 4000
2004 5000 6000 7000 8000

SELECT YEAR,
       SUM(DECODE(UPPER(QUARTER), 'FIRST',  AMOUNT)) Q1_AMOUNT,
       SUM(DECODE(UPPER(QUARTER), 'SECOND', AMOUNT)) Q2_AMOUNT,
       SUM(DECODE(UPPER(QUARTER), 'THIRD',  AMOUNT)) Q3_AMOUNT,
       SUM(DECODE(UPPER(QUARTER), 'FOURTH', AMOUNT)) Q4_AMOUNT
FROM   TABLE_NAME
GROUP BY YEAR
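On Oracle 11g and later the same result can also be written with the PIVOT clause; a sketch, assuming the data is in a table named t with columns amount, year and quarter:

SELECT *
FROM  (SELECT year, quarter, amount FROM t)
PIVOT (SUM(amount) FOR quarter IN ('first'  AS q1_amount,
                                   'second' AS q2_amount,
                                   'third'  AS q3_amount,
                                   'fourth' AS q4_amount));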

49. If I have 10 records (i = 1 to 10) in my source and a Router transformation has groups with the conditions i > 2, i = 5 and i < 2, what is the output in each target?
A row is routed to every group whose condition it satisfies:

i > 2  ->  records 3 to 10   (Tgt1)
i = 5  ->  only record 5     (Tgt2)
i < 2  ->  only record 1     (Tgt3)

50. A flat file (.dat) has one lakh (100,000) columns. Can I have an Excel file format as the target?
No. An Excel worksheet cannot hold that many columns (the older .xls format is limited to 256 columns and 65,536 rows), so it cannot serve as the target for a flat file with one lakh columns. The practical option is to keep the target as a delimited flat file such as .CSV (comma separated values).

51. Write a query to remove the null values from the following table:


col1 col2 col3
dinesh null null
null suresh null
null null prakesh

i want the output


col1 col2 col3
dinesh suresh prakesh

SELECT MAX(COL1),MAX(COL2),MAX(COL3) FROM TABLE_NAME

52. Write a SQL query for the following single-column data, which contains some duplicates:


1
1
2
2
3
3
4
5

I want the output with the unique values in one column and the duplicated values in another column, in the following format:

Unique duplicate
1 1
2 2
3 3
4


5
SELECT DISTINCT emp.deptno UNIQ, e.dup
FROM   emp
LEFT OUTER JOIN (SELECT deptno DUP
                 FROM   emp
                 GROUP BY deptno
                 HAVING COUNT(deptno) > 1) e
ON    (emp.deptno = e.dup)
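A simpler alternative sketch with a plain GROUP BY, assuming the values sit in a single-column table named t with a column named val (illustrative names):

SELECT val AS uniq,
       CASE WHEN COUNT(*) > 1 THEN val END AS dup
FROM   t
GROUP BY val
ORDER BY val;

Every distinct value appears in the UNIQ column, and the DUP column repeats it only when the value occurs more than once.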

53. How will you get the 1st, 3rd, 5th, ... records of a table? What is the query in Oracle?

Display odd records:
SELECT * FROM emp WHERE (rowid, 1) IN (SELECT rowid, MOD(ROWNUM, 2) FROM emp);

Display even records:
SELECT * FROM emp WHERE (rowid, 0) IN (SELECT rowid, MOD(ROWNUM, 2) FROM emp);
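Another common sketch uses an inline view with ROWNUM:

SELECT *
FROM  (SELECT e.*, ROWNUM rn FROM emp e)
WHERE MOD(rn, 2) = 1;

Change the condition to MOD(rn, 2) = 0 for the even-numbered records.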

54. How will I stop my workflow after 10 errors?

In the session-level properties (Config Object tab, Error Handling section), set the 'Stop on errors' property to 10.

55. To improve the performance of the Aggregator we use the sorted input option and place a Sorter before the Aggregator. But this adds one more cache to the mapping (the Sorter's). So how can you argue that this still improves performance?
Aggregate calculations take time. With sorted input the Aggregator does not have to cache the entire data set; it can finish each group as soon as that group's rows have arrived. The time taken to pass the rows through the Sorter to the Aggregator and on to the downstream transformations is less than the time needed to do the aggregation on unsorted data.

56. In a Router transformation I created two groups:
One is Passthrough => TRUE
The second one is CorrectIds => Invest > 50000
plus the Default group.
Is there any difference between the default group and the pass-through group in this scenario?
Yes, there is a difference in this scenario.
The first group (pass-through, condition TRUE) passes all records, whatever the invest value.
The second group passes only the records with invest > 50000.
The default group receives only the rows that satisfy none of the user-defined conditions; since the pass-through group is always TRUE, it receives nothing here. If you dropped the pass-through group and relied on the default group instead, the default group would pass all records with invest <= 50000.

57. I have a flat file as target in a mapping. When I load data into it a second time, the records already in the flat file get overwritten. I don't want to override the existing records.
Note: with a relational target we could handle this with CDC / incremental logic, but for a flat file that technique would still overwrite the file. So what is the solution?
Double-click the session --> Mapping tab --> target properties --> check the 'Append if exists' option.


1. What is a SQL override?

Overriding the default SQL generated by a Source Qualifier or Lookup transformation in order to add additional logic (extra joins, filters, ordering and so on).

2. Can we have multiple conditions in a Lookup?


Yes

3. Can we have multiple conditions in a Filter?


Yes

4. How the flags are called in Update strategy?


0 - DD_INSERT , 1- DD_UPDATE , 2- DD_DELETE , 3- DD_REJECT

5. Is it possible to run the session other than Server manager? If so how?


YES USING PMCMD

6. What are the different things you can do using pmcmd?

Start, stop and abort sessions (and batches), and check whether the Informatica Server is running.

7. What is the use of PowerPlug?

PowerPlug provides third-party connectors to sources such as SAP, mainframe and PeopleSoft.

8. What kind of test plan do you use? What kind of validation do you do?

In Informatica we create test SQL to compare record counts, and validation scripts to verify that the data loaded into the warehouse matches the logic incorporated in the mappings.

9. What is the usage of an unconnected/connected lookup?

We use a lookup to look up values in a relational table, which may be a source, a target, or a table that is neither. A lookup can be configured in two ways: connected or unconnected.

10. What is the difference between Connected and Unconnected Lookups ?


 Connected Lookup Receives input values directly from the pipeline.
 Unconnected Lookup Receives input values from the result of a :LKP expression in another transformation.
 Connected Lookup You can use a dynamic or static cache.
 Unconnected Lookup You can use a static cache.
 Connected Lookup Cache includes all lookup columns used in the mapping (that is, lookup table columns
included in the lookup condition and
 lookup table columns linked as output ports to other transformations).
 Unconnected Lookup Cache includes all lookup/output ports in the lookup condition and the lookup/return port.
 Connected Lookup Can return multiple columns from the same row or insert into the dynamic lookup cache.
 Unconnected Lookup Cannot use a dynamic lookup cache. Designate one return port (R); it returns one column from each row.
 Connected Lookup If there is no match for the lookup condition, the Informatica Server returns the default value
for all output ports. If you configure dynamic caching, the Informatica Server inserts rows into the cache.
 Unconnected Lookup If there is no match for the lookup condition, the Informatica Server returns NULL.
 Connected Lookup Pass multiple output values to another transformation. Link lookup/output ports to another
transformation.
 Unconnected Lookup pass one output value to another transformation. The lookup/output/return port passes the
value to the transformation calling: LKP expression.
 Connected Lookup Supports user-defined default values.
 Unconnected Lookup Does not support user-defined default values

11. If u have data coming from diff. sources what transformation will u use in your designer?
Joiner Transformation

12. What are different ports in Informatica?


Input, Output, Variable, Return/Rank, Lookup and Master.


13. What is a Variable port? Why it is used?
Variable port is used to store intermediate results. Variable ports can reference input ports and variable ports, but not
output ports.

14. What is the difference between active and passive transformations?

A transformation can be active or passive. An active transformation can change the number of records passed through it; a passive transformation never changes the record count.
Active transformations that might change the record count: Advanced External Procedure, Aggregator, Filter, Joiner, Normalizer, Rank, Update Strategy and Source Qualifier. If you use PowerConnect to access ERP sources, the ERP Source Qualifier is also an active transformation.
Passive transformations: Lookup, Expression, External Procedure, Sequence Generator, Stored Procedure.
You can connect only one active transformation to the same downstream transformation or target, but you can connect any number of passive transformations.

15. What are Mapplet?


A mapplet is a reusable object that represents a set of transformations. It allows you to reuse transformation logic and
can contain as many transformations as you need.

16. What is Aggregate transformation?


An aggregator transformation allows you to perform aggregate calculations, such as average and sums. The Aggregator
transformation is unlike the Expression transformation, in that you can use the Aggregator transformation to perform
calculations on groups.

17. What is Router Transformation? How is it different from Filter transformation?


A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition
to test data. A Filter transformation tests data for one condition and drops the rows of data that do not meet the
condition. However, a router transformation tests data for one or more conditions and gives you the option to route
rows of data that do not meet any of the conditions to default output group.

18. What are connected and unconnected transformations?

Connected transformations are part of the data flow (their ports are linked to other transformations), whereas unconnected transformations sit outside the data flow and are called from an expression. This applies to the Lookup and Stored Procedure transformations.

19. What is Normalizer transformation?


Normalizer transformation normalizes records from COBOL and relational sources allowing you to organize the data
according to your needs. A normalizer transformation can appear anywhere in a data flow when you normalize a
relational source.

20. How to use a sequence created in Oracle in Informatica?


By using Stored procedure transformation
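Alternatively, the Oracle sequence can be pulled in through a Source Qualifier SQL override; a sketch, assuming a sequence named emp_seq already exists in the source schema (the sequence and column list are illustrative):

SELECT emp_seq.NEXTVAL AS surrogate_key, e.empno, e.ename
FROM   emp e;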

21. What are source qualifier transformations?


The source qualifier represents the records that the Informatica Server reads when it runs a session.

22. Significance of Source Qualifier Transformation?


When you add a relational or a flat file source definition to a mapping, you need to connect it to a Source Qualifier
transformation. The Source Qualifier represents the records that the Informatica Server reads when it runs a session. It is used:
· To join data originating from the same database.
· To filter records in the source itself.
· To specify an outer join instead of the default inner join.
· To specify sorted ports.
· To select distinct values from the source.
· To create a custom query to issue a special SELECT statement for the Informatica Server to read source data. For example, we might use a custom query to perform aggregate calculations or execute a stored procedure.


23. What are cache and their types in Informatica?


The Informatica server creates index and data cache for aggregator, Rank, joiner and Lookup transformations in a
mapping. The Informatica server stores key values in the index cache and output values in the data cache.

24. What is an incremental aggregation?


In Incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If the
source changes only incrementally and you can capture changes, you can configure the session to process only those
changes. This allows the Informatica server to update your target incrementally, rather than forcing it to process the
entire source and recalculate the same calculation each time you run the session.

25. What is Reject loading?


During a session, the Informatica server creates a reject file for each target instance in the mapping. If the writer or the
target rejects data, the Informatica server writes the rejected row into reject file. The reject file and session log contain
information that helps you determine the cause of the reject. You can correct reject files and load them to relational
targets using the Informatica reject load utility. The reject loader also creates another reject file for the data that the
writer or target reject during the reject loading.

26. WHAT IS SESSION and BATCHES?


SESSION - A Session Is A set of instructions that tells the Informatica Server How And When To Move Data From
Sources To Targets. After creating the session, we can use either the server manager or the command line program
pmcmd to start or stop the session.
BATCHES - It Provides A Way to Group Sessions For Either Serial Or Parallel Execution By The Informatica Server.
There are two types of batches:
1. SEQUENTIAL - Run Session One after the Other.
2. CONCURRENT - Run Session At The Same Time.

27. What are 2 modes of data movement in Informatica Server?


The data movement mode depends on whether Informatica Server should process single byte or multi-byte character
data. This mode selection can affect the enforcement of code page relationships and code page validation in the
Informatica Client and Server.
a) Unicode –IS allows 2 bytes for each character and uses additional byte for each non-ascii character (such as
Japanese characters)
b) ASCII – IS holds all data in a single byte

28. Why we use lookup transformations?


Lookup Transformations can access data from relational tables that are not sources in mapping. With Lookup
transformation, we can accomplish the following tasks:
a) Get a related value - Get the Employee Name from the Employee table based on the Employee ID
b) Perform Calculation
Update slowly changing dimension tables - We can use unconnected lookup transformation to determine whether the
records already exist in the target or not.

29. What are conformed dimensions?

Conformed dimensions are dimensions that are shared by (linked to) multiple fact tables.

30. What is data warehousing?

A DW is a database used for querying, analysis and reporting. By definition a DW is subject oriented, integrated, non-volatile and time variant.
Subject Oriented: represents a subject area such as sales or marketing.
Integrated: data collected from multiple source systems is integrated into a single, consistent, user-readable format (for example male/female, 0/1, M/F, T/F all mapped to one representation).
Non-Volatile: a DW stores historical data that is not overwritten.
Time Variant: stores data time-wise, e.g. weekly, monthly, quarterly, yearly.

31. What is a reusable transformation? What is a mapplet? Explain the difference between them.
Reusable transformation: a single transformation created to perform a common task, such as calculating the average salary in a department, which can then be reused in many mappings.
Mapplet: a reusable object that represents a set of transformations.

32. What happens when you use the delete or update or reject or insert statement in your update strategy?


Insert: treats all records as inserts; while inserting, if a record violates a primary key, foreign key or unique key constraint in the database, the record is rejected.

33. Where do you design your mappings?


Designer

34. Where do you define users and privileges in Informatica?


Repository manager

35. How do you debug the data in Informatica?


Use debugger in designer

36. When you run the session does debugger loads the data to target?
If you select the option discard target data then it will not load to target

37. Can you use flat file and table (relational) as source together?
Yes

38. Suppose I need to separate the data for delete and insert to target depending on the condition, which transformation you
use?
Router or filter

39. What is the difference between the lookup data cache and index cache?
Index cache: contains the columns used in the lookup condition.
Data cache: contains the other output columns (those not part of the condition).

40. What is an indicator file and how can it be used?

An indicator file is used for event-based scheduling when you don't know when the source data will be available. A shell command, script or batch file creates and sends this indicator file to a directory local to the Informatica Server. The server waits for the indicator file to appear before running the session.

41. Different Tools in Designer


· Source Analyzer
· Warehouse designer
· Transformation Developer
· Maplet designer
· Mapping designer

42. Components of Informatica

· Designer
· Workflow Manager
· Workflow Monitor
· Repository Manager

43. Different Tools in Workflow Manager


· Task Developer
· Worklet designer
· Workflow Designer

44. What is overview window? Why it is used?


It’s a window in which you can see all the transformations that are used for a mapping.

45. While using the Debugger, how will you find out which transformation is currently running?
The transformation currently being processed shows a moving arrow indicator in its top left corner.

46. How do you load the data using Informatica?


Using workflow manager

47. What is a Filter Transformation? or what options you have in Filter Transformation?


The Filter transformation provides the means for filtering records in a mapping. You pass all the rows from a source
transformation through the Filter transformation, then enter a filter condition for the transformation. All ports in a Filter
transformation are input/output and only records that meet the condition pass through the Filter transformation.

48. What happens to the discarded rows in Filter Transformation?


Discarded rows do not appear in the session log or reject files

49. What are the two programs that communicate with the Informatica Server?
Informatica provides Server Manager and pmcmd programs to communicate with the Informatica Server:
Server Manager. A client application used to create and manage sessions and batches, and to monitor and stop the
Informatica Server. You can use information provided through the Server Manager to troubleshoot sessions and
improve session performance.
pmcmd. A command-line program that allows you to start and stop sessions and batches, stop the Informatica Server,
and verify if the Informatica Server is running.

50. What you can do with Designer?


The Designer client application provides five tools to help you create mappings:
Source Analyzer. Use to import or create source definitions for flat file, Cobol, ERP, and relational sources.
Warehouse Designer. Use to import or create target definitions.
Transformation Developer. Use to create reusable transformations.
Mapplet Designer. Use to create mapplets.
Mapping Designer. Use to create mappings.

51. What are the different types of tracing levels you have in transformations?
Tracing levels in transformations:
Terse: Indicates when the Informatica Server initializes the session and its components. Summarizes session results, but not at the level of individual records.
Normal: Includes initialization information as well as error messages and notification of rejected data.
Verbose initialization: Includes all information provided with the Normal setting plus more extensive information about initializing transformations in the session.
Verbose data: Includes all information provided with the Verbose initialization setting, plus each row of data as it passes through the transformations.
Note: By default, the tracing level for every transformation is Normal.
To add a slight performance boost, you can also set the tracing level to Terse, writing the minimum of detail to the session log when running a session containing the transformation.

52. What is Mapplet and how do you create Mapplet?


A mapplet is a reusable object that represents a set of transformations. It allows you to reuse transformation logic and
can contain as many transformations as you need. Create a mapplet when you want to use a standardized set of
transformation logic in several mappings. For example, if you have a several fact tables that require a series of
dimension keys, you can create a mapplet containing a series of Lookup transformations to find each dimension key.
You can then use the mapplet in each fact table mapping, rather than recreate the same lookup logic in each mapping.

53. If data source is in the form of Excel Spread sheet then how do use?
PowerMart and PowerCenter treat a Microsoft Excel source as a relational database, not a flat file. Like relational
sources, the Designer uses ODBC to import a Microsoft Excel source. You do not need database permissions to import
Microsoft Excel sources.
To import an Excel source definition, you need to complete the following tasks:
· Install the Microsoft Excel ODBC driver on your system.
· Create a Microsoft Excel ODBC data source for each source file in the ODBC 32-bit Administrator.
· Prepare Microsoft Excel spreadsheets by defining ranges and formatting columns of numeric data.
· Import the source definitions in the Designer.
Once you define ranges and format cells, you can import the ranges in the Designer. Ranges display as source
definitions when you import the source.

54. When do u use connected lookup and when do you use unconnected lookup?
A connected Lookup transformation is part of the mapping data flow. With connected lookups, you can have multiple
return values. That is, you can pass multiple values from the same row in the lookup table out of the Lookup
transformation.
Common uses for connected lookups include:
=> Finding a name based on a number ex. Finding a Dname based on deptno
=> Finding a value based on a range of dates


=> Finding a value based on multiple conditions


Unconnected Lookups : -
An unconnected Lookup transformation exists separate from the data flow in the mapping. You write an expression
using the :LKP reference qualifier to call the lookup within another transformation.
Some common uses for unconnected lookups include:
=> Testing the results of a lookup in an expression
=> Filtering records based on the lookup results
=> Marking records for update based on the result of a lookup (for example, updating slowly changing dimension
tables)
=> Calling the same lookup multiple times in one mapping

55. How many values it (informatica server) returns when it passes thru Connected Lookup n Unconncted Lookup?
Connected Lookup can return multiple values where as Unconnected Lookup will return only one values that is Return
Value.

56. What kind of modifications can you perform with each transformation?
Using transformations, you can modify data in the following ways:
----------------- ------------------------
Task -- Transformation
----------------- ------------------------
Calculate a value -- Expression
Perform aggregate calculations -- Aggregator
Modify text -- Expression
Filter records -- Filter, Source Qualifier
Order records queried by the Informatica Server -- Source Qualifier
Call a stored procedure -- Stored Procedure
Call a procedure in a shared library or in the COM layer of Windows NT -- External Procedure
Generate primary keys -- Sequence Generator
Limit records to a top or bottom range -- Rank
Normalize records, including those read from COBOL sources -- Normalizer
Look up values -- Lookup
Determine whether to insert, delete, update, or reject records -- Update Strategy
Join records from different databases or flat file systems -- Joiner

57. Expressions in Transformations, Explain briefly how do you use?


Expressions in Transformations
To transform data passing through a transformation, you can write an expression. The most obvious examples of these
are the Expression and Aggregator transformations, which perform calculations on either single values or an entire
range of values within a port. Transformations that use expressions include the following:
--------------------- ------------------------------------------
Transformation How It Uses Expressions
--------------------- ------------------------------------------
Expression calculates the result of an expression for each row passing through the transformation, using values from
one or more ports.
Aggregator Calculates the result of an aggregate expression, such as a sum or average, based on all data passing
through a port or on groups within that data.
Filter Filters records based on a condition you enter using an expression.
Rank Filters the top or bottom range of records, based on a condition you enter using an expression.
Update Strategy assigns a numeric code to each record based on an expression, indicating whether the Informatica
Server should insert/update/delete/reject.

58. In case a flat file (which comes through FTP as source) has not arrived, what happens?
You get a fatal error, which causes the server to fail/stop the session.

59. What does a load manager do ?


The Load Manager is the primary PowerCenter Server process. It accepts requests from the PowerCenter Client and
from pmcmd. The Load Manager runs and monitors the workflow. It performs the following tasks:
 Starts the session, creates DTM process and sends pre & post session emails.
 Manages the session and batch scheduling
 Locks the session and reads the session properties.
 Expands the session and server variables and parameters


 Validates the source and target code pages


 Verifies the permissions and privileges
 Creates session log file
 Creates DTM process which executes the session

60. What is a cache?


Temporary memory area used to store intermediate results. Operations like sorting and grouping requires cache.

61. What is an Expression transformation?


Expression transformation is used to calculate expressions on a row by row basis. Total_sal = Com * sal

62. I have two sources S1 having 100 records and S2 having 10000 records, I want to join them, using joiner
transformation. Which of these two sources (S1,S2) should be master to improve my performance? Why?
S1 should be the master as it contains few records so that the usage of cache can be reduced, S2
should be detail.

63. I have a source and I want to generate sequence numbers using mappings in informatica. But I don’t want to use
sequence generator transformation. Is there any other way to do it?
YES, Use an unconnected lookup to get max key value and there on increment by 1 using an expression variable OR
write a stored procedure and use Stored Procedure Transformation.

64. What is a bad file?


Bad file is the file which contains the data rejected by the writer or target.

65. What is the first column of the bad file?


Record / Row indicator 0,1,2,3
0 – insert -- Rejected by writer/target
1- update -- Rejected by writer/target
2- delete -- Rejected by writer/target
3-reject -- Rejected by writer --- coz update statement. has marked it for reject.

66. What are the contents of the cache directory in the server?
Index cache files and Data caches

67. Is lookup an Active transformation or Passive transformation?


Passive by default and can be configured to be active

68. Is SQL transformation an Active transformation or Passive transformation?


Active by default and can be configured to be passive

69. What is a Mapping?


Mapping represents the data flow between source and target

70. What are the types of transformations?


Passive and active

71. If a sequence generator (with increment of 1) is connected to (say) 3 targets and each target uses the NEXTVAL port,
what value will each target get?
The three targets share the same sequence, so the values are distributed across them and each target sees values that increase in steps of 3 (for example 1, 4, 7, ... / 2, 5, 8, ... / 3, 6, 9, ...).

72. Difference between Source Based Commit Vs Target Based Commit


Target Based Commit
During a target-based commit session, the Informatica Server continues to fill the writer buffer after it reaches the
commit interval. When the buffer block is filled, the Informatica Server issues a commit command. As a result, the
amount of data committed at the commit point generally exceeds the commit interval.
For example, a session is configured with target-based commit interval of 10,000. The writer buffers fill every 7,500
rows. When the Informatica Server reaches the commit interval
of 10,000, it continues processing data until the writer buffer is filled. The second buffer fills at 15,000 rows, and the
Informatica Server issues a commit to the target. If the session completes successfully, the Informatica Server issues
commits after 15,000, 22,500, 30,000, and 40,000 rows.
Source Based Commit


During a source-based commit session, the Informatica Server commits data to the target based on the number of rows
from an active source in a single pipeline. These rows are referred to as source rows. A pipeline consists of a source
qualifier and all the transformations and targets that receive data from the source qualifier. An active source can be any
of the following active transformations:
Advanced External Procedure
Source Qualifier
Normalizer
Aggregator
Joiner
Rank
Sorter
Mapplet, if it contains one of the above transformations
Note: Although the Filter, Router, and Update Strategy transformations are active transformations, the Informatica
Server does not use them as active sources in a source-based commit session.

73. Have you used the Abort, Decode functions?


Abort can be used to Abort / stop the session on an error condition. If the primary key column contains NULL, and you
need to stop the session from continuing then you may use ABORT function in the default value for the port. It can be
used with IIF and DECODE function to Abort the session.

74. What do you know about the Informatica server architecture? Load Manager, DTM, Reader, Writer, Transformer
o Load Manager is the first process started when the session runs. It checks for validity of mappings, locks sessions and
other objects.
o DTM process is started once the Load Manager has completed its job. It starts a thread for each pipeline.
o Reader scans data from the specified sources.
o Writer manages the target/output data.
o Transformer performs the task specified in the mapping.

75. What are the default values for variables?


String = Null, Number = 0, Date = 1/1/1753

76. How many ways can you filter the records?

1. Source Qualifier
2. Filter transformation
3. Router transformation
4. Rank transformation
5. Update Strategy

77. How do you identify the bottlenecks in Mappings?


Bottlenecks can occur in
1. Targets
The most common performance bottleneck occurs when the informatica server writes to a target database. You can
identify target bottleneck by configuring the session to write to a flat file target.
If the session performance increases significantly when you write to a flat file, you have a target bottleneck.
Solution :
Drop or Disable index or constraints
Perform bulk load (Ignores Database log)
Increase commit interval (Recovery is compromised)
Tune the database for RBS, Dynamic Extension etc.,

2. Sources
Add a Filter transformation after each Source Qualifier with a condition that is always false, so that no records pass downstream.
If the time taken is about the same, the bottleneck is in reading the source.
You can also identify the Source problem by
Read Test Session – where we copy the mapping with sources, SQ and remove all transformations and connect to file
target.
If the performance is same then there is a Source bottleneck.
Using database query – Copy the read query directly from the log. Execute the query against the source database with a
query tool. If the time it takes to execute the query and the time to fetch the first row are significantly different, then the
query can be modified using optimizer hints.
Solutions:
Optimize Queries using hints.


Use indexes wherever possible.
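For illustration, a hint sketch against the read query (emp_deptno_idx is a hypothetical index name used only for the example):

SELECT /*+ INDEX(e emp_deptno_idx) */ empno, ename, deptno
FROM   emp e
WHERE  deptno = 10;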

3. Mapping
If both Source and target are OK then problem could be in mapping.
Add a filter transformation before target and if the time is the same then there is a problem.
(OR) Look for the performance monitor in the Sessions property sheet and view the counters.
Solutions:
High error rows and a large number of rows written to the lookup cache indicate a mapping bottleneck.
Optimize Single Pass Reading

78. How do you improve session performance?

1. Run concurrent sessions.
2. Partition sessions (PowerCenter).
3. Tune parameters – DTM buffer pool, buffer block size, index cache size, data cache size, commit interval, tracing level (Normal, Terse, Verbose Init, Verbose Data).
The default session memory is enough to hold about 83 sources and targets; if there are more, the DTM buffer can be increased.
The Informatica server uses the index and data caches for the Aggregator, Rank, Lookup and Joiner transformations. The server stores the transformed data from these transformations in the data cache before returning it to the data flow, and stores group information in the index cache.
If the allocated data or index cache is not large enough to hold the data, the server stores the data in a temporary disk file as it processes the session data. Each time the server pages to disk, performance slows; this can be seen from the counters.
Since the data cache is generally larger than the index cache, it has to be allocated more memory than the index cache.
4. Remove the staging area.
5. Turn off session recovery.
6. Reduce error tracing.

79. What are Business components? Where it exists?


It is available in navigator inside the folder.

80. What are Short cuts? Where it is used?


Shortcuts allow you to use metadata across folders without making copies, ensuring uniform metadata. A shortcut
inherits all properties of the object to which it points. Once you create a shortcut, you can configure the shortcut name
and description.
When the object the shortcut references changes, the shortcut inherits those changes. By using a shortcut instead of a
copy, you ensure each use of the shortcut matches the original object. For example, if you have a shortcut to a target
definition, and you add a column to the definition, the shortcut inherits the additional column.

· Scenario 1
A source table has a single row; in the target table the same row should be populated 10 times.
Using a Normalizer we can do it. Hint: set Occurs to 10 on the normalized column, connect the same input to the 10 input ports, and the single output port will produce 10 rows.

81. While importing the relational source definition from database, what are the metadata of source you import?
Source name
Database location
Column names
Data types
Key constraints

82. How many ways U can update a relational source definition and what are they?
Two ways
1. Edit the definition
2. Re-import the definition

83. How many ways u create ports?


Two ways
1. Drag the port from another transformation
2. Click the add button on the ports tab.


84. What r the unsupported repository objects for a mapplet?


COBOL source definition
Joiner transformations
Normalizer transformations
Non reusable sequence generator transformations.
Pre or post session stored procedures
Target definitions
Power mart 3.5 style Look Up functions
XML source definitions
IBM MQ source definitions

85. What are the mapping parameters and mapping variables?


Mapping parameter represents a constant value that you can define before running a session. A mapping parameter
retains the same value throughout the entire session.
When you use the mapping parameter in a mapping or maplet, then define the value of parameter in a parameter file for
the session.
Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The
informatica server saves the value of mapping variable to the repository at the end of session run and uses that value
next time you run the session.

86. Can you use the mapping parameters or variables created in one mapping into another mapping?
NO.
We can use mapping parameters or variables in any transformation of the same mapping or mapplet in which U have
created mapping parameters or variables.

87. Can u use the mapping parameters or variables created in one mapping into any other reusable transformation?
Yes. Because reusable transformation is not contained with any maplet or mapping.

88. How can U improve session performance in aggregator transformation?


Use sorted input.

89. What is the difference between joiner transformation and source qualifier transformation?
You can join heterogeneous data sources in a Joiner transformation, which cannot be achieved in a Source Qualifier transformation.
You need matching keys (a primary/foreign key relationship) to join two relational sources in a Source Qualifier, whereas the Joiner does not need matching keys defined in the database.
In a Source Qualifier the two relational sources must come from the same data source, while the Joiner can join relational sources that come from different data sources.

90. In which conditions we can/cannot use joiner transformation(Limitations of joiner transformation)?


Ideally, these are the limitations of the Joiner transformation; you cannot use a Joiner when:
 Both input pipelines originate from the same Source Qualifier transformation.
 Both input pipelines originate from the same Normalizer transformation.
 Both input pipelines originate from the same Joiner transformation.
 Either input pipeline contains an Update Strategy transformation.
 Either input pipeline contains a connected or unconnected Sequence Generator transformation.
However, you can join data from a single pipeline by selecting the sorted input option in the Joiner transformation.

91. What are the settings that you use to configure the joiner transformation?
Master and detail source
Type of join
Condition of the join

92. What are the join types in the Joiner transformation?
Normal (default)
Master outer
Detail outer
Full outer

93. How the informatica server sorts the string values in Rank transformation?


When the informatica server runs in the ASCII data movement mode it sorts session data using Binary sort order. If
you configure the session to use a binary sort order, the informatica server calculates the binary value of each string and
returns the specified number of rows with the highest binary values for the string.

94. What is the Rank index in Rank transformation?


The Designer automatically creates a RANKINDEX port for each Rank transformation. The Informatica Server uses
the Rank Index port to store the ranking position for each record in a group.
For example, if you create a Rank transformation that ranks the top 5 salespersons for each quarter, the rank index
numbers the salespeople from 1 to 5.

95. What is the Router transformation?


A Router transformation has one input group and one or more output groups.
A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition
to test data.
However, a Filter transformation tests data for one condition and drops the rows of data that do not meet the condition.
A Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not
meet any of the conditions to a default output group.
If you need to test the same input data based on multiple conditions, use a Router Transformation in a mapping instead
of creating multiple Filter transformations to perform the same task.

96. What are the types of groups in Router transformation?


The designer copies property information from the input ports of the input group to create a set of output ports for each
output group.
Two types of output groups
User defined groups
Default group
U cannot modify or delete default groups.

97. Why we use stored procedure transformation?


For populating and maintaining data bases.
 To perform calculation: There will be many well tested calculations which we implement using expression.
Instead of using expression we can use stored procedure to store these calculations and then use them by using
connected or unconnected stored procedure transformation
 Dropping and recreating indexes: whenever a huge number of records has to be loaded into the target, it is better to drop the existing indexes before the load and recreate them afterwards. For dropping and recreating indexes we can make use of a connected or unconnected Stored Procedure transformation.
 Check the status of a target table before loading data into it.
 To check the space left in Database

98. What are the types of data that passes between informatica server and stored procedure?
3 types of data
Input/Out put parameters
Return Values
Status code.

99. What is the status code?


Status code provides error handling for the informatica server during the session. The stored procedure issues a status
code that notifies whether or not stored procedure completed successfully. This value cannot be seen by the user. It
only used by the informatica server to determine whether to continue running the session or stop.

100. What are the tasks that source qualifier performs?


Join data originating from same source data base.
Filter records when the informatica server reads source data.
Specify an outer join rather than the default inner join specify sorted records.
Select only distinct values from the source.
Creating custom query to issue a special SELECT statement for the informatica server to read source data.


101. What is the default join that source qualifier provides?


Inner equi join.
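For example, if EMP and DEPT are joined in one Source Qualifier, the default query the server generates is an equi join of this general form (a sketch, assuming deptno is the common key):

SELECT emp.empno, emp.ename, emp.deptno, dept.dname
FROM   emp, dept
WHERE  emp.deptno = dept.deptno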

102. What are the basic needs to join two sources in a source qualifier?
Two sources should have primary and foreign key relationships.
Two sources should have matching data types.

103. What is update strategy transformation?


Flagging rows within a mapping.
Within a mapping, we use the Update Strategy transformation to flag rows for insert, delete, update, or reject.

Operation Constant Numeric Value


INSERT DD_INSERT 0
UPDATE DD_UPDATE 1
DELETE DD_DELETE 2
REJECT DD_REJECT 3

104. Describe two levels in which update strategy transformation sets?


Within a session: When you configure a session, you can instruct the Informatica Server to either treat all records in the
same way (for example, treat all records as inserts), or use instructions coded into the session mapping to flag records
for different database operations.
Within a mapping: Within a mapping, you use the Update Strategy transformation to flag records for insert, delete,
update, or reject.

105. What is the default source option for update strategy transformation?
Data driven

106. What is Data driven?


The informatica server follows instructions coded into update strategy transformations with in the session mapping
determine how to flag records for insert, update, delete or reject.
If you do not choose data driven option setting, the informatica server ignores all update strategy transformations in the
mapping.

107. What are the options in the target session of update strategy transformation?
Insert
Delete
Update
Update as update
Update as insert
Update else insert
Truncate table

108. What are the types of mapping wizards that are to be provided in Informatica?
The Designer provides two mapping wizards to help you create mappings quickly and easily. Both wizards are
designed to create mappings for loading and maintaining star schemas, a series of dimensions related to a central fact
table.
Getting Started Wizard: Creates mappings to load static fact and dimension tables, as well as slowly growing
dimension tables.
Slowly Changing Dimensions Wizard:. Creates mappings to load slowly changing dimension tables based on the
amount of historical dimension data you want to keep and the method you choose to handle historical dimension data.

109. What are the types of mapping in Getting Started Wizard?


Simple Pass through mapping :
Loads a static fact or dimension table by inserting all rows. Use this mapping when you want to drop all existing data
from your table before loading new data.
Slowly Growing target :
Loads a slowly growing fact or dimension table by inserting new rows. Use this mapping to load new data when
existing data does not require updates.


110. What are the mappings that we use for slowly changing dimension table?
Type1: Rows containing changes to existing dimensions are updated in the target by overwriting the existing
dimension. In the Type 1 Dimension mapping, all rows contain current dimension data.
Use the Type 1 Dimension mapping to update a slowly changing dimension table when you do not need to keep any
previous versions of dimensions in the table.
Type 2: The Type 2 Dimension Data mapping inserts both new and changed dimensions into the target. Changes are
tracked in the target table by versioning the primary key and creating a version number for each dimension in the table.
Use the Type 2 Dimension/Version Data mapping to update a slowly changing dimension table when you want to keep
a full history of dimension data in the table. Version numbers and versioned primary keys track the order of changes to
each dimension.
Type 3: The Type 3 Dimension mapping filters source rows based on user-defined comparisons and inserts only those
found to be new dimensions to the target. Rows containing changes to existing dimensions are updated in the target.
When updating an existing dimension, the Informatica Server saves existing data in different columns of the same row
and replaces the existing data with the updates

111. What are the different types of Type2 dimension mapping?


Type2 Dimension/Version Data Mapping: In this mapping the updated dimension in the source will gets inserted in
target along with a new version number. And newly added dimension in source will insert into target with a primary
key.
Type2 Dimension/Flag current Mapping: This mapping is also used for slowly changing dimensions. In addition it
creates a flag value for changed or new dimension. Flag indicates the dimension is new or newly updated. Recent
dimensions will gets saved with current flag value 1. And updated dimensions are saved with the value 0.
Type2 Dimension/Effective Date Range Mapping: This is also one flavor of Type2 mapping used for slowly changing
dimensions. This mapping also inserts both new and changed dimensions in to the target. And changes r tracked by the
effective date range for each version of each dimension.

112. How can u recognize whether or not the newly added rows in the source r gets insert in the target ?
In the Type2 mapping we have three options to recognize the newly added rows
Version number
Flag value
Effective date Range

113. What r two types of processes that informatica runs the session?
Load manager Process: Starts the session, creates the DTM process, and sends post-session email when the session
completes.
The DTM process. Creates threads to initialize the session, read, write, and transform data, and handle pre- and post-
session operations.

114. Can u generate reports in Informatica?


Yes. By using Metadata reporter we can generate reports in informatica.

115. What is metadata reporter?


It is a web based application that enables you to run reports against repository metadata. With a meta data reporter, u
can access information about Ur repository without having knowledge of sql, transformation language or underlying
tables in the repository.

116. Define mapping and sessions?


Mapping: It is a set of source and target definitions linked by transformation objects that define the rules for
transformation.
Session: It is a set of instructions that describe how and when to move data from source to targets.

117. Which tool U use to create and manage sessions and batches and to monitor and stop the informatica server?
Informatica Workflow manager and monitor

118. Why we use partitioning the session in informatica?


Partitioning achieves the session performance by reducing the time period of reading the source and loading the data
into target.

119. To achieve the session partition what r the necessary tasks u have to do?
Configure the session to partition source data.
Install the informatica server on a machine with multiple CPU's.


120. How the informatica server increases the session performance through partitioning the source?
For relational sources informatica server creates multiple connections for each partition of a single source and extracts
separate range of data for each connection. Informatica server reads multiple partitions of a single source concurrently;
each partition is associated to a thread. Similarly for loading also informatica server creates multiple connections to the
target and loads partitions of data concurrently.
For XML and file sources, informatica server reads multiple files concurrently. For loading the data informatica server
creates a separate file for each partition(of a source file).U can choose to merge the targets.

121. Why u use repository connectivity?


When u edit, schedule the session each time, informatica server directly communicates the repository to check whether
or not the session and users r valid. All the metadata of sessions and mappings will be stored in repository.

122. What are the tasks that Load manger process will do?
Manages the session and batch scheduling: When u start the informatica server the load manager launches and queries
the repository for a list of sessions configured to run on the informatica server. When u configure the session the load
manager maintains list of list of sessions and session start times. When u start a session load manger fetches the session
information from the repository to perform the validations
and verifications prior to starting DTM process
Locking and reading the session: When the informatica server starts a session load manager locks the session from the
repository. Locking prevents U starting the session again and again.
Reading the parameter file: If the session uses a parameter files, load manager reads the parameter file and verifies that
the session level parameters are declared in the file
Verifies permission and privileges: When the session starts load manger checks whether or not the user have privileges
to run the session.

123. What is DTM process?


After the load manger performs validations for session, it creates the DTM process. DTM is to create and manage the
threads that carry out the session tasks. I creates the master thread. Master thread creates and manages all the other
threads.

124. What r the different threads in DTM process?


Master thread: Creates and manages all other threads
Mapping thread: One mapping thread will be creates for each session. Fetches session and mapping information.
Pre and post session threads: This will be created to perform pre and post session operations.
Reader thread: One thread will be created for each partition of a source. It reads data from source.
Writer thread: It will be created to load data to the target.
Transformation thread: It will be created to transform data.

125. What r the data movement modes in informatica?


Data movement modes determines how informatica server handles the character data. U choose the data movement in
the informatica server configuration settings. Two types of data movement modes available in informatica.
ASCII mode
Uni code mode.

126. What r the output files that the informatica server creates during the session running?
Informatica server log: Informatica server(on UNIX) creates a log for all status and error messages(default name:
pm.server.log).It also creates an error log for error messages. These files will be created in informatica home directory.
Session log file: Informatica server creates session log file for each session. It writes information about session into log
files such as initialization process, creation of sql commands for reader and writer threads, errors encountered and load
summary. The amount of detail in session log file depends on the tracing level that u set.
Session detail file: This file contains load statistics for each target in mapping. Session detail include information such
as table name, number of rows written or rejected. U can view this file by double clicking on the session in monitor
window
Performance detail file: This file contains information known as session performance details which helps U where
performance can be improved. To generate this file select the performance detail option in the session property sheet.
Reject file: This file contains the rows of data that the writer does not write to targets.
Control file: the Informatica server creates a control file and a target file when you run a session that uses the external loader. The control file contains information about the target flat file, such as the data format and loading instructions for the external loader.
Post-session email: allows you to automatically communicate information about a session run to designated recipients. You can create two different messages, one for a successful session and one for a failed session.
Indicator file: if you use a flat file as a target, you can configure the Informatica server to create an indicator file. For each target row, the indicator file contains a number indicating whether the row was marked for insert, update, delete or reject.


output file: If session writes to a target file, the informatica server creates the target file based on file properties entered
in the session property sheet.
Cache files: When the informatica server creates memory cache it also creates cache files. For the following
circumstances informatica server creates index Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation

127. In which circumstances that informatica server creates Reject files?


When it encounters the DD_Reject in update strategy transformation.
Violates database constraint
Filed in the rows was truncated or overflowed.

128. What is polling?


It displays the updated information about the session in the monitor window. The monitor window displays the status
of each session when U poll the informatica server

129. Can you copy a session to a different folder or repository?


Yes. By using the Copy Session Wizard you can copy a session to a different folder or repository, but the target folder or
repository must contain the mapping of that session. If the target folder or repository does not have the mapping of the
session being copied, you have to copy that mapping first before you copy the session.

130. What is a batch, and what are the types of batches?


A grouping of sessions is known as a batch. Batches are of two types:
Sequential: runs sessions one after the other.
Concurrent: runs sessions at the same time.
If you have sessions with source-target dependencies, use a sequential batch to start the sessions one after
another. If you have several independent sessions, you can use concurrent batches, which run all the sessions at the same
time.

131. Can you copy batches?


NO

132. How many sessions can you create in a batch?


Any number of sessions.

133. When does the Informatica server mark a batch as failed?


If one of the sessions is configured to "run if previous completes" and that previous session fails.

134. Which command is used to run a batch?


pmcmd is used to start a batch.
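In current PowerCenter versions, batches have been replaced by workflows/worklets, and pmcmd starts them at the workflow level. A minimal sketch (the service, domain, user, folder, and workflow names below are illustrative placeholders, not values from this handbook):

pmcmd startworkflow -sv INT_SVC_DEV -d Domain_Dev -u admin -p admin_pwd -f MyFolder wf_daily_load

The -sv, -d, -u, -p, and -f options identify the Integration Service, domain, user, password, and repository folder respectively.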

135. What are the different options used to configure the sequential batches?
Two options:
Run the session only if the previous session completes successfully.
Always run the session.

136. In a sequential batch, can you run a session if the previous session fails?
Yes, by setting the option "Always runs the session".

137. Can you start batches within a batch?


You cannot. If you want to start a batch that resides in a batch, create a new independent batch and copy the necessary
sessions into the new batch.

138. Can you start a session inside a batch individually?


We can start an individual session only in a sequential batch; in a concurrent batch we cannot do this.

139. How can you stop a batch?


By using the Workflow Monitor or pmcmd, or by forcibly aborting it.

140. What are the session parameters?


Session parameters are like mapping parameters; they represent values you might want to change between sessions, such as
database connections or source files.
The Server Manager also allows you to create user-defined session parameters. The following are user-defined session
parameters:
- Database connections
- Source file name: use this parameter when you want to change the name or location of a session source file between
session runs.
- Target file name: use this parameter when you want to change the name or location of a session target file between
session runs.
- Reject file name: use this parameter when you want to change the name or location of session reject files between
session runs.

141. What is a parameter file?


A parameter file defines the values for parameters and variables used in a session. A parameter file is a plain text file created with a
text editor such as WordPad or Notepad. You can define the following values in a parameter file:
Mapping parameters
Mapping variables
Session parameters
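A minimal sketch of a parameter file, assuming a folder, workflow, and session with the illustrative names shown in the header line (mapping parameters and variables use the $$ prefix, session parameters the $ prefix):

[MyFolder.WF:wf_daily_load.ST:s_m_load_customer]
$DBConnection_Source=ORA_SRC_DEV
$InputFile1=/data/incoming/customers.dat
$$LastExtractDate=2015-01-01

You then point the session or workflow at this file through its parameter filename property before running it.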

142. How can you access a remote source in your session?
Relational source: To access a relational source located on a remote machine, you need to configure a database
connection to the data source.
File source: To access a remote source file, you must configure an FTP connection to the host machine before you
create the session.
Heterogeneous: When your mapping contains more than one source type, the Server Manager creates
a heterogeneous session that displays source options for all types.

143. What is the difference between partitioning of relational targets and partitioning of file targets?
If you partition a session with a relational target, the Informatica Server creates multiple connections to the target database to
write target data concurrently. If you partition a session with a file target, the Informatica Server creates one target file for
each partition. You can configure session properties to merge these target files.

144. What are the transformations that restrict the partitioning of sessions?
Advanced External Procedure transformation and External Procedure transformation: these transformations contain a
check box on the Properties tab to allow partitioning.
Aggregator transformation: if you use sorted ports, you cannot partition the associated source.
Joiner transformation: you cannot partition the master source for a Joiner transformation.
Normalizer transformation
XML targets

145. What is performance tuning in Informatica?


The goal of performance tuning is to optimize session performance so that sessions run during the available load
window for the Informatica Server. You can increase session performance as follows.
Network: The performance of the Informatica Server is related to network connections. Data generally moves across a
network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Network connections
therefore often affect session performance, so minimize them.
Flat files: If your flat files are stored on a machine other than the Informatica Server, move those files to the machine on
which the Informatica Server runs.
Fewer connections: Minimize the connections to sources, targets, and the Informatica Server to improve session
performance. Moving the target database onto the server system may improve session performance.
Staging areas: If you use staging areas, you force the Informatica Server to perform multiple data passes. Removing
staging areas may improve session performance; use a staging area only when it is mandatory.
Informatica Servers: You can run multiple Informatica Servers against the same repository. Distributing the
session load across multiple Informatica Servers may improve session performance.
Data movement mode: Running the Informatica Server in ASCII data movement mode improves session performance, because
ASCII mode stores a character value in one byte while Unicode mode takes two bytes per character.
Source qualifier: If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve
performance. Single-table SELECT statements with an ORDER BY or GROUP BY clause may also benefit from
optimization such as adding indexes.
Drop constraints: If the target has key constraints and indexes, they slow the loading of data. To improve session
performance in this case, drop constraints and indexes before you run the session (while loading facts and dimensions)
and rebuild them after the session completes (a sketch of such pre-/post-session SQL follows this list).
Parallel sessions: Running parallel sessions by using concurrent batches also reduces the time needed to load the data,
so concurrent batches may increase session performance.
Partitioning: Partitioning the session improves performance by creating multiple connections to sources and targets and
loading data in parallel pipelines.
Incremental aggregation: If a session contains an Aggregator transformation, in some cases you can use incremental
aggregation to improve session performance.
Transformation errors: Avoid transformation errors to improve session performance. Before saving the mapping,
validate it and rectify any transformation errors.
Lookup transformations: If the session contains a Lookup transformation, you can improve session performance
by enabling the lookup cache. The cache improves speed by saving the previously read data, so there is no need to read it
again.
Filter transformations: If your session contains a Filter transformation, place it as close to the sources as possible, or use
a filter condition in the Source Qualifier.
Group transformations: Aggregator, Rank, and Joiner transformations may decrease session performance because they
must group data before processing it. To improve session performance in this case, use the sorted ports option, i.e.,
sort the data before it reaches the transformation.
Packet size: We can improve session performance by configuring the network packet size, which controls how much data
crosses the network at one time. To do this, go to the Server Manager and choose Server Configure Database Connections.
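As referenced in the "Drop constraints" point above, the drop/rebuild is typically done through pre- and post-session SQL (or pre/post-load scripts). A hedged Oracle sketch with illustrative table, index, and constraint names:

-- Pre-session SQL: disable the constraint and drop the index before the load
ALTER TABLE sales_fact DISABLE CONSTRAINT fk_sales_cust;
DROP INDEX idx_sales_cust;
-- Post-session SQL: rebuild the index and re-enable the constraint after the load
CREATE INDEX idx_sales_cust ON sales_fact (cust_id);
ALTER TABLE sales_fact ENABLE CONSTRAINT fk_sales_cust;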

146. Define informatica repository?


The Informatica repository is a relational database that stores information, or metadata, used by the Informatica Server
and Client tools.
Metadata can include information such as mappings describing how to transform source data, sessions indicating when
you want the Informatica Server to perform the transformations, and connect strings for sources and targets.
The repository also stores administrative information such as usernames and passwords, permissions and privileges,
and product version.
Use the Repository Manager to create the repository. The Repository Manager connects to the repository database and runs
the code needed to create the repository tables. These tables store metadata in a specific format that the Informatica Server and
client tools use.

147. What are the types of metadata stored in the repository?


The following types of metadata are stored in the repository:
Database connections
Global objects
Mappings
Mapplets
Multidimensional metadata
Reusable transformations
Sessions and batches
Short cuts
Source definitions
Target definitions
Transformations

148. What is incremental aggregation?


When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If
the source changes only incrementally and you can capture changes, you can configure the session to process only
those changes. This allows the Informatica Server to update your target incrementally, rather than forcing it to process
the entire source and recalculate the same calculations each time you run the session.

149. What are the scheduling options to run a session?


You can schedule a session to run at a given time or interval, or you can run the session manually.
Different scheduling options:
Run only on demand: the Informatica Server runs the session only when a user starts the session explicitly.
Run once: the Informatica Server runs the session only once, at a specified date and time.
Run every: the Informatica Server runs the session at regular intervals, as you configured.
Customized repeat: the Informatica Server runs the session at the dates and times specified in the Repeat dialog box.

150. What is a tracing level, and what are the types of tracing levels?
The tracing level represents the amount of information that the Informatica Server writes to the log file.
Types of tracing levels:
Normal
Terse
Verbose initialization
Verbose data

151. What is the difference between the Stored Procedure transformation and the External Procedure transformation?
In a Stored Procedure transformation, the procedure is compiled and executed in a relational data source; you need a
database connection to import the stored procedure into your mapping. In an External Procedure transformation, the
procedure or function is executed outside the data source, i.e., you need to build it as a DLL to access it in your mapping.
No database connection is needed for an External Procedure transformation.

152. Explain about Recovering sessions?


If you stop a session or if an error causes a session to stop, refer to the session and error logs to determine the cause of
failure. Correct the errors, and then complete the session. The method you use to complete the session depends on the
properties of the mapping, session, and Informatica Server configuration.
Use one of the following methods to complete the session:
· Run the session again if the Informatica Server has not issued a commit.
· Truncate the target tables and run the session again if the session is not recoverable.
· Consider performing recovery if the Informatica Server has issued at least one commit.

153. If a session fails after loading 10,000 records into the target, how can you load the records from the 10,001st record when
you run the session the next time?
As explained above, the Informatica Server has three methods for recovering sessions. Use perform recovery to load the
records from the point where the session failed.

154. Explain about perform recovery?


When the Informatica Server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the row
ID of the last row committed to the target database. The Informatica Server then reads all sources again and starts
processing from the next row ID.
For example, if the Informatica Server commits 10,000 rows before the session fails, when you run recovery, the
Informatica Server bypasses the rows up to 10,000 and starts loading with row 10,001.
By default, Perform Recovery is disabled in the Informatica Server setup. You must enable Recovery in the Informatica
Server setup before you run a session so the Informatica Server can create and/or write entries in the
OPB_SRVR_RECOVERY table.

155. How to recover the standalone session?


A standalone session is a session that is not nested in a batch. If a standalone session fails, you can run recovery using a
menu command or pmcmd. These options are not available for batched sessions.
To recover sessions using the menu:
1. In the Server Manager, highlight the session you want to recover.
2. Select Server Requests-Stop from the menu.
3. With the failed session highlighted, select Server Requests-Start Session in Recovery Mode from the menu.
To recover sessions using pmcmd:
1. From the command line, stop the session.
2. From the command line, start recovery.
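In current PowerCenter versions the batch-era command line maps to task- and workflow-level commands; a hedged sketch, assuming placeholder service, domain, folder, workflow, and session names:

pmcmd stoptask -sv INT_SVC_DEV -d Domain_Dev -u admin -p admin_pwd -f MyFolder -w wf_daily_load s_m_load_customer
pmcmd recoverworkflow -sv INT_SVC_DEV -d Domain_Dev -u admin -p admin_pwd -f MyFolder wf_daily_load

Whether recovery is possible still depends on the recovery settings described in the questions above.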

156. How can u recover the session in sequential batches?


If you configure a session in a sequential batch to stop on failure, you can run recovery starting with the failed session.
The Informatica Server completes the session and then runs the rest of the batch. Use the Perform Recovery session
property.
To recover sessions in sequential batches configured to stop on failure:
1. In the Server Manager, open the session property sheet.
2. On the Log Files tab, select Perform Recovery, and click OK.
3. Run the session.
4. After the batch completes, open the session property sheet.
5. Clear Perform Recovery, and click OK.
If you do not clear Perform Recovery, the next time you run the session, the Informatica Server attempts to recover the
previous session.
If you do not configure a session in a sequential batch to stop on failure, and the remaining sessions in the batch
complete, recover the failed session as a standalone session.

157. How to recover sessions in concurrent batches?


If multiple sessions in a concurrent batch fail, you might want to truncate all targets and run the batch again. However,
if a session in a concurrent batch fails and the rest of the sessions complete successfully, you can recover the session as
a standalone session.
To recover a session in a concurrent batch:
1. Copy the failed session using Operations-Copy Session.
2. Drag the copied session outside the batch to make it a standalone session.
3. Follow the steps to recover a standalone session.
4. Delete the standalone copy.

158. How can u complete unrecoverable sessions?


Under certain circumstances, when a session does not complete, you need to truncate the target tables and run the
session from the beginning. Run the session from the beginning when the Informatica Server cannot run recovery or
when running recovery might result in inconsistent data.

159. What are the circumstances in which the Informatica server results in an unrecoverable session?
The Source Qualifier transformation does not use sorted ports.
You change the partition information after the initial session fails.
Perform recovery is disabled in the Informatica Server configuration.
The sources or targets change after the initial session fails.
The mapping contains a Sequence Generator or Normalizer transformation.
A concurrent batch contains multiple failed sessions.

160. If I make any modifications to my table in the back end, do they reflect in the Informatica warehouse, Mapping Designer,
or Source Analyzer?
No. Informatica is not directly aware of the back-end database; it displays only the information stored in the
repository. If you want back-end changes to reflect in the Informatica screens, you have to re-import the definitions from the
back end through a valid connection and replace the existing definitions with the imported ones.

161. After dragging the ports of three sources (SQL Server, Oracle, Informix) to a single Source Qualifier, can you map these
three ports directly to the target?
No. Unless you join those three ports in the Source Qualifier, you cannot map them directly.

162. Informatica Server Variables


1. $PMRootDir
2. $PMSessionLogDir
3. $PMBadFileDir
4. $PMCacheDir
5. $PMTargetFileDir
6. $PMSourceFileDir
7. $PMExtProcDir
8. $PMTempDir
9. $PMSuccessEmailUser
10. $PMFailureEmailUser
11. $PMSessionLogCount
12. $PMSessionErrorThreshold
13. $PMWorkflowLogDir
14. $PMWorkflowLogCount

163. What are the main issues while working with flat files as sources and as targets?
We need to specify the correct path in the session and mention whether the file is 'direct' or 'indirect'. Keep the file in the exact
path you specified in the session.
1. We cannot use SQL override; we have to use transformations for all our requirements.
2. Testing flat files is a very tedious job.
3. The file format (source/target definition) should match exactly with the format of the data file. Most of the time,
erroneous results come when the data file layout is not in sync with the actual file:
(i) Your data file may be fixed width but the definition is delimited ---> truncated data.
(ii) Your data file as well as the definition is delimited, but you specify a wrong delimiter:
(a) a delimiter other than the one present in the actual file, or
(b) a delimiter that appears as a character in some field of the file ---> wrong data again.
(iii) Not specifying the NULL character properly may result in wrong data.
(iv) There are other settings/attributes while creating the file definition with which one should be very careful.
4. If you miss the link to any column of the target, the data will be placed in the wrong fields, and the
missed column won't exist in the target data file.

164. Explain how the Informatica server process works with mapping variables.
Informatica primarily uses the Load Manager and the Data Transformation Manager (DTM) to perform extraction, transformation,
and loading. The Load Manager reads parameters and variables related to the session, mapping, and server, and passes the
mapping parameter and variable information to the DTM. The DTM uses this information to perform the data movement
from source to target.
The PowerCenter Server holds two different values for a mapping variable during a session run:
- Start value of a mapping variable
- Current value of a mapping variable
Start Value
The start value is the value of the variable at the start of the session. The start value could be a value defined in the
parameter file for the variable, a value saved in the repository from the previous run of the session, a user-defined initial
value for the variable, or the default value based on the variable datatype.
The PowerCenter Server looks for the start value in the following order:
1. Value in the parameter file
2. Value saved in the repository
3. Initial value
4. Default value
Current Value
The current value is the value of the variable as the session progresses. When a session starts, the current value of a
variable is the same as the start value. As the session progresses, the PowerCenter Server calculates the current value
using a variable function that you set for the variable. Unlike the start value, the current value can change as the
PowerCenter Server evaluates it for each row passing through the mapping.
165. A query to retrieve the latest records from the table, by version (SCD).
Select * from dimension a
where a.version = (select max(b.version) from dimension b where a.dno = b.dno);

select * from dimension where sysdate between begin_effective_date and end_effective_date;

166. Which one is better performance-wise, Joiner or Lookup?


Are you looking up a flat file or a database table? Generally a sorted Joiner is more effective on flat files than a Lookup, because
a sorted Joiner uses a merge join and caches fewer rows, while a Lookup always caches the whole file. If the file is not sorted, the two
can be comparable. Lookups into a database table can be effective if the database can return sorted data fast and the amount
of data is small, because the Lookup can build the whole cache in memory. If the database responds slowly or a large amount of data
is processed, lookup cache initialization can be really slow (the Lookup waits for the database and stores cached data on disk).
In that case it can be better to use a sorted Joiner, which sends data to the output as it reads it from the input.

167. How many types of sessions are there in Informatica?


Reusable and non-reusable sessions.
A session is a type of workflow task and a set of instructions that describe how to move data from sources to targets using a
mapping.
Sessions in Informatica can be configured to run:
1. Sequentially: data moves from source to target one session after another.
2. Concurrently: sessions move data from source to target simultaneously.

168. How can we remove/optimize source bottlenecks using "query hints"?


Create indexes on the source table columns used in filters and joins.
First, you must have proper indexes, and the table must be analyzed to gather statistics so the cost-based optimizer (CBO) can use them.
Use hints only after that; hints are powerful, so be careful with them.
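A hedged Oracle sketch of the two steps above, first gathering statistics and then (only if needed) forcing an index in a Source Qualifier SQL override; the schema, table, and index names are assumptions:

EXEC DBMS_STATS.GATHER_TABLE_STATS('SCOTT', 'EMP');

SELECT /*+ INDEX(e idx_emp_deptno) */ e.empno, e.ename, e.sal
FROM emp e
WHERE e.deptno = 10;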

169. What is target load order?


If a mapping has more than one target table, we need to specify the order in which the target tables should be
loaded.
Example: suppose our mapping has two pipelines to load two target tables:
1. Customer
2. Audit table
The Customer table should be populated first and then the Audit table; for that we use target load order.

170. How did you handle errors? (ETL row errors)


If an error occurs, the row is stored in the target_table.bad file.


Errors are of two types:
1. Row-based errors
2. Column-based errors
Column-based errors are identified by indicators:
D - good data, N - null data, O - overflow data, R - rejected data.
The data is stored in the .bad file, for example:
D1232234O877NDDDN23

171. What is event-based scheduling?


In time-based scheduling, jobs run at a specified time. In some situations we have to run a job based on an event,
for example only when a file arrives, whatever the time is. In such cases event-based scheduling is used.
172. What are bulk and normal load? Where do we use bulk and where normal?
When we load data in bulk mode, there is no entry in the database log files, so it is tough to recover data if the
session fails at some point. In normal mode, every record is logged in the database log file and in the Informatica
repository, so if the session fails it is easy to restart the load from the last committed point.
Bulk mode is very fast compared with normal mode.
Bulk mode is used for Oracle/SQL Server/Sybase; this mode improves performance by not writing to the database log.

173. What is CDC?


Changed Data Capture (CDC) helps identify the data in the source system that has changed since the last extraction.
With CDC, data extraction takes place at the same time the insert, update, or delete operations occur in the source tables,
and the change data is stored inside the database in change tables.
The change data thus captured is then made available to the target systems in a controlled manner.
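When a full CDC infrastructure (change tables) is not available, a common lightweight variant is timestamp-based extraction driven by a mapping parameter or control table. A minimal sketch, assuming the source table has a last_upd_tmst column and $$LAST_EXTRACT_DATE is a mapping parameter supplied through the parameter file:

SELECT cust_id, cust_name, last_upd_tmst
FROM customers
WHERE last_upd_tmst > TO_DATE('$$LAST_EXTRACT_DATE', 'YYYY-MM-DD HH24:MI:SS');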

174. How do we do unit testing in Informatica? How do we load data in Informatica?


Unit testing is of two types:
1. Quantitative testing
2. Qualitative testing
Steps:
1. First validate the mapping.
2. Create a session on the mapping and then run the workflow.
Once the session has succeeded, right-click the session and go to the statistics tab. There you can see how many source
rows were applied, how many rows were loaded into the targets, and how many rows were rejected. This is called
quantitative testing (see the count-check sketch after these steps). Once rows are loaded successfully, we go for qualitative testing.
Steps:
1. Take the DATM (the document where all business rules are mapped to the corresponding source
columns) and check whether the data is loaded into the target table according to the DATM. If any data is not loaded
according to the DATM, check the code and rectify it.
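For the quantitative check, a simple row-count reconciliation between source and target is often run in addition to the session statistics; a sketch with illustrative table names:

SELECT (SELECT COUNT(*) FROM src_customers) AS source_cnt,
       (SELECT COUNT(*) FROM dw_dim_customer) AS target_cnt
FROM dual;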

175. How can we store previous session logs?


Just run the session in timestamp mode; then the session log will not overwrite the current session log.
We can also do it this way, using $PMSessionLogCount (which specifies the number of session-log runs to
save):
Go to the session --> right-click --> select Edit Task, then go to --> Config Object, then set the properties
Save Session Log By --> Runs
Save Session Log for These Runs --> the number of historical session logs you want

GOOD LINKS

http://stackoverflow.com/questions/tagged/informatica
http://www.itnirvanas.com/2009/01/informatica-interview-questions-part-1.html
http://gogates.blogspot.in/2011/05/informatica-interview-questions.html
https://community.informatica.com/thread/38970
http://shan-informatica.blogspot.in/
http://www.info-etl.com/course-materials/informatica-powercenter-development-best-practices
http://informaticaramamohanreddy.blogspot.in/2012/08/final-interview-questions-etl.html
http://informaticaconcepts.blogspot.in/search/label/ScenaroBasedQuestions
http://baluinfomaticastar.blogspot.in/2011/06/dwh-material-with-informatica-material.html
