BODS Interview Questions
1. When to go for LO Extraction in BW?
It is not required.
2. What is DELTA Load in BW and BODS?
A full load truncates the target data every time and loads the whole data set from the source.
If your source contains very few records and you do not need to maintain history, then a full load is the
better option.
The main disadvantage is performance. Transactional data may have millions of records, and a full load
of that volume takes a lot of time. We also cannot maintain the history of source changes in the target,
because the target data is deleted on every load.
You will create two dataflows in this exercise. The first dataflow (Initial/Full) initially loads all of the
rows from a source table.
The second dataflow (Delta) identifies only the rows that have been added or changed in the source and
loads them into the target table.
The delta dataflow extracts only new and updated records from the source, based on the condition
"updated date field > last date of job execution", where the last run date comes from a script:
Emp_Init_Delta.updated_date > $GV_JOB_LAST_RUN_DATE
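For illustration, a rough script sketch for populating $GV_JOB_LAST_RUN_DATE is given below. The datastore DS_STG, the control table CTRL_JOB_RUN and its LAST_RUN_DATE column are assumed names used only for this example, not part of any standard setup.

# SCR_SET_LAST_RUN, placed before the delta dataflow
$GV_JOB_LAST_RUN_DATE = sql('DS_STG', 'SELECT MAX(LAST_RUN_DATE) FROM CTRL_JOB_RUN');
# fall back to a very old date on the very first run
$GV_JOB_LAST_RUN_DATE = nvl($GV_JOB_LAST_RUN_DATE, to_date('1900.01.01', 'YYYY.MM.DD'));

# WHERE clause of the Query transform inside the delta dataflow:
#   Emp_Init_Delta.updated_date > $GV_JOB_LAST_RUN_DATE

# SCR_UPDATE_LAST_RUN, placed after the delta dataflow
$GV_RUN_DATE = sysdate();
sql('DS_STG', 'UPDATE CTRL_JOB_RUN SET LAST_RUN_DATE = {$GV_RUN_DATE}');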
3. Which transforms have you worked with in your project?
Case, Map_Operation, Merge, Query, Row_Generation, SQL, Validation
The different types of slowly changing dimensions (SCD) are given below.
1. Type 0
2. Type 1
3. Type 2
4. Type 3
In this document I will explain these SCD types with examples.
Source Data
SCD TYPE 0
The SCD Type 0 method is passive. The value remains the same as it was at the time the dimension record was
first entered. Type 0 also applies to most date dimension attributes.
SCD TYPE 1
This method does not track any history data. It overwrites the old data with new data without keeping the
history, and is mainly used for correcting misspelled names.
Consider that the data given below is our target data after the first run.
During the next run, suppose the designation of John is changed to 'B' on the date 2003.12.01; then the output
will be as follows.
Here no history is preserved; the designation and IDate are updated with the new values.
QR_MAP :- Maps the source data to a Query transform without applying any transformation.
TBL_CPM :- Table Comparison, used to compare the source data and the target table data.
MP_OPR :- Map Operation, used to insert new data and update old data.
SCD TYPE 2
In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information.
Therefore, both the original and the new record will be present. The new record gets its own primary key. Here
two new columns are added, called start_date and end_date.
This type maintains the history data: new records are added to the target, and an updated record also creates
a new row in the target table.
Consider that the data given below is our target data after the first run.
Here a new row is inserted, and the end_date of the first row becomes the start date of the newly updated value.
(Note: '9000.12.31' is the default value.)
QR_MAP : Maps the source data to a Query transform without applying any transformation.
TBL_CPM :- Table Comparison, used to compare the source data and the target table data.
HIS_PRES :- History Preserving transform, used to store the history data; if there is any update to the source
data, a new row is inserted.
SCD TYPE 3
This method tracks changes using separate columns and preserves limited history. Unlike Type 2, which preserves
unlimited history by adding new rows, Type 3 is limited to the number of columns designated for storing
historical data. The table structure of Type 3 differs from Type 2 in that Type 3 adds additional columns. In
the following example, an additional column has been added to the table to record the original target value;
only the previous history is stored.
Consider that the data given below is our target data after the first run.
ID  Name   IDate       Curr_Designation  Effective_date  Pre_Designation
1   John   2002.12.01  A                 2002.12.01      A
2   Jay    2002.12.01  A                 2002.12.01      A
3   Jasil  2002.12.01  A                 2002.12.01      A
During the next run, suppose the designation of John is changed to 'B' on the date 2003.12.01; then the output
will be as follows.
In the target output an extra column is added to keep the previous value of that particular column.
(Note: on the first run the Pre_Designation and Curr_Designation are the same.)
QR_JOIN : This Query transform is used to join the source with the target.
This is a left outer join, with the source as the outer source and the target as the inner
source. The join is based on the name column on both sides (Source.name = SCD3.name).
QR_INSERT :- This Query transform is used to filter the data that is new, i.e. SCD3.name is null.
QR_UPDATE : This Query transform is used to filter the data that already exists in the target table but whose
designation has been updated, i.e. SCD3.name is not null and the designation from the source and the previous
designation from the SCD3 table are not the same.
MP_UPDATE : This transform is used to update the target table by setting the map operation to 'normal to
update'.
KEY_GEN :- Key Generation is used to generate a surrogate key for each newly inserted row.
During the first run the source file has the data given above and the target table does not have any data, so
the whole data set moves through the QR_INSERT flow and is loaded into the target table.
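As a rough sketch of the SCD Type 3 logic described above (column names taken from the example table, everything else assumed for illustration; one common way to write the changed-row filter is to compare the incoming designation with the designation currently stored in the target):

# QR_INSERT filter - rows that are new (no match found in the target)
SCD3.Name IS NULL

# QR_UPDATE filter - rows that already exist but whose designation changed
SCD3.Name IS NOT NULL AND Source.Designation <> SCD3.Curr_Designation

# QR_UPDATE mappings - the current value moves into the history column
Curr_Designation = Source.Designation
Pre_Designation  = SCD3.Curr_Designation
Effective_date   = Source.IDate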
1) Extracting the data from SAP ECC and loading it into staging.
2) Transforming the data and then validating the data for all key fields.
3) Finally uploading the data into IDocs (S/4HANA).
7. Why are we using lookups? Is it necessary to use lookups for extracting columns
from other tables?
It is not mandatory to use lookups, but if we want to extract fewer than 5 fields, using a
lookup gives better performance.
Note: lookup() and lookup_ext() cannot be used inside a dataflow against a SAP system.
The lookup() function can be used within an R/3 dataflow if the source is a SAP system.
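For reference, a minimal lookup_ext() sketch is shown below; the datastore DS_TGT, the DEPT table and all column names are assumed, purely for illustration.

# Returns DEPT_NAME from the assumed table DS_TGT.DBO.DEPT for the matching DEPT_ID;
# 'NO_DEPT' is returned as the default when no match is found.
lookup_ext([DS_TGT.DBO.DEPT, 'PRE_LOAD_CACHE', 'MAX'],
           [DEPT_NAME],
           ['NO_DEPT'],
           [DEPT_ID, '=', EMP.DEPT_ID])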
NET4SITE INDIA company:
1. When to go for LO Extraction in BW?
I did not get a chance to work on it.
For delta, we need to extract only updated and newly added records from the source
instead of extracting the entire data. Based on source date fields, we design the job
to bring only the latest updated records from the source.
Condition:
Updated date field > last job run date.
7. Why are we using lookups? Is it necessary to use lookups for extracting columns
from other tables?
Ans: lookup() and lookup_ext() cannot be used inside a dataflow against a SAP system.
The lookup() function can be used within an R/3 dataflow if the source is a SAP system.
3. Transforms?
Transforms are used to manipulate data sets, taking them as inputs and creating one or multiple outputs. There
are various transforms that can be used in Data Services.
Incremental jobs: extract only updated and newly added records from the source.
CTS company:
2. How to move the R/3 objects from Dev R/3 Application to Quality R/3 Application.
SELECT empno, COUNT(*)
FROM emp
GROUP BY empno
HAVING COUNT(*) > 1
Explanation: if gender is 'Female', we need to pass the value 'F'; otherwise we need to populate
'M'.
FULL: every time the target table data is truncated and the entire data is loaded from the source.
This is called a full load.
The local repository does not support multi-user development. Only one user at a time can log in and
complete their work.
Note: It will allow more than one user to log in to the local repo, but the repository may get
corrupted or may not save our changes.
Once the batch job is created, move it to the central repo and schedule it in the Management
Console.
KPIT company:
1. What are the different types of joins?
2. Diff. b/w LEFT OUTER and RIGHT OUTER JOIN?
3. Diff. b/w FULL OUTER and CROSS/CARTESIAN?
4. Explain about FACT and DIMENSION?
5. Explain about SCD TYPE-2 implementation?
6. Scenario:
Eno,ename
101,ksr,ysr
Req: I just want to display the "ksr,ysr" content in a single 'ename' column only, rather than
splitting the content on the comma separator. How to do this in BODS?
rtrim('ksr,ysr,', ',')
word_ext('ksr,ysr', 1, ',')
replace_substr('ksr,ysr', ',', '')
replace_substr(replace_substr(replace_substr_ext(ename, ',', 'µ', 1, 1), ',', ''), 'µ', ',')
There are 7 touch points that need to be considered for performance optimization. They are given
below:
1. Source Operating System.
2. Source Data Base.
3. Target Operating System.
4. Target Database.
5. Network.
6. Job Server OS.
7. BODS Repository database.
The BODS Job Server and the BODS Repository Database can reside on the same server. If installed on
different servers, Network throughput configured between Job Server and BODS Repository Database
will play a key role. The performance tuning methods for each identified touch point are given below.
2.1.1 Source Operating System
The Source Operating System should be tuned to quickly read data from disks. This can be done by the
following methods:
· Set the read-ahead protocol to 64 KB to make the I/O operations fast.
· The size of the read-ahead protocol is set to 4-8 KB by default.
2.1.2 Source Database
The source database should be tuned so that SELECTs execute as quickly as possible. This can be done by the
following methods:
· Increase the size of database I/O to match the OS read-ahead size; otherwise it may cause a
bottleneck, thereby affecting the performance.
· Increase size of shared buffer to cache more data in the database server.
· Cache tables that are small enough to fit in the shared buffer by enabling cache at table level.
Caching reduces the number of I/Os thereby improving the speed of access to the data.
· Turn off PARALLEL for small tables.
· Create indexes on appropriate columns by looking into the dataflow to identify the columns used in
the join criteria.
· Create bitmap indexes on columns with low cardinality, as the index does not take much space in
memory, thereby resulting in faster SELECTs.
2.1.3 Target Operating System
The Target Operating System should be tuned to quickly write data to disks. This can be done by the
following methods:
· Turn on the asynchronous I/O to make the Input/output operations as fast as possible.
2.1.4 Target Database
The Target Database should be tuned to perform INSERTs and UPDATEs as quickly as possible. This can
be done by the following methods:
· Turn off archive logging.
· Turn off redo logging for all tables.
· Tune rollback segments for better performance.
· Place redo log files and data files on a raw file if possible.
· Increase the size of the shared buffer.
2.1.5 Network
Even if the Source Database and Target Database are tuned, a small network bandwidth can still cause a
bottleneck which hampers the performance.
· Adjust the size of the network buffer in the database client so that each client request completely
fills a small number of network packets, thereby reducing the number of round trips across databases,
which in turn improves the performance.
2.1.6 Job Server OS
Data Services jobs are multi-threaded applications. Typically a single data flow in a job initiates one
‘al_engine’ process that in turn initiates four threads. For maximum performance benefits:
· Consider a design that runs one ‘al_engine’ process per CPU at a time.
· Tune the Job Server OS so that the Data Services threads spread to all available CPUs.
The above performance optimization methods need to be implemented during the environment and
infrastructure preparation of BODS components.
2.1.7 Data Services Jobs
The following execution options can be used to improve the performance of BODS jobs:
· Monitor Sample Rate: If the job processes a large amount of data, set the 'Monitor Sample Rate' to a
higher value (maximum is 50,000, default is 1,000) to reduce the number of I/O calls to the log file,
thereby improving the performance.
· If a virus scanner is configured on the BODS Job Server, exclude the Data Services log from the virus
scan. Otherwise the virus scanner scans the Data Services log repeatedly during the execution, which
causes performance degradation.
· Collect Statistics for self-tuning: BODS has a self-tuning capability to determine the cache type by
looking into the statistics of previous job executions. The 'Collect Statistics' option needs to be selected
during the first execution of the job; BODS then collects the statistics for that job and stores them in the
job's metadata. In the next execution select the 'Use Collected Statistics' option to allow BODS
to decide the type of cache to be used to execute the job, thereby improving the performance of the
job.
· Set data flow properties like Degree of Parallelism depending upon the number of CPUs available
for processing, and set the cache type to in-memory if the data volume is small.
· If source tables are from the same schema and use the same datastore, identify the joins in the early
phase so that the join can be pushed down to the database.
· Create synonyms for tables in other schemas to push down join conditions of tables belonging to
different schemas in the same database.
· Use database links to connect different schemas in different databases, to push down the joins of
tables available in these schemas to the database.
· Use the Data_Transfer transform (type = 'TABLE') to push down the complex logic in a dataflow to the database.
· Do not use advanced features like 'Run Distinct as a separate process' (available in the 'Advanced' tab
of the Query transform), as they start multiple sub-processes which cause heavy traffic between the
processes and can thereby lead to the termination of the job.
· Do not use the Data_Transfer transform unless required (use the table type if it is required, as that is
more reliable); SAP indicates that Data_Transfer is not a reliable transform and hence recommends not
using it unless required.
· Turn off the Cache option for tables with large amounts of data. Cache is turned on by default
for every source table. Make sure that indexes are created on the key columns of the tables for
which cache is turned off.
· Do not use BODS functions like job_name() in mappings; instead initialize a variable in a script and use
that variable for the mapping in query transforms (a small script sketch is given below, after this list).
· Use a join wherever applicable in place of the lookup transform, as the lookup transform has to access
the table for each and every record, which increases the number of I/O operations against the database.
· Use Query transforms to split the data in place of the Case transform, as Case is a costly transform in BODS.
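A minimal sketch of the job_name() recommendation above (the variable and column names are assumed examples):

# Script at the start of the job: call the function only once
$GV_JOB_NAME = job_name();

# In the Query transform, map the audit column to the variable instead of the function call:
#   LOAD_JOB_NAME = $GV_JOB_NAME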
Ensure that most of the dataflows are optimized. Maximize the push-down operations to the
database as much as possible. You can check the optimized SQL from within a dataflow
(Validation > Display Optimized SQL); the SQL should start with an INSERT INTO ... SELECT statement.
Split complex logic in a single dataflow into multiple dataflows if possible. This is much
easier to maintain in the future, and most of the dataflows can then be pushed down.
If a full pushdown is not possible in a dataflow, then enable the bulk loader on the target table.
Double-click the target table to enable the bulk loader. The bulk loader is much
faster than a direct load.
Right-click the datastore, select Edit, go to the Advanced option and click Edit. Change
the Ifthenelse Support setting to 'Yes'. Note that by default this is set to 'No' in BODS. This will push
down all the decode and ifthenelse functions used in the job.
Index creation on key columns: If you are joining more than one table, ensure that the tables
have indexes created on the columns used in the WHERE clause. This drastically improves the
performance. Define primary keys while creating the target tables in DS. In most databases,
indexes are created automatically if you define the keys in your Query transforms. Therefore,
define primary keys in the Query transforms when you first create the target table; this way
you can avoid manual index creation on the table.
Select Distinct: In BODS 'Select Distinct' is not pushed down. It can be pushed down only if
you check the 'Select Distinct' option in the Query transform just before the target table. So if you need
a select distinct, use it in the last Query transform.
Order By and Group By are not pushed down in BODS. They can be pushed down only if there is a
single Query transform in the dataflow.
Avoid data type conversions, as they prevent a full push down. Validate the dataflow and ensure
there are no warnings.
Parallel execution of dataflows or workflows: Ensure that workflows and dataflows are not
executed in sequence unnecessarily; make the execution parallel wherever possible.
Avoid parallel execution of Query transforms in a dataflow, as it prevents a full pushdown. If the same
set of data is required from a source table, use another instance of the same table as a source.
Join Rank: Assign a higher join rank value to the larger table. Open the Query editor where the tables
are joined and give the table with millions of records the higher join rank; the maximum number has
the highest join rank. This improves performance.
Database links and linked datastores: Create database links if you are using more than one
database for source and target tables (multiple datastores) or if you are using different database
servers.
Use of joins in place of lookup functions: Use the lookup table as a source table and set an
outer join in the dataflow instead of using lookup functions. This technique has an advantage over the
lookup functions, as it pushes the execution of the join down to the underlying database. It is also
much easier to maintain the dataflow.
2) Login as an administrator.
3) From the Welcome screen click on the Administrator icon or link.
4) From the Administrator screen click on the Web Services link from the left side menu panel. It would
display the web service configuration page with three tabs on the right hand side. The tabs are “Web Services
Status”, “Web Services Configuration” and “Custom WSDL Labels”.
5) Click on the Web Services Configuration tab. This tab displays all the currently published web services.
6) On the Web Services Configuration tab, click on the drop-down list.
7) Select “Add Batch Job…” from the list.
8) On the “Web Services Configuration – Add Batch Jobs” screen select the Repository where the batch job
resides.
In this example, the batch job named “1_JB_Test_WS” resides in the local repository named
“RAJESH_DS_LOC”.
9) After the repository is selected, a list of jobs residing in this repository will be displayed.
10) From the list of jobs displayed: Select your job to be published as a web service. eg “1_JB_Test_WS” is
selected.
11) On a successful Add it should take you back to the "Web Services Configuration" page, where this job
should be displayed in the list.
14) From the "Add Custom WSDL Label" page add the new WSDL Label Name, e.g. JB_Test_LBL.
Associate the new label "JB_Test_LBL" with this batch job "1_JB_Test_WS".
17) Click on the Select check box of the operation named "1_JB_Test_WS".
19) The following screen shows the new label named "jb_test_lbl" created, along with the list of all other
labels (if any).
20) It's time to generate the custom WSDL now 🙂, so click on the "Web Services Status" tab.
21) From the Label drop down: select the new label named “jb_test_lbl”.
23) It should pop up a new Internet Explorer window with the custom WSDL generated.
L&T company:
1. What is your role in your project?
Team size: 6 members, 1 lead, 1 project manager and 1 site coordinator.
You can say that it is an admin task and the admin team will take care of it. Otherwise you can describe the steps below.
Use the Data Services Server Manager to configure a new job server and associate the job
server with the local repository.
Procedure
1. Select Start > Programs > SAP Data Services 4.2 > Data Services Server Manager.
3. Click Add.
7. Select the RDBMS database type that you used for the repository from the Database
type dropdown list.
The remaining connection options that appear are applicable to the database type you
choose.
8. Enter the remaining connection information based on the information that you noted in
the worksheet in Creating repository, source, and target databases on an existing
RDBMS.
Only select Default repository for the local repository. There can be only one default
repository. If you are following these steps to set up a repository other than the
local repository, do not select the Default repository option.
POLARIS:
1. What are raise_exception() and raise_exception_ext() in BODS?
2. How to execute jobs (job1, job2, ... job10) one by one (job2 executes only if job1 was
successful, otherwise job2 is not executed, and so on)? How to achieve this kind of requirement?
1. Objects are moved from dev to quality, but I placed the sources in the C:\ drive on the dev server
and in the D:\ drive on the quality server, so the source path changed. How can I pick up the sources
directly, without changing the path manually on the quality server?
The degree of parallelism (DOP) is a property of a data flow that defines how many times each
transform defined in the data flow replicates for use on a parallel subset of data. If there are
multiple transforms in a data flow, SAP Data Services chains them together until it reaches a
merge point.
Magnitude company:
BODS:
1. What are the different ways to restrict duplicates in BODS?
Using the Select Distinct option in the Query transform.
We can enable the Auto correct load option on the target table if the key fields are defined.
The Table Comparison transform and Map Operation can be used to avoid duplicate rows in the target.
2. What is the difference between a join and the lookup function?
Lookup works like a left outer join, so it is best when we are extracting only one or two
columns.
Joins work as inner join, left outer join, right outer join and full outer join, based on key
columns.
A join is used to combine data from different sources (tables and files), and a lookup is used to get
related values from another table.
3. How to apply distinct using Query Transform?
There is a Distinct rows option in the Query transform. We need to select it while extracting the data
from the source.
4. List the types of repositories?
There are three types of repositories.
Local repository - used to store the metadata of all objects created in the Designer.
Central repository - used to maintain versions of the objects and used for multi-user
development.
Profiler repository - used to manage all the metadata related to profiler tasks performed in the SAP
BODS Designer.
5. What is the default port for the job server?
The job server port is 3500.
SQL:
1. What are the differences between Function and Procedure?
Functions:
The function must return a value.
It can have only input parameters.
Functions can be called from Procedure.
Procedure:
A procedure can return zero or n values.
Procedures can have input or output parameters.
Procedures cannot be called from a function.
Migration: Data migration is the process of moving data from one location to another, one
format to another, or one application to another.
Example: when we are migrating the data from SAP ECC to S/4HANA.
Integration: Data integration involves collecting data from sources outside of an organization
for analysis.