Pentaho Data Integration
Student Names: Muhammad Ayaz Farid Shah & Usama Naeem
Class: MSCS
What is business intelligence?
▪ Business intelligence is the process of transforming business data
into information and knowledge using computer-based techniques, thus
enabling users to make effective, fact-based decisions.
Business Intelligence: the need of the hour
▪ What insightful decision can I make from an ocean of data, and how
quickly can I make it? -> End-to-end BI solution
▪ How can I integrate heterogeneous data feeds on a common platform
for analysis? -> ETL
▪ How can I interpret raw data in the best possible manner? -> Data
discovery
▪ Can I predict the future trajectory of my business? -> ML, predictive
analytics
▪ What is the best way to share the data? -> Visualization, reporting
▪ How can I monitor the dynamics of changing trends? -> Dashboards
BI is essentially intended for the
following three things:
▪ Precise and concise interpretation of data.
▪ Identifying new opportunities.
▪ Implementing an effective strategy to gain a competitive edge.
Existing BI solutions
Large BI vendors    New breed
IBM                 Pentaho
SAP                 QlikTech
Microsoft           Logi
Oracle              Alteryx

Data integration    Analytics
Informatica         RapidMiner
Why Pentaho?
▪ One-stop solution for all business analytics needs.
▪ Low integration time and infrastructure cost.
▪ Community support.
▪ Easily scalable.
▪ Virtually unlimited visualizations and data sources.
▪ And much more.
About Pentaho
▪ Pentaho was founded in 2004 in Orlando, USA.
▪ Recognized leader in business analytics and data integration.
▪ Subscription-based business model.
▪ Achieved critical mass:
Over 1,200 commercial customers.
Over 10,000 production deployments.
Over 185 countries.
Download the Pentaho BI suite from
www.pentaho.com or www.sourceforge.net
What is Pentaho and what does it offer?
▪ It is a business intelligence system.
It offers
▪ Analytics
▪ Visual data integration
▪ Reports
▪ Dashboards
▪ Data mining
▪ ETL
Pentaho is available for
▪ Windows
▪ Linux
▪ Mac OS X
It is also
▪ Community supported.
▪ Extensible through open-source plugins.
Pentaho Data Integration (Kettle)
▪ Kettle stands for Kettle Extraction, Transformation, Transportation
and Loading Environment:
▪ Extraction
▪ Transportation
▪ Transformation
▪ Loading
▪ Environment
Data integration challenges
▪ Data is everywhere.
▪ Data is inconsistent.
Records differ from system to system.
▪ Performance issues.
Queries that summarize data take a long time to run.
▪ Data is never all in the data warehouse.
Some of it lives in Excel sheets and new applications.
What is Kettle?
▪ Batch data integration and processing tool written in Java.
▪ Exists to retrieve, process and load data.
▪ ETL (Extract, Transform, and Load):
▪ Extracts data from various data sources.
▪ Transforms data
▪ from being optimized for transactions
▪ to being optimized for reporting and analysis.
▪ Synchronizes data coming from different databases.
▪ Cleanses data to remove errors.
▪ Loads data into the data warehouse.
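The extract, transform and load stages listed above can be sketched in a few lines of plain Python. This is only an illustration of the ETL idea, not how Kettle is implemented: the sample data, the table name `sales` and the function names are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a transactional system (illustrative data only).
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,not-a-number
3,Alice,4.50
"""


def extract(text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise customer names and drop rows with a bad amount."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # data cleansing: skip rows whose amount cannot be parsed
        clean.append((int(row["order_id"]), row["customer"].strip().lower(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a warehouse-style table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text=RAW_CSV):
    """Run the three stages and return a reporting-style summary."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
    load(transform(extract(text)), conn)
    return conn.execute(
        "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    print(run_etl())
```

In Kettle the same stages are assembled graphically as steps in a transformation rather than written by hand.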
Why do I need ETL?
▪ An ETL tool saves time and money when developing a data warehouse by
removing the need for hand coding.
▪ It is very difficult for database administrators to connect different
brands of databases without using an external tool.
▪ ETL is the heart and soul of business intelligence (BI).
▪ Provides a graphical environment for data integration, migration, and
synchronization.
▪ Graphic components are dragged and dropped to execute the desired task,
saving time and effort.
ETL
▪ The criteria used for the ETL tools comparison were divided into eight categories.
▪ TCO (total cost of ownership).
Open-source products are typically free to use, but companies still need to pay for support,
training, and consulting.
▪ Risk. (Going over budget, going over schedule, not meeting the customers' requirements.)
▪ Ease of use. (A good GUI reduces the time needed to learn and use the tool.)
▪ Support. (Nowadays all software products have support.)
▪ Speed. (Pentaho Kettle is the faster tool in the comparison.)
▪ Data quality. (Data quality checks are fast and available from the GUI.)
▪ Monitoring. (Pentaho Kettle has practical monitoring tools.)
▪ Connectivity. (ETL tools transfer data to a very wide variety of database systems, XML, and web
services.)
What is Kettle good for?
▪ Loading data into an RDBMS.
▪ Syncing two data sources.
▪ Processing data retrieved from multiple sources and pushed to multiple
destinations.
▪ Graphical manipulation of data.
▪ It has a very easy-to-use GUI.
Data Sources
▪ Files
▪ Databases
▪ SQL
▪ XML
▪ JSON
▪ Excel
▪ Google Analytics
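To illustrate why these sources are awkward to combine by hand, here is a minimal Python sketch that reads CSV, JSON and XML text into one common row format (a list of dictionaries). The sample records and function names are hypothetical, and only the standard library is used; Kettle offers dedicated input steps for this instead.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def rows_from_csv(text):
    """Read CSV text into a list of dictionaries, one per record."""
    return [dict(r) for r in csv.DictReader(io.StringIO(text))]


def rows_from_json(text):
    """Read a JSON array of objects into a list of dictionaries."""
    return [dict(r) for r in json.loads(text)]


def rows_from_xml(text):
    """Read <rows><row>...</row></rows> XML into a list of dictionaries."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in row} for row in root]


def read_all():
    """Combine three differently formatted sources into one row stream."""
    csv_text = "id,name\n1,Ann\n"
    json_text = '[{"id": "2", "name": "Bob"}]'
    xml_text = "<rows><row><id>3</id><name>Cy</name></row></rows>"
    return rows_from_csv(csv_text) + rows_from_json(json_text) + rows_from_xml(xml_text)


if __name__ == "__main__":
    for row in read_all():
        print(row)
```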
Larger picture
Kettle is 10 years old.
It joined Pentaho about 7 years ago.
Open source, at version 4.4.
The BI suite also covers
▪ Reporting
▪ Analytics
▪ Dashboards
▪ ML (Machine Learning)
Kettle tools
▪ Spoon (a graphical tool for designing transformations and jobs that can
be run with the Kettle tools)
▪ Kitchen (a program that executes jobs designed in Spoon, stored in XML
or in a database repository)
▪ Pan (a program that executes transformations designed in Spoon, stored
in XML or in a database repository)
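As a rough sketch of how Pan and Kitchen divide the work, the helper below picks the tool from the file extension and builds a command line without running it. It assumes the common conventions that transformations are saved as `.ktr` files and jobs as `.kjb` files, and that the scripts accept `-file=` and `-level=` options; the script names, option syntax and file name `load_sales.ktr` are assumptions that may differ by version and platform (for example `pan.bat` on Windows).

```python
from pathlib import PurePosixPath


def kettle_command(path, level="Basic"):
    """Build (but do not run) a command line for a Kettle file.

    Assumption: .ktr files are transformations (run by Pan) and
    .kjb files are jobs (run by Kitchen).
    """
    suffix = PurePosixPath(path).suffix.lower()
    if suffix == ".ktr":
        tool = "pan.sh"      # assumed script name; pan.bat on Windows
    elif suffix == ".kjb":
        tool = "kitchen.sh"  # assumed script name; kitchen.bat on Windows
    else:
        raise ValueError(f"not a Kettle transformation or job: {path}")
    return [tool, f"-file={path}", f"-level={level}"]


if __name__ == "__main__":
    # Hypothetical file name, used only for illustration.
    print(" ".join(kettle_command("load_sales.ktr")))
```

The returned list could be handed to `subprocess.run` once the option syntax has been checked against the installed version.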
Most common uses of Kettle
▪ Data warehouse and data mart loads.
▪ Data integration. (Changing input to the desired output.)
▪ Data cleansing.
▪ Data migration.
▪ Data export.
▪ Etc.
Pentaho Data Integration
▪ Transportation of data.
▪ Splitting
▪ Partitioning
▪ Merging
▪ Joining
▪ Duplicating
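The five row-stream operations above can be illustrated with plain Python lists of dictionaries. This is a conceptual sketch only, with hypothetical function names; it does not reflect how Kettle's steps are implemented, and `partition` assumes the key holds integers.

```python
import copy
from collections import defaultdict


def split(rows, predicate):
    """Splitting: route each row to one of two outputs."""
    matched = [r for r in rows if predicate(r)]
    rest = [r for r in rows if not predicate(r)]
    return matched, rest


def partition(rows, key, n):
    """Partitioning: spread rows over n buckets by an integer key."""
    buckets = [[] for _ in range(n)]
    for r in rows:
        buckets[r[key] % n].append(r)
    return buckets


def merge(*streams):
    """Merging: append several row streams into one."""
    return [r for s in streams for r in s]


def join(left, right, key):
    """Joining: inner join of two streams on a shared key."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def duplicate(rows):
    """Duplicating: send an independent copy of every row to two outputs."""
    return copy.deepcopy(rows), copy.deepcopy(rows)


if __name__ == "__main__":
    orders = [{"id": 1, "amount": 5}, {"id": 2, "amount": 50}, {"id": 3, "amount": 7}]
    big, small = split(orders, lambda r: r["amount"] > 10)
    print(big, small)
    print(join(orders, [{"id": 2, "name": "Bob"}], "id"))
```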
Steps for downloading and installing
Pentaho
▪ Step 1: Download Java from https://download.oracle.com
▪ Step 2: Download Pentaho from https://sourceforge.net
▪ Step 3: Create a new folder on the C: drive and name it after the
version of Pentaho.
▪ Step 4: Extract the Pentaho archive into this new folder.
▪ Step 5: Go to My Computer -> Properties -> Advanced system
settings -> Environment Variables -> New, and create a variable named
JAVA_HOME whose value is the Java installation folder.
▪ Step 6: Check the variable in CMD by typing echo %JAVA_HOME%
▪ Step 7: From the Pentaho folder, run spoon.bat as Administrator.
Step 1: Download Java
▪ Download Java from https://download.oracle.com
Step 2: Download Pentaho
▪ Download Pentaho from https://sourceforge.net
Step 3: Create a new folder
▪ Create a new folder on the C: drive and name it after the version of Pentaho.
Step 4: Extract Pentaho
▪ Extract the Pentaho archive into this new folder.
Step 5: Environment variables
▪ Go to My Computer -> Properties -> Advanced system settings -> Environment
Variables -> New, and create a variable named JAVA_HOME whose value is the Java installation folder.
Step 6: Check the JRE
▪ Check the JRE in CMD by typing echo %JAVA_HOME%
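The same check can be scripted. The sketch below is a hypothetical helper, not part of Pentaho: it reports whether JAVA_HOME is set and whether it points to an existing folder, which is a useful sanity check before starting Spoon.

```python
import os
from pathlib import Path


def check_java_home(env=None):
    """Return a short status message about the JAVA_HOME variable."""
    env = os.environ if env is None else env
    value = env.get("JAVA_HOME", "")
    if not value:
        return "JAVA_HOME is not set"
    if not Path(value).is_dir():
        return f"JAVA_HOME points to a missing folder: {value}"
    return f"JAVA_HOME = {value}"


if __name__ == "__main__":
    print(check_java_home())
```

Passing a dictionary as `env` makes it possible to try the function without changing the real environment.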
Step 7: Run Pentaho
▪ From the Pentaho folder, run spoon.bat as Administrator.