TechFest – Open Source ETL Software
David Morris
Fort Worth, TX
October 2008
* For Internal Use Only *
Table of Contents
- Introduction
- Open Source Software
  - Brief History of Open Source Software
  - The Cost of Open Source
  - Barriers to Open Source Adoption
  - Open Source Licensing Overview
- Business Intelligence Software
  - Business Intelligence Software Overview
  - Brief History of Open Source BI Software
  - Open Source BI Vendor Offerings
- Demonstrations
  - Talend OpenStudio
  - Pentaho Kettle
  - Yahoo Pipes
- Conclusion
This presentation will attempt to clarify misconceptions about open source software and discuss how it can benefit IT organizations in many different ways
- Open Source vs. Free Software
  - Many people do not understand the definition of open source
  - There is no such thing as free software
- Benefits of Open Source
  - The potential for cost savings is the number one motivation to use enterprise open source software
  - Software license costs are the component where savings are most likely to occur
  - Cost savings in general are difficult to calculate
  - Flexibility often turns out to be the most beneficial result of using open source software
  - Enterprise open source software has cost the proprietary software industry an estimated $60 billion per year
  - Many organizations do not have a formal process in place to do a comprehensive financial analysis of software commitments
- Business Intelligence Software
  - Open source business intelligence software has matured over the last three years, and many companies are beginning to offer an impressive set of tools free of up-front licensing costs
  - These open source projects are typically backed by a corporation that pays full-time employees to develop the core code base and earns money through support contracts and consulting services
  - Many of the open source BI products are built using Java technology, with user interfaces built on the Eclipse IDE
  - Evaluating open source BI software can offer a fresh perspective on techniques and processes used by competing proprietary software
  - Many of these tools have the potential to be real competitors in the BI space
The idea of open source or free software has a rich history that began in the 1960s
- 1969 – ARPANET (Advanced Research Projects Agency Network)
  - First operational packet switching network
  - Predecessor of the internet
- 1970s – Email, File Transfer Protocol (FTP), and Network Voice Protocol (NVP) standards developed
- 1985 – Free Software Foundation (Richard Stallman)
  - Founded to support the free software movement
  - Universal freedom to distribute and modify computer software without restriction
  - Enforcement of the General Public License (GPL)
- 1992 – Linux kernel released under the GPL (Linus Torvalds)
- 1998 – Open Source Initiative (OSI) (Bruce Perens and Eric Raymond)
  - Formalized open source software and brought the model to major software companies
  - Formulated the Open Source Definition to determine which licenses are actually “open source” licenses
- 1998 – Netscape releases the source code of its browser suite, which became the basis of the Mozilla project and, later, Firefox and Thunderbird
- 1999–2000 – Sun Microsystems acquires StarOffice and releases its source code as OpenOffice.org
“Free as in speech, not beer.” – Richard Stallman
- “Open source” is not free
  - Many open source software licenses are free, but some licenses have costs associated with them
  - Many mature open source projects, especially operating systems, earn money from paid support and documentation
- “Open source is free like a puppy is free.” – Scott McNealy, Chairman of Sun Microsystems
- “Lowering the cost of goods tends to increase the total investment of the people and infrastructure that sustain it.” – Eric Raymond, The Magic Cauldron
Source: http://www.gnu.org/philosophy/open-source-misses-the-point.html
Open source doesn't just mean access to the source code. The distribution terms of open-source software must comply with the following criteria
1. Free Redistribution
   The license shall not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources. The license shall not require a royalty or other fee for such sale.
2. Source Code
   The program must include source code, and must allow distribution in source code as well as compiled form. Where some form of a product is not distributed with source code, there must be a well-publicized means of obtaining the source code for no more than a reasonable reproduction cost, preferably downloading via the Internet without charge. The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed.
3. Derived Works
   The license must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software.
4. Integrity of The Author's Source Code
   The license may restrict source-code from being distributed in modified form only if the license allows the distribution of "patch files" with the source code for the purpose of modifying the program at build time. The license must explicitly permit distribution of software built from modified source code. The license may require derived works to carry a different name or version number from the original software.
5. No Discrimination Against Persons or Groups
   The license must not discriminate against any person or group of persons.
Source: http://opensource.org/docs/osd
Open source doesn't just mean access to the source code. The distribution terms of open-source software must comply with the following criteria (cont.)
6. No Discrimination Against Fields of Endeavor
   The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research.
7. Distribution of License
   The rights attached to the program must apply to all to whom the program is redistributed without the need for execution of an additional license by those parties.
8. License Must Not Be Specific to a Product
   The rights attached to the program must not depend on the program's being part of a particular software distribution. If the program is extracted from that distribution and used or distributed within the terms of the program's license, all parties to whom the program is redistributed should have the same rights as those that are granted in conjunction with the original software distribution.
9. License Must Not Restrict Other Software
   The license must not place restrictions on other software that is distributed along with the licensed software. For example, the license must not insist that all other programs distributed on the same medium must be open-source software.
10. License Must Be Technology-Neutral
    No provision of the license may be predicated on any individual technology or style of interface.
There are many different types of hard costs associated with leveraging open source software for enterprise IT projects
- Software Licenses
  - Refers to the cost of the licenses themselves, not their legal terms and conditions
  - Often offers the most potential cost savings vs. proprietary software
- Hardware
  - Open source software often has reduced hardware requirements
- Support
  - Open source projects often have more support options, but those options tend to be less mature
- Development
  - Access to source code can help make development easier and less costly
  - Lack of feature parity with proprietary software may create a need for more custom development
  - Opportunity to give source code back to the open source community
- Professional Services
  - Development, installation, and configuration costs
  - Offered by many open source software vendors
- Training
  - Offered directly by the software vendor
  - Through a professional training center or educational institution
  - On-site, off-site, or online
- Testing
  - Unit testing, performance testing, functional testing, test scripts, use-case scenarios, quality assurance costs
There are many different types of hard costs associated with leveraging open source software for enterprise IT projects (cont.)
- Operations (Manageability)
  - Mix of labor, configuration of management and monitoring tools, and creation of manuals to support operations
  - Open source tends to have less mature management capabilities
- Staffing
  - No conclusive evidence shows that staffing open source projects is cheaper than staffing proprietary projects
- Maintenance Contracts
  - Typically 15-25 percent of the license or equipment costs per year, calculated using the list price, not the actual paid price
  - Treated separately from support contracts in many open source projects
  - Cover the costs of patching and updating software over time
  - Often free with zero-cost open source software licenses
- Migration
  - Especially for system replacement projects where existing data must be migrated to the new application
- Environmental
  - Datacenter and hosting costs, floor space, power, bandwidth, hardware leasing
- Documentation
  - Often coincides with the training category above
- Configuration
  - Often captured with the development and operations categories
There are also soft or intangible costs associated with enterprise open source software projects, and they are typically harder to calculate than hard costs
- Downtime: financial impact of a system outage
- IP Risk: legal/litigation costs
- License Auditing Risk: resources required to perform a vendor-required license audit
- License Management: resources required to manage deployment of licenses and purchase of additional licenses as the deployment grows
- License Negotiation Overhead: legal costs required in negotiating the software licensing contract
- Planning: resources for planning and overhead
- Process Inefficiencies: lost time and costs related to process activities
- Procurement Overhead: purchase cost and resources required to procure the software
- Productivity: efficiencies from using the software
- Reliability: financial impact of improved system reliability and uptime
- Support Quality: resources required for software support
Many organizations do not have a formal process in place to do a comprehensive financial analysis of software commitments

Below is an example financial analysis spreadsheet comparing the use of open source software vs. proprietary software for an enterprise IT project. It is worth studying to see how misleading many open source software vendors can be when they are desperate for a new client.

Source: 451 CAOS Report 2 - Cost Conscious: A practical guide for understanding and calculating the financial benefits of open source for enterprise IT projects (The 451 Group)

Calculator - Example A: Open Source Option

| Item | Initial Investment | Year 1 | Year 2 | Year 3 | Totals |
|---|---|---|---|---|---|
| HARD COSTS | $76,250.00 | $41,200.00 | $111,150.00 | $49,100.00 | $277,700.00 |
| Software Licenses | - | - | - | - | $0.00 |
| Hardware | $25,000.00 | $3,000.00 | $3,000.00 | $3,000.00 | $34,000.00 |
| Support | - | $25,000.00 | $30,000.00 | $35,000.00 | $90,000.00 |
| Development | $22,000.00 | $2,500.00 | $5,000.00 | $2,500.00 | $32,000.00 |
| Professional Services | $12,500.00 | $3,000.00 | - | - | $15,500.00 |
| Training | $7,500.00 | - | - | - | $7,500.00 |
| Testing | $3,000.00 | $3,000.00 | $3,000.00 | $3,000.00 | $12,000.00 |
| Operations | $2,500.00 | $500.00 | $500.00 | $500.00 | $4,000.00 |
| Staffing | - | - | $65,000.00 | - | $65,000.00 |
| Maintenance Contracts | $3,750.00 | $4,200.00 | $4,650.00 | $5,100.00 | $17,700.00 |
| SOFT COSTS | | | | | - |
| INTERNAL COSTS | | | | | - |
| REVENUE | | $250,000.00 | $500,000.00 | $800,000.00 | $1,550,000.00 |
| CASHFLOW: Period | -$76,250.00 | $208,800.00 | $388,850.00 | $750,900.00 | |
| CASHFLOW: Cumulative (Payback) | -$76,250.00 | $132,550.00 | $521,400.00 | $1,272,300.00 | |
| RATE OF RETURN | | 607% | 450% | 1629% | |

Calculator - Example A: Proprietary/Existing Option

| Item | Initial Investment | Year 1 | Year 2 | Year 3 | Totals |
|---|---|---|---|---|---|
| HARD COSTS | $296,250.00 | $99,200.00 | $176,150.00 | $104,100.00 | $675,700.00 |
| Software Licenses | $200,000.00 | $50,000.00 | $50,000.00 | $50,000.00 | $350,000.00 |
| Hardware | $25,000.00 | $3,000.00 | $3,000.00 | $3,000.00 | $34,000.00 |
| Support | - | $30,000.00 | $35,000.00 | $40,000.00 | $105,000.00 |
| Development | $22,000.00 | $2,500.00 | $5,000.00 | $2,500.00 | $32,000.00 |
| Professional Services | $25,000.00 | $6,000.00 | - | - | $31,000.00 |
| Training | $15,000.00 | - | - | - | $15,000.00 |
| Testing | $3,000.00 | $3,000.00 | $3,000.00 | $3,000.00 | $12,000.00 |
| Operations | $2,500.00 | $500.00 | $500.00 | $500.00 | $4,000.00 |
| Staffing | - | - | $75,000.00 | - | $75,000.00 |
| Maintenance Contracts | $3,750.00 | $4,200.00 | $4,650.00 | $5,100.00 | $17,700.00 |
| SOFT COSTS | | | | | - |
| INTERNAL COSTS | | | | | - |
| REVENUE | | $250,000.00 | $500,000.00 | $800,000.00 | $1,550,000.00 |
| CASHFLOW: Period | -$296,250.00 | $150,800.00 | $323,850.00 | $695,900.00 | |
| CASHFLOW: Cumulative (Payback) | -$296,250.00 | -$145,450.00 | $178,400.00 | $874,300.00 | |
| RATE OF RETURN | | 252% | 284% | 768% | |

Summary

| Metric | Open Source Option | Proprietary/Existing Option |
|---|---|---|
| PAYBACK PERIOD | Year 1 | Year 2 |
| NPV | $954,643.20 | $591,891.97 |
| IRR | 340% | 82% |
| Cost of capital % | 12.00% | 12.00% |

NOTES:
- Use this spreadsheet to consider Cost Avoidance and Opportunity Costs…
- May add an additional "COST SAVINGS" column above the REVENUE column to account for existing sunk costs
- Not covered in this spreadsheet: Depreciation, Amortization, Capital and Expense Budgets, etc…
- Dave's attempt at saving $3,750 by not buying the spreadsheet from the 451 Group
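The summary figures in the calculator can be checked with a few lines of Python. The sketch below is not part of the 451 Group spreadsheet; it assumes the initial investment sits at time zero (undiscounted) and that each later period is discounted at the 12% cost of capital, which appears to match the listed NPV values. The function names and the bisection-based IRR are illustrative choices.

```python
"""Recompute NPV, payback period, and IRR from the example cash flows."""

COST_OF_CAPITAL = 0.12  # "cost of capital %" from the spreadsheet

# Net cash flow per period (revenue minus hard costs), copied from the
# CASHFLOW "Period" rows: initial investment, then years 1-3.
OPEN_SOURCE = [-76_250.00, 208_800.00, 388_850.00, 750_900.00]
PROPRIETARY = [-296_250.00, 150_800.00, 323_850.00, 695_900.00]


def npv(rate, cash_flows):
    """Net present value, with the first cash flow at time 0 (assumption)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def payback_period(cash_flows):
    """First period in which the cumulative cash flow is non-negative."""
    cumulative = 0.0
    for period, cf in enumerate(cash_flows):
        cumulative += cf
        if cumulative >= 0:
            return period
    return None  # does not pay back within the horizon


def irr(cash_flows, low=0.0, high=10.0, tolerance=1e-9):
    """Internal rate of return found by bisection.

    Assumes NPV is positive at `low`, negative at `high`, and crosses
    zero exactly once in between, which holds for these cash flows.
    """
    while high - low > tolerance:
        mid = (low + high) / 2
        if npv(mid, cash_flows) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    for name, flows in (("Open source", OPEN_SOURCE), ("Proprietary", PROPRIETARY)):
        print(
            f"{name}: NPV ${npv(COST_OF_CAPITAL, flows):,.2f}, "
            f"payback in year {payback_period(flows)}, "
            f"IRR {irr(flows):.0%}"
        )
```

Because the two options share the same revenue assumptions, the gap between them comes almost entirely from the license, services, and training lines, which is why those inputs deserve the most scrutiny when a vendor supplies the numbers.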
There are a variety of business models that have proven to work for companies that want to make money using open source software
- Support Sellers (otherwise known as "Give Away the Recipe, Open a Restaurant")
  - Give away the software product
  - Sell distribution, branding, and after-sale service
  - This is what Red Hat does
- Loss Leader
  - Give away open source software as a loss leader and market positioner for closed software
  - Netscape, Digium (Asterisk)
- Widget Frosting
  - A hardware company goes open source in order to get better drivers and interface tools more cheaply
  - Silicon Graphics (Samba), Apple (Darwin)
- Accessorizing
  - Selling accessories: books, compatible hardware, complete systems with open source software pre-installed
  - O'Reilly Associates, OLPC
Source: The Open Source Initiative, http://www.opensource.org/advocacy/case_for_business.php
There are many barriers to open source adoption in IT organizations, most of which are risk related
- Open source licenses are viral
- Open source software lacks formal support and training
- Software changes too often and is difficult to keep up with
- Lack of a long-term roadmap
- Sunk costs in existing projects
- Switching costs
- De facto industry standards
Enterprise open source adoption offers many benefits to IT organizations within any type of business
- Short-term cost savings
  - Most IT organizations are motivated by short-term cost savings when evaluating open source software adoption
  - The potential for saving money on software licensing fees is a huge factor in the cost equation
  - Software licensing fees can be a large percentage of the up-front costs for new projects as well as for massive expansion of existing projects
- Long-term flexibility
  - In the long run, the benefits of flexibility tend to outweigh the cost benefit of using open source software
  - Developers have access to the source code and can modify and customize it to suit their specific needs
- Reliability
  - “If builders built houses the way programmers built programs, the first woodpecker to come along would destroy civilization.” – Gerald M. Weinberg
  - The internet depends on a variety of high-reliability open source projects (DNS, sendmail, TCP/IP stacks, Perl)
- Avoiding vendor lock-in
  - Organizations can become less vendor-dependent by using open source software
  - Avoiding vendor lock-in can help a company avoid severe switching costs
  - Royalty-free standards vs. Free and Open Source Software (FOSS)
- Security
  - "Given enough eyeballs, all bugs are shallow." – Linus' Law
  - Security problems can be identified quickly, and someone will be able to fix them
- Performance
  - An often-cited example is Linux vs. Windows server clusters
There are many different open source licenses and it can be difficult to distinguish one license from another
- Most popular
  - GNU General Public License
  - GNU Library or Lesser GPL
  - Apache Software License
  - Berkeley Software Distribution (BSD)
  - MIT License
  - Mozilla Public License
  - Eclipse Public License
- Special Purpose
  - Educational Community License
  - NASA Open Source Agreement 1.3
  - Open Group Test Suite License
- Miscellaneous
  - Adaptive Public License
  - Artistic License 2.0
  - Open Software License
  - Qt Public License
  - And many more…
Sources:
- http://www.opensource.org/licenses/category
- http://en.wikipedia.org/wiki/Comparison_of_free_software_licences
Open source software licenses can range from very simple to relatively complex
- Software Licenses
  - Refers to the cost of the actual license, not the legal licensing terms or conditions
  - Many open source vendors have a dual license model
  - Seen as the greatest potential for savings in an open source project
  - Savings on licenses are often used to offset training and professional services costs
  - Can include client access licenses, desktop licenses, database licenses, and development tools
  - Based on the number of CPUs or the number of users
  - Every vendor has its own rules, which makes calculating project costs difficult
- Dual license model
  - Choose between an open source (free) license and a commercial license that costs money
  - Example: Trolltech Qt
  - Motivated by business models based on market segregation and by license compatibility needs
- Open Core License model
  - The core is GPL: if you embed the GPL code in closed source software, you pay a fee
  - Technical support for the GPL product may be offered for a fee (whether it must be offered is up for debate)
  - An annual commercial subscription includes indemnity, technical support, and additional features and/or platform support
  - Whether the additional commercial features have viewable or closed source, and whether they become GPL after a time bomb period, are both up for debate
  - Professional services and training are offered for a fee
- Licensing cost comparison
  - Works for new projects, but not necessarily for existing projects
  - Must be estimated over the life of the project
- Zero-cost open source software has caused proprietary vendors to lower their prices, and this trend will continue
  - An estimated $60 billion per year is lost by proprietary software vendors
Another great benefit of open source software is the ability to download and try it out for free, and even learn about the development history and statistics
- Ohloh.net is a site that gives everyone more visibility into open source software projects by providing statistics, tracking code commit history, providing package downloads, etc…
Open source software websites offer free downloads and documentation
- Proprietary: SSIS, Informatica, Ab Initio
- Open Source: Blender, Talend, Pentaho
The concept of business intelligence has been around since the 1950s and became popularized in the software industry in the late 1980s and early 1990s
- 1958: The term "business intelligence" is coined by Hans Peter Luhn: “to support better decision making”
  - "In this paper, business is a collection of activities carried on for whatever purpose, be it science, technology, commerce, industry, law, government, defense, et cetera. The communication facility serving the conduct of a business (in the broad sense) may be referred to as an intelligence system. The notion of intelligence is also defined here, in a more general sense, as 'the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal.'" – Hans Peter Luhn
- 1989: "Business intelligence" is first used as an umbrella term to describe the set of “concepts and methods to improve business decision-making by using fact-based support systems.” – Howard Dresner
- 1990s: The business intelligence software market explodes, and it becomes very difficult to keep track of who's doing what
There are many “big players” in the business intelligence software space that have been around for much longer than their open source counterparts
- Informatica, IBM Ascential, Microsoft DTS/SSIS
- ACE*COMM
- Ab Initio
- Actuate
- Comanche
- CyberQuery
- Dimensional Insight
- IBM
  - Applix
  - Cognos
- Informatica
- Information Builders
- LogiXML
- LucidEra
- Microsoft
  - Microsoft Analysis Services
  - PerformancePoint Server 2007
  - ProClarity
  - DTS/SSIS
- MicroStrategy
- Oracle Corporation
  - Hyperion Solutions Corporation
- Panorama Software
- Pervasive
- Pilot Software, Inc.
- PRELYTIS
- Prospero Business Suite
- QlikTech
- SAP
  - Business Information Warehouse
  - Business Objects
  - OutlookSoft
- SAS Institute
- Siebel Systems
- TIBCO
- StatSoft
- SPSS
- Telerik Reporting
- Teradata
- Thomson Data Analyzer
Open source business intelligence solutions started coming onto the scene in 2000, but the space began to explode in 2005
There are a variety of open source business intelligence products that are beginning to compete against proprietary alternatives after only a few years of development
- Talend (Suresnes, France and Los Altos, CA)
  - OpenStudio
  - Integration Suite
  - Open Profiler
- Pentaho (Orlando, FL and Belgium)
  - Reporting Engine
  - Kettle Data Integration
  - Weka Data Mining
- Jasper (Dublin, Ireland)
  - JasperServer: interactive, ad hoc, and managed reporting and dashboards
  - JasperAnalysis: interactive data analysis, OLAP
  - JasperETL: data integration
  - JasperReports
- Apatar (Chicopee, MA and Minsk, Belarus)
  - Merge
  - OnDemand
- Mondrian
Companies that sponsor and develop open source projects often offer many different types of tools and support options that vary in cost
- Professional Services
  - Proof of Concept Development
  - On Demand Service Contracts
  - Consulting Services
  - Technology Assessments
- Professional Tools
  - Offered in addition to the open source tools
  - Professional tools offer more functionality
- Training
  - On-site or online, group or individual
  - Certification exams
- Support Contracts
  - Typically priced on a per-year or per-incident basis
  - The variety of support options depends on the popularity of the open source project
- Technology Partners / Alliance Program
  - Training partners
  - Development partners
  - Platinum, Gold, Silver, and Bronze levels or tiers
There is often a wider variety of support options available when using open source software, but they tend to be less mature than with proprietary software
- Three common support models
  - Professional support by open source software vendors
  - Third-party vendor or consultant support
  - Self-support
- Various Feature Levels
  - Sold on a per user per year basis
  - Number of incident reports per year (1, 2, 3, unlimited) with the ability to purchase extra incidents
  - Web-based support
  - Email support
  - Phone support
  - Guaranteed response times (8 hours, 1 day, 2-3 business days)
  - Guaranteed diagnostic turnaround times
  - Access to a certified version of the software
  - Automatic notification of bug fixes / updates
  - Access to community or professional support forum and bug tracker
  - Access to advanced tutorials
- Various Pricing Levels
  - $1,000 - $5,000 per user per year
  - Can be even more expensive for “Enterprise” or “Professional” versions of the open source tools
Most companies providing support for open source projects offer different support levels such as Bronze, Silver, Gold, and Platinum

| Company | Support Level | Type | Price |
|---|---|---|---|
| Talend | Silver Support | 1 user for 1 year | $1,150.00 |
| Talend | Silver Support | 5 users for 1 year | $4,950.00 |
| Talend | Gold Support | 1 user for 1 year | $2,150.00 |
| Talend | Gold Support | 3 users for 1 year | $5,750.00 |
| Talend | Gold Support | 5 users for 1 year | $8,350.00 |
| Talend | Platinum Support | 1 user for 1 year | ? |
| Talend | Platinum Support | 3 users for 1 year | ? |
| Talend | Platinum Support | unlimited users | ? |

Description of each support level:
- Silver Support
  - Three incidents per year
  - Web support
  - Guaranteed response times
  - Access to a certified version of Talend's products
  - Automatic notification of updates and bug fixes
  - Access to the support forum and bug tracker
- Gold Support
  - Unlimited incidents
  - Web and email support
  - 24 hour access on business days
  - Guaranteed response times
  - Guaranteed diagnostic turnaround times
  - Access to a certified version of Talend's products
  - Automatic notification of updates and bug fixes
  - Access to the support forum and bug tracker
- Platinum Support
  - Unlimited incidents
  - Web, email and phone support
  - 24 hour access (email and Web) on business days
  - Guaranteed response times
  - Guaranteed diagnostic turnaround times
  - Access to a certified version of Talend's products
  - Automatic notification of updates and bug fixes
  - Access to the support forum and bug tracker
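To compare the tiers on a per-user basis, the listed prices can be divided by the number of users covered. The following is a small illustrative sketch, not a Talend tool; the function names are made up for this example, and only the Silver and Gold rows are included because the Platinum prices are not given.

```python
# (support level, users covered, price in USD for one year), from the table above.
TALEND_SUPPORT_PRICES = [
    ("Silver", 1, 1150.00),
    ("Silver", 5, 4950.00),
    ("Gold", 1, 2150.00),
    ("Gold", 3, 5750.00),
    ("Gold", 5, 8350.00),
]


def price_per_user(users, price):
    """Annual price divided evenly across the covered users."""
    return price / users


def volume_discount(level, users, price, prices=TALEND_SUPPORT_PRICES):
    """Fractional saving per user compared with the single-user price of the same level."""
    single_user_price = next(p for lvl, u, p in prices if lvl == level and u == 1)
    return 1 - price_per_user(users, price) / single_user_price


if __name__ == "__main__":
    for level, users, price in TALEND_SUPPORT_PRICES:
        print(
            f"{level:<6} {users} user(s): "
            f"${price_per_user(users, price):,.2f} per user, "
            f"{volume_discount(level, users, price):.1%} below the single-user price"
        )
```

Under these prices, every multi-user bundle works out cheaper per user than the single-user price of the same level, and the per-user Silver and Gold figures fall inside the $1,000 - $5,000 range quoted earlier except for the five-user Silver bundle, which dips just below it.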
Talend claims to be the “first provider of open source data integration software”, and while that is not really the case, their software does provide some unique functionality
- Talend Open Studio
  - “the most open, innovative and powerful data integration solution on the market today.”
  - Provides connectors to almost any source or destination, and has an interface that is easy to learn and use
- Talend Integration Suite
  - Open Studio with a subscription service for technical support and source control for team environments
- Talend On Demand
  - SaaS version of Open Studio that stores all metadata and source code in a central repository hosted by Talend
  - Does not require much configuration or administration by the development team
- Talend Open Profiler
  - “The first open source data profiling tool”
  - Allows users to define metrics and goals about data quality for databases, files, applications, etc…
  - Produces reports and graphs to display data quality issues and KPIs based on the defined metrics and goals
Talend’s software is written in Java and has a user interface built around the Eclipse IDE, and it includes a basic business modeler tool and real-time debugging capabilities
- Advanced Lookup/Join Editor
- Familiar Eclipse Interface
- Real-Time Debugger
- Basic Business Modeler
The Pentaho suite, which includes Kettle for data integration, provides a full spectrum of business intelligence capabilities including reporting, analysis, dashboards, data mining, and data integration
- Data Integration (Kettle)
  - “Extract, Transform and Load (ETL) capabilities with an intuitive design environment”
  - “Proven, scalable, standards-based architecture”
  - 100% Java with broad, cross-platform support
  - Advanced scheduling, process integration, reporting, and analysis
- Reporting Engine/Dashboards
  - Visualize KPIs, metrics, etc…
  - Deploy as JSP pages
  - Integrates with Google Maps, uses AJAX, etc…
- Data Mining (Weka)
  - Clustering, segmentation, decision trees, random forests, neural networks, and principal component analysis
  - Algorithms can be called from code or applied directly
- OLAP Server (Mondrian)
  - Web-based interface
  - Excel plugin
  - Drillable spreadsheets and charts
Pentaho Kettle screenshots:
- Debugger with Pause/Resume
- User Friendly Job Designer GUI
- Web-based Dashboards/Mashups
- Advanced Logging/Statistics
“Yahoo Pipes is a powerful composition tool to aggregate, manipulate, and mashup content from around the web.”
- Web-based ETL tool
- Written in Canvas and JavaScript
- Features
  - Extract Data
    - CSV files
    - RSS feeds
    - Screen-scraped HTML
    - Yahoo Local and Yahoo Search
    - Flickr and Google Base
  - Transform Data
    - Row count, filter
    - Geocode addresses
    - String, date, and number manipulation
    - Make web service calls
  - Load/Output Data
    - RSS feeds
    - KML files
    - JSON
    - PHP objects
    - Interactive Yahoo Maps, etc…
- Combine many feeds into one, then sort, filter and translate it
- Geocode your favorite feeds and browse the items on an interactive map
- Embed widgets/badges on your own web site
- Output data as RSS, JSON, KML, and other formats
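The "combine many feeds into one, then sort and filter" pattern can be sketched in a few lines of plain Python. This is not the Yahoo Pipes API; the feed items, field names, and function names below are hypothetical stand-ins that only illustrate the extract, transform, and load stages described above.

```python
import json

# Hypothetical feed items standing in for two RSS sources.
FEED_A = [
    {"title": "Talend releases Open Studio update", "published": "2008-09-30"},
    {"title": "Weather report", "published": "2008-10-01"},
]
FEED_B = [
    {"title": "Pentaho Kettle tips", "published": "2008-10-02"},
]


def extract(*feeds):
    """Extract: combine many feeds into one list of items."""
    return [dict(item) for feed in feeds for item in feed]


def transform(items, keywords):
    """Transform: keep items whose title mentions a keyword, newest first."""
    wanted = [
        item
        for item in items
        if any(keyword.lower() in item["title"].lower() for keyword in keywords)
    ]
    # ISO-8601 date strings sort correctly as plain strings.
    return sorted(wanted, key=lambda item: item["published"], reverse=True)


def load(items):
    """Load: serialize the result as JSON, one of the output formats listed."""
    return json.dumps({"count": len(items), "items": items}, indent=2)


if __name__ == "__main__":
    combined = extract(FEED_A, FEED_B)
    print(load(transform(combined, keywords=["talend", "pentaho"])))
```

In Pipes each of these stages is a module wired together in the browser; the functions here play the same role, with each stage taking the previous stage's output as its input.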
This Yahoo Pipes demo uses data exported from the DeepBlue Employee List and transforms the data into a GeoRSS and KML feed
- Demo: Pariveda Employee Map
  - Uses data from the DeepBlue Employee Contact List
  - Uses the Yahoo Geocoder service on each employee’s address
  - Uses a text input box to limit the size of the dataset
  - Displays as an interactive Yahoo Map or can be loaded into Google Earth as a KML file
  - http://tinyurl.com/64ny7o
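The flow of the demo (limit the rows, geocode each address, emit KML) can be approximated outside of Pipes. The sketch below is an assumption-laden stand-in: the employee rows are invented, the geocoder is a hard-coded lookup table rather than a call to the Yahoo Geocoder service, and the coordinates are approximate.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows standing in for the exported employee list.
EMPLOYEES = [
    {"name": "Employee A", "address": "Dallas, TX"},
    {"name": "Employee B", "address": "Fort Worth, TX"},
    {"name": "Employee C", "address": "Unknown"},
]

# Stand-in for a geocoding service: address -> (latitude, longitude).
# A real pipeline would call an external geocoder here.
FAKE_GEOCODER = {
    "Dallas, TX": (32.78, -96.80),
    "Fort Worth, TX": (32.75, -97.33),
}


def geocode(address):
    """Return (latitude, longitude) for an address, or None if it is unknown."""
    return FAKE_GEOCODER.get(address)


def build_kml(rows, limit):
    """Build a KML document with one Placemark per geocodable row.

    `limit` plays the role of the demo's text input box that caps the dataset size.
    """
    kml = ET.Element("kml", xmlns="http://www.opengis.net/kml/2.2")
    document = ET.SubElement(kml, "Document")
    for row in rows[:limit]:
        location = geocode(row["address"])
        if location is None:
            continue  # skip rows that could not be geocoded
        latitude, longitude = location
        placemark = ET.SubElement(document, "Placemark")
        ET.SubElement(placemark, "name").text = row["name"]
        point = ET.SubElement(placemark, "Point")
        # KML lists coordinates as longitude,latitude.
        ET.SubElement(point, "coordinates").text = f"{longitude},{latitude}"
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    print(build_kml(EMPLOYEES, limit=2))
```

The resulting string can be saved with a .kml extension and opened in a viewer such as Google Earth, which mirrors the last step of the demo.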
Thank you for attending. Please let me know if you have any questions.
Special thanks to:
- Samir Ray
- Jeff Townes
- Brian Orell
- Daniel Herrin
- Sean Beard
- Grant Sutton

PDF
Build Real-Time ML Apps with Python, Feast & NoSQL
PDF
Data Virtualization in Action: Scaling APIs and Apps with FME
PDF
Lung cancer patients survival prediction using outlier detection and optimize...
PPTX
Report in SIP_Distance_Learning_Technology_Impact.pptx
PDF
ment.tech-Siri Delay Opens AI Startup Opportunity in 2025.pdf
PDF
giants, standing on the shoulders of - by Daniel Stenberg
PDF
Transform-Your-Factory-with-AI-Driven-Quality-Engineering.pdf
PPTX
AI-driven Assurance Across Your End-to-end Network With ThousandEyes
PDF
Human Computer Interaction Miterm Lesson
PDF
5-Ways-AI-is-Revolutionizing-Telecom-Quality-Engineering.pdf
PDF
AI.gov: A Trojan Horse in the Age of Artificial Intelligence
PDF
Rapid Prototyping: A lecture on prototyping techniques for interface design
PDF
Connector Corner: Transform Unstructured Documents with Agentic Automation
PDF
CCUS-as-the-Missing-Link-to-Net-Zero_AksCurious.pdf
PPTX
Build automations faster and more reliably with UiPath ScreenPlay
PDF
Transform-Quality-Engineering-with-AI-A-60-Day-Blueprint-for-Digital-Success.pdf
Launch a Bumble-Style App with AI Features in 2025.pdf
Transform-Your-Supply-Chain-with-AI-Driven-Quality-Engineering.pdf
NewMind AI Journal Monthly Chronicles - August 2025
How to use fields_get method in Odoo 18
Build Real-Time ML Apps with Python, Feast & NoSQL
Data Virtualization in Action: Scaling APIs and Apps with FME
Lung cancer patients survival prediction using outlier detection and optimize...
Report in SIP_Distance_Learning_Technology_Impact.pptx
ment.tech-Siri Delay Opens AI Startup Opportunity in 2025.pdf
giants, standing on the shoulders of - by Daniel Stenberg
Transform-Your-Factory-with-AI-Driven-Quality-Engineering.pdf
AI-driven Assurance Across Your End-to-end Network With ThousandEyes
Human Computer Interaction Miterm Lesson
5-Ways-AI-is-Revolutionizing-Telecom-Quality-Engineering.pdf
AI.gov: A Trojan Horse in the Age of Artificial Intelligence
Rapid Prototyping: A lecture on prototyping techniques for interface design
Connector Corner: Transform Unstructured Documents with Agentic Automation
CCUS-as-the-Missing-Link-to-Net-Zero_AksCurious.pdf
Build automations faster and more reliably with UiPath ScreenPlay
Transform-Quality-Engineering-with-AI-A-60-Day-Blueprint-for-Digital-Success.pdf

Open Source ETL

  • 1. TechFest – Open Source ETL Software David Morris Fort Worth, TX October, 2008 * For Internal Use Only *
  • 2. Table of Contents Introduction Open Source Software Brief History of Open Source Software The Cost of Open Source Barriers to Open Source Adoption Open Source Licensing Overview Business Intelligence Software Business Intelligence Software Overview Brief History of Open Source BI Software Open Source BI Vendor Offerings Demonstrations Talend OpenStudio Pentaho Kettle Yahoo Pipes Conclusion
  • 3. This presentation will attempt to clarify misconceptions about open source software and discuss how it can benefit IT organizations in many different ways
      • Open Source vs. Free Software
          • Many people do not understand the definition of open source
          • There is no such thing as free software
      • Benefits of Open Source
          • The potential for cost savings is the number one motivation to use enterprise open source software
          • Software license costs are the component where savings are most likely to occur
          • Cost savings in general are difficult to calculate
          • Flexibility often turns out to be the most beneficial result of using open source software
          • Enterprise open source software has cost the proprietary software industry an estimated $60 billion per year
          • Many organizations do not have a formal process in place to do a comprehensive financial analysis of software commitments
      • Business Intelligence Software
          • Open source business intelligence software has matured over the last three years, and many companies are beginning to offer an impressive set of tools free of up-front licensing costs
          • These open source projects are typically backed by a corporation that pays full-time employees to develop the core code base and earns money through support contracts and consulting services
          • Many of the open source BI products are built using Java technology, with user interfaces built on the Eclipse IDE
          • Evaluating open source BI software can offer a fresh perspective on techniques and processes used by competing proprietary software
          • Many of these tools have the potential to be real competitors in the BI space
  • 5. The idea of open source or free software has a rich history that began in the 1960s
      • 1969 – ARPANET (Advanced Research Projects Agency Network)
          • First operational packet switching network
          • Predecessor of the internet
      • 1970s – Email, File Transfer Protocol (FTP), and Network Voice Protocol (NVP) standards developed (SMTP followed in 1982)
      • 1985 – Free Software Foundation – Richard Stallman
          • Universal freedom to distribute and modify computer software without restriction
          • Founded to support the free software movement
          • Enforcement of the General Public License
      • 1992 – Linux kernel released under GPL – Linus Torvalds (the kernel was first released in 1991)
      • 1998 – Open Source Initiative (OSI) – Bruce Perens and Eric Raymond
          • Formalized open source software and brought the model to major software companies
          • Formulated the Open Source Definition to determine which licenses are actually “open source” licenses
      • 1998 – Netscape releases the source code of its browser suite, which became the Mozilla project and later Firefox and Thunderbird
      • 1999–2000 – Sun Microsystems acquires StarOffice (1999) and releases its source code as OpenOffice.org (2000)
  • 6. “Free as in speech, not beer.” – Richard Stallman
      • “Open source” is not free
          • Many open source software licenses are free, but some licenses have costs associated with them
          • Many mature open source projects, especially operating systems, earn money from paid support and documentation
      • “Open source is free like a puppy is free.” – Scott McNealy, Chairman of Sun Microsystems
      • “Lowering the cost of goods tends to increase the total investment of the people and infrastructure that sustain it.” – Eric Raymond, The Magic Cauldron
      • http://www.gnu.org/philosophy/open-source-misses-the-point.html
  • 7. Open source doesn't just mean access to the source code. The distribution terms of open-source software must comply with the following criteria
      • Free Redistribution: The license shall not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources. The license shall not require a royalty or other fee for such sale.
      • Source Code: The program must include source code, and must allow distribution in source code as well as compiled form. Where some form of a product is not distributed with source code, there must be a well-publicized means of obtaining the source code for no more than a reasonable reproduction cost, preferably downloading via the Internet without charge. The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed.
      • Derived Works: The license must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software.
      • Integrity of The Author's Source Code: The license may restrict source code from being distributed in modified form only if the license allows the distribution of "patch files" with the source code for the purpose of modifying the program at build time. The license must explicitly permit distribution of software built from modified source code. The license may require derived works to carry a different name or version number from the original software.
      • No Discrimination Against Persons or Groups: The license must not discriminate against any person or group of persons.
      • http://opensource.org/docs/osd
  • 8. Open source doesn't just mean access to the source code. The distribution terms of open-source software must comply with the following criteria (cont.)
      • No Discrimination Against Fields of Endeavor: The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research.
      • Distribution of License: The rights attached to the program must apply to all to whom the program is redistributed without the need for execution of an additional license by those parties.
      • License Must Not Be Specific to a Product: The rights attached to the program must not depend on the program's being part of a particular software distribution. If the program is extracted from that distribution and used or distributed within the terms of the program's license, all parties to whom the program is redistributed should have the same rights as those that are granted in conjunction with the original software distribution.
      • License Must Not Restrict Other Software: The license must not place restrictions on other software that is distributed along with the licensed software. For example, the license must not insist that all other programs distributed on the same medium must be open-source software.
      • License Must Be Technology-Neutral: No provision of the license may be predicated on any individual technology or style of interface.
  • 9. There are many different types of hard costs associated with leveraging open source software for enterprise IT projects Software Licenses Referring to the licenses themselves, not legal terms and conditions Often offers the most potential cost savings vs. proprietary software Hardware Open source software often has reduced hardware requirements Support Often more but less mature options for support in open source projects Development Access to source code can help make development easier and less costly Lack of feature parity with proprietary software may create a need for more custom development Opportunity to give source code back to the open source community Professional Services Development, installation, and configuration costs Offered by many open source software vendors Training Offered directly by software vendor Through a professional training center or educational institution On-site or off-site, or online Testing Unit testing, performance testing, functional testing, test scripts, use-case scenarios, quality assurance costs
  • 10. There are many different types of hard costs associated with leveraging open source software for enterprise IT projects (cont.) Operations (Manageability) Mix of labor, management and monitoring tools configuration, creation of manuals to support operations Open source tends to have less mature management capabilities Staffing No conclusive evidence to show that staffing open source projects is cheaper than for proprietary projects Maintenance Contracts 15-25 percent of the license costs or equipment costs per year. Calculated using the list price, not the actual paid price Treated separate from support contracts in many open source projects Costs associated with patching and updating software over time Often free with most zero-cost open source software licenses Migration Especially for system replacement projects where existing data must be migrated to the new application Environmental Datacenter and hosting costs, floor space, power, bandwidth, hardware leasing Documentation Often coincides with the training category above Configuration Often captured with the development and operations categories
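The maintenance-contract rule of thumb on this slide (15 to 25 percent of the list price per year, not the price actually paid) can be illustrated with a little arithmetic. The following is only a sketch; the function name and the $100,000 list price are hypothetical and are not taken from the presentation.

```python
def maintenance_range(list_price, low_pct=15, high_pct=25):
    """Return the (low, high) yearly maintenance cost for a given LIST price.

    The percentages follow the 15-25 percent rule of thumb from the slide.
    The price actually paid after negotiation does not enter the formula.
    """
    return (list_price * low_pct / 100, list_price * high_pct / 100)


# Hypothetical example: a license with a $100,000 list price.
low, high = maintenance_range(100_000)
print(f"Yearly maintenance: ${low:,.0f} to ${high:,.0f}")
```

Because the calculation uses the list price, a discount negotiated on the license itself does not reduce the yearly maintenance charge.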
  • 11. There are also soft or intangible costs associated with enterprise open source software projects, and they are typically harder to calculate than hard costs
      • Downtime: Financial impact of system outage
      • IP Risk: Legal/litigation costs
      • License Auditing Risk: Resources required to perform a vendor-required license audit
      • License Management: Resources required to manage deployment of licenses and purchase of additional licenses as the deployment grows
      • License Negotiation Overhead: Legal costs required in negotiating the software licensing contract
      • Planning: Resources for planning and overhead
      • Process Inefficiencies: Lost time and costs related to process activities
      • Procurement Overhead: Purchase cost and resources required to procure the software
      • Productivity: Efficiencies from using the software
      • Reliability: Financial impact of improved system reliability and uptime
      • Support Quality: Resources required for software support
  • 12. Many organizations do not have a formal process in place to do a comprehensive financial analysis of software commitments
      Below is an example financial analysis spreadsheet comparing open source and proprietary software for an enterprise IT project. It is interesting to see how misleading many open source software vendors can be when they are desperate for a new client.
      Source: 451 CAOS Report 2 - Cost Conscious: A practical guide for understanding and calculating the financial benefits of open source for enterprise IT projects (The 451 Group). Calculator - Example A.

      Open Source Option

      | Item | Initial Investment | Year 1 | Year 2 | Year 3 | Totals |
      |---|---|---|---|---|---|
      | HARD COSTS | $76,250.00 | $41,200.00 | $111,150.00 | $49,100.00 | $277,700.00 |
      | Software Licenses | - | - | - | - | $0.00 |
      | Hardware | $25,000.00 | $3,000.00 | $3,000.00 | $3,000.00 | $34,000.00 |
      | Support | - | $25,000.00 | $30,000.00 | $35,000.00 | $90,000.00 |
      | Development | $22,000.00 | $2,500.00 | $5,000.00 | $2,500.00 | $32,000.00 |
      | Professional Services | $12,500.00 | $3,000.00 | - | - | $15,500.00 |
      | Training | $7,500.00 | - | - | - | $7,500.00 |
      | Testing | $3,000.00 | $3,000.00 | $3,000.00 | $3,000.00 | $12,000.00 |
      | Operations | $2,500.00 | $500.00 | $500.00 | $500.00 | $4,000.00 |
      | Staffing | - | - | $65,000.00 | - | $65,000.00 |
      | Maintenance Contracts | $3,750.00 | $4,200.00 | $4,650.00 | $5,100.00 | $17,700.00 |
      | SOFT COSTS | | | | | - |
      | INTERNAL COSTS | | | | | - |
      | REVENUE | | $250,000.00 | $500,000.00 | $800,000.00 | $1,550,000.00 |
      | CASHFLOW: Period | -$76,250.00 | $208,800.00 | $388,850.00 | $750,900.00 | |
      | CASHFLOW: Cumulative (Payback) | -$76,250.00 | $132,550.00 | $521,400.00 | $1,272,300.00 | |
      | RATE OF RETURN | | 607% | 450% | 1629% | |

      Payback period: Year 1. NPV: $954,643.20. IRR: 340%. Cost of capital: 12.00%.

      Proprietary/Existing Option

      | Item | Initial Investment | Year 1 | Year 2 | Year 3 | Totals |
      |---|---|---|---|---|---|
      | HARD COSTS | $296,250.00 | $99,200.00 | $176,150.00 | $104,100.00 | $675,700.00 |
      | Software Licenses | $200,000.00 | $50,000.00 | $50,000.00 | $50,000.00 | $350,000.00 |
      | Hardware | $25,000.00 | $3,000.00 | $3,000.00 | $3,000.00 | $34,000.00 |
      | Support | - | $30,000.00 | $35,000.00 | $40,000.00 | $105,000.00 |
      | Development | $22,000.00 | $2,500.00 | $5,000.00 | $2,500.00 | $32,000.00 |
      | Professional Services | $25,000.00 | $6,000.00 | - | - | $31,000.00 |
      | Training | $15,000.00 | - | - | - | $15,000.00 |
      | Testing | $3,000.00 | $3,000.00 | $3,000.00 | $3,000.00 | $12,000.00 |
      | Operations | $2,500.00 | $500.00 | $500.00 | $500.00 | $4,000.00 |
      | Staffing | - | - | $75,000.00 | - | $75,000.00 |
      | Maintenance Contracts | $3,750.00 | $4,200.00 | $4,650.00 | $5,100.00 | $17,700.00 |
      | SOFT COSTS | | | | | - |
      | INTERNAL COSTS | | | | | - |
      | REVENUE | | $250,000.00 | $500,000.00 | $800,000.00 | $1,550,000.00 |
      | CASHFLOW: Period | -$296,250.00 | $150,800.00 | $323,850.00 | $695,900.00 | |
      | CASHFLOW: Cumulative (Payback) | -$296,250.00 | -$145,450.00 | $178,400.00 | $874,300.00 | |
      | RATE OF RETURN | | 252% | 284% | 768% | |

      Payback period: Year 2. NPV: $591,891.97. IRR: 82%. Cost of capital: 12.00%.

      NOTES:
      • Use this spreadsheet to consider Cost Avoidance and Opportunity Costs
      • May add an additional "COST SAVINGS" column above the REVENUE column to account for existing sunk costs
      • Not covered in this spreadsheet: Depreciation, Amortization, Capital and Expense Budgets, etc.
      • Dave's attempt at saving $3,750 by not buying the spreadsheet from the 451 Group
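The cashflow, payback, and NPV figures in the spreadsheet can be recomputed with a few lines of Python. This is a sketch rather than the 451 Group's actual calculator: the function names are made up for illustration, and the discounting convention (initial investment at time 0, each year's cashflow discounted at the 12% cost of capital) is an assumption, chosen because it matches the NPV values shown on the slide to the cent.

```python
def period_cashflows(revenue, costs):
    """Yearly net cashflow: revenue minus hard costs for each year."""
    return [r - c for r, c in zip(revenue, costs)]


def npv(rate, initial_investment, yearly_cashflows):
    """Net present value, treating the initial investment as a time-0 outlay."""
    discounted = sum(
        cf / (1 + rate) ** year
        for year, cf in enumerate(yearly_cashflows, start=1)
    )
    return discounted - initial_investment


def payback_year(initial_investment, yearly_cashflows):
    """First year in which the cumulative cashflow is no longer negative."""
    cumulative = -initial_investment
    for year, cf in enumerate(yearly_cashflows, start=1):
        cumulative += cf
        if cumulative >= 0:
            return year
    return None  # not paid back within the horizon


# Figures copied from the spreadsheet on the slide.
revenue = [250_000, 500_000, 800_000]
cost_of_capital = 0.12

os_flows = period_cashflows(revenue, [41_200, 111_150, 49_100])
prop_flows = period_cashflows(revenue, [99_200, 176_150, 104_100])

os_npv = npv(cost_of_capital, 76_250, os_flows)
prop_npv = npv(cost_of_capital, 296_250, prop_flows)

print(f"Open source: NPV ${os_npv:,.2f}, payback in year {payback_year(76_250, os_flows)}")
print(f"Proprietary: NPV ${prop_npv:,.2f}, payback in year {payback_year(296_250, prop_flows)}")
```

Both options assume identical revenue, so the whole gap in NPV comes from the cost side, mostly the $350,000 in license fees.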
  • 13. There are a variety of business models that have proven to work for companies that want to make money using open source software
      • Support Sellers (otherwise known as "Give Away the Recipe, Open a Restaurant")
          • Give away the software product
          • Sell distribution, branding, and after-sale service
          • This is what Red Hat does
      • Loss Leader
          • Give away open source as a loss leader and market positioner for closed software
          • Netscape, Digium (Asterisk)
      • Widget Frosting
          • Hardware company goes open source in order to get better drivers and interface tools cheaper
          • Silicon Graphics (Samba), Apple (Darwin)
      • Accessorizing
          • Selling accessories: books, compatible hardware, complete systems with open source software pre-installed
          • O'Reilly Associates, OLPC
      • Source: The Open Source Initiative: http://www.opensource.org/advocacy/case_for_business.php
  • 14. There are many barriers to open source adoption in IT organizations, most of which are risk related Open source licenses are viral Open source software lacks formal support and training Software changes too often and is difficult to keep up Lack of a long term roadmap Sunk costs in existing projects Switching costs De facto industry standards
  • 15. Enterprise open source adoption offers many benefits to IT organizations within any type of business
      • Short-term cost savings
          • Most IT organizations are motivated by short-term cost savings when evaluating open source software adoption
          • The potential for saving money on software licensing fees is a huge factor in the cost equation
          • Software licensing fees can be a large percentage of the up-front costs for new projects as well as for massive expansion of existing projects
      • Long-term flexibility
          • In the long run, the benefits of flexibility tend to outweigh the cost benefit of using open source software
          • Developers have access to the source code and have the ability to modify and customize it to suit their specific needs
      • Reliability
          • “If builders built houses the way programmers built programs, the first woodpecker to come along would destroy civilization.” – Gerald M. Weinberg
          • The internet depends on a variety of high-reliability open source projects (DNS, sendmail, TCP/IP stacks, Perl)
      • Avoiding vendor lock-in
          • Organizations can become less vendor-dependent by using open source software
          • Avoiding vendor lock-in can help a company avoid severe switching costs
          • Royalty-free standards vs. Free and Open Source Software (FOSS)
      • Security
          • "Given enough eyeballs, all bugs are shallow." – Linus' Law
          • Security problems can be identified quickly and someone will be able to fix them
      • Performance
          • An often-cited example is Linux vs. Windows server clusters
  • 16. There are many different open source licenses and it can be difficult to distinguish one license from another
      • Most popular
          • GNU General Public License
          • GNU Library or Lesser GPL
          • Apache Software License
          • Berkeley Software Distribution (BSD)
          • MIT License
          • Mozilla Public License
          • Eclipse Public License
      • Special Purpose
          • Educational Community License
          • NASA Open Source Agreement 1.3
          • Open Group Test Suite License
      • Miscellaneous
          • Adaptive Public License
          • Artistic License 2.0
          • Open Software License
          • Qt Public License
          • And many more…
      • http://www.opensource.org/licenses/category
      • http://en.wikipedia.org/wiki/Comparison_of_free_software_licences
  • 17. Open source software licenses can range from very simple to relatively complex
      • Software Licenses
          • Cost of the actual license
          • Many open source vendors have a dual license model
          • Not the legal licensing terms or conditions
          • Seen as the greatest potential for savings in an open source project
          • Savings on licenses often used to offset training and professional services costs
          • Can include client access licenses, desktop licenses, database licenses, and development tools
          • Based on the number of CPUs or number of users
          • Every vendor has its own rules
          • Makes calculating project costs difficult
      • Dual license model
          • Choose between an open source (free) license or a commercial license that costs money
          • Trolltech Qt example
          • Motivated by market segregation based business models and license compatibility needs
      • Open Core License model
          • Core is GPL: if you embed the GPL code in closed source, you pay a fee
          • Technical support of the GPL product may be offered for a fee (up for debate as to whether it must be offered)
          • Annual commercial subscription includes indemnity, technical support, and additional features and/or platform support
          • Whether additional commercial features have viewable or closed source, and whether they become GPL after a time bomb period, are both up for debate
          • Professional services and training are for a fee
      • Licensing cost comparison works for new projects, but not necessarily existing projects
          • Must be estimated over the life of the project
      • Zero-cost open source software has caused proprietary vendors to lower their prices, and this trend will continue
          • An estimated $60 billion per year is lost by proprietary software vendors
  • 18. Another great benefit of open source software is the ability to download and try it out for free, and even learn about the development history and statistics Ohloh.net is a site that gives everyone more visibility into open source software projects by providing statistics, tracking code commit history, providing package downloads, etc…
  • 19. Open source software websites offer free downloads, documentation
      • Proprietary: SSIS, Informatica, Ab Initio
      • Open Source: Blender, Talend, Pentaho
  • 21. The concept of business intelligence has been around since the 1950s and became popularized in the software industry in the late 1980s and early 1990s
      • 1958: Business Intelligence term coined by Hans Peter Luhn – “to support better decision making”
          • “In this paper, business is a collection of activities carried on for whatever purpose, be it science, technology, commerce, industry, law, government, defense, et cetera. The communication facility serving the conduct of a business (in the broad sense) may be referred to as an intelligence system. The notion of intelligence is also defined here, in a more general sense, as ‘the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal.’” – Hans Peter Luhn
      • 1989: "Business intelligence" is first used as an umbrella term to describe the set of “concepts and methods to improve business decision-making by using fact-based support systems.” – Howard Dresner
      • 1990s: The business intelligence software market explodes, and it becomes very difficult to keep track of who’s doing what
  • 22. There are many “big players” in the business intelligence software space that have been around for much longer than their open source counterparts Informatica IBM Ascential Microsoft DTS/SSIS ACE*COMM Ab Initio Actuate Comanche CyberQuery Dimensional Insight IBM Applix Cognos Informatica Information Builders LogiXML LucidEra Microsoft Microsoft Analysis Services PerformancePoint Server 2007 Proclarity DTS/SSIS Microstrategy Oracle Corporation Hyperion Solutions Corporation Panorama Software Pervasive Pilot Software, Inc. PRELYTIS Prospero Business Suite Qliktech SAP Business Information Warehouse Business Objects OutlookSoft SAS Institute Siebel Systems Tibco StatSoft SPSS Telerik Reporting Teradata Thomson Data Analyzer
  • 23. Open source business intelligence solutions started coming onto the scene in 2000, but the space began to explode in 2005
  • 24. There are a variety of open source business intelligence products that are beginning to compete against proprietary alternatives after only a few years of development Talend (Suresnes, France and Los Altos, CA) OpenStudio Integration Suite Open Profiler Pentaho (Orlando, FL and Belgium) Reporting Engine Kettle Data Integration Weka Data Mining Jasper (Dublin, Ireland) JasperServer – Interactive, ad hoc, and managed reporting and dashboards JasperAnalysis – Interactive data analysis, OLAP JasperETL – Data Integration JasperReports Apatar (Chicopee, MA and Minsk, Belarus) Merge OnDemand Mondrian
  • 25. Companies that sponsor and develop open source projects offer many different types of tools and support options that vary in cost
      • Professional Services
          • Proof of Concept Development
          • On Demand Service Contracts
          • Consulting Services
          • Technology Assessments
      • Professional Tools
          • In addition to open source tools
          • Professional tools offer more functionality
      • Training
          • On-site or Online, Group or Individual
          • Certification Exams
      • Support Contracts
          • Typically priced on a per year or per incident basis
          • Variety of support options depends on popularity of the open source project
      • Technology Partners / Alliance Program
          • Training partners
          • Development partners
          • Platinum, Gold, Silver, Bronze levels or tiers
  • 26. There are often a wider variety of support options available when using open source software, but they tend to be less mature than with proprietary software Three common support models Professional support by open source software vendors Third-party vendor or consultant support Self-support Various Feature Levels Sold on a per user per year basis Number of incident reports per year (1,2,3, unlimited) with the ability to purchase extra incidents Web-based support Email support Phone support Guaranteed response times (8 hours, 1 day, 2-3 business days) Guaranteed diagnostic turnaround times Access to a certified version of the software Automatic notification of bug fixes / updates Access to community or professional support forum and bug tracker Access to advanced tutorials Various Pricing Levels $1,000 - $5,000 per user per year Can be even more expensive for “Enterprise” or “Professional” versions of the open source tools
  • 27. Most companies providing support for open source projects offer different support levels such as Bronze, Silver, Gold, and Platinum

      | Company | Support Level | Description | Type | Price |
      |---|---|---|---|---|
      | Talend | Silver Support | Three incidents per year; Web support; guaranteed response times; access to a certified version of Talend's products; automatic notification of updates and bug fixes; access to the support forum and bug tracker | 1 user for 1 year | $1,150.00 |
      | Talend | Silver Support | Same as above | 5 users for 1 year | $4,950.00 |
      | Talend | Gold Support | Unlimited incidents; Web and email support; 24 hour access on business days; guaranteed response times; guaranteed diagnostic turnaround times; access to a certified version of Talend's products; automatic notification of updates and bug fixes; access to the support forum and bug tracker | 1 user for 1 year | $2,150.00 |
      | Talend | Gold Support | Same as above | 3 users for 1 year | $5,750.00 |
      | Talend | Gold Support | Same as above | 5 users for 1 year | $8,350.00 |
      | Talend | Platinum Support | Unlimited incidents; Web, email and phone support; 24 hour access (email and Web) on business days; guaranteed response times; guaranteed diagnostic turnaround times; access to a certified version of Talend's products; automatic notification of updates and bug fixes; access to the support forum and bug tracker | 1 user for 1 year | ? |
      | Talend | Platinum Support | Same as above | 3 users for 1 year | ? |
      | Talend | Platinum Support | Same as above | Unlimited users | ? |
  • 29. Talend claims to be the “first provider of open source data integration software”, and while that is not really the case, their software does provide some unique functionality
      • Talend Open Studio
          • “the most open, innovative and powerful data integration solution on the market today.”
          • Provides connectors to almost any source or destination, and has an interface that is easy to learn and use
      • Talend Integration Suite
          • Open Studio with a subscription service for technical support and source control for team environments
      • Talend On Demand
          • SaaS version of Open Studio; stores all metadata and source code in a central repository hosted by Talend
          • Does not require much configuration or administration by the development team
      • Talend Open Profiler
          • “The first open source data profiling tool”
          • Allows users to define metrics and goals about data quality for databases, files, applications, etc.
          • Produces reports and graphs to display data quality issues and KPIs based on the defined metrics and goals
  • 30. Talend’s software is written in Java and has a user interface built around the Eclipse IDE, and it includes a basic business modeler tool and real-time debugging capabilities Advanced Lookup/Join Editor Familiar Eclipse Interface Real-Time Debugger Basic Business Modeler
  • 31. Pentaho provides a full spectrum of business intelligence capabilities including reporting, analysis, dashboards, data mining, and data integration
      • Data Integration (Kettle)
          • “Extract, Transform and Load (ETL) capabilities with an intuitive design environment”
          • “Proven, scalable, standards-based architecture”
          • 100% Java with broad, cross-platform support
          • Advanced scheduling, process integration, reporting, and analysis
      • Reporting Engine/Dashboards
          • Visualize KPIs, metrics, etc.
          • Deploy as JSP pages
          • Integrates with Google Maps, uses AJAX, etc.
      • Data Mining (Weka)
          • Clustering, segmentation, decision trees, random forests, neural networks, and principal component analysis
          • Algorithms can be called from code or applied directly
      • OLAP Server (Mondrian)
          • Web-based interface
          • Excel plugin
          • Drillable spreadsheets and charts
  • 32. The Pentaho suite, which includes Kettle, provides a full spectrum of business intelligence capabilities including reporting, analysis, dashboards, data mining, and data integration. Debugger with Pause/Resume; User Friendly Job Designer GUI; Web-based Dashboards/Mashups; Advanced Logging/Statistics.
  • 33. “Yahoo Pipes is a powerful composition tool to aggregate, manipulate, and mashup content from around the web.” Web-based ETL tool, written using Canvas and JavaScript. Features. Extract Data: CSV files; RSS feeds; screen-scraped HTML; Yahoo Local and Yahoo Search; Flickr and Google Base. Transform Data: row count, filter; geocode addresses; string, date and number manipulation; make web service calls. Load/Output Data: RSS feeds; KML files; JSON; PHP objects; interactive Yahoo Maps, etc. Combine many feeds into one, then sort, filter and translate it. Geocode your favorite feeds and browse the items on an interactive map. Embed widgets/badges on your own web site. Output data as RSS, JSON, KML, and other formats.
  • 34. This Yahoo Pipes demo uses data exported from the DeepBlue Employee List and transforms the data into a GeoRSS and KML feed. Demo: Pariveda Employee Map. Uses data from the DeepBlue Employee Contact List. Uses the Yahoo Geocoder service on each employee’s address. Uses a text input box to limit the size of the dataset. Displays as an interactive Yahoo Map or can be loaded into Google Earth as a KML file. http://tinyurl.com/64ny7o
  • 35. Table of Contents Introduction Open Source Software Brief History of Open Source Software The Cost of Open Source Barriers to Open Source Adoption Open Source Licensing Overview Business Intelligence Software Business Intelligence Software Overview Brief History of Open Source BI Software Open Source BI Vendor Offerings Demonstrations Talend OpenStudio Pentaho Kettle Yahoo Pipes Questions
  • 36. Thank you for attending; please let me know if you have any questions. Special thanks to: Samir Ray, Jeff Townes, Brian Orell, Daniel Herrin, Sean Beard, Grant Sutton.
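The extract, transform, and output steps of the Yahoo Pipes demo on slide 34 can be approximated outside of Pipes. The following is only a rough Python sketch under stated assumptions, not the actual pipe: the column names `name` and `address`, the sample rows, and the `FAKE_GEOCODER` lookup table (standing in for the Yahoo Geocoder service) are all hypothetical and chosen for illustration.

```python
import csv
import io
import json
from xml.sax.saxutils import escape

# Hypothetical lookup table standing in for a real geocoding service.
FAKE_GEOCODER = {
    "100 Main St, Dallas, TX": (32.7767, -96.7970),
    "200 Elm St, Houston, TX": (29.7604, -95.3698),
}

# Hypothetical sample export; a real employee list would have other columns.
SAMPLE_CSV = (
    "name,address\n"
    '"Ann Example","100 Main St, Dallas, TX"\n'
    '"Bob Example","200 Elm St, Houston, TX"\n'
    '"Cy Example","Unknown"\n'
)


def extract(csv_text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows, limit):
    """Transform: cap the number of rows, then geocode each address."""
    items = []
    for row in rows[:limit]:
        coords = FAKE_GEOCODER.get(row["address"])
        if coords is None:
            continue  # skip rows whose address cannot be geocoded
        items.append({"name": row["name"], "lat": coords[0], "lon": coords[1]})
    return items


def load_json(items):
    """Load/Output: serialize the geocoded items as JSON."""
    return json.dumps(items)


def load_kml(items):
    """Load/Output: serialize the geocoded items as a minimal KML document."""
    placemarks = "".join(
        "<Placemark><name>{}</name>"
        "<Point><coordinates>{},{}</coordinates></Point></Placemark>".format(
            escape(item["name"]), item["lon"], item["lat"]  # KML order is lon,lat
        )
        for item in items
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        + placemarks
        + "</Document></kml>"
    )


if __name__ == "__main__":
    geocoded = transform(extract(SAMPLE_CSV), limit=2)
    print(load_json(geocoded))
    print(load_kml(geocoded))
```

The `limit` argument plays the role of the demo's text input box, and the KML string could be saved to a `.kml` file and opened in Google Earth. A real version would replace the lookup table with calls to an actual geocoding service.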

Editor's Notes

  • #2: Hi everyone, my name is David Morris, and I’m a C2 in the Houston Office for those of you who don’t know me yet. In this presentation, I’m going to talk about open source software, and specifically ETL, or Extract, Transform and Load software. I am sure many of you are familiar with ETL software if you’ve ever used Microsoft SQL Server Integration Services. Regardless of your previous experience level with ETL software or business intelligence in general, I hope everyone learns something from my presentation today. Without further ado, thank you for coming, and I’m going to get started. I’m using an experimental plugin for PowerPoint today, so bear with me if the slide transitions are a little strange to you. I like to try new things, and it was free, so we’ll see how it goes… Go over agenda on next slide