Big Data:
Movement, Warehousing, & Virtualization
Presented to TSAM Data Management Stream – July 14th, 2011
Overview

• Major Industry Trends

• Data Virtualization & Distributed Storage

• Impacts to Our Industry

• Solution Alignment to Technology




Trend #1: Cost Structure of Storage

•  Cost
  •  Distributed commodity storage is 25x cheaper than Tier 1 SAN
  •  With high reliability (replication), it is closer to 55x

•  Performance
  •  Distributed is now faster
  •  Flash Exacerbates

•  Decreasing Differentiators
  •  Perceived Reliability
  •  Enterprise Management
  •  Legacy Compatibility

2009
  -  67 TB in 4U
  -  24.5x multiple
  -  Reliability as key differentiator
  -  With replication (55x)
  -  Equivalent Performance

2011
  -  145 TB in 4U (Disk)
  -  27 TB in 4U (Flash)
  -  26x multiple
  -  With Data Fabric / Virtualization, it is as reliable
  -  Higher Performance

Source: BackBlaze.com
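
The capacity figures on this slide (67 TB of disk in 4U in 2009; 145 TB of disk and 27 TB of flash in 4U in 2011) can be turned into density and growth numbers with simple arithmetic. The sketch below does only that. The Tier 1 SAN price used in the last step is a made-up placeholder, not a figure from the slide; only the 25x multiple comes from the deck.

```python
# Raw capacity of one 4U chassis in TB, as quoted on the slide.
TB_IN_4U = {"2009 disk": 67, "2011 disk": 145, "2011 flash": 27}


def tb_per_rack_unit(tb_in_4u: float) -> float:
    """Density per rack unit (U) for a 4U chassis."""
    return tb_in_4u / 4


def growth_factor(old_tb: float, new_tb: float) -> float:
    """How many times larger the newer capacity is."""
    return new_tb / old_tb


def commodity_cost_per_tb(tier1_cost_per_tb: float, multiple: float) -> float:
    """Commodity cost implied by a 'Nx cheaper than Tier 1 SAN' multiple."""
    return tier1_cost_per_tb / multiple


disk_growth = growth_factor(TB_IN_4U["2009 disk"], TB_IN_4U["2011 disk"])
density_2011 = tb_per_rack_unit(TB_IN_4U["2011 disk"])

# ASSUMPTION: 1000.0 per TB is a hypothetical Tier 1 price for illustration only.
implied_cost = commodity_cost_per_tb(1000.0, 25)

print(f"Disk capacity per 4U grew {disk_growth:.2f}x from 2009 to 2011")
print(f"2011 disk density: {density_2011} TB per U")
print(f"Implied commodity cost at a 25x multiple: {implied_cost} per TB")
```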
Trend #2: Moving from Blocks to Data
•  Blocks are a legacy of tape storage

•  Deeply embedded in the OS / Driver fabric and most legacy DB
   architectures

•  Horribly inefficient for modern requirements
  •  Replication / Synchronization (>100x retransmission)
  •  Networks are not designed for blocks
  •  Applications have to Load / Store
  •  Wall Street data usage differs from the standard Fortune 500 pattern (more dynamic
     data and higher churn rates)
  •  WAN optimization cannot fully solve this

•  Atomic Data is an emerging model
  •  DB Rows / Messages are the historical Atomic example
  •  PaaS interfaces are ALL data and file driven

•  What is YOUR interface?
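
One way to see why block-level replication retransmits so much more than record-level replication: a small update dirties a whole block, and the whole block is resent. The sketch below counts bytes under assumed sizes (8 KiB blocks, 64-byte records, ten scattered updates). These sizes are illustrative assumptions, not measurements from the talk, and the function names are hypothetical.

```python
BLOCK_SIZE = 8192   # ASSUMPTION: 8 KiB storage block
RECORD_SIZE = 64    # ASSUMPTION: 64-byte record (row or message)


def block_level_bytes(changed_record_ids, record_size=RECORD_SIZE,
                      block_size=BLOCK_SIZE):
    """Bytes sent when every block containing a changed record is resent whole."""
    records_per_block = block_size // record_size
    dirty_blocks = {rid // records_per_block for rid in changed_record_ids}
    return len(dirty_blocks) * block_size


def record_level_bytes(changed_record_ids, record_size=RECORD_SIZE):
    """Bytes sent when only the changed records themselves are sent."""
    return len(set(changed_record_ids)) * record_size


# Ten updates scattered across the data set, each landing in a different block.
changed = [i * 1000 for i in range(10)]

block_bytes = block_level_bytes(changed)
record_bytes = record_level_bytes(changed)
amplification = block_bytes / record_bytes

print(f"block-level:  {block_bytes} bytes")
print(f"record-level: {record_bytes} bytes")
print(f"amplification: {amplification:.0f}x")
```

With these assumed sizes the amplification equals the number of records per block. Updates that cluster in the same block would lower it, so the ratio depends heavily on the churn pattern.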
Trend #3: End of Single Location
•  Single-Location Warehouses Are Challenged
  •  Time to Query
  •  User Experience & SLA
  •  Data volumes and WAN bandwidth
  •  Regulatory and Security
  •  Integrated System Dependencies
  •  Clients / customers / applications are all in motion (mobile platform & need for

•  Impact of Moving from Single Location
  •  Dynamic data synchronization
     • 1-second global SLA for data synchronization – emerging standard for risk
     • Mechanisms for distributed data sync are different
        •  PUSH = the new Data Fabric
        •  PULL = existing WAN Optimization
        •  Need for a new model for WAN optimization (beyond zlib / dedupe)
     • Networks can’t handle file copy (block); it must be data
  •  Elasticity in data movement – the “fabric” must be able to buffer
  •  Turns the file and database replication model on its head: 1 to many
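
The push, buffering, and 1-to-many points above can be sketched as a toy in-memory publisher. Each update is pushed once and fanned out to a per-site buffer that absorbs bursts until the site drains it. `ToyFabric` and its methods are hypothetical names for illustration and are not Tervela's API. The bounded buffer here silently drops the oldest update on overflow, whereas a real fabric would more likely persist data or apply backpressure.

```python
from collections import deque


class ToyFabric:
    """Minimal 1-to-many push model with a bounded buffer per subscriber site."""

    def __init__(self, buffer_size: int):
        self._buffer_size = buffer_size
        self._queues = {}

    def subscribe(self, site: str) -> None:
        # deque(maxlen=...) discards the oldest item once the buffer is full.
        self._queues[site] = deque(maxlen=self._buffer_size)

    def publish(self, update) -> None:
        # PUSH: the source sends each update once; the fabric fans it out.
        for queue in self._queues.values():
            queue.append(update)

    def drain(self, site: str) -> list:
        # A site consumes everything currently buffered for it, oldest first.
        queue = self._queues[site]
        return [queue.popleft() for _ in range(len(queue))]


fabric = ToyFabric(buffer_size=3)
fabric.subscribe("NY")
fabric.subscribe("LN")

fabric.publish("px-1")
fabric.publish("px-2")
ny_first = fabric.drain("NY")      # NY keeps up with the stream

fabric.publish("px-3")
fabric.publish("px-4")
ln_all = fabric.drain("LN")        # LN was slow; its buffer overflowed once
ny_second = fabric.drain("NY")

print(ny_first, ny_second, ln_all)
```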

Data Virtualization & Distributed Storage

•  Data Virtualization Layers
 •  Data (storage, DB, cache, streaming
    sources, state, etc…)
 •  Data Fabric (data movement,
    reliability, buffering, WAN services)
 •  Data transformation (EII) and
    coordination services (virtualization)
 •  Data Access / Interface
                   &
•  Distributed Storage Model
 •  Data (storage, DB, cache, streaming
    sources, state, etc…)
 •  Data Fabric (data movement,
    reliability, buffering, WAN services)
 •  Legacy Interfaces

Impact of the New Model

• Database Vendor Market
 •  New Architectures (column store & distributed) can have the same
    reliability, enterprise features and far better performance
 •  Monolithic DB solutions no longer need to rely upon storage for
    DR / reliability
• Cost Structure – One size does NOT fit all
• Platform
 •  Cloud – Public / Private
 •  Existing Infrastructure
 •  Is there any difference?
• Elasticity of Compute


Adoption

•  Early Adopters of the Model in the Enterprise
 •  Big Data and Mining:
    • Options
    • Back testing
    • Regulatory and compliance
    • Real-time risk
    • Global position & Instrument Master
    • Best Execution
 •  Hot-Hot DR
 •  Global Data Availability

•  Flexible Computing Utilizing Cloud Technologies
 •  Complex derivative pricing
 •  Grid – DR
 •  Seamless integration of remote locations / venues

About Tervela: Data In Motion
The Tervela Data Fabric
The fastest, most reliable, and cost-effective data transport system for globally
distributed, mission-critical applications.

 •  10-100x performance increase
    over traditional solutions
 •  Beyond 5x9’s
    built-in fault tolerance & high availability
 •  50% faster to deliver new apps
    simple development tools & embedded services
 •  Data-layer security
    integrated data entitlements & protection

Products
 •  TMX: Message Switch
    Message transport through the fabric
 •  TPE: Persistence Engine
    Embedded storage within the fabric
 •  TPM: Provisioning & Management
    Central management of the fabric
 •  Data Fabric
    Optimized for Distributed Data and Applications
 •  Client APIs
    C, C++, C#, Java, JMS, PaaS

Virtual Data Fabric Appliance
Free Download
www.tervela.com/download


Q&A




