Augmented OLAP
for Big Data
Luke Han | luke.han@Kyligence.io
Co-founder & CEO of Kyligence
Apache Kylin PMC Chair
Microsoft Regional Director & MVP
Strata Global Sponsor
BOOTH #410
© Kyligence Inc. 2019.
About Luke Han
• Luke Han
• Co-founder & CEO at Kyligence
• Co-creator and PMC Chair of Apache Kylin
• Apache Software Foundation Member
• Microsoft Regional Director & MVP
• Former eBay Big Data Product Manager Lead
About Apache Kylin
• Leading Open Source OLAP for Big Data
• Ranked #1 on Google for “big data OLAP”
• Ranked #1 on Google for “hadoop OLAP”
• Open sourced by eBay in 2014
• Graduated to an Apache Top-Level Project in 2015
• 1,000+ adoptions worldwide
• 2015 InfoWorld Bossie Awards
• 2016 InfoWorld Bossie Awards
Agenda
• About Kyligence
• Pains in Big Data Analysis
• Kyligence’s solution: Augmented OLAP
• Video Demo
• Benchmark
• Use Cases
Kyligence = Kylin + Intelligence
• Founded in 2016 by the original creators of Apache Kylin
• CRN Top 10 Big Data Startups 2018
• Backed by leading VCs:
• Redpoint Ventures
• Cisco
• CBC Capital
• Shunwei Capital
• Eight Roads Ventures (Fidelity International Arm)
• Coatue
• Global Offices:
• Shanghai
• Beijing
• Shenzhen
• San Jose
• New York
• Seattle
• …
Trusted by Global Leaders
• Telecom
• Finance
• Manufacturing
• Retail & Others
Most of them are Global Fortune 500 companies
Global Partners
Agenda
• About Kyligence
• Pains in Big Data Analysis
• Kyligence’s solution: Augmented OLAP
• Use Cases
Let’s talk about a photography story…
https://technave.com/data/files/mall/article/201812271418327393.jpg
How many people really know how to set those up?
https://technave.com/data/files/mall/article/201812271437395703.jpg
How do you manage your 100,000+ photos?
Google Photos
Then… how about your enterprise data?
https://www.slideshare.net/datascienceth/introduction-to-data-science-data-science-thailand-meetup-1
https://www.sintetia.com/wp-content/uploads/2014/05/Data-Scientist-What-I-really-do.png
Fast and Changing Analysis Demand
vs
Slow and Heavy Big Data Operations
The Typical “Throw in some People” Approach
$$$ High Cost: Business Users, Analysts, Data Engineers, Administrators
Slow Time-to-Insight: Business Analysis → Data Modeling → Lake → Warehouse → Mart → Reporting
Pains in the “Throw in some People” Approach
Presentation, Visualization
Impala, Hive, Spark SQL, Drill
MapReduce, Spark, …
Data Lake
• Time-to-value Pain: Weeks of waiting break the “online” promise.
• Collaboration Pain: Hard to reuse assets across teams. Each team fights its own way through.
• Resource Pain: Hard to scale. Where do you find so many skilled big data engineers?
Agenda
• About Kyligence
• Pains in Big Data Analysis
• Kyligence’s solution: Augmented OLAP
• Use Cases
Throw in some Intelligence!
Let a system replace the people.
• Transparent SQL Acceleration
• On-demand Data Preparation
• Interactive Query Performance
• High Concurrency
• Centralized Semantic Layer
Faster time to market. Stay “online”.
Presentation, Visualization
Augmented OLAP: Semantic, Automation, Acceleration, Governance
Data Mart
Impala, Hive, Spark SQL, Drill
MapReduce, Spark, …
Data Lake
A Learning OLAP System
Business User, Insights
vs
Business User, Analyst, Data Engineer
A Learning OLAP System
Business User, Insights
Augmented OLAP Engine (Background Learning):
• Pattern Detection
• Auto Modeling
• Data Preparation
Raw Data → Prepared Data
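One way to picture the "pattern detection, then auto modeling" loop is a toy that mines a query log for the most frequent GROUP BY column sets and proposes them as pre-aggregations. This is an illustrative sketch only, not Kyligence's actual algorithm: the log contents, the regex-based parsing, and the `recommend_aggregates` helper are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical query log; a real one would come from the BI tool or SQL engine.
QUERY_LOG = [
    "SELECT region, SUM(price) FROM sales GROUP BY region",
    "SELECT region, year, SUM(price) FROM sales GROUP BY region, year",
    "SELECT year, region, COUNT(*) FROM sales GROUP BY year, region",
    "SELECT region, SUM(price) FROM sales GROUP BY region",
    "SELECT * FROM sales LIMIT 10",
]

# Naive pattern: capture everything after GROUP BY up to ORDER BY, LIMIT, or end of text.
GROUP_BY = re.compile(r"GROUP\s+BY\s+(.+?)(?:\s+ORDER\s+BY|\s+LIMIT|$)", re.IGNORECASE)


def dimension_set(sql):
    """Return the GROUP BY columns as an order-independent set, or None if absent."""
    match = GROUP_BY.search(sql.strip())
    if not match:
        return None
    return frozenset(col.strip().lower() for col in match.group(1).split(","))


def recommend_aggregates(log, top_k=2):
    """Count how often each dimension set is queried; return the top_k most frequent."""
    counts = Counter(d for d in map(dimension_set, log) if d is not None)
    return [(sorted(dims), hits) for dims, hits in counts.most_common(top_k)]


recommendations = recommend_aggregates(QUERY_LOG)
for dims, hits in recommendations:
    print(f"pre-aggregate by {dims}: seen {hits} times")
```

Treating `GROUP BY region, year` and `GROUP BY year, region` as the same set is the design choice that lets one prepared aggregate serve both queries.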
Demo Setup
Analyze 1 billion rows of sales records (TPC-H)
Business User → Tableau → SparkSQL 2.4 vs Kyligence Enterprise
(Embed the Demo Video)
Demo FAQ
Business User, Analyst: reuse Prepared Data
• How can the first, slow exploration be improved?
• What if the analyst operates differently the second time?
• Is there a more comprehensive performance benchmark?
TPC-H Decision Support Benchmark
TPC-H Benchmark
• Examines large volumes of data
• High-complexity queries
• Answers critical business questions
• 22 decision-support queries
E.g., the Shipping Priority Query retrieves the shipping priority and potential revenue of the orders having the largest revenue among those that had not been shipped as of a given date. The top 10 orders are listed in decreasing order of revenue.
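To make the Shipping Priority Query concrete, here is a small sketch that runs a query of that shape against an in-memory SQLite database. The tables carry only the columns the query needs, the rows are invented for illustration, and the market segment and cutoff date are sample parameter values; the SQL is adapted to SQLite syntax (dates stored as ISO strings), so treat it as an approximation of the benchmark query rather than the official text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (c_custkey INTEGER, c_mktsegment TEXT);
CREATE TABLE orders   (o_orderkey INTEGER, o_custkey INTEGER,
                       o_orderdate TEXT, o_shippriority INTEGER);
CREATE TABLE lineitem (l_orderkey INTEGER, l_extendedprice REAL,
                       l_discount REAL, l_shipdate TEXT);
""")

# Invented toy rows, not TPC-H generated data.
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "BUILDING"), (2, "MACHINERY")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (10, 1, "1995-03-01", 0),   # ordered before the cutoff, ships after it
    (11, 1, "1995-03-10", 0),   # same, but smaller revenue
    (12, 1, "1995-01-05", 0),   # already shipped by the cutoff
    (13, 2, "1995-03-02", 0),   # different market segment
])
conn.executemany("INSERT INTO lineitem VALUES (?, ?, ?, ?)", [
    (10, 1000.0, 0.0, "1995-03-20"),
    (10,  500.0, 0.0, "1995-03-25"),
    (11,  200.0, 0.5, "1995-03-18"),
    (12,  900.0, 0.0, "1995-02-01"),
    (13,  700.0, 0.0, "1995-03-30"),
])

SHIPPING_PRIORITY = """
SELECT l_orderkey,
       SUM(l_extendedprice * (1 - l_discount)) AS revenue,
       o_orderdate,
       o_shippriority
FROM customer
JOIN orders   ON c_custkey = o_custkey
JOIN lineitem ON l_orderkey = o_orderkey
WHERE c_mktsegment = :segment
  AND o_orderdate < :cutoff      -- ordered before the given date
  AND l_shipdate  > :cutoff      -- not yet shipped as of that date
GROUP BY l_orderkey, o_orderdate, o_shippriority
ORDER BY revenue DESC, o_orderdate
LIMIT 10
"""

rows = conn.execute(
    SHIPPING_PRIORITY, {"segment": "BUILDING", "cutoff": "1995-03-15"}
).fetchall()
for row in rows:
    print(row)
```

The three-way join plus aggregation is what makes this kind of query expensive at scale, and it is the sort of work a pre-computed aggregate can avoid at query time.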
Kyligence Enterprise 4 Beta vs SparkSQL 2.4
To see the trend as data grows
• 3 datasets
• Scale Factor = 20, 35, 50
• TPCH_SF1: Consists of the base row size (several million elements).
• TPCH_SF20: Consists of the base row size × 20.
• TPCH_SF35: Consists of the base row size × 35.
• TPCH_SF50: Consists of the base row size × 50 (several hundred million elements).
[Chart; axis unit: Billion]
Hardware Configurations
Same 4 physical nodes
- Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz * 2
- 86 vCores and 188 GB memory in total
Same Spark configuration for both KE 4 Beta and SparkSQL 2.4
- spark.driver.memory=16g
- spark.executor.memory=8g
- spark.yarn.executor.memoryOverhead=2g
- spark.yarn.am.memory=1024m
- spark.executor.cores=5
- spark.executor.instances=17
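As a quick sanity check, the Spark settings above can be multiplied out and compared with the stated cluster totals. This is back-of-the-envelope arithmetic only: which of the driver and application master figures count against the cluster depends on the deploy mode, which the slide does not state, so the last line is an upper bound under the assumption that both run on the cluster nodes.

```python
# Values copied from the Spark settings on the slide.
executor_instances = 17
executor_cores = 5
executor_memory_gb = 8
executor_overhead_gb = 2
driver_memory_gb = 16
am_memory_gb = 1  # spark.yarn.am.memory=1024m

# Cluster totals stated on the slide.
cluster_vcores = 86
cluster_memory_gb = 188

executor_total_cores = executor_instances * executor_cores
executor_total_memory_gb = executor_instances * (executor_memory_gb + executor_overhead_gb)
# Assumption: driver and AM both run on the cluster nodes (upper bound).
upper_bound_memory_gb = executor_total_memory_gb + driver_memory_gb + am_memory_gb

print(f"executor cores:  {executor_total_cores} of {cluster_vcores} vCores")
print(f"executor memory: {executor_total_memory_gb} GB of {cluster_memory_gb} GB")
print(f"with driver and AM: {upper_bound_memory_gb} GB of {cluster_memory_gb} GB")
```

Under these assumptions the executors alone take 85 of the 86 vCores, so the configuration appears sized to use nearly the whole cluster for both engines.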
Query Response Time | KE 4 Beta vs. SparkSQL 2.4
TPC-H 22 queries
For each dataset
- Run each query 3 times
- Record the average time
- No warm-up
Lower is better.
[Chart: response time per query in milliseconds, SF=50]
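The measurement procedure (three runs per query, averaged, no warm-up) can be sketched as a small timing harness. The `time_query` and `benchmark` helpers and the stand-in workloads are hypothetical; a real run would submit the 22 TPC-H queries to each engine instead of calling local lambdas.

```python
import statistics
import time


def time_query(run_query, repetitions=3):
    """Run a query callable `repetitions` times, with no warm-up run, and return the mean in ms."""
    samples_ms = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples_ms)


def benchmark(queries, repetitions=3):
    """Map each query name to its average response time in milliseconds."""
    return {name: time_query(fn, repetitions) for name, fn in queries.items()}


# Stand-in workloads so the sketch runs anywhere; not real TPC-H queries.
demo_queries = {
    "Q1": lambda: sum(range(100_000)),
    "Q3": lambda: sorted(range(50_000), reverse=True),
}

results = benchmark(demo_queries)
for name, avg_ms in results.items():
    print(f"{name}: {avg_ms:.2f} ms (mean of 3 runs)")
```

Skipping the warm-up means the first run includes any cold-cache cost, so the reported average reflects what a user sees on a fresh query rather than the best case.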
Total Response Time | KE 4 Beta vs. SparkSQL 2.4
Total response time is the sum of the 22 queries’ response times.
Compare across dataset sizes to see the trend.
Scale out for the future.
[Chart; axis labels: Billion, Seconds]
Avg. Acceleration Rate | KE 4 Beta vs. SparkSQL 2.4
Acceleration Rate = SparkSQL time / KE time
Take the average over the 22 queries and compare across dataset sizes.
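The acceleration-rate formula is easy to work through on a few numbers. The timings below are made up for illustration and are not the benchmark's results; `acceleration_rates` is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical per-query response times in milliseconds (not measured results).
sparksql_ms = {"Q1": 12000.0, "Q2": 3000.0, "Q3": 8000.0}
ke_ms = {"Q1": 600.0, "Q2": 1500.0, "Q3": 400.0}


def acceleration_rates(baseline, accelerated):
    """Per-query acceleration rate = baseline time / accelerated time."""
    return {query: baseline[query] / accelerated[query] for query in baseline}


rates = acceleration_rates(sparksql_ms, ke_ms)

# The slide's metric: the average of the per-query rates.
avg_rate = mean(rates.values())

# A different summary for comparison: ratio of the total response times.
total_ratio = sum(sparksql_ms.values()) / sum(ke_ms.values())

print(f"per-query rates: {rates}")
print(f"average acceleration rate: {avg_rate:.1f}x")
print(f"ratio of total response times: {total_ratio:.1f}x")
```

The two summaries generally differ: averaging per-query rates weights every query equally, while the ratio of totals is dominated by the slowest queries, so it helps to state which one a chart reports.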
AI-Augmented Analytics Platform
SQL Query Log, Analytic Behavior, Data Schema, Data Profile
Machine Learning Engine
Data Modeling Automation
Kylin Cube, Learnt Index, Smart Pushdown
BI, Real-time Analysis, Data-as-a-Service
Local Deployment, Cloud Platform, Container Data Services
Agenda
• About Kyligence
• Pains in Big Data Analysis
• Kyligence’s solution: Augmented OLAP
• Use Cases
Use Case: IBM Cognos Replacement
One Kyligence Cube for 800+ Cognos Cubes
800+ Cognos cubes, 1,000+ ETL jobs (Functional Scene, Time Scene, Geo Scene, …):
• Org. Daily Cube, Merch. Daily Cube, Channel Daily Cube, Region Daily Cube
• Org. Monthly Cube, Merch. Monthly Cube, Channel Monthly Cube, Region Monthly Cube
• Shanghai Merchants, Zhejiang Merchants, Anhui Merchants, Guangdong Merchants
One Kyligence cube: Card Transaction
• Dimensions: 167
• Measures: 20
Data: 300+ B records
Merchants: 10+ M
Cards: 10+ B
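The idea of replacing many purpose-built cubes with one can be pictured as rolling up a single fine-grained dataset along different dimensions on demand. Below is a minimal sketch, assuming invented transaction rows and a hypothetical `roll_up` helper; it illustrates the roll-up concept only and is not how Kyligence stores or queries cubes.

```python
from collections import defaultdict

# Hypothetical card transactions: (date, region, channel, amount).
TRANSACTIONS = [
    ("2019-01-03", "Shanghai", "POS", 120.0),
    ("2019-01-03", "Zhejiang", "Online", 80.0),
    ("2019-01-17", "Shanghai", "Online", 50.0),
    ("2019-02-02", "Anhui", "POS", 30.0),
]
COLUMNS = ("date", "region", "channel")


def to_month(date):
    """Truncate an ISO date string to its year-month prefix."""
    return date[:7]


def roll_up(rows, dims, transform=None):
    """Sum amounts grouped by `dims`, optionally coarsening a dimension (e.g. day to month)."""
    transform = transform or {}
    totals = defaultdict(float)
    for *keys, amount in rows:
        record = dict(zip(COLUMNS, keys))
        group = tuple(transform.get(dim, lambda value: value)(record[dim]) for dim in dims)
        totals[group] += amount
    return dict(totals)


# Views that would otherwise be separate daily, monthly, and per-channel cubes.
region_daily = roll_up(TRANSACTIONS, ("region", "date"))
region_monthly = roll_up(TRANSACTIONS, ("region", "date"), {"date": to_month})
channel_monthly = roll_up(TRANSACTIONS, ("channel", "date"), {"date": to_month})

print(region_monthly)
print(channel_monthly)
```

Each separate "daily" or "monthly" cube here is just a different grouping of the same rows, which is why one sufficiently detailed model can stand in for many narrow ones.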
Use Case: Data-as-a-Service Platform
“In the past, due to the limitations of our previous multi-dimensional analytic tool, we faced challenges of constrained time range in queries… We are considering leveraging multi-dimensional data cubes to replace a number of fragmented legacy tabular reports in more business units, so that we can provide better analytic services to our business users.”
-- Wu Ying, VP of CMB’s Development Center
Business Intelligence: Cognos, Tableau, MicroStrategy, Superset, API
Applications: MIP, MDS, CRM, DWS
Kyligence Data-as-a-Service Platform: Smart Routine, Intelligent Modeling, Multi-tenancy, Security, Job
Tenancy 1, Tenancy 2, Tenancy 3, … Tenancy N
Offline Platform (Hadoop), Online Platform (Hadoop), EDW (Teradata)
Take away: Augmented OLAP, the future for analytics
Impala, Hive, Spark SQL, Drill
MapReduce, Spark, …
AI-Augmented OLAP
Semantic, Automation, Acceleration, Governance
Impala, Hive, Spark SQL, Drill
MapReduce, Spark, …
Thanks
luke.han@Kyligence.io | @lukehq
Homepage: http://kyligence.io
Twitter: @kyligence
Booth: #410