Data Platform Airlift
Rui Quintino
Data Research, DevScope
rui.quintino@devscope.net
Machine Learning with SQL Server 2016 and R Services
February 24, Microsoft Lisbon Experience
Your feedback is important!
Keep in touch with Microsoft Azure
Try Azure for FREE now: https://azure.microsoft.com/free/
Agenda
•Machine Learning
•R – What & Why?
•R, Microsoft & SQL Server
•SQL 2016 R Services
•Q&A
Machine Learning ?
Using existing structured or unstructured data to:
1. Predict unknown/future data
2. Create intelligent & automated agents/services
3. Advanced Data Insights -> Why? Drivers? Root causes?
4. Content producers & “creative” agents
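Item 1 above (predicting unknown data) can be sketched in a few lines. This is only an illustrative sketch, not material from the talk: the synthetic data, the function names `fit_line` and `mean_abs_error`, and the 150/50 split are all assumptions. It learns a straight-line model from training rows and then checks how well it predicts rows it has never seen.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_abs_error(model, xs, ys):
    """Average absolute gap between predicted and actual values."""
    intercept, slope = model
    return sum(abs(intercept + slope * x - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: y = 3 + 2x plus a little random noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3.0 + 2.0 * x + rng.gauss(0, 0.5) for x in xs]

# Learn from the first 150 rows, hold out the last 50 as "unseen" data.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

model = fit_line(train_x, train_y)
test_error = mean_abs_error(model, test_x, test_y)
print(f"intercept={model[0]:.2f} slope={model[1]:.2f} test MAE={test_error:.2f}")
```

The point of the held-out rows is that the error is measured on data the model did not see while it was being fitted, which is what "predicting unknown data" means in practice.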
Example use cases (classic)
What is R?
• A statistics programming language
• A data visualization tool
• Open source
• 2.5+M users
• Taught in most universities
• Thriving user groups worldwide
• 10,000+ free algorithms in CRAN
• Scalable to big data
• New and recent grads use it
Language
Platform
Community
Ecosystem
• Rich application & platform integration
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL Server 2016 and R Services
https://www.r-bloggers.com/10000-cran-packages/
http://blog.revolutionanalytics.com/2017/01/cran-10000.html
Challenges posed by open source R
• Lack of Commercial Support
• Inadequate/Limited Performance
• Complex Deployment Processes
• Limited Data Scale
Before SQL Server 2016 & R Services
• R & SQL Server
  • SQL Server is one of the most widely used SQL databases
  • R is the most widely used statistical and advanced analytical language
• Complications From Using R with SQL Databases
  • Requires Data Extraction
  • Bottlenecks in Performance
  • Data Size Limitations
  • Increases Security Risks
  • Increases Duplication Costs
  • Poor operationalization support
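The "requires data extraction" complication can be illustrated with a small sketch. Everything here is an assumption made for illustration: Python's built-in `sqlite3` stands in for SQL Server, the `sales` table is hypothetical, and a simple per-group average stands in for a real analytic. It is not how R Services works internally; it only contrasts pulling every row out of the database with running the computation next to the data.

```python
import sqlite3

# In-memory database standing in for a production SQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
rows = [("north", 10.0 * i) for i in range(1, 1001)]
rows += [("south", 5.0 * i) for i in range(1, 1001)]
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Approach 1: extract every row, then aggregate on the client.
extracted = conn.execute("SELECT region, amount FROM sales").fetchall()
totals, counts = {}, {}
for region, amount in extracted:
    totals[region] = totals.get(region, 0.0) + amount
    counts[region] = counts.get(region, 0) + 1
client_means = {region: totals[region] / counts[region] for region in totals}

# Approach 2: run the aggregation inside the database, move only the result.
in_db = conn.execute(
    "SELECT region, AVG(amount) FROM sales GROUP BY region"
).fetchall()
db_means = dict(in_db)

rows_moved_extract = len(extracted)  # every row crosses the wire
rows_moved_in_db = len(in_db)        # one row per region
print(rows_moved_extract, rows_moved_in_db, client_means == db_means)
conn.close()
```

Both approaches produce the same averages, but the first moves the whole table to the client and the second moves one row per group. The same trade-off is what the slide lists as extraction, performance, duplication and security costs.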
Growing Beyond Revolution Analytics (April 6, 2015)
Diagram labels: Pre Acquisition; Post Acquisition; Continued Support of Enterprise R Solutions; Expanding Product Family; Expanding Support for Open Source R
Products and platforms shown: Red Hat; SUSE; Microsoft R Server; SQL Server R Services; SQL Server 2016 EE; SQL Server 2016 SE; Azure; Azure HDInsight; Cortana Analytics Suite
Open
High-performance, Scalable R
Linux, Windows, SQL Server, Hadoop & Teradata
R Server Technology
• Included in SQL Server 2016
• Reuse and optimize existing R code
• Eliminate data movement
• In-database deployment
• Memory and disk scalability
• No R memory limits
• Write once, deploy anywhere
• Enterprise speed and scale
• Near-DB analytics
• Parallel threading and processing
• Reuse SQL skills for data engineering
Pillars: Cost effectiveness; Scalability and choice; Simplicity and agility
SQL Server R Services
Integration Facilities:
• Component Integration
• Launchers
• Parameter Passing
• Results Return
• Console Output Return
• Parallel Data Exchange (RTM)
• Stored Procedures
• Package Administration
SQL Server 2016 & SQL R Services
SQL Server Query Processor
Fast, Parallel, Storage Efficient Algorithms
Open Source R Interpreter
Using Parallel Algorithms, Remote Contexts
• 5+ hours to 40 seconds
Write Once – Deploy Anywhere
R Server portfolio
Cloud
RDBMS
Desktops & Servers
Hadoop & Spark
EDW
R Server Technology
Additional Notes/References
• Azure Data Science Virtual Machine
• SQL 2016 R Services Virtual Labs
• Free ebook: Data Science with Microsoft SQL Server 2016
• New R packages
  • olapR
  • MicrosoftML
  • LightGBM (“xgboost” by Msft)
• SQL 2016 Machine Learning Templates
Thank you!
Rui Quintino
Data Research, DevScope
rui.quintino@devscope.net
twitter.com/rquintino
rquintino.wordpress.com
www.devscope.net
Free Azure Trial: http://aka.ms/tryazure
Try SQL Server 2016: http://aka.ms/trysql2016
Try Power BI: http://powerbi.com
Cortana Intelligence Services: http://aka.ms/cortanaintelligence


Editor's Notes

  • #5, #6: Basic definition: machine learning develops algorithms for making predictions (in the statistical sense) from data, learning models from available training data in order to make good predictions on unseen test data.
  • #7: Confidential… + notes. From the book "Azure Machine Learning – Azure Fundamentals": many examples of predictive analytics can be found literally everywhere in our society today:
      • Spam/junk email filters: based on the content, headers, origins, and even user behaviors (for example, always deleting emails from a given sender).
      • Mortgage applications: typically, your mortgage loan and creditworthiness are determined by advanced predictive analytics engines.
      • Various forms of pattern recognition: optical character recognition (OCR) for routing your daily postal mail, speech recognition on your smartphone, and even facial recognition for advanced security systems.
      • Life insurance: calculating mortality rates, life expectancy, premiums, and payouts.
      • Medical insurance: insurers attempt to determine future medical expenses based on historical medical claims and similar patient backgrounds.
      • Liability/property insurance: companies can analyze coverage risks for automobile and home owners based on demographics.
      • Credit card fraud detection: based on usage and activity patterns. In the past year, the number of credit card transactions has topped 1 billion. The popularity of contactless payments via near-field communication (NFC) has also increased dramatically over the past year due to smartphone integration.
      • Airline flights: airlines calculate fees, schedules, and revenues based on prior air travel patterns and flight data.
      • Web search page results: predictive analytics help determine which ads, recommendations, and display sequences to render on the page.
      • Predictive maintenance: used with almost everything we can monitor: planes, trains, elevators, cars, and yes, even data centers.
      • Health care: predictive analytics are in widespread use to help determine patient outcomes and future care based on historical data and pattern matching across similar patient data sets.
  • #8: Slide objective: Establish that R matters as much for the community that uses it and the capabilities written to extend it as for the language itself.
    Talking points:
      • Part 1 of the R world is the R language, developed specifically for data analysis, particularly among statisticians and mathematicians.
      • [Optional points] Developed in New Zealand and released in roughly 2000. Maintained by the R Core Team, supported by the R Foundation, with new versions of R released several times a year. Licensed under the GPL open source license.
      • R directly supports complex data manipulation operations, making them extremely simple for users, particularly those with greater depth in statistics and mathematics than in computer science.
      • A huge community of users across industry, government and academia uses R daily. There are R user groups in most major cities, some of them very active and very large. Suggest that users look at Meetup for local groups that meet regularly.
      • Most important to the value of R is the huge repository of freely exchanged algorithms, techniques, scripts, adapters and training available.
      • Introduce CRAN, “The Comprehensive R Archive Network”, which covers data access & integration, data transformation, data profiling, data visualization, predictive analytics and machine learning.
      • CRAN contains over 10,000 (and growing) contributed packages, with many algorithms, test data, comments on usage, etc. A single package can contain hundreds of algorithms packaged as a library. All are designed to run with the R language.
      • CRAN is the largest repository but not the only one. Thousands of additional algorithms, visualizations and tools are available from Bioconductor, GitHub and other repositories.
  • #9: Demo Power BI Desktop. Demos are available at //BI.
  • #14: Slide objective: Introduce how the use of open source R for machine learning and advanced analytics has been limited to a narrow user base of data scientists. Related to this, discuss how many challenges and complexities remain for advanced analytics in the marketplace.
    Talking points:
      • Today, advanced analytics using open source R are performed only by highly trained and specialized data scientists, mathematicians, and analysts who can create and nurture these models. This means that many challenges and complexities remain in the marketplace.
      • First, many companies cannot negotiate the increasing costs of specialized talent, infrastructure, and machine learning tools, which make total cost of ownership (TCO) and return on investment (ROI) uncertain.
      • Second, siloed and cumbersome data management restricts access to data and limits what data can be included in models.
      • Third, trying to collaborate across complex and fragmented technologies tends to limit agility and reduce participation in exploring data and building models. People end up struggling with the technology instead of focusing on the business problem at hand.
      • Finally, many models never achieve business value because it is so difficult to deploy them to stable production environments. If you can imagine spending hundreds of thousands of dollars on a solution and having it never go into production, you can see why machine learning has been so niche up to this point.
  • #18: Key message: Products from Revolution Analytics are continuing, and Microsoft is adapting and expanding platform coverage.
      • The Revolution R Enterprise (RRE) product continues as R Server.
      • New integration of RRE into SQL Server as SQL Server 2016 R Services.
      • Revolution R Open continues as Microsoft R Open.
      • Additional versions and editions are coming for various unique communities: desktop developers, MSDN members, the education community.
      • RRE / R Server is available as the R option for various cloud offerings (CIS, Azure Linux, Azure HDInsight Hadoop, Data Science VM, etc.).
      • Support for revisions of existing versions of Hadoop and full support for HDInsight in Azure.
  • #19: Microsoft R Server is a broadly deployable, enterprise-class analytics platform based on R that is supported, scalable and secure. Supporting a variety of big data statistics, predictive modeling and machine learning capabilities, R Server covers the full range of analytics: exploration, analysis, visualization and modeling.
    Slide objective: Introduce the high-level value of R Server and R Services over other instantiations of the R language.
    Talking points:
      • R Server products provide an enhanced experience for the R user without loss of compatibility.
      • R Server products are “open core”: they use the open source R product in its entirety and build new capabilities around that core without impacting compatibility.
      • Users of R Server products enjoy full compatibility with open source R and with the entire (and vast) collection of algorithms, connectors and visualization tools shared openly via CRAN, Bioconductor and other shared resources like GitHub.
      • Key extensions enable R to tackle big data challenges that exceed the capacity of open source R.
      • R scripts built for one platform using R Server can easily be run on another platform running R Server. We call it WODA: write once, deploy anywhere. Two key contributions:
          • Build on any version of the product and deploy using other versions.
          • Investment protection as platform choices change.
      • Develop on the desktop and immediately deploy to RDBMS (SQL Server), EDW (SQL Server & Teradata) or Hadoop (Microsoft, Cloudera, Hortonworks and MapR).
  • #20: Slide objective: Introduce the three value proposition pillars of SQL Server 2016 R Services.
    Talking points:
      • SQL Server 2016 R Services combines fast querying and In-Memory OLTP optimization from SQL Server 2016 with data exploration, predictive modeling, scoring, and visualization from the R Services family of products.
      • It delivers enterprise speed and performance for advanced analytics, thanks to near-database analytics and parallel threading and processing.
      • It also delivers scalability and choice from a stable, commercial platform for advanced analytics, with on-premises, cloud, and hybrid deployment options and the ability to work with large datasets.
      • Finally, there is no additional cost because the offering is included in SQL Server 2016. In addition, the ability to reuse existing R code and eliminate data movement across machines provides significant value.
  • #23: Slide objective: Illustrate the scale benefits possible with R Server’s ScaleR algorithms. Show a representative example and explain the three mechanisms that help achieve the improvements.
    Talking points:
      • We tested the improved data and computational scale of R Server’s ScaleR library of enhanced, parallelized algorithms. This is an example.
      • Speed: On a 4-core laptop with 8 GB of RAM, open source R could process about 300,000 events in a particular data set before exhausting available memory. The test took about 77 seconds to run the most commonly used R linear regression function, “lm”. We then ran the same test using our parallelized linear regression module, rewritten in C++, called rxLinMod.
      • Data scale: Algorithms in the ScaleR library are also rewritten to analyze data in “chunks” to eliminate the memory limits of typical open source R algorithms. Where the open source lm exhausted memory at about 300,000 events, the improved rxLinMod was working fine at 5 million events, where we stopped testing. The result is a 50x performance improvement over open source linear regression, and no memory limits.
      • Parallel scale: This example shows only the effects of running optimized, compiled code on all cores of a laptop. Greater benefits are available. What is not shown is that the work done to parallelize across 4 cores can also be used to scale across many nodes in systems such as EDWs and Hadoop. While results vary, the system, as you can see, responds linearly with respect to data size. Rehosting using R Server for Hadoop can provide even more dramatic speed and scale results.
  • #24: …. In the multi-platform world of on-premises…
    Slide objective: Differentiate R Server from other R offerings, such as vendor-specific offers from Oracle, HP, SAP and Teradata. Clearly communicate two benefits: develop on multiple machines, and protect investments from disruptive platform changes later.
    Talking points:
      • R Server makes your data scientists’ jobs easier. Because it runs identically on multiple platforms, users can build on one platform, then move the scripts to another. This has several advantages:
          • Run modeling algorithms on systems where larger compute or data storage is available.
          • Build models on one platform, but perform model scoring or operationalization elsewhere.
          • With the availability of very low storage and compute costs in the cloud, users can load, transform, visualize, study and model data in the cloud, but actually run the model computations on on-premises systems.
      • Perhaps more importantly, portability across systems protects organizations’ investments in R-based data science. The best big data storage and compute platform for today may not be the best choice tomorrow. Compatibility across systems makes it possible to avoid disruptive recoding efforts when such changes occur.
    Notes: Two ways to underscore this point:
      • Describe the range of compatibility available with other vendors’ R versions. Oracle R, because it works only with algorithms running on Oracle Database, is not portable. The same is true for R implementations from Teradata, HP Vertica, SAP and others. They are in essence platform specific.
      • Describe the problem with a fictitious story: imagine a CIO who has to tell his data science team, “We’re changing platforms, and therefore you need to change all your programs and scripts to work with the new platform.”
  • #26, #32: Demo Power BI Desktop. Demos are available at //BI.
  • #27–#31, #33: Basic definition: machine learning develops algorithms for making predictions (in the statistical sense) from data, learning models from available training data in order to make good predictions on unseen test data.
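
Note #23 says the ScaleR algorithms analyze data in chunks so that memory no longer limits the data size. The sketch below shows one way that idea can work for a single-predictor linear regression. It is an assumption-laden illustration, not the rxLinMod implementation: the function names, the synthetic row stream and the chunk size are all hypothetical. Each chunk is reduced to a handful of running sums, the sums are added together, and the regression is solved from the totals, so only one chunk is ever held in memory.

```python
import random


def chunk_stats(chunk):
    """Reduce one chunk of (x, y) rows to five running sums."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in chunk:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return n, sx, sy, sxx, sxy


def combine(a, b):
    """Merge the sums of two chunks; order does not matter."""
    return tuple(u + v for u, v in zip(a, b))


def solve(stats):
    """Least-squares intercept and slope from the accumulated sums."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


def row_stream(n_rows, seed=0):
    """Hypothetical data source: y = 1.5 + 0.25x plus noise, one row at a time."""
    rng = random.Random(seed)
    for _ in range(n_rows):
        x = rng.uniform(0, 100)
        yield x, 1.5 + 0.25 * x + rng.gauss(0, 1)


def chunks(rows, size):
    """Group a row stream into lists of at most `size` rows."""
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


# Chunked fit: only 10,000 rows are in memory at any time.
total = (0.0, 0.0, 0.0, 0.0, 0.0)
for chunk in chunks(row_stream(50_000), 10_000):
    total = combine(total, chunk_stats(chunk))
chunked_fit = solve(total)

# Reference fit with every row loaded at once.
full_fit = solve(chunk_stats(list(row_stream(50_000))))
print(chunked_fit, full_fit)
```

Because `combine` is a plain sum, the per-chunk statistics could just as well be computed on different cores or nodes and merged afterwards, which is the same property the note relies on when it talks about scaling from 4 cores to many nodes. The raw-sum formula is kept for brevity; a production implementation would likely use a numerically safer formulation.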