Data Analytics with R and SQL Server
Stéphane Fréchette
Thursday March 19, 2015
Who am I?
My name is Stéphane Fréchette
SQL Server MVP | Consultant | Speaker | Data & BI Architect | Big Data | NoSQL | Data Science.
Drums, good food and fine wine.
I have a passion for architecting, designing and building solutions that
matter.
Twitter: @sfrechette
Blog: stephanefrechette.com
Email: stephanefrechette@ukubu.com
Topics
• What is R?
• Should I use R?
• Data Structures
• Graphics
• Data Manipulation in R
• Connecting to SQL Server
• Demos
• Resources
• Q&A
DISCLAIMER
This is not a course nor a tutorial, but
an introduction, a walkthrough to
inspire you to further explore and
learn more about R and statistical computing
“ Analysis of data is a process of inspecting, cleaning,
transforming, and modeling data with the goal of
discovering useful information, suggesting conclusions,
and supporting decision-making. Data analysis has
multiple facets and approaches, encompassing diverse
techniques under a variety of names, in different business,
science, and social science domains.”
- Wikipedia
What is R?
• A programming language and environment for statistical computing and graphics
• R has its origins in the S programming language created in the 1970s
• Best used to manipulate moderately sized datasets, do statistical analysis and
produce data-centric documents and presentations
• R's tools are distributed as packages, which any user can download to
customize the R environment
• Cross-platform: runs on Mac, Windows and Unix based systems
Should I use R?
[Flowchart: Are you doing statistics? Branches: No / Yes]
Where “statistics” can mean machine learning, predictive analytics, data
science, anything that falls under a rather broad umbrella…
But if you have some data that makes sense to represent in a tabular structure,
and you want to do some analytical or statistical work with it, R is
definitely a good choice…
Downloading and Installing R
http://www.r-project.org/ http://www.rstudio.com/
The IDE (RStudio)
[Screenshot: RStudio window with panes numbered 1 to 4]
1. View Files and Data
2. See Workspace and History
3. See Files, Plots, Packages and Help
4. Console
Installing Packages
• To use a package in R, one must first install it using the install.packages
function
• This downloads the package from CRAN and installs it so that it is ready to be used
Loading Packages
• To use a particular package in your current R session, one must load it into the
R environment using the library or require functions
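A minimal sketch of the install-then-load pattern described on the two slides above. The helper name ensure_package is hypothetical, and the stats package is used only because it ships with R, so nothing is downloaded when this runs.

```r
# Hypothetical helper: install a package from CRAN only if it is missing,
# then attach it to the current session.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)               # downloads from the configured CRAN mirror
  }
  library(pkg, character.only = TRUE)   # attach a package whose name is in a variable
  invisible(TRUE)
}

# "stats" ships with R, so this call only attaches it.
ensure_package("stats")

# search() lists what is attached; attached packages appear as "package:<name>".
is_attached <- "package:stats" %in% search()
print(is_attached)
```

In an interactive session the usual form is simply install.packages("dplyr") once, followed by library(dplyr) in each new session.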
Common Data Structures in R
To make the best of the R language, one needs a strong understanding of the
basic data types and data structures and how to operate and use them.
R has a wide variety of data types including scalars, vectors (numerical,
character, logical), matrices, data frames, and lists…
To understand computations in R, two slogans are helpful:
• Everything that exists is an object
• Everything that happens is a function call
John Chambers
creator of the S programming language, and core member of the R programming language project.
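The two slogans can be checked directly at the console. This is a small illustrative sketch using only base R; the variable names are made up for the example.

```r
# "Everything that exists is an object": values and functions both have a class.
class(42)      # "numeric"
class(sum)     # "function"

# "Everything that happens is a function call": operators are functions too.
infix  <- 2 + 3
prefix <- `+`(2, 3)          # the same call written in prefix form
same   <- identical(infix, prefix)
print(same)

# Subsetting is also a call, to the function named `[`.
v <- c(10, 20, 30)
second <- `[`(v, 2)
print(second)
```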
Data Structures - Vectors
The simplest structure is the numeric vector, which is a single entity consisting of an ordered
collection of numbers.
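A short sketch of creating and using a numeric vector. The temperature values are invented for illustration.

```r
# c() combines values into a vector.
temps <- c(21.5, 19.0, 23.2, 18.4)

length(temps)    # number of elements: 4
temps[2]         # indexing is 1-based: 19

# Arithmetic is vectorized: it applies to every element at once.
temps_f <- temps * 9 / 5 + 32

# A logical condition selects a subset of the elements.
warm <- temps[temps > 20]

mean_temp <- mean(temps)
print(mean_temp)
```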
Data Structures - Matrices
Matrices are nothing more than 2-dimensional vectors. To define a matrix, use the function
matrix.
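A brief sketch of the matrix function. Note that R fills a matrix column by column unless byrow = TRUE is given; the values here are arbitrary.

```r
# Fill a 2 x 3 matrix column by column (the default).
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)       # 2 3
m[2, 3]      # row 2, column 3: 6

# Fill the same values row by row instead.
m_byrow <- matrix(1:6, nrow = 2, byrow = TRUE)
m_byrow[1, 2]   # 2

# t() transposes and %*% is matrix multiplication.
gram <- m %*% t(m)   # a 2 x 2 matrix
print(gram)
```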
Data Structures - Data frames
Time series are often stored in data frames. A data frame is like a matrix with named
columns, except that each column can hold a different type. This is convenient, because you
can refer to a column by name without knowing its position.
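A small sketch of building a data frame and referring to a column by name. The sales figures are invented for the example.

```r
# Each argument becomes a named column; columns may have different types.
sales <- data.frame(
  month   = c("Jan", "Feb", "Mar"),
  revenue = c(1200, 1350, 990),
  stringsAsFactors = FALSE
)

sales$revenue          # access a column by name
sales[["revenue"]]     # equivalent form

# Keep only the rows that satisfy a condition.
good_months <- sales[sales$revenue > 1000, ]

total <- sum(sales$revenue)
print(total)
str(sales)             # compact overview of columns and types
```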
Data Structures - Lists
An R list is an object consisting of an ordered collection of objects known as its components.
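A short sketch of a list whose components have different types and lengths. The component names are made up for illustration.

```r
# A list can mix strings, vectors, logicals, and even other lists.
result <- list(
  name   = "model A",
  coefs  = c(1.5, -0.3),
  fitted = TRUE
)

result$name               # access a component by name
second_coef <- result[["coefs"]][2]

# Components can be added after creation.
result$notes <- "added later"

n_components <- length(result)
print(names(result))
```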
Data Structures - Date and Time
Sys.time() # returns the current system date time
Data Structures - Date and Time
Two main (internal) formats for date-time are: POSIXct and POSIXlt
• POSIXct: A short format of date-time, typically used to store date-time columns in a data-frame
• POSIXlt: A long format of date-time, various other sub-units of time can be extracted from here
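The difference between the two formats can be seen in a short sketch. The timestamp is arbitrary and the time zone is fixed to UTC here so that the extracted fields do not depend on the local machine.

```r
now <- Sys.time()
class(now)       # "POSIXct" "POSIXt"

# POSIXct stores a single number: seconds since 1970-01-01 UTC.
ts_ct <- as.POSIXct("2015-03-19 14:30:00", tz = "UTC")
seconds <- as.numeric(ts_ct)

# POSIXlt stores the broken-down fields, which can be extracted by name.
ts_lt <- as.POSIXlt(ts_ct, tz = "UTC")
hour  <- ts_lt$hour           # 14
day   <- ts_lt$mday           # 19
month <- ts_lt$mon + 1        # months are stored 0-based
year  <- ts_lt$year + 1900    # years are stored as an offset from 1900

# format() turns either representation into text.
print(format(ts_ct, "%Y-%m-%d %H:%M"))
```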
Data Structures - Others
Other useful and important data types
• NULL: Typically used for initializing variables. (x = NULL) creates a variable x of length zero.
The function is.null() returns TRUE or FALSE and tells whether a variable is NULL or not.
• NA: Used for denoting missing values. (x = NA) creates a variable x holding a missing value.
The function is.na() returns TRUE or FALSE for each element and tells whether it is NA or not.
• NaN: NaN stands for “Not a Number”. Some operations that produce it, such as sqrt(-1), print a
warning message in the console. The function is.nan() lets you check whether a value is NaN or not.
• Inf: Inf stands for “Infinity”. (x = 10/0 ; y = -3/0) sets the value of x to Inf and y to -Inf. The
functions is.finite() and is.infinite() let you check whether a value is finite or infinite.
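A quick sketch of the four special values and their test functions, using only base R. The sample values are invented.

```r
# NULL: an empty object of length zero.
x <- NULL
is.null(x)       # TRUE
length(x)        # 0

# NA: a missing value inside a vector.
ages <- c(31, NA, 45)
missing_flags <- is.na(ages)
mean(ages)                             # NA, because one value is missing
mean_age <- mean(ages, na.rm = TRUE)   # ignore missing values

# NaN: the result of an undefined numeric operation.
nan_val <- 0 / 0
is.nan(nan_val)  # TRUE
is.na(nan_val)   # also TRUE: NaN counts as missing

# Inf and -Inf: division of a non-zero number by zero.
pos_inf <- 10 / 0
neg_inf <- -3 / 0
is.finite(pos_inf)     # FALSE
is.infinite(neg_inf)   # TRUE
```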
Graphics
One of the main reasons data analysts and data
scientists turn to R is for its strong graphic
capabilities.
Basic Graphs:
• These include density plots (histograms and kernel
density plots), dot plots, bar charts (simple,
stacked, grouped), line charts, pie charts (simple,
annotated, 3D), boxplots (simple, notched, violin
plots, bagplots) and scatter plots (simple, with fit
lines, scatterplot matrices, high density plots, and
3D plots).
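A few of the basic graph types listed above, sketched with base R graphics and the built-in mtcars dataset. Writing to a temporary PDF file is a choice made for this example so that it runs without a display; in an interactive session the pdf() and dev.off() calls can be dropped.

```r
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)    # open a PDF graphics device; each plot becomes one page

# Density plots: histogram and kernel density estimate.
hist(mtcars$mpg, main = "Histogram of mpg", xlab = "Miles per gallon")
plot(density(mtcars$mpg), main = "Kernel density of mpg")

# Bar chart of counts per category.
barplot(table(mtcars$cyl), main = "Cars by cylinder count")

# Boxplot of one variable split by another.
boxplot(mpg ~ cyl, data = mtcars, main = "mpg by cylinders")

# Scatter plot with a fitted regression line.
plot(mtcars$wt, mtcars$mpg, main = "mpg vs weight",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(lm(mpg ~ wt, data = mtcars))

invisible(dev.off())   # close the device so the file is written
```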
Graphics
Advanced Graphs:
• Graphical parameters let you change a
graph's symbols, fonts, colors, and lines. Axes and
text can be customized with reference lines,
text annotations and a legend. Multiple plots
can be combined into a single graph.
• The lattice package provides a comprehensive
system for visualizing multivariate data, including
the ability to create plots conditioned on one or
more variables. The ggplot2 package offers an
elegant system for generating univariate and
multivariate graphs based on a grammar of
graphics.
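A sketch of the graphical parameters, annotations, and combined plots mentioned above. It uses base R graphics only, since lattice and ggplot2 may not be installed everywhere; the colors and label positions are arbitrary choices, and the temporary PDF is used only so the example runs without a display.

```r
panel_file <- tempfile(fileext = ".pdf")
pdf(panel_file, width = 10, height = 5)

# Combining plots: one row, two columns. Save the old settings to restore later.
old_par <- par(mfrow = c(1, 2))

# Left panel: custom symbols and colors, a reference line, text, and a legend.
# In mtcars, am == 1 means manual and am == 0 means automatic transmission.
plot(mtcars$wt, mtcars$mpg,
     pch = 19,
     col = ifelse(mtcars$am == 1, "steelblue", "tomato"),
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "mpg vs weight")
abline(h = mean(mtcars$mpg), lty = 2)    # dashed reference line at the mean
text(x = 4.5, y = mean(mtcars$mpg) + 1, labels = "mean mpg", cex = 0.8)
legend("topright", legend = c("manual", "automatic"),
       col = c("steelblue", "tomato"), pch = 19)

# Right panel: a second plot in the same figure.
boxplot(mpg ~ am, data = mtcars, names = c("automatic", "manual"),
        main = "mpg by transmission", ylab = "Miles per gallon")

par(old_par)
invisible(dev.off())
```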
Data Manipulation in R
dplyr: an R package for fast and easy data manipulation.
Data manipulation often involves common tasks, such as selecting certain variables, filtering
on certain conditions, deriving new variables from existing variables, and so forth. If we
think of these tasks as “verbs”, we can define a grammar of sorts for data manipulation.
In dplyr the main verbs (or functions) are:
• filter: select a subset of the rows of a data frame
• arrange: works similarly to filter, except that instead of filtering or selecting rows, it
reorders them
• select: select columns of a data frame
• mutate: add new columns to a data frame that are functions of existing columns
• summarize: collapse many values into a single summary value
• group_by: describe how to break a data frame into groups of rows
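The verbs above can be sketched one step at a time on the built-in mtcars dataset. The helper name summarise_mpg, the hp threshold, and the derived kpl column are invented for this example. Calls are written with the dplyr:: prefix and the helper is only run when dplyr is installed, so the script does not fail on a machine without the package.

```r
# Hypothetical helper that applies each dplyr verb in turn.
summarise_mpg <- function(cars) {
  picked  <- dplyr::select(cars, mpg, cyl, hp)             # keep three columns
  strong  <- dplyr::filter(picked, hp > 100)               # keep rows with hp > 100
  metric  <- dplyr::mutate(strong, kpl = mpg * 0.425144)   # miles/gallon to km/litre
  grouped <- dplyr::group_by(metric, cyl)                  # one group per cylinder count
  stats   <- dplyr::summarize(grouped,
                              mean_kpl = mean(kpl),        # one summary row per group
                              n = dplyr::n())
  dplyr::arrange(stats, dplyr::desc(mean_kpl))             # most economical group first
}

if (requireNamespace("dplyr", quietly = TRUE)) {
  by_cyl <- summarise_mpg(mtcars)
  print(by_cyl)
}
```

With library(dplyr) loaded, the same steps are commonly chained with the pipe operator instead of intermediate variables.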
Demo
[dplyr – manipulating data]
Connecting R and SQL Server
The RODBC package provides access to databases (including Microsoft Access
and Microsoft SQL Server) through an ODBC interface
Function | Description
odbcConnect(dsn, uid = "", pwd = "") | Open a connection to an ODBC database
sqlFetch(channel, sqtable) | Read a table from an ODBC database into a data frame
sqlQuery(channel, query) | Submit a query to an ODBC database and return the results
sqlSave(channel, mydf, tablename = sqtable, append = FALSE) | Write or update (append = TRUE) a data frame to a table in the ODBC database
sqlDrop(channel, sqtable) | Remove a table from the ODBC database
close(channel) | Close the connection
RODBC Example
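The code shown on this slide did not survive in the transcript, so what follows is not the presenter's example but a hedged sketch of a typical RODBC session. The helper names, the server name localhost, the database name AdventureWorks, the driver name "SQL Server", and the RUN_RODBC_DEMO environment variable are all assumptions made for illustration. The connection is only attempted when that variable is set to 1 and RODBC is installed.

```r
# Hypothetical helper: build an ODBC connection string for SQL Server
# using Windows (trusted) authentication. The driver name is an assumption
# and depends on which ODBC driver is installed on the machine.
build_conn_string <- function(server, database, driver = "SQL Server") {
  sprintf("driver={%s};server=%s;database=%s;trusted_connection=yes",
          driver, server, database)
}

# Hypothetical helper: open a connection, run one query, always close.
run_query <- function(conn_string, query) {
  channel <- RODBC::odbcDriverConnect(connection = conn_string)
  if (!inherits(channel, "RODBC")) {
    stop("Could not connect to the database")
  }
  on.exit(RODBC::odbcClose(channel))       # close even if the query fails
  RODBC::sqlQuery(channel, query, stringsAsFactors = FALSE)
}

conn_string <- build_conn_string("localhost", "AdventureWorks")
print(conn_string)

# Only attempt a real connection when explicitly enabled.
if (identical(Sys.getenv("RUN_RODBC_DEMO"), "1") &&
    requireNamespace("RODBC", quietly = TRUE)) {
  print(run_query(conn_string, "SELECT @@VERSION AS version"))
}
```

When a DSN has already been configured, odbcConnect(dsn) can replace the connection string, and the data frame returned by sqlQuery can be passed straight to the dplyr verbs covered earlier.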
Other interface
The RJDBC package provides access to databases through a JDBC interface.
(requires JDBC driver from Microsoft)
Demo
[Let’s analyze - R and SQL Server]
Resources
• The R Project for Statistical Computing http://www.r-project.org/
• RStudio http://www.rstudio.com/
• Revolution Analytics http://www.revolutionanalytics.com/
• Shiny http://shiny.rstudio.com/
• {swirl} Learn R, in R http://swirlstats.com/
• R-bloggers http://www.r-bloggers.com/
• Online R resources for Beginners http://bit.ly/1x2q6Gl
• 60+ R resources to improve your data skills http://bit.ly/1BzW4ox
• Stack Overflow - R http://stackoverflow.com/tags/r
• Cerebral Mastication - R Resources http://bit.ly/17YhZj4
• Microsoft JDBC Drivers 4.1 and 4.0 for SQL Server http://bit.ly/1kEgJ7O
What Questions Do You Have?
Thank You
For attending this session

pranavbodhak
 
Contributing to WordPress With & Without Code.pptx
Contributing to WordPress With & Without Code.pptxContributing to WordPress With & Without Code.pptx
Contributing to WordPress With & Without Code.pptx
Patrick Lumumba
 
LSNIF: Locally-Subdivided Neural Intersection Function
LSNIF: Locally-Subdivided Neural Intersection FunctionLSNIF: Locally-Subdivided Neural Intersection Function
LSNIF: Locally-Subdivided Neural Intersection Function
Takahiro Harada
 
Nix(OS) for Python Developers - PyCon 25 (Bologna, Italia)
Nix(OS) for Python Developers - PyCon 25 (Bologna, Italia)Nix(OS) for Python Developers - PyCon 25 (Bologna, Italia)
Nix(OS) for Python Developers - PyCon 25 (Bologna, Italia)
Peter Bittner
 
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Aaryan Kansari
 
Maxx nft market place new generation nft marketing place
Maxx nft market place new generation nft marketing placeMaxx nft market place new generation nft marketing place
Maxx nft market place new generation nft marketing place
usersalmanrazdelhi
 
Data Virtualization: Bringing the Power of FME to Any Application
Data Virtualization: Bringing the Power of FME to Any ApplicationData Virtualization: Bringing the Power of FME to Any Application
Data Virtualization: Bringing the Power of FME to Any Application
Safe Software
 
Let’s Get Slack Certified! 🚀- Slack Community
Let’s Get Slack Certified! 🚀- Slack CommunityLet’s Get Slack Certified! 🚀- Slack Community
Let’s Get Slack Certified! 🚀- Slack Community
SanjeetMishra29
 
ECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptx
ECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptxECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptx
ECS25 - The adventures of a Microsoft 365 Platform Owner - Website.pptx
Jasper Oosterveld
 
Introducing the OSA 3200 SP and OSA 3250 ePRC
Introducing the OSA 3200 SP and OSA 3250 ePRCIntroducing the OSA 3200 SP and OSA 3250 ePRC
Introducing the OSA 3200 SP and OSA 3250 ePRC
Adtran
 
Improving Developer Productivity With DORA, SPACE, and DevEx
Improving Developer Productivity With DORA, SPACE, and DevExImproving Developer Productivity With DORA, SPACE, and DevEx
Improving Developer Productivity With DORA, SPACE, and DevEx
Justin Reock
 
Multistream in SIP and NoSIP @ OpenSIPS Summit 2025
Multistream in SIP and NoSIP @ OpenSIPS Summit 2025Multistream in SIP and NoSIP @ OpenSIPS Summit 2025
Multistream in SIP and NoSIP @ OpenSIPS Summit 2025
Lorenzo Miniero
 
Offshore IT Support: Balancing In-House and Offshore Help Desk Technicians
Offshore IT Support: Balancing In-House and Offshore Help Desk TechniciansOffshore IT Support: Balancing In-House and Offshore Help Desk Technicians
Offshore IT Support: Balancing In-House and Offshore Help Desk Technicians
john823664
 
TrustArc Webinar: Mastering Privacy Contracting
TrustArc Webinar: Mastering Privacy ContractingTrustArc Webinar: Mastering Privacy Contracting
TrustArc Webinar: Mastering Privacy Contracting
TrustArc
 
Evaluation Challenges in Using Generative AI for Science & Technical Content
Evaluation Challenges in Using Generative AI for Science & Technical ContentEvaluation Challenges in Using Generative AI for Science & Technical Content
Evaluation Challenges in Using Generative AI for Science & Technical Content
Paul Groth
 
Measuring Microsoft 365 Copilot and Gen AI Success
Measuring Microsoft 365 Copilot and Gen AI SuccessMeasuring Microsoft 365 Copilot and Gen AI Success
Measuring Microsoft 365 Copilot and Gen AI Success
Nikki Chapple
 
End-to-end Assurance for SD-WAN & SASE with ThousandEyes
End-to-end Assurance for SD-WAN & SASE with ThousandEyesEnd-to-end Assurance for SD-WAN & SASE with ThousandEyes
End-to-end Assurance for SD-WAN & SASE with ThousandEyes
ThousandEyes
 
Microsoft Build 2025 takeaways in one presentation
Microsoft Build 2025 takeaways in one presentationMicrosoft Build 2025 takeaways in one presentation
Microsoft Build 2025 takeaways in one presentation
Digitalmara
 

Data Analytics with R and SQL Server

  • 1. Data Analytics with R and SQL Server Stéphane Fréchette Thursday March 19, 2015
  • 2. Who am I? My name is Stéphane Fréchette SQL Server MVP | Consultant | Speaker | Data & BI Architect | Big Data | NoSQL | Data Science. Drums, good food and fine wine. I have a passion for architecting, designing and building solutions that matter. Twitter: @sfrechette Blog: stephanefrechette.com Email: stephanefrechette@ukubu.com
  • 3. Topics • What is R? • Should I use R? • Data Structures • Graphics • Data Manipulation in R • Connecting to SQL Server • Demos • Resources • Q&A
  • 4. DISCLAIMER This is not a course nor a tutorial, but an introduction, a walkthrough to inspire you to further explore and learn more about R and statistical computing
  • 5. “ Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.” - Wikipedia
  • 6. What is R? • A programming language and environment for statistical computing and graphics • R has its origins in the S programming language, created in the 1970s • Best used to manipulate moderately sized datasets, do statistical analysis and produce data-centric documents and presentations • Its tools are distributed as packages, which any user can download to customize the R environment • Cross-platform: runs on Mac, Windows and Unix-based systems
  • 7. Should I use R? Are you doing statistics? (Flowchart: No / Yes) Where “statistics” can mean machine learning, predictive analytics, data science, anything that falls under a rather broad umbrella… But if you have some data that makes sense to represent in a tabular-like structure, and you want to do some cool analytical or statistics stuff with it, R is definitely a good choice…
  • 8. Downloading and Installing R http://www.r-project.org/ http://www.rstudio.com/
  • 9. The IDE (RStudio) 1. View Files and Data 2. See Workspace and History 3. See Files, Plots, Packages and Help 4. Console
  • 10. Installing Packages • To use a package in R, one must first install it using the install.packages function • This downloads the package from CRAN and installs it so that it is ready to be used
  • 11. Loading Packages • To use a particular package in your current R session, one must load it into the R environment using the library or require function
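The install and load steps from the two slides above can be sketched as follows. This is a minimal illustration, not the presenter's demo: "dplyr" is only an example package name, and "notARealPackage123" is a made-up name used to show what require returns when a package is missing.

```r
# Installing is a one-time step per machine and needs network access to CRAN,
# so it is left commented out here. "dplyr" is just an example package name.
# install.packages("dplyr")

# library() attaches a package and stops with an error if it is not installed.
library(stats)  # 'stats' ships with R, so this call succeeds on any install

# require() attaches a package too, but returns TRUE or FALSE instead of
# stopping, which is convenient for optional dependencies.
# "notARealPackage123" is a made-up name chosen to trigger the FALSE path.
has_pkg <- suppressWarnings(
  require("notARealPackage123", character.only = TRUE, quietly = TRUE)
)
print(has_pkg)
```

Because library fails loudly, it is usually the safer choice in scripts; require suits code that can fall back when a package is absent.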
  • 12. Common Data Structures in R To make the best of the R language, one needs a strong understanding of the basic data types and data structures and how to operate on and use them. R has a wide variety of data types including scalars, vectors (numerical, character, logical), matrices, data frames, and lists… To understand computations in R, two slogans are helpful: • Everything that exists is an object • Everything that happens is a function call (John Chambers, creator of the S programming language and core member of the R project)
  • 13. Data Structures - Vectors The simplest structure is the numeric vector, which is a single entity consisting of an ordered collection of numbers.
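As a small sketch of a numeric vector (the values and the variable names temps and warm are invented for illustration):

```r
# c() combines values into a vector; the order of the elements is preserved.
temps <- c(21.5, 19.0, 23.2, 18.4)

length(temps)   # number of elements
temps[2]        # indexing starts at 1, so this is the second element
temps * 2       # arithmetic is applied element by element
mean(temps)     # many functions take a whole vector at once

# A logical condition can be used as an index to keep matching elements.
warm <- temps[temps > 20]
print(warm)
```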
  • 14. Data Structures - Matrices Matrices are nothing more than 2-dimensional vectors. To define a matrix, use the function matrix.
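A short sketch of the matrix function, using invented values. Note that R fills a matrix column by column unless byrow = TRUE is given.

```r
# Six values arranged into 2 rows and 3 columns, filled column by column.
m <- matrix(1:6, nrow = 2, ncol = 3)
print(m)

dim(m)      # dimensions as c(rows, columns)
m[2, 3]     # element in row 2, column 3
t(m)        # transpose: 3 rows and 2 columns

# The same values filled row by row instead.
m_byrow <- matrix(1:6, nrow = 2, byrow = TRUE)
print(m_byrow)
```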
  • 15. Data Structures - Data frames Time series are often stored in data frames. A data frame is like a matrix with named columns, except that each column can hold a different data type. This is nice, because you can call and use one of the columns by name without knowing in which position it is.
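A minimal data frame sketch; the sales table and its column names are hypothetical and only serve to show access by column name.

```r
# Each argument becomes a named column; columns may have different types.
sales <- data.frame(
  month = c("Jan", "Feb", "Mar"),
  units = c(120, 95, 143),
  promo = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

sales$units        # access a column by name with $
sales[["units"]]   # equivalent access with [[ ]]

# Rows can be selected with a logical condition on a column.
promo_rows <- sales[sales$promo, ]
print(promo_rows)

str(sales)         # compact summary of column names and types
```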
  • 16. Data Structures - Lists An R list is an object consisting of an ordered collection of objects known as its components.
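A brief list sketch with made-up components, showing that a list can mix types and that [[ ]] and [ ] return different things:

```r
# A list can hold components of different types and lengths.
person <- list(name = "Ada", scores = c(90, 85), active = TRUE)

person$name          # access a component by name
person[["scores"]]   # [[ ]] returns the component itself (a numeric vector)
person["scores"]     # [ ] returns a list containing that component

# Assigning to a new name appends a component.
person$city <- "Ottawa"
length(person)
```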
  • 17. Data Structures - Date and Time Sys.time() # returns the current system date time
  • 18. Data Structures - Date and Time Two main (internal) formats for date-time are POSIXct and POSIXlt • POSIXct: a compact format that stores a date-time as the number of seconds since 1970-01-01 UTC, typically used to store date-time columns in a data frame • POSIXlt: a longer, list-based format from which sub-units of time (year, month, hour, minute and so on) can be extracted
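The difference between the two formats can be sketched with a fixed timestamp (the date is taken from the title slide; the time of day is invented):

```r
# POSIXct: a single number under the hood (seconds since 1970-01-01 UTC).
ct <- as.POSIXct("2015-03-19 14:30:00", tz = "UTC")
unclass(ct)

# POSIXlt: a list of components that can be pulled out by name.
lt <- as.POSIXlt(ct)
lt$hour          # hour of the day
lt$min           # minute
lt$year + 1900   # years are stored as an offset from 1900
lt$mon + 1       # months are stored as 0 to 11
```

The offsets on year and mon are a common source of off-by-one mistakes, so it helps to wrap them in small helper functions when they are used often.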
  • 19. Data Structures - Others Other useful and important data types • NULL: Typically used for initializing variables. (x = NULL) creates a variable x of length zero. The function is.null() returns TRUE or FALSE and tells whether a variable is NULL or not. • NA: Used for denoting missing values. (x = NA) creates a variable x holding a missing value. The function is.na() returns TRUE or FALSE and tells whether a value is NA or not. • NaN: NaN stands for “Not a Number”. Some operations that produce it, such as sqrt(-1), print a warning message in the console. The function is.nan() lets you check whether the value of a variable is NaN or not. • Inf: Inf stands for “Infinity”. (x = 10/0 ; y = -3/0) sets the value of x to Inf and y to -Inf. The function is.infinite() lets you check whether a value is infinite; is.finite() checks that it is an ordinary finite number.
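A compact sketch of the four special values and their test functions, using throwaway variable names:

```r
# NULL: an empty object of length zero.
x <- NULL
length(x)
is.null(x)

# NA: a missing value inside otherwise ordinary data.
y <- c(1, NA, 3)
is.na(y)
mean(y)                               # a missing value propagates to the result
mean_without_na <- mean(y, na.rm = TRUE)  # drop missing values first

# NaN: the result of an undefined numeric operation.
z <- 0 / 0
is.nan(z)

# Inf and -Inf: division of a non-zero number by zero.
pos <- 10 / 0
neg <- -3 / 0
is.infinite(c(pos, neg))

# is.finite() is FALSE for Inf, NA and NaN alike.
finite_flags <- is.finite(c(1, pos, NA, z))
print(finite_flags)
```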
  • 20. Graphics One of the main reasons data analysts and data scientists turn to R is for its strong graphic capabilities. Basic Graphs: • These include density plots (histograms and kernel density plots), dot plots, bar charts (simple, stacked, grouped), line charts, pie charts (simple, annotated, 3D), boxplots (simple, notched, violin plots, bagplots) and scatter plots (simple, with fit lines, scatterplot matrices, high density plots, and 3D plots).
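A few of the basic graph types listed above can be sketched with base R graphics and the built-in mtcars dataset. Writing to a temporary PDF file is an assumption made here so the example also runs without a display; in an interactive session the pdf() and dev.off() lines can be dropped to see the plots directly.

```r
# Open a PDF device on a temporary file; each plot becomes one page.
out <- tempfile(fileext = ".pdf")
pdf(out)

# Histogram of fuel economy.
hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")

# Scatter plot with a fitted regression line.
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "mpg")
abline(lm(mpg ~ wt, data = mtcars), col = "red")

# Boxplot of fuel economy grouped by number of cylinders.
boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "mpg")

# Close the device so the file is written to disk.
dev.off()
```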
  • 21. Graphics Advanced Graphs: • Graphical parameters describe how to change a graph's symbols, fonts, colors, and lines. Axes and text describe how to customize a graph's axes, add reference lines, text annotations and a legend. Combining plots describes how to organize multiple plots into a single graph. • The lattice package provides a comprehensive system for visualizing multivariate data, including the ability to create plots conditioned on one or more variables. The ggplot2 package offers an elegant system for generating univariate and multivariate graphs based on a grammar of graphics.
  • 22. Data Manipulation in R dplyr is an R package for fast and easy data manipulation. Data manipulation often involves common tasks, such as selecting certain variables, filtering on certain conditions, deriving new variables from existing variables, and so forth. If we think of these tasks as “verbs”, we can define a grammar of sorts for data manipulation. In dplyr the main verbs (or functions) are: • filter: select a subset of the rows of a data frame • arrange: works similarly to filter, except that instead of filtering or selecting rows, it reorders them • select: select columns of a data frame • mutate: add new columns to a data frame that are functions of existing columns • summarize: collapse values into summary statistics • group_by: describe how to break a data frame into groups of rows
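The verbs listed above can be chained into one pipeline. The sketch below assumes the dplyr package is installed and uses the built-in mtcars dataset; the column names wt_kg, n_cars, avg_mpg and avg_wt_kg are invented for the example.

```r
library(dplyr)

result <- mtcars %>%
  filter(mpg > 20) %>%                      # keep rows that match a condition
  select(mpg, cyl, wt) %>%                  # keep only these columns
  mutate(wt_kg = wt * 1000 * 0.4536) %>%    # derive a new column (wt is in 1000 lbs)
  group_by(cyl) %>%                         # split rows into groups by cylinder count
  summarize(
    n_cars = n(),                           # rows per group
    avg_mpg = mean(mpg),
    avg_wt_kg = mean(wt_kg)
  ) %>%
  arrange(desc(avg_mpg))                    # order groups from best to worst mpg

print(result)
```

Each verb takes a data frame and returns a data frame, which is what makes the chaining with %>% possible.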
  • 24. Connecting R and SQL Server The RODBC package provides access to databases (including Microsoft Access and Microsoft SQL Server) through an ODBC interface. Function: Description • odbcConnect(dsn, uid = "", pwd = ""): open a connection to an ODBC database • sqlFetch(channel, sqtable): read a table from an ODBC database into a data frame • sqlQuery(channel, query): submit a query to an ODBC database and return the results • sqlSave(channel, mydf, tablename = sqtable, append = FALSE): write or update (append = TRUE) a data frame to a table in the ODBC database • sqlDrop(channel, sqtable): remove a table from the ODBC database • close(channel): close the connection
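A typical connect, query and close sequence with RODBC might look like the sketch below. It assumes the RODBC package is installed and that an ODBC data source for SQL Server is already configured. The helper names build_top_query and fetch_top_rows, the DSN "MySqlServerDsn" and the table "dbo.Orders" are all hypothetical. The connection function in RODBC is odbcConnect.

```r
# Hypothetical helper: builds a simple T-SQL query for the first n rows.
# The table name is pasted in as-is, so it must come from trusted input.
build_top_query <- function(table, n = 10) {
  stopifnot(is.character(table), length(table) == 1, n >= 1)
  sprintf("SELECT TOP %d * FROM %s", as.integer(n), table)
}

# Hypothetical wrapper: open a connection, run the query, always close.
fetch_top_rows <- function(dsn, table, n = 10, uid = "", pwd = "") {
  channel <- RODBC::odbcConnect(dsn, uid = uid, pwd = pwd)
  # odbcConnect returns -1 (with a warning) when the connection fails.
  if (!inherits(channel, "RODBC")) {
    stop("Could not connect to DSN: ", dsn)
  }
  on.exit(RODBC::odbcClose(channel), add = TRUE)
  RODBC::sqlQuery(channel, build_top_query(table, n))
}

# Example call, not run here because it needs a reachable SQL Server:
# orders <- fetch_top_rows("MySqlServerDsn", "dbo.Orders", n = 5)
print(build_top_query("dbo.Orders", 5))
```

Closing the channel in on.exit means the connection is released even if the query raises an error.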
  • 26. Other interface The RJDBC package provides access to databases through a JDBC interface. (requires JDBC driver from Microsoft)
  • 27. Demo [Let’s analyze - R and SQL Server]
  • 28. Resources • The R Project for Statistical Computing http://www.r-project.org/ • RStudio http://www.rstudio.com/ • Revolution Analytics http://www.revolutionanalytics.com/ • Shiny http://shiny.rstudio.com/ • {swirl} Learn R, in R http://swirlstats.com/ • R-bloggers http://www.r-bloggers.com/ • Online R resources for Beginners http://bit.ly/1x2q6Gl • 60+ R resources to improve your data skills http://bit.ly/1BzW4ox • Stack Overflow - R http://stackoverflow.com/tags/r • Cerebral Mastication - R Resources http://bit.ly/17YhZj4 • Microsoft JDBC Drivers 4.1 and 4.0 for SQL Server http://bit.ly/1kEgJ7O
  • 29. What Questions Do You Have?
  • 30. Thank You For attending this session