Copyright: Attribution Non-Commercial (BY-NC)

A very brief introduction to R

- Matthew Keller

Some material cribbed from: UCLA Academic Technology Services, Technical Report Series (by Patrick Burns) and presentations (found online) by Bioconductor, Wolfgang Huber and Hung Chen, & various Harry Potter websites
R programming language is a lot like magic... except instead of spells you have functions.

"R, And the Rise of the Best Software Money Can’t Buy"
SAS/SPSS = muggle
SPSS and SAS users are like muggles. They are limited in their ability to change their environment. They have to rely on algorithms that have been developed for them. The way they approach a problem is constrained by how the programmers employed by SAS/SPSS thought to approach it. And they have to pay money to use these constraining algorithms.
R = wizard
R users are like wizards. They can rely on functions (spells) that have been developed for them by statistical researchers, but they can also create their own. They don’t have to pay to use them, and once experienced enough (like Dumbledore), they are almost unlimited in their ability to change their environment.
History of R
• S: language for data analysis developed at Bell Labs circa 1976
• Licensed by AT&T/Lucent to Insightful Corp. Product name: S-PLUS.
• R: initially written & released as open-source software by Ross Ihaka and Robert Gentleman at the University of Auckland during the 1990s (the name "R" plays on "S")
• Since 1997: an international R-core team of ~15 people & 1000s of code writers and statisticians
“Open source”... that just means I don’t have to pay for it, right?
• No. Much more:
– Provides full access to algorithms and their implementation
– Gives you the ability to fix bugs and extend software
– Provides a forum allowing researchers to explore and expand the methods used to analyze data
– Is the product of 1000s of leading experts in the fields they know best. It is CUTTING EDGE.
– Ensures that scientists around the world, and not just ones in rich countries, are co-owners of the software tools needed to carry out research
– Promotes reproducible research by providing open and accessible tools
– Most of R is written in… R! This makes it quite easy to see what functions are actually doing.
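Because most of R is written in R, you can read a function's source by typing its name without parentheses. A minimal sketch using the standard sd() function from the stats package (the exact printed source may differ slightly between R versions). A few core functions, such as sum(), are internal primitives implemented in C, so printing them shows only a .Primitive() stub.

```r
# Typing a function's name without () prints its R source code
sd

# formals() and body() pull out the pieces of that source
formals(sd)   # the arguments: x and na.rm
body(sd)      # the expression that computes the result

# A handful of core functions are internal primitives written in C
is.primitive(sd)    # FALSE: sd is ordinary R code
is.primitive(sum)   # TRUE: sum is a primitive
```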
What is it?
• R is an interpreted computer language.
– Most user-visible functions are written in R itself, calling upon a smaller set of internal primitives.
– It is possible to interface procedures written in C, C++, or FORTRAN for efficiency, and to write additional primitives.
– System commands can be called from within R.
• R is used for data manipulation, statistics, and graphics. It is made up of:
– operators (+ - <- * %*% …) for calculations on arrays & matrices
– a large, coherent, integrated collection of functions
– facilities for making unlimited types of publication-quality graphics
– user-written functions & sets of functions (packages); 800+ contributed packages so far & growing
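The operators listed above work directly on whole matrices. A small sketch with a made-up 2 x 2 matrix, contrasting element-wise `*` with the matrix product `%*%` (the matrix values are arbitrary):

```r
# <- assigns a value to a name
A  <- matrix(1:4, nrow = 2)   # filled column-wise: rows (1, 3) and (2, 4)
I2 <- diag(2)                 # 2 x 2 identity matrix

A + A      # element-wise sum
A * A      # element-wise product: rows (1, 9) and (4, 16)
A %*% A    # matrix product: rows (7, 15) and (10, 22)
A %*% I2   # multiplying by the identity gives back the values of A
t(A)       # transpose
```

The same `*` symbol that multiplies two numbers multiplies every cell of two matrices; use `%*%` when you want linear-algebra multiplication.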
R
Advantages
o Fast and free.
o State of the art: Statistical researchers provide their methods as R packages. SPSS and SAS are years behind R!
o 2nd only to MATLAB for graphics.
o Mx, WinBUGS, and other programs use or will use R.
o Active user community
o Excellent for simulation, programming, computer-intensive analyses, etc.
o Forces you to think about your analysis.
o Interfaces with database storage software (SQL)
Disadvantages
o Not user friendly @ start: steep learning curve, minimal GUI.
o No commercial support; figuring out correct methods or how to use a function on your own can be frustrating.
o Easy to make mistakes and not know.
o Working with large datasets is limited by RAM
o Data prep & cleaning can be messier & more mistake-prone in R vs. SPSS or SAS
o Some users complain about hostility on the R listserv
Learning R....
R-help listserv....
Don’t expect R to be like SAS/SPSS/Stata/etc…
Here’s a synopsis of one person’s story. He used SAS and, being a fan of open-source, attempted to learn R. He became frustrated with R and gave up. When he had a simple problem that he couldn’t do in SAS, he quickly solved it with R. Then over about a month he became comfortable with R from consistent study of it. In hindsight he thinks that the initial problem was that he hadn’t changed his way of thinking to match R’s approach, and he wanted to master R immediately. --Patrick Burns, UCLA Statistical Consultant
Two personal examples…
1. Run an Mx (SEM program) ML factor analysis script from within R.
Grep the Mx output and pull it into R in the form of a matrix & p-value.
If p-value < .05, run another Mx script. Otherwise, keep the old matrix.
Get distributions of the columns of these matrices from 10000 runs.

2. Profile analysis (within-subject MANOVA) on a dataset that included twins: a violation of the independence assumption!
So we needed to permute the independent variable within families for one analysis and within individuals for another.
Do this 10000 times and save results after each to get valid p-values.
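The second example rests on a within-family permutation test. Below is a minimal sketch of that idea, assuming hypothetical simulated data (50 families with two members each, one per group) and a simple difference in group means as the test statistic rather than the profile-analysis MANOVA described above; the names `permute_within` and `mean_diff` are illustrative, not from any package.

```r
set.seed(2008)

# Hypothetical data: 50 families, each with one member in group 0 and one in group 1
n_fam  <- 50
family <- rep(seq_len(n_fam), each = 2)
group  <- rep(c(0, 1), times = n_fam)
# A shared family effect makes relatives' scores correlated (non-independent)
y <- rnorm(2 * n_fam, mean = 0.3 * group) + rep(rnorm(n_fam), each = 2)

# Test statistic: difference in group means
mean_diff <- function(y, g) mean(y[g == 1]) - mean(y[g == 0])
observed  <- mean_diff(y, group)

# Shuffle group labels separately inside each family, so family structure is kept
permute_within <- function(g, block) {
  ave(g, block, FUN = function(v) v[sample.int(length(v))])
}

# Null distribution; raise n_perm (e.g. to 10000) for final results
n_perm    <- 2000
null_dist <- replicate(n_perm, mean_diff(y, permute_within(group, family)))

# Two-sided p-value; the +1 counts the observed labelling as one permutation
p_value <- (sum(abs(null_dist) >= abs(observed)) + 1) / (n_perm + 1)
p_value
```

Because labels only move within a family, every permuted dataset keeps the same dependence between relatives as the real data, which is what makes the resulting p-value valid.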
R vs SAS/SPSS
R | Commercial packages
Many different datasets (and other “objects”) available at the same time | One dataset available at a given time
Datasets can be of any dimension | Datasets are rectangular
Functions can be modified | Functions are proprietary
Experience is interactive: you program until you get exactly what you want | Experience is passive: you choose an analysis and they give you everything they think you need
One-stop shopping: almost every analytical tool you can think of is available | Tend to have limited scope, forcing you to learn additional programs; extra options cost more and/or require you to learn a different language (e.g., SPSS Macros)
R is free and will continue to exist. Nothing can make it go away; its price will never increase. | They cost money. There is no guarantee they will continue to exist, but if they do, you can bet that their prices will always increase.
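To illustrate the first two rows of the comparison, here is a short sketch with small hypothetical datasets (the heights and weights are made up): several datasets and a fitted model coexist in one workspace, and data can have more than two dimensions.

```r
# Two separate (hypothetical) datasets live in the workspace at the same time
heights <- data.frame(id = 1:3, height_cm = c(170, 165, 180))
weights <- data.frame(id = 1:3, weight_kg = c(65, 59, 80))

# ...and can be combined whenever needed
both <- merge(heights, weights, by = "id")

# Data need not be rectangular: a 3-dimensional array
cube <- array(1:24, dim = c(2, 3, 4))
dim(cube)       # 2 3 4
cube[1, 2, 3]   # row 1, column 2, layer 3: 15

# A fitted model is just another object alongside the data
fit <- lm(weight_kg ~ height_cm, data = both)
ls()            # lists every object created so far
```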
There are over 800 add-on packages
(http://cran.r-project.org/src/contrib/PACKAGES.html)
• This is an enormous advantage: new techniques are available without delay, and they can be performed using the R language you already know.
• Allows you to build a customized statistical program suited to your own needs.
• Downside: as the number of packages grows, it is becoming difficult to choose the best package for your needs, & quality control (QC) is an issue.
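Using an add-on package takes two steps: install it once, then attach it in each session. A minimal sketch; `splines` is used only because it ships with every R installation, so the code runs offline. For a contributed package, swap in its name, and the `install.packages()` branch will download it from CRAN (this needs a configured CRAN mirror).

```r
pkg <- "splines"   # placeholder: substitute the name of the package you want

# Install from CRAN only if the package is not already available
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Attach the package so its functions can be called directly
library(pkg, character.only = TRUE)

# Peek at some of the functions the package exports
head(ls(paste0("package:", pkg)))
```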
A particular R strength: genetics
• Bioconductor is a suite of additional functions and some 200 packages dedicated to analysis, visualization, and management of genetic data
• Much more functionality than software released by Affy or Illumina
An R weakness
• Structural Equation Modeling: the sem package is quite limited.
• But this will not be a weakness for long…
Typical R session
• Start up R via the GUI or your favorite text editor
• Two windows:
– 1+ new or existing scripts (text files) - these will be saved
– Terminal: output & temporary input - usually unsaved
• R sessions are interactive:
– Write small bits of code in the script and run it
– Output appears in the terminal. Did you get what you wanted?
– Adjust your syntax in the script depending on this answer
– At the end, all you need to do is save your script file(s), which can easily be rerun later
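Rerunning a saved script is a single call to `source()`. In this sketch a temporary file stands in for your own saved script; the two lines written to it (and the names `scores` and `scores_mean`) are hypothetical.

```r
# Write a tiny script file (a temporary file stands in for your saved script)
script <- tempfile(fileext = ".R")
writeLines(c(
  "scores <- c(12, 15, 9, 20)",
  "scores_mean <- mean(scores)"
), script)

# Rerun the whole script in one step; echo = TRUE prints each line as it runs
source(script, echo = TRUE)

scores_mean   # 14
```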
R Objects
• Almost all things in R – functions, datasets, results, etc. – are OBJECTS.
– (graphics are written out and are not stored as objects)
• A script can be thought of as a way to make objects. Your goal is usually to write a script that, by its end, has created the objects (e.g., statistical results) and graphics you need.
• Objects are classified by two criteria:
– MODE: how objects are stored in R: character, numeric, logical, list, & function (a factor is stored with mode numeric)
– CLASS: how objects are treated by functions (important to know!): [vector], matrix, array, data.frame, factor, & hundreds of special classes created by specific functions
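`mode()` and `class()` report the two criteria directly. A short sketch on a few everyday objects, including a regression fitted to the built-in `cars` dataset. Note that `mode()` reports a factor as "numeric", because it is stored as integer codes; "factor" is its class.

```r
x   <- c(1.5, 2, 3)                   # numeric vector
m   <- matrix(1:6, nrow = 2)          # matrix
f   <- factor(c("lo", "hi", "lo"))    # factor
fit <- lm(dist ~ speed, data = cars)  # regression on the built-in cars data

# mode: how the object is stored; class: how functions will treat it
mode(x);    class(x)     # "numeric"  "numeric"
mode(m);    class(m)     # "numeric"  "matrix" "array" (R 4.0 or later)
mode(f);    class(f)     # "numeric"  "factor"
mode(fit);  class(fit)   # "list"     "lm"
mode(mean); class(mean)  # "function" "function"
```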
R Objects
[Figure: a dataset with columns x1 to x6 and rows 1 to 8, assigned to the object Z with Z <-]
The MODE of Z is determined automatically by the types of things stored in Z: numbers, characters, etc. If Z mixes types across its columns (as a data frame can), mode = list.
The CLASS of Z is either set by default, depending on how it was created, or explicitly set by the user. You can check an object’s class and change it. It determines how functions deal with Z.
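The sketch below builds a small hypothetical stand-in for Z (three of the six columns, with made-up values) to show both points: mixed column types give mode "list", and changing the class changes how the same values are stored and treated.

```r
# Hypothetical stand-in for Z, with columns of different types
Z <- data.frame(x1 = 1:8, x2 = letters[1:8], x3 = seq(0.5, 4, by = 0.5))

mode(Z)    # "list": each column is stored separately and types are mixed
class(Z)   # "data.frame"

# Converting to a matrix changes the class, and the storage along with it
Zm <- as.matrix(Z)
class(Zm)  # "matrix" "array" (R 4.0 or later)
mode(Zm)   # "character": a matrix holds one type, so numbers became text

summary(Z)   # per-column summaries suited to each column's type
summary(Zm)  # summary() now treats every column as text
```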
Learning R
• Check out the course wikisite - lots of good manuals & links
• Read through the CRAN website
• Use http://www.rseek.org/ instead of Google
• Know your objects’ classes: class(x) or str(x)
• Because R is interactive, errors are your friends!
• ?lm gives you help on the lm function. Reading help files can be very… helpful
• MOST IMPORTANT: the more time you spend using R, the more comfortable you become with it. After doing your first real project in R, you won’t look back. I promise.
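A few of these habits in code form. `str()` is another standard base function that summarizes an object's structure, and `tryCatch()` lets you capture an error message and read it instead of having the script stop. The vector values are arbitrary, and the help calls are commented out because they open a viewer when run interactively.

```r
x <- c(3.2, 4.8, 5.1)

class(x)   # "numeric"
str(x)     # compact summary of the object's type and contents

# Two equivalent ways to open the help page for lm (run interactively)
# ?lm
# help("lm")

# Errors are informative: capture the message instead of stopping the script
msg <- tryCatch(log("ten"), error = function(e) conditionMessage(e))
msg        # in an English locale: "non-numeric argument to mathematical function"
```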
Things to do now
• Open a dedicated gmail account & subscribe to the R-help mailing list (https://stat.ethz.ch/mailman/listinfo/r-help). Once you have done this, email [email protected]. I will create an email group and send out a notice about the group’s name. Thereafter, please: a) have this email account open whenever doing R (or more often if you want), and b) ask questions to the group as they arise. If you know an answer or can guess at it, fire away! Also, keep an eye on the listserv queries. It’s a great way to learn R!
• Create your own personalized script library. When you learn how to do something, place the syntax in your library. Keep it organized. Turn in your updated script library with each homework.
Recommended Book
• An R and S-PLUS Companion to Applied Regression: An excellent overview of R, not just regression in R. Highly recommended. Many of the HWs we will do were inspired by Fox’s book. If you are the type of person who likes to have a book, buy this one. $56 at Amazon.
[Figure: Success of this course from Spring 2008, judged by % usage of R of all statistical programs]
Final Words of Warning
• “Using R is a bit akin to smoking. The beginning is difficult, one may get headaches and even gag the first few times. But in the long run, it becomes pleasurable and even addictive. Yet, deep down, for those willing to be honest, there is something not fully healthy in it.” --Francois Pinard
Next three classes
Jan 23: 1) Have R installed
2) Go over HW
3) Go over R basics and the reading, writing, and
manipulation of data
Jan 30: 1) Go over HW
2) Go over descriptive stats, ANOVA & regression
Feb 6: 1) Go over HW
2) Go over an intro to graphics
