An Introduction To R
Table of Contents
Preface
1 Introduction and preliminaries
1.1 The R environment
1.2 Related software and documentation
1.3 R and statistics
1.4 R and the window system
1.5 Using R interactively
1.6 An introductory session
1.7 Getting help with functions and features
1.8 R commands, case sensitivity, etc.
1.9 Recall and correction of previous commands
1.10 Executing commands from or diverting output to a file
1.11 Data permanency and removing objects
2 Simple manipulations; numbers and vectors
2.1 Vectors and assignment
2.2 Vector arithmetic
2.3 Generating regular sequences
2.4 Logical vectors
2.5 Missing values
2.6 Character vectors
2.7 Index vectors; selecting and modifying subsets of a data set
2.8 Other types of objects
3 Objects, their modes and attributes
3.1 Intrinsic attributes: mode and length
3.2 Changing the length of an object
3.3 Getting and setting attributes
3.4 The class of an object
4 Ordered and unordered factors
4.1 A specific example
4.2 The function tapply() and ragged arrays
4.3 Ordered factors
5 Arrays and matrices
5.1 Arrays
5.2 Array indexing. Subsections of an array
5.3 Index matrices
5.4 The array() function
5.4.1 Mixed vector and array arithmetic. The recycling rule
5.5 The outer product of two arrays
5.6 Generalized transpose of an array
5.7 Matrix facilities
5.7.1 Matrix multiplication
http://cran.rproject.org/doc/manuals/Rintro.html#One_002dandtwo_002dsampletests
5/28/2015
5.7.2 Linear equations and inversion
5.7.3 Eigenvalues and eigenvectors
5.7.4 Singular value decomposition and determinants
5.7.5 Least squares fitting and the QR decomposition
5.8 Forming partitioned matrices, cbind() and rbind()
5.9 The concatenation function, c(), with arrays
5.10 Frequency tables from factors
6 Lists and data frames
6.1 Lists
6.2 Constructing and modifying lists
6.2.1 Concatenating lists
6.3 Data frames
6.3.1 Making data frames
6.3.2 attach() and detach()
6.3.3 Working with data frames
6.3.4 Attaching arbitrary lists
6.3.5 Managing the search path
7 Reading data from files
7.1 The read.table() function
7.2 The scan() function
7.3 Accessing builtin datasets
7.3.1 Loading data from other R packages
7.4 Editing data
8 Probability distributions
8.1 R as a set of statistical tables
8.2 Examining the distribution of a set of data
8.3 One- and two-sample tests
9 Grouping, loops and conditional execution
9.1 Grouped expressions
9.2 Control statements
9.2.1 Conditional execution: if statements
9.2.2 Repetitive execution: for loops, repeat and while
10 Writing your own functions
10.1 Simple examples
10.2 Defining new binary operators
10.3 Named arguments and defaults
10.4 The '...' argument
10.5 Assignments within functions
10.6 More advanced examples
10.6.1 Efficiency factors in block designs
10.6.2 Dropping all names in a printed array
10.6.3 Recursive numerical integration
10.7 Scope
10.8 Customizing the environment
10.9 Classes, generic functions and object orientation
11 Statistical models in R
11.1 Defining statistical models; formulae
11.1.1 Contrasts
11.2 Linear models
11.3 Generic functions for extracting model information
11.4 Analysis of variance and model comparison
11.4.1 ANOVA tables
11.5 Updating fitted models
11.6 Generalized linear models
11.6.1 Families
11.6.2 The glm() function
11.7 Nonlinear least squares and maximum likelihood models
11.7.1 Least squares
11.7.2 Maximum likelihood
11.8 Some non-standard models
12 Graphical procedures
12.1 High-level plotting commands
12.1.1 The plot() function
12.1.2 Displaying multivariate data
12.1.3 Display graphics
12.1.4 Arguments to high-level plotting functions
12.2 Low-level plotting commands
12.2.1 Mathematical annotation
12.2.2 Hershey vector fonts
12.3 Interacting with graphics
12.4 Using graphics parameters
12.4.1 Permanent changes: The par() function
12.4.2 Temporary changes: Arguments to graphics functions
12.5 Graphics parameters list
12.5.1 Graphical elements
12.5.2 Axes and tick marks
12.5.3 Figure margins
12.5.4 Multiple figure environment
12.6 Device drivers
12.6.1 PostScript diagrams for typeset documents
12.6.2 Multiple graphics devices
12.7 Dynamic graphics
13 Packages
13.1 Standard packages
13.2 Contributed packages and CRAN
13.3 Namespaces
14 OS facilities
14.1 Files and directories
14.2 Filepaths
14.3 System commands
14.4 Compression and Archives
Appendix A A sample session
Appendix B Invoking R
B.1 Invoking R from the command line
B.2 Invoking R under Windows
B.3 Invoking R under OS X
B.4 Scripting with R
Appendix C The command-line editor
C.1 Preliminaries
C.2 Editing actions
C.3 Command-line editor summary
Appendix D Function and variable index
Appendix E Concept index
Appendix F References
An Introduction to R
This is an introduction to R (GNU S), a language and environment for statistical computing
and graphics. R is similar to the award-winning S system, which was developed at Bell
Laboratories by John Chambers et al. It provides a wide variety of statistical and graphical
techniques (linear and nonlinear modelling, statistical tests, time series analysis, classification,
clustering, ...).
This manual provides information on data types, programming elements, statistical modelling
and graphics.
This manual is for R, version 3.2.0 (2015-04-16).
Copyright © 1990 W. N. Venables
Copyright © 1992 W. N. Venables & D. M. Smith
Copyright © 1997 R. Gentleman & R. Ihaka
Copyright © 1997, 1998 M. Maechler
Copyright © 1999-2015 R Core Team
Permission is granted to make and distribute verbatim copies of this manual
provided the copyright notice and this permission notice are preserved on all copies.
Permission is granted to copy and distribute modified versions of this manual under
the conditions for verbatim copying, provided that the entire resulting derived work
is distributed under the terms of a permission notice identical to this one.
Permission is granted to copy and distribute translations of this manual into another
language, under the above conditions for modified versions, except that this
permission notice may be stated in a translation approved by the R Core Team.
Preface
This introduction to R is derived from an original set of notes describing the S and S-PLUS
environments written in 1990-2 by Bill Venables and David M. Smith when at the University of
Adelaide. We have made a number of small changes to reflect differences between the R and S
programs, and expanded some of the material.
We would like to extend warm thanks to Bill Venables (and David Smith) for granting
permission to distribute this modified version of the notes in this way, and for being a supporter
of R from way back.
Comments and corrections are always welcome. Please address email correspondence to R
[email protected].
Suggestions to the reader
Most R novices will start with the introductory session in Appendix A. This should give some
familiarity with the style of R sessions and more importantly some instant feedback on what
actually happens.
Many users will come to R mainly for its graphical facilities. See Graphics, which can be read at
almost any time and need not wait until all the preceding sections have been digested.
1 Introduction and preliminaries
1.1 The R environment
R is an integrated suite of software facilities for data manipulation, calculation and graphical
display. Among other things it has
• an effective data handling and storage facility,
• a suite of operators for calculations on arrays, in particular matrices,
• a large, coherent, integrated collection of intermediate tools for data analysis,
• graphical facilities for data analysis and display either directly at the computer or on
  hardcopy, and
• a well developed, simple and effective programming language (called S) which includes
  conditionals, loops, user defined recursive functions and input and output facilities.
  (Indeed most of the system supplied functions are themselves written in the S language.)
The term "environment" is intended to characterize it as a fully planned and coherent system,
rather than an incremental accretion of very specific and inflexible tools, as is frequently the case
with other data analysis software.
R is very much a vehicle for newly developing methods of interactive data analysis. It has
developed rapidly, and has been extended by a large collection of packages. However, most
programs written in R are essentially ephemeral, written for a single piece of data analysis.
1.2 Related software and documentation
R can be regarded as an implementation of the S language which was developed at Bell
Laboratories by Rick Becker, John Chambers and Allan Wilks, and also forms the basis of the
S-PLUS systems.
The evolution of the S language is characterized by four books by John Chambers and coauthors.
For R, the basic reference is The New S Language: A Programming Environment for Data
Analysis and Graphics by Richard A. Becker, John M. Chambers and Allan R. Wilks. The new
features of the 1991 release of S are covered in Statistical Models in S edited by John M.
Chambers and Trevor J. Hastie. The formal methods and classes of the methods package are
based on those described in Programming with Data by John M. Chambers. See References, for
precise references.
There are now a number of books which describe how to use R for data analysis and statistics,
and documentation for S/S-PLUS can typically be used with R, keeping the differences between
the S implementations in mind. See What documentation exists for R? in The R statistical system
FAQ.
1.3 R and statistics
Our introduction to the R environment did not mention statistics, yet many people use R as a
statistics system. We prefer to think of it as an environment within which many classical and
modern statistical techniques have been implemented. A few of these are built into the base R
environment, but many are supplied as packages. There are about 25 packages supplied with R
(called "standard" and "recommended" packages) and many more are available through the
CRAN family of Internet sites (via http://CRAN.R-project.org) and elsewhere. More details on
packages are given later (see Packages).
Most classical statistics and much of the latest methodology is available for use with R, but users
may need to be prepared to do a little work to find it.
There is an important difference in philosophy between S (and hence R) and the other main
statistical systems. In S a statistical analysis is normally done as a series of steps, with
intermediate results being stored in objects. Thus whereas SAS and SPSS will give copious
output from a regression or discriminant analysis, R will give minimal output and store the
results in a fit object for subsequent interrogation by further R functions.
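The stepwise style described above can be illustrated with a minimal sketch. It assumes the built-in cars data set (from the standard datasets package) and the lm() function, which is covered later under Statistical models in R; the name fit is purely illustrative.

```r
# Fit a simple linear regression; the result is stored, not printed.
fit <- lm(dist ~ speed, data = cars)

# Interrogate the stored fit object with further R functions.
coef(fit)                 # estimated intercept and slope
summary(fit)$r.squared    # proportion of variance explained
head(residuals(fit))      # first few residuals
```

Nothing is printed by the first command; each later call extracts only the piece of the analysis that is asked for.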
1.4 R and the window system
The most convenient way to use R is at a graphics workstation running a windowing system.
This guide is aimed at users who have this facility. In particular we will occasionally refer to the
use of R on an X window system although the vast bulk of what is said applies generally to any
implementation of the R environment.
Most users will find it necessary to interact directly with the operating system on their computer
from time to time. In this guide, we mainly discuss interaction with the operating system on
UNIX machines. If you are running R under Windows or OS X you will need to make some
small adjustments.
Setting up a workstation to take full advantage of the customizable features of R is a
straightforward if somewhat tedious procedure, and will not be considered further here. Users in
difficulty should seek local expert help.
1.5 Using R interactively
When you use the R program it issues a prompt when it expects input commands. The default
prompt is '>', which on UNIX might be the same as the shell prompt, and so it may appear that
nothing is happening. However, as we shall see, it is easy to change to a different R prompt if
you wish. We will assume that the UNIX shell prompt is '$'.
In using R under UNIX the suggested procedure for the first occasion is as follows:
1. Create a separate sub-directory, say work, to hold data files on which you will use R for
   this problem. This will be the working directory whenever you use R for this particular
   problem.
   $ mkdir work
   $ cd work
2. Start the R program with the command
   $ R
3. At this point R commands may be issued (see later).
4. To quit the R program the command is
   > q()
   At this point you will be asked whether you want to save the data from your R session. On
   some systems this will bring up a dialog box, and on others you will receive a text prompt
   to which you can respond yes, no or cancel (a single letter abbreviation will do) to save the
   data before quitting, quit without saving, or return to the R session. Data which is saved
   will be available in future R sessions.
Further R sessions are simple.
1. Make work the working directory and start the program as before:
   $ cd work
   $ R
2. Use the R program, terminating with the q() command at the end of the session.
To use R under Windows the procedure to follow is basically the same. Create a folder as the
working directory, and set that in the Start In field in your R shortcut. Then launch R by double
clicking on the icon.
1.6 An introductory session
Readers wishing to get a feel for R at a computer before proceeding are strongly advised to work
through the introductory session given in A sample session.
1.7 Getting help with functions and features
R has an inbuilt help facility similar to the man facility of UNIX. To get more information on any
specific named function, for example solve, the command is
> help(solve)
An alternative is
> ?solve
For a feature specified by special characters, the argument must be enclosed in double or single
quotes, making it a "character string". This is also necessary for a few words with syntactic
meaning including if, for and function.
> help("[[")
Either form of quote mark may be used to escape the other, as in the string "It's important".
Our convention is to use double quote marks for preference.
On most R installations help is available in HTML format by running
> help.start()
which will launch a Web browser that allows the help pages to be browsed with hyperlinks. On
UNIX, subsequent help requests are sent to the HTML-based help system. The "Search Engine
and Keywords" link in the page loaded by help.start() is particularly useful as it contains a
high-level concept list which searches through available functions. It can be a great way to get
your bearings quickly and to understand the breadth of what R has to offer.
The help.search command (alternatively ??) allows searching for help in various ways. For
example,
> ??solve
Try ?help.search for details and more examples.
The examples on a help topic can normally be run by
> example(topic)
Windows versions of R have other optional help systems: use
> ?help
for further details.
1.8 R commands, case sensitivity, etc.
Technically R is an expression language with a very simple syntax. It is case sensitive as are
most UNIX based packages, so A and a are different symbols and would refer to different
variables. The set of symbols which can be used in R names depends on the operating system
and country within which R is being run (technically on the locale in use). Normally all
alphanumeric symbols are allowed (and in some countries this includes accented letters) plus
'.' and '_', with the restriction that a name must start with '.' or a letter, and if it starts with '.'
the second character must not be a digit. Names are effectively unlimited in length.
Elementary commands consist of either expressions or assignments. If an expression is given as
a command, it is evaluated, printed (unless specifically made invisible), and the value is lost. An
assignment also evaluates an expression and passes the value to a variable but the result is not
automatically printed.
Commands are separated either by a semi-colon (';'), or by a newline. Elementary commands
can be grouped together into one compound expression by braces ('{' and '}'). Comments can be
put almost anywhere: starting with a hashmark ('#'), everything to the end of the line is a
comment.
If a command is not complete at the end of a line, R will give a different prompt, by default
+
on second and subsequent lines and continue to read input until the command is syntactically
complete. This prompt may be changed by the user. We will generally omit the continuation
prompt and indicate continuation by simple indenting.
Command lines entered at the console are limited to about 4095 bytes (not characters).
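The rules in this section can be tried out with a short sketch. All variable names here (a, A, b, u, total) are invented for illustration only.

```r
# An expression used as a command is evaluated, printed, and its value lost.
2 + 3

# An assignment evaluates the expression but prints nothing.
a <- 2 + 3
A <- 10                # a different variable: R is case sensitive

# Two commands on one line, separated by a semi-colon.
b <- a * 2; b

# Braces group commands into one compound expression,
# whose value is that of the last expression inside.
total <- {
  u <- a + A
  u * 2
}
total
```

Typed at the console, the braced expression would show the continuation prompt on its second and later lines until the closing brace completes the command.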
1.9 Recall and correction of previous commands
Under many versions of UNIX and on Windows, R provides a mechanism for recalling and
re-executing previous commands. The vertical arrow keys on the keyboard can be used to scroll
forward and backward through a command history. Once a command is located in this way, the
cursor can be moved within the command using the horizontal arrow keys, and characters can be
removed with the DEL key or added with the other keys. More details are provided later: see The
command-line editor.
The recall and editing capabilities under UNIX are highly customizable. You can find out how to
do this by reading the manual entry for the readline library.
Alternatively, the Emacs text editor provides more general support mechanisms (via ESS, Emacs
Speaks Statistics) for working interactively with R. See R and Emacs in The R statistical system
FAQ.
1.10 Executing commands from or diverting output to a file
If commands are stored in an external file, say commands.R in the working directory work, they
may be executed at any time in an R session with the command
> source("commands.R")
For Windows Source is also available on the File menu. The function sink,
> sink("record.lis")
will divert all subsequent output from the console to an external file, record.lis. The command
> sink()
restores it to the console once again.
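A minimal round trip through both functions might look as follows. The sketch writes to a temporary directory rather than the working directory so that it leaves no files behind; the file contents and the names cmd_file, out_file, w and w_total are made up for illustration.

```r
# Hypothetical file locations in a temporary directory, for illustration only.
cmd_file <- file.path(tempdir(), "commands.R")
out_file <- file.path(tempdir(), "record.lis")

# Store two commands in an external file, then execute them.
writeLines(c("w <- 1:5", "w_total <- sum(w)"), cmd_file)
source(cmd_file)

# Divert console output to a file, then restore it to the console.
sink(out_file)
print(w_total)
sink()

# Inspect what was diverted.
readLines(out_file)
```

Forgetting the closing sink() is a common slip: output then keeps going to the file and the console appears silent.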
1.11 Data permanency and removing objects
The entities that R creates and manipulates are known as objects. These may be variables, arrays
of numbers, character strings, functions, or more general structures built from such components.
During an R session, objects are created and stored by name (we discuss this process in the next
section). The R command
> objects()
(alternatively, ls()) can be used to display the names of (most of) the objects which are currently
stored within R. The collection of objects currently stored is called the workspace.
To remove objects the function rm is available:
> rm(x, y, z, ink, junk, temp, foo, bar)
All objects created during an R session can be stored permanently in a file for use in future R
sessions. At the end of each R session you are given the opportunity to save all the currently
available objects. If you indicate that you want to do this, the objects are written to a file called
.RData in the current directory, and the command lines used in the session are saved to a file
called .Rhistory.
When R is started at a later time from the same directory it reloads the workspace from this file.
At the same time the associated commands history is reloaded.
It is recommended that you should use separate working directories for analyses conducted with
R. It is quite common for objects with names x and y to be created during an analysis. Names
like this are often meaningful in the context of a single analysis, but it can be quite hard to decide
what they might be when several analyses have been conducted in the same directory.
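As a small illustration of listing and removing objects, the sketch below creates three throwaway objects. The names tmp_a, tmp_b and tmp_c are invented, and exists() is used here only to check the result.

```r
# Create a few objects, list them, then remove some.
tmp_a <- 1; tmp_b <- "two"; tmp_c <- c(3, 4)
ls()                    # names of the stored objects (same as objects())

rm(tmp_a, tmp_b)        # remove objects by name
exists("tmp_a")         # FALSE
exists("tmp_c")         # TRUE
```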
2 Simple manipulations; numbers and vectors
2.1 Vectors and assignment
R operates on named data structures. The simplest such structure is the numeric vector, which is
a single entity consisting of an ordered collection of numbers. To set up a vector named x, say,
consisting of five numbers, namely 10.4, 5.6, 3.1, 6.4 and 21.7, use the R command
> x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
This is an assignment statement using the function c() which in this context can take an arbitrary
number of vector arguments and whose value is a vector got by concatenating its arguments end
to end.
A number occurring by itself in an expression is taken as a vector of length one.
Notice that the assignment operator ('<-') consists of the two characters '<' ("less than")
and '-' ("minus") occurring strictly side-by-side, and that it "points" to the object receiving the
value of the expression. In most contexts the '=' operator can be used as an alternative.
Assignment can also be made using the function assign(). An equivalent way of making the
same assignment as above is with:
> assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))
The usual operator, <-, can be thought of as a syntactic short-cut to this.
Assignments can also be made in the other direction, using the obvious change in the assignment
operator. So the same assignment could be made using
> c(10.4, 5.6, 3.1, 6.4, 21.7) -> x
If an expression is used as a complete command, the value is printed and lost. So now if we
were to use the command
> 1/x
the reciprocals of the five values would be printed at the terminal (and the value of x, of course,
unchanged).
The further assignment
> y <- c(x, 0, x)
would create a vector y with 11 entries consisting of two copies of x with a zero in the middle
place.
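The three forms of assignment can be checked against one another. In this sketch the names x2 and x3 are introduced only so that the results can be compared; identical() is used for the comparison.

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)             # the usual assignment
assign("x2", c(10.4, 5.6, 3.1, 6.4, 21.7))    # the same values via assign()
c(10.4, 5.6, 3.1, 6.4, 21.7) -> x3            # assignment in the other direction

identical(x, x2) && identical(x, x3)          # TRUE

1/x                  # printed, then lost; x itself is unchanged
y <- c(x, 0, x)
length(y)            # 11
```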
2.2 Vector arithmetic
Vectors can be used in arithmetic expressions, in which case the operations are performed
element by element. Vectors occurring in the same expression need not all be of the same length.
If they are not, the value of the expression is a vector with the same length as the longest vector
which occurs in the expression. Shorter vectors in the expression are recycled as often as need be
(perhaps fractionally) until they match the length of the longest vector. In particular a constant is
simply repeated. So with the above assignments the command
> v <- 2*x + y + 1
generates a new vector v of length 11 constructed by adding together, element by element, 2*x
repeated 2.2 times, y repeated just once, and 1 repeated 11 times.
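The recycling in this command can be written out explicitly as a check. The sketch redefines x and y so that it is self-contained; v_explicit is an illustrative name. Note that R also issues a warning for this command, because 11 is not a whole multiple of 5, so the warning is suppressed here.

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
y <- c(x, 0, x)                       # length 11

# 2*x (length 5) is recycled to length 11; R warns because the longer
# length is not a multiple of the shorter one.
v <- suppressWarnings(2*x + y + 1)
length(v)                             # 11

# The same result with the recycling written out by hand.
v_explicit <- 2 * rep(x, length.out = 11) + y + rep(1, 11)
all.equal(v, v_explicit)              # TRUE
```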
The elementary arithmetic operators are the usual +, -, *, / and ^ for raising to a power. In
addition all of the common arithmetic functions are available. log, exp, sin, cos, tan, sqrt, and
so on, all have their usual meaning. max and min select the largest and smallest elements of a
vector respectively. range is a function whose value is a vector of length two, namely c(min(x),
max(x)). length(x) is the number of elements in x, sum(x) gives the total of the elements in x,
and prod(x) their product.
Two statistical functions are mean(x) which calculates the sample mean, which is the same as
sum(x)/length(x), and var(x) which gives
sum((x-mean(x))^2)/(length(x)-1)
or sample variance. If the argument to var() is an n-by-p matrix the value is a p-by-p sample
covariance matrix got by regarding the rows as independent p-variate sample vectors.
sort(x) returns a vector of the same size as x with the elements arranged in increasing order;
however there are other more flexible sorting facilities available (see order() or sort.list()
which produce a permutation to do the sorting).
Note that max and min select the largest and smallest values in their arguments, even if they are
given several vectors. The parallel maximum and minimum functions pmax and pmin return a
vector (of length equal to their longest argument) that contains in each element the largest
(smallest) element in that position in any of the input vectors.
For most purposes the user will not be concerned if the "numbers" in a numeric vector are
integers, reals or even complex. Internally calculations are done as double precision real
numbers, or double precision complex numbers if the input data are complex.
To work with complex numbers, supply an explicit complex part. Thus
sqrt(-17)
will give NaN and a warning, but
sqrt(-17+0i)
will do the computations as complex numbers.
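The identities stated in this section can be verified directly. The sketch below reuses the vector x from the previous section; the two short vectors passed to max() and pmax() are made up for illustration.

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)

range(x)                                   # same as c(min(x), max(x))
all.equal(mean(x), sum(x) / length(x))     # TRUE
all.equal(var(x), sum((x - mean(x))^2) / (length(x) - 1))   # TRUE

sort(x)                                    # increasing order
order(x)                                   # the permutation that sorts x
x[order(x)]                                # same as sort(x)

# max() reduces all its arguments to one value; pmax() works position by position.
max(c(1, 5, 3), c(4, 2, 6))                # 6
pmax(c(1, 5, 3), c(4, 2, 6))               # 4 5 6

# Supplying an explicit complex part switches to complex arithmetic.
sqrt(-17 + 0i)
```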
2.3 Generating regular sequences
R has a number of facilities for generating commonly used sequences of numbers. For example
1:30 is the vector c(1, 2, ..., 29, 30). The colon operator has high priority within an expression,
so, for example 2*1:15 is the vector c(2, 4, ..., 28, 30). Put n <- 10 and compare the sequences
1:n-1 and 1:(n-1).
The construction 30:1 may be used to generate a sequence backwards.
The function seq() is a more general facility for generating sequences. It has five arguments,
only some of which may be specified in any one call. The first two arguments, if given, specify
the beginning and end of the sequence, and if these are the only two arguments given the result is
the same as the colon operator. That is seq(2,10) is the same vector as 2:10.
Arguments to seq(), and to many other R functions, can also be given in named form, in which
case the order in which they appear is irrelevant. The first two arguments may be named
from=value and to=value; thus seq(1,30), seq(from=1, to=30) and seq(to=30, from=1) are all
the same as 1:30. The next two arguments to seq() may be named by=value and length=value,
which specify a step size and a length for the sequence respectively. If neither of these is given,
the default by=1 is assumed.
For example
> seq(-5, 5, by=.2) -> s3
generates in s3 the vector c(-5.0, -4.8, -4.6, ..., 4.6, 4.8, 5.0). Similarly
> s4 <- seq(length=51, from=-5, by=.2)
generates the same vector in s4.
The fifth argument may be named along=vector, which is normally used as the only argument to
create the sequence 1, 2, ..., length(vector), or the empty sequence if the vector is empty (as it
can be).
A related function is rep() which can be used for replicating an object in various complicated
ways. The simplest form is
> s5 <- rep(x, times=5)
which will put five copies of x end-to-end in s5. Another useful version is
> s6 <- rep(x, each=5)
which repeats each element of x five times before moving on to the next.
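The comparison suggested above, and the equivalence of the two seq() calls, can be run as a short sketch. The three-element x and the character vector passed to along= are shortened, made-up inputs chosen to keep the output readable.

```r
n <- 10
1:n-1              # 0 1 2 ... 9: the colon binds more tightly than the minus
1:(n-1)            # 1 2 ... 9

s3 <- seq(-5, 5, by = 0.2)
s4 <- seq(length = 51, from = -5, by = 0.2)
length(s3)                       # 51
all.equal(s3, s4)                # TRUE

seq(along = c("a", "b", "c"))    # 1 2 3

x <- c(10.4, 5.6, 3.1)
rep(x, times = 2)                # 10.4 5.6 3.1 10.4 5.6 3.1
rep(x, each = 2)                 # 10.4 10.4 5.6 5.6 3.1 3.1
```

The names length= and along= used here, as in the text, are abbreviations that R matches to the full argument names length.out and along.with.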
2.4 Logical vectors
As well as numerical vectors, R allows manipulation of logical quantities. The elements of a
logical vector can have the values TRUE, FALSE, and NA (for "not available", see below). The first
two are often abbreviated as T and F, respectively. Note however that T and F are just variables
which are set to TRUE and FALSE by default, but are not reserved words and hence can be
overwritten by the user. Hence, you should always use TRUE and FALSE.
Logical vectors are generated by conditions. For example
> temp <- x > 13
sets temp as a vector of the same length as x with values FALSE corresponding to elements of x
where the condition is not met and TRUE where it is.
The logical operators are <, <=, >, >=, == for exact equality and != for inequality. In addition if c1
and c2 are logical expressions, then c1 & c2 is their intersection ("and"), c1 | c2 is their union
("or"), and !c1 is the negation of c1.
Logical vectors may be used in ordinary arithmetic, in which case they are coerced into numeric
vectors, FALSE becoming 0 and TRUE becoming 1. However there are situations where logical
vectors and their coerced numeric counterparts are not equivalent, for example see the next
subsection.
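A short sketch of conditions, logical operators and coercion, reusing the vector x from earlier sections; the thresholds 5 and 13 are arbitrary.

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
temp <- x > 13
temp                      # FALSE FALSE FALSE FALSE TRUE

# Conditions are combined element by element.
x > 5 & x < 13            # TRUE TRUE FALSE TRUE FALSE
x < 5 | x > 13            # FALSE FALSE TRUE FALSE TRUE
!temp                     # TRUE TRUE TRUE TRUE FALSE

# In arithmetic, FALSE becomes 0 and TRUE becomes 1.
sum(temp)                 # 1: how many elements exceed 13
mean(x > 5)               # 0.8: proportion of elements above 5
```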
2.5 Missing values
In some cases the components of a vector may not be completely known. When an element or
value is "not available" or a "missing value" in the statistical sense, a place within a vector may
be reserved for it by assigning it the special value NA. In general any operation on an NA becomes
an NA. The motivation for this rule is simply that if the specification of an operation is
incomplete, the result cannot be known and hence is not available.
The function is.na(x) gives a logical vector of the same size as x with value TRUE if and only if
the corresponding element in x is NA.
> z <- c(1:3,NA); ind <- is.na(z)
Notice that the logical expression x == NA is quite different from is.na(x) since NA is not really a
value but a marker for a quantity that is not available. Thus x == NA is a vector of the same
length as x all of whose values are NA as the logical expression itself is incomplete and hence
undecidable.
Note that there is a second kind of "missing" values which are produced by numerical
computation, the so-called Not a Number, NaN, values. Examples are
> 0/0
or
> Inf - Inf
which both give NaN since the result cannot be defined sensibly.
In summary, is.na(xx) is TRUE both for NA and NaN values. To differentiate these, is.nan(xx) is
only TRUE for NaNs.
Missing values are sometimes printed as <NA> when character vectors are printed without quotes.
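The distinctions drawn in this section can be seen side by side in the sketch below. The vector xx is made up for illustration, and the na.rm argument of mean() is not discussed in the text above; it is shown only as one common way of working around missing values.

```r
z <- c(1:3, NA)
ind <- is.na(z)
ind                    # FALSE FALSE FALSE TRUE
z == NA                # NA NA NA NA: not a usable test for missingness
z + 1                  # 2 3 4 NA: operations on NA give NA

xx <- c(1, NA, 0/0, Inf - Inf)
is.na(xx)              # FALSE TRUE TRUE TRUE
is.nan(xx)             # FALSE FALSE TRUE TRUE

mean(z)                # NA
mean(z, na.rm = TRUE)  # 2
```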
2.6 Character vectors
Character quantities and character vectors are used frequently in R, for example as plot labels.
Where needed they are denoted by a sequence of characters delimited by the double quote
character, e.g., "x-values", "New iteration results".
Character strings are entered using either matching double (") or single (') quotes, but are
printed using double quotes (or sometimes without quotes). They use C-style escape sequences,
using \ as the escape character, so \\ is entered and printed as \\, and inside double quotes " is
entered as \". Other useful escape sequences are \n, newline, \t, tab and \b, backspace; see
?Quotes for a full list.
Character vectors may be concatenated into a vector by the c() function; examples of their use
will emerge frequently.
The paste() function takes an arbitrary number of arguments and concatenates them one by one
into character strings. Any numbers given among the arguments are coerced into character
strings in the evident way, that is, in the same way they would be if they were printed. The
arguments are by default separated in the result by a single blank character, but this can be
changed by the named argument, sep=string, which changes it to string, possibly empty.
For example
> labs <- paste(c("X","Y"), 1:10, sep="")
makes labs into the character vector
c("X1", "Y2", "X3", "Y4", "X5", "Y6", "X7", "Y8", "X9", "Y10")
Note particularly that recycling of short lists takes place here too; thus c("X", "Y") is repeated 5
times to match the sequence 1:10.
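A few further calls show the default separator and the escape sequences at work. The strings used here are invented for illustration; cat() and nchar() are used only to make the effect of the escapes visible.

```r
labs <- paste(c("X", "Y"), 1:10, sep = "")
labs                                   # "X1" "Y2" "X3" ... "X9" "Y10"

paste("Iteration", 3, "of", 10)        # "Iteration 3 of 10"
paste("a", "b", sep = "-")             # "a-b"

# Escape sequences: print() shows the escapes, cat() interprets them.
s <- "Tab\there, a \"quoted\" word"
print(s)
cat(s, "\n")
nchar("\\")                            # 1: a single backslash character
```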
2.7Indexvectors;selectingandmodifyingsubsetsofadataset
Subsets of the elements of a vector may be selected by appending to the name of the vector an
index vector in square brackets. More generally any expression that evaluates to a vector may
have subsets of its elements similarly selected by appending an index vector in square brackets
immediately after the expression.

Such index vectors can be any of four distinct types.

1. A logical vector. In this case the index vector is recycled to the same length as the vector
from which elements are to be selected. Values corresponding to TRUE in the index vector
are selected and those corresponding to FALSE are omitted. For example

> y <- x[!is.na(x)]

creates (or re-creates) an object y which will contain the non-missing values of x, in the
same order. Note that if x has missing values, y will be shorter than x. Also

> (x+1)[(!is.na(x)) & x>0] -> z

creates an object z and places in it the values of the vector x+1 for which the corresponding
value in x was both non-missing and positive.

2. A vector of positive integral quantities. In this case the values in the index vector must
lie in the set {1, 2, ..., length(x)}. The corresponding elements of the vector are selected
and concatenated, in that order, in the result. The index vector can be of any length and
the result is of the same length as the index vector. For example x[6] is the sixth
component of x and

> x[1:10]

selects the first 10 elements of x (assuming length(x) is not less than 10). Also

> c("x","y")[rep(c(1,2,2,1), times=4)]

(an admittedly unlikely thing to do) produces a character vector of length 16 consisting of
"x", "y", "y", "x" repeated four times.

3. A vector of negative integral quantities. Such an index vector specifies the values to be
excluded rather than included. Thus

> y <- x[-(1:5)]

gives y all but the first five elements of x.

4. A vector of character strings. This possibility only applies where an object has a names
attribute to identify its components. In this case a sub-vector of the names vector may be
used in the same way as the positive integral labels in item 2 further above.

> fruit <- c(5, 10, 1, 20)
> names(fruit) <- c("orange", "banana", "apple", "peach")
> lunch <- fruit[c("apple","orange")]

The advantage is that alphanumeric names are often easier to remember than numeric
indices. This option is particularly useful in connection with data frames, as we shall see
later.
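
The four index types can be tried side by side on one small vector. The sketch below is
illustrative only: the vector x and its names are made up for the purpose and do not come
from the text.

```r
# Hypothetical named vector with one missing value (illustration only).
x <- c(a = 3, b = NA, c = -1, d = 7, e = 0)

pos_vals <- x[!is.na(x) & x > 0]  # 1. logical index: non-missing, positive values
picked   <- x[c(1, 4, 4)]         # 2. positive integers: elements 1, 4 and 4 again
tail_x   <- x[-(1:2)]             # 3. negative integers: all but the first two
by_name  <- x[c("d", "a")]        # 4. character strings: selection by name
```

Note that in the second case the result has the length of the index vector, not of x, so
an element may be selected more than once.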

An indexed expression can also appear on the receiving end of an assignment, in which case the
assignment operation is performed only on those elements of the vector. The expression must be
of the form vector[index_vector] as having an arbitrary expression in place of the vector name
does not make much sense here.

For example

> x[is.na(x)] <- 0

replaces any missing values in x by zeros and

> y[y < 0] <- -y[y < 0]

has the same effect as

> y <- abs(y)
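
As a quick check of the two assignments above, here is a minimal sketch with made-up
values (the particular numbers are assumptions chosen for illustration):

```r
x <- c(4, NA, 9, NA)
x[is.na(x)] <- 0          # only the missing elements are overwritten

y <- c(-2, 5, -7)
y_abs <- abs(y)           # reference result
y[y < 0] <- -y[y < 0]     # flip the sign of the negative elements only
```

After these statements x holds no missing values and y agrees with y_abs.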

2.8 Other types of objects

Vectors are the most important type of object in R, but there are several others which we will
meet more formally in later sections.

matrices or more generally arrays are multi-dimensional generalizations of vectors. In
fact, they are vectors that can be indexed by two or more indices and will be printed in
special ways. See Arrays and matrices.

factors provide compact ways to handle categorical data. See Factors.

lists are a general form of vector in which the various elements need not be of the same
type, and are often themselves vectors or lists. Lists provide a convenient way to return the
results of a statistical computation. See Lists.

data frames are matrix-like structures, in which the columns can be of different types.
Think of data frames as data matrices with one row per observational unit but with
(possibly) both numerical and categorical variables. Many experiments are best described
by data frames: the treatments are categorical but the response is numeric. See Data
frames.

functions are themselves objects in R which can be stored in the project's workspace. This
provides a simple and convenient way to extend R. See Writing your own functions.

3 Objects, their modes and attributes

3.1 Intrinsic attributes: mode and length

The entities R operates on are technically known as objects. Examples are vectors of numeric
(real) or complex values, vectors of logical values and vectors of character strings. These are
known as atomic structures since their components are all of the same type, or mode, namely
numeric (footnote 10), complex, logical, character and raw.

Vectors must have their values all of the same mode. Thus any given vector must be
unambiguously either logical, numeric, complex, character or raw. (The only apparent exception
to this rule is the special value listed as NA for quantities not available, but in fact there are
several types of NA). Note that a vector can be empty and still have a mode. For example the
empty character string vector is listed as character(0) and the empty numeric vector as
numeric(0).

R also operates on objects called lists, which are of mode list. These are ordered sequences of
objects which individually can be of any mode. Lists are known as recursive rather than atomic
structures since their components can themselves be lists in their own right.

The other recursive structures are those of mode function and expression. Functions are the
objects that form part of the R system along with similar user written functions, which we
discuss in some detail later. Expressions as objects form an advanced part of R which will not be
discussed in this guide, except indirectly when we discuss formulae used with modeling in R.

By the mode of an object we mean the basic type of its fundamental constituents. This is a
special case of a property of an object. Another property of every object is its length. The
functions mode(object) and length(object) can be used to find out the mode and length of any
defined structure (footnote 11).

Further properties of an object are usually provided by attributes(object), see Getting and
setting attributes. Because of this, mode and length are also called intrinsic attributes of an
object.

For example, if z is a complex vector of length 100, then in an expression mode(z) is the
character string "complex" and length(z) is 100.

R caters for changes of mode almost anywhere it could be considered sensible to do so, (and a
few where it might not be). For example with

> z <- 0:9

we could put

> digits <- as.character(z)

after which digits is the character vector c("0", "1", "2", ..., "9"). A further coercion, or
change of mode, reconstructs the numerical vector again:

> d <- as.integer(digits)

Now d and z are the same. (footnote 12) There is a large collection of functions of the form
as.something() for either coercion from one mode to another, or for investing an object with
some other attribute it may not already possess. The reader should consult the different help
files to become familiar with them.
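
The round trip described above can be verified directly. This is a small sketch; the
final line, which coerces a non-numeric string, is an added illustration and not part of
the original example.

```r
z <- 0:9
digits <- as.character(z)     # mode changes from "numeric" to "character"
d <- as.integer(digits)       # and back again

same <- identical(d, z)       # are d and z really the same object contents?
modes <- c(mode(z), mode(digits), mode(d))

# A coercion that cannot succeed yields NA (with a warning, suppressed here).
bad <- suppressWarnings(as.numeric("abc"))
```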

3.2 Changing the length of an object

An "empty" object may still have a mode. For example

> e <- numeric()

makes e an empty vector structure of mode numeric. Similarly character() is an empty character
vector, and so on. Once an object of any size has been created, new components may be added to
it simply by giving it an index value outside its previous range. Thus

> e[3] <- 17

now makes e a vector of length 3, (the first two components of which are at this point both NA).
This applies to any structure at all, provided the mode of the additional component(s) agrees with
the mode of the object in the first place.

This automatic adjustment of lengths of an object is used often, for example in the scan()
function for input. (See The scan() function.)

Conversely to truncate the size of an object requires only an assignment to do so. Hence if alpha
is an object of length 10, then

> alpha <- alpha[2 * 1:5]

makes it an object of length 5 consisting of just the former components with even index. (The
old indices are not retained, of course.) We can then retain just the first three values by

> length(alpha) <- 3

and vectors can be extended (by missing values) in the same way.
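
The following sketch walks through the same steps with concrete values. The starting
value 1:10 for alpha is an assumption made so that the results can be inspected.

```r
e <- numeric()
e[3] <- 17                  # e grows to length 3; e[1] and e[2] are NA

alpha <- 1:10               # assumed contents, for illustration
alpha <- alpha[2 * 1:5]     # keep the components with even index
length(alpha) <- 3          # truncate to the first three values
short <- alpha
length(alpha) <- 5          # extend again; new components are NA
```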

3.3 Getting and setting attributes

The function attributes(object) returns a list of all the non-intrinsic attributes currently defined
for that object. The function attr(object, name) can be used to select a specific attribute. These
functions are rarely used, except in rather special circumstances when some new attribute is
being created for some particular purpose, for example to associate a creation date or an operator
with an R object. The concept, however, is very important.

Some care should be exercised when assigning or deleting attributes since they are an integral
part of the object system used in R.

When attr() is used on the left hand side of an assignment it can be used either to associate a
new attribute with object or to change an existing one. For example

> attr(z, "dim") <- c(10,10)

allows R to treat z as if it were a 10-by-10 matrix.
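
A minimal sketch of getting and setting attributes follows. The vector z and the
attribute name "note" are hypothetical choices for illustration; only "dim" has a special
meaning to R here.

```r
z <- 1:100
attr(z, "dim") <- c(10, 10)         # z is now treated as a 10-by-10 matrix
is_mat <- is.matrix(z)

attr(z, "note") <- "illustration"   # an arbitrary user-defined attribute
att_names <- names(attributes(z))   # names of the non-intrinsic attributes
note <- attr(z, "note")
```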

3.4 The class of an object

All objects in R have a class, reported by the function class. For simple vectors this is just the
mode, for example "numeric", "logical", "character" or "list", but "matrix", "array",
"factor" and "data.frame" are other possible values.

A special attribute known as the class of the object is used to allow for an object-oriented style
(footnote 13) of programming in R. For example if an object has class "data.frame", it will be
printed in a certain way, the plot() function will display it graphically in a certain way, and
other so-called generic functions such as summary() will react to it as an argument in a way
sensitive to its class.

To remove temporarily the effects of class, use the function unclass(). For example if winter
has the class "data.frame" then

> winter

will print it in data frame form, which is rather like a matrix, whereas

> unclass(winter)

will print it as an ordinary list. Only in rather special situations do you need to use this facility,
but one is when you are learning to come to terms with the idea of class and generic functions.

Generic functions and classes will be discussed further in Object orientation, but only briefly.

4 Ordered and unordered factors

A factor is a vector object used to specify a discrete classification (grouping) of the components
of other vectors of the same length. R provides both ordered and unordered factors. While the
real application of factors is with model formulae (see Contrasts), we here look at a specific
example.

4.1 A specific example

Suppose, for example, we have a sample of 30 tax accountants from all the states and territories
of Australia (footnote 14) and their individual state of origin is specified by a character vector
of state mnemonics as

> state <- c("tas", "sa",  "qld", "nsw", "nsw", "nt",  "wa",  "wa",
             "qld", "vic", "nsw", "vic", "qld", "qld", "sa",  "tas",
             "sa",  "nt",  "wa",  "vic", "qld", "nsw", "nsw", "wa",
             "sa",  "act", "nsw", "vic", "vic", "act")

Notice that in the case of a character vector, "sorted" means sorted in alphabetical order.

A factor is similarly created using the factor() function:

> statef <- factor(state)

The print() function handles factors slightly differently from other objects:

> statef
 [1] tas sa  qld nsw nsw nt  wa  wa  qld vic nsw vic qld qld sa
[16] tas sa  nt  wa  vic qld nsw nsw wa  sa  act nsw vic vic act
Levels:  act nsw nt qld sa tas vic wa

To find out the levels of a factor the function levels() can be used.

> levels(statef)
[1] "act" "nsw" "nt"  "qld" "sa"  "tas" "vic" "wa"

4.2 The function tapply() and ragged arrays

To continue the previous example, suppose we have the incomes of the same tax accountants in
another vector (in suitably large units of money)

> incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56,
               61, 61, 61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46,
               59, 46, 58, 43)

To calculate the sample mean income for each state we can now use the special function
tapply():

> incmeans <- tapply(incomes, statef, mean)

giving a means vector with the components labelled by the levels

   act    nsw     nt    qld     sa    tas    vic     wa
44.500 57.333 55.500 53.600 55.000 60.500 56.000 52.250

The function tapply() is used to apply a function, here mean(), to each group of components of
the first argument, here incomes, defined by the levels of the second component, here statef
(footnote 15), as if they were separate vector structures. The result is a structure of the same
length as the levels attribute of the factor containing the results. The reader should consult the
help document for more details.

Suppose further we needed to calculate the standard errors of the state income means. To do this
we need to write an R function to calculate the standard error for any given vector. Since there is
a built-in function var() to calculate the sample variance, such a function is a very simple one
liner, specified by the assignment:

> stderr <- function(x) sqrt(var(x)/length(x))

(Writing functions will be considered later in Writing your own functions, and in this case was
unnecessary as R also has a built-in function sd().) After this assignment, the standard errors are
calculated by

> incster <- tapply(incomes, statef, stderr)

and the values calculated are then

> incster
   act    nsw  nt    qld     sa tas   vic     wa
   1.5 4.3102 4.5 4.1061 2.7386 0.5 5.244 2.6575

As an exercise you may care to find the usual 95% confidence limits for the state mean incomes.
To do this you could use tapply() once more with the length() function to find the sample sizes,
and the qt() function to find the percentage points of the appropriate t-distributions. (You could
also investigate R's facilities for t-tests.)
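
One possible approach to the exercise is sketched below, assuming two-sided limits of the
form mean plus or minus t times standard error with n - 1 degrees of freedom. The helper
name std_err and the result name ci are hypothetical; std_err is used instead of stderr
only to avoid masking the base R function of that name.

```r
state <- c("tas", "sa",  "qld", "nsw", "nsw", "nt",  "wa",  "wa",
           "qld", "vic", "nsw", "vic", "qld", "qld", "sa",  "tas",
           "sa",  "nt",  "wa",  "vic", "qld", "nsw", "nsw", "wa",
           "sa",  "act", "nsw", "vic", "vic", "act")
incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56,
             61, 61, 61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46,
             59, 46, 58, 43)
statef <- factor(state)

std_err <- function(x) sqrt(var(x) / length(x))   # hypothetical helper name

n  <- tapply(incomes, statef, length)   # sample size per state
m  <- tapply(incomes, statef, mean)     # mean income per state
se <- tapply(incomes, statef, std_err)  # standard error per state
tq <- qt(0.975, df = n - 1)             # upper 2.5% point of each t-distribution

# One row per state: lower limit, mean, upper limit.
ci <- cbind(lower = m - tq * se, mean = m, upper = m + tq * se)
```

States with only two accountants in the sample get very wide intervals, since the
t-distribution with one degree of freedom has heavy tails.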

The function tapply() can also be used to handle more complicated indexing of a vector by
multiple categories. For example, we might wish to split the tax accountants by both state and
sex. However in this simple instance (just one factor) what happens can be thought of as follows.
The values in the vector are collected into groups corresponding to the distinct entries in the
factor. The function is then applied to each of these groups individually. The value is a vector of
function results, labelled by the levels attribute of the factor.

The combination of a vector and a labelling factor is an example of what is sometimes called a
ragged array, since the subclass sizes are possibly irregular. When the subclass sizes are all the
same the indexing may be done implicitly and much more efficiently, as we see in the next
section.

4.3 Ordered factors

The levels of factors are stored in alphabetical order, or in the order they were specified to
factor if they were specified explicitly.

Sometimes the levels will have a natural ordering that we want to record and want our statistical
analysis to make use of. The ordered() function creates such ordered factors but is otherwise
identical to factor. For most purposes the only difference between ordered and unordered factors
is that the former are printed showing the ordering of the levels, but the contrasts generated for
them in fitting linear models are different.
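
A brief sketch of the difference, using a made-up size classification (the data and level
names are assumptions for illustration). Besides printing and contrasts, an ordered factor
also supports comparisons against a level, which is shown in the last line.

```r
sizes <- c("small", "large", "medium", "small")

f <- factor(sizes)                                         # levels alphabetical
o <- ordered(sizes, levels = c("small", "medium", "large"))  # explicit ordering

lev_f <- levels(f)
lev_o <- levels(o)
below_large <- o < "large"   # element-wise comparison uses the level order
```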

5 Arrays and matrices

5.1 Arrays

An array can be considered as a multiply subscripted collection of data entries, for example
numeric. R allows simple facilities for creating and handling arrays, and in particular the special
case of matrices.

A dimension vector is a vector of non-negative integers. If its length is k then the array is
k-dimensional, e.g. a matrix is a 2-dimensional array. The dimensions are indexed from one up to
the values given in the dimension vector.

A vector can be used by R as an array only if it has a dimension vector as its dim attribute.
Suppose, for example, z is a vector of 1500 elements. The assignment

> dim(z) <- c(3,5,100)

gives it the dim attribute that allows it to be treated as a 3 by 5 by 100 array.

Other functions such as matrix() and array() are available for simpler and more natural looking
assignments, as we shall see in The array() function.

The values in the data vector give the values in the array in the same order as they would occur
in FORTRAN, that is "column major order," with the first subscript moving fastest and the last
subscript slowest.

For example if the dimension vector for an array, say a, is c(3,4,2) then there are 3 * 4 * 2 = 24
entries in a and the data vector holds them in the order a[1,1,1], a[2,1,1], ..., a[2,4,2],
a[3,4,2].

Arrays can be one-dimensional: such arrays are usually treated in the same way as vectors
(including when printing), but the exceptions can cause confusion.
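
Column major order can be made concrete by filling an array with 1:24, so that each entry
equals its own position in the data vector. The helper pos() below is a hypothetical
function written for this sketch; it simply spells out the position formula.

```r
a <- array(1:24, dim = c(3, 4, 2))   # entry value == position in the data vector

first_moves  <- a[2, 1, 1]   # first subscript moves fastest
second_moves <- a[1, 2, 1]   # a full column of 3 has been passed
third_moves  <- a[1, 1, 2]   # a full 3 by 4 slice of 12 has been passed

# Position of a[i, j, k] in the data vector of an array with dimension vector d.
pos <- function(i, j, k, d = dim(a)) {
  i + (j - 1) * d[1] + (k - 1) * d[1] * d[2]
}
```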

5.2 Array indexing. Subsections of an array

Individual elements of an array may be referenced by giving the name of the array followed by
the subscripts in square brackets, separated by commas.

More generally, subsections of an array may be specified by giving a sequence of index vectors
in place of subscripts; however if any index position is given an empty index vector, then the full
range of that subscript is taken.

Continuing the previous example, a[2,,] is a 4 * 2 array with dimension vector c(4,2) and data
vector containing the values

c(a[2,1,1], a[2,2,1], a[2,3,1], a[2,4,1],
  a[2,1,2], a[2,2,2], a[2,3,2], a[2,4,2])

in that order. a[,,] stands for the entire array, which is the same as omitting the subscripts
entirely and using a alone.

For any array, say Z, the dimension vector may be referenced explicitly as dim(Z) (on either side
of an assignment).

Also, if an array name is given with just one subscript or index vector, then the corresponding
values of the data vector only are used; in this case the dimension vector is ignored. This is not
the case, however, if the single index is not a vector but itself an array, as we next discuss.

5.3 Index matrices

As well as an index vector in any subscript position, a matrix may be used with a single index
matrix in order either to assign a vector of quantities to an irregular collection of elements in the
array, or to extract an irregular collection as a vector.

A matrix example makes the process clear. In the case of a doubly indexed array, an index
matrix may be given consisting of two columns and as many rows as desired. The entries in the
index matrix are the row and column indices for the doubly indexed array. Suppose for example
we have a 4 by 5 array X and we wish to do the following:

Extract elements X[1,3], X[2,2] and X[3,1] as a vector structure, and

Replace these entries in the array X by zeroes.

In this case we need a 3 by 2 subscript array, as in the following example.

> x <- array(1:20, dim=c(4,5))   # Generate a 4 by 5 array.
> x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20
> i <- array(c(1:3,3:1), dim=c(3,2))
> i                             # i is a 3 by 2 index array.
     [,1] [,2]
[1,]    1    3
[2,]    2    2
[3,]    3    1
> x[i]                          # Extract those elements
[1] 9 6 3
> x[i] <- 0                     # Replace those elements by zeros.
> x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    0   13   17
[2,]    2    0   10   14   18
[3,]    0    7   11   15   19
[4,]    4    8   12   16   20

Negative indices are not allowed in index matrices. NA and zero values are allowed: rows in the
index matrix containing a zero are ignored, and rows containing an NA produce an NA in the result.

As a less trivial example, suppose we wish to generate an (unreduced) design matrix for a block
design defined by factors blocks (b levels) and varieties (v levels). Further suppose there are n
plots in the experiment. We could proceed as follows:

> Xb <- matrix(0, n, b)
> Xv <- matrix(0, n, v)
> ib <- cbind(1:n, blocks)
> iv <- cbind(1:n, varieties)
> Xb[ib] <- 1
> Xv[iv] <- 1
> X <- cbind(Xb, Xv)

To construct the incidence matrix, N say, we could use

> N <- crossprod(Xb, Xv)

However a simpler direct way of producing this matrix is to use table():

> N <- table(blocks, varieties)

Index matrices must be numerical: any other form of matrix (e.g. a logical or character matrix)
supplied as a matrix is treated as an indexing vector.
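
The design matrix construction can be run end to end on a tiny layout. The six-plot
design below is invented for illustration, and as.integer() is applied to the factors
here as an assumption, simply to make the use of their integer codes explicit.

```r
# Hypothetical layout: 6 plots, 3 blocks, 3 varieties.
blocks    <- factor(c(1, 1, 2, 2, 3, 3))
varieties <- factor(c("A", "B", "A", "C", "B", "C"))

n <- length(blocks)
b <- nlevels(blocks)
v <- nlevels(varieties)

Xb <- matrix(0, n, b)
Xv <- matrix(0, n, v)
Xb[cbind(1:n, as.integer(blocks))]    <- 1   # one indicator per plot and block
Xv[cbind(1:n, as.integer(varieties))] <- 1   # one indicator per plot and variety
X <- cbind(Xb, Xv)

N  <- crossprod(Xb, Xv)          # incidence matrix via the indicator matrices
N2 <- table(blocks, varieties)   # the same counts, computed directly
```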

5.4 The array() function

As well as giving a vector structure a dim attribute, arrays can be constructed from vectors by the
array function, which has the form

> Z <- array(data_vector, dim_vector)

For example, if the vector h contains 24 or fewer numbers, then the command

> Z <- array(h, dim=c(3,4,2))

would use h to set up a 3 by 4 by 2 array in Z. If the size of h is exactly 24 the result is the
same as

> Z <- h ; dim(Z) <- c(3,4,2)

However if h is shorter than 24, its values are recycled from the beginning again to make it up to
size 24 (see The recycling rule) but dim(h) <- c(3,4,2) would signal an error about
mismatching length. As an extreme but common example

> Z <- array(0, c(3,4,2))

makes Z an array of all zeros.

At this point dim(Z) stands for the dimension vector c(3,4,2), and Z[1:24] stands for the data
vector as it was in h, and Z[] with an empty subscript or Z with no subscript stands for the entire
array as an array.

Arrays may be used in arithmetic expressions and the result is an array formed by
element-by-element operations on the data vector. The dim attributes of operands generally need
to be the same, and this becomes the dimension vector of the result. So if A, B and C are all
similar arrays, then

> D <- 2*A*B + C + 1

makes D a similar array with its data vector being the result of the given element-by-element
operations. However the precise rule concerning mixed array and vector calculations has to be
considered a little more carefully.

5.4.1 Mixed vector and array arithmetic. The recycling rule

The precise rule affecting element by element mixed calculations with vectors and arrays is
somewhat quirky and hard to find in the references. From experience we have found the
following to be a reliable guide.

The expression is scanned from left to right.

Any short vector operands are extended by recycling their values until they match the size
of any other operands.

As long as short vectors and arrays only are encountered, the arrays must all have the same
dim attribute or an error results.

Any vector operand longer than a matrix or array operand generates an error.

If array structures are present and no error or coercion to vector has been precipitated, the
result is an array structure with the common dim attribute of its array operands.
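
The rules above can be probed with a small matrix. The values are chosen for illustration
only; try() is used so that the expected error does not stop the script.

```r
A <- array(1:6, dim = c(2, 3))

r1 <- A + 1:2               # short vector recycled along the data vector
r2 <- A + c(10, 20, 30)     # recycled in column major order, not along rows

# A vector operand longer than the array operand is an error.
r3 <- try(A + 1:12, silent = TRUE)

# Arrays with different dim attributes are an error as well.
r4 <- try(A + array(1:6, dim = c(3, 2)), silent = TRUE)
```

Here r2[1, 2] is 3 + 30, because the third element of the data vector of A meets the third
element of the short vector; recycling follows the data vector, not the rows.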

5.5 The outer product of two arrays

An important operation on arrays is the outer product. If a and b are two numeric arrays, their
outer product is an array whose dimension vector is obtained by concatenating their two
dimension vectors (order is important), and whose data vector is got by forming all possible
products of elements of the data vector of a with those of b. The outer product is formed by the
special operator %o%:

> ab <- a %o% b

An alternative is

> ab <- outer(a, b, "*")

The multiplication function can be replaced by an arbitrary function of two variables. For
example if we wished to evaluate the function f(x, y) = cos(y)/(1 + x^2) over a regular grid of
values with x- and y-coordinates defined by the R vectors x and y respectively, we could proceed
as follows:

> f <- function(x, y) cos(y)/(1 + x^2)
> z <- outer(x, y, f)

In particular the outer product of two ordinary vectors is a doubly subscripted array (that is a
matrix, of rank at most 1). Notice that the outer product operator is of course non-commutative.
Defining your own R functions will be considered further in Writing your own functions.

An example: Determinants of 2 by 2 single-digit matrices

As an artificial but cute example, consider the determinants of 2 by 2 matrices [a, b; c, d] where
each entry is a non-negative integer in the range 0, 1, ..., 9, that is a digit.

The problem is to find the determinants, ad - bc, of all possible matrices of this form and
represent the frequency with which each value occurs as a high density plot. This amounts to
finding the probability distribution of the determinant if each digit is chosen independently and
uniformly at random.

A neat way of doing this uses the outer() function twice:

> d <- outer(0:9, 0:9)
> fr <- table(outer(d, d, "-"))
> plot(as.numeric(names(fr)), fr, type="h",
       xlab="Determinant", ylab="Frequency")

Notice the coercion of the names attribute of the frequency table to numeric in order to recover
the range of the determinant values. The "obvious" way of doing this problem with for loops, to
be discussed in Loops and conditional execution, is so inefficient as to be impractical.

It is also perhaps surprising that about 1 in 20 such matrices is singular.
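
The proportion of singular matrices can be read off the same frequency table, since a
matrix is singular exactly when its determinant is zero. This is a small sketch; the
variable names n_singular and p_singular are chosen here for illustration.

```r
d  <- outer(0:9, 0:9)             # all products of two digits (ad or bc)
fr <- table(outer(d, d, "-"))     # frequencies of every value of ad - bc

n_total    <- sum(fr)             # number of 2 by 2 digit matrices
n_singular <- fr[["0"]]           # matrices with determinant zero
p_singular <- n_singular / n_total
```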

5.6 Generalized transpose of an array

The function aperm(a, perm) may be used to permute an array, a. The argument perm must be a
permutation of the integers {1, ..., k}, where k is the number of subscripts in a. The result of the
function is an array of the same size as a but with old dimension given by perm[j] becoming the
new j-th dimension. The easiest way to think of this operation is as a generalization of
transposition for matrices. Indeed if A is a matrix, (that is, a doubly subscripted array) then B
given by

> B <- aperm(A, c(2,1))

is just the transpose of A. For this special case a simpler function t() is available, so we could
have used B <- t(A).
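
A short sketch of how perm maps old dimensions to new ones, using an array whose entries
are 1:24 (an arbitrary choice for illustration):

```r
a <- array(1:24, dim = c(2, 3, 4))
b <- aperm(a, c(3, 1, 2))     # old dimension 3 becomes the new first dimension

new_dim <- dim(b)             # c(4, 2, 3)
same_entry <- b[4, 2, 3] == a[2, 3, 4]   # b[i, j, k] corresponds to a[j, k, i]

A <- matrix(1:6, 2, 3)
is_transpose <- all(aperm(A, c(2, 1)) == t(A))
```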

5.7 Matrix facilities

As noted above, a matrix is just an array with two subscripts. However it is such an important
special case it needs a separate discussion. R contains many operators and functions that are
available only for matrices. For example t(X) is the matrix transpose function, as noted above.
The functions nrow(A) and ncol(A) give the number of rows and columns in the matrix A
respectively.

5.7.1 Matrix multiplication

The operator %*% is used for matrix multiplication. An n by 1 or 1 by n matrix may of course be
used as an n-vector if in the context such is appropriate. Conversely, vectors which occur in
matrix multiplication expressions are automatically promoted either to row or column vectors,
whichever is multiplicatively coherent, if possible, (although this is not always unambiguously
possible, as we see later).

If, for example, A and B are square matrices of the same size, then

> A * B

is the matrix of element by element products and

> A %*% B

is the matrix product. If x is a vector, then

> x %*% A %*% x

is a quadratic form. (footnote 16)

The function crossprod() forms "crossproducts", meaning that crossprod(X, y) is the same as
t(X) %*% y but the operation is more efficient. If the second argument to crossprod() is omitted
it is taken to be the same as the first.

The meaning of diag() depends on its argument. diag(v), where v is a vector, gives a diagonal
matrix with elements of the vector as the diagonal entries. On the other hand diag(M), where M is
a matrix, gives the vector of main diagonal entries of M. This is the same convention as that used
for diag() in MATLAB. Also, somewhat confusingly, if k is a single numeric value then diag(k)
is the k by k identity matrix!
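
The operators and the three meanings of diag() can be compared on small inputs. All
matrices and vectors below are made up for illustration.

```r
A <- matrix(c(2, 1, 1, 3), 2, 2)
x <- c(1, 2)
qf <- x %*% A %*% x            # quadratic form, returned as a 1 by 1 matrix

X <- matrix(1:6, 3, 2)
y <- c(1, 0, -1)
cp1 <- crossprod(X, y)         # same values as t(X) %*% y
cp2 <- t(X) %*% y

D  <- diag(c(1, 2, 3))         # vector argument: diagonal matrix
dg <- diag(matrix(1:9, 3, 3))  # matrix argument: vector of diagonal entries
I3 <- diag(3)                  # single number: 3 by 3 identity matrix
```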

5.7.2 Linear equations and inversion

Solving linear equations is the inverse of matrix multiplication. When after

> b <- A %*% x

only A and b are given, the vector x is the solution of that linear equation system. In R,

> solve(A,b)

solves the system, returning x (up to some accuracy loss). Note that in linear algebra, formally
x = A^{-1} %*% b where A^{-1} denotes the inverse of A, which can be computed by

solve(A)

but rarely is needed. Numerically, it is both inefficient and potentially unstable to compute
x <- solve(A) %*% b instead of solve(A,b).

The quadratic form x %*% A^{-1} %*% x which is used in multivariate computations, should be
computed by something like (footnote 17) x %*% solve(A,x), rather than computing the inverse
of A.
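
On a small, well-conditioned system both routes give the same answer, so the sketch below
only illustrates the calls; it does not demonstrate the efficiency or stability
difference. The matrix and vector are invented for illustration.

```r
A <- matrix(c(4, 1, 1, 3), 2, 2)
x <- c(1, 2)
b <- A %*% x

x_direct  <- solve(A, b)          # preferred: solve the system directly
x_inverse <- solve(A) %*% b       # works, but forms the inverse explicitly

q_direct  <- x %*% solve(A, x)          # quadratic form without the inverse
q_inverse <- x %*% solve(A) %*% x       # same value via the explicit inverse
```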

5.7.3 Eigenvalues and eigenvectors

The function eigen(Sm) calculates the eigenvalues and eigenvectors of a symmetric matrix Sm.
The result of this function is a list of two components named values and vectors. The
assignment

> ev <- eigen(Sm)

will assign this list to ev. Then ev$val is the vector of eigenvalues of Sm and ev$vec is the matrix
of corresponding eigenvectors. Had we only needed the eigenvalues we could have used the
assignment:

> evals <- eigen(Sm)$values

evals now holds the vector of eigenvalues and the second component is discarded. If the
expression

> eigen(Sm)

is used by itself as a command the two components are printed, with their names. For large
matrices it is better to avoid computing the eigenvectors if they are not needed by using the
expression

> evals <- eigen(Sm, only.values = TRUE)$values

5.7.4 Singular value decomposition and determinants

The function svd(M) takes an arbitrary matrix argument, M, and calculates the singular value
decomposition of M. This consists of a matrix of orthonormal columns U with the same column
space as M, a second matrix of orthonormal columns V whose column space is the row space of M
and a diagonal matrix of positive entries D such that M = U %*% D %*% t(V). D is actually
returned as a vector of the diagonal elements. The result of svd(M) is actually a list of three
components named d, u and v, with evident meanings.

If M is in fact square, then, it is not hard to see that

> absdetM <- prod(svd(M)$d)

calculates the absolute value of the determinant of M. If this calculation were needed often with a
variety of matrices it could be defined as an R function

> absdet <- function(M) prod(svd(M)$d)

after which we could use absdet() as just another R function. As a further trivial but potentially
useful example, you might like to consider writing a function, say tr(), to calculate the trace of a
square matrix. [Hint: You will not need to use an explicit loop. Look again at the diag()
function.]

R has a built-in function det to calculate a determinant, including the sign, and another,
determinant, to give the sign and modulus (optionally on log scale).
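
One possible answer to the tr() exercise, together with a check of absdet() against det()
and determinant(), is sketched below. The test matrix M is invented for illustration.

```r
absdet <- function(M) prod(svd(M)$d)   # as defined in the text
tr     <- function(M) sum(diag(M))     # one way to follow the hint

M <- matrix(c(2, 1, 0,
              1, 3, 1,
              0, 1, 4), 3, 3)

trace_M  <- tr(M)
det_M    <- det(M)
absdet_M <- absdet(M)
dm       <- determinant(M, logarithm = TRUE)   # list with modulus and sign
```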

5.7.5 Least squares fitting and the QR decomposition

The function lsfit() returns a list giving results of a least squares fitting procedure. An
assignment such as

> ans <- lsfit(X, y)

gives the results of a least squares fit where y is the vector of observations and X is the design
matrix. See the help facility for more details, and also for the follow-up function ls.diag() for,
among other things, regression diagnostics. Note that a grand mean term is automatically
included and need not be included explicitly as a column of X. Further note that you almost
always will prefer using lm(.) (see Linear models) to lsfit() for regression modelling.

Another closely related function is qr() and its allies. Consider the following assignments

> Xplus <- qr(X)
> b <- qr.coef(Xplus, y)
> fit <- qr.fitted(Xplus, y)
> res <- qr.resid(Xplus, y)

These compute the orthogonal projection of y onto the range of X in fit, the projection onto the
orthogonal complement in res and the coefficient vector for the projection in b, that is, b is
essentially the result of the MATLAB "backslash" operator.

It is not assumed that X has full column rank. Redundancies will be discovered and removed as
they are found.

This alternative is the older, low-level way to perform least squares calculations. Although still
useful in some contexts, it would now generally be replaced by the statistical models features, as
will be discussed in Statistical models in R.
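
The three routes (qr(), lsfit() and lm()) can be compared on simulated data. The data
below are randomly generated for illustration; note that for qr() the column of ones is
supplied explicitly here, whereas lsfit() and lm() add the intercept themselves.

```r
set.seed(1)                       # simulated data, for illustration only
n  <- 20
x1 <- rnorm(n)
y  <- 2 + 3 * x1 + rnorm(n)

X     <- cbind(1, x1)             # design matrix with explicit intercept column
Xplus <- qr(X)
b     <- qr.coef(Xplus, y)
fit   <- qr.fitted(Xplus, y)
res   <- qr.resid(Xplus, y)

b_lsfit <- lsfit(x1, y)$coefficients   # intercept added automatically
b_lm    <- coef(lm(y ~ x1))
```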

5.8 Forming partitioned matrices, cbind() and rbind()

As we have already seen informally, matrices can be built up from other vectors and matrices by
the functions cbind() and rbind(). Roughly cbind() forms matrices by binding together matrices
horizontally, or column-wise, and rbind() vertically, or row-wise.

In the assignment

> X <- cbind(arg_1, arg_2, arg_3, ...)

the arguments to cbind() must be either vectors of any length, or matrices with the same column
size, that is the same number of rows. The result is a matrix with the concatenated arguments
arg_1, arg_2, ... forming the columns.

If some of the arguments to cbind() are vectors they may be shorter than the column size of any
matrices present, in which case they are cyclically extended to match the matrix column size (or
the length of the longest vector if no matrices are given).

The function rbind() does the corresponding operation for rows. In this case any vector
arguments, possibly cyclically extended, are of course taken as row vectors.

Suppose X1 and X2 have the same number of rows. To combine these by columns into a matrix X,
together with an initial column of 1s we can use

> X <- cbind(1, X1, X2)

The result of rbind() or cbind() always has matrix status. Hence cbind(x) and rbind(x) are
possibly the simplest ways explicitly to allow the vector x to be treated as a column or row
matrix respectively.

5.9 The concatenation function, c(), with arrays

It should be noted that whereas cbind() and rbind() are concatenation functions that respect dim
attributes, the basic c() function does not, but rather clears numeric objects of all dim and
dimnames attributes. This is occasionally useful in its own right.

The official way to coerce an array back to a simple vector object is to use as.vector()

> vec <- as.vector(X)

However a similar result can be achieved by using c() with just one argument, simply for this
side-effect:

> vec <- c(X)

There are slight differences between the two, but ultimately the choice between them is largely a
matter of style (with the former being preferable).
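
One of the slight differences concerns names: for a plain named vector, c() keeps the
names while as.vector() drops them. The sketch below uses made-up objects to show both the
agreement on a matrix and this difference.

```r
X <- matrix(1:6, 2, 3,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

v1 <- as.vector(X)            # dim and dimnames are removed
v2 <- c(X)                    # same result for a matrix

nv <- c(a = 1, b = 2)         # a named vector without dim
names_c  <- names(c(nv))          # names are kept
names_as <- names(as.vector(nv))  # names are dropped
```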

5.10 Frequency tables from factors

Recallthatafactordefinesapartitionintogroups.Similarlyapairoffactorsdefinesatwoway
crossclassification,andsoon.Thefunctiontable()allowsfrequencytablestobecalculated
fromequallengthfactors.Iftherearekfactorarguments,theresultisakwayarrayof
frequencies.
Suppose,forexample,thatstatefisafactorgivingthestatecodeforeachentryinadatavector.
Theassignment
>statefr<table(statef)
givesinstatefratableoffrequenciesofeachstateinthesample.Thefrequenciesareordered
andlabelledbythelevelsattributeofthefactor.Thissimplecaseisequivalentto,butmore
http://cran.rproject.org/doc/manuals/Rintro.html#One_002dandtwo_002dsampletests
31/116
5/28/2015
AnIntroductiontoR
convenientthan,
>statefr<tapply(statef,statef,length)
Furthersupposethatincomefisafactorgivingasuitablydefinedincomeclassforeachentry
inthedatavector,forexamplewiththecut()function:
>factor(cut(incomes,breaks=35+10*(0:7)))>incomef
Thentocalculateatwowaytableoffrequencies:
>table(incomef,statef)
statef
incomefactnswntqldsatasvicwa
(35,45]11010010
(45,55]11112013
(55,65]03132221
(65,75]01000010
Extensiontohigherwayfrequencytablesisimmediate.
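The following self-contained sketch repeats the steps above on a tiny data set. The six states and incomes are hypothetical and are not the data used in the manual.

```r
# Hypothetical data: six observations, each with a state code and an income
statef  <- factor(c("nsw", "vic", "nsw", "qld", "vic", "nsw"))
incomes <- c(42, 58, 61, 47, 39, 66)
incomef <- cut(incomes, breaks = 35 + 10 * (0:4))  # classes (35,45] up to (65,75]
statefr <- table(statef)            # one-way counts, ordered by levels(statef)
two.way <- table(incomef, statef)   # income classes by states
```

Individual cells can be picked out by level name, for example two.way["(35,45]", "vic"].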
6 Lists and data frames
6.1 Lists
An R list is an object consisting of an ordered collection of objects known as its components.
There is no particular need for the components to be of the same mode or type, and, for example,
a list could consist of a numeric vector, a logical value, a matrix, a complex vector, a character
array, a function, and so on. Here is a simple example of how to make a list:
> Lst <- list(name="Fred", wife="Mary", no.children=3,
              child.ages=c(4,7,9))
Components are always numbered and may always be referred to as such. Thus if Lst is the
name of a list with four components, these may be individually referred to as Lst[[1]], Lst[[2]],
Lst[[3]] and Lst[[4]]. If, further, Lst[[4]] is a vector subscripted array then Lst[[4]][1] is its
first entry.
If Lst is a list, then the function length(Lst) gives the number of (top level) components it has.
Components of lists may also be named, and in this case the component may be referred to either
by giving the component name as a character string in place of the number in double square
brackets, or, more conveniently, by giving an expression of the form
> name$component_name
for the same thing.
This is a very useful convention as it makes it easier to get the right component if you forget the
number.
So in the simple example given above:
Lst$name is the same as Lst[[1]] and is the string "Fred",
Lst$wife is the same as Lst[[2]] and is the string "Mary",
Lst$child.ages[1] is the same as Lst[[4]][1] and is the number 4.
Additionally, one can also use the names of the list components in double square brackets, i.e.,
Lst[["name"]] is the same as Lst$name. This is especially useful when the name of the
component to be extracted is stored in another variable, as in
> x <- "name"; Lst[[x]]
It is very important to distinguish Lst[[1]] from Lst[1]. [[...]] is the operator used to select a
single element, whereas [...] is a general subscripting operator. Thus the former is the first
object in the list Lst, and if it is a named list the name is not included. The latter is a sublist of the
list Lst consisting of the first entry only. If it is a named list, the names are transferred to the
sublist.
The names of components may be abbreviated down to the minimum number of letters needed to
identify them uniquely. Thus Lst$coefficients may be minimally specified as Lst$coe and
Lst$covariance as Lst$cov.
The vector of names is in fact simply an attribute of the list like any other and may be handled as
such. Other structures besides lists may, of course, similarly be given a names attribute also.
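The indexing forms discussed in this section can be compared side by side. This is only a sketch reusing the Lst example; the variable names first, sub, ages and kids are introduced here for illustration.

```r
Lst <- list(name = "Fred", wife = "Mary", no.children = 3,
            child.ages = c(4, 7, 9))
first <- Lst[[1]]      # the component itself
sub   <- Lst[1]        # a sublist of length one; the name travels with it
x     <- "child.ages"
ages  <- Lst[[x]]      # component name held in another variable
kids  <- Lst$no        # abbreviated name, unique among the components
length(Lst)            # number of top-level components
```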
6.2 Constructing and modifying lists
New lists may be formed from existing objects by the function list(). An assignment of the
form
> Lst <- list(name_1=object_1, ..., name_m=object_m)
sets up a list Lst of m components using object_1, ..., object_m for the components and giving
them names as specified by the argument names (which can be freely chosen). If these names
are omitted, the components are numbered only. The components used to form the list are copied
when forming the new list and the originals are not affected.
Lists, like any subscripted object, can be extended by specifying additional components. For
example
> Lst[5] <- list(matrix=Mat)
6.2.1 Concatenating lists
When the concatenation function c() is given list arguments, the result is an object of mode list
also, whose components are those of the argument lists joined together in sequence.
> list.ABC <- c(list.A, list.B, list.C)
Recall that with vector objects as arguments the concatenation function similarly joined together
all arguments into a single vector structure. In this case all other attributes, such as dim attributes,
are discarded.
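A brief sketch of extending and concatenating lists follows. Mat, list.A and list.B are small made-up objects, and the example checks only lengths and contents.

```r
Mat <- diag(2)                               # a 2 x 2 identity matrix
Lst <- list(name = "Fred", wife = "Mary", no.children = 3,
            child.ages = c(4, 7, 9))
Lst[5] <- list(matrix = Mat)                 # extend the list by a fifth component
list.A  <- list(a = 1, b = "x")
list.B  <- list(c = TRUE)
list.AB <- c(list.A, list.B)                 # components joined in sequence
names(list.AB)
```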
6.3 Data frames
A data frame is a list with class "data.frame". There are restrictions on lists that may be made
into data frames, namely
- The components must be vectors (numeric, character, or logical), factors, numeric
  matrices, lists, or other data frames.
- Matrices, lists, and data frames provide as many variables to the new data frame as they
  have columns, elements, or variables, respectively.
- Numeric vectors, logicals and factors are included as is, and by default[18] character vectors
  are coerced to be factors, whose levels are the unique values appearing in the vector.
- Vector structures appearing as variables of the data frame must all have the same length,
  and matrix structures must all have the same row size.
A data frame may for many purposes be regarded as a matrix with columns possibly of differing
modes and attributes. It may be displayed in matrix form, and its rows and columns extracted
using matrix indexing conventions.
6.3.1 Making data frames
Objects satisfying the restrictions placed on the columns (components) of a data frame may be
used to form one using the function data.frame:
> accountants <- data.frame(home=statef, loot=incomes, shot=incomef)
A list whose components conform to the restrictions of a data frame may be coerced into a data
frame using the function as.data.frame()
The simplest way to construct a data frame from scratch is to use the read.table() function to
read an entire data frame from an external file. This is discussed further in Reading data from
files.
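To make the accountants example concrete, the sketch below builds it from three made-up observations. The values are hypothetical; only the column names follow the text.

```r
statef  <- factor(c("nsw", "vic", "qld"))
incomes <- c(60, 49, 40)
incomef <- cut(incomes, breaks = 35 + 10 * (0:3))
accountants <- data.frame(home = statef, loot = incomes, shot = incomef)
# The same result, coerced from a list whose components meet the restrictions
same <- as.data.frame(list(home = statef, loot = incomes, shot = incomef))
accountants[2, "loot"]    # matrix-style indexing: row 2, column "loot"
```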
6.3.2 attach() and detach()
The $ notation, such as accountants$home, for list components is not always very convenient. A
useful facility would be somehow to make the components of a list or data frame temporarily
visible as variables under their component name, without the need to quote the list name
explicitly each time.
The attach() function takes a database such as a list or data frame as its argument. Thus
suppose lentils is a data frame with three variables lentils$u, lentils$v, lentils$w. The attach
> attach(lentils)
places the data frame in the search path at position 2, and provided there are no variables u, v or w
in position 1, u, v and w are available as variables from the data frame in their own right. At this
point an assignment such as
> u <- v+w
does not replace the component u of the data frame, but rather masks it with another variable u in
the workspace at position 1 on the search path. To make a permanent change to the data
frame itself, the simplest way is to resort once again to the $ notation:
> lentils$u <- v+w
However the new value of component u is not visible until the data frame is detached and
attached again.
To detach a data frame, use the function
> detach()
More precisely, this statement detaches from the search path the entity currently at position 2.
Thus in the present context the variables u, v and w would be no longer visible, except under the
list notation as lentils$u and so on. Entities at positions greater than 2 on the search path can be
detached by giving their number to detach, but it is much safer to always use a name, for
example by detach(lentils) or detach("lentils")
Note: In R lists and data frames can only be attached at position 2 or above, and
what is attached is a copy of the original object. You can alter the attached values
via assign, but the original list or data frame is unchanged.
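The masking behaviour is easy to miss, so here is a minimal sketch with a made-up lentils data frame. It assumes no objects named u, v or w exist in the workspace beforehand.

```r
lentils <- data.frame(u = 1:3, v = 4:6, w = 7:9)
attach(lentils)
u <- v + w                            # new u in the workspace; lentils$u is untouched
masked <- identical(lentils$u, 1:3)   # TRUE if the data frame did not change
lentils$u <- v + w                    # this changes the data frame itself
detach("lentils")                     # detach by name rather than by position
rm(u)                                 # remove the masking copy from the workspace
```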
6.3.3 Working with data frames
A useful convention that allows you to work with many different problems comfortably together
in the same workspace is
- gather together all variables for any well defined and separate problem in a data frame
  under a suitably informative name;
- when working with a problem attach the appropriate data frame at position 2, and use the
  workspace at level 1 for operational quantities and temporary variables;
- before leaving a problem, add any variables you wish to keep for future reference to the
  data frame using the $ form of assignment, and then detach();
- finally remove all unwanted variables from the workspace and keep it as clean of
  left-over temporary variables as possible.
In this way it is quite simple to work with many problems in the same workspace, all of which
have variables named x, y and z, for example.
6.3.4 Attaching arbitrary lists
attach() is a generic function that allows not only directories and data frames to be attached to
the search path, but other classes of object as well. In particular any object of mode "list" may
be attached in the same way:
> attach(any.old.list)
Anything that has been attached can be detached by detach, by position number or, preferably,
by name.
6.3.5 Managing the search path
The function search shows the current search path and so is a very useful way to keep track of
which data frames and lists (and packages) have been attached and detached. Initially it gives
> search()
[1] ".GlobalEnv"   "Autoloads"    "package:base"
where .GlobalEnv is the workspace.[19]
After lentils is attached we have
> search()
[1] ".GlobalEnv"   "lentils"      "Autoloads"    "package:base"
> ls(2)
[1] "u" "v" "w"
and as we see ls (or objects) can be used to examine the contents of any position on the search
path.
Finally, we detach the data frame and confirm it has been removed from the search path.
> detach("lentils")
> search()
[1] ".GlobalEnv"   "Autoloads"    "package:base"
7 Reading data from files
Large data objects will usually be read as values from external files rather than entered during an
R session at the keyboard. R input facilities are simple and their requirements are fairly strict and
even rather inflexible. There is a clear presumption by the designers of R that you will be able to
modify your input files using other tools, such as file editors or Perl[20], to fit in with the
requirements of R. Generally this is very simple.
If variables are to be held mainly in data frames, as we strongly suggest they should be, an entire
data frame can be read directly with the read.table() function. There is also a more primitive
input function, scan(), that can be called directly.
For more details on importing data into R and also exporting data, see the R Data Import/Export
manual.
7.1 The read.table() function
To read an entire data frame directly, the external file will normally have a special form.
- The first line of the file should have a name for each variable in the data frame.
- Each additional line of the file has as its first item a row label and the values for each
  variable.
If the file has one fewer item in its first line than in its second, this arrangement is presumed to
be in force. So the first few lines of a file to be read as a data frame might look as follows.
Input file form with names and row labels:
     Price    Floor     Area   Rooms     Age  Cent.heat
01   52.00    111.0      830     5       6.2      no
02   54.75    128.0      710     5       7.5      no
03   57.50    101.0     1000     5       4.2      no
04   57.50    131.0      690     6       8.8      no
05   59.75     93.0      900     5       1.9     yes
...
By default numeric items (except row labels) are read as numeric variables and non-numeric
variables, such as Cent.heat in the example, as factors. This can be changed if necessary.
The function read.table() can then be used to read the data frame directly
> HousePrice <- read.table("houses.data")
Often you will want to omit including the row labels directly and use the default labels. In this
case the file may omit the row label column as in the following.
Input file form without row labels:
Price    Floor     Area   Rooms     Age  Cent.heat
52.00    111.0      830     5       6.2      no
54.75    128.0      710     5       7.5      no
57.50    101.0     1000     5       4.2      no
57.50    131.0      690     6       8.8      no
59.75     93.0      900     5       1.9     yes
...
The data frame may then be read as
> HousePrice <- read.table("houses.data", header=TRUE)
where the header=TRUE option specifies that the first line is a line of headings, and hence, by
implication from the form of the file, that no explicit row labels are given.
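Both file layouts can be tried without creating houses.data by passing the lines through the text argument of read.table(). This is a sketch on the first rows of the example; stringsAsFactors = TRUE is set explicitly because the default for character columns differs between R versions.

```r
with.labels <- "Price Floor Area Rooms Age Cent.heat
01 52.00 111.0 830 5 6.2 no
02 54.75 128.0 710 5 7.5 no
03 57.50 101.0 1000 5 4.2 no"
# The header line has one item fewer than the data lines, so the first
# column is taken as row labels and the first line as variable names.
HousePrice <- read.table(text = with.labels, stringsAsFactors = TRUE)

no.labels <- "Price Floor Area Rooms Age Cent.heat
52.00 111.0 830 5 6.2 no
54.75 128.0 710 5 7.5 no"
# Same number of items on every line, so the header must be declared.
HousePrice2 <- read.table(text = no.labels, header = TRUE,
                          stringsAsFactors = TRUE)
```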
7.2 The scan() function
Suppose the data vectors are of equal length and are to be read in parallel. Further suppose that
there are three vectors, the first of mode character and the remaining two of mode numeric, and
the file is input.dat. The first step is to use scan() to read in the three vectors as a list, as follows
> inp <- scan("input.dat", list("",0,0))
The second argument is a dummy list structure that establishes the mode of the three vectors to
be read. The result, held in inp, is a list whose components are the three vectors read in. To
separate the data items into three separate vectors, use assignments like
> label <- inp[[1]]; x <- inp[[2]]; y <- inp[[3]]
More conveniently, the dummy list can have named components, in which case the names can be
used to access the vectors read in. For example
> inp <- scan("input.dat", list(id="", x=0, y=0))
If you wish to access the variables separately they may either be re-assigned to variables in the
working frame:
> label <- inp$id; x <- inp$x; y <- inp$y
or the list may be attached at position 2 of the search path (see Attaching arbitrary lists).
If the second argument is a single value and not a list, a single vector is read in, all components
of which must be of the same mode as the dummy value.
> X <- matrix(scan("light.dat", 0), ncol=5, byrow=TRUE)
There are more elaborate input facilities available and these are detailed in the manuals.
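The two uses of scan() can be sketched without external files by supplying the data through its text argument. The contents below are invented stand-ins for input.dat and light.dat.

```r
# Three parallel vectors: one character, two numeric
inp <- scan(text = "a 1 10\nb 2 20\nc 3 30",
            what = list(id = "", x = 0, y = 0), quiet = TRUE)
label <- inp$id; x <- inp$x; y <- inp$y

# A single numeric vector, reshaped row by row into a matrix
X <- matrix(scan(text = "1 2 3 4 5 6", what = 0, quiet = TRUE),
            ncol = 3, byrow = TRUE)
```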
7.3 Accessing builtin datasets
Around 100 datasets are supplied with R (in package datasets), and others are available in
packages (including the recommended packages supplied with R). To see the list of datasets
currently available use
data()
All the datasets supplied with R are available directly by name. However, many packages still
use the obsolete convention in which data was also used to load datasets into R, for example
data(infert)
and this can still be used with the standard packages (as in this example). In most cases this will
load an R object of the same name. However, in a few cases it loads several objects, so see the
on-line help for the object to see what to expect.
7.3.1 Loading data from other R packages
To access data from a particular package, use the package argument, for example
data(package="rpart")
data(Puromycin, package="datasets")
If a package has been attached by library, its datasets are automatically included in the search.
User-contributed packages can be a rich source of datasets.
7.4 Editing data
When invoked on a data frame or matrix, edit brings up a separate spreadsheet-like environment
for editing. This is useful for making small changes once a data set has been read. The command
> xnew <- edit(xold)
will allow you to edit your data set xold, and on completion the changed object is assigned to
xnew. If you want to alter the original dataset xold, the simplest way is to use fix(xold), which is
equivalent to xold <- edit(xold).
Use
> xnew <- edit(data.frame())
to enter new data via the spreadsheet interface.
8 Probability distributions
8.1 R as a set of statistical tables
One convenient use of R is to provide a comprehensive set of statistical tables. Functions are
provided to evaluate the cumulative distribution function P(X <= x), the probability density
function and the quantile function (given q, the smallest x such that P(X <= x) >= q), and to
simulate from the distribution.
Distribution        R name     additional arguments
beta                beta       shape1, shape2, ncp
binomial            binom      size, prob
Cauchy              cauchy     location, scale
chi-squared         chisq      df, ncp
exponential         exp        rate
F                   f          df1, df2, ncp
gamma               gamma      shape, scale
geometric           geom       prob
hypergeometric      hyper      m, n, k
log-normal          lnorm      meanlog, sdlog
logistic            logis      location, scale
negative binomial   nbinom     size, prob
normal              norm       mean, sd
Poisson             pois       lambda
signed rank         signrank   n
Student's t         t          df, ncp
uniform             unif       min, max
Weibull             weibull    shape, scale
Wilcoxon            wilcox     m, n
Prefix the name given here by d for the density, p for the CDF, q for the quantile function
and r for simulation (random deviates). The first argument is x for dxxx, q for pxxx, p for qxxx
and n for rxxx (except for rhyper, rsignrank and rwilcox, for which it is nn). In not quite all
cases is the non-centrality parameter ncp currently available: see the on-line help for details.
The pxxx and qxxx functions all have logical arguments lower.tail and log.p and the dxxx ones
have log. This allows, e.g., getting the cumulative (or integrated) hazard function, H(t) =
- log(1 - F(t)), by
- pxxx(t, ..., lower.tail = FALSE, log.p = TRUE)
or more accurate log-likelihoods (by dxxx(..., log = TRUE)), directly.
In addition there are functions ptukey and qtukey for the distribution of the studentized range of
samples from a normal distribution, and dmultinom and rmultinom for the multinomial
distribution. Further distributions are available in contributed packages, notably SuppDists.
Here are some examples
> ## 2-tailed p-value for t distribution
> 2*pt(-2.43, df = 13)
> ## upper 1% point for an F(2, 7) distribution
> qf(0.01, 2, 7, lower.tail = FALSE)
See the on-line help on RNG for how random-number generation is done in R.
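A few identities tie the d/p/q prefixes and the lower.tail and log.p arguments together. The sketch below is illustrative only; the exponential rate of 2 and the evaluation points are arbitrary choices.

```r
# pxxx and qxxx are inverse to each other for a continuous distribution
p <- pnorm(1.96)
q <- qnorm(p)                 # recovers 1.96 up to rounding error

# Cumulative hazard of an exponential with rate 2: H(t) = 2 * t
t0 <- 1.5
H  <- -pexp(t0, rate = 2, lower.tail = FALSE, log.p = TRUE)

# Upper-tail quantiles can be requested directly or via the complement
upper <- qf(0.01, 2, 7, lower.tail = FALSE)
same  <- qf(0.99, 2, 7)

# For a discrete distribution the quantile is the smallest x with P(X <= x) >= q
qb <- qbinom(0.5, size = 1, prob = 0.5)

pval <- 2 * pt(-2.43, df = 13)   # the two-tailed p-value from the example above
```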
8.2 Examining the distribution of a set of data
Given a (univariate) set of data we can examine its distribution in a large number of ways. The
simplest is to examine the numbers. Two slightly different summaries are given by summary and
fivenum and a display of the numbers by stem (a stem and leaf plot).
> attach(faithful)
> summary(eruptions)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.600   2.163   4.000   3.488   4.454   5.100
> fivenum(eruptions)
[1] 1.6000 2.1585 4.0000 4.4585 5.1000
> stem(eruptions)
  The decimal point is 1 digit(s) to the left of the |
  16 | 070355555588
  18 | 000022233333335577777777888822335777888
  20 | 00002223378800035778
  22 | 0002335578023578
  24 | 00228
  26 | 23
  28 | 080
  30 | 7
  32 | 2337
  34 | 250077
  36 | 0000823577
  38 | 2333335582225577
  40 | 0000003357788888002233555577778
  42 | 03335555778800233333555577778
  44 | 02222335557780000000023333357778888
  46 | 0000233357700000023578
  48 | 00000022335800333
  50 | 0370
A stem-and-leaf plot is like a histogram, and R has a function hist to plot histograms.
> hist(eruptions)
## make the bins smaller, make a plot of density
> hist(eruptions, seq(1.6, 5.2, 0.2), prob=TRUE)
> lines(density(eruptions, bw=0.1))
> rug(eruptions) # show the actual data points
More elegant density plots can be made by density, and we added a line produced by density in
this example. The bandwidth bw was chosen by trial-and-error as the default gives too much
smoothing (it usually does for interesting densities). (Better automated methods of bandwidth
choice are available, and in this example bw = "SJ" gives a good result.)
[Figure: images/hist]
We can plot the empirical cumulative distribution function by using the function ecdf.
> plot(ecdf(eruptions), do.points=FALSE, verticals=TRUE)
This distribution is obviously far from any standard distribution. How about the right-hand
mode, say eruptions of longer than 3 minutes? Let us fit a normal distribution and overlay the
fitted CDF.
> long <- eruptions[eruptions > 3]
> plot(ecdf(long), do.points=FALSE, verticals=TRUE)
> x <- seq(3, 5.4, 0.01)
> lines(x, pnorm(x, mean=mean(long), sd=sqrt(var(long))), lty=3)
[Figure: images/ecdf]
Quantile-quantile (Q-Q) plots can help us examine this more carefully.
par(pty="s") # arrange for a square figure region
qqnorm(long); qqline(long)
which shows a reasonable fit but a shorter right tail than one would expect from a normal
distribution. Let us compare this with some simulated data from a t distribution
[Figure: images/QQ]
x <- rt(250, df = 5)
qqnorm(x); qqline(x)
which will usually (if it is a random sample) show longer tails than expected for a normal. We
can make a Q-Q plot against the generating distribution by
qqplot(qt(ppoints(250), df = 5), x, xlab = "Q-Q plot for t dsn")
qqline(x)
Finally, we might want a more formal test of agreement with normality (or not). R provides the
Shapiro-Wilk test
> shapiro.test(long)
         Shapiro-Wilk normality test
data:  long
W = 0.9793, p-value = 0.01052
and the Kolmogorov-Smirnov test
> ks.test(long, "pnorm", mean = mean(long), sd = sqrt(var(long)))
         One-sample Kolmogorov-Smirnov test
data:  long
D = 0.0661, p-value = 0.4284
alternative hypothesis: two.sided
(Note that the distribution theory is not valid here as we have estimated the parameters of the
normal distribution from the same sample.)
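The numerical parts of this section can be reproduced without plotting. The sketch below refers to faithful$eruptions directly instead of attaching the data frame, which is a stylistic choice made here, and the object names Fn and sw are introduced for illustration.

```r
eruptions <- faithful$eruptions      # dataset supplied with R
five <- fivenum(eruptions)           # minimum, lower hinge, median, upper hinge, maximum
long <- eruptions[eruptions > 3]     # the right-hand mode only
Fn   <- ecdf(long)                   # step function: Fn(x) is the proportion of values <= x
sw   <- shapiro.test(long)           # list with components such as statistic and p.value
```

Printing sw gives the same display as the shapiro.test(long) call above, while sw$p.value extracts the p-value for further use.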
8.3 One- and two-sample tests
So far we have compared a single sample to a normal distribution. A much more common
operation is to compare aspects of two samples. Note that in R, all classical tests including the
ones used below are in package stats which is normally loaded.
Consider the following sets of data on the latent heat of the fusion of ice (cal/gm) from Rice
(1995, p.490)
Method A: 79.98 80.04 80.02 80.04 80.03 80.03 80.04 79.97
          80.05 80.03 80.02 80.00 80.02
Method B: 80.02 79.94 79.98 79.97 79.97 80.03 79.95 79.97
Boxplots provide a simple graphical comparison of the two samples.
A <- scan()
79.98 80.04 80.02 80.04 80.03 80.03 80.04 79.97
80.05 80.03 80.02 80.00 80.02

B <- scan()
80.02 79.94 79.98 79.97 79.97 80.03 79.95 79.97

boxplot(A, B)
which indicates that the first group tends to give higher results than the second.
[Figure: images/ice]
To test for the equality of the means of the two samples, we can use an unpaired t-test by
> t.test(A, B)
         Welch Two Sample t-test
data:  A and B
t = 3.2499, df = 12.027, p-value = 0.00694
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.01385526 0.07018320
sample estimates:
mean of x mean of y
 80.02077  79.97875
which does indicate a significant difference, assuming normality. By default the R function does
not assume equality of variances in the two samples (in contrast to the similar S-PLUS t.test
function). We can use the F test to test for equality in the variances, provided that the two
samples are from normal populations.
> var.test(A, B)
         F test to compare two variances
data:  A and B
F = 0.5837, num df = 12, denom df = 7, p-value = 0.3938
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.1251097 2.1052687
sample estimates:
ratio of variances
         0.5837405
which shows no evidence of a significant difference, and so we can use the classical t-test that
assumes equality of the variances.
> t.test(A, B, var.equal=TRUE)
         Two Sample t-test
data:  A and B
t = 3.4722, df = 19, p-value = 0.002551
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.01669058 0.06734788
sample estimates:
mean of x mean of y
 80.02077  79.97875
All these tests assume normality of the two samples. The two-sample Wilcoxon (or Mann-
Whitney) test only assumes a common continuous distribution under the null hypothesis.
> wilcox.test(A, B)
         Wilcoxon rank sum test with continuity correction
data:  A and B
W = 89, p-value = 0.007497
alternative hypothesis: true location shift is not equal to 0
Warning message:
Cannot compute exact p-value with ties in: wilcox.test(A, B)
Note the warning: there are several ties in each sample, which suggests strongly that these data
are from a discrete distribution (probably due to rounding).
There are several ways to compare graphically the two samples. We have already seen a pair of
boxplots. The following
> plot(ecdf(A), do.points=FALSE, verticals=TRUE, xlim=range(A, B))
> plot(ecdf(B), do.points=FALSE, verticals=TRUE, add=TRUE)
will show the two empirical CDFs, and qqplot will perform a Q-Q plot of the two samples. The
Kolmogorov-Smirnov test is of the maximal vertical distance between the two ecdf's, assuming
a common continuous distribution:
> ks.test(A, B)
         Two-sample Kolmogorov-Smirnov test
data:  A and B
D = 0.5962, p-value = 0.05919
alternative hypothesis: two-sided
Warning message:
cannot compute correct p-values with ties in: ks.test(A, B)
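For non-interactive use the two samples can be entered with c() instead of scan(), and the test results kept as objects. This sketch uses the data listed above; the object names welch, pooled and ftest are introduced here.

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97,
       80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)
welch  <- t.test(A, B)                    # unequal variances (the default)
pooled <- t.test(A, B, var.equal = TRUE)  # classical pooled-variance test
ftest  <- var.test(A, B)                  # F test for the ratio of variances
c(welch$statistic, pooled$statistic, ftest$statistic)
```

Each result is a list, so components such as pooled$p.value or welch$conf.int can be extracted rather than read off the printed display.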
9 Grouping, loops and conditional execution
9.1 Grouped expressions
R is an expression language in the sense that its only command type is a function or expression
which returns a result. Even an assignment is an expression whose result is the value assigned,
and it may be used wherever any expression may be used; in particular multiple assignments are
possible.
Commands may be grouped together in braces, {expr_1; ...; expr_m}, in which case the value of
the group is the result of the last expression in the group evaluated. Since such a group is also an
expression it may, for example, be itself included in parentheses and used as part of an even larger
expression, and so on.
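A short sketch of these points, with arbitrary values chosen for illustration:

```r
a <- b <- 5                    # multiple assignment: the value of `b <- 5` is 5
g <- { tmp <- 2; tmp * 10 }    # the group's value is that of its last expression
h <- ({ 1; 2 } + 3) * 2        # a group in parentheses inside a larger expression
```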
9.2 Control statements
9.2.1 Conditional execution: if statements
The language has available a conditional construction of the form
> if (expr_1) expr_2 else expr_3
where expr_1 must evaluate to a single logical value and the result of the entire expression is
then evident.
The short-circuit operators && and || are often used as part of the condition in an if statement.
Whereas & and | apply element-wise to vectors, && and || apply to vectors of length one, and
only evaluate their second argument if necessary.
There is a vectorized version of the if/else construct, the ifelse function. This has the form
ifelse(condition, a, b) and returns a vector of the same length as condition, with
elements a[i] if condition[i] is true, otherwise b[i].
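The difference between the scalar and vectorized forms can be seen in a small sketch. The vector x and the other object names are arbitrary.

```r
x <- c(-2, 0, 3)
sgn <- ifelse(x > 0, "pos", "non-pos")   # one result per element of the condition

v <- NULL
# && stops at the first FALSE, so v[1] > 0 is never evaluated here
safe <- if (!is.null(v) && v[1] > 0) "yes" else "no"

elementwise <- (x > -1) & (x < 2)        # & works element by element
one <- ifelse(TRUE, 1:3, 4:6)            # result length follows the condition
```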
9.2.2 Repetitive execution: for loops, repeat and while
There is also a for loop construction which has the form
> for (name in expr_1) expr_2
where name is the loop variable. expr_1 is a vector expression (often a sequence like 1:20), and
expr_2 is often a grouped expression with its sub-expressions written in terms of the dummy
name. expr_2 is repeatedly evaluated as name ranges through the values in the vector result of
expr_1.
As an example, suppose ind is a vector of class indicators and we wish to produce separate plots
of y versus x within classes. One possibility here is to use coplot(),[21] which will produce an
array of plots corresponding to each level of the factor. Another way to do this, now putting all
plots on the one display, is as follows:
> xc <- split(x, ind)
> yc <- split(y, ind)
> for (i in seq_along(yc)) {
    plot(xc[[i]], yc[[i]])
    abline(lsfit(xc[[i]], yc[[i]]))
  }
(Note the function split() which produces a list of vectors obtained by splitting a larger vector
according to the classes specified by a factor. This is a useful function, mostly used in
connection with boxplots. See the help facility for further details.)
Warning: for() loops are used in R code much less often than in compiled
languages. Code that takes a whole object view is likely to be both clearer and
faster in R.
Other looping facilities include the
> repeat expr
statement and the
> while (condition) expr
statement.
The break statement can be used to terminate any loop, possibly abnormally. This is the only
way to terminate repeat loops.
The next statement can be used to discontinue one particular cycle and skip to the next.
Control statements are most often used in connection with functions which are discussed in
Writing your own functions, and where more examples will emerge.
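The three loop forms, together with next and break, can be sketched as follows. The tasks are arbitrary, and the last line shows a whole-object alternative to the first loop.

```r
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next       # skip even values and go to the next cycle
  total <- total + i          # accumulates 1 + 3 + 5 + 7 + 9
}

n <- 0
repeat {
  n <- n + 1
  if (2^n > 1000) break       # break is the only way out of a repeat loop
}

k <- 0
while (k^2 < 50) k <- k + 1   # stops at the first k with k^2 >= 50

total.vec <- sum(seq(1, 9, by = 2))   # same sum without an explicit loop
```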
10 Writing your own functions
As we have seen informally along the way, the R language allows the user to create objects of
mode function. These are true R functions that are stored in a special internal form and may be
used in further expressions and so on. In the process, the language gains enormously in power,
convenience and elegance, and learning to write useful functions is one of the main ways to
make your use of R comfortable and productive.
It should be emphasized that most of the functions supplied as part of the R system, such as
mean(), var(), postscript() and so on, are themselves written in R and thus do not differ
materially from user written functions.
A function is defined by an assignment of the form
> name <- function(arg_1, arg_2, ...) expression
The expression is an R expression (usually a grouped expression) that uses the arguments,
arg_i, to calculate a value. The value of the expression is the value returned for the function.
A call to the function then usually takes the form name(expr_1, expr_2, ...) and may occur
anywhere a function call is legitimate.
10.1 Simple examples
As a first example, consider a function to calculate the two sample t-statistic, showing all the
steps. This is an artificial example, of course, since there are other, simpler ways of achieving
the same end.
The function is defined as follows:
> twosam <- function(y1, y2) {
    n1  <- length(y1); n2  <- length(y2)
    yb1 <- mean(y1);   yb2 <- mean(y2)
    s1  <- var(y1);    s2  <- var(y2)
    s <- ((n1-1)*s1 + (n2-1)*s2)/(n1+n2-2)
    tst <- (yb1 - yb2)/sqrt(s*(1/n1 + 1/n2))
    tst
  }
With this function defined, you could perform two sample t-tests using a call such as
> tstat <- twosam(data$male, data$female); tstat
As a second example, consider a function to emulate directly the MATLAB backslash command,
which returns the coefficients of the orthogonal projection of the vector y onto the column space
of the matrix, X. (This is ordinarily called the least squares estimate of the regression
coefficients.) This would ordinarily be done with the qr() function; however this is sometimes a
bit tricky to use directly and it pays to have a simple function such as the following to use it
safely.
Thus given an n by 1 vector y and an n by p matrix X then X \ y is defined as (X'X)^{-}X'y, where
(X'X)^{-} is a generalized inverse of X'X.
> bslash <- function(X, y) {
    X <- qr(X)
    qr.coef(X, y)
  }
After this object is created it may be used in statements such as
> regcoeff <- bslash(Xmat, yvar)
and so on.
The classical R function lsfit() does this job quite well, and more[22]. It in turn uses the
functions qr() and qr.coef() in the slightly counterintuitive way above to do this part of the
calculation. Hence there is probably some value in having just this part isolated in a simple to
use function if it is going to be in frequent use. If so, we may wish to make it a matrix binary
operator for even more convenient use.
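Both functions from this section can be checked against built-in routines. The sketch below uses made-up data; the comparison with t.test() and with the normal equations is an illustration added here, not part of the original examples.

```r
twosam <- function(y1, y2) {
  n1  <- length(y1); n2  <- length(y2)
  yb1 <- mean(y1);   yb2 <- mean(y2)
  s1  <- var(y1);    s2  <- var(y2)
  s   <- ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)   # pooled variance
  (yb1 - yb2) / sqrt(s * (1 / n1 + 1 / n2))
}
bslash <- function(X, y) {
  X <- qr(X)
  qr.coef(X, y)
}

# Hypothetical samples for the t-statistic
y1 <- c(5.1, 4.9, 6.2, 5.8)
y2 <- c(4.2, 4.8, 3.9, 4.5, 4.1)
tstat <- twosam(y1, y2)
ref.t <- unname(t.test(y1, y2, var.equal = TRUE)$statistic)

# Hypothetical regression problem for bslash()
set.seed(1)
Xmat <- cbind(1, rnorm(20), rnorm(20))
yvar <- Xmat %*% c(2, -1, 0.5) + rnorm(20, sd = 0.1)
regcoeff  <- bslash(Xmat, yvar)
normal.eq <- solve(crossprod(Xmat), crossprod(Xmat, yvar))  # (X'X)^{-1} X'y
```

For a full-rank X the QR-based coefficients and the normal-equations solution agree up to rounding error, while the QR route is generally the numerically safer of the two.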
10.2 Defining new binary operators
Had we given the bslash() function a different name, namely one of the form
%anything%
it could have been used as a binary operator in expressions rather than in function form.
Suppose, for example, we choose ! for the internal character. The function definition would then
start as
> "%!%" <- function(X, y) { ... }
(Note the use of quote marks.) The function could then be used as X %!% y. (The backslash
symbol itself is not a convenient choice as it presents special problems in this context.)
The matrix multiplication operator, %*%, and the outer product matrix operator %o% are other
examples of binary operators defined in this way.
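A complete version of the operator might look like the sketch below, where the body simply repeats the bslash() calculation. The data are chosen so that y = 1 + 2x holds exactly.

```r
"%!%" <- function(X, y) qr.coef(qr(X), y)   # least squares coefficients of y on X

X <- cbind(1, c(1, 2, 3, 4))   # intercept column and one predictor
y <- c(3, 5, 7, 9)             # exactly 1 + 2 * x
beta <- X %!% y                # used as a binary operator
```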
10.3 Named arguments and defaults
As first noted in Generating regular sequences, if arguments to called functions are given in the
name=object form, they may be given in any order. Furthermore the argument sequence may
begin in the unnamed, positional form, and specify named arguments after the positional
arguments.
Thus if there is a function fun1 defined by
> fun1 <- function(data, data.frame, graph, limit) {
    [function body omitted]
  }
then the function may be invoked in several ways. For example, the calls
> ans <- fun1(d, df, TRUE, 20)
> ans <- fun1(d, df, graph=TRUE, limit=20)
> ans <- fun1(data=d, limit=20, graph=TRUE, data.frame=df)
are all equivalent.
In many cases arguments can be given commonly appropriate default values, in which case they
may be omitted altogether from the call when the defaults are appropriate. For example, if fun1
were defined as
> fun1 <- function(data, data.frame, graph=TRUE, limit=20) { ... }
it could be called as
> ans <- fun1(d, df)
which is now equivalent to the three cases above, or as
> ans <- fun1(d, df, limit=10)
which changes one of the defaults.
It is important to note that defaults may be arbitrary expressions, even involving other arguments
to the same function; they are not restricted to be constants as in our simple example here.
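To illustrate defaults that are expressions in other arguments, here is a small sketch. The function clamp() and its arguments are hypothetical and are not part of R or of the examples above.

```r
## Hypothetical function: 'lower' defaults to an expression in 'x',
## and 'upper' defaults to an expression in 'lower'.
clamp <- function(x, lower = min(x), upper = lower + 10) {
  pmin(pmax(x, lower), upper)
}

r1 <- clamp(c(1, 5, 20))              # lower = 1, upper = 11
r2 <- clamp(c(1, 5, 20), upper = 6)   # a named argument overrides one default
r3 <- clamp(c(1, 5, 20), 2)           # positional: lower = 2, so upper = 12
```

Defaults are evaluated only when they are needed, inside the function, which is why upper can refer to lower.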
10.4 The '...' argument
Another frequent requirement is to allow one function to pass on argument settings to another.
For example many graphics functions use the function par() and functions like plot() allow the
user to pass on graphical parameters to par() to control the graphical output. (See The par()
function, for more details on the par() function.) This can be done by including an extra
argument, literally '...', of the function, which may then be passed on. An outline example is
given below.
fun1 <- function(data, data.frame, graph=TRUE, limit=20, ...) {
  [omitted statements]
  if (graph)
    par(pch="*", ...)
  [more omissions]
}
Less frequently, a function will need to refer to components of '...'. The expression list(...)
evaluates all such arguments and returns them in a named list, while ..1, ..2, etc. evaluate them
one at a time, with '..n' returning the n'th unmatched argument.
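The following sketch shows the three uses of '...' described above: passing the extra arguments on, collecting them with list(...), and picking out the first one with ..1. The wrapper describe() is a hypothetical name introduced for illustration.

```r
## Hypothetical wrapper: extra arguments are collected by '...'.
describe <- function(x, ...) {
  extras <- list(...)                              # all unmatched arguments
  first  <- if (length(extras) > 0) ..1 else NULL  # first unmatched argument
  list(mean  = mean(x, ...),                       # pass the extras on to mean()
       names = names(extras),
       first = first)
}

res <- describe(c(1, 2, NA, 100), na.rm = TRUE, trim = 0)
res$mean    # mean of 1, 2 and 100, with the NA removed
res$names   # "na.rm" "trim"
```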
10.5 Assignments within functions
Note that any ordinary assignments done within the function are local and temporary and are
lost after exit from the function. Thus the assignment X <- qr(X) does not affect the value of the
argument in the calling program.
To understand completely the rules governing the scope of R assignments the reader needs to be
familiar with the notion of an evaluation frame. This is a somewhat advanced, though hardly
difficult, topic and is not covered further here.
If global and permanent assignments are intended within a function, then either the
"superassignment" operator, <<-, or the function assign() can be used. See the help document for
details. S-PLUS users should be aware that <<- has different semantics in R. These are discussed
further in Scope.
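The difference between ordinary assignment and superassignment can be seen in a short sketch. The function names local.only() and super() are made up for this illustration.

```r
x <- 1
local.only <- function() { x <- 2;  invisible(NULL) }  # ordinary assignment
super      <- function() { x <<- 3; invisible(NULL) }  # superassignment

local.only()
after.local <- x   # still 1: the assignment was local to the function
super()
after.super <- x   # now 3: <<- changed the x outside the function
```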
10.6 More advanced examples
10.6.1 Efficiency factors in block designs
As a more complete, if a little pedestrian, example of a function, consider finding the efficiency
factors for a block design. (Some aspects of this problem have already been discussed in Index
matrices.)
A block design is defined by two factors, say blocks (b levels) and varieties (v levels). If R and
K are the v by v and b by b replications and block size matrices, respectively, and N is the b by v
incidence matrix, then the efficiency factors are defined as the eigenvalues of the matrix
    E = I_v - R^{-1/2} N' K^{-1} N R^{-1/2} = I_v - A'A,   where A = K^{-1/2} N R^{-1/2}.
One way to write the function is given below.
> bdeff <- function(blocks, varieties) {
    blocks <- as.factor(blocks)             # minor safety move
    b <- length(levels(blocks))
    varieties <- as.factor(varieties)       # minor safety move
    v <- length(levels(varieties))
    K <- as.vector(table(blocks))           # remove dim attr
    R <- as.vector(table(varieties))        # remove dim attr
    N <- table(blocks, varieties)
    A <- 1/sqrt(K) * N * rep(1/sqrt(R), rep(b, v))
    sv <- svd(A)
    list(eff=1 - sv$d^2, blockcv=sv$u, varietycv=sv$v)
  }
It is numerically slightly better to work with the singular value decomposition on this occasion
rather than the eigenvalue routines.
The result of the function is a list giving not only the efficiency factors as the first component,
but also the block and variety canonical contrasts, since sometimes these give additional useful
qualitative information.
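As a usage sketch, consider a made-up randomized complete block design in which each of 4 varieties occurs once in each of 3 blocks. The vectors blocks and varieties below are illustrative only. In such a design one factor is numerically zero (it corresponds to the overall mean) and the remaining factors equal 1.

```r
## Definition repeated from the text so that the snippet runs on its own.
bdeff <- function(blocks, varieties) {
  blocks <- as.factor(blocks)
  b <- length(levels(blocks))
  varieties <- as.factor(varieties)
  v <- length(levels(varieties))
  K <- as.vector(table(blocks))        # block sizes
  R <- as.vector(table(varieties))     # replications
  N <- table(blocks, varieties)        # b by v incidence matrix
  A <- 1/sqrt(K) * N * rep(1/sqrt(R), rep(b, v))
  sv <- svd(A)
  list(eff = 1 - sv$d^2, blockcv = sv$u, varietycv = sv$v)
}

blocks    <- rep(1:3, each = 4)        # 3 blocks of 4 plots (made-up design)
varieties <- rep(1:4, times = 3)       # every variety once per block
res <- bdeff(blocks, varieties)
round(res$eff, 10)                     # one value near 0, the others near 1
```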
10.6.2 Dropping all names in a printed array
For printing purposes with large matrices or arrays, it is often useful to print them in close block
form without the array names or numbers. Removing the dimnames attribute will not achieve this
effect, but rather the array must be given a dimnames attribute consisting of empty strings. For
example to print a matrix, X
> temp <- X
> dimnames(temp) <- list(rep("", nrow(X)), rep("", ncol(X)))
> temp; rm(temp)
This can be much more conveniently done using a function, no.dimnames(), shown below, as a
"wrap around" to achieve the same result. It also illustrates how some effective and useful user
functions can be quite short.
no.dimnames <- function(a) {
  ## Remove all dimension names from an array for compact printing.
  d <- list()
  l <- 0
  for(i in dim(a)) {
    d[[l <- l + 1]] <- rep("", i)
  }
  dimnames(a) <- d
  a
}
With this function defined, an array may be printed in close format using
> no.dimnames(X)
This is particularly useful for large integer arrays, where patterns are the real interest rather than
the values.
10.6.3 Recursive numerical integration
Functions may be recursive, and may themselves define functions within themselves. Note,
however, that such functions, or indeed variables, are not inherited by called functions in higher
evaluation frames as they would be if they were on the search path.
The example below shows a naive way of performing one-dimensional numerical integration.
The integrand is evaluated at the end points of the range and in the middle. If the one-panel
trapezium rule answer is close enough to the two panel, then the latter is returned as the value.
Otherwise the same process is recursively applied to each panel. The result is an adaptive
integration process that concentrates function evaluations in regions where the integrand is
farthest from linear. There is, however, a heavy overhead, and the function is only competitive
with other algorithms when the integrand is both smooth and very difficult to evaluate.
The example is also given partly as a little puzzle in R programming.
area <- function(f, a, b, eps = 1.0e-06, lim = 10) {
  fun1 <- function(f, a, b, fa, fb, a0, eps, lim, fun) {
    ## function 'fun1' is only visible inside 'area'
    d <- (a + b)/2
    h <- (b - a)/4
    fd <- f(d)
    a1 <- h * (fa + fd)
    a2 <- h * (fd + fb)
    if(abs(a0 - a1 - a2) < eps || lim == 0)
      return(a1 + a2)
    else {
      return(fun(f, a, d, fa, fd, a1, eps, lim - 1, fun) +
             fun(f, d, b, fd, fb, a2, eps, lim - 1, fun))
    }
  }
  fa <- f(a)
  fb <- f(b)
  a0 <- ((fa + fb) * (b - a))/2
  fun1(f, a, b, fa, fb, a0, eps, lim, fun1)
}
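A usage sketch: the function can be tried on an integrand with a known integral and compared with R's own integrate(). The choice of sin on [0, pi] (exact value 2) and the names est and ref are illustrative assumptions.

```r
## Definition repeated from the text so that the snippet runs on its own.
area <- function(f, a, b, eps = 1.0e-06, lim = 10) {
  fun1 <- function(f, a, b, fa, fb, a0, eps, lim, fun) {
    d <- (a + b)/2                     # midpoint of the current panel
    h <- (b - a)/4
    fd <- f(d)
    a1 <- h * (fa + fd)                # trapezium rule on the left half
    a2 <- h * (fd + fb)                # trapezium rule on the right half
    if(abs(a0 - a1 - a2) < eps || lim == 0)
      return(a1 + a2)
    else {
      return(fun(f, a, d, fa, fd, a1, eps, lim - 1, fun) +
             fun(f, d, b, fd, fb, a2, eps, lim - 1, fun))
    }
  }
  fa <- f(a)
  fb <- f(b)
  a0 <- ((fa + fb) * (b - a))/2        # one-panel trapezium rule
  fun1(f, a, b, fa, fb, a0, eps, lim, fun1)
}

est <- area(sin, 0, pi)                # exact value of the integral is 2
ref <- integrate(sin, 0, pi)$value     # R's built-in adaptive quadrature
c(area = est, integrate = ref)
```

Note that eps is applied to each panel separately, so the total error of area() can be noticeably larger than eps.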
10.7 Scope
The discussion in this section is somewhat more technical than in other parts of this document.
However, it details one of the major differences between S-PLUS and R.
The symbols which occur in the body of a function can be divided into three classes: formal
parameters, local variables and free variables. The formal parameters of a function are those
occurring in the argument list of the function. Their values are determined by the process of
binding the actual function arguments to the formal parameters. Local variables are those whose
values are determined by the evaluation of expressions in the body of the functions. Variables
which are not formal parameters or local variables are called free variables. Free variables
become local variables if they are assigned to. Consider the following function definition.
f <- function(x) {
  y <- 2*x
  print(x)
  print(y)
  print(z)
}
In this function, x is a formal parameter, y is a local variable and z is a free variable.
In R the free variable bindings are resolved by first looking in the environment in which the
function was created. This is called lexical scope. First we define a function called cube.
cube <- function(n) {
  sq <- function() n*n
  n*sq()
}
The variable n in the function sq is not an argument to that function. Therefore it is a free
variable and the scoping rules must be used to ascertain the value that is to be associated with it.
Under static scope (S-PLUS) the value is that associated with a global variable named n. Under
lexical scope (R) it is the parameter to the function cube since that is the active binding for the
variable n at the time the function sq was defined. The difference between evaluation in R and
evaluation in S-PLUS is that S-PLUS looks for a global variable called n while R first looks for a
variable called n in the environment created when cube was invoked.
## first evaluation in S
S> cube(2)
Error in sq(): Object "n" not found
Dumped
S> n <- 3
S> cube(2)
[1] 18
## then the same function evaluated in R
R> cube(2)
[1] 8
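The R half of this comparison can be reproduced directly. The sketch below also defines a global n, as in the S session, to show that R ignores it in favour of the n bound when cube was invoked.

```r
n <- 3                       # a global 'n', as in the S session above
cube <- function(n) {
  sq <- function() n * n     # 'n' is free here; R finds it in cube's frame
  n * sq()
}
res <- cube(2)               # sq() sees n = 2, not the global n = 3
res
```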
Lexical scope can also be used to give functions mutable state. In the following example we
show how R can be used to mimic a bank account. A functioning bank account needs to have a
balance or total, a function for making withdrawals, a function for making deposits and a
function for stating the current balance. We achieve this by creating the three functions within
account and then returning a list containing them. When account is invoked it takes a numerical
argument total and returns a list containing the three functions. Because these functions are
defined in an environment which contains total, they will have access to its value.
The special assignment operator, <<-, is used to change the value associated with total. This
operator looks back in enclosing environments for an environment that contains the symbol
total and when it finds such an environment it replaces the value, in that environment, with the
value of right hand side. If the global or top-level environment is reached without finding the
symbol total then that variable is created and assigned to there. For most users <<- creates a
global variable and assigns the value of the right hand side to it. Only when <<- has been used
in a function that was returned as the value of another function will the special behavior
described here occur.
open.account <- function(total) {
  list(
    deposit = function(amount) {
      if(amount <= 0)
        stop("Deposits must be positive!\n")
      total <<- total + amount
      cat(amount, "deposited.  Your balance is", total, "\n\n")
    },
    withdraw = function(amount) {
      if(amount > total)
        stop("You don't have that much money!\n")
      total <<- total - amount
      cat(amount, "withdrawn.  Your balance is", total, "\n\n")
    },
    balance = function() {
      cat("Your balance is", total, "\n\n")
    }
  )
}

ross <- open.account(100)
robert <- open.account(200)
ross$withdraw(30)
ross$balance()
robert$balance()
ross$deposit(50)
ross$balance()
ross$withdraw(500)
10.8 Customizing the environment
Users can customize their environment in several different ways. There is a site initialization file
and every directory can have its own special initialization file. Finally, the special functions
.First and .Last can be used.
The location of the site initialization file is taken from the value of the R_PROFILE environment
variable. If that variable is unset, the file Rprofile.site in the R home subdirectory etc is used.
This file should contain the commands that you want to execute every time R is started under
your system. A second, personal, profile file named .Rprofile can be placed in any directory.
If R is invoked in that directory then that file will be sourced. This file gives individual users
control over their workspace and allows for different startup procedures in different working
directories. If no .Rprofile file is found in the startup directory, then R looks for a .Rprofile file
in the user's home directory and uses that (if it exists). If the environment variable
R_PROFILE_USER is set, the file it points to is used instead of the .Rprofile files.
Any function named .First() in either of the two profile files or in the .RData image has a
special status. It is automatically performed at the beginning of an R session and may be used to
initialize the environment. For example, the definition in the example below alters the prompt to
$ and sets up various other useful things that can then be taken for granted in the rest of the
session.
Thus, the sequence in which files are executed is: Rprofile.site, the user profile, .RData and
then .First(). A definition in later files will mask definitions in earlier files.
> .First <- function() {
    options(prompt="$ ", continue="+\t")  # $ is the prompt
    options(digits=5, length=999)         # custom numbers and printout
    x11()                                 # for graphics
    par(pch = "+")                        # plotting character
    source(file.path(Sys.getenv("HOME"), "R", "mystuff.R"))
                                          # my personal functions
    library(MASS)                         # attach a package
  }
Similarly a function .Last(), if defined, is (normally) executed at the very end of the session. An
example is given below.
> .Last <- function() {
    graphics.off()                        # a small safety measure.
    cat(paste(date(),"\nAdios\n"))        # Is it time for lunch?
  }
10.9 Classes, generic functions and object orientation
The class of an object determines how it will be treated by what are known as generic functions.
Put the other way round, a generic function performs a task or action on its arguments specific to
the class of the argument itself. If the argument lacks any class attribute, or has a class not
catered for specifically by the generic function in question, there is always a default action
provided.
An example makes things clearer. The class mechanism offers the user the facility of designing
and writing generic functions for special purposes. Among the other generic functions are plot()
for displaying objects graphically, summary() for summarizing analyses of various types, and
anova() for comparing statistical models.
The number of generic functions that can treat a class in a specific way can be quite large. For
example, the functions that can accommodate in some fashion objects of class "data.frame"
include
    [            [[<-         any          as.matrix
    [<-          mean         plot         summary
A currently complete list can be got by using the methods() function:
> methods(class="data.frame")
Conversely the number of classes a generic function can handle can also be quite large. For
example the plot() function has a default method and variants for objects of classes
"data.frame", "density", "factor", and more. A complete list can be got again by using the
methods() function:
> methods(plot)
For many generic functions the function body is quite short, for example
> coef
function (object, ...)
UseMethod("coef")
The presence of UseMethod indicates this is a generic function. To see what methods are available
we can use methods()
> methods(coef)
[1] coef.aov*         coef.Arima*       coef.default*     coef.listof*
[5] coef.nls*         coef.summary.nls*

   Non-visible functions are asterisked
In this example there are six methods, none of which can be seen by typing its name. We can
read these by either of
> getAnywhere("coef.aov")
A single object matching 'coef.aov' was found
It was found in the following places
  registered S3 method for coef from namespace stats
  namespace:stats
with value

function (object, ...)
{
    z <- object$coef
    z[!is.na(z)]
}

> getS3method("coef", "aov")
function (object, ...)
{
    z <- object$coef
    z[!is.na(z)]
}
A function named gen.cl will be invoked by the generic gen for class cl, so do not name
functions in this style unless they are intended to be methods.
The reader is referred to the R Language Definition for a more complete discussion of this
mechanism.
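The gen.cl naming convention can be tried out with a small generic of one's own. In the sketch below the generic area2d, the classes "circle" and "rectangle" and the objects circ and rectangle are all hypothetical names invented for illustration.

```r
## Hypothetical generic 'area2d' with methods for two made-up classes.
area2d           <- function(shape, ...) UseMethod("area2d")
area2d.default   <- function(shape, ...) stop("no area2d method for this class")
area2d.circle    <- function(shape, ...) pi * shape$r^2
area2d.rectangle <- function(shape, ...) shape$w * shape$h

circ      <- structure(list(r = 1), class = "circle")
rectangle <- structure(list(w = 2, h = 3), class = "rectangle")

a1 <- area2d(circ)        # dispatches to area2d.circle
a2 <- area2d(rectangle)   # dispatches to area2d.rectangle
```

An object of any other class falls through to area2d.default, which here simply signals an error.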
11 Statistical models in R
This section presumes the reader has some familiarity with statistical methodology, in particular
with regression analysis and the analysis of variance. Later we make some rather more ambitious
presumptions, namely that something is known about generalized linear models and nonlinear
regression.
The requirements for fitting statistical models are sufficiently well defined to make it possible to
construct general tools that apply in a broad spectrum of problems.
R provides an interlocking suite of facilities that make fitting statistical models very simple. As
we mention in the introduction, the basic output is minimal, and one needs to ask for the details
by calling extractor functions.
11.1 Defining statistical models; formulae
The template for a statistical model is a linear regression model with independent, homoscedastic
errors
    y_i = sum_{j=0}^p beta_j x_{ij} + e_i,     i = 1, ..., n,
where the e_i are NID(0, sigma^2). In matrix terms this would be written
    y = X beta + e
where y is the response vector, X is the model matrix or design matrix and has columns x_0,
x_1, ..., x_p, the determining variables. Very often x_0 will be a column of ones defining an
intercept term.
Examples
Before giving a formal specification, a few examples may usefully set the picture.
Suppose y, x, x0, x1, x2, ... are numeric variables, X is a matrix and A, B, C, ... are factors. The
following formulae on the left side below specify statistical models as described on the right.
y ~ x
y ~ 1 + x
    Both imply the same simple linear regression model of y on x. The first has an implicit
    intercept term, and the second an explicit one.
y ~ 0 + x
y ~ -1 + x
y ~ x - 1
    Simple linear regression of y on x through the origin (that is, without an intercept term).
log(y) ~ x1 + x2
    Multiple regression of the transformed variable, log(y), on x1 and x2 (with an implicit
    intercept term).
y ~ poly(x,2)
y ~ 1 + x + I(x^2)
    Polynomial regression of y on x of degree 2. The first form uses orthogonal polynomials,
    and the second uses explicit powers, as basis.
y ~ X + poly(x,2)
    Multiple regression of y with model matrix consisting of the matrix X as well as polynomial
    terms in x to degree 2.
y ~ A
    Single classification analysis of variance model of y, with classes determined by A.
y ~ A + x
    Single classification analysis of covariance model of y, with classes determined by A, and
    with covariate x.
y ~ A*B
y ~ A + B + A:B
y ~ B %in% A
y ~ A/B
    Two factor non-additive model of y on A and B. The first two specify the same crossed
    classification and the second two specify the same nested classification. In abstract terms
    all four specify the same model subspace.
y ~ (A + B + C)^2
y ~ A*B*C - A:B:C
    Three factor experiment but with a model containing main effects and two factor
    interactions only. Both formulae specify the same model.
y ~ A * x
y ~ A/x
y ~ A/(1 + x) - 1
    Separate simple linear regression models of y on x within the levels of A, with different
    codings. The last form produces explicit estimates of as many different intercepts and
    slopes as there are levels in A.
y ~ A*B + Error(C)
    An experiment with two treatment factors, A and B, and error strata determined by factor
    C. For example a split plot experiment, with whole plots (and hence also subplots),
    determined by factor C.
The operator ~ is used to define a model formula in R. The form, for an ordinary linear model, is
    response ~ op_1 term_1 op_2 term_2 op_3 term_3 ...
where
response
    is a vector or matrix, (or expression evaluating to a vector or matrix) defining the response
    variable(s).
op_i
    is an operator, either + or -, implying the inclusion or exclusion of a term in the model,
    (the first is optional).
term_i
    is either
    • a vector or matrix expression, or 1,
    • a factor, or
    • a formula expression consisting of factors, vectors or matrices connected by formula
      operators.
    In all cases each term defines a collection of columns either to be added to or removed
    from the model matrix. A 1 stands for an intercept column and is by default included in the
    model matrix unless explicitly removed.
The formula operators are similar in effect to the Wilkinson and Rogers notation used by such
programs as Glim and Genstat. One inevitable change is that the operator '.' becomes ':' since
the period is a valid name character in R.
The notation is summarized below (based on Chambers & Hastie, 1992, p.29):
Y ~ M
    Y is modeled as M.
M_1 + M_2
    Include M_1 and M_2.
M_1 - M_2
    Include M_1 leaving out terms of M_2.
M_1 : M_2
    The tensor product of M_1 and M_2. If both terms are factors, then the "subclasses" factor.
M_1 %in% M_2
    Similar to M_1:M_2, but with a different coding.
M_1 * M_2
    M_1 + M_2 + M_1:M_2.
M_1 / M_2
    M_1 + M_2 %in% M_1.
M^n
    All terms in M together with "interactions" up to order n
I(M)
    Insulate M. Inside M all operators have their normal arithmetic meaning, and that term
    appears in the model matrix.
Note that inside the parentheses that usually enclose function arguments all operators have their
normal arithmetic meaning. The function I() is an identity function used to allow terms in model
formulae to be defined using arithmetic operators.
Note particularly that the model formulae specify the columns of the model matrix, the
specification of the parameters being implicit. This is not the case in other contexts, for example
in specifying nonlinear models.
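One way to see which columns a formula generates is to inspect the model matrix directly with model.matrix(). The sketch below uses a tiny made-up data frame d and assumes R's default contrasts setting.

```r
## Made-up data: a numeric covariate and a two-level factor.
d <- data.frame(x = c(1, 2, 3, 4), A = factor(c("a", "a", "b", "b")))

m1 <- colnames(model.matrix(~ x, d))            # implicit intercept
m2 <- colnames(model.matrix(~ x - 1, d))        # intercept removed
m3 <- colnames(model.matrix(~ x + I(x^2), d))   # I() protects the arithmetic
m4 <- colnames(model.matrix(~ A * x, d))        # expands to A + x + A:x
m4
```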
11.1.1 Contrasts
We need at least some idea how the model formulae specify the columns of the model matrix.
This is easy if we have continuous variables, as each provides one column of the model matrix
(and the intercept will provide a column of ones if included in the model).
What about a k-level factor A? The answer differs for unordered and ordered factors. For
unordered factors k - 1 columns are generated for the indicators of the second, ..., kth levels of
the factor. (Thus the implicit parameterization is to contrast the response at each level with that
at the first.) For ordered factors the k - 1 columns are the orthogonal polynomials on 1, ..., k,
omitting the constant term.
Although the answer is already complicated, it is not the whole story. First, if the intercept is
omitted in a model that contains a factor term, the first such term is encoded into k columns
giving the indicators for all the levels. Second, the whole behavior can be changed by the
options setting for contrasts. The default setting in R is
options(contrasts = c("contr.treatment", "contr.poly"))
The main reason for mentioning this is that R and S have different defaults for unordered factors,
S using Helmert contrasts. So if you need to compare your results to those of a textbook or paper
which used S-PLUS, you will need to set
options(contrasts = c("contr.helmert", "contr.poly"))
This is a deliberate difference, as treatment contrasts (R's default) are thought easier for
newcomers to interpret.
We have still not finished, as the contrast scheme to be used can be set for each term in the
model using the functions contrasts and C.
We have not yet considered interaction terms: these generate the products of the columns
introduced for their component terms.
Although the details are complicated, model formulae in R will normally generate the models
that an expert statistician would expect, provided that marginality is preserved. Fitting, for
example, a model with an interaction but not the corresponding main effects will in general lead
to surprising results, and is for experts only.
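The codings described in this section can be inspected on a small example. The three-level factor A below is made up, and the sketch assumes the default contrasts option has not been changed.

```r
A <- factor(c("a", "b", "c"))            # made-up unordered factor, k = 3

contr.treatment(3)                       # indicators for levels 2 and 3
contr.helmert(3)                         # the S-PLUS default coding

mt <- model.matrix(~ A)                  # intercept plus k - 1 = 2 columns
m0 <- model.matrix(~ A - 1)              # no intercept: k indicator columns
mh <- model.matrix(~ A, contrasts.arg = list(A = "contr.helmert"))
colnames(m0)
```

The contrasts.arg argument changes the coding for a single call without altering the global options setting.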
11.2 Linear models
The basic function for fitting ordinary multiple regression models is lm(), and a streamlined
version of the call is as follows:
> fitted.model <- lm(formula, data = data.frame)
For example
> fm2 <- lm(y ~ x1 + x2, data = production)
would fit a multiple regression model of y on x1 and x2 (with implicit intercept term).
The important (but technically optional) parameter data = production specifies that any
variables needed to construct the model should come first from the production data frame. This
is the case regardless of whether data frame production has been attached on the search path or
not.
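Since the production data frame is not supplied with R, the sketch below simulates a stand-in with the same variable names so that the call can be run. The coefficients 1, 2 and -3, the noise level and the seed are arbitrary choices.

```r
set.seed(42)                                   # simulated stand-in data
production <- data.frame(x1 = runif(30), x2 = runif(30))
production$y <- 1 + 2 * production$x1 - 3 * production$x2 +
  rnorm(30, sd = 0.1)

fm2 <- lm(y ~ x1 + x2, data = production)      # variables come from 'production'
coef(fm2)                                      # estimates close to 1, 2 and -3
```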
11.3 Generic functions for extracting model information
The value of lm() is a fitted model object; technically a list of results of class "lm". Information
about the fitted model can then be displayed, extracted, plotted and so on by using generic
functions that orient themselves to objects of class "lm". These include
    add1      deviance   formula    predict     step
    alias     drop1      kappa      print       summary
    anova     effects    labels     proj        vcov
    coef      family     plot       residuals
A brief description of the most commonly used ones is given below.
anova(object_1, object_2)
    Compare a submodel with an outer model and produce an analysis of variance table.
coef(object)
    Extract the regression coefficient (matrix).
    Long form: coefficients(object).
deviance(object)
    Residual sum of squares, weighted if appropriate.
formula(object)
    Extract the model formula.
plot(object)
    Produce four plots, showing residuals, fitted values and some diagnostics.
predict(object, newdata=data.frame)
    The data frame supplied must have variables specified with the same labels as the original.
    The value is a vector or matrix of predicted values corresponding to the determining
    variable values in data.frame.
print(object)
    Print a concise version of the object. Most often used implicitly.
residuals(object)
    Extract the (matrix of) residuals, weighted as appropriate.
    Short form: resid(object).
step(object)
    Select a suitable model by adding or dropping terms and preserving hierarchies. The
    model with the smallest value of AIC (Akaike's An Information Criterion) discovered in
    the stepwise search is returned.
summary(object)
    Print a comprehensive summary of the results of the regression analysis.
vcov(object)
    Returns the variance-covariance matrix of the main parameters of a fitted model object.
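A few of these extractors are applied below to a simple regression on the cars data set from the standard datasets package. The choice of data set and the names fm, rss, pr and V are for illustration only.

```r
fm <- lm(dist ~ speed, data = cars)     # 'cars' ships with R

coef(fm)                                # intercept and slope
formula(fm)                             # dist ~ speed
rss <- deviance(fm)                     # residual sum of squares
pr  <- predict(fm, newdata = data.frame(speed = c(10, 20)))
V   <- vcov(fm)                         # 2 by 2 variance-covariance matrix
all.equal(rss, sum(resid(fm)^2))        # deviance() agrees with the residuals
```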
11.4 Analysis of variance and model comparison
The model fitting function aov(formula, data=data.frame) operates at the simplest level in a
very similar way to the function lm(), and most of the generic functions listed in the table in
Generic functions for extracting model information apply.
It should be noted that in addition aov() allows an analysis of models with multiple error strata
such as split plot experiments, or balanced incomplete block designs with recovery of inter-block
information. The model formula
    response ~ mean.formula + Error(strata.formula)
specifies a multi-stratum experiment with error strata defined by the strata.formula. In the
simplest case, strata.formula is simply a factor, when it defines a two strata experiment, namely
between and within the levels of the factor.
For example, with all determining variables factors, a model formula such as that in:
> fm <- aov(yield ~ v + n*p*k + Error(farms/blocks), data=farm.data)
would typically be used to describe an experiment with mean model v + n*p*k and three error
strata, namely "between farms", "within farms, between blocks" and "within blocks".
11.4.1 ANOVA tables
Note also that the analysis of variance table (or tables) are for a sequence of fitted models. The
sums of squares shown are the decrease in the residual sums of squares resulting from an
inclusion of that term in the model at that place in the sequence. Hence only for orthogonal
experiments will the order of inclusion be inconsequential.
For multistratum experiments the procedure is first to project the response onto the error strata,
again in sequence, and to fit the mean model to each projection. For further details, see
Chambers & Hastie (1992).
A more flexible alternative to the default full ANOVA table is to compare two or more models
directly using the anova() function.
> anova(fitted.model.1, fitted.model.2, ...)
The display is then an ANOVA table showing the differences between the fitted models when
fitted in sequence. The fitted models being compared would usually be an hierarchical sequence,
of course. This does not give different information to the default, but rather makes it easier to
comprehend and control.
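As a sketch of such a comparison, the code below fits a linear and a quadratic model to the cars data set and compares them. The data set and the model names fm.lin and fm.quad are illustrative assumptions.

```r
fm.lin  <- lm(dist ~ speed, data = cars)               # submodel
fm.quad <- lm(dist ~ speed + I(speed^2), data = cars)  # outer model

tab <- anova(fm.lin, fm.quad)   # one row per model, in the order given
tab
```

The RSS column of the table holds the residual sum of squares of each model, so the "Sum of Sq" entry is the decrease obtained by adding the quadratic term.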
11.5 Updating fitted models
The update() function is largely a convenience function that allows a model to be fitted that
differs from one previously fitted usually by just a few additional or removed terms. Its form is
> new.model <- update(old.model, new.formula)
In the new.formula the special name consisting of a period, '.', only, can be used to stand for
"the corresponding part of the old model formula". For example,
> fm05 <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = production)
> fm6  <- update(fm05, . ~ . + x6)
> smf6 <- update(fm6, sqrt(.) ~ .)
would fit a five variate multiple regression with variables (presumably) from the data frame
production, fit an additional model including a sixth regressor variable, and fit a variant on the
model where the response had a square root transform applied.
Note especially that if the data= argument is specified on the original call to the model fitting
function, this information is passed on through the fitted model object to update() and its allies.
The name '.' can also be used in other contexts, but with slightly different meaning. For
example
> fmfull <- lm(y ~ . , data = production)
would fit a model with response y and regressor variables all other variables in the data frame
production.
Other functions for exploring incremental sequences of models are add1(), drop1() and step().
The names of these give a good clue to their purpose, but for full details see the on-line help.
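The effect of '.' in update() can be checked by looking at the formula of each updated model. The sketch uses the cars data set and the model names fm1, fm2 and fm3, all chosen for illustration.

```r
fm1 <- lm(dist ~ speed, data = cars)
fm2 <- update(fm1, . ~ . + I(speed^2))   # keep the response, add a term
fm3 <- update(fm2, sqrt(.) ~ .)          # transform the response, keep the terms

f2 <- deparse(formula(fm2))
f3 <- deparse(formula(fm3))
c(f2, f3)
```

No data= argument is needed in the update() calls because it is carried over from the original call to lm().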
11.6 Generalized linear models
Generalized linear modeling is a development of linear models to accommodate both non-normal
response distributions and transformations to linearity in a clean and straightforward way. A
generalized linear model may be described in terms of the following sequence of assumptions:
• There is a response, y, of interest and stimulus variables x_1, x_2, ..., whose values
  influence the distribution of the response.
• The stimulus variables influence the distribution of y through a single linear function,
  only. This linear function is called the linear predictor, and is usually written
      eta = beta_1 x_1 + beta_2 x_2 + ... + beta_p x_p,
  hence x_i has no influence on the distribution of y if and only if beta_i is zero.
• The distribution of y is of the form
      f_Y(y; mu, phi) = exp((A/phi) * (y lambda(mu) - gamma(lambda(mu))) + tau(y, phi))
  where phi is a scale parameter (possibly known), and is constant for all observations, A
  represents a prior weight, assumed known but possibly varying with the observations, and
  mu is the mean of y. So it is assumed that the distribution of y is determined by its
  mean and possibly a scale parameter as well.
• The mean, mu, is a smooth invertible function of the linear predictor:
      mu = m(eta),     eta = m^{-1}(mu) = ell(mu)
  and this inverse function, ell(), is called the link function.
These assumptions are loose enough to encompass a wide class of models useful in statistical
practice, but tight enough to allow the development of a unified methodology of estimation and
inference, at least approximately. The reader is referred to any of the current reference works on
the subject for full details, such as McCullagh & Nelder (1989) or Dobson (1990).
11.6.1 Families
The class of generalized linear models handled by facilities supplied in R includes gaussian,
binomial, poisson, inverse gaussian and gamma response distributions and also quasi-likelihood
models where the response distribution is not explicitly specified. In the latter case the variance
function must be specified as a function of the mean, but in other cases this function is implied
by the response distribution.
Each response distribution admits a variety of link functions to connect the mean with the linear
predictor. Those automatically available are shown in the following table:
    Family name        Link functions
    binomial           logit, probit, log, cloglog
    gaussian           identity, log, inverse
    Gamma              identity, inverse, log
    inverse.gaussian   1/mu^2, identity, inverse, log
    poisson            identity, log, sqrt
    quasi              logit, probit, cloglog, identity, inverse, log, 1/mu^2, sqrt
The combination of a response distribution, a link function and various other pieces of
information that are needed to carry out the modeling exercise is called the family of the
generalized linear model.
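The pieces that make up a family can be examined by calling a family generator directly and looking at the components of the object it returns. The sketch below does this for a binomial family with a probit link and for the default poisson family; the variable name fam is arbitrary.

```r
fam <- binomial(link = "probit")   # a family object

fam$family                         # "binomial"
fam$link                           # "probit"
fam$linkfun(0.5)                   # link: eta = qnorm(mu), so 0 at mu = 0.5
fam$linkinv(0)                     # inverse link: mu = pnorm(eta), so 0.5

poisson()$link                     # "log" is the default link for poisson
poisson()$variance(3)              # variance function: equal to the mean
```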
11.6.2 The glm() function
Since the distribution of the response depends on the stimulus variables through a single linear
function only, the same mechanism as was used for linear models can still be used to specify the
linear part of a generalized model. The family has to be specified in a different way.
The R function to fit a generalized linear model is glm() which uses the form
> fitted.model <- glm(formula, family=family.generator, data=data.frame)
The only new feature is the family.generator, which is the instrument by which the family is
described. It is the name of a function that generates a list of functions and expressions that
together define and control the model and estimation process. Although this may seem a little
complicated at first sight, its use is quite simple.
The names of the standard, supplied family generators are given under "Family Name" in the
table in Families. Where there is a choice of links, the name of the link may also be supplied
with the family name, in parentheses as a parameter. In the case of the quasi family, the variance
function may also be specified in this way.
Some examples make the process clear.
The gaussian family
A call such as
> fm <- glm(y ~ x1 + x2, family = gaussian, data = sales)
achieves the same result as
> fm <- lm(y ~ x1 + x2, data = sales)
but much less efficiently. Note that the gaussian family uses the identity link by default, so no
parameter is needed here. If a problem requires a gaussian family with a link other than those
listed for it in the table in Families, this can usually be achieved through the quasi family, as
we shall see later.
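The equivalence of the two calls can be checked numerically. The sketch below is only an
illustration: the data frame sales is simulated here (the manual does not supply it), and the
coefficients 1, 2 and -1 are arbitrary.

```r
set.seed(1)
# Simulated stand-in for the 'sales' data frame (hypothetical data)
sales <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
sales$y <- 1 + 2 * sales$x1 - sales$x2 + rnorm(30)

fm_glm <- glm(y ~ x1 + x2, family = gaussian, data = sales)
fm_lm <- lm(y ~ x1 + x2, data = sales)

# Both fits give the same coefficient estimates, up to numerical tolerance
all.equal(coef(fm_glm), coef(fm_lm))
```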
The binomial family
Consider a small, artificial example, from Silvey (1970).
On the Aegean island of Kalythos the male inhabitants suffer from a congenital eye disease, the
effects of which become more marked with increasing age. Samples of islander males of various
ages were tested for blindness and the results recorded. The data is shown below:
Age:          20   35   45   55   70
No. tested:   50   50   50   50   50
No. blind:     6   17   26   37   44
The problem we consider is to fit both logistic and probit models to this data, and to estimate for
each model the LD50, that is the age at which the chance of blindness for a male inhabitant is
50%.
If y is the number of blind at age x and n the number tested, both models have the form
     y ~ B(n, F(beta_0 + beta_1 x))
where for the probit case, F(z) = Phi(z) is the standard normal distribution function, and in the
logit case (the default), F(z) = e^z/(1+e^z). In both cases the LD50 is
     LD50 = - beta_0/beta_1
that is, the point at which the argument of the distribution function is zero.
The first step is to set the data up as a data frame
> kalythos <- data.frame(x = c(20,35,45,55,70), n = rep(50,5),
                         y = c(6,17,26,37,44))
To fit a binomial model using glm() there are three possibilities for the response:
   • If the response is a vector it is assumed to hold binary data, and so must be a 0/1 vector.
   • If the response is a two-column matrix it is assumed that the first column holds the number
     of successes for the trial and the second holds the number of failures.
   • If the response is a factor, its first level is taken as failure (0) and all other levels as
     success (1).
Here we need the second of these conventions, so we add a matrix to our data frame:
> kalythos$Ymat <- cbind(kalythos$y, kalythos$n - kalythos$y)
To fit the models we use
> fmp <- glm(Ymat ~ x, family = binomial(link=probit), data = kalythos)
> fml <- glm(Ymat ~ x, family = binomial, data = kalythos)
Since the logit link is the default the parameter may be omitted on the second call. To see the
results of each fit we could use
> summary(fmp)
> summary(fml)
Both models fit (all too) well. To find the LD50 estimate we can use a simple function:
> ld50 <- function(b) -b[1]/b[2]
> ldp <- ld50(coef(fmp)); ldl <- ld50(coef(fml)); c(ldp, ldl)
The actual estimates from this data are 43.663 years and 43.601 years respectively.
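One way to convince oneself that -beta_0/beta_1 really is the LD50 is to predict the probability
of blindness at that age. The sketch below repeats the logit fit, with the two-column response
written inline instead of through Ymat, and uses predict() from the standard package stats.

```r
kalythos <- data.frame(x = c(20, 35, 45, 55, 70), n = rep(50, 5),
                       y = c(6, 17, 26, 37, 44))

# Two-column response: successes (blind) and failures (not blind)
fml <- glm(cbind(y, n - y) ~ x, family = binomial, data = kalythos)

ld50 <- function(b) -b[1] / b[2]
ldl <- unname(ld50(coef(fml)))

# Fitted probability of blindness at the estimated LD50
p_at_ld50 <- unname(predict(fml, newdata = data.frame(x = ldl), type = "response"))
c(LD50 = ldl, probability = p_at_ld50)
```

At the LD50 the linear predictor is zero, so the fitted probability is one half by construction.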
Poisson models
With the Poisson family the default link is the log, and in practice the major use of this family is
to fit surrogate Poisson log-linear models to frequency data, whose actual distribution is often
multinomial. This is a large and important subject we will not discuss further here. It even forms
a major part of the use of non-gaussian generalized models overall.
Occasionally genuinely Poisson data arises in practice and in the past it was often analyzed as
gaussian data after either a log or a square-root transformation. As a graceful alternative to the
latter, a Poisson generalized linear model may be fitted as in the following example:
> fmod <- glm(y ~ A + B + x, family = poisson(link=sqrt),
              data = worm.counts)
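The data frame worm.counts is not supplied with the manual. To try the call, a stand-in can be
simulated; in the sketch below the factors A and B, the covariate x and the mean structure are
all invented for illustration.

```r
set.seed(42)
# Hypothetical stand-in for 'worm.counts': two two-level factors and a covariate
worm.counts <- data.frame(A = gl(2, 20, labels = c("a1", "a2")),
                          B = gl(2, 10, 40, labels = c("b1", "b2")),
                          x = runif(40, 0, 4))
# Counts whose square-root mean is linear in x (arbitrary coefficients)
worm.counts$y <- rpois(40, lambda = (1 + 0.5 * worm.counts$x)^2)

fmod <- glm(y ~ A + B + x, family = poisson(link = sqrt),
            data = worm.counts)
coef(fmod)    # coefficients are on the square-root scale of the mean
```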
Quasi-likelihood models
For all families the variance of the response will depend on the mean and will have the scale
parameter as a multiplier. The form of dependence of the variance on the mean is a characteristic
of the response distribution; for example for the poisson distribution Var(y) = mu.
For quasi-likelihood estimation and inference the precise response distribution is not specified,
but rather only a link function and the form of the variance function as it depends on the mean.
Since quasi-likelihood estimation uses formally identical techniques to those for the gaussian
distribution, this family provides a way of fitting gaussian models with non-standard link
functions or variance functions, incidentally.
For example, consider fitting the non-linear regression
     y = theta_1 z_1 / (z_2 - theta_2) + e
which may be written alternatively as
     y = 1 / (beta_1 x_1 + beta_2 x_2) + e
where x_1 = z_2/z_1, x_2 = -1/z_1, beta_1 = 1/theta_1, and beta_2 = theta_2/theta_1. Supposing
a suitable data frame to be set up we could fit this non-linear regression as
> nlfit <- glm(y ~ x1 + x2 - 1,
               family = quasi(link=inverse, variance=constant),
               data = biochem)
The reader is referred to the manual and the help document for further information, as needed.
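The reparametrization can be tried on simulated data. In the sketch below the data frame biochem
is invented, with assumed true values theta_1 = 10 and theta_2 = 2; the estimates of theta are
recovered from the fitted beta using theta_1 = 1/beta_1 and theta_2 = beta_2/beta_1.

```r
set.seed(7)
theta <- c(10, 2)                       # assumed true values, for illustration only
z1 <- runif(50, 1, 3)
z2 <- runif(50, 4, 8)
y <- theta[1] * z1 / (z2 - theta[2]) + rnorm(50, sd = 0.05)

# Transformed covariates: x1 = z2/z1 and x2 = -1/z1
biochem <- data.frame(y = y, x1 = z2 / z1, x2 = -1 / z1)

nlfit <- glm(y ~ x1 + x2 - 1,
             family = quasi(link = "inverse", variance = "constant"),
             data = biochem)

b <- unname(coef(nlfit))
theta_hat <- c(theta1 = 1 / b[1], theta2 = b[2] / b[1])
theta_hat
```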
11.7 Nonlinear least squares and maximum likelihood models
Certain forms of nonlinear model can be fitted by Generalized Linear Models (glm()). But in the
majority of cases we have to approach the nonlinear curve fitting problem as one of nonlinear
optimization. R's nonlinear optimization routines are optim(), nlm() and nlminb(), which
provide the functionality (and more) of S-PLUS's ms() and nlminb(). We seek the parameter
values that minimize some index of lack-of-fit, and they do this by trying out various parameter
values iteratively. Unlike linear regression for example, there is no guarantee that the procedure
will converge on satisfactory estimates. All the methods require initial guesses about what
parameter values to try, and convergence may depend critically upon the quality of the starting
values.
11.7.1 Least squares
One way to fit a nonlinear model is by minimizing the sum of the squared errors (SSE) or
residuals. This method makes sense if the observed errors could have plausibly arisen from a
normal distribution.
Here is an example from Bates & Watts (1988), page 51. The data are:
> x <- c(0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56,
         1.10, 1.10)
> y <- c(76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200)
The fit criterion to be minimized is:
> fn <- function(p) sum((y - (p[1] * x)/(p[2] + x))^2)
In order to do the fit we need initial estimates of the parameters. One way to find sensible
starting values is to plot the data, guess some parameter values, and superimpose the model
curve using those values.
> plot(x, y)
> xfit <- seq(.02, 1.1, .05)
> yfit <- 200 * xfit/(0.1 + xfit)
> lines(spline(xfit, yfit))
We could do better, but these starting values of 200 and 0.1 seem adequate. Now do the fit:
> out <- nlm(fn, p = c(200, 0.1), hessian = TRUE)
After the fitting, out$minimum is the SSE, and out$estimate are the least squares estimates of the
parameters. To obtain the approximate standard errors (SE) of the estimates we do:
> sqrt(diag(2*out$minimum/(length(y) - 2) * solve(out$hessian)))
The 2 which is subtracted in the line above represents the number of parameters. A 95%
confidence interval would be the parameter estimate +/- 1.96 SE. We can superimpose the least
squares fit on a new plot:
> plot(x, y)
> xfit <- seq(.02, 1.1, .05)
> yfit <- 212.68384222 * xfit/(0.06412146 + xfit)
> lines(spline(xfit, yfit))
The standard package stats provides much more extensive facilities for fitting non-linear models
by least squares. The model we have just fitted is the Michaelis-Menten model, so we can use
> df <- data.frame(x=x, y=y)
> fit <- nls(y ~ SSmicmen(x, Vm, K), df)
> fit
Nonlinear regression model
  model:  y ~ SSmicmen(x, Vm, K)
   data:  df
          Vm            K
212.68370711   0.06412123
 residual sum-of-squares:  1195.449
> summary(fit)

Formula: y ~ SSmicmen(x, Vm, K)

Parameters:
    Estimate Std. Error t value Pr(>|t|)
Vm 2.127e+02  6.947e+00  30.615 3.24e-11
K  6.412e-02  8.281e-03   7.743 1.57e-05

Residual standard error: 10.93 on 10 degrees of freedom

Correlation of Parameter Estimates:
      Vm
K 0.7651
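The nlm() steps of this subsection can be collected into one script that also forms the
approximate 95% confidence intervals described in the text. This is a sketch only: the interval
uses the normal multiplier 1.96 as in the text, and the row names Vm and K are borrowed from the
nls() output for readability.

```r
x <- c(0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56,
       1.10, 1.10)
y <- c(76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200)

# Sum of squared errors for the model y = p[1] * x / (p[2] + x)
fn <- function(p) sum((y - (p[1] * x) / (p[2] + x))^2)
out <- nlm(fn, p = c(200, 0.1), hessian = TRUE)

# Approximate standard errors; 2 is the number of parameters
se <- sqrt(diag(2 * out$minimum / (length(y) - 2) * solve(out$hessian)))

ci <- cbind(lower = out$estimate - 1.96 * se,
            estimate = out$estimate,
            upper = out$estimate + 1.96 * se)
rownames(ci) <- c("Vm", "K")
ci
```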
11.7.2 Maximum likelihood
Maximum likelihood is a method of nonlinear model fitting that applies even if the errors are not
normal. The method finds the parameter values which maximize the log likelihood, or
equivalently which minimize the negative log-likelihood. Here is an example from Dobson
(1990), pp. 108-111. This example fits a logistic model to dose-response data, which clearly
could also be fit by glm(). The data are:
> x <- c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113,
         1.8369, 1.8610, 1.8839)
> y <- c( 6, 13, 18, 28, 52, 53, 61, 60)
> n <- c(59, 60, 62, 56, 63, 59, 62, 60)
The negative log-likelihood to minimize is:
> fn <- function(p)
   sum( - (y*(p[1]+p[2]*x) - n*log(1+exp(p[1]+p[2]*x))
           + log(choose(n, y)) ))
We pick sensible starting values and do the fit:
> out <- nlm(fn, p = c(-50,20), hessian = TRUE)
After the fitting, out$minimum is the negative log-likelihood, and out$estimate are the maximum
likelihood estimates of the parameters. To obtain the approximate SEs of the estimates we do:
> sqrt(diag(solve(out$hessian)))
A 95% confidence interval would be the parameter estimate +/- 1.96 SE.
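Since the same model can be fitted by glm(), the two approaches can be compared. The sketch
below is only a check: it evaluates the negative log-likelihood fn at the glm() coefficients and
compares it with logLik() from the standard package stats and with the minimum found by nlm().

```r
x <- c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113,
       1.8369, 1.8610, 1.8839)
y <- c( 6, 13, 18, 28, 52, 53, 61, 60)
n <- c(59, 60, 62, 56, 63, 59, 62, 60)

# Negative binomial log-likelihood for the logistic model
fn <- function(p)
  sum(-(y * (p[1] + p[2] * x) - n * log(1 + exp(p[1] + p[2] * x))
        + log(choose(n, y))))

out <- nlm(fn, p = c(-50, 20), hessian = TRUE)
fit <- glm(cbind(y, n - y) ~ x, family = binomial)

rbind(nlm = out$estimate, glm = unname(coef(fit)))   # parameter estimates
nll_glm <- fn(unname(coef(fit)))                     # fn at the glm() estimates
c(nlm = out$minimum, glm = nll_glm, logLik = -as.numeric(logLik(fit)))
```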
11.8 Some non-standard models
We conclude this chapter with just a brief mention of some of the other facilities available in R
for special regression and data analysis problems.
   • Mixed models. The recommended nlme package provides functions lme() and nlme() for
     linear and non-linear mixed-effects models, that is linear and non-linear regressions in
     which some of the coefficients correspond to random effects. These functions make heavy
     use of formulae to specify the models.
   • Local approximating regressions. The loess() function fits a nonparametric regression
     by using a locally weighted regression. Such regressions are useful for highlighting a trend
     in messy data or for data reduction to give some insight into a large data set.
     Function loess is in the standard package stats, together with code for projection pursuit
     regression.
   • Robust regression. There are several functions available for fitting regression models in
     a way resistant to the influence of extreme outliers in the data. Function lqs in the
     recommended package MASS provides state-of-art algorithms for highly-resistant fits.
     Less resistant but statistically more efficient methods are available in packages, for
     example function rlm in package MASS.
   • Additive models. This technique aims to construct a regression function from smooth
     additive functions of the determining variables, usually one for each determining variable.
     Functions avas and ace in package acepack and functions bruto and mars in package mda
     provide some examples of these techniques in user-contributed packages to R. An
     extension is Generalized Additive Models, implemented in user-contributed packages
     gam and mgcv.
   • Tree-based models. Rather than seek an explicit global linear model for prediction or
     interpretation, tree-based models seek to bifurcate the data, recursively, at critical points of
     the determining variables in order to partition the data ultimately into groups that are as
     homogeneous as possible within, and as heterogeneous as possible between. The results
     often lead to insights that other data analysis methods tend not to yield.
     Models are again specified in the ordinary linear model form. The model fitting function is
     tree(), but many other generic functions such as plot() and text() are well adapted to
     displaying the results of a tree-based model fit in a graphical way.
     Tree models are available in R via the user-contributed packages rpart and tree.
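Of the facilities listed above, loess() needs no extra packages, so it makes a convenient first
experiment. The sketch below uses the built-in data set cars (an assumption of this example, not
something the text prescribes) and predicts the smoothed trend on a grid of speeds.

```r
# Locally weighted regression of stopping distance on speed
fit <- loess(dist ~ speed, data = cars)

# Smoothed trend evaluated on a grid inside the observed range of speed
grid <- data.frame(speed = 10:20)
trend <- predict(fit, newdata = grid)
cbind(grid, trend)
```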
12 Graphical procedures
Graphical facilities are an important and extremely versatile component of the R environment. It
is possible to use the facilities to display a wide variety of statistical graphs and also to build
entirely new types of graph.
The graphics facilities can be used in both interactive and batch modes, but in most cases,
interactive use is more productive. Interactive use is also easy because at startup time R initiates
a graphics device driver which opens a special graphics window for the display of interactive
graphics. Although this is done automatically, it may be useful to know that the command used is
X11() under UNIX, windows() under Windows and quartz() under OS X. A new device can
always be opened by dev.new().
Once the device driver is running, R plotting commands can be used to produce a variety of
graphical displays and to create entirely new kinds of display.
Plotting commands are divided into three basic groups:
   • High-level plotting functions create a new plot on the graphics device, possibly with axes,
     labels, titles and so on.
   • Low-level plotting functions add more information to an existing plot, such as extra
     points, lines and labels.
   • Interactive graphics functions allow you to interactively add information to, or extract
     information from, an existing plot, using a pointing device such as a mouse.
In addition, R maintains a list of graphical parameters which can be manipulated to customize
your plots.
This manual only describes what are known as 'base' graphics. A separate graphics sub-system
in package grid coexists with base; it is more powerful but harder to use. There is a
recommended package lattice which builds on grid and provides ways to produce multi-panel
plots akin to those in the Trellis system in S.
12.1 High-level plotting commands
High-level plotting functions are designed to generate a complete plot of the data passed as
arguments to the function. Where appropriate, axes, labels and titles are automatically generated
(unless you request otherwise.) High-level plotting commands always start a new plot, erasing
the current plot if necessary.
12.1.1 The plot() function
One of the most frequently used plotting functions in R is the plot() function. This is a generic
function: the type of plot produced is dependent on the type or class of the first argument.
plot(x, y)
plot(xy)
     If x and y are vectors, plot(x, y) produces a scatterplot of y against x. The same effect can
     be produced by supplying one argument (second form) as either a list containing two
     elements x and y or a two-column matrix.
plot(x)
     If x is a time series, this produces a time-series plot. If x is a numeric vector, it produces a
     plot of the values in the vector against their index in the vector. If x is a complex vector, it
     produces a plot of imaginary versus real parts of the vector elements.
plot(f)
plot(f, y)
     f is a factor object, y is a numeric vector. The first form generates a bar plot of f; the
     second form produces boxplots of y for each level of f.
plot(df)
plot(~ expr)
plot(y ~ expr)
     df is a data frame, y is any object, expr is a list of object names separated by '+' (e.g., a + b
     + c). The first two forms produce distributional plots of the variables in a data frame (first
     form) or of a number of named objects (second form). The third form plots y against every
     object named in expr.
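The generic behaviour is easiest to see by calling plot() on objects of different classes. The
sketch below uses small invented vectors; the call pdf(NULL) is an assumption made so that the
example also runs non-interactively without creating a file, and can be dropped in an
interactive session.

```r
pdf(NULL)                                # null PDF device: nothing is written to disk

x <- c(1, 2, 3, 4, 5, 6)
y <- c(1.2, 3.4, 0.8, 5.1, 2.9, 1.0)
f <- factor(c("a", "b", "a", "c", "b", "a"))
df <- data.frame(x = x, y = y, f = f)

plot(x, y)                   # two vectors: scatterplot of y against x
plot(list(x = x, y = y))     # one list argument: same scatterplot
plot(ts(y))                  # time series: time-series plot
plot(f)                      # factor: bar plot of the level counts
plot(f, y)                   # factor and numeric: boxplots of y by level of f
plot(df)                     # data frame: plots of all the variables
plot(y ~ x + f, data = df)   # formula: y against each object named on the right

invisible(dev.off())
```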
12.1.2 Displaying multivariate data
R provides two very useful functions for representing multivariate data. If X is a numeric matrix
or data frame, the command
> pairs(X)
produces a pairwise scatterplot matrix of the variables defined by the columns of X, that is, every
column of X is plotted against every other column of X and the resulting n(n-1) plots are arranged
in a matrix with plot scales constant over the rows and columns of the matrix.
When three or four variables are involved a coplot may be more enlightening. If a and b are
numeric vectors and c is a numeric vector or factor object (all of the same length), then the
command
> coplot(a ~ b | c)
produces a number of scatterplots of a against b for given values of c. If c is a factor, this simply
means that a is plotted against b for every level of c. When c is numeric, it is divided into a
number of conditioning intervals and for each interval a is plotted against b for values of c within
the interval. The number and position of intervals can be controlled with the given.values=
argument to coplot(); the function co.intervals() is useful for selecting intervals. You can
also use two given variables with a command like
> coplot(a ~ b | c + d)
which produces scatterplots of a against b for every joint conditioning interval of c and d.
The coplot() and pairs() functions both take an argument panel= which can be used to
customize the type of plot which appears in each panel. The default is points() to produce a
scatterplot but by supplying some other low-level graphics function of two vectors x and y as the
value of panel= you can produce any type of plot you wish. An example panel function useful for
coplots is panel.smooth().
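Both functions can be tried on built-in data. The sketch below uses the data sets iris and quakes,
which are choices made for this illustration only; pdf(NULL) is again an assumption that lets the
code run without a screen device.

```r
pdf(NULL)                                 # null PDF device for non-interactive use

X <- iris[, 1:4]                          # four numeric columns
pairs(X)                                  # default panels: points()
pairs(X, panel = panel.smooth)            # each panel adds a smooth curve

# Conditioning on a factor: one panel per level of Species
coplot(Sepal.Length ~ Petal.Length | Species, data = iris)

# Conditioning on a numeric variable: depth is cut into intervals
coplot(lat ~ long | depth, data = quakes, panel = panel.smooth)

n_panels <- ncol(X) * (ncol(X) - 1)       # the n(n-1) off-diagonal scatterplots
invisible(dev.off())
```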
12.1.3 Display graphics
Other high-level graphics functions produce different types of plots. Some examples are:
qqnorm(x)
qqline(x)
qqplot(x, y)
     Distribution-comparison plots. The first form plots the numeric vector x against the
     expected Normal order scores (a normal scores plot) and the second adds a straight line to
     such a plot by drawing a line through the distribution and data quartiles. The third form
     plots the quantiles of x against those of y to compare their respective distributions.
hist(x)
hist(x, nclass=n)
hist(x, breaks=b, ...)
     Produces a histogram of the numeric vector x. A sensible number of classes is usually
     chosen, but a recommendation can be given with the nclass= argument. Alternatively, the
     breakpoints can be specified exactly with the breaks= argument. If the probability=TRUE
     argument is given, the bars represent relative frequencies divided by bin width instead of
     counts.
dotchart(x, ...)
     Constructs a dotchart of the data in x. In a dotchart the y-axis gives a labelling of the data
     in x and the x-axis gives its value. For example it allows easy visual selection of all data
     entries with values lying in specified ranges.
image(x, y, z, ...)
contour(x, y, z, ...)
persp(x, y, z, ...)
     Plots of three variables. The image plot draws a grid of rectangles using different colours to
     represent the value of z, the contour plot draws contour lines to represent the value of z,
     and the persp plot draws a 3D surface.
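The effect of probability=TRUE can be verified from the object that hist() returns: with relative
frequencies divided by bin width, the bar areas sum to one. The sketch below uses simulated
normal data and explicit breakpoints, both chosen only for illustration, and pdf(NULL) so that it
runs without a screen device.

```r
pdf(NULL)                                  # null PDF device for non-interactive use
set.seed(3)
x <- rnorm(200)

# Breakpoints given exactly; bars show density instead of counts
h <- hist(x, breaks = seq(-5, 5, by = 0.5), probability = TRUE)
total_area <- sum(h$density * diff(h$breaks))
total_area                                 # bar heights times bin widths

qqnorm(x)                                  # normal scores plot
qqline(x)                                  # line through the quartiles
invisible(dev.off())
```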
12.1.4 Arguments to high-level plotting functions
There are a number of arguments which may be passed to high-level graphics functions, as
follows:
add=TRUE
     Forces the function to act as a low-level graphics function, superimposing the plot on the
     current plot (some functions only).
axes=FALSE
     Suppresses generation of axes; useful for adding your own custom axes with the axis()
     function. The default, axes=TRUE, means include axes.
log="x"
log="y"
log="xy"
     Causes the x, y or both axes to be logarithmic. This will work for many, but not all, types
     of plot.
type=
     The type= argument controls the type of plot produced, as follows:
     type="p"
          Plot individual points (the default)
     type="l"
          Plot lines
     type="b"
          Plot points connected by lines (both)
     type="o"
          Plot points overlaid by lines
     type="h"
          Plot vertical lines from points to the zero axis (high-density)
     type="s"
     type="S"
          Step-function plots. In the first form, the top of the vertical defines the point; in the
          second, the bottom.
     type="n"
          No plotting at all. However axes are still drawn (by default) and the coordinate
          system is set up according to the data. Ideal for creating plots with subsequent
          low-level graphics functions.
xlab=string
ylab=string
     Axis labels for the x and y axes. Use these arguments to change the default labels, usually
     the names of the objects used in the call to the high-level plotting function.
main=string
     Figure title, placed at the top of the plot in a large font.
sub=string
     Sub-title, placed just below the x-axis in a smaller font.
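Several of these arguments are often combined: an empty plot is set up with type="n" and
axes=FALSE, and the content is then added piece by piece. The sketch below uses invented data and
labels, and pdf(NULL) so that it runs without a screen device.

```r
pdf(NULL)                                  # null PDF device for non-interactive use
x <- 1:10
y <- x^2

# Set up the coordinate system only: no points, no axes, logarithmic y-axis
plot(x, y, type = "n", axes = FALSE, log = "y",
     xlab = "index", ylab = "value",
     main = "Squares", sub = "logarithmic y-axis")
ylog <- par("ylog")                        # TRUE while the y-axis is logarithmic

points(x, y)                               # now add the content by hand
axis(1)                                    # custom axes: bottom ...
axis(2)                                    # ... and left
box()
invisible(dev.off())
```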
12.2 Low-level plotting commands
Sometimes the high-level plotting functions don't produce exactly the kind of plot you desire. In
this case, low-level plotting commands can be used to add extra information (such as points,
lines or text) to the current plot.
Some of the more useful low-level plotting functions are:
points(x, y)
lines(x, y)
     Adds points or connected lines to the current plot. plot()'s type= argument can also be
     passed to these functions (and defaults to "p" for points() and "l" for lines().)
text(x, y, labels, ...)
     Add text to a plot at points given by x, y. Normally labels is an integer or character
     vector in which case labels[i] is plotted at point (x[i], y[i]). The default is
     1:length(x).
     Note: This function is often used in the sequence
     > plot(x, y, type="n"); text(x, y, names)
     The graphics parameter type="n" suppresses the points but sets up the axes, and the text()
     function supplies special characters, as specified by the character vector names for the
     points.
abline(a, b)
abline(h=y)
abline(v=x)
abline(lm.obj)
     Adds a line of slope b and intercept a to the current plot. h=y may be used to specify
     y-coordinates for the heights of horizontal lines to go across a plot, and v=x similarly for the
     x-coordinates for vertical lines. Also lm.obj may be a list with a coefficients component of
     length 2 (such as the result of model-fitting functions,) which are taken as an intercept and
     slope, in that order.
polygon(x, y, ...)
     Draws a polygon defined by the ordered vertices in (x, y) and (optionally) shades it in with
     hatch lines, or fills it if the graphics device allows the filling of figures.
legend(x, y, legend, ...)
     Adds a legend to the current plot at the specified position. Plotting characters, line styles,
     colors etc., are identified with the labels in the character vector legend. At least one other
     argument v (a vector the same length as legend) with the corresponding values of the
     plotting unit must also be given, as follows:
     legend( , fill=v)
          Colors for filled boxes
     legend( , col=v)
          Colors in which points or lines will be drawn
     legend( , lty=v)
          Line styles
     legend( , lwd=v)
          Line widths
     legend( , pch=v)
          Plotting characters (character vector)
title(main, sub)
     Adds a title main to the top of the current plot in a large font and (optionally) a sub-title sub
     at the bottom in a smaller font.
axis(side, ...)
     Adds an axis to the current plot on the side given by the first argument (1 to 4, counting
     clockwise from the bottom.) Other arguments control the positioning of the axis within or
     beside the plot, and tick positions and labels. Useful for adding custom axes after calling
     plot() with the axes=FALSE argument.
Low-level plotting functions usually require some positioning information (e.g., x and y
coordinates) to determine where to place the new plot elements. Coordinates are given in terms
of user coordinates which are defined by the previous high-level graphics command and are
chosen based on the supplied data.
Where x and y arguments are required, it is also sufficient to supply a single argument being a
list with elements named x and y. Similarly a matrix with two columns is also valid input. In this
way functions such as locator() (see below) may be used to specify positions on a plot
interactively.
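The low-level functions described in this section can be combined on one plot as in the sketch
below. The data are simulated, the positions and colours are arbitrary, and pdf(NULL) is used
so that the code runs without a screen device.

```r
pdf(NULL)                                   # null PDF device for non-interactive use
set.seed(11)
x <- 1:20
y <- 3 + 0.5 * x + rnorm(20)
fit <- lm(y ~ x)

plot(x, y, type = "n")                      # axes only
text(x, y, labels = seq_along(x))           # label each point by its index
abline(fit, lty = 2)                        # intercept and slope taken from the fit
abline(h = mean(y), col = 2)                # horizontal line at the mean of y

# A single list with elements x and y is accepted in place of two vectors
pts <- list(x = c(5, 10, 15), y = c(4, 12, 8))
points(pts, pch = 4)

legend("topleft", legend = c("least squares fit", "mean of y"),
       lty = c(2, 1), col = c(1, 2))
title(sub = "low-level additions to an empty plot")
invisible(dev.off())
```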
12.2.1 Mathematical annotation
In some cases, it is useful to add mathematical symbols and formulae to a plot. This can be
achieved in R by specifying an expression rather than a character string in any one of text,
mtext, axis, or title. For example, the following code draws the formula for the Binomial
probability function:
> text(x, y, expression(paste(bgroup("(", atop(n, x), ")"), p^x, q^{n-x})))
More information, including a full listing of the features available can be obtained from within R
using the commands:
> help(plotmath)
> example(plotmath)
> demo(plotmath)
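To see the Binomial formula rendered, the expression needs a plot to be drawn on. The sketch
below sets up an empty unit square for that purpose; the coordinates, the character expansion
and the second expression in the title are arbitrary choices, and pdf(NULL) lets the code run
without a screen device.

```r
pdf(NULL)                                   # null PDF device for non-interactive use
plot(0:1, 0:1, type = "n", xlab = "", ylab = "")

# An expression instead of a character string is drawn as mathematics
lab <- expression(paste(bgroup("(", atop(n, x), ")"), p^x, q^{n-x}))
text(0.5, 0.5, lab, cex = 2)

# Expressions are accepted by title() as well
title(main = expression(hat(beta) == (X^t * X)^{-1} * X^t * y))
invisible(dev.off())
```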
12.2.2 Hershey vector fonts
It is possible to specify Hershey vector fonts for rendering text when using the text and contour
functions. There are three reasons for using the Hershey fonts:
   • Hershey fonts can produce better output, especially on a computer screen, for rotated
     and/or small text.
   • Hershey fonts provide certain symbols that may not be available in the standard fonts. In
     particular, there are zodiac signs, cartographic symbols and astronomical symbols.
   • Hershey fonts provide cyrillic and japanese (Kana and Kanji) characters.
More information, including tables of Hershey characters can be obtained from within R using
the commands:
> help(Hershey)
> demo(Hershey)
> help(Japanese)
> demo(Japanese)
12.3 Interacting with graphics
R also provides functions which allow users to extract or add information to a plot using a
mouse. The simplest of these is the locator() function:
locator(n, type)
     Waits for the user to select locations on the current plot using the left mouse button. This
     continues until n (default 512) points have been selected, or another mouse button is
     pressed. The type argument allows for plotting at the selected points and has the same
     effect as for high-level graphics commands; the default is no plotting. locator() returns
     the locations of the points selected as a list with two components x and y.
locator() is usually called with no arguments. It is particularly useful for interactively selecting
positions for graphic elements such as legends or labels when it is difficult to calculate in
advance where the graphic should be placed. For example, to place some informative text near
an outlying point, the command
> text(locator(1), "Outlier", adj=0)
may be useful. (locator() will be ignored if the current device, such as postscript does not
support interactive pointing.)
identify(x, y, labels)
     Allow the user to highlight any of the points defined by x and y (using the left mouse
     button) by plotting the corresponding component of labels nearby (or the index number of
     the point if labels is absent). Returns the indices of the selected points when another
     button is pressed.
Sometimes we want to identify particular points on a plot, rather than their positions. For
example, we may wish the user to select some observation of interest from a graphical display
and then manipulate that observation in some way. Given a number of (x, y) coordinates in two
numeric vectors x and y, we could use the identify() function as follows:
> plot(x, y)
> identify(x, y)
The identify() function performs no plotting itself, but simply allows the user to move the
mouse pointer and click the left mouse button near a point. If there is a point near the mouse
pointer it will be marked with its index number (that is, its position in the x/y vectors) plotted
nearby. Alternatively, you could use some informative string (such as a case name) as a highlight
by using the labels argument to identify(), or disable marking altogether with the plot=FALSE
argument. When the process is terminated (see above), identify() returns the indices of the
selected points; you can use these indices to extract the selected points from the original vectors
x and y.
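Because identify() needs a pointing device, code that uses it is best guarded so that it does
nothing in batch mode. The sketch below is one possible arrangement: the data and the labels
obs1, obs2, ... are invented, and the guard uses interactive() and dev.interactive() from the
standard packages.

```r
x <- c(1, 2, 3, 4, 10)
y <- c(2, 4, 6, 8, 1)
picked <- integer(0)                        # indices of the selected points

# Only ask for mouse input when a screen device is actually available
if (interactive() && dev.interactive(orNone = TRUE)) {
  plot(x, y)
  picked <- identify(x, y, labels = paste0("obs", seq_along(x)))
}

# Use the returned indices to extract the selected observations
selected <- data.frame(x = x[picked], y = y[picked])
selected
```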
12.4 Using graphics parameters
When creating graphics, particularly for presentation or publication purposes, R's defaults do not
always produce exactly that which is required. You can, however, customize almost every aspect
of the display using graphics parameters. R maintains a list of a large number of graphics
parameters which control things such as line style, colors, figure arrangement and text
justification among many others. Every graphics parameter has a name (such as 'col', which
controls colors,) and a value (a color number, for example.)
A separate list of graphics parameters is maintained for each active device, and each device has a
default set of parameters when initialized. Graphics parameters can be set in two ways: either
permanently, affecting all graphics functions which access the current device; or temporarily,
affecting only a single graphics function call.
12.4.1 Permanent changes: The par() function
The par() function is used to access and modify the list of graphics parameters for the current
graphics device.
par()
     Without arguments, returns a list of all graphics parameters and their values for the current
     device.
par(c("col", "lty"))
     With a character vector argument, returns only the named graphics parameters (again, as a
     list.)
par(col=4, lty=2)
     With named arguments (or a single list argument), sets the values of the named graphics
     parameters, and returns the original values of the parameters as a list.
Setting graphics parameters with the par() function changes the value of the parameters
permanently, in the sense that all future calls to graphics functions (on the current device) will be
affected by the new value. You can think of setting graphics parameters in this way as setting
"default" values for the parameters, which will be used by all graphics functions unless an
alternative value is given.
Note that calls to par() always affect the global values of graphics parameters, even when par()
is called from within a function. This is often undesirable behavior; usually we want to set some
graphics parameters, do some plotting, and then restore the original values so as not to affect the
user's R session. You can restore the initial values by saving the result of par() when making
changes, and restoring the initial values when plotting is complete.
> oldpar <- par(col=4, lty=2)
  ... plotting commands ...
> par(oldpar)
To save and restore all settable graphical parameters use
> oldpar <- par(no.readonly=TRUE)
  ... plotting commands ...
> par(oldpar)
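Inside a function, the save-and-restore idiom can be made automatic. The sketch below is one
way to do it, not the only one: the function name draw_dashed is invented, and on.exit() from
base R is used so that the old values are restored even if the plotting code fails.

```r
pdf(NULL)                                   # null PDF device for non-interactive use
before <- par("col", "lty")                 # values before the function is called

# Hypothetical helper: draws a dashed blue line without leaving changes behind
draw_dashed <- function(x, y) {
  oldpar <- par(col = 4, lty = 2)           # par() returns the previous values
  on.exit(par(oldpar))                      # restore them when the function exits
  plot(x, y, type = "l")
}

draw_dashed(1:10, (1:10)^2)
after <- par("col", "lty")                  # unchanged by the call
identical(before, after)
invisible(dev.off())
```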
12.4.2 Temporary changes: Arguments to graphics functions
Graphics parameters may also be passed to (almost) any graphics function as named arguments.
This has the same effect as passing the arguments to the par() function, except that the changes
only last for the duration of the function call. For example:
> plot(x, y, pch="+")
produces a scatterplot using a plus sign as the plotting character, without changing the default
plotting character for future plots.
Unfortunately, this is not implemented entirely consistently and it is sometimes necessary to set
and reset graphics parameters using par().
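That the change is temporary can be confirmed by querying par() before and after the call. The
sketch below uses arbitrary data, and pdf(NULL) so that it runs without a screen device.

```r
pdf(NULL)                                   # null PDF device for non-interactive use
pch_before <- par("pch")                    # default plotting character

plot(1:5, (1:5)^2, pch = "+")               # plus signs for this call only

pch_after <- par("pch")                     # the device default is untouched
identical(pch_before, pch_after)
invisible(dev.off())
```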
12.5 Graphics parameters list
The following sections detail many of the commonly-used graphical parameters. The R help
documentation for the par() function provides a more concise summary; this is provided as a
somewhat more detailed alternative.
Graphics parameters will be presented in the following form:
name=value
     A description of the parameter's effect. name is the name of the parameter, that is, the
     argument name to use in calls to par() or a graphics function. value is a typical value you
     might use when setting the parameter.
Note that axes is not a graphics parameter but an argument to a few plot methods: see xaxt and
yaxt.
12.5.1 Graphical elements
R plots are made up of points, lines, text and polygons (filled regions.) Graphical parameters
exist which control how these graphical elements are drawn, as follows:
pch="+"
     Character to be used for plotting points. The default varies with graphics drivers, but it is
     usually a circle. Plotted points tend to appear slightly above or below the appropriate
     position unless you use "." as the plotting character, which produces centered points.
pch=4
     When pch is given as an integer between 0 and 25 inclusive, a specialized plotting symbol
     is produced. To see what the symbols are, use the command
     > legend(locator(1), as.character(0:25), pch = 0:25)
     Those from 21 to 25 may appear to duplicate earlier symbols, but can be coloured in
     different ways: see the help on points and its examples.
     In addition, pch can be a character or a number in the range 32:255 representing a character
     in the current font.
lty=2
     Line types. Alternative line styles are not supported on all graphics devices (and vary on
     those that do) but line type 1 is always a solid line, line type 0 is always invisible, and line
     types 2 and onwards are dotted or dashed lines, or some combination of both.
lwd=2
     Line widths. Desired width of lines, in multiples of the "standard" line width. Affects axis
     lines as well as lines drawn with lines(), etc. Not all devices support this, and some have
     restrictions on the widths that can be used.
col=2
     Colors to be used for points, lines, text, filled regions and images. A number from the
     current palette (see ?palette) or a named colour.
col.axis
col.lab
col.main
col.sub
     The color to be used for axis annotation, x and y labels, main and sub-titles, respectively.
font=2
     An integer which specifies which font to use for text. If possible, device drivers arrange so
     that 1 corresponds to plain text, 2 to bold face, 3 to italic, 4 to bold italic and 5 to a symbol
     font (which includes Greek letters).
font.axis
font.lab
font.main
font.sub
     The font to be used for axis annotation, x and y labels, main and sub-titles, respectively.
adj=-0.1
     Justification of text relative to the plotting position. 0 means left justify, 1 means right
     justify and 0.5 means to center horizontally about the plotting position. The actual value is
     the proportion of text that appears to the left of the plotting position, so a value of -0.1
     leaves a gap of 10% of the text width between the text and the plotting position.
cex=1.5
     Character expansion. The value is the desired size of text characters (including plotting
     characters) relative to the default text size.
cex.axis
cex.lab
http://cran.rproject.org/doc/manuals/Rintro.html#One_002dandtwo_002dsampletests
79/116
5/28/2015
AnIntroductiontoR
cex.main
cex.sub
Thecharacterexpansiontobeusedforaxisannotation,xandylabels,mainandsubtitles,
respectively.
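As a minimal sketch of how several of these parameters might be combined (the data are invented for illustration, and the null pdf() device is an assumption made only so that the example runs without a display):

```r
# Draw to a null PDF device so the sketch runs without a display.
pdf(file = NULL)

# Set several graphical-element parameters at once; par() returns the
# previous values so that they can be restored afterwards.
old <- par(pch = 4, lty = 2, lwd = 2, col = "blue", cex = 1.5, font = 2)

x <- 1:10
plot(x, x^2, type = "b", main = "pch, lty, lwd, col and cex")

# Parameters can also be given to a single graphics call, in which
# case they apply to that call only.
points(x, 50 + x, pch = 21, col = "red", bg = "yellow")

current <- par("pch", "lwd", "cex")   # query the values now in force
par(old)                              # restore the earlier settings
dev.off()
```

Saving the return value of par() and restoring it afterwards keeps the changes from leaking into later plots.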
12.5.2 Axes and tick marks
Many of R's high-level plots have axes, and you can construct axes yourself with the low-level
axis() graphics function. Axes have three main components: the axis line (line style controlled
by the lty graphics parameter), the tick marks (which mark off unit divisions along the axis line)
and the tick labels (which mark the units). These components can be customized with the
following graphics parameters.
lab=c(5, 7, 12)
The first two numbers are the desired number of tick intervals on the x and y axes
respectively. The third number is the desired length of axis labels, in characters (including
the decimal point). Choosing a too-small value for this parameter may result in all tick
labels being rounded to the same number!
las=1
Orientation of axis labels. 0 means always parallel to axis, 1 means always horizontal, and
2 means always perpendicular to the axis.
mgp=c(3, 1, 0)
Positions of axis components. The first component is the distance from the axis label to the
axis position, in text lines. The second component is the distance to the tick labels, and the
final component is the distance from the axis position to the axis line (usually zero).
Positive numbers measure outside the plot region, negative numbers inside.
tck=0.01
Length of tick marks, as a fraction of the size of the plotting region. When tck is small
(less than 0.5) the tick marks on the x and y axes are forced to be the same size. A value of
1 gives grid lines. Negative values give tick marks outside the plotting region. Use
tck=0.01 and mgp=c(1,-1.5,0) for internal tick marks.
xaxs="r"
yaxs="i"
Axis styles for the x and y axes, respectively. With styles "i" (internal) and "r" (the
default) tick marks always fall within the range of the data; however, style "r" leaves a
small amount of space at the edges. (S has other styles not implemented in R.)
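The axis parameters above can be tried out along the following lines. This is only a sketch: the particular values and the plotted data are illustrative assumptions, and the null pdf() device is used so that no display is needed.

```r
pdf(file = NULL)                      # null device: nothing is written to disk

old <- par(las  = 1,                  # tick labels always horizontal
           mgp  = c(2.5, 0.8, 0),     # axis title, tick labels, axis line (text lines)
           tck  = -0.02,              # short tick marks outside the plot region
           xaxs = "i")                # x axis covers exactly the range of the data

plot(1:10, (1:10)^2, axes = FALSE, xlab = "x", ylab = "x squared")
axis(1, at = seq(2, 10, by = 2))      # bottom axis with chosen tick positions
axis(2)                               # left axis with default tick positions
box()

usr <- par("usr")                     # plot-region limits: x1, x2, y1, y2
par(old)
dev.off()
```

With xaxs="i" the first two elements of usr coincide with the range of the x data, whereas the default style "r" would extend them slightly.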
12.5.3 Figure margins
A single plot in R is known as a figure and comprises a plot region surrounded by margins
(possibly containing axis labels, titles, etc.) and (usually) bounded by the axes themselves.
A typical figure is
images/fig11
Graphics parameters controlling figure layout include:
mai=c(1, 0.5, 0.5, 0)
Widths of the bottom, left, top and right margins, respectively, measured in inches.
mar=c(4, 2, 2, 1)
Similar to mai, except the measurement unit is text lines.
mar and mai are equivalent in the sense that setting one changes the value of the other. The
default values chosen for this parameter are often too large; the right-hand margin is rarely
needed, and neither is the top margin if no title is being used. The bottom and left margins must
be large enough to accommodate the axis and tick labels. Furthermore, the default is chosen
without regard to the size of the device surface: for example, using the postscript() driver with
the height=4 argument will result in a plot which is about 50% margin unless mar or mai are set
explicitly. When multiple figures are in use (see below) the margins are reduced; however, this
may not be enough when many figures share the same page.
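The link between mar and mai can be observed directly. The following sketch (the margin values are arbitrary choices, and a null pdf() device is assumed so that no display is required) sets mar and then reads back mai:

```r
pdf(file = NULL)

default_mar <- par("mar")           # margins in text lines on a fresh device
default_mai <- par("mai")           # the same margins expressed in inches

old <- par(mar = c(4, 4, 1, 0.5))   # tighter margins, given in text lines
new_mai <- par("mai")               # mai has been updated to match mar

plot(1:10)                          # the plot region is now larger

par(old)
dev.off()
```

Setting mai instead would update mar in the same way.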
12.5.4 Multiple figure environment
R allows you to create an n by m array of figures on a single page. Each figure has its own
margins, and the array of figures is optionally surrounded by an outer margin, as shown in the
following figure.
images/fig12
The graphical parameters relating to multiple figures are as follows:
mfcol=c(3, 2)
mfrow=c(2, 4)
Set the size of a multiple figure array. The first value is the number of rows; the second is
the number of columns. The only difference between these two parameters is that setting
mfcol causes figures to be filled by column; mfrow fills by rows.
The layout in the Figure could have been created by setting mfrow=c(3,2); the figure shows
the page after four plots have been drawn.
Setting either of these can reduce the base size of symbols and text (controlled by
par("cex") and the pointsize of the device). In a layout with exactly two rows and columns
the base size is reduced by a factor of 0.83: if there are three or more of either rows or
columns, the reduction factor is 0.66.
mfg=c(2, 2, 3, 2)
Position of the current figure in a multiple figure environment. The first two numbers are
the row and column of the current figure; the last two are the number of rows and columns
in the multiple figure array. Set this parameter to jump between figures in the array. You
can even use different values for the last two numbers than the true values for unequally
sized figures on the same page.
fig=c(4, 9, 1, 4)/10
Position of the current figure on the page. Values are the positions of the left, right, bottom
and top edges respectively, as a fraction of the page measured from the bottom left
corner. The example value would be for a figure in the bottom right of the page. Set this
parameter for arbitrary positioning of figures within a page. If you want to add a figure to
a current page, use new=TRUE as well (unlike S).
oma=c(2, 0, 3, 0)
omi=c(0, 0, 0.8, 0)
Size of outer margins. Like mar and mai, the first measures in text lines and the second in
inches, starting with the bottom margin and working clockwise.
Outer margins are particularly useful for page-wise titles, etc. Text can be added to the outer
margins with the mtext() function with argument outer=TRUE. There are no outer margins by
default, however, so you must create them explicitly using oma or omi.
More complicated arrangements of multiple figures can be produced by the split.screen() and
layout() functions, as well as by the grid and lattice packages.
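A short sketch of a multiple figure layout with a page-wise title follows. The 2 by 2 layout, the random data and the margin sizes are illustrative assumptions; the null pdf() device is used so that the code runs without a display.

```r
pdf(file = NULL)

old <- par(mfrow = c(2, 2),          # 2 x 2 array of figures, filled by rows
           mar   = c(4, 4, 2, 1),    # per-figure margins, in text lines
           oma   = c(0, 0, 3, 0))    # three lines of outer margin at the top

for (i in 1:4) plot(rnorm(20), main = paste("Panel", i))

# Page-wise title written into the outer margin.
mtext("Four panels on one page", outer = TRUE, cex = 1.2)

last_fig <- par("mfg")               # row, column, nrows, ncols of current figure
base_cex <- par("cex")               # base size after the layout was set

par(old)
dev.off()
```

Here base_cex shows the reduction of the base text size that the description of mfrow mentions for a layout with exactly two rows and columns.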
12.6 Device drivers
R can generate graphics (of varying levels of quality) on almost any type of display or printing
device. Before this can begin, however, R needs to be informed what type of device it is dealing
with. This is done by starting a device driver. The purpose of a device driver is to convert
graphical instructions from R ("draw a line", for example) into a form that the particular device
can understand.
Device drivers are started by calling a device driver function. There is one such function for
every device driver: type help(Devices) for a list of them all. For example, issuing the command
> postscript()
causes all future graphics output to be sent to the printer in PostScript format. Some commonly
used device drivers are:
X11()
For use with the X11 window system on Unix-alikes
windows()
For use on Windows
quartz()
For use on OS X
postscript()
For printing on PostScript printers, or creating PostScript graphics files.
pdf()
Produces a PDF file, which can also be included into PDF files.
png()
Produces a bitmap PNG file. (Not always available: see its help page.)
jpeg()
Produces a bitmap JPEG file, best used for image plots. (Not always available: see its help
page.)
When you have finished with a device, be sure to terminate the device driver by issuing the
command
> dev.off()
This ensures that the device finishes cleanly; for example in the case of hardcopy devices this
ensures that every page is completed and has been sent to the printer. (This will happen
automatically at the normal end of a session.)
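The open, draw, close cycle might look as follows for a file-based device. This is a sketch only: the temporary file name and the use of the built-in cars data set are choices made for the illustration.

```r
out <- tempfile(fileext = ".pdf")    # hypothetical output location

pdf(out, width = 6, height = 4)      # start the driver; sizes are in inches
plot(cars, main = "Stopping distance against speed")
dev.off()                            # finish the page and close the file

written <- file.exists(out) && file.info(out)$size > 0
```

Until dev.off() is called the file may be incomplete, which is why the check on the file comes after the device has been closed.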
12.6.1 PostScript diagrams for typeset documents
By passing the file argument to the postscript() device driver function, you may store the
graphics in PostScript format in a file of your choice. The plot will be in landscape orientation
unless the horizontal=FALSE argument is given, and you can control the size of the graphic with
the width and height arguments (the plot will be scaled as appropriate to fit these dimensions).
For example, the command
> postscript("file.ps", horizontal=FALSE, height=5, pointsize=10)
will produce a file containing PostScript code for a figure five inches high, perhaps for inclusion
in a document. It is important to note that if the file named in the command already exists, it will
be overwritten. This is the case even if the file was only created earlier in the same R session.
Many usages of PostScript output will be to incorporate the figure in another document. This
works best when encapsulated PostScript is produced: R always produces conformant output,
but only marks the output as such when the onefile=FALSE argument is supplied. This unusual
notation stems from S-compatibility: it really means that the output will be a single page (which
is part of the EPSF specification). Thus to produce a plot for inclusion use something like
> postscript("plot1.eps", horizontal=FALSE, onefile=FALSE,
             height=8, width=6, pointsize=10)
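A complete round trip might be sketched as below. The temporary file name is hypothetical, and paper="special" is an addition not used in the commands above; it is an argument of postscript() that sizes the page to the given width and height.

```r
eps <- tempfile(fileext = ".eps")    # hypothetical output file

postscript(eps, horizontal = FALSE, onefile = FALSE, paper = "special",
           height = 4, width = 6, pointsize = 10)
plot(1:10, main = "A figure for inclusion in a document")
dev.off()

first_line <- readLines(eps, n = 1)  # the PostScript header comment
```

Inspecting first_line shows how the device has marked the output.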
12.6.2 Multiple graphics devices
In advanced use of R it is often useful to have several graphics devices in use at the same time.
Of course only one graphics device can accept graphics commands at any one time, and this is
known as the current device. When multiple devices are open, they form a numbered sequence
with names giving the kind of device at any position.
The main commands used for operating with multiple devices, and their meanings are as follows:
X11()            [UNIX]
windows()
win.printer()
win.metafile()   [Windows]
quartz()         [OS X]
postscript()
pdf()
png()
jpeg()
tiff()
bitmap()
Each new call to a device driver function opens a new graphics device, thus extending by
one the device list. This device becomes the current device, to which graphics output will
be sent.
dev.list()
Returns the number and name of all active devices. The device at position 1 on the list is
always the null device which does not accept graphics commands at all.
dev.next()
dev.prev()
Returns the number and name of the graphics device next to, or previous to the current
device, respectively.
dev.set(which=k)
Can be used to change the current graphics device to the one at position k of the device
list. Returns the number and label of the device.
dev.off(k)
Terminate the graphics device at point k of the device list. For some devices, such as
postscript devices, this will either print the file immediately or correctly complete the file
for later printing, depending on how the device was initiated.
dev.copy(device, ..., which=k)
dev.print(device, ..., which=k)
Make a copy of the device k. Here device is a device function, such as postscript, with
extra arguments, if needed, specified by '...'. dev.print is similar, but the copied device is
immediately closed, so that end actions, such as printing hardcopies, are immediately
performed.
graphics.off()
Terminate all graphics devices on the list, except the null device.
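The device list can be explored with a sketch such as the following, which assumes it is acceptable to close any devices already open and uses two null pdf() devices as stand-ins for real ones:

```r
graphics.off()                 # start from an empty device list

pdf(file = NULL)               # opens device 2
pdf(file = NULL)               # opens device 3, which becomes current

open_devices <- dev.list()     # named vector of device numbers
current <- dev.cur()           # the device receiving graphics output

dev.set(which = 2)             # make device 2 the current device
plot(1:5)                      # this plot is drawn on device 2
after_switch <- dev.cur()

dev.off(3)                     # terminate device 3 only
remaining <- dev.list()

graphics.off()                 # terminate everything that is left
```

Device numbering starts at 2 because position 1 is reserved for the null device.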
12.7 Dynamic graphics
R does not have builtin capabilities for dynamic or interactive graphics, e.g. rotating point clouds
or "brushing" (interactively highlighting) points. However, extensive dynamic graphics
facilities are available in the system GGobi by Swayne, Cook and Buja available from
http://www.ggobi.org/
and these can be accessed from R via the package rggobi, described at
http://www.ggobi.org/rggobi.
Also, package rgl provides ways to interact with 3D plots, for example of surfaces.
13 Packages
All R functions and datasets are stored in packages. Only when a package is loaded are its
contents available. This is done both for efficiency (the full list would take more memory and
would take longer to search than a subset), and to aid package developers, who are protected
from name clashes with other code. The process of developing packages is described in Creating
R packages in Writing R Extensions. Here, we will describe them from a user's point of view.
To see which packages are installed at your site, issue the command
> library()
with no arguments. To load a particular package (e.g., the boot package containing functions
from Davison & Hinkley (1997)), use a command like
> library(boot)
Users connected to the Internet can use the install.packages() and update.packages() functions
(available through the Packages menu in the Windows and OS X GUIs, see Installing packages
in R Installation and Administration) to install and update packages.
To see which packages are currently loaded, use
> search()
to display the search list. Some packages may be loaded but not available on the search list (see
Namespaces): these will be included in the list given by
> loadedNamespaces()
To see a list of all available help topics in an installed package, use
> help.start()
to start the HTML help system, and then navigate to the package listing in the Reference section.
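These queries can be combined in a short script. The sketch below assumes nothing beyond a standard installation; because boot may not be installed everywhere, it is loaded only if requireNamespace() reports that it is available.

```r
attached <- search()                  # what is on the search list
loaded   <- loadedNamespaces()        # every loaded namespace, attached or not

# Check for boot before attaching it, so the script does not stop
# with an error on a system where the package is missing.
has_boot <- requireNamespace("boot", quietly = TRUE)
if (has_boot) {
  library(boot)
}

# Namespaces that are loaded but do not appear on the search list.
loaded_only <- setdiff(loaded, sub("^package:", "", search()))
```

The vector loaded_only illustrates the distinction drawn above between packages that are loaded and packages that are on the search list.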
13.1 Standard packages
The standard (or base) packages are considered part of the R source code. They contain the basic
functions that allow R to work, and the datasets and standard statistical and graphical functions
that are described in this manual. They should be automatically available in any R installation.
See R packages in R FAQ, for a complete list.
13.2 Contributed packages and CRAN
There are thousands of contributed packages for R, written by many different authors. Some of
these packages implement specialized statistical methods, others give access to data or hardware,
and others are designed to complement textbooks. Some (the recommended packages) are
distributed with every binary distribution of R. Most are available for download from CRAN
(http://CRAN.R-project.org/ and its mirrors) and other repositories such as Bioconductor
(http://www.bioconductor.org/) and Omegahat (http://www.omegahat.org/). The R FAQ contains
a list of CRAN packages current at the time of release, but the collection of available packages
changes very frequently.
13.3 Namespaces
All packages have namespaces, and have since R 2.14.0. Namespaces do three things: they allow
the package writer to hide functions and data that are meant only for internal use, they prevent
functions from breaking when a user (or other package writer) picks a name that clashes with
one in the package, and they provide a way to refer to an object within a particular package.
For example, t() is the transpose function in R, but users might define their own function named
t. Namespaces prevent the user's definition from taking precedence, and breaking every function
that tries to transpose a matrix.
There are two operators that work with namespaces. The double-colon operator :: selects
definitions from a particular namespace. In the example above, the transpose function will
always be available as base::t, because it is defined in the base package. Only functions that are
exported from the package can be retrieved in this way.
The triple-colon operator ::: may be seen in a few places in R code: it acts like the double-colon
operator but also allows access to hidden objects. Users are more likely to use the getAnywhere()
function, which searches multiple packages.
Packages are often inter-dependent, and loading one may cause others to be automatically
loaded. The colon operators described above will also cause automatic loading of the associated
package. When packages with namespaces are loaded automatically they are not added to the
search list.
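The t() example can be tried directly. In this sketch the user-defined t and the small matrix m are invented for the illustration:

```r
m <- matrix(1:6, nrow = 2)

t <- function(x) "user-defined t"   # a user function that masks base::t
masked   <- t(m)                    # finds the user's definition first
explicit <- base::t(m)              # always the transpose from package base

found <- getAnywhere("t")           # reports every place an object named t lives

rm(t)                               # remove the user's definition again
restored <- t(m)                    # base::t is found once more
```

The component found$where lists the locations that getAnywhere() located, which include package:base.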
14 OS facilities
R has quite extensive facilities to access the OS under which it is running: this allows it to be
used as a scripting language and that ability is much used by R itself, for example to install
packages.
Because R's own scripts need to work across all platforms, considerable effort has gone into
making the scripting facilities as platform-independent as is feasible.
14.1 Files and directories
There are many functions to manipulate files and directories. Here are pointers to some of the
more commonly used ones.
To create an (empty) file or directory, use file.create or dir.create. (These are the analogues
of the POSIX utilities touch and mkdir.) For temporary files and directories in the R session
directory see tempfile.
Files can be removed by either file.remove or unlink: the latter can remove directory trees.
For directory listings use list.files (also available as dir) or list.dirs. These can select files
using a regular expression: to select by wildcards use Sys.glob.
Many types of information on a filepath (including for example if it is a file or directory) can be
found by file.info.
There are several ways to find out if a file "exists" (a file can exist on the filesystem and not be
visible to the current user). There are functions file.exists, file.access and file_test with
various versions of this test: file_test is a version of the POSIX test command for those
familiar with shell scripting.
Function file.copy is the R analogue of the POSIX command cp.
Choosing files can be done interactively by file.choose: the Windows port has the more
versatile functions choose.files and choose.dir and there are similar functions in the tcltk
package: tk_choose.files and tk_choose.dir.
Functions file.show and file.edit will display and edit one or more files in a way appropriate
to the R port, using the facilities of a console (such as RGui on Windows or R.app on OS X) if
one is in use.
There is some support for links in the filesystem: see functions file.link and Sys.readlink.
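Several of these functions can be exercised together in a scratch directory. The sketch below works entirely under the session's temporary directory; the file names are invented for the illustration.

```r
d <- tempfile("demo_dir")                   # a fresh path in the session directory
dir.create(d)                               # analogue of mkdir

f <- file.path(d, "notes.txt")
file.create(f)                              # analogue of touch
writeLines(c("first", "second"), f)

file.copy(f, file.path(d, "copy.txt"))      # analogue of cp

listing <- list.files(d, pattern = "\\.txt$")          # select by regular expression
globbed <- basename(Sys.glob(file.path(d, "*.txt")))   # select by wildcard
is_dir  <- file.info(d)$isdir

file.remove(file.path(d, "copy.txt"))       # remove a single file
unlink(d, recursive = TRUE)                 # remove the whole directory tree
gone <- !file.exists(d)
```

Note that unlink() needs recursive = TRUE to remove a directory and its contents.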
14.2 Filepaths
With a few exceptions, R relies on the underlying OS functions to manipulate filepaths. Some
aspects of this are allowed to depend on the OS, and do, even down to the version of the OS.
There are POSIX standards for how OSes should interpret filepaths and many R users assume
POSIX compliance: but Windows does not claim to be compliant and other OSes may be less
than completely compliant.
The following are some issues which have been encountered with filepaths.
* POSIX filesystems are case-sensitive, so foo.png and Foo.PNG are different files. However,
the defaults on Windows and OS X are to be case-insensitive, and FAT filesystems
(commonly used on removable storage) are not normally case-sensitive (and all filepaths
may be mapped to lower case).
* Almost all the Windows OS services support the use of slash or backslash as the filepath
separator, and R converts the known exceptions to the form required by Windows.
* The behaviour of filepaths with a trailing slash is OS-dependent. Such paths are not valid
on Windows and should not be expected to work. POSIX-2008 requires such paths to
match only directories, but earlier versions allowed them to also match files. So they are
best avoided.
* Multiple slashes in filepaths such as /abc//def are valid on POSIX filesystems and treated
as if there was only one slash. They are usually accepted by Windows OS functions.
However, leading double slashes may have a different meaning.
* Windows UNC filepaths (such as \\server\dir1\dir2\file and
\\?\UNC\server\dir1\dir2\file) are not supported, but they may work in some R functions.
POSIX filesystems are allowed to treat a leading double slash specially.
* Windows allows filepaths containing drives and relative to the current directory on a drive,
e.g. d:foo/bar refers to d:/a/b/c/foo/bar if the current directory on drive d: is /a/b/c. It
is intended that these work, but the use of absolute paths is safer.
Functions basename and dirname select parts of a file path: the recommended way to assemble a
file path from components is file.path. Function path.expand does "tilde expansion", substituting
values for home directories (the current user's, and perhaps those of other users).
On filesystems with links, a single file can be referred to by many filepaths. Function
normalizePath will find a canonical filepath.
Windows has the concepts of short ("8.3") and long file names: normalizePath will return an
absolute path using long file names and shortPathName will return a version using short names.
The latter does not contain spaces and uses backslash as the separator, so is sometimes useful for
exporting names from R.
File permissions are a related topic. R has support for the POSIX concepts of read/write/execute
permission for owner/group/all but this may be only partially supported on the filesystem (so for
example on Windows only read-only files (for the account running the R session) are recognized).
Access Control Lists (ACLs) are employed on several filesystems, but do not have an agreed
standard and R has no facilities to control them. Use Sys.chmod to change permissions.
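The path functions mentioned above can be sketched as follows. The path components are hypothetical, and the permission change at the end may have little effect on Windows, as noted above.

```r
p <- file.path("project", "data", "raw.csv")   # assemble a path from components
name   <- basename(p)                          # the last component
parent <- dirname(p)                           # everything before it

home <- path.expand("~")                       # tilde expansion

tmp <- tempfile(fileext = ".txt")
file.create(tmp)
canon <- normalizePath(tmp)                    # canonical absolute path
same_file <- basename(canon) == basename(tmp)

Sys.chmod(tmp, mode = "0644")                  # owner read/write, others read
unlink(tmp)
```

file.path() joins components with a forward slash on every platform, which is why it is the recommended way to build paths.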
14.3 System commands
Functions system and system2 are used to invoke a system command and optionally collect its
output. system2 is a little more general but its main advantage is that it is easier to write
cross-platform code using it.
system behaves differently on Windows from other OSes (because the API C call of that name
does). Elsewhere it invokes a shell to run the command: the Windows port of R has a function
shell to do that.
To find out if the OS includes a command, use Sys.which, which attempts to do this in a
cross-platform way (unfortunately it is not a standard OS service).
Function shQuote will quote filepaths as needed for commands in the current OS.
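As a sketch of cross-platform use of system2, the following runs a second copy of R through the Rscript front end found in R's own bin directory. It assumes that Rscript is present there, which is the case for a standard installation; the choice of tar as the command to look up is arbitrary.

```r
rscript <- file.path(R.home("bin"), "Rscript")

# Arguments are passed as a character vector; shQuote() protects the
# expression, which contains spaces and parentheses, from the shell.
out <- system2(rscript,
               args = c("--vanilla", "-e", shQuote("cat(1 + 1)")),
               stdout = TRUE)              # capture standard output

have_tar <- nzchar(Sys.which("tar"))       # is a tar command on the PATH?
quoted   <- shQuote("my file.txt")         # quoted for the current OS
```

With stdout = TRUE the command's output is returned as a character vector, one element per line, rather than being printed.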
14.4 Compression and Archives
Recent versions of R have extensive facilities to read and write compressed files, often
transparently. Reading of files in R is to a very large extent done by connections, and the file
function, which is used to open a connection to a file (or a URL), is able to identify the
compression used from the "magic" header of the file.
The type of compression which has been supported for longest is gzip compression, and that
remains a good general compromise. Files compressed by the earlier Unix compress utility can
also be read, but these are becoming rare. Two other forms of compression, those of the bzip2
and xz utilities, are also available. These generally achieve higher rates of compression
(depending on the file, much higher) at the expense of slower decompression and much slower
compression.
There is some confusion between xz and lzma compression (see http://en.wikipedia.org/wiki/Xz
and http://en.wikipedia.org/wiki/LZMA): R can read files compressed by most versions of either.
File archives are single files which contain a collection of files, the most common ones being
"tarballs" and zip files as used to distribute R packages. R can list and unpack both (see functions
untar and unzip) and create both (for zip with the help of an external program).
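Transparent reading of a compressed file can be sketched as follows; the file name and contents are invented for the illustration.

```r
tmp <- tempfile(fileext = ".gz")

con <- gzfile(tmp, "w")                 # connection that compresses with gzip
writeLines(c("alpha", "beta", "gamma"), con)
close(con)

# readLines() opens the path as an ordinary file connection, which
# recognises the compression and decompresses while reading.
back <- readLines(tmp)

unlink(tmp)
```

No explicit decompression step is needed on the reading side, which is what "transparently" refers to above.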
Appendix A A sample session
The following session is intended to introduce to you some features of the R environment by
using them. Many features of the system will be unfamiliar and puzzling at first, but this
puzzlement will soon disappear.
Start R appropriately for your platform (see Invoking R).
The R program begins, with a banner.
(Within R code, the prompt on the left hand side will not be shown to avoid confusion.)
help.start()
Start the HTML interface to on-line help (using a web browser available at your machine).
You should briefly explore the features of this facility with the mouse.
Iconify the help window and move on to the next part.
x <- rnorm(50)
y <- rnorm(x)
Generate two pseudo-random normal vectors of x- and y-coordinates.
plot(x, y)
Plot the points in the plane. A graphics window will appear automatically.
ls()
See which R objects are now in the R workspace.
rm(x, y)
Remove objects no longer needed. (Clean up).
x <- 1:20
Make x = (1, 2, ..., 20).
w <- 1 + sqrt(x)/2
A "weight" vector of standard deviations.
dummy <- data.frame(x=x, y= x + rnorm(x)*w)
dummy
Make a data frame of two columns, x and y, and look at it.
fm <- lm(y ~ x, data=dummy)
summary(fm)
Fit a simple linear regression and look at the analysis. With y to the left of the tilde, we are
modelling y dependent on x.
fm1 <- lm(y ~ x, data=dummy, weights=1/w^2)
summary(fm1)
Since we know the standard deviations, we can do a weighted regression.
attach(dummy)
Make the columns in the data frame visible as variables.
lrf <- lowess(x, y)
Make a nonparametric local regression function.
plot(x, y)
Standard point plot.
lines(x, lrf$y)
Add in the local regression.
abline(0, 1, lty=3)
The true regression line: (intercept 0, slope 1).
abline(coef(fm))
Unweighted regression line.
abline(coef(fm1), col = "red")
Weighted regression line.
detach()
Remove data frame from the search path.
plot(fitted(fm), resid(fm),
     xlab="Fitted values",
     ylab="Residuals",
     main="Residuals vs Fitted")
A standard regression diagnostic plot to check for heteroscedasticity. Can you see it?
qqnorm(resid(fm), main="Residuals Rankit Plot")
A normal scores plot to check for skewness, kurtosis and outliers. (Not very useful here.)
rm(fm, fm1, lrf, x, dummy)
Clean up again.
The next section will look at data from the classical experiment of Michelson to measure the
speed of light. This dataset is available in the morley object, but we will read it to illustrate the
read.table function.
filepath <- system.file("data", "morley.tab", package="datasets")
filepath
Get the path to the data file.
file.show(filepath)
Optional. Look at the file.
mm <- read.table(filepath)
mm
Read in the Michelson data as a data frame, and look at it. There are five experiments
(column Expt) and each has 20 runs (column Run) and Speed is the recorded speed of light,
suitably coded.
mm$Expt <- factor(mm$Expt)
mm$Run <- factor(mm$Run)
Change Expt and Run into factors.
attach(mm)
Make the data frame visible at position 2 (the default).
plot(Expt, Speed, main="Speed of Light Data", xlab="Experiment No.")
Compare the five experiments with simple boxplots.
fm <- aov(Speed ~ Run + Expt, data=mm)
summary(fm)
Analyze as a randomized block, with "runs" and "experiments" as factors.
fm0 <- update(fm, . ~ . - Run)
anova(fm0, fm)
Fit the sub-model omitting "runs", and compare using a formal analysis of variance.
detach()
rm(fm, fm0)
Clean up before moving on.
We now look at some more graphical features: contour and image plots.
x <- seq(-pi, pi, len=50)
y <- x
x is a vector of 50 equally spaced values in the interval [-pi, pi]. y is the same.
f <- outer(x, y, function(x, y) cos(y)/(1 + x^2))
f is a square matrix, with rows and columns indexed by x and y respectively, of values of
the function cos(y)/(1 + x^2).
oldpar <- par(no.readonly = TRUE)
par(pty="s")
Save the plotting parameters and set the plotting region to "square".
contour(x, y, f)
contour(x, y, f, nlevels=15, add=TRUE)
Make a contour map of f; add in more lines for more detail.
fa <- (f-t(f))/2
fa is the "asymmetric part" of f. (t() is transpose).
contour(x, y, fa, nlevels=15)
Make a contour plot,
par(oldpar)
and restore the old graphics parameters.
image(x, y, f)
image(x, y, fa)
Make some high density image plots, (of which you can get hardcopies if you wish),
objects(); rm(x, y, f, fa)
and clean up before moving on.
R can do complex arithmetic, also.
th <- seq(-pi, pi, len=100)
z <- exp(1i*th)
1i is used for the complex number i.
par(pty="s")
plot(z, type="l")
Plotting complex arguments means plot imaginary versus real parts. This should be a
circle.
w <- rnorm(100) + rnorm(100)*1i
Suppose we want to sample points within the unit circle. One method would be to take
complex numbers with standard normal real and imaginary parts
w <- ifelse(Mod(w) > 1, 1/w, w)
and to map any outside the circle onto their reciprocal.
plot(w, xlim=c(-1,1), ylim=c(-1,1), pch="+", xlab="x", ylab="y")
lines(z)
All points are inside the unit circle, but the distribution is not uniform.
w <- sqrt(runif(100))*exp(2*pi*runif(100)*1i)
plot(w, xlim=c(-1,1), ylim=c(-1,1), pch="+", xlab="x", ylab="y")
lines(z)
The second method uses the uniform distribution. The points should now look more
evenly spaced over the disc.
rm(th, w, z)
Clean up again.
q()
Quit the R program. You will be asked if you want to save the R workspace, and for an
exploratory session like this, you probably do not want to save it.
Appendix B Invoking R
Users of R on Windows or OS X should read the OS-specific section first, but command-line use
is also supported.
B.1 Invoking R from the command line
When working at a command line on UNIX or Windows, the command 'R' can be used both for
starting the main R program in the form
R [options] [<infile] [>outfile],
or, via the R CMD interface, as a wrapper to various R tools (e.g., for processing files in R
documentation format or manipulating add-on packages) which are not intended to be called
"directly".
At the Windows command line, Rterm.exe is preferred to R.
You need to ensure that either the environment variable TMPDIR is unset or it points to a valid
place to create temporary files and directories.
Most options control what happens at the beginning and at the end of an R session. The startup
mechanism is as follows (see also the on-line help for topic 'Startup' for more information, and
the section below for some Windows-specific details).
* Unless --no-environ was given, R searches for user and site files to process for setting
environment variables. The name of the site file is the one pointed to by the environment
variable R_ENVIRON; if this is unset, R_HOME/etc/Renviron.site is used (if it exists). The
user file is the one pointed to by the environment variable R_ENVIRON_USER if this is set;
otherwise, files .Renviron in the current or in the user's home directory (in that order) are
searched for. These files should contain lines of the form 'name=value'. (See
help("Startup") for a precise description.) Variables you might want to set include
R_PAPERSIZE (the default paper size), R_PRINTCMD (the default print command) and R_LIBS
(specifies the list of R library trees searched for add-on packages).
* Then R searches for the site-wide startup profile unless the command line option
--no-site-file was given. The name of this file is taken from the value of the R_PROFILE
environment variable. If that variable is unset, the default R_HOME/etc/Rprofile.site is
used if this exists.
* Then, unless --no-init-file was given, R searches for a user profile and sources it. The
name of this file is taken from the environment variable R_PROFILE_USER; if unset, a file
called .Rprofile in the current directory or in the user's home directory (in that order) is
searched for.
* It also loads a saved workspace from file .RData in the current directory if there is one
(unless --no-restore or --no-restore-data was specified).
* Finally, if a function .First() exists, it is executed. This function (as well as .Last()
which is executed at the end of the R session) can be defined in the appropriate startup
profiles, or reside in .RData.
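The lookup of the profile files described above can be approximated from within a running session. The sketch below only mirrors the order given in the text; it is not the code R itself uses at startup, and the variable names are invented for the illustration.

```r
# Site profile: R_PROFILE if set, otherwise R_HOME/etc/Rprofile.site.
site_profile <- Sys.getenv("R_PROFILE",
                           unset = file.path(R.home("etc"), "Rprofile.site"))

# User profile: R_PROFILE_USER if set, otherwise .Rprofile in the
# current directory, then in the home directory.
user_profile <- Sys.getenv("R_PROFILE_USER", unset = NA)
if (is.na(user_profile)) {
  candidates <- c(file.path(getwd(), ".Rprofile"),
                  file.path(path.expand("~"), ".Rprofile"))
  user_profile <- candidates[file.exists(candidates)][1]   # NA if neither exists
}

profiles <- data.frame(file   = c(site_profile, user_profile),
                       exists = file.exists(c(site_profile, user_profile)))
```

The resulting data frame shows which candidate files are present on the current system; a missing file is simply skipped at startup.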
In addition, there are options for controlling the memory available to the R process (see the
on-line help for topic 'Memory' for more information). Users will not normally need to use these
unless they are trying to limit the amount of memory used by R.
R accepts the following command-line options.
help
h
Printshorthelpmessagetostandardoutputandexitsuccessfully.
version
Printversioninformationtostandardoutputandexitsuccessfully.
encoding=enc
Specifytheencodingtobeassumedforinputfromtheconsoleorstdin.Thisneedstobe
http://cran.rproject.org/doc/manuals/Rintro.html#One_002dandtwo_002dsampletests
93/116
5/28/2015
AnIntroductiontoR
anencodingknowntoiconv:seeitshelppage.(encodingencisalsoaccepted.)The
inputisreencodedtothelocaleRisrunninginandneedstoberepresentableinthe
lattersencoding(soe.g.youcannotreencodeGreektextinaFrenchlocaleunlessthat
localeusestheUTF8encoding).
RHOME
PrintthepathtotheRhomedirectorytostandardoutputandexitsuccessfully.Apart
fromthefrontendshellscriptandthemanpage,Rinstallationputseverything
(executables,packages,etc.)intothisdirectory.
save
nosave
ControlwhetherdatasetsshouldbesavedornotattheendoftheRsession.Ifneitheris
giveninaninteractivesession,theuserisaskedforthedesiredbehaviorwhenendingthe
sessionwithq()innoninteractiveuseoneofthesemustbespecifiedorimpliedbysome
otheroption(seebelow).
--no-environ
Do not read any user file to set environment variables.
--no-site-file
Do not read the site-wide profile at startup.
--no-init-file
Do not read the user's profile at startup.
--restore
--no-restore
--no-restore-data
Control whether saved images (file .RData in the directory where R was started) should be
restored at startup or not. The default is to restore. (--no-restore implies all the specific
--no-restore-* options.)
--no-restore-history
Control whether the history file (normally file .Rhistory in the directory where R was
started, but can be set by the environment variable R_HISTFILE) should be restored at
startup or not. The default is to restore.
--no-Rconsole
(Windows only) Prevent loading the Rconsole file at startup.
--vanilla
Combine --no-save, --no-environ, --no-site-file, --no-init-file and --no-restore.
Under Windows, this also includes --no-Rconsole.
-f file
--file=file
(not Rgui.exe) Take input from file: '-' means stdin. Implies --no-save unless --save has
been set. On a Unix-alike, shell metacharacters should be avoided in file (but spaces are
allowed).
-e expression
(not Rgui.exe) Use expression as an input line. One or more -e options can be used, but
not together with -f or --file. Implies --no-save unless --save has been set. (There is a
limit of 10,000 bytes on the total length of expressions used in this way. Expressions
containing spaces or shell metacharacters will need to be quoted.)
--no-readline
(UNIX only) Turn off command-line editing via readline. This is useful when running R
from within Emacs using the ESS ("Emacs Speaks Statistics") package. See The
command-line editor, for more information. Command-line editing is enabled for default
interactive use (see --interactive). This option also affects tilde-expansion: see the help
for path.expand.
--min-vsize=N
--min-nsize=N
For expert use only: set the initial trigger sizes for garbage collection of vector heap (in
bytes) and cons cells (number) respectively. Suffix 'M' specifies megabytes or millions of
cells respectively. The defaults are 6Mb and 350k respectively and can also be set by
environment variables R_VSIZE and R_NSIZE.
--max-ppsize=N
Specify the maximum size of the pointer protection stack as N locations. This defaults to
10000, but can be increased to allow large and complicated calculations to be done.
Currently the maximum value accepted is 100000.
--max-mem-size=N
(Windows only) Specify a limit for the amount of memory to be used both for R objects
and working areas. This is set by default to the smaller of the amount of physical RAM in
the machine and for 32-bit R, 1.5Gb (see footnote 26), and must be between 32Mb and the
maximum allowed on that version of Windows.
--quiet
--silent
-q
Do not print out the initial copyright and welcome messages.
--slave
Make R run as quietly as possible. This option is intended to support programs which use
R to compute results for them. It implies --quiet and --no-save.
--interactive
(UNIX only) Assert that R really is being run interactively even if input has been
redirected: use if input is from a FIFO or pipe and fed from an interactive program. (The
default is to deduce that R is being run interactively if and only if stdin is connected to a
terminal or pty.) Using -e, -f or --file asserts non-interactive use even if --interactive
is given.
Note that this does not turn on command-line editing.
--ess
(Windows only) Set Rterm up for use by R-inferior-mode in ESS, including asserting
interactive use (without the command-line editor) and no buffering of stdout.
--verbose
Print more information about progress, and in particular set R's option verbose to TRUE. R
code uses this option to control the printing of diagnostic messages.
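The way R code can consult this option may be sketched as follows. The helper name note() and the message prefix are hypothetical; only getOption() and options() are standard functions.

```r
# Hypothetical helper: emit a diagnostic only when option 'verbose' is TRUE.
note <- function(...) {
  if (isTRUE(getOption("verbose"))) message("[diagnostic] ", ...)
  invisible(NULL)
}

old <- options(verbose = TRUE)   # the setting that --verbose makes at startup
note("fitting model")            # printed to stderr
options(verbose = FALSE)
note("fitting model")            # silent
options(old)                     # restore the previous setting
```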
--debugger=name
-d name
(UNIX only) Run R through debugger name. For most debuggers (the exceptions are
valgrind and recent versions of gdb), further command line options are disregarded, and
should instead be given when starting the R executable from inside the debugger.
--gui=type
-g type
(UNIX only) Use type as graphical user interface (note that this also includes interactive
graphics). Currently, possible values for type are 'X11' (the default) and, provided that
'Tcl/Tk' support is available, 'Tk'. (For back-compatibility, 'x11' and 'tk' are accepted.)
--arch=name
(UNIX only) Run the specified sub-architecture.
--args
This flag does nothing except cause the rest of the command line to be skipped: this can be
useful to retrieve values from it with commandArgs(TRUE).
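One possible way to use the values that follow --args is sketched below. The helper parse_args() and the "name=value" convention are assumptions made for illustration; commandArgs() itself only returns a character vector.

```r
# Hypothetical helper: split trailing arguments into "name=value" pairs and
# plain positional values. By default it reads whatever followed --args.
parse_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  keyed <- grepl("=", args, fixed = TRUE)
  keys  <- sub("=.*$", "", args[keyed])      # text before the first '='
  vals  <- sub("^[^=]*=", "", args[keyed])   # text after the first '='
  list(named = as.list(setNames(vals, keys)),
       positional = args[!keyed])
}

# Demonstration with an explicit vector instead of the real command line.
str(parse_args(c("n=100", "input.csv", "seed=42")))
```

All values arrive as character strings, so a script would still need as.numeric() or similar before computing with them.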
Note that input and output can be redirected in the usual way (using '<' and '>'), but the line
length limit of 4095 bytes still applies. Warning and error messages are sent to the error channel
(stderr).
The command R CMD allows the invocation of various tools which are useful in conjunction with
R, but not intended to be called "directly". The general form is
R CMD command args
where command is the name of the tool and args the arguments passed on to it.
Currently, the following tools are available.
BATCH
Run R in batch mode. Runs R --restore --save with possibly further options (see
?BATCH).
COMPILE
(UNIX only) Compile C, C++, Fortran files for use with R.
SHLIB
Build shared library for dynamic loading.
INSTALL
Install add-on packages.
REMOVE
Remove add-on packages.
build
Build (that is, package) add-on packages.
check
Check add-on packages.
LINK
(UNIX only) Front-end for creating executable programs.
Rprof
Post-process R profiling files.
Rdconv
Rd2txt
Convert Rd format to various other formats, including HTML, LaTeX, plain text, and
extracting the examples. Rd2txt can be used as shorthand for Rdconv -t txt.
Rd2pdf
Convert Rd format to PDF.
Stangle
Extract S/R code from Sweave or other vignette documentation.
Sweave
Process Sweave or other vignette documentation.
Rdiff
Diff R output ignoring headers etc.
config
Obtain configuration information.
javareconf
(Unix only) Update the Java configuration variables.
rtags
(Unix only) Create Emacs-style tag files from C, R, and Rd files.
open
(Windows only) Open a file via Windows' file associations.
texify
(Windows only) Process (La)TeX files with R's style files.
Use
R CMD command --help
to obtain usage information for each of the tools accessible via the R CMD interface.
In addition, you can use options --arch=, --no-environ, --no-init-file, --no-site-file and
--vanilla between R and CMD: these affect any R processes run by the tools. (Here --vanilla is
equivalent to --no-environ --no-site-file --no-init-file.) However, note that R CMD does not
of itself use any R startup files (in particular, neither user nor site Renviron files), and all of the R
processes run by these tools (except BATCH) use --no-restore. Most use --vanilla and so invoke
no R startup files: the current exceptions are INSTALL, REMOVE, Sweave and SHLIB (which uses
--no-site-file --no-init-file).
The same interface can also be used in the form
R CMD cmd args
for any other executable cmd on the path or given by an absolute filepath: this is useful to have
the same environment as R or the specific commands run under, for example to run ldd or
pdflatex. Under Windows cmd can be an executable or a batch file, or if it has extension .sh or
.pl the appropriate interpreter (if available) is called to run it.
B.2 Invoking R under Windows
There are two ways to run R under Windows. Within a terminal window (e.g. cmd.exe or a more
capable shell), the methods described in the previous section may be used, invoking by R.exe or
more directly by Rterm.exe. For interactive use, there is a console-based GUI (Rgui.exe).
The startup procedure under Windows is very similar to that under UNIX, but references to the
'home directory' need to be clarified, as this is not always defined on Windows. If the
environment variable R_USER is defined, that gives the home directory. Next, if the environment
variable HOME is defined, that gives the home directory. After those two user-controllable settings,
R tries to find system defined home directories. It first tries to use the Windows "personal"
directory (typically C:\Documents and Settings\username\My Documents in Windows XP). If that
fails, and environment variables HOMEDRIVE and HOMEPATH are defined (and they normally are)
these define the home directory. Failing all those, the home directory is taken to be the starting
directory.
You need to ensure that the environment variables TMPDIR, TMP and TEMP are either unset or that
one of them points to a valid place to create temporary files and directories.
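A quick check of this condition from within R might look like the following sketch. The layout of the report is an arbitrary choice; Sys.getenv(), dir.exists() and tempdir() are standard functions.

```r
# Look up the three variables; unset ones are reported as NA, which is fine.
tmp_vars <- Sys.getenv(c("TMPDIR", "TMP", "TEMP"), unset = NA)

# For each variable that is set, test whether it names an existing directory.
usable <- rep(NA, length(tmp_vars))
is_set <- !is.na(tmp_vars)
usable[is_set] <- dir.exists(tmp_vars[is_set])

data.frame(variable = names(tmp_vars),
           value    = unname(tmp_vars),
           usable   = usable,
           row.names = NULL)

# The per-session temporary directory that R actually chose:
tempdir()
```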
Environment variables can be supplied as 'name=value' pairs on the command line.
If there is an argument ending .RData (in any case) it is interpreted as the path to the workspace
to be restored: it implies --restore and sets the working directory to the parent of the named file.
(This mechanism is used for drag-and-drop and file association with RGui.exe, but also works for
Rterm.exe. If the named file does not exist it sets the working directory if the parent directory
exists.)
The following additional command-line options are available when invoking RGui.exe.
--mdi
--sdi
--no-mdi
Control whether Rgui will operate as an MDI program (with multiple child windows
within one main window) or an SDI application (with multiple top-level windows for the
console, graphics and pager). The command-line setting overrides the setting in the user's
Rconsole file.
--debug
Enable the "Break to debugger" menu item in Rgui, and trigger a break to the debugger
during command line processing.
Under Windows with R CMD you may also specify your own .bat, .exe, .sh or .pl file. It will be
run under the appropriate interpreter (Perl for .pl) with several environment variables set
appropriately, including R_HOME, R_OSTYPE, PATH, BSTINPUTS and TEXINPUTS. For example, if you
already have latex.exe on your path, then
R CMD latex.exe mydoc
will run LaTeX on mydoc.tex, with the path to R's share/texmf macros appended to TEXINPUTS.
(Unfortunately, this does not help with the MiKTeX build of LaTeX, but R CMD texify mydoc
will work in that case.)
B.3 Invoking R under OS X
There are two ways to run R under OS X. Within a Terminal.app window by invoking R, the
methods described in the first subsection apply. There is also a console-based GUI (R.app) that by
default is installed in the Applications folder on your system. It is a standard double-clickable
OS X application.
The startup procedure under OS X is very similar to that under UNIX, but R.app does not make
use of command-line arguments. The 'home directory' is the one inside the R.framework, but the
startup and current working directory are set as the user's home directory unless a different
startup directory is given in the Preferences window accessible from within the GUI.
B.4 Scripting with R
If you just want to run a file foo.R of R commands, the recommended way is to use R CMD BATCH
foo.R. If you want to run this in the background or as a batch job use OS-specific facilities to do
so: for example in most shells on Unix-alike OSes R CMD BATCH foo.R & runs a background job.
You can pass parameters to scripts via additional arguments on the command line: for example
(where the exact quoting needed will depend on the shell in use)
R CMD BATCH "--args arg1 arg2" foo.R &
will pass arguments to a script which can be retrieved as a character vector by
args <- commandArgs(TRUE)
This is made simpler by the alternative front-end Rscript, which can be invoked by
Rscript foo.R arg1 arg2
and this can also be used to write executable script files like (at least on Unix-alikes, and in some
Windows shells)
#! /path/to/Rscript
args <- commandArgs(TRUE)
...
q(status=<exit status code>)
If this is entered into a text file runfoo and this is made executable (by chmod 755 runfoo), it can
be invoked for different arguments by
runfoo arg1 arg2
For further options see help("Rscript"). This writes R output to stdout and stderr, and this can
be redirected in the usual way for the shell running the command.
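To make the skeleton above concrete, here is one way the body of such a script could be organised. The task (summing numeric arguments), the function names and the exit codes are all hypothetical choices for illustration.

```r
#!/usr/bin/env Rscript
# runfoo: hypothetical script that sums its numeric arguments.

# Convert the arguments to numbers; NULL signals unusable input.
sum_args <- function(args) {
  x <- suppressWarnings(as.numeric(args))
  if (length(x) == 0 || anyNA(x)) return(NULL)
  sum(x)
}

# Returns the exit status: 0 on success, 2 on a usage error.
main <- function(args = commandArgs(TRUE)) {
  total <- sum_args(args)
  if (is.null(total)) {
    message("usage: runfoo number [number ...]")   # written to stderr
    return(2L)
  }
  cat(total, "\n")                                 # written to stdout
  0L
}

# Demonstration call with explicit arguments. In a real script the final
# line would instead be:  q(status = main())
status <- main(c("1", "2", "3.5"))
```

Keeping the logic in main() and calling q() only on the last line lets the same file be tested with source() without ending the session.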
If you do not wish to hardcode the path to Rscript but have it in your path (which is normally the
case for an installed R except on Windows, but e.g. OS X users may need to add /usr/local/bin
to their path), use
#! /usr/bin/env Rscript
...
At least in Bourne and bash shells, the #! mechanism does not allow extra arguments like #!
/usr/bin/env Rscript --vanilla.
One thing to consider is what stdin() refers to. It is commonplace to write R scripts with
segments like
chem <- scan(n=24)
2.90 3.10 3.40 3.40 3.70 3.70 2.80 2.50 2.40 2.40 2.70 2.20
5.28 3.37 3.03 3.03 28.95 3.77 3.40 2.20 3.50 3.60 3.70 3.70
and stdin() refers to the script file to allow such traditional usage. If you want to refer to the
process's stdin, use "stdin" as a file connection, e.g. scan("stdin", ...).
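The distinction can be sketched with a small reader that accepts any connection. The helper read_numbers() is hypothetical, and a text connection stands in for the process's standard input so that the example is self-contained.

```r
# Hypothetical helper: read whitespace-separated numbers from a connection.
read_numbers <- function(con) scan(con, what = numeric(), quiet = TRUE)

# In a script run as, say,  Rscript foo.R < data.txt  the process's
# standard input would be read with:
#   x <- read_numbers(file("stdin"))

# Stand-in for demonstration: a text connection holding similar data.
demo <- textConnection("2.90 3.10 3.40\n3.70 2.80")
x <- read_numbers(demo)
close(demo)
mean(x)
```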
Another way to write executable script files (suggested by François Pinard) is to use a here
document like
#!/bin/sh
[environment variables can be set here]
R --slave [other options] <<EOF
R program goes here...
EOF
but here stdin() refers to the program source and "stdin" will not be usable.
Short scripts can be passed to Rscript on the command-line via the -e flag. (Empty scripts are
not accepted.)
Note that on a Unix-alike the input filename (such as foo.R) should not contain spaces nor shell
metacharacters.
Appendix C The command-line editor
C.1 Preliminaries
When the GNU readline library is available at the time R is configured for compilation under
UNIX, an inbuilt command line editor allowing recall, editing and re-submission of prior
commands is used. Note that other versions of readline exist and may be used by the inbuilt
command line editor: this used to happen on OS X.
It can be disabled (useful for usage with ESS; see footnote 27) using the startup option
--no-readline.
Windows versions of R have somewhat simpler command-line editing: see 'Console' under the
'Help' menu of the GUI, and the file README.Rterm for command-line editing under Rterm.exe.
When using R with readline capabilities, the functions described below are available, as well as
others (probably) documented in man readline or info readline on your system.
Many of these use either Control or Meta characters. Control characters, such as Control-m, are
obtained by holding the CTRL down while you press the m key, and are written as C-m below. Meta
characters, such as Meta-b, are typed by holding down META (see footnote 28) and pressing b, and
written as M-b in the following. If your terminal does not have a META key enabled, you can still
type Meta characters using two-character sequences starting with ESC. Thus, to enter M-b, you
could type ESC b. The ESC character sequences are also allowed on terminals with real Meta keys.
Note that case is significant for Meta characters.
C.2 Editing actions
The R program keeps a history of the command lines you type, including the erroneous lines,
and commands in your history may be recalled, changed if necessary, and re-submitted as new
commands. In Emacs-style command-line editing any straight typing you do while in this editing
phase causes the characters to be inserted in the command you are editing, displacing any
characters to the right of the cursor. In vi mode character insertion mode is started by M-i or M-a,
characters are typed and insertion mode is finished by typing a further ESC. (The default is
Emacs-style, and only that is described here: for vi mode see the readline documentation.)
Pressing the RET command at any time causes the command to be re-submitted.
Other editing actions are summarized in the following table.
C.3 Command-line editor summary
Command recall and vertical motion
C-p
Go to the previous command (backwards in the history).
C-n
Go to the next command (forwards in the history).
C-r text
Find the last command with the text string in it.
On most terminals, you can also use the up and down arrow keys instead of C-p and C-n,
respectively.
Horizontal motion of the cursor
C-a
Go to the beginning of the command.
C-e
Go to the end of the line.
M-b
Go back one word.
M-f
Go forward one word.
C-b
Go back one character.
C-f
Go forward one character.
On most terminals, you can also use the left and right arrow keys instead of C-b and C-f,
respectively.
Editing and re-submission
text
Insert text at the cursor.
C-f text
Append text after the cursor.
DEL
Delete the previous character (left of the cursor).
C-d
Delete the character under the cursor.
M-d
Delete the rest of the word under the cursor, and "save" it.
C-k
Delete from cursor to end of command, and "save" it.
C-y
Insert (yank) the last "saved" text here.
C-t
Transpose the character under the cursor with the next.
M-l
Change the rest of the word to lower case.
M-c
Change the rest of the word to upper case.
RET
Re-submit the command to R.
The final RET terminates the command line editing sequence.
The readline key bindings can be customized in the usual way via a ~/.inputrc file. These
customizations can be conditioned on application R, that is by including a section like
$if R
"\C-xd": "q('no')\n"
$endif
Appendix D Function and variable index
Index Entry
Section
!:
!=:
Logicalvectors
Logicalvectors
%*%:
Multiplication
%o%:
Theouterproductoftwoarrays
&:
&&:
Logicalvectors
Conditionalexecution
*:
Vectorarithmetic
+:
Vectorarithmetic
Vectorarithmetic
.:
.Last:
Updatingfittedmodels
Customizingtheenvironment
Customizingtheenvironment
/:
Vectorarithmetic
::
Generatingregularsequences
Namespaces
Namespaces
&
.
.First:
:
:::
::::
<
<:
<=:
Logicalvectors
Scope
Logicalvectors
==:
Logicalvectors
>:
Logicalvectors
Logicalvectors
<<:
>
>=:
?
?:
??:
Gettinghelp
Gettinghelp
^:
Vectorarithmetic
|:
||:
Logicalvectors
Conditionalexecution
~:
Formulaeforstatisticalmodels
A
abline:
Lowlevelplottingcommands
ace:
Somenonstandardmodels
add1:
Updatingfittedmodels
anova:
Genericfunctionsforextractingmodelinformation
anova:
ANOVAtables
aov:
Analysisofvarianceandmodelcomparison
aperm:
Generalizedtransposeofanarray
array:
Thearray()function
as.data.frame: Makingdataframes
as.vector:
Theconcatenationfunctionc()witharrays
attach:
attach()anddetach()
attr:
Gettingandsettingattributes
attr:
Gettingandsettingattributes
attributes:
Gettingandsettingattributes
attributes:
Gettingandsettingattributes
avas:
Somenonstandardmodels
axis:
Lowlevelplottingcommands
B
boxplot:
break:
bruto:
Oneandtwosampletests
Repetitiveexecution
Somenonstandardmodels
C
c:
c:
c:
Vectorsandassignment
Charactervectors
Theconcatenationfunctionc()witharrays
c:
Concatenatinglists
C:
Contrasts
cbind:
Formingpartitionedmatrices
coef:
Genericfunctionsforextractingmodelinformation
coefficients: Genericfunctionsforextractingmodelinformation
contour:
Displaygraphics
contrasts:
Contrasts
coplot:
Displayingmultivariatedata
cos:
Vectorarithmetic
crossprod:
Indexmatrices
crossprod:
Multiplication
cut:
Frequencytablesfromfactors
D
data:
data.frame:
density:
det:
detach:
determinant:
dev.list:
dev.next:
dev.off:
dev.prev:
dev.set:
deviance:
diag:
dim:
dotchart:
drop1:
Accessingbuiltindatasets
Makingdataframes
Examiningthedistributionofasetofdata
Singularvaluedecompositionanddeterminants
attach()anddetach()
Singularvaluedecompositionanddeterminants
Multiplegraphicsdevices
Multiplegraphicsdevices
Multiplegraphicsdevices
Multiplegraphicsdevices
Multiplegraphicsdevices
Genericfunctionsforextractingmodelinformation
Multiplication
Arrays
Displaygraphics
Updatingfittedmodels
E
ecdf:
edit:
eigen:
else:
Error:
example:
exp:
Examiningthedistributionofasetofdata
Editingdata
Eigenvaluesandeigenvectors
Conditionalexecution
Analysisofvarianceandmodelcomparison
Gettinghelp
Vectorarithmetic
F
F:
factor:
FALSE:
fivenum:
for:
formula:
function:
Logicalvectors
Factors
Logicalvectors
Examiningthedistributionofasetofdata
Repetitiveexecution
Genericfunctionsforextractingmodelinformation
Writingyourownfunctions
G
getAnywhere:
getS3method:
glm:
Objectorientation
Objectorientation
Theglm()function
H
help:
help:
help.search:
help.start:
hist:
hist:
Gettinghelp
Gettinghelp
Gettinghelp
Gettinghelp
Examiningthedistributionofasetofdata
Displaygraphics
I
identify:
is.nan:
Interactingwithgraphics
Conditionalexecution
Conditionalexecution
Conditionalexecution
Displaygraphics
Missingvalues
Missingvalues
jpeg:
Devicedrivers
ks.test:
Examiningthedistributionofasetofdata
legend:
Lowlevelplottingcommands
Vectorarithmetic
Theintrinsicattributesmodeandlength
Factors
if:
if:
ifelse:
image:
is.na:
L
length:
length:
levels:
lines:
list:
lm:
lme:
locator:
loess:
loess:
log:
lqs:
lsfit:
Lowlevelplottingcommands
Lists
Linearmodels
Somenonstandardmodels
Interactingwithgraphics
Somenonstandardmodels
Somenonstandardmodels
Vectorarithmetic
Somenonstandardmodels
LeastsquaresfittingandtheQRdecomposition
M
mars:
max:
mean:
methods:
min:
mode:
Somenonstandardmodels
Vectorarithmetic
Vectorarithmetic
Objectorientation
Vectorarithmetic
Theintrinsicattributesmodeandlength
N
NA:
NaN:
ncol:
next:
nlm:
nlm:
nlm:
nlme:
nlminb:
nrow:
Missingvalues
Missingvalues
Matrixfacilities
Repetitiveexecution
Nonlinearleastsquaresandmaximumlikelihoodmodels
Leastsquares
Maximumlikelihood
Somenonstandardmodels
Nonlinearleastsquaresandmaximumlikelihoodmodels
Matrixfacilities
O
optim:
order:
ordered:
ordered:
outer:
Nonlinearleastsquaresandmaximumlikelihoodmodels
Vectorarithmetic
Orderedfactors
Orderedfactors
Theouterproductoftwoarrays
P
pairs:
par:
Displayingmultivariatedata
Thepar()function
paste:
pdf:
persp:
plot:
plot:
pmax:
pmin:
png:
points:
polygon:
postscript:
predict:
print:
prod:
Charactervectors
Devicedrivers
Displaygraphics
Genericfunctionsforextractingmodelinformation
Theplot()function
Vectorarithmetic
Vectorarithmetic
Devicedrivers
Lowlevelplottingcommands
Lowlevelplottingcommands
Devicedrivers
Genericfunctionsforextractingmodelinformation
Genericfunctionsforextractingmodelinformation
Vectorarithmetic
Q
qqline:
qqline:
qqnorm:
qqnorm:
qqplot:
qr:
quartz:
Examiningthedistributionofasetofdata
Displaygraphics
Examiningthedistributionofasetofdata
Displaygraphics
Displaygraphics
LeastsquaresfittingandtheQRdecomposition
Devicedrivers
R
range:
rbind:
read.table:
rep:
repeat:
resid:
residuals:
rlm:
rm:
Vectorarithmetic
Formingpartitionedmatrices
Theread.table()function
Generatingregularsequences
Repetitiveexecution
Genericfunctionsforextractingmodelinformation
Genericfunctionsforextractingmodelinformation
Somenonstandardmodels
Datapermanencyandremovingobjects
S
scan:
Thescan()function
sd:
Thefunctiontapply()andraggedarrays
search:
Managingthesearchpath
seq:
Generatingregularsequences
shapiro.test: Examiningthedistributionofasetofdata
sin:
sink:
solve:
sort:
source:
split:
sqrt:
stem:
step:
step:
sum:
summary:
summary:
svd:
Vectorarithmetic
Executingcommandsfromordivertingoutputtoafile
Linearequationsandinversion
Vectorarithmetic
Executingcommandsfromordivertingoutputtoafile
Repetitiveexecution
Vectorarithmetic
Examiningthedistributionofasetofdata
Genericfunctionsforextractingmodelinformation
Updatingfittedmodels
Vectorarithmetic
Examiningthedistributionofasetofdata
Genericfunctionsforextractingmodelinformation
Singularvaluedecompositionanddeterminants
T
T:
t:
t.test:
table:
table:
tan:
tapply:
text:
title:
tree:
TRUE:
Logicalvectors
Generalizedtransposeofanarray
Oneandtwosampletests
Indexmatrices
Frequencytablesfromfactors
Vectorarithmetic
Thefunctiontapply()andraggedarrays
Lowlevelplottingcommands
Lowlevelplottingcommands
Somenonstandardmodels
Logicalvectors
U
unclass:
update:
Theclassofanobject
Updatingfittedmodels
V
var:
vector:
Vectorarithmetic
Thefunctiontapply()andraggedarrays
Oneandtwosampletests
Genericfunctionsforextractingmodelinformation
Vectorsandassignment
while:
Repetitiveexecution
var:
var.test:
vcov:
wilcox.test:
windows:
Oneandtwosampletests
Devicedrivers
X11:
Devicedrivers
Appendix E Concept index
Index Entry
Section
A
Accessingbuiltindatasets:
Additivemodels:
Analysisofvariance:
Arithmeticfunctionsand
operators:
Arrays:
Assignment:
Attributes:
Accessingbuiltindatasets
Somenonstandardmodels
Analysisofvarianceandmodelcomparison
Vectorarithmetic
Binaryoperators:
Boxplots:
Definingnewbinaryoperators
Oneandtwosampletests
Charactervectors:
Classes:
Classes:
Concatenatinglists:
Contrasts:
Controlstatements:
CRAN:
Customizingtheenvironment:
Charactervectors
Theclassofanobject
Objectorientation
Concatenatinglists
Contrasts
Controlstatements
ContributedpackagesandCRAN
Customizingtheenvironment
Arrays
Vectorsandassignment
Objects
D
Dataframes:
Defaultvalues:
Densityestimation:
Determinants:
Divertinginputandoutput:
Dynamicgraphics:
Dataframes
Namedargumentsanddefaults
Examiningthedistributionofasetofdata
Singularvaluedecompositionanddeterminants
Executingcommandsfromordivertingoutputtoafile
Dynamicgraphics
Eigenvaluesandeigenvectors:
EmpiricalCDFs:
Eigenvaluesandeigenvectors
Examiningthedistributionofasetofdata
Factors:
Factors:
Families:
Formulae:
Factors
Contrasts
Families
Formulaeforstatisticalmodels
Generalizedlinearmodels:
Generalizedtransposeofanarray:
Genericfunctions:
Graphicsdevicedrivers:
Graphicsparameters:
Groupedexpressions:
Generalizedlinearmodels
Generalizedtransposeofanarray
Objectorientation
Devicedrivers
Thepar()function
Groupedexpressions
Indexingofandbyarrays:
Indexingvectors:
Arrayindexing
Indexvectors
KolmogorovSmirnovtest:
Examiningthedistributionofasetofdata
Leastsquaresfitting:
Linearequations:
LeastsquaresfittingandtheQRdecomposition
Linearequationsandinversion
Linearmodels:
Lists:
Localapproximatingregressions:
Loopsandconditionalexecution:
Linearmodels
Lists
Somenonstandardmodels
Loopsandconditionalexecution
M
Matrices:
Matrixmultiplication:
Maximumlikelihood:
Missingvalues:
Mixedmodels:
Arrays
Multiplication
Maximumlikelihood
Missingvalues
Somenonstandardmodels
Namedarguments:
Namespace:
Nonlinearleastsquares:
Namedargumentsanddefaults
Namespaces
Nonlinearleastsquaresandmaximumlikelihood
models
Objectorientation:
Objects:
Oneandtwosampletests:
Orderedfactors:
Orderedfactors:
Outerproductsofarrays:
Objectorientation
Objects
Oneandtwosampletests
Factors
Contrasts
Theouterproductoftwoarrays
Packages:
Packages:
Probabilitydistributions:
Randstatistics
Packages
Probabilitydistributions
QRdecomposition:
Quantilequantileplots:
LeastsquaresfittingandtheQRdecomposition
Examiningthedistributionofasetofdata
Readingdatafromfiles:
Recyclingrule:
Recyclingrule:
Readingdatafromfiles
Vectorarithmetic
Therecyclingrule
Regularsequences:
Removingobjects:
Robustregression:
Generatingregularsequences
Datapermanencyandremovingobjects
Somenonstandardmodels
Scope:
Searchpath:
ShapiroWilktest:
Scope
Managingthesearchpath
Examiningthedistributionofasetofdata
Singularvaluedecomposition:
Statisticalmodels:
Studentsttest:
Singularvaluedecompositionanddeterminants
StatisticalmodelsinR
Oneandtwosampletests
Tabulation:
Treebasedmodels:
Frequencytablesfromfactors
Somenonstandardmodels
Updatingfittedmodels:
Updatingfittedmodels
Vectors:
Simplemanipulationsnumbersandvectors
Wilcoxontest:
Workspace:
Writingfunctions:
Oneandtwosampletests
Datapermanencyandremovingobjects
Writingyourownfunctions
Appendix F References
D. M. Bates and D. G. Watts (1988), Nonlinear Regression Analysis and Its Applications. John
Wiley & Sons, New York.
Richard A. Becker, John M. Chambers and Allan R. Wilks (1988), The New S Language.
Chapman & Hall, New York. This book is often called the "Blue Book".
John M. Chambers and Trevor J. Hastie eds. (1992), Statistical Models in S. Chapman & Hall,
New York. This is also called the "White Book".
John M. Chambers (1998), Programming with Data. Springer, New York. This is also called the
"Green Book".
A. C. Davison and D. V. Hinkley (1997), Bootstrap Methods and Their Applications. Cambridge
University Press.
Annette J. Dobson (1990), An Introduction to Generalized Linear Models. Chapman and Hall,
London.
Peter McCullagh and John A. Nelder (1989), Generalized Linear Models. Second edition.
Chapman and Hall, London.
John A. Rice (1995), Mathematical Statistics and Data Analysis. Second edition. Duxbury Press,
Belmont, CA.
S. D. Silvey (1970), Statistical Inference. Penguin, London.
Footnotes
(1)
ACM Software Systems award, 1998:
http://awards.acm.org/award_winners/chambers_6640862.cfm.
(2)
For portable R code (including that to be used in R packages) only A-Za-z0-9 should be used.
(3)
not inside strings, nor within the argument list of a function definition
(4)
some of the consoles will not allow you to enter more, and amongst those which do some will
silently discard the excess and some will use it as the start of the next line.
(5)
of unlimited length.
(6)
The leading "dot" in this file name makes it invisible in normal file listings in UNIX, and in
default GUI file listings on OS X and Windows.
(7)
With other than vector types of argument, such as list mode arguments, the action of c() is
rather different. See Concatenating lists.
(8)
Actually, it is still available as .Last.value before any other statements are executed.
(9)
paste(..., collapse=ss) joins the arguments into a single character string putting ss in between,
e.g., ss <- "|". There are more tools for character manipulation, see the help for sub and
substring.
(10)
numeric mode is actually an amalgam of two distinct modes, namely integer and double
precision, as explained in the manual.
(11)
Note however that length(object) does not always contain intrinsic useful information, e.g.,
when object is a function.
(12)
In general, coercion from numeric to character and back again will not be exactly reversible,
because of roundoff errors in the character representation.
(13)
A different style using 'formal' or 'S4' classes is provided in package methods.
(14)
Readers should note that there are eight states and territories in Australia, namely the Australian
Capital Territory, New South Wales, the Northern Territory, Queensland, South Australia,
Tasmania, Victoria and Western Australia.
(15)
Note that tapply() also works in this case when its second argument is not a factor, e.g.,
'tapply(incomes, state)', and this is true for quite a few other functions, since arguments are
coerced to factors when necessary (using as.factor()).
(16)
Note that x %*% x is ambiguous, as it could mean either x'x or xx', where x is the column form
and ' denotes the transpose. In such cases the smaller matrix seems implicitly to be the
interpretation adopted, so the scalar x'x is in this case the result. The matrix xx' may be
calculated either by cbind(x) %*% x or x %*% rbind(x) since the result of rbind() or cbind() is
always a matrix. However, the best way to compute x'x or xx' is crossprod(x) or x %o% x
respectively.
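The alternatives named in this footnote can be compared on a short vector; the values below are arbitrary and serve only as an illustration.

```r
x <- c(1, 2, 3)

inner <- x %*% x          # 1 x 1 matrix: the inner product
cp    <- crossprod(x)     # the same quantity, computed directly
op    <- x %o% x          # 3 x 3 matrix: the outer product
viacb <- cbind(x) %*% x   # also 3 x 3, because cbind(x) is a 3 x 1 matrix

dim(inner)
dim(op)
```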
(17)
Even better would be to form a matrix square root B with A = BB' and find the squared length of
the solution of By = x, perhaps using the Cholesky or eigen decomposition of A.
(18)
Conversion of character columns to factors is overridden using the stringsAsFactors argument
to the data.frame() function.
(19)
See the on-line help for autoload for the meaning of the second term.
(20)
Under UNIX, the utilities sed or awk can be used.
(21)
to be discussed later, or use xyplot from package lattice.
(22)
See also the methods described in Statistical models in R.
(23)
In some sense this mimics the behavior in S-PLUS since in S-PLUS this operator always creates or
assigns to a global variable.
(24)
So it is hidden under UNIX.
(25)
Some graphics parameters such as the size of the current device are for information only.
(26)
2.5Gb on versions of Windows that support 3Gb per process and have the support enabled: see
the rw-FAQ Q2.9; 3.5Gb on most 64-bit versions of Windows.
(27)
The 'Emacs Speaks Statistics' package; see the URL http://ESS.Rproject.org
(28)
On a PC keyboard this is usually the Alt key, occasionally the 'Windows' key. On a Mac
keyboard normally no meta key is available.