A Guide To Sabermetrics

This document provides an overview and introduction to sabermetrics, the statistical analysis of baseball. It discusses some key aspects of sabermetrics including: - What sabermetrics is, as originally defined by Bill James in 1980 as "the search for objective knowledge about baseball". - Some pioneers of sabermetrics included Branch Rickey in the 1940s, Henry Chadwick in the mid-19th century who developed stats like batting average, and Earl Weaver in the 1960s. - The Society for American Baseball Research (SABR) was founded in 1971 and helped popularize sabermetric research through publications and conferences. - Today, sabermetric research is widespread on websites and blogs, making it more difficult

Uploaded by

TigerPoke

A GUIDE TO SABERMETRIC RESEARCH

We're often asked, "I'd like to know more about sabermetrics, but where do I begin?" Longtime SABR
member Phil Birnbaum has authored A Guide to Sabermetric Research to help answer your questions. We're
pleased to publish it at SABR.org/sabermetrics. Birnbaum is the editor of the SABR Statistical Analysis
Committee newsletter, "By the Numbers", and he can be found writing on various topics at his
blog, Sabermetric Research.

First, let's go over some basics:
What is sabermetrics? As originally defined by Bill James in 1980, sabermetrics is "the search for
objective knowledge about baseball". James coined the phrase in part to honor the Society for American
Baseball Research.

Who invented sabermetrics? Statistical analysis has been around as long as baseball has been played
competitively. Long before Moneyball became a worldwide phenomenon in the 21st century and before
Bill James' baseball writings gained mainstream popularity in the 1980s, Hall of Fame manager Earl
Weaver was using index cards to fine-tune his platooning system and pitching changes with the Baltimore
Orioles in the 1960s, while Branch Rickey hired statistician Allan Roth in the 1940s to evaluate player
performance with the Brooklyn Dodgers. A generation before that, Baseball Magazine editor F.C.
Lane was creating new statistical methods to measure offensive production, culminating in his classic
book of essays, Batting. In the mid-19th century, Henry Chadwick is credited with developing the box
score and his tabulation of hits, home runs and total bases led to the formulation of metrics such as
batting average and slugging percentage.

SABR or sabermetrics? With more than 6,000 members around the world, SABR is a membership
organization comprised of passionate and knowledgeable baseball fans with a variety of interests, one
of them being statistical analysis. SABR members Bill James, Pete Palmer and Dick Cramer co-founded
SABR's Statistical Analysis Committee in 1974 and helped popularize the study of sabermetrics. The
phrase "sabermetrics" itself is in the public domain and is generally used to describe any mathematical or
statistical study of baseball.

Sabermetric researchers often use statistical analysis to question traditional measures of baseball evaluation
such as batting average and pitcher wins. Early on, James' theories were largely mocked (or ignored) by the
baseball establishment, but as Joe Posnanski wrote in The Ballad of Bill James, over time his work started to
be recognized. Time Magazine once named him one of the 100 most influential people in the world. The
Boston Red Sox hired him in 2003 and subsequently won two World Series. James is still asking relevant
questions today at billjamesonline.com, and so are legions of his disciples such as Rob Neyer, baseball editor
at SB Nation; Birnbaum; and all the great writers at Baseball Analysts, Baseball Prospectus, Beyond the Box
Score, FanGraphs, The Hardball Times and other sites.

Want a primer on sabermetrics? Check out the FanGraphs Library for down-to-earth explanations of
advanced metrics such as wOBA (weighted on-base average), FIP (fielding independent pitching) and WAR
(wins above replacement), written by Steve Slowinski. SABR members can also read cutting-edge articles on
statistical analysis in every issue of the Baseball Research Journal, such as The Many Flavors of DIPS: A History
and Overview, by Dan Basco and Michael Davies. We've got a full list of resources on our Related Links page
at the end of this section.

Be sure to check out the annual SABR Analytics Conference, where we bring together the top minds of the
baseball analytic community under one roof to discuss, debate and share insightful ways to analyze and
examine the great game of baseball.

Whether you're just starting out or you'd like a refresher course, whether you're a numbers wizard or you
consider yourself math-phobic, we hope you'll find Phil Birnbaum's Guide to Sabermetric Research informative
and interesting.

The Basics

Sabermetrics was first introduced to a wide public in 1982, with the first mass-market publication of the Bill
James Baseball Abstract. And for my generation of sabermetricians of a certain age, this was the very first
sentence about sabermetrics that we ever read:

"If you sometimes get the feeling between here and the back cover that
you are coming in on the middle of a discussion, it is because you are."

That is: Bill James and a handful of colleagues, mostly SABR members, had been working on a body of
knowledge for a few years. There was an established, although private, literature of sabermetrics, and part of
James' task was to explain what had already been discovered, and how.

That was a number of people who could congregate peacefully in the restrooms in the left field bleachers of
Yankee Stadium, working for a few years, without computers or formal publication. Still, those few
researchers had built a considerable base of knowledge that we had to be caught up on.

Imagine, then, the situation today. Sabermetrics has been in full force since the mid-1970s. By The
Numbers, the SABR Statistical Analysis Committee newsletter, has been publishing since the late 1980s.
Before that, there was Baseball Analyst, Bill James' own sabermetrics journal in the 1980s. With the advent
of Rotisserie/Fantasy Baseball, an industry of professional sabermetrics research sprang up. Publications
like Baseball Prospectus and Baseball Forecaster do their own proprietary research and publish some of it in
their annual books.

And, most importantly, in the past few years, amateur sabermetrics has found its stride and, in my opinion,
taken over the lead. In the past half-decade, a vast number of researchers have published to websites and
blogs, giving us serious, state-of-the-art results that are instantly seen by thousands in the community, who
often build on the findings and take them further.

Five years ago, I would have argued that the main outlets for sabermetric research were print publications,
and that a few books and websites could bring you reasonably well up to date on what sabermetricians had
learned over the years. But, now, things have moved so fast that it's hard to keep up, especially with articles
and papers and studies spread all over the web.

It's a little like the software industry. In the 1990s, almost all software came shrink-wrapped from retail
stores, and most of it was by big industry players, such as Microsoft and IBM. Today that still exists, but with
open source, shareware, file sharing, and hundreds of third-party iPhone apps created every year, well, now
it takes some effort to keep track.

Still, the basics haven't changed that much. As with any science, the earliest discovered principles tend to be
the most fundamental, and, over time, there gets to be a bit of an unwritten consensus of what findings are
most important. So I'm going to do my best here to give you a short reading list of classical sabermetrics, a
way to try to get a good feel for what sabermetrics has been up to over the past few decades.

The Bill James Baseball Abstract (1982): This work is three decades old and counting, and it's getting harder
to find. Still, it remains the best place to learn what sabermetrics is, how it works, and how sabermetricians
think.

That's all attributable to Bill James himself. Not only did he make most of the discoveries in the book (there
were other sabermetricians active at the time, but James was well over 90% of the field), but his writing style
makes the explanations effortless. Anything by Bill James is a joy to read.

If you can't find the 1982 edition, try whatever other years you can find. Generally, the earlier the year, the
more space is devoted to the basics.

Mathletics (Princeton University Press, 2009): Wayne Winston, a professor and consultant to the NBA's
Dallas Mavericks, wrote this 2009 summary of sabermetrics findings in various sports. The baseball section
comprises seventeen basic explanations of various sabermetric principles, such as runs created, streakiness
and momentum, pitcher evaluation, and situational strategy.

There's no original baseball research in Mathletics, but if you want a quick and concise introduction to some of
the basic findings in the field, this is the book to get.

The Hidden Game of Baseball (Doubleday, 1984): This book, by sabermetrician Pete Palmer and baseball
historian John Thorn, is considered by many to be the bible of sabermetrics. I'd consider it a complement to
the Bill James books.

While James developed some methods and formulas by trial and error, Palmer mines historical data and
shows the theoretical underpinnings of the methods he uses. If you like a more mathematical approach to
sabermetrics, this is the work that lays the foundation.

Thorn and Palmer's book will tell you, for instance, that a leadoff double helps the team by an average of .614
runs. How do they know that? Well, they looked at many years of play-by-play data, and they found that, on
average, .454 runs are scored in the average inning. But, with a runner on second and nobody out, an average
1.068 runs were scored. And so, the double is worth the difference between the two situations, which is 0.614
runs.
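
The arithmetic behind that figure can be sketched in a few lines of Python. This is only an illustration: the two run-expectancy values are the ones quoted above, while the dictionary layout and the function name `play_value` are assumptions made for this sketch, not anything taken from The Hidden Game of Baseball.

```python
# Average runs scored from a given base/out state to the end of the inning.
# Only the two states quoted in the text are included here; a full table
# would cover every combination of baserunners and outs.
RUN_EXPECTANCY = {
    ("bases empty", 0): 0.454,    # start of an average inning
    ("runner on 2nd", 0): 1.068,  # after a leadoff double
}


def play_value(state_before, state_after, runs_scored=0):
    """Value of a play: change in run expectancy, plus any runs that scored on it."""
    return RUN_EXPECTANCY[state_after] - RUN_EXPECTANCY[state_before] + runs_scored


leadoff_double = play_value(("bases empty", 0), ("runner on 2nd", 0))
print(f"A leadoff double is worth about {leadoff_double:.3f} runs")
```

The same subtraction, repeated for every kind of event in every situation and then averaged, is one way to arrive at a single run value per event.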

While the Bill James Baseball Abstract is the philosopher and theoretician of sabermetric thought, The Hidden
Game of Baseball is its engineering department.

The Book (Potomac, 2007): A collaboration by three exceptional sabermetricians, The Book studies over 100
different questions on baseball strategy. While it does cover topics that have previously been studied by
others, it does so, usually, with much greater rigor. For instance, when looking at player performance in
various situations, The Book will often correct for park, home/road, the identity of the opposing pitcher, the
ball/strike count. As a result, its conclusions are very detailed and very well considered.

The Book is intended more for fans with a hardcore interest in sabermetrics and strategy issues. It's included
here because it has been so influential among current researchers, and you will see its ways of thinking,
especially as described in Chapter 1, repeatedly surface in emerging research.

If the three books above comprise the reading list for Sabermetrics 101, then The Book could be the text for
Sabermetrics 301 or 401.

Asking the Right Questions

Sabermetrics is the search for objective knowledge about baseball through analysis of the statistical record.
At its most basic, the evidence is just simple observation and counting. For instance, in early 2010,
sabermetrician Dave Allen wondered if better hitters get fewer good pitches to hit. He looked at twenty of
the best hitters in baseball, and twenty of the worst. He found that, at almost every ball-strike count, the
better hitters were thrown fewer strikes and fewer fastballs. For instance, on 0-0, the worst hitters got 66%
fastballs, while the best hitters saw only about 63%.

Not every question in sabermetrics is that simple. One of the most controversial questions in baseball
statistical analysis is that of clutch hitting. Do some hitters have the ability to turn it up when the game is
on the line, and perform better than usual? Do other hitters have the opposite tendency, hitting better when
it doesn't matter as much?

Many studies have been done on the topic, starting as many as 30 years ago. In 1977, Dick Cramer analyzed
batting records from 1969 and 1970, and found only a very slight tendency for clutch players to repeat their
performance in subsequent seasons. In 1990, Pete Palmer studied clutch hitting over multiple seasons, and
found that there were almost exactly as many apparent clutch and choke hitters as you would expect by
luck, if clutch hitting skill didn't exist at all.

A few years later, Tom Ruane repeated a version of Palmer's study, using a larger dataset, and got roughly
the same result.

Finally, in 2006, Andy Dolphin did a more sophisticated mathematical analysis, and found evidence for a very,
very slight variation in how players varied in the clutch. But he concluded that it's impossible to discover, with
any degree of accuracy, which players are which, and that for all practical purposes, players should be
expected to hit no better or worse in the clutch than their normal performance would suggest.

Sabermetrics is a science, which means that it follows the scientific method. Conclusions must be based on
evidence and logic, and any conclusions can be reevaluated or overturned if new, contradictory, evidence
turns up. Right now, the evidence suggests that clutch skill is a very minor factor in player performance, if
indeed it exists at all. It's certainly possible that some future sabermetrician will find differing data, or a flaw
in the previous studies, and force us to change our minds on the question. But if we do change our minds, it
will be because of empirical evidence for the other side.

A Primer on Statistics

For the typical fan, sabermetrics doesn't represent anything as theoretical as scientific inquiry. Rather,
sabermetrics is associated with new and unfamiliar statistics. OPS is the most famous of those new stats. It's
gone from a nearly unknown statistic in the early '80s, to barely used a decade ago, to mainstream now (it
even appears on Topps baseball cards). There have also been stats like Linear Weights, Runs
Created, Extrapolated Runs, WAR, and so on.

I'd still argue that sabermetrics isn't really about those statistics; rather, the statistics have been proven to be
useful based on evidence that sabermetricians have uncovered. Runs Created, for instance, is a statistic
that was created by Bill James in the late 1970s. James' thinking went this way: a team's job on offense is to
score runs; the more runs, the better. Suppose you didn't know how many runs a team scored, and wanted
to make an estimate, based on its batting line. For instance, here's a real team batting line:

  G    AB     H   2B  3B   HR   BB    K    AVG
161  5517  1451  234  22  214  604  908  0.263

How many runs would you guess that team scored that year? If I made you guess, you'd probably look over a
few years of team statistics, try to find some team that was reasonably close, and use that as a baseline. You
might find a team that hit .267 with less power, and scored 788 runs. You'd figure, well, this team hit only
.263, but they had a few more home runs, so I guess maybe they'd cancel out, so I'd guess the same 788 runs.
But, wait, this team had about 20 more walks than the other team, so maybe I should bump up my estimate
to 800 or something.

What Bill James probably did was work through logic like that, and, after some trial and error, come up with
the Runs Created (RC) formula. That statistic is intended to provide a formal way of estimating how a batting
line translates into runs. In its most basic form, RC looks like:

Runs Created = (TB)(H + BB) / (AB + BB)

If you plug the numbers in from the above batting line, you get

Runs Created = (2461)(2055) / (6121)

which gives 826 runs.

As it turns out, that was actually the batting line for the 1985 Baltimore Orioles. They actually scored 818
runs. The estimate is off by 8 runs, which is very good, a little better than typical.
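
For anyone who would rather let a computer do the plugging in, here is a minimal Python sketch of the basic formula. The function names are my own invention for illustration. TB means total bases, and the helper `total_bases` uses the standard definition (one base per hit, plus one, two and three extra for doubles, triples and home runs). One caveat: adding up the columns of the batting line printed above that way gives 2371 total bases rather than the 2461 used in the worked example, so the sketch takes TB as an explicit input and reproduces the arithmetic exactly as it appears in the text.

```python
def total_bases(h, doubles, triples, hr):
    """Total bases: every hit counts 1, plus 1, 2, 3 extra for a 2B, 3B, HR."""
    return h + doubles + 2 * triples + 3 * hr


def runs_created(tb, h, bb, ab):
    """Basic Runs Created: (TB)(H + BB) / (AB + BB)."""
    return tb * (h + bb) / (ab + bb)


# Totals as plugged in by the text: TB = 2461, H + BB = 2055, AB + BB = 6121.
estimate = runs_created(tb=2461, h=1451, bb=604, ab=5517)
actual = 818
print(f"estimated {estimate:.0f} runs, actual {actual}, error {estimate - actual:+.0f}")
```

Keeping TB as a separate argument makes it easy to see how sensitive the estimate is to that one input.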

Why is Runs Created important? Why do we need RC if we already know the Orioles scored 818 runs? Well,
knowing that there is a predictable relationship between a batting line and runs is useful when we don't know
how many runs we actually have. For instance, we can use RC on an individual player's batting line. Here's
Albert Pujols in 2009:

  G   AB    H  2B  3B  HR   BB   K    AVG
160  568  186  45   1  47  115  64  0.327

Using the basic RC formula, we can estimate that if a given major league team had a batting line like Pujols
did, it would score about 149 runs. That batting line would comprise about 15 games, which gives about 10
runs per game.

What we can conclude, then, is that if you put together a lineup of nine Albert Pujols clones, on average
they'd score 10 runs per game. That's a huge total: the average MLB team scores somewhere between 4.5
and 5.0.

We can compare Pujols to Joe Mauer, or Adam Lind, or Alex Rodriguez, to help inform our conclusions on
how much each contributed to his team, or even to our arguments about which player deserves the MVP
award.

Runs Created is one of the most famous of the statistics used to evaluate offense. Others include Pete
Palmer's Linear Weights, Jim Furtado's Extrapolated Runs, and David Smyth's Base Runs. All are very good
estimators. But which is the best? Well, that depends. No estimator is perfect, and all have their strengths
and weaknesses.

One way to compare the various estimators is to test them for accuracy. Apply them to the last (say) fifty
years of baseball, which should give you around 700 team-seasons. Have them each estimate runs for all 700
teams, and see which ones do the best.
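
A test like that might be organized along the following lines. Treat this as a rough sketch under stated assumptions: root-mean-square error is just one reasonable way to score accuracy (the text does not prescribe a measure), the function names are hypothetical, the "hits / 2" estimator is a deliberately crude baseline I made up so there is something to compare against, and the single team-season shown is the worked example from earlier rather than a real 700-team sample.

```python
import math


def rmse(estimates, actuals):
    """Root-mean-square error between estimated and actual runs."""
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(estimates, actuals)) / len(actuals))


def rank_estimators(estimators, team_seasons):
    """Return (name, rmse) pairs sorted from most to least accurate.

    estimators:   dict mapping a name to a function of a batting-line dict
    team_seasons: list of (batting_line, actual_runs) pairs
    """
    actuals = [runs for _, runs in team_seasons]
    scores = []
    for name, estimate in estimators.items():
        predictions = [estimate(line) for line, _ in team_seasons]
        scores.append((name, rmse(predictions, actuals)))
    return sorted(scores, key=lambda pair: pair[1])


def basic_rc(line):
    """Basic Runs Created from the text: (TB)(H + BB) / (AB + BB)."""
    return line["TB"] * (line["H"] + line["BB"]) / (line["AB"] + line["BB"])


def naive_hits(line):
    """Hypothetical crude baseline: half a run per hit."""
    return 0.5 * line["H"]


# One team-season, using the totals from the text's worked example.
# A real test would load every team-season in the period being studied.
seasons = [({"TB": 2461, "H": 1451, "BB": 604, "AB": 5517}, 818)]
ranking = rank_estimators({"basic RC": basic_rc, "hits / 2": naive_hits}, seasons)
for name, error in ranking:
    print(f"{name}: RMSE {error:.1f}")
```

Because each estimator is just a function of a batting line, adding Linear Weights or Base Runs to the comparison would only mean adding another entry to the dictionary.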

Offensive Statistics: A Caution

What does all this have to do with how to do baseball research? Well, it brings me to my first suggestion: if
you're just starting out, you might want to consider researching something other than coming up with new
ways to evaluate player offenses.

It's just that it's been done to death. I've listed four different statistics that evaluate offenses, and there are
even more than those. All of them are pretty good, and all of them are pushing the limits of how accurate a
statistic can possibly be.

Now, I'm not saying that there's no way you'll do better. I would have thought the same thing maybe 20
years ago, that there was no way to beat Linear Weights and Runs Created, but then David Smyth came to
invent Base Runs, which, by some measures, is the best yet. My advice is not to suggest that you can't do
better, but, rather that your research effort may yield more fruit if applied elsewhere.

But, on the other hand, evaluating players is fun. And if this area of sabermetrics is something that you find
most interesting, then go ahead! But if you come up with a new statistic, you will be expected to come up
with hard evidence that yours works better than any that are already out there. It's not enough to argue
theoretically why it should work; you have to prove it does.

There's a sabermetric adage: "Just because a statistic has Babe Ruth on top and Mario Mendoza on the
bottom, that doesn't mean it's accurately measuring what it's supposed to measure."

So, as you work on your new statistic, keep these points in mind:

It's possible to get more and more accurate by including more and more information. The basic version of Runs
Created includes only six data items: AB, H, 2B, 3B, HR and BB. Obviously, you can get more accurate if
you include SB and CS, and HBP, and SF, and other information. Indeed, some of the other statistics
already include those categories, so when you compare your statistic to others, make sure you use the
equivalent version, to ensure you're comparing apples to apples. If you show that your statistic that
includes 20 categories is more accurate than a statistic that includes only six categories, that's not
necessarily a breakthrough.

It is possible to get very accurate if you include situational statistics that give information
about when the various events happened. For instance, if you were to add batting average with runners
in scoring position, you'd increase the accuracy of your estimates quite a bit. But you wouldn't
necessarily increase your statistic's usefulness.

If you're trying to show how various factors lead to runs scored, you can't include categories that are
based on how many runs actually scored! For instance, you can do a lot better than Runs Created if you
include runners left on base. For instance (H + BB - CS - DP - runners left on) is almost exactly equal to
runs! That's because it's almost equal to (runners reaching base - runners who didn't score), which
is exactly the definition of runs.

After keeping all this in mind, if you do come up with a statistic that you can demonstrate is more accurate
than its counterparts, you'll have something of very high interest to the sabermetric community. But, again,
as I said, you have an uphill climb. This is the one area of sabermetrics that has had the most effort poured
into it over the past three or four decades, and a better mousetrap will not be easy to invent.

A similar caution applies to any new statistic, especially one that's supposed to evaluate or rank players or
teams in some dimension. If your new stat is trying to estimate something that can be measured, show how
well it does that, especially compared to any other stats that are out there. And if it's trying to estimate
something ethereal, like consistency or durability, something that doesn't have a real definition, how do
you know that you're measuring it the best way possible? There's nothing wrong with a statistic like that
(Bill James has "speed score", which estimates the fuzzy notion of a player's baseball speed), but be aware
that those kinds of things are rough tools, not strong empirical findings.

What To Research

In sabermetrics, as in probably any other discipline, there's no official list of topics to research. Most
sabermetricians just study what they're interested in. Often, ideas for subjects come up during conversations
with other fans. You'll be talking baseball over a beer, and someone will say, "well, I'm worried about the
Indians next year: they went 7-25 in September and October, and that's probably a bad sign of things to
come."

And you think, hmmm, I wonder if that's true, that a bad September is likely to be a negative indicator for
next year's performance? And, suddenly, you have a topic to study.

Another common source for ideas is baseball broadcasters: they'll make some claim on the air, without
giving evidence, and you spot an opportunity to check if what they say is true. Bill James used to do this a lot.
Or, you might be reading a certain study on one of the many sabermetric internet sites, and someone makes
a suggestion in the comments, or the study raises a question in your mind that you think would be
interesting to investigate.

If you're just starting out, my suggestion would be to start fairly simple. One possibility is to find a bunch of
old Bill James Abstracts, and read through them (which I recommend you do anyway, if you're new to
sabermetrics). Those books are full of little studies that Bill James throws in when a question occurs to him,
and those might lead you to related questions that you can test. Even repeating one of Bill's studies with
more current data can be useful.

For instance, in the 1982 Bill James Baseball Abstract (Ballantine, 1982), Bill lists the average attendance for
every starting pitcher in the major leagues, and finds that the only pitcher who reliably seemed to draw fans,
in 1981, was rookie phenom Fernando Valenzuela. It immediately occurred to me: is it still true that the
starting pitcher doesn't affect attendance? I'd love to see a similar study for recent years.[1] I'd also love to see
someone take this a bit further. Bill just eyeballed the data before concluding that there didn't seem to be an
effect. But might there be a small effect that you'd find if you looked harder? You might check whether the
better pitchers tended to draw more fans than the worse pitchers, after adjusting for day, weather, and
opponent. Maybe there's a small effect, but maybe there's not.

The nice thing about using the Bill James Abstracts for ideas is that Bill tends to use straightforward
techniques that don't require any formal statistical expertise. His techniques may not be formal enough for,
say, academic journals, but they're excellent nonetheless, and they have enabled Bill James to teach us more
about baseball than any other sabermetrician.

Of course, if you do have some expertise in statistical techniques, that will help too. For the attendance
study, you might run a regression to predict attendance based on team, day of the week, opponent, and
starting pitcher's quality. But, even if you don't use a formal statistical technique (and, for the record, I think
in all of Bill James's work, he's used linear regression maybe twice), with a bit of creativity you can usually still
figure out what's going on.

[1] UPDATE: it turns out that someone has followed up Bill's study! In an excellent piece in The Hardball Times 2012
Baseball Annual, Max Marchi looked at all pitchers since 1947, adjusted for overall trends, and found many great
starters who drew in the fans. Nolan Ryan was the career leader, with 641,000 estimated extra tickets sold, while Mark
Fidrych had the highest season average, with a total of around 300,000 tickets over three years.

Once you've settled on a question, you have to figure out how you're going to work your way to an answer.
That'll be difficult without some knowledge of sabermetrics. There's no field of human knowledge where you
can just jump in without some basic understanding of how the field works and what's already been done.
Indeed, if there were only one piece of advice I was allowed to give to aspiring researchers, it would be: learn
some sabermetrics first. As my friend John Matthew IV said, "If you were interested in astronomy, you
would read at least a few books before trying to predict the path of a comet."

And so: know some of the sabermetric canon. In the next section, I'll outline what might be a reading list for
Sabermetrics 101.

Also, before you start working on your problem, you're going to want to check whether others have worked
on the problem before. Maybe they've already done the exact same thing you're planning to do. Maybe
they've gone only part of the way, and you can expand on what they've done. And maybe they've thought of
some things that you haven't, or maybe you won't agree on how they did it.

In any case, no matter how knowledgeable you are in sabermetrics, nobody is aware of everything. Before
you start, you'll want to search the literature, to see what progress has already been made on your problem.
We'll talk about that a bit later too.

Literature Search

So you're at the point where you have a research idea in mind. Your next step, then, is to find any previous
work that's already been done on your topic.

In academia, there's a conventional wisdom on how to do a literature search, and a lot of it involves indexes to
scholarly journals that cover your topic. In sabermetrics, however, it's not quite so simple: much of the best
research is published online, on any one of hundreds of websites, without a formal peer review process to
separate the good from the flawed.

So, as much as we might wish there were a step-by-step process for finding existing work, the reality is that it
becomes a bit of a seat-of-the-pants thing. Some suggestions, though, for how to proceed:

1. Scan the research repositories

While most sabermetric work of recent vintage is web-published, there are still several more formal
repositories of studies. The advantage of those is that, if they're all at one specific website, you can search
them online by using any normal search engine (such as Google), but using the advanced search feature to
ask for results only from that one site.

Some specific places to look:

Every back issue of SABR's By The Numbers is available. There is a repository at the SABR website and at
my own website, www.philbirnbaum.com.

In the 1980s, Bill James edited the Baseball Analyst, a sabermetrics newsletter that went out to what I
think were only a few dozen subscribers. In 2012, SABR published those online for the first time
at sabr.org/research/baseballanalystarchives.

Tom Tango, one of the leading active sabermetric researchers today, has some of his own studies at his
website, tangotiger.net.

Tango and his co-authors of The Book have set up a wiki, an open-source encyclopedia of sabermetric
subjects. There has been some talk of abandoning the project, but, at time of writing, it's still active
at tangotiger.net/wiki/index.php?title=Main_Page.

Charlie Pavitt, a SABR member and regular contributor to By the Numbers, has compiled a bibliography
of published sabermetric papers. It's dedicated to only the more formal publication outlets, so it's missing
a large part of the recent explosion in web research. Still, it's a worthy source. A description can be found
here and the bibliography itself can be found here.
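
If you find yourself running the same single-site search against several of these repositories, the query can be built programmatically. The snippet below is a small sketch, assuming Google's `site:` search operator (the typed equivalent of the advanced search option mentioned above); the function name `site_search_url` and the example search terms are mine.

```python
from urllib.parse import urlencode


def site_search_url(site, terms):
    """Build a Google search URL restricted to one site via the `site:` operator."""
    query = f"site:{site} {terms}"
    return "https://www.google.com/search?" + urlencode({"q": query})


url = site_search_url("sabr.org", "clutch hitting")
print(url)
```

Swapping in tangotiger.net or any other site from the list only changes the first argument.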

2. Search the biggest websites dedicated to sabermetric research

My advice would be to start by searching "The Book" blog. There, Tom Tango reviews, or at least mentions, a
large proportion of the most significant studies. Also, the site has, in my opinion, the densest population of
knowledgeable commenters; almost always, you learn more from the comment discussion than from the
studies themselves. Comments do show up in the searches, I believe.

From there, consider these other sites:

The Hardball Times
Baseball Prospectus (subscription required for some content)
Baseball Analysts
Beyond the Box Score
FanGraphs
Tango Tiger

In 2010, Beyond the Box Score held a poll to vote for the best sabermetric websites and studies. All the
nominee websites are worth a look and a search, and can be found at
http://www.beyondtheboxscore.com/2010/1/21/1263306/yourbtbsabermetricawardvoting.

3. Ask

Perhaps the best way to find research on a certain topic is to ask around. There are various places to ask, but,
before doing so, please spend some time looking for yourself. That's just a courtesy to those from whom you
are requesting assistance. I have had people email me about finding research on topic X, when they could
have found what they're looking for by doing the simplest search for X on Google.

People are generally very willing to help when you show what you've tried, and you let them know what
you've found so far.

Places to ask:

One good bet is to write to authors of studies on topics that are close to yours. If you're thinking of doing
a study on how accurate scouts are when they evaluate pitchers, and you find a study on how accurate
scouts are when they evaluate batters, well, the author is probably as interested in the subject as you
are, and is likely to be able to help. Even if the answer is, "sorry, I don't know of anything," that's a sign
that your topic may indeed be a fresh one.

Most websites allow comments on the studies they publish. If there's a topic that's similar in some ways,
post a comment asking about your topic.

Ask on email forums. SABR has SABR-L, which is probably a bit too general for many detailed
sabermetric inquiries, but still worth a shot. A better place is the Yahoo group statisticalanalysis, which
is free to join for SABR members with an interest in sabermetrics.

Finally, you can try specific people. I don't mind an occasional inquiry, and I'm sure many others are
happy to answer too. If you're stuck, you can always try writing to someone who you know is an active
researcher. Many of the sabermetric websites have links to contact authors. Sabermetrician John Doe
may not have published anything that touches on your specific topic, but if he publishes a column every
week and a research paper once a month, you wouldn't be out of line occasionally sending a courteous
request for assistance.

How to Find Raw Data

Back in the beginning days of sabermetrics, data was hard to come by. Some things weren't too bad: if you
wanted to know Bill Terry's batting average in 1933, there were two encyclopedias, Macmillan and
Neft/Cohen, that would tell you. But if you wanted more esoteric statistics, like Joe Morgan's career
performance with the bases loaded, you were out of luck.

When Bill James started writing his self-published Baseball Abstracts back in the late 1970s, he had to compile
situational statistics himself, from the daily box scores, without a computer. At the time, Bill marketed his
book as featuring "18 categories of statistical information that you just can't get anywhere else."

James found that he had to keep compiling those stats even into the 1980s; famously, in his 1981 book, he
reprinted a letter from the Chicago Cubs refusing to provide him with such "intelligence-type" stats.

Now, of course, things are different. There is no shortage of almost any kind of data. My four favorites in
rough order of increasing detail are:

MLB.com
Baseball-Reference.com
The Lahman Database
Retrosheet.org

MLB's website provides copious statistical data, sortable and printable, updated instantly as games progress.
But that stuff can be found elsewhere. The main attraction of the MLB website is that it provides PITCHf/x
data. That is, for every pitch thrown by any pitcher in MLB, they'll tell you the type of pitch, where it crossed
the plate, and how much it broke vertically and horizontally. As a result, and not surprisingly, much of the
groundbreaking research these days has to do with pitch analysis.

Easily the best source for precalculated historical statistics is Baseball-Reference.com (BR). That site has
pretty much rendered printed baseball encyclopedias obsolete. Not only do you get the regular "Bill Terry's
batting average" data, but you also get a large selection of sabermetric stats, breakdowns by tens of different
criteria (left/right, day/night, April/September, and so on), and the ability to manipulate the data in ways that
other websites don't allow. You can also do absurdly specific searches. Want to know Joe Morgan's longest
consecutive streak of games where he came to the plate at least twice? The answer: 235 games. (If you want
the details, you have to subscribe, but the overwhelming majority of the information on the site can be had
for free.)

For those of us who want to do more complicated things, Baseball-Reference, awesome as it is, just isn't
enough. We need the raw data on our own computers, so we can manipulate it in ways that BR never
thought of. There are two main sources of raw data: the Lahman Database and Retrosheet.

The Lahman Database can be obtained for free at seanlahman.com/baseballarchive/statistics, the website of
its creator, Sean Lahman. It's basically a standard Baseball Encyclopedia in downloadable form. You can get
it in text form, for loading into Excel, but, more importantly, it also comes in relational database format
(Microsoft Access). If you're familiar with Access and with SQL database queries, you know how convenient it
is to use it to do powerful, specific data searches quickly. (If you're not familiar with SQL, there have
been a few tutorials on sabermetric sites recently.)

Anyway, the Lahman Database has every player's standard batting and pitching line for every year. It's got
managers, birthdates, awards, all-star games, and other good stuff. Its limitation is that data is available only
for single seasons: if you want to know how Eddie Murray hit in July 1979, there's no way the Lahman
Database will tell you. For that, you have to turn to Retrosheet.
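
To give a flavor of what a "powerful, specific data search" looks like, here is a hedged sketch of a SQL query run from Python. Several things in it are assumptions for illustration: it uses SQLite (from Python's standard library) rather than Access, it builds a tiny in-memory stand-in table instead of loading the real download, and the table and column names (`Batting`, `playerID`, `yearID`, `AB`, `H`, `HR`) and the player ID follow my understanding of the Lahman layout, which you should check against the copy you download. The one row of data is the Pujols 2009 line quoted earlier.

```python
import sqlite3

# Stand-in for the real database: an in-memory table with a few Lahman-style columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Batting (playerID TEXT, yearID INTEGER, AB INTEGER, H INTEGER, HR INTEGER)"
)
conn.execute(
    "INSERT INTO Batting VALUES (?, ?, ?, ?, ?)",
    ("pujolal01", 2009, 568, 186, 47),  # illustrative ID; stats as quoted in the text
)

# A typical specific search: full-time seasons with a .300 or better average, best first.
query = """
    SELECT playerID, yearID, ROUND(1.0 * H / AB, 3) AS bat_avg
    FROM Batting
    WHERE AB >= 400 AND 1.0 * H / AB >= 0.300
    ORDER BY bat_avg DESC
"""
rows = conn.execute(query).fetchall()
for player_id, year, bat_avg in rows:
    print(player_id, year, bat_avg)
```

Against the full database, the same query would return every qualifying season in history in one shot, which is exactly the kind of thing that is tedious to do on a website.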

Retrosheetis,basically,amiracle.Itstheresultofasmallarmyofvolunteers,combinghistoricalsourcesto
trytorecreatetheplaybyplayofeverygameinbaseballhistoryanddigitizingitfordownloadandanalysis.
Icantbegintoimaginehowdifficultitistofindallthatinformation,toreconstructthetopofthe6thinning
oftheCardinals/PhilliesgameofApril29,1953.Buttheydid.(D.Ricegroundedout(shortstoptofirst);
Preskopoppedtofirstinfoulterritory;Hemuspoppedtofirstinfoulterritory.)

You can also see the entire career of any player, game by game. You can see the standings and results
from any date in baseball history. You can see a coach's career, which teams he coached for and what he
coached, and even how many times he was ejected.

You can see this stuff online, or, if you have computer data manipulation skills, you can download it and work
with it yourself. You can load the data into Excel and write macros to manipulate it. Or, you can write
programs to analyze it; I use Visual Basic, but any language will do. There's a 2006 book called Baseball
Hacks (O'Reilly), which explains how to use a computer language called R to download and analyze
Retrosheet data (and, actually, lots of other baseball data that can be found on the internet).
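As a rough illustration of "any language will do," here is a short Python sketch that reads play records in the style of a Retrosheet event file and counts strikeouts per batter. It assumes each play record has seven comma-separated fields (the word play, inning, a 0/1 visitor/home flag, batter ID, count, pitch sequence, event) and that strikeout events begin with the letter K; verify both against Retrosheet's own format documentation before relying on them. The sample lines and player IDs are made up, and parse_play and strikeouts_by_batter are hypothetical names.

```python
from collections import Counter

# Invented records in the assumed event-file layout (not real game data).
SAMPLE_LINES = [
    "play,1,0,demob001,12,CBFS,K",
    "play,1,0,demob002,00,X,63/G",
    "play,1,1,demob003,32,BBCFBS,K23",
    "sub,demob004,\"Demo Player\",1,9,1",
    "play,2,0,demob001,01,CX,S7",
]


def parse_play(line):
    """Split one play record into named fields; return None for other records."""
    fields = line.strip().split(",", 6)
    if len(fields) != 7 or fields[0] != "play":
        return None
    _, inning, side, batter, count, pitches, event = fields
    return {
        "inning": int(inning),
        "home": side == "1",
        "batter": batter,
        "count": count,
        "pitches": pitches,
        "event": event,
    }


def strikeouts_by_batter(lines):
    """Count play records whose event code starts with K, keyed by batter ID."""
    totals = Counter()
    for line in lines:
        play = parse_play(line)
        if play is not None and play["event"].startswith("K"):
            totals[play["batter"]] += 1
    return dict(totals)


if __name__ == "__main__":
    print(strikeouts_by_batter(SAMPLE_LINES))
```

To run it on a downloaded file, you would pass the open file object in place of SAMPLE_LINES, since the function only needs an iterable of text lines.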

Not all of baseball history is available on Retrosheet yet. The volunteers are still working on it, though.
(Want to help? See the Retrosheet website for details.) For now, you can see game-by-game summaries from 1871 on. You
can see box scores for more than 90 percent of games since 1916. And, if you want full play-by-play data, it's
available for any game after 1952, and a large number of games before that. Some years even include
pitch-by-pitch data, in terms of ball, strike, foul.

The result of literally tens of thousands of hours of volunteer labor, Retrosheet is the greatest sabermetric
resource ever.

Computer-Aided Research

Before the 1990s, a significant proportion of sabermetric research was done without the benefit of computers
or, at least, without the kind of computer power and software we have today. A great deal of statistical
information had to be compiled by hand, or typed by hand into spreadsheets. As a result, many studies used
only a small amount of data, in order to keep the workload manageable.

Things are different now, of course, and it's harder to study new areas without the benefit of a computer and
a good baseball database. That's because a lot of the low-hanging fruit has been picked, and we're now
looking for more and more subtle effects. In 1977, Dick Cramer's clutch hitting study consisted of only two
years of batting average data, entered by hand. From that, he was able to find that clutch hitting consistency
was next to nothing. But that was only a single pair of seasons, not enough for a definitive conclusion. It took
others, with more sophisticated computers, and existing baseball databases, to refine that result to the level
of understanding we have today.
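One simple way to test for consistency of this kind is to measure each hitter's clutch performance in two different seasons and compute the correlation between the two sets of numbers; a correlation near zero suggests the "skill" doesn't persist. The Python sketch below shows only that general idea. It is not a reconstruction of Cramer's actual method, and the numbers are synthetic values made up for illustration (imagine each one as a hitter's clutch batting average minus his overall average).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Synthetic clutch-minus-overall differences for the same six
    # hypothetical hitters in two consecutive seasons.
    season_one = [0.020, -0.015, 0.005, -0.030, 0.010, 0.000]
    season_two = [-0.010, 0.012, -0.004, 0.008, 0.015, -0.020]
    print(round(pearson(season_one, season_two), 3))
```

With only a handful of hitters and one pair of seasons, the result is very noisy, which is exactly why larger databases were needed to firm up the conclusion.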

In an early-2000s essay on this topic, Neal Traven wrote, "the computer is almost an obligatory tool for
sabermetric research." That holds even more today.

It's unfortunately true that you're going to need a certain amount of computer skills in order to be able to
take a huge mountain of baseball data and try to squeeze conclusions out of it.

Benefits

Sabermetrics has a mixed reputation in the outside world. In mainstream sportswriting, it's sometimes seen
as something nerds do from their parents' basements, something "real" sportswriters don't need because they
see all the games and know all the players. In academia, it's not always respected as serious research,
because it often doesn't fit into any specific established discipline (although economists are starting to get
involved), because it often doesn't use enough fancy math, and because it's "only" about baseball. And it
used to be that in baseball itself, sabermetrics was not perceived to be anything that would be of use to the
insiders of a major league team.

But the situation in MLB is changing, perhaps due to Moneyball (Norton, 2004), Michael Lewis's story of how
Billy Beane's Oakland Athletics used sabermetrics to build a winning team on the cheap. In 2003, the Red Sox
hired Bill James. Since then, other teams have hired statistical analysts and begun advertising for similar
positions.

Still, the serious study of baseball through its statistics isn't taken all that seriously outside of the
Moneyball crowd. Over the past couple of years, there have been several university professors who have had
their schools issue a press release when they came up with something sabermetric. Usually, those academic
studies aren't any more worthy of special recognition than many other studies published on the Internet at
the same time. But I guess baseball is a subject that many consider less serious than, say, sociology, so the
idea that people study it in earnest becomes a bit of a novelty.

Even if the wider world doesn't see sabermetrics as completely serious, its practitioners do. In one recent
university press release, the professor expresses his interest in someday getting his dream job doing
sabermetric consulting for a major league team. That's something a lot of sabermetricians would be
interested in, obviously. Many have already gotten there, in recent years.

But there will probably always be more sabermetricians than employment opportunities. For most of us, the
motivation for sabermetrics is not the glamour of having an inside job with a baseball team, but just our
interest in baseball. And scientific curiosity is a big factor too. Because of the abundance of cheap data, its
relative neglect by the academic community, and the fact that the science is so young, sabermetrics is
perhaps the best serious field where part-time researchers can routinely make the most significant
discoveries. And there's a certain thrill in creating new knowledge, discovering something that nobody knew
before.

And if the thrill of scientific discovery isn't enough, the fact that those discoveries are about baseball (for
many, our favorite subject on earth) is icing on the cake.
