Ted Kwartler

Sports Analytics in Practice with R


work at all. It’s just some crap some people who were really smart made up.

       Charles Barkley, former NBA player

      Just because you don’t understand something doesn’t mean it’s crap.

       Ross Drucker, NBA Future Analytics Stats Program Analyst

      My dear Nora & Brenna,

      My inspiration and guides. I wrote this book in your honor, though I don’t expect either of you to follow in my footsteps into analysis. Your journey is your own; may you find a passion and, if desirable, have the opportunity to write about it. No matter where your attention and intellect lead you, I remain

      Your loving father,

      Ted

      Writing a book is no easy task, yet for some reason I decided to write a second! Overall, I am grateful to the countless people who helped me learn, expand, and apply these methods. Data science and analytics is as much a “team sport” as any, where collaboration, communication, and effort often win the day.

      First, I would like to acknowledge Jack W, whose intellect and athleticism left us far too early. For anyone struggling with mental health, know that you are loved, you are valuable, and people in your community are here for you. Jack, your passing was a motivating reminder of the short time we have to make contributions, along with the need for more kindness toward those who may be suffering silently.

      Next, Anup B, one of the most brilliant, supportive leaders I have worked for. Not to mention, your passion for cricket helped open my eyes to a noteworthy and enjoyable sport. Losing you to the pandemic was a disturbing blow felt by many people who were touched by your intelligence, humor, and positivity.

      This entire book would not have been possible without the fine professors at the University of Notre Dame who put me on my own professional journey. I fondly remember building my first logistic regression predicting March Madness after learning these techniques from Dr. Keating, the late Dr. Gilbride, and Dr. Devaraj.

      Further, I would like to acknowledge my parents, Anatol and Trish, and my endearing wife, Meghan. Your support and patience have been significant. Writing a book is no small undertaking, with much of the logistical burden falling to each of you. Completing this book is a shared victory.

      Lastly, my sincerest gratitude to the wonderful team at Wiley, particularly Kimberly Monroe-Hill. Your patience and flexibility with late submissions and delayed seasons stemming from the unusual 2020 year in sports (among other more important hardships) have been greatly appreciated. I was ready to give up on the project, yet your e-mails demonstrated a commitment from Wiley that I cherish.

      Objectives

       Learn about R as a programming language

       Define Integrated Development Environment

       Define objects

       Learn the assignment operator

       Define functions

       Executing a loop

       Learn logical operators

       Learn about R data types

       Learn about object classes

       Indexing data objects

       Extending R functionality with packages

       Writing a custom function

       Create a scatter plot with sports data

       Create a heatmap with sports data

      R Libraries

      ggplot2 ggthemes RCurl tidyr

      R Functions

      + plot <- round class as.factor as.character c cbind rbind data.frame as.matrix as.data.frame install.packages library getURL read.csv dim names head tail summary table qplot pivot_longer geom_tile scale_fill_gradient xlab ggtitle theme theme_hc

      The R Programming Language

      In this textbook, the R language is applied specifically to sports contexts. Of course, the code in this book can be used to extend your understanding of sports analytics. It may give you insights into a particular sport or an analytical aspect within the sport itself, such as which statistics to focus on to win a basketball game. However, learning the code in this book can also help open up a world of analytical capabilities beyond sports. One of the benefits of learning statistics, programming, and various analysis methods with sports data is that the data is widely available and outcomes are known. This means that your analysis, models, and visualizations can be applied, and you can review the outcomes as you expand upon what is covered in this book. This differs from other programming and statistical examples, which may resort to boring, synthetic data to illustrate an analytical result. Using sports data is realistic and can be future-oriented, making the learning more challenging yet engaging. Modeling the survivors of the Titanic pales in comparison since you cannot change the historical outcome or save future cruise ship mates. Thus, modeling which team will win a match or which player is a good draft pick is a superior learning experience.

      If you are new to programming, don’t be intimidated. R is a forgiving language in that things like spacing and indentation are largely ignored. Further, R is well supported by its community, and a simple online search of any error message usually turns up an answer quickly on any number of sites.
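
      To illustrate the point about spacing and indentation, here is a minimal sketch. The object names (points_a, points_b, points_c) and the scores are made up for illustration and do not come from the book's data. The same vector is created three times with different spacing and line breaks, and R treats all three the same way.

```r
# Three spellings of the same assignment; only the whitespace differs.
# The values are hypothetical point totals, invented for this example.
points_a <- c(24, 31, 18)
points_b<-c(24,31,18)
points_c   <-   c( 24,
                   31,
                   18 )

# identical() confirms the objects hold exactly the same values
identical(points_a, points_b)  # TRUE
identical(points_b, points_c)  # TRUE

# Any of the three can be used interchangeably afterwards
mean(points_a)
```

      Consistent spacing is still worth the effort for readability, even though R does not require it here.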

      To begin your R and sports analytics journey, please download the “base-R” distribution for your operating system. The “Comprehensive R Archive Network,” CRAN, is the home of the official R distribution as well as officially supported packages (more on that in a bit). The site to download base-R is https://cran.r-project.org.
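
      Once base-R is installed, one possible way to confirm that it runs is to ask R for its own version from the console. This is only a sketch: the "3.0.0" threshold below is an arbitrary value chosen for illustration, not a requirement stated in this book.

```r
# Print the version string of the R installation that is currently running
print(R.version.string)

# getRversion() returns the version as an object that can be compared
r_version <- getRversion()

# "3.0.0" is an arbitrary example threshold, not a requirement of the book
r_version >= "3.0.0"
```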

      Unfortunately, base-R, having started in the nineties, looks abysmal and lacks some modern-day functionality. Thus, you will next need to download the R-Studio Integrated Development Environment, or IDE. An IDE is software that consolidates many of the aspects needed to code into one place. For example, you need a place to write code (which could be done in a simple notepad-like program), a place to execute the code you have written, a place to view the plots that the code produces, and so on. These individual components are assembled into the IDE for ease of use and fast development. R and many other languages have IDEs. In fact, R has multiple IDEs optimized for the type of analysis you are performing, such as biostatistics, or for working with another language like Java. The most popular and easily supported IDE for