Ted Kwartler

Sports Analytics in Practice with R



type are used. However, you can coerce either object type to the other using `as.matrix` or `as.data.frame`. Just keep in mind the mixed-data coercion mentioned previously.
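To make the coercion concrete, here is a minimal sketch. The objects `mixedDF`, `numMat`, and `numDF` and their values are hypothetical and not taken from the book; they only illustrate how a matrix ends up holding a single data type.

```r
# Hypothetical data frame mixing numeric and character columns
# (names and values are illustrative assumptions).
mixedDF <- data.frame(number1 = c(1, 2, 3),
                      team    = c("A", "B", "C"),
                      stringsAsFactors = FALSE)

# A matrix holds one type, so mixed columns are all coerced to character.
mixedMat <- as.matrix(mixedDF)
is.character(mixedMat)

# An all-numeric matrix converts to a data frame and back and stays numeric.
numMat <- matrix(1:6, nrow = 2)
numDF  <- as.data.frame(numMat)
is.numeric(as.matrix(numDF))
```

If the numeric values are needed after such a conversion, they would have to be converted back explicitly, for example with `as.numeric`.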

      Figure 1.7 The representation of the list with varying objects.

      xList <- list(xDataFrame, fanTweet, teamA, xVec)

      xList[[4]]
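The difference between double and single brackets on a list can be sketched as follows. The four objects are defined earlier in the book; the values below are stand-in assumptions chosen only so that the example runs on its own.

```r
# Stand-in objects (assumed values, not the book's originals).
xVec       <- c(1, 2, 3)
teamA      <- "Team A"
fanTweet   <- "Great game tonight!"
xDataFrame <- data.frame(number1  = c(1, 2, 3, 4),
                         label    = c("a", "b", "c", "d"),
                         logical2 = c(TRUE, FALSE, TRUE, FALSE),
                         stringsAsFactors = FALSE)

xList <- list(xDataFrame, fanTweet, teamA, xVec)

# Double brackets return the element itself, here a numeric vector.
fourthElement <- xList[[4]]
class(fourthElement)

# Single brackets return a shorter list that still wraps the element.
fourthAsList <- xList[4]
class(fourthAsList)
```

Double brackets are usually the better choice when the goal is to work with the stored object directly.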

      The same can be done with matrices or data frames using single brackets. Indexing row and column data requires two inputs separated by a comma: the row selection comes first, followed by the column selection. For example, first call the `xDataFrame` object in its entirety to establish familiarity. Then select the first row and third column, which returns a single cell value of the data frame. Next, try a different row and column combination on your own in the console to confirm that a single value is returned.

      xDataFrame
      xDataFrame[1,3]

      Indexing also works for entire columns or entire rows. This is done by leaving either the row position or the column position blank on its side of the comma. To call the second column of the data frame, use single brackets with nothing to the left of the comma and a 2 to the right, as shown.

      xDataFrame[, 2]

      Similarly, you can switch the index number to the left of the comma to obtain a specific row. Here, the entire fourth row is returned while the column position is left blank.

      xDataFrame[4, ]

      A named column can also be selected with `$` followed by the column name.

      xDataFrame$number1
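One detail worth checking in the console is what kind of object each indexing form returns. The sketch below uses a hypothetical stand-in for `xDataFrame`: the column names `number1` and `logical2` appear in the text, while `label` and all values are assumptions.

```r
# Hypothetical stand-in for the book's xDataFrame (values are assumptions).
xDataFrame <- data.frame(number1  = c(1, 2, 3, 4),
                         label    = c("a", "b", "c", "d"),
                         logical2 = c(TRUE, FALSE, TRUE, FALSE),
                         stringsAsFactors = FALSE)

cellValue <- xDataFrame[1, 3]   # single cell: row 1, column 3
colValues <- xDataFrame[, 2]    # a whole column comes back as a plain vector
rowValues <- xDataFrame[4, ]    # a whole row comes back as a one-row data frame

is.data.frame(colValues)
is.data.frame(rowValues)

# drop = FALSE keeps the data frame structure when selecting one column.
colAsDF <- xDataFrame[, 2, drop = FALSE]

# The $ form and the positional form reach the same column.
identical(xDataFrame$number1, xDataFrame[, 1])
```

A row stays a data frame because its cells can hold different types, whereas a single column always has one type and can be simplified to a vector.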

      In fact, indexing can become more complex. You can access a specific list element, then a specific row, column, or single value by using double brackets followed by single brackets or `$`, as shown. First, the fourth element of the list is obtained with `[[4]]`; then the second value within that vector is obtained. Keep in mind there is no need for a comma because a vector does not have rows or columns. Instead, a vector merely has positions. In this case, “2” is returned.

      # 4th element, vector 2nd position
      xList[[4]][2]

      Next, the first list element is accessed, and as a data frame, the single brackets with a comma refer to the second row.

      # 1st element, 2nd row
      xList[[1]][2,]

      Similarly, the same data frame is indexed to return the first column because the “1” is to the right of the comma within the single square brackets.

      # 1st element, 1st column
      xList[[1]][,1]

      Of course, you can also use both rows and column positions separated by the comma within the single brackets.

      # 1st element, 2nd row, 1st column
      xList[[1]][2,1]

      Just to make things a bit more complex, if the list element is a data frame with named columns, the second part of the code can employ the `$` along with the name. This accesses the first list element, a data frame, and returns only the column named “logical2.”

      # 1st element, named column with $
      xList[[1]]$logical2

      Lastly, since a single column of this list element is being accessed, it too can be indexed. Once again, a single column does not have a row and column pairing; it only has positions. Thus, no comma is needed and only the third position is returned in this example.

      # 1st element, named column, 3rd position
      xList[[1]]$logical2[3]
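When a chained index is hard to read, it can help to unpack it one step at a time. This is a minimal sketch under stated assumptions: the list contents below are stand-ins rather than the book's original objects, and the intermediate names `firstElement`, `logicalCol`, and `thirdValue` are hypothetical.

```r
# Stand-in list (assumed values; only the structure mirrors the text).
xDataFrame <- data.frame(number1  = c(1, 2, 3, 4),
                         label    = c("a", "b", "c", "d"),
                         logical2 = c(TRUE, FALSE, TRUE, FALSE),
                         stringsAsFactors = FALSE)
xList <- list(xDataFrame, "Great game tonight!", "Team A", c(1, 2, 3))

# Step by step: list element, then named column, then position.
firstElement <- xList[[1]]             # the data frame
logicalCol   <- firstElement$logical2  # one column, now a plain vector
thirdValue   <- logicalCol[3]          # third position of that vector

# The chained form reaches the same value in a single expression.
identical(thirdValue, xList[[1]]$logical2[3])
```

Reading a chained expression from left to right in this way mirrors how R evaluates it: each bracket or `$` acts on the result of everything before it.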

      If all this seems wildly complex, do not fret. Throughout the book, extensive explanation is given for functions, inputs, and indexing. Further, with enough practice, this becomes commonplace and more readily understood.

      So far, this basic explanation of R functionality has relied on base-R functions and libraries that are part of the standard installation. As mentioned previously, R can be specialized to a particular task by loading libraries. To obtain a library, run the `install.packages` function with the package name to download its specialized functions. This is done only once per library, so that the library code is installed locally within your R installation. After the download, you can simply call the `library` function with the package name to enable the specialized functionality from the local installation. The code below uses `install.packages` to install `ggplot2`, a popular graphics library whose name refers to the “grammar of graphics.” The next line then loads it into your R environment. This allows your R session to call functions within a “namespace” that includes base-R and now `ggplot2` functions, specializing R for improved visualizations.

      install.packages('ggplot2')
      library(ggplot2)

      Throughout this book, multiple libraries are loaded. Novice R programmers can run into errors and frustrations regarding package installations. When executing scripts in this book that begin with `library(…)`, an error of “there is no package called …” means you first need to use `install.packages` to download the functionality to your library. Additionally, errors may occur during the `install.packages` step. There are several possible causes, but most often the package being downloaded requires another package first. As a result, carefully read the console messages during the install phase to identify any other package prerequisites. If the `install.packages` function executes correctly, then it is not necessary to repeat that function for each script. Thus, the code in this book only calls `library` for each specific library, enabling the corresponding functionality needed for the task at hand. This assumes all libraries have been previously and successfully installed.
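One way to avoid the “there is no package called …” error is to check whether a package is present before loading it. The helper below, `loadOrInstall`, is a hypothetical convenience function and not part of the book's code; it is only a sketch built on the base-R functions `requireNamespace`, `install.packages`, and `library`.

```r
# Hypothetical helper (an assumption, not from the book): install a package
# only when it is missing from the local library, then load it.
loadOrInstall <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # runs only when the package is not yet installed
  }
  library(pkg, character.only = TRUE)   # pkg is a string, not a bare name
}

# "stats" ships with R, so this call loads it without downloading anything.
loadOrInstall("stats")
```

Calling `loadOrInstall("ggplot2")` would follow the same path, downloading the package on the first run only. The scripts in this book still call `library` directly, so this pattern is optional.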

      To specialize R, first install a package by calling `install.packages` with the corresponding name. If it installs without issue, simply call `library` any time your R session needs the specialized functionality of that library. You will only need to use `install.packages` once, but `library` will need to be called each time you start R and require the specialized functions of a particular library.