Vries Andrie de

R For Dummies



Extending R with user packages

      Before you start discovering the different ways you can use R on your data, you need to know a few more fundamental things about R.

      In Chapter 2, we show you how to use the command line and work with the global environment, so if you read that chapter, you can write a simple script and use the print(), paste(), and readline() functions – at least in the most basic way. But functions in R are more complex than that, so in this chapter we tell you how to get the most out of your functions.

      As you add more arguments to your functions and more functions to your scripts, those scripts can become pretty complex. To keep your code clear – and yourself sane – you can follow the basic organizational principles we cover in this chapter.

      Finally, much of R allows you to use other people’s code very easily. You can extend R with packages that have been contributed to the R community by hundreds of developers. In this chapter, we tell you where you can find these packages and how you can use them in R.

      Using the Full Power of Functions

      Functions form the core of R; everything you do in R uses a function in one way or another. More importantly, the way functions work in R allows you to carry out multiple complex operations in one step or a few simple steps. In this section, we show you how you can use functions the smart way. First, you learn about the key property of functions that makes R so different from other programming languages. Then we tell you how arguments give you access to a whole range of functionality in R functions. Finally, we tell you how you can save the history of all the commands you’ve used in a session with – you guessed it! – a function.

Vectorizing your functions

      Vectorized functions are a very useful feature of R, but programmers who are used to other languages often have trouble with this concept at first. A vectorized function works not just on a single value, but on a whole vector of values at the same time. Your natural reflex as a programmer may be to loop over all values of the vector and apply the function on every element, but vectorization makes that unnecessary. Trust us: When you start using vectorization in R, it’ll help simplify your code.

      To try vectorized functions, you have to make a vector. You do this by using the c() function, which stands for combine. The actual values are separated by commas.

      Here’s an example: Suppose that Granny plays basketball with her friend Geraldine, and you keep track of the number of baskets Granny makes in each game. After six games, you want to know how many baskets Granny has made so far this season. You can combine these numbers into a vector, like this:

      > baskets.of.Granny <- c(12, 4, 4, 6, 9, 3)

      > baskets.of.Granny

      [1] 12  4  4  6  9  3

      To find the total number of baskets Granny made, you just type the following:

      > sum(baskets.of.Granny)

      [1] 38

      You could get the same result by going over the vector number by number, adding each new number to the sum of the previous numbers. But that method would require you to write more code and it would take longer to calculate. You won’t notice it on just six numbers, but the difference will be obvious when you have to sum a few thousand of them.
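
      As a rough sketch of that comparison, the loop below adds the scores one by one and then checks the result against sum(). The names total and score are illustrative only and don't appear elsewhere in the text.

```r
# Granny's scores from the example above
baskets.of.Granny <- c(12, 4, 4, 6, 9, 3)

# Explicit loop: add each score to a running total
total <- 0
for (score in baskets.of.Granny) {
  total <- total + score
}

# Compare the loop result with the single vectorized call
total == sum(baskets.of.Granny)
```

      The loop needs four lines to do what sum() does in one, which is the point of the comparison.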

      Actually, this kind of vectorization occurs in many programming languages. Functions that work this way summarize the data in a vector; they take all values in the vector and calculate a single result.

      R can also apply a function along vectors, element by element. This type of vectorization is less common in other languages and forms the core of R’s power. Quite a few people have difficulties grasping that behavior in the beginning, but it’s easy to understand when you see it happen.

      To see how it works, try using the paste() function. First, you construct two vectors (for example, a vector with first names and a vector with last names). To create a vector with the full names from the original vectors, you can simply use the paste() function, like this:

      > firstnames <- c("Andrie", "Joris")

      > lastnames <- c("de Vries", "Meys")

      > paste(firstnames, lastnames)

      [1] "Andrie de Vries" "Joris Meys"

      R automatically loops over the values of each vector, and concatenates (pastes) them together, element by element. So the first value of the vector firstnames is pasted to the first value of lastnames, the second value of firstnames to the second of lastnames, and so forth. That’s how vectorization works.
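
      To make that looping visible, here is a minimal sketch of what you would otherwise have to write by hand. The names fullnames and i are illustrative only; the loop is our stand-in for the idea, not a description of how paste() is implemented internally.

```r
firstnames <- c("Andrie", "Joris")
lastnames <- c("de Vries", "Meys")

# Explicit loop: paste one pair of names at a time
fullnames <- character(length(firstnames))
for (i in seq_along(firstnames)) {
  fullnames[i] <- paste(firstnames[i], lastnames[i])
}

# Check that the loop and the vectorized call agree
identical(fullnames, paste(firstnames, lastnames))
```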

      What happens if the two vectors don’t have the same number of values? If you make a vector with the first names of the members of your family, paste() can add the last name to all of them with one command, as in the following example:

      > firstnames <- c("Joris", "Carolien", "Koen")

      > lastname <- "Meys"

      > paste(firstnames, lastname)

      [1] "Joris Meys"    "Carolien Meys" "Koen Meys"

      R takes the vector firstnames and then pastes the lastname into each value. How cool is that? Actually, R again combines two vectors. The second vector – in this case, lastname – is only one value long. That value gets recycled by the paste() function as long as necessary (for more on recycling, turn to Chapter 4).
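
      One way to picture recycling is to repeat the short vector by hand and compare the results. This sketch assumes the base function rep(), which isn't introduced in the text above, and the name lastnames.repeated is illustrative only.

```r
firstnames <- c("Joris", "Carolien", "Koen")
lastname <- "Meys"

# Repeat the single last name once for every first name
lastnames.repeated <- rep(lastname, times = length(firstnames))

# Pasting the repeated vector matches what recycling does automatically
identical(paste(firstnames, lastname), paste(firstnames, lastnames.repeated))
```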

      So to process multiple values in R, you don't need complicated code. All you have to do is make the vectors and put them in the function. In Chapter 5, you can find more information about the power of paste().

Putting the argument in a function

      Most functions in R have arguments that allow you to specify exactly what you want the function to do. All these arguments also have a name. For example, the first argument of the print() function is called x. You can check this yourself by looking at the help file of the function using ?print.
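
      If you'd rather not open the Help page, you can also inspect the argument names from the console. This is a small sketch that assumes the base functions args() and formals(), neither of which is covered in the text above; the name print.arguments is illustrative only.

```r
# Show the usage of print(), including its argument names
args(print)

# Extract just the argument names as a character vector
print.arguments <- names(formals(print))
print.arguments
```

      The first name in the result should be x, matching what the Help page shows.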

      By specifying an argument, in other words by passing a value to that argument, you tell the function what you want it to do. So if you use print("Hello world!"), you actually pass the value "Hello world!" to the argument x of the print() function. The print() function tells R that you want to print something, and the value for the argument x tells R what exactly you want to print.

      In R, you have two general types of arguments:

      ✔ Arguments with default values

      ✔ Arguments without default values

      If an argument has no default value, the value may be optional or required. The first argument is almost always required. Try entering the following:

      > print()

      R tells you that it needs the argument x specified:

      Error in .Internal(print.default(x, digits, quote, na.print, print.gap, : 'x' is missing
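
      For contrast, here is a small sketch of an argument that does have a default value. It assumes the base function round(), which isn't discussed in the text above; its digits argument defaults to 0, so you can leave it out. The names rounded.default and rounded.two are illustrative only.

```r
# digits is not given, so round() falls back on its default of 0
rounded.default <- round(3.14159)
rounded.default

# Naming the argument overrides the default
rounded.two <- round(3.14159, digits = 2)
rounded.two
```

      The first call rounds to a whole number; the second keeps two decimal places.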

      You can pass a value to an argument using the = sign like this:

      > print(x = "Isn’t this fun?")

      Sure it is. But wait – when you entered the print("Hello world!") command in Chapter 2, you didn’t add the name of the argument, and the function worked. That’s because R knows the names of the arguments and just assumes that you pass them in exactly the same order as they’re shown in the usage line of the Help page for that function. (For more information on reading