Probability with R. Jane M. Horgan.

source("C:\\test.R")

retrieves the program named test.R from the C directory. Another way of doing this, while working in R, is to click on File on the tool bar, where you will be given the option to Source R code; you can then browse to the program you require and retrieve it.
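As a minimal sketch of how source() behaves, the following writes a one-line script to a temporary file and then sources it. The file location and the object name x_from_script are made up for illustration. Building the path with file.path(), or writing it with forward slashes, avoids having to double the backslashes in a Windows path.

```r
# Write a tiny script to a temporary file (hypothetical content, for illustration only)
script <- file.path(tempdir(), "test.R")
writeLines("x_from_script <- mean(c(2, 4, 6))", script)

# source() reads the file and evaluates its commands in the current session
source(script)
x_from_script   # the object created by the script is now available

# On Windows, the same path can be written in either of these ways:
# source("C:\\test.R")
# source("C:/test.R")
```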

Exercises 2.1

1 For the class of 50 students of computing detailed in Exercise 1.1, use R to: obtain the summary statistics for each gender, and for the entire class; calculate the deciles for each gender and for the entire class; obtain the skewness coefficient for the females and for the males.

2 It is required to estimate the number of message buffers in use in the main memory of the computer system at Power Products Ltd. To do this, 20 programs were run, and the numbers of message buffers in use were found to be:

Calculate the average number of buffers used. What is the standard deviation? Would you say these data are skewed?

3 To get an idea of the runtime of a particular server, 20 jobs were processed and their execution times (in seconds) were observed as follows:

Examine these data and calculate appropriate measures of central tendency and dispersion.

4 Ten data sets were used to run a program and measure the execution time. The results (in milliseconds) were observed as follows:

Use appropriate measures of central tendency and dispersion to describe these data.

5 The following data give the amount of time (in minutes) in one day spent on Facebook by each of 15 students.

Obtain appropriate measures of central tendency and measures of dispersion for these data.
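The kind of calculation these exercises call for can be sketched with base R functions. The values and the grouping factor below are hypothetical and are not the data of any exercise. A mean noticeably above the median, as in this made-up sample, hints at a long right tail.

```r
# Hypothetical execution times in seconds (NOT the data from the exercises)
times <- c(12, 15, 11, 19, 14, 13, 45, 16, 12, 14)

# Measures of central tendency
mean(times)
median(times)

# Measures of dispersion
sd(times)      # standard deviation
IQR(times)     # interquartile range
range(times)   # minimum and maximum

# Deciles: the 10th, 20th, ..., 90th percentiles
deciles <- quantile(times, probs = seq(0.1, 0.9, by = 0.1))
deciles

# Summaries by group, using a hypothetical grouping factor
group <- factor(rep(c("A", "B"), each = 5))
tapply(times, group, summary)
```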

2.5 Project

Write the skewness program, and use it to calculate the skewness coefficient of the four examination subjects in results.txt. What can you say about these data?

Pearson has given an approximate formula for the skewness that is easier to calculate than the exact formula given in Equation 2.1.

Write a program to calculate this, and apply it to the data in results.txt. Is it a reasonable approximation?
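One widely quoted approximation attributed to Pearson is 3(mean - median)/standard deviation. Assuming that is the formula intended here, a sketch of the program might look as follows. The function names and the data vector are hypothetical, and the moment-based coefficient shown for comparison is one common definition that may differ in detail from Equation 2.1.

```r
# Pearson's median-based approximation to skewness (assumed form)
pearson_skew <- function(x) {
  3 * (mean(x) - median(x)) / sd(x)
}

# A moment-based skewness coefficient for comparison (one common definition)
moment_skew <- function(x) {
  mean((x - mean(x))^3) / sd(x)^3
}

# Hypothetical marks, not taken from results.txt
x <- c(35, 42, 48, 51, 55, 58, 60, 63, 71, 94)
pearson_skew(x)
moment_skew(x)
```

Both functions return zero for perfectly symmetric data and a positive value when the right tail is longer, so comparing their signs and sizes on each subject is one way to judge the approximation.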

3 Graphical Displays

In addition to numerical summaries of statistical data, there are various pictorial representations and graphical displays available in R that have a more dramatic impact and make for a better understanding of the data. The ease and speed with which graphical displays can be produced is one of the important features of R. By writing

demo(graphics)

you will see examples of the many graphical procedures of R, along with the code needed to implement them. In this chapter, we will examine some of the most common of these.

3.1 BOXPLOTS

A boxplot is a graphical summary based on the median, quartiles, and extreme values. To display the downtime data given in Example 1.1 using a boxplot, write

boxplot(downtime)

which gives Fig. 3.1. A boxplot is often called a box-and-whiskers plot. The box represents the interquartile range, which contains the middle 50% of cases. The whiskers are the lines that extend from the box to the highest and lowest values. The line across the box indicates the median.

Figure 3.1 A Simple Boxplot
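To see the numbers behind such a plot, fivenum() returns the five values on which R bases a boxplot, and boxplot() itself invisibly returns the statistics it used. The vector below is hypothetical and merely stands in for downtime, which is defined in Example 1.1.

```r
# Hypothetical downtime values in minutes (stand-in for the Example 1.1 data)
downtime_demo <- c(5, 8, 12, 15, 17, 20, 22, 26, 31, 40, 44)

# Minimum, lower box edge, median, upper box edge, maximum
fivenum(downtime_demo)

# boxplot() draws the plot and invisibly returns the statistics it used
bp <- boxplot(downtime_demo)
bp$stats   # lower whisker end, box edge, median, box edge, upper whisker end
```

When the data contain no outliers, as in this sample, the two sets of values agree.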

To improve the look of the graph, we could label the axes as follows:

boxplot(downtime, xlab = "Downtime", ylab = "Minutes")

which gives Fig. 3.2.

Figure 3.2 A Boxplot with Axis Labels

Multiple boxplots can be displayed on the same axis, by adding extra arguments to the boxplot function. For example,

boxplot(results$arch1, results$arch2, xlab = "Architecture Semesters 1 and 2")

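or, if the variables in results are directly accessible (for example, after attaching the data frame), simply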

boxplot(arch1, arch2, xlab = "Architecture Semesters 1 and 2")

gives Fig. 3.3.

Figure 3.3 Multiple Boxplots

Figure 3.3 allows us to compare the performance of the students in Architecture in the two semesters. It shows, for example, that the marks in Semester 2 are lower, and their range narrower, than the marks in Semester 1.
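As a sketch of how each box might be labelled individually, the names argument of boxplot() places a label under each box. The data frame below is hypothetical and only mimics the layout of results, with columns arch1 and arch2.

```r
# Hypothetical marks standing in for results$arch1 and results$arch2
results_demo <- data.frame(
  arch1 = c(45, 52, 58, 61, 66, 70, 74, 79, 85, 92),
  arch2 = c(40, 44, 47, 50, 52, 55, 57, 60, 63, 66)
)

# One box per semester, each with its own label
boxplot(results_demo$arch1, results_demo$arch2,
        names = c("Semester 1", "Semester 2"),
        xlab = "Architecture", ylab = "Marks")

# Numerical counterpart of the visual comparison
sapply(results_demo, median)
sapply(results_demo, IQR)
```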

Notice also in Fig. 3.3 that there are points outside the whiskers of the boxplot in Architecture in Semester 2. These points represent cases over 1.5 box lengths from the upper or lower end of the box and are called outliers. They are considered atypical of the data in general, being either extremely low or extremely high compared to the rest of the data.
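The 1.5 box lengths rule can be checked by hand. The sketch below uses hypothetical marks with one unusually high value, takes the box edges from fivenum(), and compares the result with boxplot.stats(), which applies the same rule with its default coefficient of 1.5.

```r
# Hypothetical marks with one unusually high value
x <- c(38, 41, 44, 45, 47, 48, 50, 52, 53, 55, 86)

# Box edges and box length
h <- fivenum(x)
box_length <- h[4] - h[2]

# Fences 1.5 box lengths beyond each end of the box
lower_fence <- h[2] - 1.5 * box_length
upper_fence <- h[4] + 1.5 * box_length

# Points beyond the fences are drawn individually as outliers
manual_out <- x[x < lower_fence | x > upper_fence]
manual_out

# boxplot.stats() reports the same points
boxplot.stats(x)$out
```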

For the uncorrected data of Exercise 1.1, Fig. 3.4 is obtained using

boxplot(marks~gendermarks)

Figure 3.4 A Gender Comparison

Notice the outlier in Fig. 3.4 in the male boxplot, a value that appears large compared to the rest of the data. You will recall that a check on the examination results indicated that this value should have been 46, not 86, and we corrected it using

marks[34] <- 46

Repeating the analysis after making this correction,

boxplot(marks~gendermarks)

gives Fig. 3.5.

Figure 3.5 A Gender Comparison (corrected)
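The formula interface and the effect of the correction can be sketched on a small hypothetical data set. The objects marks_demo and gender_demo are made-up stand-ins, not the Exercise 1.1 data, and position 12 is simply where the suspect value sits in this sample.

```r
# Hypothetical marks and grouping factor (not the Exercise 1.1 data)
marks_demo  <- c(51, 55, 58, 60, 62, 64, 40, 43, 45, 48, 50, 86)
gender_demo <- factor(rep(c("female", "male"), each = 6))

# y ~ group draws one box for each level of the factor
before <- boxplot(marks_demo ~ gender_demo, xlab = "Gender", ylab = "Marks")
before$out   # values plotted as outliers

# Correct the suspect entry by position, then redraw
marks_demo[12] <- 46
after <- boxplot(marks_demo ~ gender_demo, xlab = "Gender", ylab = "Marks")
after$out    # empty once the value is corrected
```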

You will now observe from Fig. 3.5 that there are no outliers in the male or female data. In this way, a boxplot may be used as a data validation tool. Of course, it is possible that the mark of 86 may in fact be valid, and that a male student did indeed obtain a mark that was
