matrix or probability table. For example, pretend we are investigating a company that has issued both bonds and stock. The bonds can either be downgraded, be upgraded, or have no change in rating. The stock can either outperform the market or underperform the market.
In Table 3.1, the probability of both the company's stock outperforming the market and the bonds being upgraded is 15 percent. Similarly, the probability of the stock underperforming the market and the bonds having no change in rating is 25 percent. We can also see the unconditional probabilities by adding across a row or down a column. The probability of the bonds being upgraded, irrespective of the stock's performance, is 15 percent + 5 percent = 20 percent. Similarly, the probability of the stock outperforming the market is 15 percent + 30 percent + 5 percent = 50 percent. Importantly, all of the joint probabilities add to 100 percent: the outcomes in the table are mutually exclusive and exhaustive, so exactly one of them must happen.
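The row and column sums described above can be sketched in Python. Table 3.1 itself is not reproduced here, so the layout below is an assumption: the 30 percent and 5 percent outperform entries are taken to be the no-change and downgrade cells (the order in which the text adds them), and the underperform/downgrade cell is inferred as 20 percent so that all six joint probabilities sum to 100 percent. The `joint` dictionary and the `marginal` helper are hypothetical names.

```python
# Joint probabilities, in whole percent, keyed by (bond rating, stock performance).
# ASSUMPTION: 30% and 5% are the no-change and downgrade outperform cells, and the
# downgrade/underperform cell (20%) is inferred so that the table sums to 100%.
joint = {
    ("upgrade", "outperform"): 15,
    ("upgrade", "underperform"): 5,
    ("no change", "outperform"): 30,
    ("no change", "underperform"): 25,
    ("downgrade", "outperform"): 5,
    ("downgrade", "underperform"): 20,
}

def marginal(table, position, outcome):
    """Unconditional probability: sum every joint probability whose key has
    `outcome` at `position` (0 = bond rating row, 1 = stock column)."""
    return sum(p for key, p in table.items() if key[position] == outcome)

bond_upgrade = marginal(joint, 0, "upgrade")         # across a row: 15 + 5
stock_outperform = marginal(joint, 1, "outperform")  # down a column: 15 + 30 + 5
total = sum(joint.values())                          # all joint probabilities

print(bond_upgrade, stock_outperform, total)  # 20 50 100
```

Working in whole percent keeps the arithmetic exact; with decimal probabilities such as 0.15, floating-point sums may differ from 1.0 in the last few digits.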
Sample Problem
Question:
You are investigating a second company. As with our previous example, the company has issued both bonds and stock. The bonds can either be downgraded, be upgraded, or have no change in rating. The stock can either outperform the market or underperform the market. You are given the following probability matrix, which is missing three probabilities: X, Y, and Z. Calculate values for the missing probabilities.
Answer:
All of the values in the first column must add to 50 percent, the probability of the equity outperforming the market; therefore, we have:
We can check our answer for X by summing across the third row: 5 percent + 30 percent = 35 percent.
Looking down the second column, we see that Y is equal to 20 percent:
Finally, knowing that Y = 20 percent, we can sum across the second row to get Z:
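Each step of the answer uses the same rule: a missing entry equals its row or column total minus the entries that are already known. A minimal sketch, with `missing_cell` a hypothetical helper name, applied to the third row, where the text's check implies a row total of 35 percent and a known entry of 30 percent:

```python
def missing_cell(total, known):
    """Return the single unknown entry in a row or column of a probability
    matrix: the row/column total minus the sum of the known entries."""
    return total - sum(known)

# Third row of the sample problem, in whole percent: total 35, known entry 30.
x = missing_cell(35, [30])
print(x)  # 5

# The check from the text: the completed row must add back up to its total.
assert x + 30 == 35
```

The same call works for Y and Z once the relevant column or row total and its known entries are read off the matrix.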
Part III Basic Statistics
In this section we will learn how to describe a collection of data in precise statistical terms. Many of the concepts will be familiar, but the notation and terminology might be new. This notation and terminology will be used throughout the rest of the book.
Averages
Everybody knows what an average is. We come across averages every day, whether they are earned-run averages in baseball or grade point averages in school. In statistics there are actually three different types of averages: means, modes, and medians. By far the most commonly used average in risk management is the mean.
POPULATION AND SAMPLE DATA
If you wanted to know the mean age of people working in your firm, you would simply ask every person in the firm his or her age, add the ages together, and divide by the number of people in the firm. Assuming there are $n$ employees and $a_i$ is the age of the $i$th employee, then the mean, $\mu$, is simply:
$$\mu = \frac{1}{n}\sum_{i=1}^{n} a_i$$
(3.21)
It is important at this stage to differentiate between population statistics and sample statistics. In this example, μ is the population mean. Assuming nobody lied about his or her age, and forgetting about rounding errors and other trivial details, we know the mean age of people in your firm exactly. We have a complete data set of everybody in your firm; we've surveyed the entire population.
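The mean formula translates directly into code. This is a minimal sketch with a hypothetical `mean` helper and made-up ages; the same arithmetic applied to a sample of returns instead of a whole population gives the sample estimate discussed next. Python's standard `statistics.mean` is used only as a cross-check.

```python
import statistics

def mean(values):
    """Sum of the observations divided by their count, n.

    Applied to a whole population this gives mu; applied to a sample
    (for example, a year of daily returns) it gives the estimate mu-hat.
    """
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)

# Hypothetical ages of every employee at a five-person firm (made-up data).
ages = [25, 32, 41, 29, 53]
mu = mean(ages)
print(mu)                     # 180 / 5 = 36.0
print(statistics.mean(ages))  # standard-library cross-check
```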
This state of absolute certainty is, unfortunately, quite rare in finance. More often, we are faced with a situation such as this: estimate the mean return of stock ABC, given the most recent year of daily returns. In a situation like this, we assume there is some underlying data generating process, whose statistical properties are constant over time. The underlying process still has a true mean, but we cannot observe it directly. We can only estimate that mean based on our limited data sample. In our example, assuming n returns, we estimate the mean using the same formula as before:
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} r_i$$
(3.22)
where $\hat{\mu}$ (pronounced “mu hat”) is our estimate of the true mean based on our sample of $n$ returns, and $r_i$ is the $i$th return. We call this the sample mean.
The median and mode are also types of averages. They are used less frequently in finance, but both can be useful. The median represents the center of a group of data; within the group, half the data points will be less than the median, and half will be greater. The mode is the value that occurs most frequently.
Sample Problem
Question:
Calculate the mean, median, and mode of the following data set:
Answer:
If there is an even number of data points, the median is found by averaging the two center-most points. In the following series:
the median is 15 percent. The median can be useful for summarizing data that is asymmetrical or contains significant outliers.
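A minimal sketch of the median rule, including the even-count case, follows. The function name and the sample values are hypothetical and are not the series from the text; Python's `statistics.median` applies the same averaging rule for an even number of points.

```python
def median(values):
    """Middle value of the sorted data; with an even number of points,
    the average of the two center-most values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical returns in percent (made-up data).
print(median([30, 5, 20, 10]))  # even count: (10 + 20) / 2 = 15.0
print(median([20, 5, 10]))      # odd count: 10
```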
A data set can also have more than one mode. If the maximum frequency is shared by two or more values, all of those values are considered modes. In the following example, the modes are 10 percent and 20 percent:
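Because ties are allowed, a mode function should return every value that shares the highest frequency. A minimal sketch using `collections.Counter`; the `modes` name and the data are hypothetical. The standard library's `statistics.multimode` (Python 3.8+) does the same but returns values in the order first encountered rather than sorted.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, in sorted order."""
    if not values:
        raise ValueError("mode of empty data")
    counts = Counter(values)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

# Hypothetical returns in percent (made-up data).
print(modes([10, 10, 20, 20, 30]))  # [10, 20]: two modes
print(modes([5, 10, 10, 15]))       # [10]: a single mode
```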
In Equations 3.21 and 3.22, each data point was counted exactly once. In certain situations, we might want to give more or less weight to certain data points. In calculating the average return of stocks in an equity index, we might want to give more weight to larger firms, perhaps weighting their returns in proportion to their market capitalization. Given $n$ data points, $x_i = x_1, x_2, \ldots, x_n$, with corresponding weights, $w_i$, we can define the weighted mean, $\mu_w$, as:
$$\mu_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
(3.23)
The standard mean from Equation 3.21 can be viewed as a special case of the weighted mean, where all the values have equal weight.
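The weighted mean and its equal-weight special case can be sketched as follows. The `weighted_mean` name, the returns, and the market capitalizations are hypothetical illustrations, not data from the text.

```python
def weighted_mean(values, weights):
    """mu_w = sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

# Hypothetical returns (percent) and market caps (billions) for three firms.
returns = [3, 9, -6]
market_caps = [300, 100, 100]
print(weighted_mean(returns, market_caps))  # (900 + 900 - 600) / 500 = 2.4
print(weighted_mean(returns, [1, 1, 1]))    # equal weights: (3 + 9 - 6) / 3 = 2.0
```

With equal weights the result matches the ordinary mean, which is the special case described above; the cap-weighted figure is higher because the largest firm has a positive return.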
DISCRETE RANDOM VARIABLES
For a discrete random variable, we can also calculate the mean, median, and mode. For a random variable, $X$, with possible values, $x_i$, and corresponding probabilities, $p_i$, we define the mean, $\mu$, as:
$$\mu = \sum_{i=1}^{n} p_i x_i$$
(3.24)
The equation for the mean of a discrete random variable is a special case of the weighted mean, where the outcomes are weighted by their probabilities, and the sum of the weights is equal to one.
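A minimal sketch of the probability-weighted mean; `discrete_mean` is a hypothetical helper, and the three-outcome return distribution is made up for illustration. The probabilities are chosen to be exact in binary floating point so the sum is exact.

```python
def discrete_mean(outcomes, probabilities):
    """mu = sum of p_i * x_i: a weighted mean whose weights sum to one."""
    if len(outcomes) != len(probabilities):
        raise ValueError("outcomes and probabilities must have the same length")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to one")
    return sum(p * x for x, p in zip(outcomes, probabilities))

# Hypothetical return distribution (percent), made up for illustration.
outcomes = [-10, 5, 20]
probabilities = [0.25, 0.5, 0.25]
print(discrete_mean(outcomes, probabilities))  # -2.5 + 2.5 + 5.0 = 5.0
```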
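The median of a discrete random variable is a value such that the probability of the random variable being less than or equal to the median is at least 50 percent. Working from the other end of the distribution, the probability of the random variable being greater than or equal to the median must also be at least 50 percent. Because any probability sitting exactly at the median is counted on both sides, the two probabilities cannot, in general, both equal exactly 50 percent. For a random variable, $X$, if we denote the median as $m$, we have:
$$P[X \le m] \ge 0.50 \quad \text{and} \quad P[X \ge m] \ge 0.50$$
(3.25)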
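Because a discrete distribution places probability on individual points, the cumulative probability can jump past 50 percent, and more than one value can satisfy the median condition. A minimal sketch, assuming the common convention of taking the smallest outcome whose cumulative probability reaches 50 percent; `discrete_median` and the example distributions are hypothetical.

```python
from fractions import Fraction

def discrete_median(outcomes, probabilities):
    """Return the smallest outcome m whose cumulative probability
    P[X <= m] reaches 0.5.

    For that m, P[X >= m] is also at least 0.5, because the cumulative
    probability of every smaller outcome is still below 0.5.
    """
    cumulative = 0  # int start, so Fraction inputs stay exact
    for x, p in sorted(zip(outcomes, probabilities)):
        cumulative += p
        if cumulative >= 0.5:
            return x
    raise ValueError("probabilities must sum to at least 0.5")

# Hypothetical return distribution (percent): -10, 5, 20 w.p. 0.25, 0.5, 0.25.
print(discrete_median([-10, 5, 20], [0.25, 0.5, 0.25]))  # 5

# Fair six-sided die with exact probabilities: P[X <= 3] is exactly 1/2.
print(discrete_median(range(1, 7), [Fraction(1, 6)] * 6))  # 3
```

For the die, 4 also satisfies both inequalities, since P[X ≤ 4] = 4/6 and P[X ≥ 4] = 3/6; the sketch returns the lower value by convention. Exact `Fraction` probabilities avoid a rounded cumulative sum landing just below 0.5.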