delay it due to its problematic issues? Does “finding” a cluster in cluster analysis provide evidential support for real, substantive clusters? Or, does it simply provide evidence of clusters on an abstract mathematical level?
For the conscientious reader, it does not take long to realize that the above questions are important ones to answer if one is to make any sense of the scientific evidence they have obtained from their own study of nature. Now, for relatively simple experiments with non-controversial variables, philosophical issues do not arise as much, if at all. For instance, correlating heart rate with blood pressure does not typically require extensive philosophical examination of underlying methodology. It is an easy correlation on non-controversial variables. This is why biological sciences are often considered “harder” than the softer sciences, not in their level of difficulty necessarily, but because it is generally much easier to establish convincing evidence in those sciences. A cure for COVID-19 is difficult to come by, but once it does arrive, we can visually observe people living longer who once had the disease. In some areas of social science, however, including fields such as economics, psychology, etc., establishing evidence is much harder, simply because the matter under investigation does not lend itself to such neat and nice definition and experimentation. That is, establishing convincing evidence in such fields is often quite complex, especially if non-experimental methods are used on variables for which measurement and even “existence” can be controversial. You can correlate depression with anxiety all you like, but you should also first be able to defend the idea that asking people about their depression symptoms on a questionnaire is actually a reasonable or valid way to measure something called “depression.” Some would say it is not, and that self-report is a very weak way to establish evidence of anything other than, well, the self-report of what people say! Hence, finding that depression is linked to overall well-being (or lack thereof), for example, means little if we first do not agree on how these constructs are defined and measured.
Other issues that at first glance might appear purely “mathematical” have groundings in philosophical principles. For example, whether a research variable should be considered continuous or discrete in a scientific sense cannot be answered via mathematics; it must be answered through thoughtful consideration in a philosophical or scientific sense. We survey this issue now.
1.6 Continuous vs. Discrete Variables
A variable in mathematics is usually represented by a symbol such as x or y. It is usually indexed by a subscript such as “i” to indicate that the symbol stands for a whole set of values rather than a single one. For example, for a variable xi, the “i” indexes the individual values in the given set, so that xi denotes the ith of them. For instance, for a sample of 10 individuals in a room, weight is a variable. It is a variable because not everyone in the room has the same weight. Hence, xi in this case implies that there are values for weight from i = 1 to i = 10, not all necessarily distinct from one another. Now, if everyone in the room had the same weight, such that it was a constant instead of a variable, then the “sub-i” index would not be required, at least not for describing this particular set of individuals.
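The indexing idea can be sketched in a few lines of Python. This is only an illustration: the weights are invented, and the helper names `x` and `is_constant` are hypothetical, not notation from the text.

```python
# Hypothetical weights (kg) for a sample of 10 individuals; values are invented.
weights = [61.2, 74.5, 68.0, 74.5, 80.3, 55.9, 68.0, 90.1, 72.4, 66.7]


def x(i, values=weights):
    """Return the i-th observation, counting from i = 1 as the text does.

    Python lists are indexed from 0, so observation i lives at position i - 1.
    """
    return values[i - 1]


def is_constant(values):
    """True when every observation shares one value, so no index is needed."""
    return len(set(values)) == 1


n = len(weights)              # number of indexed observations
distinct = len(set(weights))  # repeated weights are counted once

print(f"n = {n}, distinct values = {distinct}, constant = {is_constant(weights)}")
```

Two pairs of individuals share a weight in this made-up sample, which mirrors the remark that the indexed values need not all be distinct.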
There are two types of variables in mathematics that are requisite knowledge for understanding applied statistics, especially when it comes to generating statistical models and conducting moderate to advanced techniques. A variable is generally considered to be either discrete or continuous. A discrete variable, crudely defined, takes on values between which no other values are possible. For example, a variable with numbers 0, 1, 2 is discrete if there is no possibility of values between 0 and 1 or between 1 and 2. If, on the other hand, any values between 0 and 1 and between 1 and 2 are theoretically possible, then the variable is no longer discrete. Rather, it is continuous. For a continuous variable then, any values are possible, even if only theoretically, as depicted in the following graphical and more formal definition of continuity.
In this plot, continuity is said to exist at the given point f(x0) on the y-axis if for small changes on the x-axis, either above or below x0 (i.e. x0 + δ2 or x0 − δ2), we have an equally allowable small change on the y-axis (i.e. f(x0) + ε2 or f(x0) − ε2). These changes can be made extremely small, and actually as small as we wish them to be on a theoretical level. That is, the changes δ2 and ε2 can be made infinitesimally small right up to the point f(x0). Informally, continuity implies a sense of narrowing infinitely on a given point, such that we can make smaller and smaller, in fact infinitesimally smaller, divisions. Though we have only skimmed the formal definition of continuity here, this general idea underlies the most formal definition that currently exists for what continuity is mathematically. The point is to emphasize that true continuity is something that exists in theory only, and is, or at least can be, rigorously defined. For further details on this precise definition of continuity, see Bartle and Sherbert (2011), who discuss the foundations of calculus in much more detail. The foundations of calculus fall generally under the branch of mathematics called real analysis. The essence of continuity does not lie in the mathematical definition of it, however, and the idea has likely existed since early thought. Mathematics provided a precise definition for it so it could be used by, and communicated between, mathematicians and scientists. Continuity, however, is first and foremost an idea. If you understand what we are getting at here, you will start to see mathematics in many cases as a bit more of a surface layer to deeper concepts, rather than simply the “mathematics” you may have associated with the subject earlier in your studies.
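For readers who want the formal statement, the informal description above corresponds to the standard epsilon-delta definition of continuity at a point, as given in real analysis texts. The version below is a sketch in generic notation: δ and ε are written without the subscripts used in the discussion of the plot, and the domain D of f is an assumed symbol that does not appear in the text.

```latex
% Continuity of f at x_0 (standard epsilon-delta form).
% D denotes the domain of f; this symbol is an assumption for illustration.
\[
  \forall\, \varepsilon > 0 \;\; \exists\, \delta > 0 \;\text{ such that }\;
  \forall\, x \in D:\quad
  |x - x_0| < \delta \;\Longrightarrow\; |f(x) - f(x_0)| < \varepsilon .
\]
```

Read in words: however narrow a band we allow around f(x0) on the y-axis, there is some interval around x0 on the x-axis whose points all map into that band.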
Now, in practical research, one cannot measure to an infinite number of decimal places. One can never keep narrowing in on a given value by an infinite number of refined slices. Hence, while researchers may sometimes like to believe their variables have an underlying continuity to them, they are always far from being truly continuous. Only philosophically speaking (read: theoretical mathematics and its underlying philosophy) can a variable be continuous. So why is this brief discussion of continuity vs. discreteness important to the scientist? It is important because the starting point of any statistical analysis is determining whether one’s variables are best considered discrete or continuous. Though there is much more flexibility in statistical models than the following may suggest, it is nonetheless useful to give a broad overview of where traditional statistical models use continuous vs. discrete variables. When using z-tests and t-tests for means, as well as most ANOVA-type models, it is understood that the dependent or response variable is continuous in nature, or at minimum, takes on enough distinct values that we may consider it to be of a generally continuous nature. The independent variables in these models are discrete or categorical, identifying the different populations being compared on the response.
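To make the pairing of a continuous response with a discrete grouping variable concrete, here is a minimal sketch of an equal-variance (pooled) two-sample t statistic using only the Python standard library. The data values, the group labels, and the function name `pooled_t` are all hypothetical and chosen for illustration; this is one common form of the t-test for means, not necessarily the exact variant developed later in the text.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical data: each record pairs a discrete group label with a
# continuous response. All values are invented for illustration.
records = [
    ("control", 38.1), ("control", 41.5), ("control", 36.9),
    ("control", 40.2), ("control", 39.4), ("control", 37.8),
    ("treated", 42.3), ("treated", 44.0), ("treated", 40.8),
    ("treated", 43.1), ("treated", 45.2), ("treated", 41.6),
]

# The categorical independent variable splits the response into two samples.
control = [y for group, y in records if group == "control"]
treated = [y for group, y in records if group == "treated"]


def pooled_t(a, b):
    """Equal-variance two-sample t statistic (b minus a) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    std_err = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(b) - mean(a)) / std_err, df


t, df = pooled_t(control, treated)
print(f"t = {t:.3f} on {df} degrees of freedom")
```

Only the response enters the arithmetic as a number. The grouping variable does nothing more than decide which sample each observation belongs to, which is the sense in which it is discrete.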
As an example, suppose we wanted to measure the VO2-max of participants treated with a new COVID-19 medication vs. those not treated. VO2-max is essentially a measure of oxygen uptake during high-intensity exercise (e.g. a Tour de France cyclist has a higher VO2-max than you or I). The VO2-max variable is the response, which is considered continuous, as a function of the independent variable, treatment vs. control. For this, we are in the realm of z-tests or t-tests for means, or we could also perform an ANOVA on these variables. A regression analysis is also an option since we can operationalize the independent variable as a binary dummy-coded predictor. When we flip things around, such that the grouping variable is now the response and VO2-max is the predictor, we are in the realm of discriminant analysis or logistic regression on two groups. Here, we would like to predict group membership based on the continuous predictor. Notice that these models are answering different research questions, but at their core, it stands to reason that they must have great technical similarity. As we will see as we progress, indeed they do. Within a t-test, for example, can be considered, at least