Daniel J. Denis

Applied Univariate, Bivariate, and Multivariate Statistics Using Python



delay it due to its problematic issues? Does “finding” a cluster in cluster analysis provide evidential support for real, substantive clusters? Or, does it simply provide evidence of clusters on an abstract mathematical level?

      Other issues that at first glance might appear purely “mathematical” have groundings in philosophical principles. For example, whether a research variable should be considered continuous or discrete in a scientific sense cannot be answered via mathematics; it must be answered through thoughtful consideration in a philosophical or scientific sense. We survey this issue now.

      1.6 Continuous vs. Discrete Variables

      A variable in mathematics is usually represented by a symbol such as x or y. It is often indexed by a subscript such as “i” to indicate that the symbol stands for a whole set of values, one for each value of the index. For example, for a variable x_i, the “i” implies that the variable in question can take on the ith value in the given set of possibilities. For instance, for a sample of 10 individuals in a room, weight is a variable. It is a variable because not everyone in the room has the same weight. Hence, x_i in this case implies that there are 10 values for weight, indexed from i = 1 to i = 10, not all necessarily distinct from one another. Now, if everyone in the room had the same weight, such that weight was a constant instead of a variable, then the “sub-i” index would not be required, at least not for describing this particular set of individuals.
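      The indexing idea can be sketched in a few lines of Python. This is only an illustration: the weights below are made-up values, and the names `weights` and `constant_weights` are hypothetical, not taken from the text.

```python
# Hypothetical weights (in kg) for a sample of 10 individuals; values are invented.
weights = [68.0, 72.5, 81.3, 68.0, 59.4, 90.1, 77.7, 64.2, 72.5, 85.0]

# Python lists are indexed from 0, so x_i for i = 1, ..., 10 is weights[i - 1].
for i, x_i in enumerate(weights, start=1):
    print(f"x_{i} = {x_i}")

n = len(weights)                # number of indexed values (i runs from 1 to n)
n_distinct = len(set(weights))  # the values need not all be distinct
is_variable = n_distinct > 1    # weight varies across individuals

# If everyone had the same weight, the quantity would be a constant
# and the index i would carry no information.
constant_weights = [70.0] * 10
is_constant = len(set(constant_weights)) == 1
```

      In this sketch two weights are repeated, so there are 10 indexed values but fewer distinct ones, which is what "not all necessarily distinct" means.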

      Two types of variables in mathematics are requisite knowledge for understanding applied statistics, especially when it comes to building statistical models and conducting moderate to advanced techniques. A variable is generally considered to be either discrete or continuous. A discrete variable, crudely defined, takes on values between which no other values are possible. For example, a variable with values 0, 1, 2 is discrete if there is no possibility of values between 0 and 1 or between 1 and 2. If, on the other hand, any value between 0 and 1 or between 1 and 2 is theoretically possible, then the variable is no longer discrete. Rather, it is continuous. For a continuous variable, then, any value within its range is possible, even if only theoretically, as depicted in the following graphical and more formal definition of continuity.

[Figure: a plot illustrating a continuous variable.]
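      The distinction between discrete and continuous can also be sketched in code. The following is a minimal illustration under stated assumptions: the discrete variable is taken to have support {0, 1, 2}, the continuous one is taken to range over the interval from 0 to 2, and the helper names are hypothetical. Exact fractions are used so that "theoretically possible" values are not limited by floating-point precision.

```python
from fractions import Fraction

# Assumed support of the discrete variable: only these values can occur.
discrete_support = {0, 1, 2}


def is_possible_discrete(value):
    """A value is admissible only if it is one of the listed points."""
    return value in discrete_support


def is_possible_continuous(value, low=0, high=2):
    """Any value inside the assumed interval [low, high] is admissible."""
    return low <= value <= high


# Between any two values of a continuous variable there is always another one.
# Repeatedly taking the midpoint of 0 and the previous midpoint never runs out.
a, b = Fraction(0), Fraction(1)
midpoints = []
for _ in range(5):
    mid = (a + b) / 2
    midpoints.append(mid)
    b = mid  # shrink the gap and look for a value in between again

for m in midpoints:
    print(m, is_possible_discrete(m), is_possible_continuous(m))
```

      Every midpoint is rejected by the discrete check and accepted by the continuous one, and the loop could be continued indefinitely.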

      As an example, suppose we wanted to measure the VO2-max of participants treated with a new COVID-19 medication vs. those not treated. VO2-max is essentially a measure of oxygen uptake during high-intensity exercise (e.g. a Tour de France cyclist has a higher VO2-max than you or I do). Here VO2-max is the response, considered continuous, modeled as a function of the independent variable, treatment vs. control. For this, we are in the realm of z-tests or t-tests for means, or we could also perform an ANOVA on these variables. A regression analysis is also an option, since we can operationalize the independent variable as a binary dummy-coded predictor. When we flip things around, such that the grouping variable is now the response and VO2-max is the predictor, we are in the realm of discriminant analysis or logistic regression on two groups. Here, we would like to predict group membership based on the continuous predictor. Notice that these models answer different research questions, but at their core they must share great technical similarity. As we will see as we progress, indeed they do. Within a t-test, for example, can be considered, at least