Chris Jones

End-to-end Data Analytics for Product Development



       What is the performance of a new product compared with the industry standard or products currently on the market?

       What is causing high levels of variation and waste during processing?

      These questions are examples of inferential problems.

      Inferential problems are usually related to:

Estimation of a population parameter (e.g. a mean) → What is the stability of a new formulation?
Comparison among groups → What is the performance of a new product compared with the industry standard or products currently on the market?
Assessing relationships among variables → What is causing high levels of variation and waste during processing?

      The statistical techniques typically used for each are:

Estimation of a population parameter → Point estimate; confidence intervals
Comparison among groups → Hypothesis testing (one‐sample tests; two‐sample tests; ANOVA)
Assessing relationships among variables → Regression models
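As a minimal sketch of the first kind of inferential problem, the following Python snippet computes a point estimate and an approximate confidence interval for a population mean. The function name `mean_confidence_interval`, the sample values, and the use of a normal approximation (rather than a t‐based interval, which would be wider for small samples) are assumptions made for illustration and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, level=0.95):
    """Return (point_estimate, lower, upper) for the population mean.

    Assumption: a normal approximation is used for the interval.
    """
    n = len(values)
    point = mean(values)                     # point estimate of the mean
    standard_error = stdev(values) / sqrt(n)  # sample std dev / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    return point, point - z * standard_error, point + z * standard_error


# Hypothetical measurements, for illustration only.
sample = [9.8, 10.1, 10.0, 10.3, 9.9, 10.2, 10.0, 9.7]
point, lower, upper = mean_confidence_interval(sample)
print(f"point estimate: {point:.3f}, 95% CI: ({lower:.3f}, {upper:.3f})")
```

The interval is symmetric around the point estimate and narrows as the sample size grows or the variation in the data shrinks.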

      Stat Tool 1.4 Shapes of Data Distributions

      By observing the frequency distribution of a categorical or quantitative variable, several shapes may be detected:

       When values or classes have similar percentages, the distribution is said to be fairly uniform. In a fairly uniform distribution there are no values or classes predominant over the others (a).

       When there is one value or class predominant over the others, the distribution is said to be nonuniform and unimodal with one peak (b).

       When there is more than one value or class predominant over the others, the distribution is said to be nonuniform and multimodal with more than one peak (c).

      The value or class with the highest frequency is the mode of the distribution (see Figure 1.2).

       Figure 1.2 Shapes of distributions: (a) fairly uniform, (b) unimodal, (c) bimodal.
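The three shapes described above can be sketched in Python as follows. The function names, the `tolerance` threshold for "similar percentages", and the example data are hypothetical. The sketch also treats "predominant" as "tied for the highest frequency", which is a simplification of the peak‐based idea in the text.

```python
from collections import Counter
from statistics import multimode


def relative_frequencies(values):
    """Map each distinct value to its share of the observations."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def describe_shape(values, tolerance=0.05):
    """Label a distribution as fairly uniform, unimodal, or multimodal.

    Assumption: the distribution is "fairly uniform" when the largest and
    smallest relative frequencies differ by at most `tolerance`.
    """
    freqs = relative_frequencies(values)
    if max(freqs.values()) - min(freqs.values()) <= tolerance:
        return "fairly uniform"
    modes = multimode(values)  # all values tied for the highest frequency
    return "unimodal" if len(modes) == 1 else "multimodal"


# Hypothetical categorical data, for illustration only.
print(describe_shape(["A", "B", "C", "A", "B", "C"]))
print(describe_shape(["A", "A", "A", "B", "C"]))
print(describe_shape(["A", "A", "A", "B", "B", "B", "C"]))
```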

      Stat Tool 1.5 Shapes of Data Distributions for Quantitative Variables

      By observing the frequency distribution of a quantitative discrete or continuous variable, further shapes may be detected that relate to the presence or absence of symmetry (Figures 1.3 and 1.4).

       Figure 1.3 Shapes of distributions (symmetric and skewed distributions): fairly symmetric, skewed to the right, and skewed to the left.

       Figure 1.4 Other shapes of distributions: (d) J‐shaped, (e) U‐shaped, (f) presence of outliers.

      If histograms (or bar charts for quantitative discrete variables) show ever‐decreasing or ever‐increasing frequencies, the distribution is said to be J‐shaped (d). If frequencies are decreasing on the left side of the graph and increasing on the right side, the distribution is said to be U‐shaped (e). Sometimes there are values that do not fall near any others. These extremely high or low values are called outliers (f).
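The J‐shaped and U‐shaped descriptions above can be turned into a small check on the bin frequencies of a histogram, read from left to right. This is only a sketch: the function name `classify_frequencies` and the example frequencies are hypothetical, and the rule is deliberately strict (every step must go down or up), whereas real histograms usually need some tolerance for noise.

```python
def classify_frequencies(frequencies):
    """Label histogram bin frequencies (left to right) by overall shape."""
    if len(frequencies) < 2:
        return "other"
    # Change in frequency from each bin to the next one.
    diffs = [b - a for a, b in zip(frequencies, frequencies[1:])]
    # Ever-decreasing or ever-increasing frequencies: J-shaped.
    if all(d < 0 for d in diffs) or all(d > 0 for d in diffs):
        return "J-shaped"
    # Decreasing down to the lowest bin, then increasing: U-shaped.
    lowest = frequencies.index(min(frequencies))
    left, right = diffs[:lowest], diffs[lowest:]
    if left and right and all(d < 0 for d in left) and all(d > 0 for d in right):
        return "U-shaped"
    return "other"


# Hypothetical bin frequencies, for illustration only.
print(classify_frequencies([12, 8, 5, 2, 1]))
print(classify_frequencies([10, 6, 3, 5, 9]))
print(classify_frequencies([3, 7, 9, 6, 2]))
```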

      Stat Tool 1.6 Measures of Central Tendency: Mean and Median

      When quantitative data distributions tend to concentrate around certain values, we can try to locate these values by calculating the so‐called measures of central tendency: the mean and the median. These measures describe the area of the distribution where most values occur.

      The mean is the sum of all data values divided by the number of values. It represents the “balance point” of a set of values.


      The median is the middle value in a sorted list of data. It divides the data in half: 50% of the values are greater than the median and 50% are less than the median.

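The two definitions can be written out directly in Python. The helper names and the data values below are hypothetical. Averaging the two middle values when the number of observations is even is the usual convention and is assumed here. The last lines illustrate the "balance point" idea: the deviations from the mean sum to zero.

```python
from statistics import mean, median


def mean_by_definition(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)


def median_by_definition(values):
    """Middle value of the sorted data.

    Assumption: for an even number of values, the two middle values
    are averaged.
    """
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical data, for illustration only.
data = [13.0, 2.0, 7.0, 3.0, 5.0]
data_mean = mean_by_definition(data)
data_median = median_by_definition(data)

# The standard library gives the same results.
assert data_mean == mean(data)
assert data_median == median(data)

# Balance point: deviations above and below the mean cancel out.
balance = sum(x - data_mean for x in data)
print(data_mean, data_median, balance)
```

In this example the mean is larger than the median because the single high value pulls the balance point to the right, while the median depends only on the ordering of the values.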

      For