Chris Jones

End-to-end Data Analytics for Product Development



where readers can review several basic statistical concepts before moving on to the next chapters. Sixteen sections titled Stat Tools will introduce some key terms and procedures that will be further elaborated and referred to throughout the text.

      Specifically, this chapter deals with the following:

Topics                                                                          Stat Tools
Statistical variables and types of data                                         1.1
Statistical units, populations, samples                                         1.2
Introduction to descriptive and inferential analyses                            1.3, 1.12, 1.13
Data distributions                                                              1.4, 1.5
Mean values                                                                     1.6, 1.7
Measures of variability                                                         1.8, 1.9, 1.10
Boxplots                                                                        1.11
Introduction to confidence intervals                                            1.14
Introduction to hypothesis testing procedures, including the p‐value approach   1.15, 1.16

      Learning Objectives and Outcomes

      Upon completion of the review of these basic statistical concepts, you should be able to do the following:

       Recognize and distinguish between different types of variables.

       Distinguish between a population and a sample and know the meaning of random sampling.

       Detect the shape of data distributions.

       Calculate and interpret descriptive measures (means, measures of variability).

       Understand the basic concept and interpretation of a confidence interval.

       Understand the general idea of hypothesis testing.

       Understand the p‐value approach to hypothesis testing.

      Stat Tool 1.1 Statistical Variables and Types of Data

      In statistical studies, several characteristics are observed or measured to obtain information on a phenomenon of interest. The observed or measured characteristics are called statistical variables. Statistical variables differ according to the type of values they store.

      Qualitative or categorical variables can assume values that are qualitative categories and can be either ordinal or nominal.

      Quantitative or numeric variables can assume numeric values and can be discrete or continuous. Discrete data (or count data) are numeric values that can only be integers, such as counts. Continuous data are numeric values (typically instrumental measures) that can be meaningfully subdivided into fractions.


       Example 1.1. For a new shaving oil, it is of interest to compare fragrance A and fragrance B to investigate which is preferred. A sample of female respondents is presented with the two fragrances and asked to state their age and to answer the following question: How suitable or unsuitable is the fragrance for a shaving aid? Each respondent was asked to assign an integer score from 0 (very unsuitable) to 10 (very suitable). Respondents also expressed their purchase intentions by selecting one of the following categories: Probably would not buy it, Neither, Probably would buy it. The variable “Fragrance” is a nominal categorical variable, assuming two different categories: A and B. The variable “Appropriateness” is a discrete quantitative variable assuming values from 0 to 10. “Age” of the respondents is a continuous quantitative variable, and “Purchase intent” is an ordinal categorical variable assuming three different ordered categories.
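      The four variable types in Example 1.1 can be sketched as a small Python data model. This is only an illustration: the class names, field names, and the single respondent record below are hypothetical and do not come from the study itself.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Fragrance(Enum):
    """Nominal categorical variable: the categories have no natural order."""
    A = "A"
    B = "B"


class PurchaseIntent(IntEnum):
    """Ordinal categorical variable: the integer codes only encode the order."""
    PROBABLY_WOULD_NOT_BUY = 1
    NEITHER = 2
    PROBABLY_WOULD_BUY = 3


@dataclass
class Response:
    """One statistical unit (a respondent) with the variables of Example 1.1."""
    fragrance: Fragrance             # nominal categorical
    appropriateness: int             # discrete quantitative, integer score 0..10
    age: float                       # continuous quantitative
    purchase_intent: PurchaseIntent  # ordinal categorical

    def __post_init__(self):
        # Reject scores that are not integers in the range 0..10.
        if not isinstance(self.appropriateness, int):
            raise ValueError("appropriateness must be an integer")
        if not 0 <= self.appropriateness <= 10:
            raise ValueError("appropriateness must be between 0 and 10")


# Hypothetical respondent, invented for illustration only.
respondent = Response(Fragrance.A, 8, 34.5, PurchaseIntent.PROBABLY_WOULD_BUY)
```

      Using an ordered enumeration for purchase intent lets the categories be compared (for example, Neither ranks below Probably would buy it), whereas the fragrance categories can only be tested for equality.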

      In some contexts, you may find different terminology used to refer to similar data types. In quality control, categorical and discrete data are referred to as attributes and continuous data as variables.

      When performing a statistical analysis, take into account the type of variable(s) you have, i.e. is it qualitative or quantitative? Different graphs, descriptive statistics, and inferential procedures must be used to study different types of data.

      Stat Tool 1.2 Statistical Unit, Population, Sample

      A statistical unit is the unit of observation (e.g. entity, person, object, product) for which data are collected. For each statistical unit, qualitative or quantitative variables are observed or measured.

      The whole set of statistical units is the population. A population may be finite or virtually infinite (e.g. all products of a production process).

      A sample is a subset of statistical units (sampling units) selected from the population in a suitable way. The sample size of a study is the total number of sampling units (see Figure 1.1).


       Figure 1.1 Population, samples, sampling units.

      When we use a sample to draw conclusions about a population, sample selection must be performed at random. Random sampling is carried out in such a way as to ensure that no element in the population is given preference over any other. Random sampling is used to avoid nonrepresentative samples of the population.

      In Example 1.1 the statistical unit is the single respondent.
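      The idea of random sampling described above can be sketched with Python's standard library. The population of 1000 numbered units, the sample size of 50, and the helper name simple_random_sample are assumptions made for illustration; the sketch draws a simple random sample without replacement, which is one common way to give no unit preference over any other.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units from a finite population, each equally likely.

    `population` must be a sequence (e.g. a list of unit identifiers).
    `seed` is optional and only makes the draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical finite population: units labelled 1..1000.
population = list(range(1, 1001))

# Sample size of the study: 50 sampling units.
sample = simple_random_sample(population, 50, seed=42)
```

      Fixing the seed is useful when the selection has to be documented and repeated; omitting it gives a different sample on each run.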

      Usually, the first step of a statistical analysis is descriptive analysis, where tables, graphs, and simple measures help to quickly assess and summarize important aspects of sample data.

      When performing descriptive analysis, take into account the type of variable(s) present, i.e. is it qualitative (categorical) or quantitative? Different graphs, descriptive statistics, and inferential procedures have to be used to study different types of data.

      The descriptive phase evaluates the following aspects: the frequency distribution of the data (displayed with a histogram for quantitative variables or a bar graph for categorical variables), mean values, and measures of variability.
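      As a minimal sketch of the descriptive phase, the following Python code summarizes one quantitative and one categorical variable. The scores and purchase-intent answers are invented for illustration, and the helper names frequency_table and describe_quantitative are hypothetical.

```python
from collections import Counter
import statistics


def frequency_table(values):
    """Return {category: (absolute frequency, relative frequency)}."""
    counts = Counter(values)
    n = len(values)
    return {category: (count, count / n) for category, count in counts.items()}


def describe_quantitative(values):
    """Return the sample size, mean, and sample variance and standard deviation."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),  # divides by n - 1
        "std_dev": statistics.stdev(values),
    }


# Hypothetical data for eight respondents.
scores = [7, 8, 6, 9, 7, 5, 8, 7]
intent = [
    "Probably would buy it", "Neither", "Probably would buy it",
    "Probably would not buy it", "Probably would buy it", "Neither",
    "Probably would not buy it", "Probably would buy it",
]

score_summary = describe_quantitative(scores)  # quantitative: mean, variability
score_table = frequency_table(scores)          # discrete scores can also be tabulated
intent_table = frequency_table(intent)         # categorical: frequencies only
```

      The split into two helpers mirrors the point made in the text: means and measures of variability apply to quantitative variables, while categorical variables are summarized through their frequencies.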

      After outlining important sample data characteristics through descriptive statistics, the second step of a statistical analysis is inferential analysis, where sample findings are generalized to the population from which the sample was drawn.

      We often wish to answer questions about our processes or products to make improvements and predictions, save money and time, and increase customer satisfaction:

       What is the stability of a new formulation?