where $\emptyset$ denotes the empty set.
Definition 2.1
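A sampling design without replacement $p(\cdot)$ is a probability distribution on the set of samples $\mathcal{S}$ such that
$$p(s) \geq 0 \text{ for all } s \in \mathcal{S} \quad \text{and} \quad \sum_{s \in \mathcal{S}} p(s) = 1.$$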
Definition 2.2
A random sample $S$ is a random variable whose values are the samples:
$$\Pr(S = s) = p(s), \quad \text{for all } s \in \mathcal{S}.$$
A random sample can also be defined as a discrete random vector composed of non-negative integer variables $\mathbf{a} = (a_1, \dots, a_k, \dots, a_N)^\top$. The variable $a_k$ represents the number of times unit $k$ is selected in the sample. If the sample is without replacement, then $a_k$ can only take the values 0 or 1 and therefore has a Bernoulli distribution. In general, the random variables $a_1, \dots, a_N$ are not independent except in very special cases. The use of indicator variables $a_k$ was introduced by Cornfield (1944) and greatly simplified the notation in survey sampling theory because it allows us to clearly separate the values of the variables $x_k$ or $y_k$ from the source of randomness $\mathbf{a}$.
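To make these definitions concrete, the following minimal sketch stores a sampling design without replacement as a Python dictionary mapping each sample to its probability. The three-unit population, the probabilities, and the helper names (`is_valid_design`, `indicator_vector`, `draw_sample`) are assumptions chosen for illustration, not part of the theory above.

```python
import random

# Hypothetical toy population U = {1, 2, 3}; the probabilities are made up.
U = [1, 2, 3]
design = {
    frozenset({1, 2}): 0.5,
    frozenset({1, 3}): 0.3,
    frozenset({2, 3}): 0.2,
}


def is_valid_design(p, tol=1e-12):
    """Check that every p(s) is non-negative and that the p(s) sum to 1."""
    return all(prob >= 0 for prob in p.values()) and abs(sum(p.values()) - 1.0) < tol


def indicator_vector(s, population):
    """Return the 0/1 vector whose k-th entry is 1 when unit k belongs to s."""
    return [1 if k in s else 0 for k in population]


def draw_sample(p, rng):
    """Draw one random sample S with Pr(S = s) = p(s)."""
    samples = list(p)
    weights = [p[s] for s in samples]
    return rng.choices(samples, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only so that repeated runs agree
s = draw_sample(design, rng)
a = indicator_vector(s, U)
```

Here `a` is one realization of the indicator vector: the data values are fixed, and all the randomness sits in which sample is drawn.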
Often, we try to select the sample as randomly as possible. The usual measure of randomness of a probability distribution is the entropy.
Definition 2.3
The entropy of a sampling design is the quantity
$$I(p) = -\sum_{s \in \mathcal{S}} p(s) \log p(s).$$
We suppose that $0 \log 0 = 0$.
We can search for sampling designs that maximize the entropy, with constraints such as a fixed sample size or given inclusion probabilities (see Section 2.3). A high-entropy sampling design, that is, a very random one, has better asymptotic properties and allows more reliable inference (Berger, 1996, 1998a; Brewer & Donadio, 2003).
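As an illustration, the entropy of a small design can be computed directly from its definition. The sketch below assumes a design stored as a dictionary from samples to probabilities; the three example designs are hypothetical, and terms with zero probability are skipped to implement the convention that 0 log 0 equals 0.

```python
import math


def entropy(p):
    """Entropy of a design: minus the sum of p(s) * log p(s), with 0 log 0 = 0."""
    return -sum(prob * math.log(prob) for prob in p.values() if prob > 0)


# Three hypothetical designs on the same samples of size 2 from U = {1, 2, 3}.
uniform = {frozenset({1, 2}): 1 / 3, frozenset({1, 3}): 1 / 3, frozenset({2, 3}): 1 / 3}
unequal = {frozenset({1, 2}): 0.5, frozenset({1, 3}): 0.3, frozenset({2, 3}): 0.2}
degenerate = {frozenset({1, 2}): 1.0, frozenset({1, 3}): 0.0, frozenset({2, 3}): 0.0}

h_uniform = entropy(uniform)
h_unequal = entropy(unequal)
h_degenerate = entropy(degenerate)
```

Among these three, the uniform design has the largest entropy (log 3) and the degenerate design, which always selects the same sample, has entropy zero.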
The sample size $n_S$ is the number of units selected in the sample. We can write
$$n_S = \sum_{k \in U} a_k.$$
When the sample size is not random, we say that the sample is of fixed sample size and we simply denote it by $n$.
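The variables are observed only on the units selected in the sample. A statistic $T$ is a function of the values $y_k$ that are observed on the random sample: $T = f(y_k, k \in S)$. This statistic takes the value $t(s)$ on the sample $s$. The expectation under the design is defined from the sampling design:
$$E_p(T) = \sum_{s \in \mathcal{S}} p(s)\, t(s).$$
The variance operator is defined using the expectation operator:
$$\mathrm{var}_p(T) = E_p\left\{[T - E_p(T)]^2\right\} = E_p(T^2) - [E_p(T)]^2.$$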
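Because the design is a discrete distribution over samples, the design expectation and variance of a statistic can be obtained by enumerating the samples. The sketch below is only an illustration: the design, the values `y`, and the choice of the sample mean as the statistic are assumptions.

```python
# Hypothetical design on U = {1, 2, 3} and hypothetical values of the variable of interest.
design = {
    frozenset({1, 2}): 0.5,
    frozenset({1, 3}): 0.3,
    frozenset({2, 3}): 0.2,
}
y = {1: 10.0, 2: 20.0, 3: 60.0}


def sample_mean(s):
    """Value taken by the statistic on sample s: the mean of the observed values."""
    return sum(y[k] for k in s) / len(s)


def design_expectation(p, t):
    """Expectation under the design: sum over samples of p(s) * t(s)."""
    return sum(prob * t(s) for s, prob in p.items())


def design_variance(p, t):
    """Variance under the design: expectation of the squared deviation from the mean."""
    mean = design_expectation(p, t)
    return sum(prob * (t(s) - mean) ** 2 for s, prob in p.items())


expectation = design_expectation(design, sample_mean)
variance = design_variance(design, sample_mean)
# Second form of the variance: E(T^2) - [E(T)]^2.
variance_alt = design_expectation(design, lambda s: sample_mean(s) ** 2) - expectation ** 2
```

Both forms of the variance agree up to rounding error, which is a convenient check when experimenting with other designs or statistics.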
2.3 Inclusion Probabilities
The inclusion probability $\pi_k$ is the probability that unit $k$ is selected in the sample. This probability is, in theory, derived from the sampling design:
$$\pi_k = \Pr(k \in S) = E_p(a_k) = \sum_{s \in \mathcal{S}:\, s \ni k} p(s),$$
for all $k \in U$. In sampling designs without replacement, the random variables $a_k$ have Bernoulli distributions with parameter $\pi_k$. There is no particular reason to select units with equal probabilities. However, it will be seen below that it is important that all inclusion probabilities be nonzero.
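For a design small enough to enumerate, the first-order inclusion probabilities follow directly from the definition: for each unit, add up the probabilities of the samples that contain it. The design and the function name below are hypothetical and serve only as a sketch.

```python
# Hypothetical design without replacement on U = {1, 2, 3}.
U = [1, 2, 3]
design = {
    frozenset({1, 2}): 0.5,
    frozenset({1, 3}): 0.3,
    frozenset({2, 3}): 0.2,
}


def inclusion_probabilities(p, population):
    """For each unit k, sum p(s) over the samples s that contain k."""
    return {k: sum(prob for s, prob in p.items() if k in s) for k in population}


pi = inclusion_probabilities(design, U)
all_nonzero = all(value > 0 for value in pi.values())
```

In this example the units have unequal inclusion probabilities (roughly 0.8, 0.7 and 0.5), all of which are nonzero.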
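The second-order inclusion probability (or joint inclusion probability) $\pi_{k\ell}$ is the probability that units $k$ and $\ell$ are selected together in the sample:
$$\pi_{k\ell} = \Pr(k \in S \text{ and } \ell \in S) = E_p(a_k a_\ell) = \sum_{s \in \mathcal{S}:\, s \supset \{k, \ell\}} p(s),$$
for all $k, \ell \in U$. In sampling designs without replacement, when $k = \ell$, the second-order inclusion probability reduces to the first-order inclusion probability, in other words $\pi_{kk} = \pi_k$ for all $k \in U$.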
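The variance of the indicator variable $a_k$ is denoted by
$$\Delta_{kk} = \mathrm{var}_p(a_k) = E_p(a_k^2) - [E_p(a_k)]^2 = \pi_k - \pi_k^2 = \pi_k(1 - \pi_k),$$
which is the variance of a Bernoulli variable. The covariances between indicators are
$$\Delta_{k\ell} = \mathrm{cov}_p(a_k, a_\ell) = E_p(a_k a_\ell) - E_p(a_k) E_p(a_\ell) = \pi_{k\ell} - \pi_k \pi_\ell.$$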
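The joint inclusion probabilities and the variances and covariances of the indicators can be computed by the same kind of enumeration. The following sketch assumes a hypothetical design on three units; the function names are illustrative, and the covariance of two indicators is computed as the joint inclusion probability minus the product of the two first-order inclusion probabilities.

```python
# Hypothetical design without replacement on U = {1, 2, 3}.
U = [1, 2, 3]
design = {
    frozenset({1, 2}): 0.5,
    frozenset({1, 3}): 0.3,
    frozenset({2, 3}): 0.2,
}


def joint_inclusion_probabilities(p, population):
    """For each pair (k, l), sum p(s) over the samples containing both k and l."""
    return {
        (k, l): sum(prob for s, prob in p.items() if k in s and l in s)
        for k in population
        for l in population
    }


def indicator_covariances(p, population):
    """Covariance of the indicators of k and l: joint probability minus the product."""
    joint = joint_inclusion_probabilities(p, population)
    # The diagonal entry (k, k) is the first-order inclusion probability of k.
    return {
        (k, l): joint[(k, l)] - joint[(k, k)] * joint[(l, l)]
        for k in population
        for l in population
    }


joint = joint_inclusion_probabilities(design, U)
delta = indicator_covariances(design, U)
```

On the diagonal, `delta[(k, k)]` equals the Bernoulli variance of the indicator of unit `k`. The off-diagonal entries are negative in this example: with a fixed sample size of two, selecting one unit makes it less likely that another given unit is also selected.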
One can also use matrix notation. Let
$$\mathbf{a} = (a_1, \dots, a_k, \dots, a_N)^\top$$
be a column vector. The