Prof Carla Moreira

The Statistical Analysis of Doubly Truncated Data


and Meier, 1958), which corrects for the fact that some of the recorded values for $X^*$ are smaller than the true ones. With truncated data, every value in the sample corresponds to a true observation of $X^*$; however, the distribution of the observed values may be shifted with respect to the true one due to the truncation event. This difference between truncation and censoring suggests that specific methods should be employed to estimate the target distribution under random truncation. Indeed, Woodroofe (1985) provides a deep analysis of one-sided truncation, introducing the original idea of Lynden-Bell (1971) as a nonparametric maximum likelihood estimator (NPMLE) of the probability distribution in that setting. The estimator in Woodroofe (1985) is a particular case of the estimator corresponding to doubly truncated data, on which this book is focused.

      A variable of interest $X^*$ is said to be doubly truncated by a couple of random variables $(U^*, V^*)$ if the observation of $X^*$ is possible only when $U^* \le X^* \le V^*$ occurs. In such a case, $U^*$ and $V^*$ are called left- and right-truncation variables respectively. Double truncation reduces to left-truncation when $V^*$ degenerates at $\infty$, while it corresponds to right-truncation when $U^* = -\infty$. This book is focused on the problem of estimating the distribution of $X^*$, and other related curves, from a set of iid triplets $(U_i, X_i, V_i)$, $1 \le i \le n$, with the distribution of $(U^*, X^*, V^*)$ given $U^* \le X^* \le V^*$.
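
The sampling mechanism just described can be mimicked by a simple rejection scheme. The sketch below is only an illustration under assumed distributions: the helper names (`sample_doubly_truncated`, `draw_x`, `draw_uv`) and the uniform laws are hypothetical choices, not taken from the book. A triplet is recorded only when the variable of interest falls between its left- and right-truncation limits.

```python
import random

def sample_doubly_truncated(n, draw_x, draw_uv, rng):
    """Draw (u, x, v) triplets until n of them satisfy u <= x <= v."""
    observed = []
    while len(observed) < n:
        x = draw_x(rng)
        u, v = draw_uv(rng)
        if u <= x <= v:  # the triplet is recorded only in this event
            observed.append((u, x, v))
    return observed

# Hypothetical design, chosen only for illustration.
def draw_x(rng):
    # Variable of interest: Uniform(0, 1).
    return rng.uniform(0.0, 1.0)

def draw_uv(rng):
    # Independent truncation limits: U on (0, 0.5), V on (0.5, 1).
    return rng.uniform(0.0, 0.5), rng.uniform(0.5, 1.0)

rng = random.Random(2024)
triplets = sample_doubly_truncated(5000, draw_x, draw_uv, rng)

xs = [x for _, x, _ in triplets]
mean_x = sum(xs) / len(xs)
var_x = sum((x - mean_x) ** 2 for x in xs) / len(xs)
print(f"observed mean {mean_x:.3f}, observed variance {var_x:.3f}")
```

Under this assumed design the observed values follow a triangular law on (0, 1) with variance 1/24, whereas the untruncated Uniform(0, 1) variable has variance 1/12. The naive empirical distribution of the observed values is therefore too concentrated around the centre, which is the kind of distortion a truncation-specific estimator has to undo.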

and $d_1$ (Zhu and Wang, 2012). Then, the right-truncation time is $V^* = d_1 - T_0$, where $T_0$ denotes the date of onset for the time-to-event, and the left-truncation time is $U^* = V^* - \tau$, where $\tau = d_1 - d_0$ is the interval width. The Childhood Cancer Data in Section 1.4.1 is an example of data obtained through interval sampling.

      With interval sampling the variable $V^* - U^*$ is degenerate at $\tau$. This occurs in other sampling schemes too, in which $d_0$ and $d_1$ are certain subject-specific event dates. An illustrative example is given by the Parkinson's Disease Data, see Section 1.4.5, where $V^*$ is the individual age at blood sampling. When $V^* - U^*$ is constant, the couple $(U^*, V^*)$ falls on a line, and its joint density does not exist, even though the truncating variables may be continuous.
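
A minimal sketch of interval sampling may help to see why the two truncation limits are tied together. It assumes an observation window between two calendar dates (called `D0` and `D1` here) and a random onset date; these names, the numerical values and the exponential time-to-event are all hypothetical. An event is recorded only if its calendar date, onset plus time-to-event, falls inside the window, which is the same as the time-to-event lying between `D0 - onset` and `D1 - onset`.

```python
import random

D0, D1 = 10.0, 15.0   # hypothetical calendar window, in years
TAU = D1 - D0         # interval width

def truncation_limits(onset_date, d0=D0, d1=D1):
    """Left and right truncation times implied by observing events in [d0, d1]."""
    return d0 - onset_date, d1 - onset_date

rng = random.Random(7)
sample = []
while len(sample) < 1000:
    onset = rng.uniform(0.0, D1)   # hypothetical onset date
    x = rng.expovariate(0.2)       # hypothetical time-to-event, mean 5 years
    u, v = truncation_limits(onset)
    if u <= x <= v:                # event date falls inside the window
        sample.append((u, x, v))

# Every recorded couple (u, v) satisfies v - u = TAU, up to rounding error.
max_gap_error = max(abs((v - u) - TAU) for u, _, v in sample)
print(max_gap_error)
```

Since the difference between the two limits equals the interval width for every subject, the couples lie on a single line in the plane, whatever the distribution of the onset dates.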

      In other situations, the truncating variables $U^*$ and $V^*$ are not linked through the linear equation $V^* = U^* + \tau$. For example, $U^*$ and $V^*$ could represent some random observation limits beyond which the variable of interest $X^*$ cannot be sampled or detected. Situations like this occur, for example, in Astronomy, as illustrated in Section 1.4.4.

      With random double truncation, both large and small values of $X^*$ are observed in principle with a relatively small probability. However, the real observational bias for $X^*$ varies from application to application, depending on the joint distribution of $(U^*, V^*)$. We will see, for example, that the probability of sampling a value $x$, namely $G(x) = P(U^* \le x \le V^*)$, may be roughly constant, inducing no observational bias; or that it may be roughly decreasing, indicating the dominance of the right-truncation bias relative to the left-truncation bias.
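
The two behaviours of the sampling probability mentioned above can be reproduced numerically. The following sketch estimates, by Monte Carlo, the probability that a fixed value lies between the left- and right-truncation limits. Both truncation designs and all function names are hypothetical, and the truncation couple is assumed to be independent of the variable of interest.

```python
import random

def sampling_probability(x, draw_uv, rng, n_draws=20000):
    """Monte Carlo estimate of P(U <= x <= V) for a fixed value x."""
    hits = 0
    for _ in range(n_draws):
        u, v = draw_uv(rng)
        if u <= x <= v:
            hits += 1
    return hits / n_draws

def design_constant(rng):
    # Interval-type design: U ~ Uniform(-1, 1) and V = U + 1.
    u = rng.uniform(-1.0, 1.0)
    return u, u + 1.0

def design_decreasing(rng):
    # Right truncation only: U = -infinity and V ~ Uniform(0, 1).
    return float("-inf"), rng.uniform(0.0, 1.0)

rng = random.Random(11)
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
g_constant = [sampling_probability(x, design_constant, rng) for x in grid]
g_decreasing = [sampling_probability(x, design_decreasing, rng) for x in grid]

for x, gc, gd in zip(grid, g_constant, g_decreasing):
    print(f"x={x:.1f}  constant design: {gc:.3f}  decreasing design: {gd:.3f}")
```

For values between 0 and 1, the exact sampling probability is 1/2 under the first design and 1 - x under the second, so the estimates should be flat in the first column and decreasing in the second. A flat sampling probability means that the observed values are representative of the target distribution on that range, while a decreasing one means that large values are under-sampled.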

      Another issue of relevance is that of the identifiability of the distribution of $X^*$. Intuitively it is clear that with doubly truncated data it is only possible to estimate the distribution of $X^*$
conditional on
, where
and