Prof Carla Moreira

The Statistical Analysis of Doubly Truncated Data



$a_U$, $b_U$, $a_V$ and $b_V$ denote respectively the lower and upper endpoints of the supports of $U$ and $V$ (see Chapter 2 for details). This may have important practical consequences, as we will see. On the other hand, in applications with doubly truncated survival data, the estimates refer to the susceptible population, that is, to individuals for whom the terminal event of interest is certain to occur. This contrasts with the standard analysis of survival times, where a portion of the individuals may belong to the so-called cured fraction (immunes). This should be taken into account when interpreting the results of the analysis.

      In this section we introduce the datasets that will be used throughout the book for illustration purposes. All of them suffer from double truncation. These examples are available in the latest version of the DTDA package (Moreira et al., 2021a).

      1.4.1 Childhood Cancer Data

      The Childhood Cancer Data were gathered from the IPO (Instituto Português de Oncologia) of Porto, Portugal, by the RORENO (Registro Oncológico do Norte) service. The information corresponds to all children diagnosed with cancer between 1 January 1999 and 31 December 2003 in the region of North Portugal, which includes five districts: Porto, Braga, Bragança, Vila Real and Viana do Castelo. The variable of main interest, $X$, is the age at diagnosis (time in years) which, by definition of childhood cancer, is supported on the interval $[0, 15)$. The number of cases was 409. However, for three cases the relevant information was not available, so we only consider the $n = 406$ children who report complete information.

      Because of the interval sampling, the age at diagnosis $X$ is doubly truncated by the pair $(U, V)$, where the right-truncation variable $V$ is the time in years from birth (the date of onset) to 31 December 2003, and $U = V - 5$. The $n = 406$ observed triplets $(U_i, X_i, V_i)$, $i = 1, \ldots, n$, were reported in Moreira and de Uña‐Álvarez (2010), while de Uña‐Álvarez (2020) included the cancer group in the statistical analysis. Ordinary descriptive statistics can be applied to the information gathered along this 5-year window to compute, for instance, the average age at cancer diagnosis. However, if the goal is to describe the population of children who eventually develop cancer, the double truncation issue should be acknowledged and properly corrected to avoid potential biases.

The observed values for $U$ range between $-4.5$ and $14.5$ (years); equivalently, the observed values for $V$ range between $0.5$ and $19.5$. This means that the lower and upper endpoints of $U$ and $V$ satisfy $a_U < 0$ and $b_V > 15$. Thus, in this case, the target variable $X$ is observable on its whole support $[0, 15)$, and there are no identification issues for $F$, the cdf of $X$. Information on $X$ is summarized in Table 1.1.
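The endpoint argument rests on one small step. The smallest observed $U$ bounds $a_U$ from above, and the largest observed $V$ bounds $b_V$ from below. So if the observed extremes already satisfy the inequalities, the true support endpoints do too. The hypothetical helper below checks this using the ranges reported in the text. It is a sketch that assumes $X$ is supported on $[0, 15)$ and takes the two endpoint conditions to be $a_U \le a_X$ and $b_X \le b_V$.

```python
# Sketch: do the observed truncation ranges rule out identification problems?
# The sample minimum of U bounds a_U from above and the sample maximum of V
# bounds b_V from below, so checking the observed extremes is sufficient.

OBSERVED_U = (-4.5, 14.5)   # (min, max) of observed U, years
OBSERVED_V = (0.5, 19.5)    # (min, max) of observed V, years
SUPPORT_X = (0.0, 15.0)     # assumed support of X: ages in [0, 15)


def endpoints_ok(range_u, range_v, support_x):
    """True if the observed extremes already imply a_U <= a_X and b_X <= b_V."""
    a_x, b_x = support_x
    return range_u[0] <= a_x and b_x <= range_v[1]


if __name__ == "__main__":
    # V = U + 5 for this dataset, so the two ranges should differ by 5 years
    assert all(abs((hv - hu) - 5.0) < 1e-9
               for hu, hv in zip(OBSERVED_U, OBSERVED_V))
    print(endpoints_ok(OBSERVED_U, OBSERVED_V, SUPPORT_X))
```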

Table 1.1 Sample size ($n$) and mean (and standard deviation, SD) for the age at diagnosis (years).

Group                 n     Mean (SD)
All                   406   6.47 (4.50)
By gender   Female    178   6.43 (4.51)
            Male      228   6.51 (4.51)