may be possible or necessary to access IPD for one or more trials via a data‐sharing repository or platform. For example, there is a reasonable likelihood that investigators of trials of newer therapies, or of trials with more recently published results, will have had to comply with a funder‐mandated data‐sharing policy requiring that IPD are uploaded to a repository. As described in detail in Section 3.2.2, repositories may not include all trials of relevance to a particular meta‐analysis, or the trials may be distributed across multiple platforms. Therefore, accessing trial IPD in this way can present challenges for the central collection, management, checking and analysis of data, and thereby potentially limit the ability to realise all the advantages generally associated with IPD meta‐analysis projects.45 The central research team may need to find ways to work around these issues.
The IPD from a repository may not contain all the variables required to conduct the planned analyses, or these may not be defined in a standard way across different repositories, making data harmonisation difficult. In such circumstances, it may be possible for an investigator to upload a more appropriate dataset, if the central research team can persuade them of the value of doing so. For example, they might emphasise that this is important not only for the current IPD meta‐analysis, but also for subsequent projects that may make use of their trial IPD, and therefore may ultimately prevent data providers receiving repeated requests for further data. However, this would take additional time and resources on the part of both the data provider and the central IPD meta‐analysis research team.
In order to preserve participant confidentiality, often the IPD contained in a repository are subject to a greater degree of de‐identification (Section 4.4.1) than might be required or expected of IPD obtained directly from investigators. This can limit the ability to thoroughly check data (Section 4.5) and assess risk of bias (Section 4.6), and moreover, the opportunity to query any anomalies directly may be lost. Once again, it may be possible to query aspects of the trial, or its associated IPD, through additional contact with trial investigators (including the trial statistician), or to request that they run some validity or risk of bias checks on behalf of the research team. It should be noted, though, that for some platforms, the associated data‐sharing agreements require that all queries are mediated through them.
Provided appropriate data are available, are more or less in the same format across trials, and can be downloaded from the relevant repository to use locally, there should not be any restrictions on the analyses of IPD. However, if access to IPD (and therefore data management and analyses) is confined to within a platform, it will only be possible to download results (e.g. regression model parameter estimates and confidence intervals), and this will restrict the meta‐analysis to a two‐stage approach (Part 2).
4.5 Checking and Harmonising Incoming IPD
Checking IPD, and harmonisation across trials, are integral and necessary components of an IPD meta‐analysis project. The checking process ensures that any missing data, or major errors in the supplied IPD, can be identified and brought diplomatically to the attention of the trial investigators. Often, the problems that arise turn out to be simple errors or reflect misunderstandings, which can be resolved readily, with major problems being rare. As well as preventing serious issues in the meta‐analysis, checking the IPD also promotes a better understanding of each trial by the central research team, and their independent scrutiny of the IPD can enhance the IPD project’s credibility.
Often trials will have collected and defined data items in a variety of ways, and it will be necessary to re‐code or re‐define certain variables to a common format, in preparation for meta‐analysis. Depending on how data have been provided, and whether trial teams have followed the supplied data dictionary, this can be an involved process, as described next. Checking and harmonising data requires care and meticulous attention to detail, in order to avoid misunderstandings and the introduction of errors. Contact with those supplying data may be needed to resolve any uncertainties or sanction particular changes.
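To illustrate this kind of re‐coding, the sketch below maps trial‐specific codings of a sex variable onto a single common format. The trial names, codes and target coding shown are hypothetical assumptions for illustration only; in practice the mappings would come from each trial’s data dictionary and be agreed with the data providers.

```python
# A minimal sketch of harmonising a variable to a common coding across trials.
# All trial names and code values here are hypothetical examples.

# Agreed common coding for the meta-analysis dataset.
COMMON = {"male": 1, "female": 2}

# Trial-specific codings, as documented in each trial's data dictionary.
TRIAL_CODINGS = {
    "trial_A": {1: "male", 2: "female"},       # recorded as 1/2
    "trial_B": {"M": "male", "F": "female"},   # recorded as "M"/"F"
}

def harmonise_sex(trial, value):
    """Map a trial-specific code onto the common coding.

    Returns None for unrecognised codes, so that unexpected values are
    flagged for querying with the trial team rather than silently recoded.
    """
    label = TRIAL_CODINGS[trial].get(value)
    return COMMON.get(label)
```

Returning None for unrecognised codes, rather than guessing, keeps unexpected values visible so they can be queried with the trial team.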
4.5.1 The Process and Principles
Data management is usually done in a number of steps, with queries back and forth to trial investigators or other data contacts, to resolve any issues and ensure accuracy (Figure 4.1), ultimately leading to trial data that are in the best possible shape for the IPD meta‐analysis. Therefore, it is one of the most involved stages of an IPD project, and needs sufficient time and resources. Conducting the individual checking and harmonisation steps, and waiting for responses to queries across multiple trials, can take place over many months. As different members of the research team may handle different trials at different times, keeping track of the process and outputs can be challenging. Producing a detailed plan, and adopting a standardised approach to data checking and harmonisation, will help to ensure that the process is implemented consistently across trials, and between those managing the IPD. For example, using a checklist for all data checkers to follow, together with a common suite of statistical analysis code, can help to ensure and maintain consistency.
Regardless of the extent of checking and data transformation needed, it is always sensible to use formal database or statistical software code to carry out the different steps. The code and the associated outputs help to maintain a detailed log of the checks, and any conversions or modifications to the data, thereby providing a comprehensive and transparent audit trail for each trial. It is also important to record where checks have identified problems, how these were (or were not) resolved, and equally to record where no problems were identified.
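The principle of a scripted audit trail can be sketched as follows: each check is a small function run by code, and every outcome is logged, including checks that pass, so the full checking history for a trial is reproducible. The check names and log format here are illustrative assumptions, not a prescribed standard.

```python
# Sketch of a scripted audit trail for trial-level data checks.
# Each check function takes the trial data and returns a list of
# problem descriptions (an empty list means the check passed).

import datetime

def run_checks(data, checks):
    """Run named checks over a trial dataset, logging every outcome."""
    log = []
    for name, check in checks:
        problems = check(data)
        log.append({
            "check": name,
            "timestamp": datetime.datetime.now().isoformat(timespec="seconds"),
            "status": "problems found" if problems else "no problems",
            "details": problems,  # empty when the check passed
        })
    return log

# Hypothetical example check: ages should be non-negative.
def check_non_negative_age(data):
    return [f"row {i}: age {a}" for i, a in enumerate(data["age"]) if a < 0]
```

Logging passes as well as failures, with timestamps, means the summary document for each trial can simply collate these log entries.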
The information generated for each trial may be held on a number of forms, spreadsheets or as output from statistical software. A summary document is a useful means of bringing together the various elements of checking, querying and decision‐making, and might include hyperlinks to the different outputs, together with correspondence from trial teams. Where resources allow, ideally two individuals would independently check each trial, blinded to the other’s results, and compare and discuss the findings. At the very least, another research team member should review the checking results, and discuss problems arising. Any major or sensitive issues should be raised with senior research team members, prior to any dialogue with the trial investigators.
In the following sub‐sections, we suggest a range of checks for IPD obtained from randomised trials evaluating treatment effects,7,9,43,101 but most of these are applicable to other types of primary study.
4.5.2 Initial Checking of IPD for Each Trial
When IPD are received for a trial, and often before processing the data further, it is worth conducting some preliminary checks. For example, it is useful to confirm that all the participants randomised appear to have been included, and check that there are no obvious omissions or duplicates in the sequence of participant identifiers (if they have been provided). Similarly, it is helpful to check which outcomes, baseline covariates and other variables are included in the IPD, and whether any that are ‘missing’ were truly not collected in the trial (called systematically missing variables; Chapter 18) or were recorded, but not included in the IPD supplied. If the latter, then either more complete IPD should be requested again, or a full explanation for non‐provision sought.
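These preliminary checks can be sketched in code. The example below, which assumes integer participant identifiers and simple column‐name lists, flags duplicate identifiers, gaps in the identifier sequence, a mismatch with the reported number randomised, and variables that appear systematically missing; the function and variable names are hypothetical.

```python
# Illustrative sketch of initial IPD checks on a newly received trial dataset.
# Assumes participant identifiers are integers; names are hypothetical.

def check_participant_ids(ids, expected_n=None):
    """Flag duplicates and gaps in the participant identifier sequence."""
    seen, duplicates = set(), []
    for pid in ids:
        if pid in seen:
            duplicates.append(pid)
        seen.add(pid)
    gaps = sorted(set(range(min(seen), max(seen) + 1)) - seen)
    issues = {"duplicates": duplicates, "gaps": gaps}
    if expected_n is not None and len(ids) != expected_n:
        # Supplied IPD does not match the reported number randomised.
        issues["count_mismatch"] = (len(ids), expected_n)
    return issues

def possibly_missing_variables(supplied_columns, required_columns):
    """Variables needed for the planned analyses but absent from the IPD.

    Whether these were truly not collected (systematically missing) or
    simply not supplied must be confirmed with the trial investigators.
    """
    return sorted(set(required_columns) - set(supplied_columns))
```

Any flags raised would then be queried with the trial team, rather than corrected unilaterally.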
For time‐to‐event outcomes, it is worth checking that the extent of follow‐up for each trial is sufficient for the condition and outcome of interest, and encouraging trial investigators to supply up‐to‐date follow‐up information where possible.7 Although short follow‐up may not introduce bias, it might prevent a trial, and therefore an IPD meta‐analysis, picking up benefits or harms of interventions that take