Study design, participants and data
The IBS has been described previously [16]. Briefly, the IBS was an observational longitudinal immuno-epidemiological study. The study successfully enrolled infants born to mothers with (n = 132) or without (n = 150) LTBI and followed them up for one year. Since the study fell short of the targeted 150 respondents for mothers with LTBI, there was a slight reduction in power from 80 to 78% for the specified scenario. Blood samples were taken at selected time points throughout infancy, with more frequent sampling in early infancy when immune responses have been shown to change most rapidly [17]. There was random assignment of individual infants to two sampling strategies, in a 1:1 ratio, to reduce the blood-sampling burden. The first sampling strategy comprised collection of 2 ml venous blood at 1, 6, and 14 weeks, and 5 ml venous blood at 52 weeks; the second sampling strategy comprised collection of 2 ml venous blood at 4, 10, and 24 weeks, and 5 ml venous blood at 52 weeks. All infants were BCG immunised at birth or within the first week of life with BCG (Statens Serum Institut (SSI), Denmark). Immunological parameters measured included 7 cytokine responses (IL-2, IL-5, IL-10, IL-13, IL-17A, TNF, and IFN-γ) to PPD. These were assessed by Luminex (Bio-Rad Luminex® 200 system and Bioplex Manager Software version 6.1 (Bio-Rad)) in 6-day whole blood cultures, in cord blood and at weeks 1, 4, 6, 10, 14, 24 and 52. Net cytokine response values were log-transformed to meet distributional assumptions while fitting mixed effects models.
Conceptual framework
Figure 1 highlights the multivariate longitudinal outcomes, cytokine responses (IL-2, IL-5, IL-10, IL-13, IL-17A, TNF, and IFN-γ) to PPD. It shows the underlying immunological functions for the different cytokines namely proinflammatory, T helper (Th)2, Th17 and T-cell regulation. The figure also shows that mixed effects and pairwise joint models were applied to study the evolution of cytokine responses over time and how that evolution depends on maternal LTBI status.
Statistical software
Data analysis was conducted using SAS version 9.4 (SAS Institute, Cary NC, USA), Stata 15.0 (College Station, Texas, USA) and R version 3.6.0 (R Foundation for Statistical Computing, Vienna, Austria). A 5% significance level was used for all analyses.
Participants’ baseline characteristics
Baseline characteristics of participants were summarised, by LTBI status, using percentages, means and standard deviations, and medians and interquartile ranges.
Simple analyses ignoring correlations between time points and outcomes
Simple analyses often reported for such cytokine response data include t-tests if the distributions are approximately normal and non-parametric alternatives such as Mann–Whitney tests when the distributional assumptions are relaxed. Such tests are usually conducted separately for individual cytokine responses and separately at each time point. Adopting this simple approach, our initial analyses employed Mann–Whitney tests comparing responses between infants born of mothers with or without LTBI. Conservative Bonferroni corrections were applied to demonstrate their use in adjusting for multiple comparisons at each time point.
Graphical exploration of longitudinal profiles
Individual profile plots were constructed for each of the seven longitudinal outcomes to obtain insight into how infant responses evolved over time as well as to give an indication of between and within infant variability. When this kind of variability is present it provides the motivation for modelling approaches which take this into consideration, most commonly via the specification of random effects.
Average profile plots were then constructed, for each outcome, to describe the mean evolution of infant responses, disaggregated by maternal LTBI status. These plots give an indication of the functional form of the evolution and an initial idea of whether this evolution differs by maternal LTBI status.
Analyses allowing for longitudinal data but ignoring correlations between different outcomes
Changes in log-transformed cytokine responses over time, by mother’s LTBI status, were studied using univariate linear mixed models (LMM) adjusted for factors that showed baseline differences between the two groups.
The use of the linear mixed model for analysing univariate longitudinal data has been discussed extensively [3, 18,19,20,21]. The model handles continuous longitudinal data in an easy, valid and flexible manner [22] and can be used for data with an unequal number of measurements per subject [18]. The model is defined as
$${\varvec{Y}}_{i} = {\varvec{X}}_{i} {\varvec{\beta}} + {\varvec{Z}}_{i} {\varvec{b}}_{i} + {\varvec{\varepsilon}}_{i} ;\quad {\varvec{b}}_{i} \sim N\left( {0, D} \right); {\varvec{\varepsilon}}_{i} \sim N\left( {0, {\varvec{\varSigma}}_{i} } \right);\quad {\varvec{b}}_{i} , {\varvec{\varepsilon}}_{i} \,are\, independent$$
where Yi is the ni dimensional response vector for the ith subject, 1 ≤ i ≤ N, N is the number of subjects, Xi and Zi are (ni × p) and (ni × q) dimensional matrices of known covariates, β is a p-dimensional vector of fixed effects, bi is a q-dimensional vector of random effects, εi is an ni-dimensional vector of residual components, D is a covariance matrix of random effects and Σi is a covariance matrix of residuals [18]. Random effects represent an aggregation of all unobserved or unmeasured factors that make individuals respond differently to each other [21].
Longitudinal immuno-epidemiological data are often characterised by non-linear patterns over time. Such patterns can still be handled under the linear mixed effects models framework with power transformations of time. Fractional polynomials (FPs) are characterised by power terms which can be negative values and/or fractions. Conventional polynomials (CPs) are a special case of FPs with power terms having only integer values. FPs have been shown to have more favourable characteristics than higher order CPs when modelling non-linear growth curves within the context of the linear mixed model [23,24,25]. The R package ‘mfp’ [26] was used to determine the best fitting FP for each outcome. Graphics and criteria such as the Akaike Information Criteria (AIC), Deviance, R2 and adjusted R2 were used to compare the best fitting FP to the best higher order CP (which had been chosen using graphical means).
Under the LMM framework, likelihood based tests, specifically Restricted Maximum Likelihood (REML), were used to check the need for inclusion of various serial correlation structures (simple, autoregressive, compound symmetry, unstructured, exponential, power and gaussian). Mixtures of chi-square distributions were then used to assess the need for extending the random effects structures. Finally, to discover the most parsimonious mean structure, likelihood ratio tests were employed under maximum likelihood estimation.
Analyses allowing for longitudinal data and correlations between outcomes
Patterns of correlation between the different outcomes were expected, for instance, certain cytokines are associated with particular cell types or functions (such as T-helper (Th)1, Th2 or regulatory functions). A statistical modelling approach which accounts for such correlation structures was necessary, to provide additional insight into the data. The seven univariate linear mixed models were thus combined into a full multivariate model resulting in a 7 × 7 covariance matrix for random effects (if only random intercepts are considered) and a 7 × 7 covariance matrix for error components. Each of these two matrices had 28 variance–covariance components (i.e. 7 variance components + 21 covariance components). This then resulted in a total of 28*2 = 56 covariance parameters altogether. This dimensionality of random effects is too large for standard software packages to handle.
The pairwise joint modelling approach was introduced as a novel procedure for fitting such random effects models without restricting the dimensionality [14]. The general idea is that all parameters in the full multivariate model can be identified from fitting all bivariate models for each pair of outcomes separately, then, afterwards, estimates are averaged to obtain one single estimate for each parameter of the full joint model. For standard errors, in order to correctly calculate the sampling variability of the estimates from the pairwise approach, pseudo-log-likelihood estimation is applied [14, 27]. This approach was applied to our data in order to properly account for and to study the strength of association between the evolutions of cytokine responses over time. Parameter estimates from this approach were used to construct a Wald-type test statistic for the joint effect of maternal LTBI on evolution of all outcomes together. Principal components analysis (PCA) was then carried out on the 7 × 7 correlation matrix of random effects and the results were compared to PCA of cord blood cytokine responses to establish whether the clustering of responses before immunisation with BCG differed from the clustering of responses after BCG. A SAS macro [28] was adapted and used to implement the pairwise joint modelling approach.