Evaluation of regression methods when immunological measurements are constrained by detection limits
- Hae-Won Uh^{1, 2}Email author,
- Franca C Hartgers^{2},
- Maria Yazdanbakhsh^{2} and
- Jeanine J Houwing-Duistermaat^{1}
https://doi.org/10.1186/1471-2172-9-59
© Uh et al; licensee BioMed Central Ltd. 2008
Received: 17 March 2008
Accepted: 17 October 2008
Published: 17 October 2008
Abstract
Background
The statistical analysis of immunological data may be complicated because precise quantitative levels cannot always be determined. Values below a given detection limit may not be observed (nondetects), and data with nondetects are called left-censored. Since nondetects cannot be considered as missing at random, a statistician faced with data containing these nondetects must decide how to combine nondetects with detects. Till now, the common practice is to impute each nondetect with a single value such as a half of the detection limit, and to conduct ordinary regression analysis. The first aim of this paper is to give an overview of methods to analyze, and to provide new methods handling censored data other than an (ordinary) linear regression. The second aim is to compare these methods by simulation studies based on real data.
Results
We compared six new and existing methods: deletion of nondetects, single substitution, extrapolation by regression on order statistics, multiple imputation using maximum likelihood estimation, tobit regression, and logistic regression. The deletion and extrapolation by regression on order statistics methods gave biased parameter estimates. The single substitution method underestimated variances, and logistic regression suffered loss of power. Based on simulation studies, we found that tobit regression performed well when the proportion of nondetects was less than 30%, and that taken together the multiple imputation method performed best.
Conclusion
Based on simulation studies, the newly developed multiple imputation method performed consistently well under different scenarios of various proportion of nondetects, sample sizes and even in the presence of heteroscedastic errors.
Background
The number of immunological parameters that can be measured in large scale epidemiological studies has been rapidly increasing. Not all of these quantitative levels can be determined precisely. Reasons for this lack of precision are that the signal produced by the stimulant is too small for the instrumentation to discriminate the signal from the background noise, or a signal is registered, but certain (laboratory) criteria that identify the substance are not met. Values that cannot be quantified are called nondetects (NDs). We assume that all NDs are below a given detection limit (DL), and therefore we are dealing with censored data. Simple solutions such as deletion of NDs and single value substitution are often used, but it is unknown to what extent these methods provide unbiased results and thus would be adequate for the analysis. Applying various approaches yielded different parameter estimates in the environmental studies [1, 2].
When the number of NDs is rather small, one approach of dealing with NDs is simply dropping NDs and apply linear regression to the remaining data. A second commonly used approach is to substitute NDs with a certain value smaller than the DL (0, DL/2 or DL) and to use linear regression [3, 4]. The validity of these approaches will depend on the number and the unknown range of NDs. A third common practice is to dichotomize the cytokine measurements based on a certain cut-off point (DL or median) and to apply logistic regression to this binary variable [3, 5]. A major drawback of this approach is that by dichotomizing much information is lost. Note also that the choice of 0, DL/2 or DL in the single value substitution and the threshold in the logistic regression approach is arbitrary. An important issue is then how to decide which method is optimal for a particular data set. Moreover, more sophisticated statistical methods may be needed for analyzing this type of data.
Description of cytokine data
Cytokine | sample size | proportion of NDs | DL |
---|---|---|---|
1 | 181 | 5.5% | 10 pg/ml |
2 | 173 | 66% | 5 pg/ml |
Considering the efforts made by collecting data, it seems worth while to investigate sophisticated and (maybe) time-consuming statistical methods to analyze data appropriately [8]. In this paper we review several commonly used methods in immunology and more advanced methods used in other fields such as environmetrics and econometrics [1, 9, 10]. A second goal is to evaluate the performances of these methods via simulation studies [2, 11, 12]. The validity and precision of simple methods such as deletion and single value substitution will be studied for various scenarios including different proportions on ND's and different error models. In addition the utility of advanced statistical methods will be quantified.
Results and discussion
Results of simulation studies
Methods used for comparing
Methods | Description | Software | Disadvantage |
---|---|---|---|
Deletion | Remove NDs. | Any statistical package | Bias |
DL/2 | Substitute each ND with half of the value of DL. | Any statistical package | Large RMSE for large proportion of NDs |
ROS | After computing a linear regression for data versus their normalized scores below-DL values are extrapolated under distributional assumption. | Underestimation of variance for large proportion of NDs | |
MI | Estimation of mean and standard deviation by MLE. Creating 10 complete samples. Pool the results from 10 individual analyses. | R (software available on request) | Bias for small proportion of NDs |
TOBIT | Parametric estimation method for incorporating NDs. | Sensitive to heteroscedastic errors | |
LOGIT | Create binary dependent variable of NDs (0s) and detects (1s). | Any statistical package | Loss of information & parameter estimates are less interpretable |
The three rows display the results from the three different covariates. For the quantitative covariates generated from three-component mixture (row 1) and two-component mixture (row 2), the simulation results were similar. Therefore, the results from the three-component mixture imitating Cytokine 1 are discussed in details.
Variance of estimates provided by the MI, TOBIT and LOGIT approaches at various proportions of nondetects (entries are averages of 1000 repetitions)
Sample size of 200 | Sample size of 400 | Sample size of 1000 | |||||||
---|---|---|---|---|---|---|---|---|---|
% NDs | MI | TOBIT | Logit | MI | TOBIT | Logit | MI | TOBIT | Logit |
10% | 0.0050 | 0.0044 | 0.0358 | 0.0022 | 0.0022 | 0.0164 | 0.0008 | 0.0009 | 0.0063 |
30% | 0.0055 | 0.0050 | 0.0185 | 0.0022 | 0.0025 | 0.0089 | 0.0007 | 0.0010 | 0.0035 |
50% | 0.0055 | 0.0061 | 0.0195 | 0.0021 | 0.0030 | 0.0095 | 0.0006 | 0.0012 | 0.0037 |
70% | 0.0048 | 0.0092 | 0.0319 | 0.0018 | 0.0043 | 0.0147 | 0.0005 | 0.0017 | 0.0056 |
Regarding efficiency of the methods, the right panel of Figure 2 shows the power to detect at the nominal significance level of α = 5% for the TOBIT, MI, and LOGIT methods. The MI method was the best at all proportions of NDs and for all sample sizes. For a small proportion of NDs, the performance of the TOBIT and MI methods was equivalent. Overall, the LOGIT method performed worst.
In the last row of Figure 1, the results for the third (microscopic) category with reference to negative category are given. In contrast to quantitative covariates, there were dissimilarities in the behavior. RMSE did not increase as rapidly as with quantitative covariates, when the proportion of NDs becomes large. However, the actual RMSE values were much higher. Although the order of the best methods did not vary much, the TOBIT model gave bad performance when the sample size was small (n = 200).
Application to immunological data
Application to real data: for the LOGIT model two cut-off points were used, median for Cytokine 1 and DL for Cytokine 2
Methods | $\widehat{\beta}$ (Slope) | SE($\widehat{\beta}$) | $\widehat{\beta}$ SE($\widehat{\beta}$) | p-value | |
---|---|---|---|---|---|
Cytokine 1 | Standard methods | ||||
Deletion | 0.012 | 0.042 | 0.277 | 0.782 | |
Substitution of 0 | -0.255 | 0.055 | -4.645 | < 0.0001 | |
Substitution of DL/2 | -0.186 | 0.047 | -3.951 | 0.0001 | |
Substitution of DL | -0.117 | 0.040 | -2.879 | 0.005 | |
LOGIT (Median) | -0.198 | 0.139 | -1.423 | 0.155 | |
Advanced methods | |||||
ROS | -0.085 | 0.039 | -2.184 | 0.030 | |
TOBIT | -0.190 | 0.048 | -3.960 | 0.00008 | |
MI | -0.149 | 0.046 | -3.264 | 0.0006 | |
Cytokine 2 | Standard methods | ||||
Deletion | 0.801 | 0.129 | 6.193 | < 0.0001 | |
Substitution of 0 | 0.585 | 0.1101 | 5.311 | < 0.0001 | |
Substitution of DL/2 | 0.545 | 0.093 | 5.842 | < 0.0001 | |
Substitution of DL | 0.504 | 0.079 | 6.418 | < 0.0001 | |
LOGIT (DL) | 0.208 | 0.113 | 1.841 | 0.066 | |
Advanced methods | |||||
ROS | 0.497 | 0.081 | 6.135 | < 0.0001 | |
TOBIT | 0.792 | 0.179 | 4.436 | < 0.0001 | |
MI | 0.550 | 0.127 | 4.322 | < 0.0001 |
For Cytokine 1, considering the rather small sample size (181), small proportion of NDs (5.5%), and homoscedastic errors (Figure 4) the TOBIT method (with parameter estimate $\widehat{\beta}$ = 0.190 and the corresponding p-value = 0.00008) might be a good choice. The simple DL/2 method ($\widehat{\beta}$ = -0.186) gave similar results, which confirmed the simulation results. The MI method gave the next best estimate $\widehat{\beta}$ = -0.149. Logistic regression using the median value as a cut-off point (p-value = 0.155) resulted in loss of power, and the estimate by the ROS method ($\widehat{\beta}$ = -0.085) was greatly biased. It was noted that the estimate by the Deletion method ($\widehat{\beta}$ = 0.012) was of different direction even with this small proportion of NDs.
Next we consider Cytokine 2 as outcome variable. As can be derived from Figure 4, heteroscedasticity of errors was indicated for Cytokine 2. Based on the simulation results, for a large number of NDs and for heteroscedastic errors the MI method (with parameter estimate $\widehat{\beta}$ = 0.550 and the corresponding SE($\widehat{\beta}$) = 0.113) might be preferable to others. The DL/2 method ($\widehat{\beta}$ = 0.545 and SE($\widehat{\beta}$) = 0.093) yielded a similar effect estimate, but the standard error of the parameter estimate was smaller compared to the MI method. Since the proportion of NDs was larger than the median, DL was used as a cut-off point for logistic regression. The use of binary rather than continuous data caused loss of power and the estimate was not even significant any more. The results given by the ROS method were greatly biased. Finally, as could be expected from the simulation study the TOBIT method overestimated the effect size ($\widehat{\beta}$ = 0.792).
Discussion
In the field of immunology there is great need for specialized methods for analysis of data in order to improve accuracy and power. In this paper we proposed advanced methods to deal with data sets when DL plays a significant role. Via simulation studies we first evaluated performances of several methods. Because NDs are not missing at random, biases can be expected when simply dropping NDs. Even with proportion of NDs of 10%, the bias was unacceptable. For parameter estimation substituting DL/2 in NDs was reasonable, but the variance was underestimated. Furthermore, as illustrated by our data set the choice of the imputed value (0, DL/2, DL) remains an issue. For large proportion of NDs, ROS appeared to yield large biases. Analogous to the DL/2 method, the variance of parameter estimates was underestimated. The TOBIT method appeared to be an elegant method to deal with a small proportion of NDs under the constant error assumption. If possible, the normality assumption should be checked before considering the TOBIT model (Figure 4) [13]. For larger proportions of NDs (larger than 10%), MI outperforms the other methods in terms of RMSE. Since imputations are multiple, the MI method takes into account the uncertainty about the true values of the NDs. Furthermore, it is rather robust against heteroscedasticity of errors. Figure 2 showed for large sample size the MI method produces more accurate estimates than the TOBIT method. Note that the MI method might be improved by using more sophisticated methods to compute the mean and the standard deviation of a truncated normal distribution [14]. However, diminishing variances by increasing proportion of NDs require the careful use of the MI method when proportion of NDs is greater than 50%.
We also compared results from the different scenarios: (1) whether there is positive relationship between dependent and independent variables, (2) when the characteristics of covariates were changed (three- and two component mixture, or categorized), and (3)whether the closeness of the detection limit to zero will influence the results (Figure 1). In general, the type of included covariates in the model did not influence the findings. Therefore, our findings in this paper can be used as a reference. Nevertheless, careful consideration should be given to what are the appropriate methods for analyzing each specific data.
The limitation of our simulation study lies in skewed error distributions. However, we studied a simple solution of dichotomizing a continuous variable. Although this is an inefficient approach, and determination of cut-off points remains arbitrary [5], for some situations creating a binary outcome variable could be the most sensible option when measurements can easily be categorized. The method can also be extended to more than two categories by using ordered logistic regression (or proportional odds model). Note that to reflect the natural ordering of the categories, ordered logistic regression should be preferred to multinomial logistic regression [15, 16]. Additional advantage of using ordered logistic regression is that the results can be presented in one parameter. In contrast, in using multinomial logistic regression as in our simulation study, the first (or most common) level will be considered as reference category (negative level), and the inference of remaining two categories compared to the reference will be given. Although making more categories than two might improve performance, the determination of categories remains arbitrary.
When data are very skewed and normality cannot be achieved by the usual transformation, quantile regression could be considered [17]. This is an econometric regression model, in which a specified conditional quantile (or percentile) of the outcome variable is expressed as a linear function of covariates.
Simply the lines split the population into two parts with the proportion of 70, 80 or 90% lying below the line, and the proportion 30, 20, or 10% above the line, respectively. Similar to logistic regression the choice of the quantile is arbitrary. However, it assumes no underlying distribution, and is reported to be robust against heteroscedastic errors. Good performance of quantile (or median) regression method have been reported elsewhere [18]. However, when the proportion of NDs is greater than 50%, median regression is not suitable. Also with normally distributed data (after appropriate transformation), the improvement using median regression would be little. The computation of quantile regression is possible using R, SAS, and Stata.
In this paper we considered a single variable restricted with NDs. Extending to multiple regression, such as multiple cytokine measurements of the same individuals and/or related cytokine levels within some set, it is very probable that we encounter NDs in more than one covariate and with different DLs. It can be expected that the large number of correlated cytokine variables would enhance the advantage of using the multiple imputation techniques [19]. In fact, using information on other correlated variables such as families would improve the performance of MI. It is not the purpose of this paper to stress that the MI method should be used everywhere in the presence of DL. Nevertheless, we showed that the search for new methods might gain deeper understanding of data, and that simulation studies can contribute to decide the optimal methods for measurement data with NDs.
Conclusion
We showed that a dichotomization of continuous variable generally causes loss of information, hence loss of power. We compared the several linear regression methods to deal with the data containing NDs based on simulation studies. The TOBIT method produced the most accurate estimates with the least bias. When the amount of NDs is relatively small (≤ 30%) and the normality assumption is met as Cytokine 1 in our example data, the use of the TOBIT method is recommended. However, as reported elsewhere [20, 21], the TOBIT model is sensitive to the violation of normality assumption. Therefore, when heteroscedastic errors are suspected, and/or the amount of NDs is large, robust statistical methods have to be considered. We proposed to employ multiple imputation technique. The MI method performed consistently well under different scenarios of various proportion of NDs (≤ 50%), sample sizes and even in the presence of heteroscedastic errors.
Methods
Methods to compare
Since NDs of cytokine measurements reflect levels of exposure, they cannot be considered as missing at random (MAR) [22]. Therefore, deleting the lowest values is expected to produce biased results. Other types of methods to analyze these data are imputation and modelling of NDs. An overview of the available methods is given in Table 1. In environmental statistics a method called robust regression on order statistics (ROS) approach exists [1, 9]. This method is often used to compute summary statistics.
To reflect uncertainty about imputation, we propose to employ multiple imputation approach as introduced by Little and Rubin [22, 23]. Based on a truncated normal distribution, we first compute the mean and the standard deviation. This can be done using the functions cenmle or ros from the R-package NADA [24]. Then, the values for NDs were generated randomly and m complete data sets are created and each data set is analyzed separately. Rubin (Chapter 3, [25]) gives the following rule for combining the results. With m imputations, we obtain m different sets of the point estimate ${\widehat{\beta}}_{i}$ as well as standard errors s_{1}, ..., s_{ m }. The pooled MI point estimate is then simply the average of the m estimates: $\overline{\beta}=\frac{1}{m}{\displaystyle {\sum}_{j=1}^{m}{\widehat{\beta}}_{i}}$.
The variance estimate associated with $\overline{\beta}$ has two components. The within-imputation variance can be estimated by the average of the complete data variance $\overline{U}=\frac{1}{m}{\displaystyle {\sum}_{j=1}^{m}{s}_{i}^{2}}$. The between-imputation variancem is the variance of the estimate $\overline{\beta}$, $B=\frac{1}{m-1}{\displaystyle {\sum}_{j=1}^{m}{({\widehat{\beta}}_{1}-\overline{\beta})}^{2}}$ The total variance is defined by T = Ū + (1 + m^{-1})B and inferences are based on the approximation $\overline{\beta}$/T^{-1/2} ~ t_{ ν }, where the degrees of freedom are given by $\nu =(m-1)\left[1+\frac{\overline{U}}{(1+{m}^{-1})B}\right]$.
The probit part determines whether the outcome variable is below-DL, and the OLS part is a truncated regression model. The TOBIT model estimates a regression model for the data above DL, and assumes that the censored data (below DL) have the same distribution of errors as the observed data. The weakness of this method is that it may be more vulnerable to violation of the assumptions about the error distribution. Many comments can be found in the literature that in the presence of heteroscedasticity the Tobit estimates are inconsistent, and that there is only limited information about the direction of the bias [20, 21].
Simulation study
We simulated data sets by drawing samples from a population similar to the example data in the Background section, and by allocating a proportion of observations as NDs.
For the covariate x (infection intensity) we used (1) a three-component normal mixture distribution, (2) a two-component normal mixture distribution, and (3) three classes. The three-component normal mixture distribution has means equal to 0.77, 3.35 and 4.59 and a within-component variance of 0.027. The proportions of the three components were 0.83, 0.13 and 0.04, respectively. The two-component normal mixture distribution has means equal to 0.77 and 3.69 and a within-component variance of 0.069, with their proportions 0.84 and 0.16, respectively.
Then, based on the characteristic of Cytokine 1, outcome variables were generated using the following regression model,
y_{ i }= 3.04 0 - 16x_{ i }+ ε_{ i },
for individual i ∈ {1, ..., n}. Based on Cytokine 2, we generated outcome variables as
y_{ i }= 0.66 + 0.27x_{ i }+ ε_{ i }.
And, ε were assumed to be standard normally distributed.
Based on biology, the malaria parasite measurements lend to be categorized in three classes: negative, submicroscopic, and microscopic. Instead of looking at the effect of malaria with continuous measurements, we considered the categorical malaria variable, say z. The dummy code z_{ i }= (z_{i 1}z_{i 2}z_{i 3})^{⊤} denotes a vector of malaria category indicators for the i th subject, with elements z_{ ij }= 1 if i th subject has j th category; otherwise z_{ ij }= 0. The categorical covariate vector z were then generated following the multinomial distribution of categorized malaria status with proportions of 0.69, 0.14, and 0.17. Based on Cytokine 1, y were generated following the model:
y_{ i }= 2.97 - 0.13z_{i 2}- 0.58z_{i 3}+ ε,
while based on Cytokine 2
y_{ i }= 0.84 + 0.13z_{i 2}+ 0.77z_{i 3}+ ε.
Here ε were assumed to be standard normally distributed.
We then considered data samples of size n = 200, 400 and 1, 000. The proportions of NDs were set 10%, 30%, 50% and 70%. The corresponding cut-off points of DL values were: (1) for imitation of Cytokine 1, 14.7, 10, 17, and 29 pg/ml, and (2) for mimicking Cytokine 2, 0.7, 1.6, 2.7, and 4.6 pg/ml.
For studying the effect of heteroscedastic errors we used the same model as in (3) but now with a variance depending on the value of x by using ε ~ N(0, $\sqrt{x}$).
Evaluation of methods
Therefore, parameter estimates provided by the various methods were compared in terms of mean bias and RMSE. Also coverage probability was provided, which is the probability that the confidence interval of the estimates contains the value. Additionally, for the unbiased methods performances were also compared for their hypothesis testing abilities in terms of power. The Wald-type statistic $\widehat{\beta}$/SE($\widehat{\beta}$) was used for testing. It is approximately distributed as a t-distribution with n – 2 degrees of freedom for n observations in each sample for continuous outcome.
All computations have been done using the program language R [27].
Declarations
Acknowledgements
This work was financially supported by the Royal Netherlands Academy of Arts and Sciences (KNAW SPIN project 05-PP-35). We thank the anonymous reviewer for the comments.
Authors’ Affiliations
References
- Helsel DL: Nondetects And Data Analysis: Statistics for censored environmental data. 2005, John Wily and Sons, New YorkGoogle Scholar
- Lubin JH, et al.: Epidemiologic evaluation of measurement data in the presence of detection limits. Environ Health Perspect. 2004, 112 (17): 1691-1696.PubMed CentralView ArticlePubMedGoogle Scholar
- Diness BR, Fisker AB, Roth A, Yazdanbakhsh M, Sartono E, Whittle H, Nante JE, Lisse IM, Ravn H, Rodrigues A, Aaby P, Benn CS: Effect of high-dose vitamin A supplementation on the immune response to Bacille Calmette-Guerin vaccine. Am J Clin Nutr. 2007, 86: 1152-1159.PubMedGoogle Scholar
- Hornung RW, Reed LD: Estimation of average concentration in the presence of nondetectable values. Appl Occup Environ Hyg. 1990, 5: 46-51.View ArticleGoogle Scholar
- Christiansen SC, et al.: Inflammatory cytokines as risk factors for a first venous thrombosis: a prospective population-based study. PLoS Medicine. 2006, 3: 1414-1419. 10.1371/journal.pmed.0030334.View ArticleGoogle Scholar
- Koch AL: The logarithm in biology. 1. Mechanismsgenerating the log-normal distribution exactly. J Theoret Biol. 1966, 12: 276-290. 10.1016/0022-5193(66)90119-6.View ArticleGoogle Scholar
- Limpert E, Stahel W, Abbt M: Log-normal distributions across the sciences: keys and clues. BioScience. 2001, 51: 341-352. 10.1641/0006-3568(2001)051[0341:LNDATS]2.0.CO;2.View ArticleGoogle Scholar
- Genser B, Cooper PJ, Yazdanbahksh M, Barreto ML, Rodirigues LC: A guide to modern statistical analysis of immunological data. BMC Immunology. 2007, 8: 27-10.1186/1471-2172-8-27.PubMed CentralView ArticlePubMedGoogle Scholar
- Lee L, Helsel D: Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics. Computers & Geosciences. 2005, 31: 1241-1248. 10.1016/j.cageo.2005.03.012.View ArticleGoogle Scholar
- Heijmans-Antonissen C, Wesseldijk F, Munnikes RJM, Huygen FJPM, Meijden van der P, Hop WCJ, Hooijkaas H, Zijlstra FJ: Multiplex bead array assay for detection of 25 soluble cytokines in blister fluid of patients with complex regional pain syndrome type 1. Mediators Inflamm. 2006, 2006 (1): 28398-PubMed CentralPubMedGoogle Scholar
- Richardson DB, Ciampi A: Effects of exposure measurements error when an exposure variable is constrained by a lower limit. Am J Epidemiol. 2002, 157 (4): 355-363. 10.1093/aje/kwf217.View ArticleGoogle Scholar
- Schisterman EF, Vexler A, Whitcomb BW, Liu A: The limitation due to exposure detection limits for regression models. Am J Epidemiol. 2006, 163 (4): 374-383. 10.1093/aje/kwj039.PubMed CentralView ArticlePubMedGoogle Scholar
- Arabmazar A, Schmidt P: An investigation of the robustness of the tobit estimator to non-normality. Econometrica. 1982, 50: 1055-1063. 10.2307/1912776.View ArticleGoogle Scholar
- Eilers PHC: Ill-posed problems with counts, the composite link model and penalized likelihood. Stat Modelling. 2007, 7: 239-254.Google Scholar
- Long KZ: Vitamin A supplementation reduces the monocyte chemoattractant protein-1 intestinal immune response of Mexican children. J Nutr. 2006, 136: 2600-2605.PubMedGoogle Scholar
- Agresti A: Categorical Data Analysis. 2006, New York: John Wiley & SonsGoogle Scholar
- Koenker R: Quantile Regression. 2005, Cambridge University PressView ArticleGoogle Scholar
- White IR, Koupilova I, Carpenter J: The use of regression models for medians when observed outcomes may be modified by intervention. Statistics in Medicine. 2003, 22: 1083-1096. 10.1002/sim.1408.View ArticlePubMedGoogle Scholar
- Dai JY, Ruczinski I, LeBlanc M, Kooperberg C: Imputation methods to improve inference in SNP association studies. Genet Epi. 2006, 30: 690-702. 10.1002/gepi.20180.View ArticleGoogle Scholar
- Maddala GS: Limited-Dependent and Qualitative Variables in Econometrics. 1983, Cambridge University Press, New YorkView ArticleGoogle Scholar
- Austin PC, Escobar M, Kopec JA: The use of the Tobit model for analyzing measures of health status. Quality of Life Research. 2000, 9: 901-910. 10.1023/A:1008938326604.View ArticlePubMedGoogle Scholar
- Little RJA, Rubin DB: Statistical Analysis with Missing Data. 1987, New York: John Wiley & SonsGoogle Scholar
- Hopke PK, Liu C, Rubin DB: Multiple imputation for multivariate data with missing and below-threshhold measurements: time-series concentrations of pollutants in the arctic. Biometrics. 2001, 57: 22-33. 10.1111/j.0006-341X.2001.00022.x.View ArticlePubMedGoogle Scholar
- NADA for R: Nondetects and Data Analysis for the R statistical computing environment. [http://water.usgs.gov/software/nada_r.html]
- Little RJA, Rubin DB: Multiple Imputation for Nonresponse in Surveys. 1987, New York: John Wiley & SonsGoogle Scholar
- Tobin J: Estimation of relationships for limited dependent variables. Econometrica. 1958, 26: 24-36. 10.2307/1907382.View ArticleGoogle Scholar
- The R Project for Statistical Computing. [http://www.r-project.org/]
- Stata Statistical Software. [http://www.stata.com/]
- SAS Statistical Software. [http://www.sas.com/]
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.