Imputation methods to handle the problem of missing data: an application using R/Splus
DOI: https://doi.org/10.46661/revmetodoscuanteconempresa.2120
Keywords: auxiliary information, survey, inclusion probabilities, response mechanism
Abstract
Missing values are a common problem in sample surveys, and imputation is usually employed to compensate for non-response. Most imputation methods are designed for estimating the mean and its variance, and they assume simple sampling designs such as simple random sampling without replacement. In this paper we describe some imputation methods and define them under a general sampling design. Different response mechanisms are also discussed. Using populations based on real economic and business data, Monte Carlo simulations are carried out to analyze the properties of the imputation methods when estimating parameters such as distribution functions and quantiles. The imputation methods are implemented in the statistical software R and S-Plus, and the code is presented.
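The kind of simulation summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's R/S-Plus code, which is not reproduced on this page. Everything in it is an assumption made for illustration: the function names, the synthetic population, the choice of three methods (mean, ratio and nearest-neighbor imputation), the uniform response mechanism, and the Hájek-type estimator of the distribution function with weights equal to the inverse inclusion probabilities.

```python
import random

def impute_mean(y, pi):
    """Fill missing y (None) with the design-weighted mean of the respondents."""
    resp = [(yi, 1.0 / p) for yi, p in zip(y, pi) if yi is not None]
    wmean = sum(yi * w for yi, w in resp) / sum(w for _, w in resp)
    return [wmean if yi is None else yi for yi in y]

def impute_ratio(y, x, pi):
    """Fill missing y with R*x, where R is the weighted y/x ratio among respondents."""
    num = sum(yi / p for yi, p in zip(y, pi) if yi is not None)
    den = sum(xi / p for yi, xi, p in zip(y, x, pi) if yi is not None)
    ratio = num / den
    return [ratio * xi if yi is None else yi for yi, xi in zip(y, x)]

def impute_nearest(y, x):
    """Fill missing y with the y of the respondent whose x is closest."""
    donors = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    return [min(donors, key=lambda d: abs(d[0] - xi))[1] if yi is None else yi
            for yi, xi in zip(y, x)]

def dist_function(y, pi, t):
    """Hájek-type estimate of F(t) = P(Y <= t) using weights 1/pi."""
    w = [1.0 / p for p in pi]
    return sum(wi for yi, wi in zip(y, w) if yi <= t) / sum(w)

def simulate(n_pop=2000, n=200, p_resp=0.7, reps=200, seed=1):
    """Monte Carlo bias of the estimated F(t) at the population median, per method."""
    rng = random.Random(seed)
    # Synthetic population (an assumption; the paper uses real economic data).
    x_pop = [rng.uniform(10, 100) for _ in range(n_pop)]
    y_pop = [2.0 * xi + rng.gauss(0, 10) for xi in x_pop]
    t = sorted(y_pop)[n_pop // 2]
    f_true = sum(yi <= t for yi in y_pop) / n_pop
    pi = [n / n_pop] * n  # inclusion probabilities under SRSWOR
    total = {"mean": 0.0, "ratio": 0.0, "nearest": 0.0}
    for _ in range(reps):
        idx = rng.sample(range(n_pop), n)
        x = [x_pop[i] for i in idx]
        # Uniform response mechanism: each unit responds with probability p_resp.
        y = [y_pop[i] if rng.random() < p_resp else None for i in idx]
        completed = {
            "mean": impute_mean(y, pi),
            "ratio": impute_ratio(y, x, pi),
            "nearest": impute_nearest(y, x),
        }
        for name, y_imp in completed.items():
            total[name] += dist_function(y_imp, pi, t) - f_true
    return {name: s / reps for name, s in total.items()}

if __name__ == "__main__":
    print(simulate())
```

Because the imputation functions take the inclusion probabilities as an argument, the same sketch could be run under an unequal-probability design by passing different values of `pi`; only the sampling step inside `simulate` is specific to simple random sampling without replacement.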
License
Copyright (c) 2009 Revista de Métodos Cuantitativos para la Economía y la Empresa
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Submission of a manuscript implies that the work described has not been published before (except as an abstract or as part of a thesis), that it is not under consideration for publication elsewhere and that, if accepted, the authors agree to the automatic transfer of copyright to the Journal for publication and dissemination. Authors retain the right to use and share the article for personal or institutional use and for scholarly sharing purposes; in addition, they retain patent, trademark and other intellectual property rights (including research data).
All articles are published in the Journal under the Creative Commons license CC-BY-SA (Attribution-ShareAlike). Commercial use of the work is allowed (always with author attribution), as are derivative works, which must be released under the same license as the original work.
Up to Volume 21, the Journal licensed its articles under the Creative Commons license CC-BY-SA 3.0 ES. Starting from Volume 22, the Creative Commons license CC-BY-SA 4.0 is used.