Paper title:

A New Imputation Method for Missing Attribute Values in Data Mining

Published in: Issue 1, (Vol. 5) / 2011
Publishing date: 2010-04-29
Pages: 14-19
Author(s): SHUKLA Diwakar, SINGHAI Rahul, THAKUR Narendra Singh
Abstract. One problem in the data cleaning and data reduction step of the KDD process is the presence of missing values in attributes. Many approaches have been proposed to deal with missing values, and several treatments have been developed to estimate them. One of the most common ways to replace missing values is the mean method of imputation. In this paper we suggest a new imputation method that uses a modified ratio estimator under a two-phase sampling scheme, and we use it to impute the missing values of a target attribute in a data warehouse. Our simulation study shows that the estimator of the mean obtained from this method is more efficient than those obtained from other imputation methods.
Keywords: KDD (Knowledge Discovery in Databases), Data Mining, Attribute, Missing Values, Imputation Methods, Sampling.
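The abstract contrasts mean imputation with ratio-based imputation under two-phase sampling. The Python sketch below illustrates only the two classical baselines, not the authors' modified ratio estimator, whose exact form is not given in this abstract. It assumes that a missing value of the target attribute y is marked as None, that the auxiliary attribute x is observed for every unit of the second-phase sample, and that x_first_phase holds x for the larger first-phase sample. All function names and the sample values are hypothetical and chosen only for illustration.

```python
from statistics import fmean
from typing import List, Optional, Sequence, Tuple


def _respondent_means(y: Sequence[Optional[float]], x: Sequence[float]) -> Tuple[float, float]:
    """Means of y and x over respondents, i.e. units whose y is observed (not None)."""
    pairs = [(yi, xi) for yi, xi in zip(y, x) if yi is not None]
    if not pairs:
        raise ValueError("no observed values of y")
    return fmean(p[0] for p in pairs), fmean(p[1] for p in pairs)


def mean_impute(y: Sequence[Optional[float]]) -> List[float]:
    """Mean method: replace every missing y with the mean of the observed y values."""
    observed = [v for v in y if v is not None]
    if not observed:
        raise ValueError("no observed values of y")
    m = fmean(observed)
    return [m if v is None else v for v in y]


def ratio_impute(y: Sequence[Optional[float]], x: Sequence[float]) -> List[float]:
    """Classical ratio method: replace a missing y_i with (ybar_r / xbar_r) * x_i.
    Assumption: x is observed for every unit and xbar_r is nonzero."""
    ybar_r, xbar_r = _respondent_means(y, x)
    b = ybar_r / xbar_r
    return [b * xi if yi is None else yi for yi, xi in zip(y, x)]


def two_phase_ratio_mean(y: Sequence[Optional[float]], x: Sequence[float],
                         x_first_phase: Sequence[float]) -> float:
    """Ratio-type estimator of the mean of y under two-phase sampling with
    nonresponse: ybar_r * xbar' / xbar_r, where xbar' is the mean of x over
    the larger first-phase sample (a classical form, not the paper's estimator)."""
    ybar_r, xbar_r = _respondent_means(y, x)
    return ybar_r * fmean(x_first_phase) / xbar_r


if __name__ == "__main__":
    y = [10.0, None, 14.0, None, 12.0]                   # target attribute, None = missing
    x = [5.0, 6.0, 7.0, 4.0, 6.0]                        # auxiliary attribute, second phase
    x_first = [5.0, 6.0, 7.0, 4.0, 6.0, 8.0, 3.0, 5.0]   # auxiliary attribute, first phase
    print(mean_impute(y))
    print(ratio_impute(y, x))
    print(two_phase_ratio_mean(y, x, x_first))
```

With these inputs, mean imputation fills both gaps with 12.0; ratio imputation uses b = 12 / 6 = 2 and fills them with 12.0 and 8.0; and the two-phase ratio estimate of the mean is 12 × 5.5 / 6 = 11.0. The ratio variants borrow strength from the auxiliary attribute, which is the general idea the abstract builds on; comparing their efficiency properly would require a simulation like the one the authors report.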
This article is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Copyright JACSM 2007-2024