Comparative Analysis of Scalability Approaches using Data Mining Methods on Health Care Datasets
Abstract
A primary issue in data analysis is the scalability of data mining methods. Prior research has explored various scaling options to overcome this problem. In this research, several scaling strategies are explored and tested on various datasets, and a cascade scaling method is proposed to improve the efficacy of existing methods. The proposed method starts by gathering a large dataset, which is then pre-processed. Once pre-processed, the dataset is split into smaller subsets of equal size, and a data mining method is applied to each subset. The outcomes from all subsets are pooled and aggregated to produce the final results. Performance is evaluated by the accuracy of the given algorithm. The proposed method and existing methods are evaluated on two health care datasets: PIMA Indian Diabetes and Heart Disease. Across the data mining methods tested, the proposed scaling approach yields better results than the existing scaling approaches. On both datasets, the proposed method is also compared with results published by other authors in earlier studies, and it was found to outperform that previous work. For a few data mining methods, the proposed method achieves 100 percent accuracy.
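The split, mine, and pool steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' cascade scaling implementation: the nearest-centroid classifier is a stand-in for whichever data mining method is applied, majority voting is an assumed way of pooling the per-subset outcomes (the abstract does not say how they are aggregated), the data are synthetic, and all function names are hypothetical.

```python
import random
from collections import Counter


def make_demo_rows(n_per_class, seed=0):
    """Synthetic two-class data (an assumption, not a health care dataset)."""
    rng = random.Random(seed)
    rows = []
    for label, center in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            features = [rng.gauss(center, 1.0), rng.gauss(center, 1.0)]
            rows.append((features, label))
    return rows


def split_equal(rows, n_subsets, seed=0):
    """Shuffle the rows and split them into n_subsets subsets of equal size.

    Rows left over after filling the equal-size subsets are dropped.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    size = len(rows) // n_subsets
    return [rows[i * size:(i + 1) * size] for i in range(n_subsets)]


def fit_centroids(subset):
    """Stand-in data mining method: compute one mean vector per class."""
    sums, counts = {}, Counter()
    for features, label in subset:
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_one(centroids, features):
    """Predict the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(center, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def predict_pooled(models, features):
    """Pool the per-subset predictions by majority vote (assumed aggregation)."""
    votes = Counter(predict_one(model, features) for model in models)
    return votes.most_common(1)[0][0]


def accuracy(models, rows):
    """Fraction of rows whose pooled prediction matches the true label."""
    hits = sum(predict_pooled(models, features) == label for features, label in rows)
    return hits / len(rows)


if __name__ == "__main__":
    train_rows = make_demo_rows(100, seed=1)
    test_rows = make_demo_rows(30, seed=2)
    subsets = split_equal(train_rows, n_subsets=4)
    models = [fit_centroids(subset) for subset in subsets]
    print("pooled accuracy:", accuracy(models, test_rows))
```

Because each model is fitted on a single subset, the per-subset work is independent and could be run in parallel; only the final vote needs the outputs of all models.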
This work is licensed under a Creative Commons Attribution 4.0 International License.