An Efficient Clustering Technique for Big Data Mining


Satish S. Banait
Dr. S. S. Sane

Abstract

Data mining and big data analytics are approaches for analyzing data and extracting hidden information. Because big data is large in volume and complex, traditional analysis and extraction techniques do not work effectively on it. Data clustering is a common data mining approach that divides data into groups, making it easier to extract information from them. Big data can include structured as well as semi-structured and unstructured information, and it is becoming increasingly valuable to companies. Examples include traditional structured databases of inventory levels, transactions, and customer information, as well as unstructured content from the internet, social media platforms, and embedded systems. Much research has been devoted to big data analytics, and numerous schemes have been developed to meet efficiency and effectiveness requirements. Nevertheless, some methodologies, such as clustering algorithms, require further research on performance, usability, and other factors. This motivates a model that supports proper assessment of big data analytics and the effective use of clustering to retrieve relevant knowledge. In this work, we collected and analyzed several big data sets and identified relevant existing approaches. We propose a new clustering technique based on dimensionality reduction. For the implementation, we used real-time streaming data that is unstructured and sometimes noisy. The proposed hybrid clustering technique improves clustering accuracy and reduces the time needed to generate effective clusters on large unstructured data. We validate these findings by testing the proposed methodology on available data sets and comparing its effectiveness with that of existing systems.
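The abstract does not say which dimensionality reduction or clustering algorithms the proposed hybrid technique combines. As a hedged illustration of the general "reduce, then cluster" idea only, the sketch below assumes NumPy, uses principal component analysis (PCA) as the reduction step and Lloyd's k-means with farthest-first initialization as the clustering step. None of these choices come from the paper, and the function names (`pca_reduce`, `farthest_first_init`, `kmeans`, `reduce_then_cluster`) are hypothetical. It also processes a static batch rather than a real-time stream.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def farthest_first_init(X, k, seed=0):
    """Pick k initial centroids: one random point, then repeatedly the farthest point."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(X)))]
    # Squared distance from every point to its nearest already-chosen centroid.
    min_d = ((X - X[chosen[0]]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(min_d.argmax())
        chosen.append(nxt)
        min_d = np.minimum(min_d, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[chosen].astype(float)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means on the rows of X; returns (labels, centroids)."""
    centroids = farthest_first_init(X, k, seed)
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def reduce_then_cluster(X, n_components, k, seed=0):
    """Hypothetical two-stage pipeline: PCA to n_components dims, then k-means."""
    Z = pca_reduce(X, n_components)
    labels, _ = kmeans(Z, k, seed=seed)
    return labels


if __name__ == "__main__":
    # Synthetic data: three noisy blobs in 50 dimensions (illustrative only).
    rng = np.random.default_rng(42)
    centers = rng.normal(scale=10.0, size=(3, 50))
    X = np.vstack([c + rng.normal(size=(100, 50)) for c in centers])
    labels = reduce_then_cluster(X, n_components=2, k=3)
    print(np.bincount(labels, minlength=3))  # points per cluster
```

Clustering in the reduced space lowers the cost of each point-to-centroid distance from O(d) to O(n_components), which is one common motivation for reducing dimensionality before clustering large data. Farthest-first initialization is used here only because it behaves predictably on well-separated groups; it is sensitive to outliers, so noisy data would usually call for a more robust initialization.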


How to Cite
Banait, S. S., & Sane, S. S. (2022). An Efficient Clustering Technique for Big Data Mining. International Journal of Next-Generation Computing, 13(3). https://doi.org/10.47164/ijngc.v13i3.842
