An Efficient Clustering Technique for Big Data Mining
Abstract
Data mining and big data analytics are approaches for analyzing data and extracting hidden information. Because big data is complex and large in volume, traditional analysis and extraction techniques do not work effectively. Data clustering is a common data mining approach that divides data into groups and makes it easier to extract information from them. Big data can include both structured and semi-structured information, and it is becoming increasingly valuable to companies. Examples include legacy structured databases of inventory levels, transactions, and customer information, as well as unstructured information from the internet, social media platforms, and embedded systems. Numerous schemes have been developed to reach the required efficiency and effectiveness, and much research has been devoted to big data analytics. Nevertheless, some methodologies, such as clustering algorithms, require further research with regard to performance, usefulness, and other factors. This motivates the development of a model that properly assesses big data analytics and uses this methodology effectively to retrieve relevant knowledge. In this work, we collected and analyzed several big data sets and surveyed relevant existing approaches. In this paper, we propose a new clustering technique that uses a dimensionality reduction approach. For the implementation, we used real-time streaming data that is unstructured and sometimes noisy. The proposed hybrid clustering technique improves both clustering accuracy and the time needed to generate effective clusters on large unstructured data. We confirm the findings by testing the proposed methodology on available data sets and comparing the effectiveness of the developed system with that of existing systems.
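The general idea of clustering after dimensionality reduction can be illustrated with a minimal sketch. The abstract does not name the reduction method or the clustering algorithm, so the choices here are assumptions made only for illustration: a Gaussian random projection for the reduction step and Lloyd's k-means for the clustering step. The function names (`random_projection`, `kmeans`, `reduce_then_cluster`) and their parameters are hypothetical and are not taken from the paper.

```python
import random


def random_projection(points, target_dim, seed=0):
    """Reduce dimensionality by projecting onto random Gaussian directions.

    Assumption: the paper's actual reduction method is not stated; random
    projection is used here only as a simple stand-in.
    """
    rng = random.Random(seed)
    source_dim = len(points[0])
    scale = 1.0 / target_dim ** 0.5
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
              for _ in range(target_dim)]
    return [[sum(w * x for w, x in zip(row, p)) for row in matrix]
            for p in points]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's k-means; returns one cluster label per input point.

    Assumption: k-means stands in for the unspecified hybrid technique.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                  for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def reduce_then_cluster(points, target_dim, k, seed=0):
    """Project the data to target_dim dimensions, then cluster the result."""
    reduced = random_projection(points, target_dim, seed)
    return kmeans(reduced, k, seed=seed)
```

Calling `reduce_then_cluster(points, target_dim=2, k=2)` on a list of equal-length numeric vectors returns one integer label per vector. Because distances are computed in the reduced space, each k-means iteration costs less than it would on the original high-dimensional data, which is the usual motivation for combining the two steps.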
This work is licensed under a Creative Commons Attribution 4.0 International License.