A Clustering Based Approach for Topic Categorization using GloVe Technique

Farha Naznin
Irani Hazarika
Anjana Kakoti Mahanta

Abstract

Topic extraction and categorization are important because they make it easy to identify which topics users discuss most in their tweets or opinions, and therefore which topics need to be analyzed. In this work, topics are extracted from positive and negative opinions and then categorized into groups. First, a collection of opinions is divided into two sets, positive opinions and negative opinions, using a sentiment analyzer. A method is then proposed to find the most discussed topics in each set. Topics are extracted by taking the noun words from the opinions in a set, and similar topics are then merged using the synonymy relation. The frequent topic words are represented with the GloVe embedding technique. Finally, the topics are categorized by applying a clustering algorithm to the frequent topic words. The proposed method is evaluated on tweets from a Twitter User dataset. The experimental results are promising and yield interesting and meaningful clusters of topics. An analysis of the results obtained for both positive and negative opinions is also presented.
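The later stages of the pipeline described above (merging synonyms, keeping frequent topic words, embedding them, and clustering) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the noun list, the `SYNONYMS` map (standing in for a synonymy lookup), the two-dimensional `VECTORS` table (standing in for pretrained GloVe embeddings), the `min_count` threshold, and the choice of k-means, since the abstract does not name the clustering algorithm.

```python
import math
from collections import Counter

# Hypothetical input: nouns already extracted from one opinion set.
# A real pipeline would first split opinions with a sentiment analyzer
# and extract nouns with a part-of-speech tagger.
nouns = ["flight", "plane", "flight", "food", "meal",
         "food", "flight", "staff", "crew", "meal"]

# Hypothetical synonym map standing in for a synonymy lookup.
SYNONYMS = {"plane": "flight", "meal": "food", "crew": "staff"}

# Toy 2-d vectors standing in for pretrained GloVe embeddings.
VECTORS = {
    "flight": (0.9, 0.1),
    "staff": (0.8, 0.3),
    "food": (0.1, 0.9),
}


def frequent_topics(tokens, min_count=2):
    """Merge synonyms, then keep topic words seen at least min_count times."""
    counts = Counter(SYNONYMS.get(t, t) for t in tokens)
    return [word for word, count in counts.items() if count >= min_count]


def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic init (the first k points)."""
    centroids = list(points[:k])
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return labels


topics = frequent_topics(nouns)
labels = kmeans([VECTORS[t] for t in topics], k=2)

# Group the topic words by their cluster label.
clusters = {}
for topic, label in zip(topics, labels):
    clusters.setdefault(label, []).append(topic)

print(clusters)
```

With real data, the toy vector table would be replaced by vectors loaded from a pretrained GloVe file, and the number of clusters would need to be chosen, for example with a cluster-validity measure.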

How to Cite
Naznin, F., Hazarika, I., & Mahanta, A. K. (2024). A Clustering Based Approach for Topic Categorization using GloVe Technique. International Journal of Next-Generation Computing, 15(2). https://doi.org/10.47164/ijngc.v15i2.1614
