Effect of Stemming on Hindi Text Classification


Dr. Anjusha Pimpalshende
Dr. Archana Potnurwar


Abstract. Text classification makes it easier to search the large amount of textual data available online by dividing it into smaller, relevant units. Nowadays, a large number of digital documents are available in Indian languages. Designing text classifiers for Indian languages is an active research area, since such classifiers let people search for and read the documents they need in their local languages. In the proposed work, we design a text classifier for Hindi text documents and show how a stemmer affects the performance of Hindi text classifiers. Stemming is the process of converting the words of a language to their base or root forms. Stemmers are applied to written documents, not to spoken language. The performance of many applications, such as text summarization, Information Retrieval (IR) systems, text classification systems, and syntactic parsing, can be improved by applying stemmers. A stemmer removes the suffix or prefix of a word to recover its root word. These root words help in the preprocessing step required by many algorithms. We applied various stemmers to Hindi text classification models. The experimental results show that applying stemmers improves the performance of the classifiers.
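The suffix-removal idea described in the abstract can be illustrated with a minimal sketch. It is not the stemmer evaluated in this work: the suffix list, the function names `stem` and `stem_tokens`, and the `min_stem_len` parameter are assumptions chosen for illustration, and a practical Hindi stemmer would need a much larger rule set.

```python
# Minimal suffix-stripping sketch for Hindi (illustrative only).
# The suffix list below is an assumption, not the rule set used in the paper.
# Sorting by length makes the longest matching suffix win.
SUFFIXES = sorted(["ियों", "ियाँ", "ों", "ें", "ी", "े", "ा"], key=len, reverse=True)


def stem(word: str, min_stem_len: int = 2) -> str:
    """Strip the longest listed suffix, keeping at least min_stem_len code points."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no listed suffix applies


def stem_tokens(text: str) -> list[str]:
    """Whitespace-tokenize the text and stem every token."""
    return [stem(token) for token in text.split()]
```

Under these assumptions, the two inflected forms "किताबों" and "किताबें" both reduce to "किताब", so a classifier that counts word occurrences would treat them as a single feature instead of two.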


How to Cite
Pimpalshende, D. A., SINGH, P., & Potnurwar, D. A. (2023). Effect of Stemming on Hindi Text Classification. International Journal of Next-Generation Computing, 14(1). https://doi.org/10.47164/ijngc.v14i1.1063

