Text Preprocessing for Urdu Text: A Survey of Techniques and Their Influence on NLP Tasks
DOI: https://doi.org/10.63075/6x0cdd67

Abstract
Text preprocessing (TP) has historically been a critical phase in Natural Language Processing (NLP) pipelines, aimed at transforming raw text into a cleaner, more manageable format for machine consumption. With the advent of sophisticated pre-trained Transformer models, the perceived necessity of explicit TP has been debated. This paper offers a comprehensive review of existing literature concerning text preprocessing, with a specific focus on its application and impact within Urdu Natural Language Processing. We delve into the unique linguistic challenges posed by Urdu, such as its rich morphology and Nastaliq script, and survey various preprocessing techniques including script normalization, stop word removal, and stemming/lemmatization. Through an extensive examination of past studies, we analyze how these techniques have influenced the performance of both traditional machine learning classifiers and modern deep learning architectures, including Transformer models, in Urdu text classification and other NLP tasks. This review synthesizes key findings from the literature, highlighting the enduring relevance of tailored TP strategies for optimizing Urdu NLP applications and identifying critical gaps for future research.
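To make the script-normalization step surveyed above concrete, the sketch below shows one common form it takes for Urdu: mapping Arabic code points that visually resemble Urdu letters onto their canonical Urdu forms, and stripping optional diacritics (harakat) that cause spelling variants of the same word. This is a minimal illustrative sketch, not the method of any specific paper reviewed here; the particular character mappings chosen are assumptions reflecting common practice.

```python
import re

# Map visually similar Arabic code points to canonical Urdu forms.
# These mappings are a common choice in Urdu normalizers, shown here
# as an illustrative subset, not an exhaustive table.
ARABIC_TO_URDU = str.maketrans({
    "\u064A": "\u06CC",  # Arabic yeh  -> Farsi/Urdu yeh
    "\u0643": "\u06A9",  # Arabic kaf  -> keheh
    "\u0647": "\u06C1",  # Arabic heh  -> heh goal
})

# Urdu is usually written without diacritics, so removing them
# (fathatan..sukun, plus superscript alef) collapses variant spellings.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

def normalize_urdu(text: str) -> str:
    """Unify script variants, drop diacritics, and squeeze whitespace."""
    text = text.translate(ARABIC_TO_URDU)
    text = DIACRITICS.sub("", text)
    return re.sub(r"\s+", " ", text).strip()
```

Running such a normalizer before tokenization means, for example, that a word typed with the Arabic yeh (U+064A) and the same word typed with the Urdu yeh (U+06CC) are no longer treated as distinct vocabulary items by downstream classifiers.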