Depressive Tweet Classification via Machine Learning: Contextualizing Performance with a Broad NLP Review
DOI: https://doi.org/10.63075/dg0b3n83
Abstract
Natural Language Processing (NLP) has significantly advanced the analysis of unstructured text, enabling applications across sectors including finance, healthcare, and social media analytics. This paper reviews 25 research papers on NLP for topic modeling and sentiment analysis, contrasting traditional methods such as Latent Dirichlet Allocation (LDA) and TF-IDF classifiers with transformer models such as BERT. The review finds that classical models offer interpretability but often struggle with noisy, short-form social media content, where newer transformer-based and hybrid approaches perform better in thematic mapping and sentiment detection. Complementing this review, we conducted an empirical study on depressive tweet classification. Using TF-IDF features, we evaluated Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR) models. The kernel-based SVM achieved the highest accuracy at 99.5%, surpassing LR (98.9%) and RF (97.2%), a result consistent with SVM's known efficacy in high-dimensional, sparse feature spaces. Our analysis identifies critical gaps, including the prevalent focus on English-only datasets, the underexplored potential of multimodal data fusion (e.g., text and images for depression detection), and challenges with class imbalance. We recommend that future research explore multilingual transformer architectures (e.g., mBERT, multilingual DistilBERT), integrate domain-specific lexicons, employ multitask learning frameworks for joint topic and sentiment analysis, and incorporate Explainable AI (XAI) to improve transparency and support ethical model development. This paper synthesizes current advancements and challenges, providing a roadmap for developing more robust, equitable, and context-aware NLP systems for social media text analysis.
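To make the TF-IDF featurization mentioned above concrete, the following is a minimal sketch in plain Python of how short texts can be turned into sparse, weighted vectors of the kind that SVM, RF, and LR classifiers consume. It is not the pipeline used in the study: the whitespace tokenizer, the smoothed IDF variant (the same form as scikit-learn's default), the L2 normalization, the helper names `tfidf` and `cosine`, and the three toy tweets are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (sorted vocabulary, list of L2-normalised TF-IDF vectors as dicts)."""
    # Assumption: lowercase + whitespace tokenisation; real tweets need more cleaning.
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)

    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenised for term in set(tokens))

    # Assumed smoothed IDF variant: ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)  # raw term counts within this document
        weights = {t: count * idf[t] for t, count in tf.items()}
        # L2-normalise so document length does not dominate; guard empty docs.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return sorted(df), vectors


def cosine(u, v):
    """Cosine similarity of two L2-normalised sparse vectors (dot product)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


# Hypothetical toy tweets, not taken from the study's dataset.
tweets = [
    "i feel hopeless and tired",
    "great day with friends",
    "so tired of feeling hopeless",
]

vocab, vecs = tfidf(tweets)
print(len(vocab), "terms in vocabulary")
print("similarity(0, 1):", cosine(vecs[0], vecs[1]))
print("similarity(0, 2):", cosine(vecs[0], vecs[2]))
```

Each tweet activates only a handful of vocabulary entries, so the vectors are sparse, and the vocabulary grows quickly with corpus size. This is the high-dimensional, sparse regime in which the abstract notes that SVMs tend to perform well.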