Disaster Tweet Classification

Fri, 27 Feb 2026

A binary text classification pipeline that distinguishes tweets about real disasters from casual ones, using a dataset of 7,613 pre-labelled posts. The workflow covers text preprocessing (lowercasing, URL/mention removal, stop-word removal, lemmatisation), feature extraction with TF-IDF, a baseline comparison between Logistic Regression and Linear SVC, and hyperparameter tuning via GridSearchCV, which brought accuracy to 83%.
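The preprocessing and TF-IDF steps can be illustrated with a minimal, standard-library-only sketch. It is not the project's code. The stop-word list is a short hypothetical sample, and lemmatisation is left out because it needs a lexical resource such as WordNet. The IDF formula mirrors the defaults of scikit-learn's `TfidfVectorizer` (smoothed IDF, raw term counts, L2 normalisation). The two example tweets are invented.

```python
import math
import re
from collections import Counter

# Hypothetical, heavily abbreviated stop-word list for illustration only;
# a real pipeline would use a full list (e.g. NLTK's or scikit-learn's).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "in", "on", "at", "of",
              "to", "and", "or", "my", "i", "this", "that", "it", "for", "with"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def preprocess(text: str) -> list[str]:
    """Lowercase, strip URLs and @mentions, drop non-letters and stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)  # also turns "#earthquake" into "earthquake"
    # Lemmatisation (e.g. "fires" -> "fire") is omitted in this sketch.
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one L2-normalised {term: weight} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF: idf(t) = ln((1 + n) / (1 + df(t))) + 1
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts
        vec = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


if __name__ == "__main__":
    tweets = [
        "Forest FIRE near La Ronge http://t.co/xyz",
        "@friend this new album is fire",
    ]
    docs = [preprocess(t) for t in tweets]
    print(docs)  # [['forest', 'fire', 'near', 'la', 'ronge'], ['new', 'album', 'fire']]
    for vec in tfidf(docs):
        print({t: round(w, 3) for t, w in vec.items()})
```

The literal "fire" and the slang "fire" map to the same feature, so the vectoriser alone gives the classifier no way to tell them apart; only the surrounding words can.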

The project also includes an honest error analysis: the model handles explicit disaster keywords and factual, news-like language well, but struggles with figurative language, sarcasm, and short tweets where context is doing most of the work. This bag-of-words ceiling points to the natural next step: word embeddings or transformer-based models such as BERTweet.
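The short-tweet observation can be checked quantitatively by bucketing errors by tweet length. The figurative-language cases are easiest to spot by reading false positives and false negatives separately. Below is a hypothetical helper sketch, not the project's code. It assumes labels follow the convention 1 = real disaster, 0 = casual, and its bucket edges are arbitrary. The toy tweets and predictions are invented and are not the project's results.

```python
from collections import defaultdict


def length_bucket(n_tokens: int, edges: tuple[int, ...]) -> str:
    """Map a token count to a bucket label such as '1-5', '6-10' or '>20'."""
    lower = 0
    for edge in edges:
        if n_tokens <= edge:
            return f"{lower + 1}-{edge}"
        lower = edge
    return f">{edges[-1]}"


def error_rate_by_length(texts, y_true, y_pred, edges=(5, 10, 20)):
    """Return {bucket: (errors, total, error_rate)} keyed by whitespace token count."""
    tally = defaultdict(lambda: [0, 0])  # bucket -> [errors, total]
    for text, truth, pred in zip(texts, y_true, y_pred):
        bucket = length_bucket(len(text.split()), edges)
        tally[bucket][0] += int(truth != pred)
        tally[bucket][1] += 1
    return {b: (err, tot, err / tot) for b, (err, tot) in tally.items()}


def misclassified(texts, y_true, y_pred):
    """Split errors into false positives (casual tweet flagged as a disaster)
    and false negatives (real disaster tweet missed) for manual reading."""
    fp = [t for t, y, p in zip(texts, y_true, y_pred) if y == 0 and p == 1]
    fn = [t for t, y, p in zip(texts, y_true, y_pred) if y == 1 and p == 0]
    return fp, fn


if __name__ == "__main__":
    # Invented toy data, only to show the shape of the output.
    texts = [
        "my mixtape is straight fire",
        "Forest fire near the highway, residents evacuated",
        "ablaze",
        "Earthquake hits the coast, buildings damaged across the city centre today",
    ]
    y_true = [0, 1, 1, 1]
    y_pred = [1, 1, 0, 1]
    print(error_rate_by_length(texts, y_true, y_pred))
    # {'1-5': (2, 2, 1.0), '6-10': (0, 1, 0.0), '11-20': (0, 1, 0.0)}
    print(misclassified(texts, y_true, y_pred))
    # (['my mixtape is straight fire'], ['ablaze'])
```

Splitting on whitespace is a crude length measure; counting tokens after preprocessing would track what the model actually sees more closely.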

Built locally in VS Code as part of Masterschool’s NLP module.
