Disaster Tweet Classification

Feb 27, 2026

A binary text classification pipeline that separates tweets about real disasters from tweets that are not, using the Kaggle Disaster Tweets dataset of 7,613 pre-labelled posts. The workflow covers text preprocessing (lowercasing, URL and mention removal, stop-word removal, lemmatisation), TF-IDF feature extraction, a baseline comparison between Logistic Regression and Linear SVC, and hyperparameter tuning with GridSearchCV. After tuning, accuracy reaches 83%.
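As a rough illustration of how these stages fit together in scikit-learn, here is a minimal sketch. It is not the project's actual code: the helper names (`clean_tweet`, `build_search`), the tiny stop-word list, the parameter grid, and the eight toy tweets are all assumptions made for the example, and lemmatisation is left out to avoid an extra dependency.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Tiny illustrative stop-word list (assumption; the project's list is not shown).
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "to", "and", "my", "this", "so"}

URL_OR_MENTION = re.compile(r"https?://\S+|www\.\S+|@\w+")
NON_LETTERS = re.compile(r"[^a-z\s]")


def clean_tweet(text: str) -> str:
    """Lowercase, strip URLs, mentions and punctuation, then drop stop words."""
    text = text.lower()
    text = URL_OR_MENTION.sub(" ", text)
    text = NON_LETTERS.sub(" ", text)
    return " ".join(tok for tok in text.split() if tok not in STOP_WORDS)


def build_search(classifier, param_grid, cv=2):
    """Wrap TF-IDF + classifier in a pipeline and tune it with a grid search."""
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(preprocessor=clean_tweet)),
        ("clf", classifier),
    ])
    return GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")


# Made-up toy tweets standing in for the Kaggle data (1 = disaster, 0 = not).
texts = [
    "Forest fire near La Ronge, residents evacuated http://t.co/x",
    "Earthquake of magnitude 6 strikes the coast, buildings collapsed",
    "Flood waters rising fast, emergency crews rescue families",
    "Wildfire destroys homes as thousands flee the area",
    "This new album is fire, listening on repeat",
    "My exam was a total disaster lol @friend",
    "Traffic flooded my feed with memes today",
    "That concert blew me away, the crowd exploded with joy",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Illustrative grid: n-gram range for TF-IDF, regularisation strength C.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "linear_svc": LinearSVC(),
}

results = {}
for name, clf in candidates.items():
    search = build_search(clf, param_grid)
    search.fit(texts, labels)
    results[name] = search
    print(name, search.best_params_, round(search.best_score_, 3))
```

Keeping the vectoriser inside the `Pipeline` means TF-IDF is refit on each training fold during the grid search, so vocabulary and document frequencies from the validation fold do not leak into training. Scores on eight toy tweets are meaningless; the sketch only shows the wiring.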

The project also includes an honest error analysis: the model handles explicit disaster keywords and factual, news-like language well, but struggles with figurative language, sarcasm, and short tweets where context is doing most of the work. TF-IDF treats each tweet as a bag of words and ignores word order and surrounding context, so it cannot resolve these cases. This bag-of-words ceiling motivates the natural next step toward word embeddings or transformer-based models like BERTweet.
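An error analysis of this kind usually starts by pulling out the misclassified tweets and grouping them. The sketch below shows one way to do that in plain Python; the function name `summarise_errors`, the five-token cutoff for a "short" tweet, and the example tweets and predictions are all hypothetical, not taken from the project.

```python
from collections import Counter


def summarise_errors(texts, y_true, y_pred, short_threshold=5):
    """Collect misclassified tweets and tag each by error type and length.

    short_threshold is an assumed cutoff (in whitespace tokens) for "short".
    """
    errors = []
    for text, truth, pred in zip(texts, y_true, y_pred):
        if truth == pred:
            continue
        # Predicted disaster but it was not: false positive; otherwise false negative.
        kind = "false_positive" if pred == 1 else "false_negative"
        errors.append({
            "text": text,
            "kind": kind,
            "short": len(text.split()) <= short_threshold,
        })
    counts = Counter(err["kind"] for err in errors)
    short_share = sum(err["short"] for err in errors) / len(errors) if errors else 0.0
    return errors, counts, short_share


# Made-up tweets and predictions for illustration (1 = disaster, 0 = not).
sample_texts = [
    "my mixtape is fire",
    "bridge collapsed downtown, several injured",
    "ablaze",
    "storm warning issued for the county tonight",
]
sample_true = [0, 1, 1, 1]
sample_pred = [1, 1, 0, 1]

errors, counts, short_share = summarise_errors(sample_texts, sample_true, sample_pred)
for err in errors:
    print(err["kind"], "| short:", err["short"], "|", err["text"])
print(dict(counts), "share of short tweets among errors:", short_share)
```

Reading the false positives and false negatives side by side is what surfaces patterns such as figurative uses of disaster words or very short tweets with no usable context.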

Built locally in VS Code as part of Masterschool’s NLP module.

Author: Annelize Krause
Business & Operations Analyst
Business Analyst with 18+ years of experience in legal, operations, and tech. I help businesses perform better by optimizing workflows and turning messy data into something AI can actually use. Now wrapping up Masterschool’s AI Data Science program with a hands-on internship for real-world work.