Disaster Tweet Classification

A binary text classification pipeline that distinguishes real disaster-related tweets from casual ones using the Kaggle Disaster Tweets dataset of 7,613 pre-labelled posts. The workflow covers text preprocessing (lowercasing, URL/mention removal, stop-word removal, lemmatisation), feature extraction with TF-IDF, baseline comparison between Logistic Regression and Linear SVC, and hyperparameter tuning via GridSearchCV, reaching 83% accuracy after tuning.
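The workflow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the `clean_tweet` helper, the tiny stop-word set, the made-up example tweets, and the grid of `C` values are all hypothetical, and lemmatisation is left out to keep the sketch free of extra downloads (in practice it could be added with a library such as NLTK or spaCy).

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Tiny illustrative stop-word set (an assumption; the project's list is not shown).
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "to", "and", "my", "this"}


def clean_tweet(text: str) -> str:
    """Lowercase, strip URLs and @mentions, keep letters only, drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)                   # mentions
    text = re.sub(r"[^a-z\s]", " ", text)               # punctuation, digits
    return " ".join(t for t in text.split() if t not in STOP_WORDS)


# Made-up toy tweets standing in for the Kaggle data (1 = disaster, 0 = not).
tweets = [
    "Forest fire spreading near the highway, evacuate now https://t.co/abc",
    "Earthquake of magnitude 6 hits the coast, buildings damaged",
    "Flood warning issued as river levels rise @cityalerts",
    "Wildfire forces thousands to leave their homes",
    "My new mixtape is fire, go listen",
    "This traffic is a total disaster lol",
    "I am drowning in homework this week",
    "That concert blew up my weekend plans @friend",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

X = [clean_tweet(t) for t in tweets]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),  # placeholder, swapped by the grid
])

# One grid entry per candidate model, so both are compared under the same CV splits.
param_grid = [
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1.0, 10.0]},
    {"clf": [LinearSVC()], "clf__C": [0.1, 1.0, 10.0]},
]

# cv=2 only because the toy set is tiny; a real run would use more folds.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(X, labels)

print(type(search.best_params_["clf"]).__name__, search.best_params_["clf__C"])
```

Putting the vectoriser inside the `Pipeline` means TF-IDF is refit on each training fold during the search, which avoids leaking vocabulary statistics from the validation fold. Scores on a toy set like this say nothing about the accuracy reported for the full dataset.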

The project also includes an honest error analysis: the model handles explicit disaster keywords and factual, news-like language well, but struggles with figurative language, sarcasm, and short tweets where context is doing most of the work. This bag-of-words ceiling motivates the natural next step toward word embeddings or transformer-based models like BERTweet.
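One simple way to run this kind of error analysis is to collect the misclassified tweets and group them by error type and length. The sketch below is an assumption about how that could look, not the project's own analysis: `summarise_errors`, the `short_len` threshold, and the example tweets and predictions are all hypothetical.

```python
from collections import Counter


def summarise_errors(texts, y_true, y_pred, short_len=5):
    """Return misclassified tweets plus counts per (error type, length bucket).

    short_len is a hypothetical cut-off: tweets with at most this many
    whitespace-separated tokens are counted as "short".
    """
    errors = []
    for text, true, pred in zip(texts, y_true, y_pred):
        if true == pred:
            continue
        kind = "false_positive" if pred == 1 else "false_negative"
        bucket = "short" if len(text.split()) <= short_len else "long"
        errors.append({"text": text, "kind": kind, "bucket": bucket})
    counts = Counter((e["kind"], e["bucket"]) for e in errors)
    return errors, counts


# Made-up examples and predictions, for illustration only (1 = disaster).
texts = [
    "my mixtape is fire",
    "forest fire spreading near the highway evacuate now",
    "ablaze",
    "this traffic is a disaster",
]
y_true = [0, 1, 1, 0]
y_pred = [1, 1, 0, 1]

errors, counts = summarise_errors(texts, y_true, y_pred)
for (kind, bucket), n in sorted(counts.items()):
    print(f"{kind:15s} {bucket:5s} {n}")
```

Reading the misclassified texts themselves, rather than only the counts, is what surfaces patterns such as figurative uses of words like "fire" or "disaster".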

Built locally in VS Code as part of Masterschool’s NLP module.

NLP, Machine Learning, Text Classification, TF-IDF, Binary Classification

Authors

Annelize Krause

Business & Operations Analyst

Business Analyst with 18+ years of experience in legal, operations, and tech. I help businesses perform better by optimizing workflows and turning messy data into something AI can actually use. Now wrapping up Masterschool’s AI Data Science program with a hands-on internship for real-world work.
