
Public Transit Delay — Exploratory Data Analysis

A structured exploratory data analysis of public transit delays: cleaning the raw data, engineering features (time-based fields and delay categories), and producing visualizations that show when, where, and how delays occur. The work is split across two Jupyter notebooks (cleaning and EDA), with key findings on delay distribution, temporal patterns, route-level performance, and on-time rates.

Preview

Public transit delay EDA — visualizations and findings
Exploratory analysis of transit delay data: distribution of delays, temporal patterns, and route-level performance. Findings (e.g. median delay ~13 min, high-delay routes) support operational focus and future modeling.

Problem & Context

The goal was to load and clean raw public transit delay data, document data quality issues (missing values, data types, outliers), and engineer features for analysis and future modeling. The cleaned dataset was then explored through well-labeled visualizations, and the key findings were summarized in a reproducible way. The dataset (Kaggle: Public Transport Delays with Weather and Events) contains trip-level records with scheduled and actual times, delay minutes, route, weather, and congestion.
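As a rough illustration of the cleaning and feature-engineering steps described above, the sketch below parses types, drops unusable rows, and derives time-based features and a delay category. The column names (`route_id`, `scheduled_time`, `delay_minutes`), the category thresholds (5 and 15 minutes), and the tiny sample frame are assumptions made for this example; they are not taken from the notebooks or the Kaggle dataset.

```python
import pandas as pd


def clean_and_engineer(raw: pd.DataFrame) -> pd.DataFrame:
    """Coerce types, drop unusable rows, and add time and delay features.

    Column names and thresholds are hypothetical, chosen for illustration.
    """
    df = raw.copy()

    # Coerce types; values that cannot be parsed become NaT / NaN.
    df["scheduled_time"] = pd.to_datetime(df["scheduled_time"], errors="coerce")
    df["delay_minutes"] = pd.to_numeric(df["delay_minutes"], errors="coerce")

    # Drop rows missing the fields every later step depends on.
    df = df.dropna(subset=["scheduled_time", "delay_minutes"]).copy()

    # Time-based features.
    df["hour"] = df["scheduled_time"].dt.hour
    df["day_of_week"] = df["scheduled_time"].dt.day_name()
    df["is_weekend"] = df["scheduled_time"].dt.dayofweek >= 5

    # Delay categories with assumed cut points: <=5, (5, 15], >15 minutes.
    df["delay_category"] = pd.cut(
        df["delay_minutes"],
        bins=[-float("inf"), 5, 15, float("inf")],
        labels=["on_time", "minor", "major"],
    )
    return df


# Made-up sample rows, including one bad timestamp and one missing delay.
raw = pd.DataFrame(
    {
        "route_id": ["R1", "R1", "R2", "R2"],
        "scheduled_time": [
            "2024-01-06 08:00",
            "2024-01-08 17:30",
            "not a time",
            "2024-01-09 09:15",
        ],
        "delay_minutes": ["2", "20", "7", None],
    }
)
clean = clean_and_engineer(raw)
print(clean[["route_id", "hour", "day_of_week", "is_weekend", "delay_category"]])
```

Coercing with `errors="coerce"` turns malformed values into missing ones, so data quality issues can be counted and documented before any rows are dropped.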

What It Does

Tech Stack

Python, Pandas, NumPy, Matplotlib, Seaborn, Jupyter

Key Takeaways

A clean two-notebook workflow (cleaning → EDA) keeps the analysis reproducible and portfolio-ready. The findings support operational focus on high-delay routes and inform future modeling (e.g. regression or classification on delay).
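To make the idea of focusing on high-delay routes concrete, a route-level summary along the following lines could rank routes by median delay and on-time rate. This is a minimal sketch: the `route_id` and `delay_minutes` column names, the 5-minute on-time threshold, and the sample values are all assumptions for illustration, not figures or definitions from the project.

```python
import pandas as pd

# Assumed definition: a trip counts as on time if it is at most 5 minutes late.
ON_TIME_THRESHOLD_MIN = 5


def route_summary(df: pd.DataFrame, threshold: float = ON_TIME_THRESHOLD_MIN) -> pd.DataFrame:
    """Per-route trip count, median delay, and on-time rate, worst routes first."""
    return (
        df.assign(on_time=df["delay_minutes"] <= threshold)
        .groupby("route_id")
        .agg(
            trips=("delay_minutes", "size"),
            median_delay=("delay_minutes", "median"),
            on_time_rate=("on_time", "mean"),  # mean of booleans = share on time
        )
        .sort_values("median_delay", ascending=False)
    )


# Made-up trips for two hypothetical routes.
trips = pd.DataFrame(
    {
        "route_id": ["A", "A", "A", "B", "B"],
        "delay_minutes": [2, 4, 30, 12, 18],
    }
)
summary = route_summary(trips)
print(summary)
```

The median is used rather than the mean so that a few extreme delays do not dominate a route's ranking; the same table could also serve as a starting point for labels in a later classification model.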
