Flight Delay and Cancellation Prediction
Employing Machine Learning techniques for a classification and regression problem
For this project, we predicted whether a flight will be cancelled based on various flight-related data, which is a classification problem. For flights that were not cancelled, we predicted the number of minutes by which the flight will be delayed, which is a regression problem. We used this Kaggle dataset, which contains information on 5.82 million flights.
After establishing baseline models, we used more advanced machine learning models to improve accuracy. Because the class distribution in cancellation prediction is skewed, we had to resample the data.
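The summary does not say which sampling strategy was used. As an illustration only, the sketch below assumes random undersampling of the majority class, written in plain Python. The function name undersample and the toy data are hypothetical and are not taken from the project notebooks.

```python
import random
from collections import Counter


def undersample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the rarest class."""
    rng = random.Random(seed)

    # Group the feature rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is cut down to the size of the smallest class.
    target = min(len(group) for group in by_class.values())

    balanced = []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            balanced.append((row, label))

    # Shuffle so the classes are not stored in contiguous blocks.
    rng.shuffle(balanced)
    out_rows = [row for row, _ in balanced]
    out_labels = [label for _, label in balanced]
    return out_rows, out_labels


# Toy data (hypothetical): 98 rows of class 0 and 2 rows of class 1.
rows = [[i] for i in range(100)]
labels = [0] * 98 + [1] * 2

balanced_rows, balanced_labels = undersample(rows, labels)
print(Counter(balanced_labels))
```

Undersampling discards data, so oversampling the minority class or using class weights may be preferable when the minority class is very small. Whichever approach is chosen, it should normally be applied to the training split only, so that evaluation still reflects the original class distribution.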
We achieved a Mean Absolute Error (MAE) of 5.54 and a Mean Squared Error (MSE) of 74.34 in flight delay prediction using a Random Forest Regressor, which explained 95% of the variability in the target (R² = 0.95).
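For readers unfamiliar with these metrics, the following sketch shows how MAE, MSE, and R² are conventionally computed from true and predicted values. The function name regression_metrics and the toy numbers are hypothetical; they are not the project's data and do not reproduce the figures reported above.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean Absolute Error: average size of the error, in the target's units.
    mae = sum(abs(e) for e in errors) / n

    # Mean Squared Error: average squared error, which penalizes large misses.
    mse = sum(e * e for e in errors) / n

    # R^2: 1 minus (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    return mae, mse, r2


# Toy delays in minutes (hypothetical values for illustration).
y_true = [0.0, 10.0, 20.0, 30.0]
y_pred = [2.0, 8.0, 22.0, 28.0]

mae, mse, r2 = regression_metrics(y_true, y_pred)
print(mae, mse, r2)
```

An R² close to 1 means the predictions account for most of the variance in the observed delays, while MAE gives a more direct sense of the typical error size.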
The code, notebooks, and final project report, which contains a more detailed explanation, can be found in this GitHub repository.