Abstract
Big Data analytics is the logical analysis of very large-scale datasets.
Such analysis helps an organization improve its decision-making
process. In this article, we present Airline Delay Analysis and Prediction, which
analyzes airline datasets in combination with a weather dataset. In this
research work, we consider various attributes to analyze flight delay,
such as day, airline, cloud cover, and temperature. Moreover, we
present rigorous experiments on various machine learning models to predict
flight delay, namely logistic regression with L2
regularization, Gaussian Naive Bayes, K-Nearest Neighbors, a Decision Tree
classifier, and a Random Forest model. The Random Forest model achieves an
accuracy of 82% with a delay threshold of 15 minutes. The analysis is
carried out using data from 1987 to 2008; the models are trained on
data from 2000 to 2007, and the predictions are validated using 2008 data.
The Random Forest model also achieves a recall of 99%.
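
The evaluation setup described above (a 15-minute delay threshold, training on 2000 to 2007, validation on 2008, and reporting both accuracy and recall) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the record fields `year` and `delay`, the helper names, and the choice to count a delay of exactly 15 minutes as "delayed" are all assumptions made for the example.

```python
# Illustrative sketch only; field names and helpers are hypothetical.
DELAY_THRESHOLD_MIN = 15  # threshold stated in the abstract


def label_delayed(delay_minutes, threshold=DELAY_THRESHOLD_MIN):
    """Return 1 if the flight counts as delayed, else 0.

    Assumption: a delay of exactly `threshold` minutes counts as delayed.
    """
    return 1 if delay_minutes >= threshold else 0


def split_by_year(records, train_years, test_year):
    """Temporal split: train on earlier years, validate on a later year."""
    train = [r for r in records if r["year"] in train_years]
    test = [r for r in records if r["year"] == test_year]
    return train, test


def accuracy_and_recall(y_true, y_pred):
    """Compute accuracy and recall for the positive (delayed) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall
```

Accuracy counts all correct predictions, while recall counts only how many truly delayed flights were caught, so a model can reach a very high recall while its accuracy stays noticeably lower if it also flags some on-time flights as delayed.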