Airline data analysis and prediction
Flight delays are an unavoidable part of air travel. A large-scale study by the U.S. military found that approximately 25% of flights were delayed in the late 2000s. Delays have many causes: some, like weather, cannot be controlled, while others, like carrier delay, can. Often passengers learn of a delay only when the flight schedule board at the airport announces it, just as they are ready to depart, which is frustrating and chaotic. Had they known about the delay beforehand, they would likely have been less frustrated, and delivering this information sooner can help airlines build a better relationship with their customers. We have three audiences in mind in researching this matter: the airlines, the passengers and the law-making authorities.
A machine learning-based prediction system, once trained, can predict flight delays in real time. The alternatives are a regression model or a deep learning-based prediction model. Deep learning might be assumed to be the natural choice, but in this case it is not: after experimenting with different algorithms and methods, we found that even deep learning did not yield a model with dependable accuracy. We therefore took another approach: find the features with the highest correlation and feed those to a Random Forest prediction model. In our experiments this model proved robust in predicting flight delays.
Airlines are under no legal obligation to operate a scheduled flight on a given day and are not required to compensate passengers for damages when flights are delayed or even canceled. This is why passengers must take care in choosing which airline to fly with and which airport to use before purchasing tickets. In light of this, we analyzed the data and ranked the airlines by their delay and cancellation performance. Our forecasting model and prediction model are aimed at passengers and airlines, so that each can take steps to avoid these problems in the future. To our knowledge, no other researchers have compared these prediction methods on the Airline On-time Performance dataset in search of the best fit. Hence, there remains a gap to fill: a reliable, robust and scalable delay prediction model.
Our objective is to draw insights from the data and build delay predictions that help both airlines and passengers: passengers can save time, and airline companies can use our insights to improve their services.
The Airline On-time Performance dataset is provided by the Bureau of Transportation Statistics (BTS), USA. Each flight has 109 attributes, and each month contains approximately 0.5 million flights. The data is available as monthly ‘.csv’ files; we downloaded the files for 2009 to 2019. For data validation, we wrote a Python script that iterates over the whole dataset and validates it against the look-up tables BTS provides. On exploring the data, we found that 47 of the 109 columns were 99% empty. We removed these columns, since they were not only empty but also unimportant for our research, leaving 62 columns. We stored the data in Microsoft SQL Server, grouped by year.
For TF, we produce forecasts using exponential smoothing methods with ETS. ETS is a function based on the classification of methods described in Hyndman et al. (2008) [4]. Forecasts produced using these methods are weighted averages of past observations, with the weights decaying exponentially as the observations get older; in other words, the more recent the observation, the higher its weight. ETS uses these methods to estimate the model parameters and returns information about the fitted model, which we then use to produce forecasts.
For delay prediction and EDA, we limited our dataset to the first six months, since we do not have computers that can handle such a large volume of data. For EDA, we wrote SQL queries, saved the results as views in SQL Server, and used these views in Tableau for data visualization.
Finally, for arrival and delay prediction, the first step is preprocessing: we encode the categorical data present in our dataset. For feature selection, we apply PCA and correlation and select the top 25 features for each method. We then store these feature sets on disk for future use, one set per method.
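The exponentially decaying weights behind the forecasts can be illustrated with simple exponential smoothing, the most basic member of the exponential smoothing family. The sketch below is not the ETS function itself: the smoothing parameter `alpha` is fixed by hand rather than estimated, the function names are hypothetical, and the monthly delay rates are made-up numbers used only for illustration.

```python
def ses_weights(alpha, n):
    """Weights given to the n most recent observations, newest first."""
    return [alpha * (1 - alpha) ** age for age in range(n)]


def ses_forecast(series, alpha):
    """One-step-ahead forecast from simple exponential smoothing.

    The level is initialised to the first observation, which is one
    common convention (an assumption, not taken from the source).
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for observation in series[1:]:
        # New level = weighted average of the latest value and the old level.
        level = alpha * observation + (1 - alpha) * level
    return level


if __name__ == "__main__":
    # Made-up monthly share of delayed flights, oldest first.
    monthly_delay_rate = [0.21, 0.19, 0.24, 0.22, 0.26, 0.25]
    print("weights, newest first:", ses_weights(0.5, 4))
    print("next-month forecast:", ses_forecast(monthly_delay_rate, 0.5))
```

A larger `alpha` puts more weight on recent months and reacts faster to change; a smaller one smooths more heavily. A full ETS fit would additionally choose among error, trend and seasonal components and estimate the parameters from the data.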
We split the dataset into 70% for training and 30% for testing.
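The correlation-based feature ranking and the 70/30 split can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the column names and values are hypothetical, ranking by absolute Pearson correlation with the target is one plausible reading of "correlation", and the Random Forest training step that would follow (normally done with a library) is not shown.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def top_k_by_correlation(columns, target, k):
    """Names of the k columns most correlated (in absolute value) with target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def split_rows(rows, test_fraction=0.3, seed=0):
    """Shuffle rows reproducibly and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Hypothetical, already-encoded feature columns and an arrival-delay target.
    columns = {
        "dep_delay": [0, 5, 10, 20],
        "distance": [100, 300, 200, 250],
        "constant_flag": [1, 1, 1, 1],
    }
    arr_delay = [1, 6, 11, 19]
    print(top_k_by_correlation(columns, arr_delay, k=2))

    train, test = split_rows(list(range(10)))
    print(len(train), len(test))
```

In the setting described above, `k` would be 25, and the selected column names would be saved to disk so that the same feature set can be reused when training and evaluating the model.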
Airlines can use our data and analysis to improve their performance.
We will build a dashboard that airline companies can use to improve their services.
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| GPU | Equipment | 0 | 70000 | 0 |
| Total (in Rs) | | | | 0 |