Adil Khan 9 months ago
AdiKhanOfficial #FYP Ideas

Airline data analysis and prediction

Project Title

Airline data analysis and prediction

Project Area of Specialization

Artificial Intelligence

Project Summary

Flight delays are part of the experience of flying, and some delays are inevitable. A large-scale study by the U.S. military found that approximately 25% of flights were delayed in the late 2000s. Delays have many causes. Some, like weather, cannot be controlled, while others, like carrier delays, can. Passengers often learn of a delay only when they are ready to depart and the flight schedule board at the airport announces it, which is frustrating and chaotic. Had they known about the delay beforehand, they would have been less frustrated and would have appreciated the timely information. Speeding up this notification could help airlines build a better relationship with their customers. With this in mind, our research addresses three audiences: airlines, passengers, and lawmakers.

A robust machine learning-based prediction system, once trained, can be deployed to predict flight delays in real time. Alternatives include a regression model or a deep learning-based prediction model. Deep learning might seem the natural choice, but in this case it is not: after experimenting with different algorithms and methods, we found that deep learning did not yield a model with dependable accuracy. We therefore took another approach: we selected the features with the highest correlation to delay and fed them to a Random Forest prediction model. This model proved robust and, in our assessment, state of the art in predicting flight delays.

Airlines are under no legal obligation to operate a scheduled flight on a given day, and they are not required to compensate passengers for damages when flights are delayed or even canceled.
This is why passengers must take care in choosing which airline to fly with and which airport to use before purchasing tickets. In light of this situation, we analyzed the data and ranked the airlines by their delay and cancellation performance. Our forecasting and prediction models are aimed at passengers and airlines, so that each can take steps, to their own ends, to avoid these problems in the future. To our knowledge, no other researchers have compared these prediction methods on the Airline On-Time Performance dataset in search of the best fit. Hence, there remains a gap for a reliable, robust, and scalable delay prediction model.
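Ranking airlines by delay and cancellation performance comes down to computing two rates per carrier and sorting. The sketch below is one way to do it, under stated assumptions: the dictionary keys (`carrier`, `arr_delay`, `cancelled`) are hypothetical stand-ins rather than BTS column names, the sample flights are invented, and the combined score (delay rate plus cancellation rate) is an illustrative choice, not the authors' ranking formula. The 15-minute threshold follows the BTS convention for counting a flight as late.

```python
from collections import defaultdict

def rank_airlines(flights, delay_threshold=15):
    """Rank carriers best-first by the share of flights delayed or cancelled.

    Each flight is a dict with hypothetical keys:
      'carrier'   - airline code
      'arr_delay' - arrival delay in minutes (None if not available)
      'cancelled' - True if the flight was cancelled
    A flight counts as delayed when it arrives delay_threshold minutes or more late.
    """
    counts = defaultdict(lambda: {"flights": 0, "delayed": 0, "cancelled": 0})
    for f in flights:
        c = counts[f["carrier"]]
        c["flights"] += 1
        if f["cancelled"]:
            c["cancelled"] += 1
        # Flights without an arrival delay (e.g. diverted) count as neither.
        elif f["arr_delay"] is not None and f["arr_delay"] >= delay_threshold:
            c["delayed"] += 1

    ranking = []
    for carrier, c in counts.items():
        delay_rate = c["delayed"] / c["flights"]
        cancel_rate = c["cancelled"] / c["flights"]
        ranking.append((carrier, delay_rate, cancel_rate))
    # Lowest combined rate first; ties broken by cancellation rate, then carrier code.
    ranking.sort(key=lambda r: (r[1] + r[2], r[2], r[0]))
    return ranking

# Invented sample flights for illustration only.
sample = [
    {"carrier": "AA", "arr_delay": 5, "cancelled": False},
    {"carrier": "AA", "arr_delay": 30, "cancelled": False},
    {"carrier": "DL", "arr_delay": 0, "cancelled": False},
    {"carrier": "DL", "arr_delay": None, "cancelled": True},
    {"carrier": "UA", "arr_delay": -3, "cancelled": False},
]
for carrier, delay_rate, cancel_rate in rank_airlines(sample):
    print(f"{carrier}: delayed {delay_rate:.0%}, cancelled {cancel_rate:.0%}")
```

Because cancelled flights are never also counted as delayed, the combined score is simply the share of flights that were not on time. A weighted score would be equally defensible if cancellations should count more heavily than delays.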

Project Objectives

Our objective is to extract insights from the data and to predict flight delays, so that we can help both airlines and their customers. Passengers can save time, and airline companies can use our insights to improve their services.

Project Implementation Method

The Airline On-Time Performance dataset is provided by the Bureau of Transportation Statistics (BTS), USA. Each flight has 109 attributes, and each month contains approximately 0.5 million flights. The data is published monthly as ‘.csv’ files, and we downloaded the files for the years 2009 to 2019. For data validation, we wrote a Python script that iterates over the whole dataset and validates it against the look-up tables provided by BTS. Exploring the data, we found that 47 of the 109 columns were 99% empty. We removed these columns, which were also unimportant for our research, leaving 62. We stored the data in Microsoft SQL Server, grouped by year.

For TF, we produce forecasts using exponential smoothing methods with ETS. ETS is a function based on the classification of exponential smoothing methods described in Hyndman et al. (2008) [4]. Forecasts produced using these methods are weighted averages of past observations, with the weights decaying exponentially as the observations get older. In other words, the more recent the observation, the higher its weight. ETS estimates the model parameters and returns information about the fitted model, from which we then produce forecasts.

For delay prediction and EDA, we limited our dataset to the first six months, because our computers could not handle the full volume of data. For EDA, we wrote SQL queries, saved the results as views in SQL Server, and used these views in Tableau for data visualization.

Finally, for arrival and delay prediction, the first step is preprocessing, in which we encoded the categorical data in our dataset. For feature selection, we applied two methods, PCA and correlation, and selected the top 25 features with each. We stored each resulting feature set on disk for later use.
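The exponential weighting described for ETS can be illustrated with simple exponential smoothing, the most basic member of the ETS family (no trend, no seasonality). The sketch computes a one-step-ahead forecast recursively, then recomputes it as an explicit weighted average to show that the weights are alpha * (1 - alpha)^j for an observation j periods old. Assumptions: the smoothing parameter `alpha` is fixed by hand rather than estimated from the data as ETS does, the initial level is set to the first observation, and the monthly values are invented.

```python
def ses_forecast(series, alpha):
    """One-step-ahead forecast from simple exponential smoothing (recursive form).

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, starting from level_0 = y_0.
    The final level is the forecast for the next period.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def ses_weights(n, alpha):
    """Weight the recursion implicitly gives each of n observations, oldest first."""
    weights = [alpha * (1 - alpha) ** (n - 1 - i) for i in range(n)]
    # The oldest observation also seeds the initial level, so it keeps the leftover weight.
    weights[0] = (1 - alpha) ** (n - 1)
    return weights

monthly_delays = [100, 120, 110, 130]  # invented monthly delay counts
alpha = 0.5                            # fixed by hand here; ETS would estimate it
forecast = ses_forecast(monthly_delays, alpha)
weights = ses_weights(len(monthly_delays), alpha)
explicit = sum(w * y for w, y in zip(weights, monthly_delays))
print(forecast, explicit, weights)
```

Both routes give the same forecast, and the weights sum to 1. A larger `alpha` shifts weight toward recent months, which makes the forecast react faster to changes but also to noise.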
For the prediction models, we split the dataset into 70% for training and 30% for testing.
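The preprocessing steps above can be sketched end to end in plain Python: drop columns that are at least 99% empty, label-encode a categorical column, rank the remaining features by absolute Pearson correlation with the target, and make a reproducible 70/30 split. This is an illustration under assumptions, not the authors' code. The column names and toy rows are hypothetical, not the BTS schema; label encoding is only one possible encoding (correlation on arbitrary integer codes is a crude proxy); and PCA-based selection is not shown.

```python
import random

def drop_sparse_columns(rows, max_missing=0.99):
    """Keep only columns whose share of missing (None) values is below max_missing."""
    n = len(rows)
    keep = [c for c in rows[0]
            if sum(r[c] is None for r in rows) / n < max_missing]
    return [{c: r[c] for c in keep} for r in rows]

def label_encode(rows, column):
    """Replace category strings in `column` with integer codes (in place)."""
    codes = {v: i for i, v in enumerate(sorted({r[column] for r in rows}))}
    for r in rows:
        r[column] = codes[r[column]]
    return codes

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def top_k_by_correlation(rows, target, k):
    """Rank features by |correlation| with the target and return the top k names."""
    y = [r[target] for r in rows]
    scores = {}
    for c in rows[0]:
        if c == target:
            continue
        x = [r[c] for r in rows]
        if len(set(x)) > 1:  # correlation is undefined for a constant column
            scores[c] = abs(pearson(x, y))
    return sorted(scores, key=scores.get, reverse=True)[:k]

def split_train_test(rows, train_frac=0.7, seed=0):
    """Shuffle reproducibly, then cut into train/test (70/30 by default)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# Toy flights with hypothetical column names (not the BTS schema).
carriers = ["AA", "DL", "AA", "UA", "DL", "AA", "UA", "DL", "AA", "UA"]
dep = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
arr = [2, 4, 12, 14, 22, 24, 32, 34, 42, 44]
dist = [500, 300, 800, 300, 500, 800, 300, 500, 800, 300]
rows = [{"carrier": c, "dep_delay": d, "distance": m, "remarks": None, "arr_delay": a}
        for c, d, m, a in zip(carriers, dep, dist, arr)]

rows = drop_sparse_columns(rows)        # "remarks" is 100% empty, so it is dropped
codes = label_encode(rows, "carrier")   # AA -> 0, DL -> 1, UA -> 2
features = top_k_by_correlation(rows, "arr_delay", k=2)
train, test = split_train_test(rows)
print(features, len(train), len(test))
```

A Random Forest (for example, scikit-learn's `RandomForestClassifier` or `RandomForestRegressor`) could then be fit on the selected features of the training split. That step is omitted to keep the sketch dependency-free. Following the order described above, features are ranked before the split. Ranking them on the training portion only would keep test information out of feature selection.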

Benefits of the Project

Airlines can use our analysis and predictions to improve their performance.

Technical Details of Final Deliverable

We will build a dashboard that airline companies can use to improve their services.

Final Deliverable of the Project

Software System

Core Industry

Transportation

Other Industries

Core Technology

Big Data

Other Technologies

Sustainable Development Goals

Partnerships to achieve the Goal

Required Resources

Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs)
GPU | Equipment | 0 | 70000 | 0
Total (in Rs): 0
If you need this project, please contact me at contact@adikhanofficial.com.