Project Title

Bankruptcy prediction using diverse machine learning algorithms

Project Area of Specialization

Computer Science

Project Summary

Bankruptcy prediction has been a subject of interest for almost a century and remains one of the most active topics in economics, spanning a large body of finance and accounting research. The area matters in part because creditors and investors need to evaluate the likelihood that a firm will go bankrupt. The aim of predicting financial distress is to develop a model that combines various econometric measures and makes it possible to foresee the financial condition of a firm. Methods proposed in this domain have been based on statistical hypothesis testing, statistical modeling (e.g., generalized linear models), and, more recently, artificial intelligence (e.g., neural networks, Support Vector Machines, decision trees). In this project, we propose a novel approach to bankruptcy prediction that uses Extreme Gradient Boosting to learn an ensemble of decision trees. Additionally, to reflect higher-order statistics in the data and impose prior knowledge about data representation, we introduce a new concept that we refer to as synthetic features. A synthetic feature is a combination of econometric measures using arithmetic operations (addition, subtraction, multiplication, division).
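To make the idea of a synthetic feature concrete, here is a minimal sketch in plain Python. The helper name `synthetic_feature`, the indicator names, and the sample values are illustrative assumptions, not part of the project's code or dataset. How missing values and division by zero should be handled is also an assumption here.

```python
import operator

# The four arithmetic operations named in the summary.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def synthetic_feature(left, right, op):
    """Combine two feature columns element-wise with one arithmetic operation.

    Assumption: a missing input (None) or a division by zero yields None.
    """
    func = OPERATIONS[op]
    combined = []
    for a, b in zip(left, right):
        if a is None or b is None or (op == "/" and b == 0):
            combined.append(None)
        else:
            combined.append(func(a, b))
    return combined


# Hypothetical indicator values for three firms (not taken from the dataset).
net_profit = [120.0, -30.0, 45.0]
total_assets = [1000.0, 600.0, 0.0]

profit_to_assets = synthetic_feature(net_profit, total_assets, "/")
print(profit_to_assets)  # [0.12, -0.05, None]
```

The new column can then be appended to the feature matrix like any other indicator.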

Project Objectives

Our main goal is to predict in advance whether a firm is about to go bankrupt. To improve the model's predictions, we use an ensemble of boosted trees in which each base learner is constructed using additional synthetic features. The synthetic features are developed at each boosting step in an evolutionary fashion by combining features using an arithmetic operation. Each synthetic feature can be seen as a single regression model. The purpose of the synthetic features is to combine the econometric indicators proposed by domain experts into more complex features. They can be compared to the hidden features extracted by neural networks, although the way they are extracted is different. Finally, we test our solution on data collected about Polish companies.
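One way to picture the evolutionary step is as a random proposal of feature pairs, where features that proved useful earlier are recombined more often. The sketch below is an assumption about how such a step could look, not the project's actual procedure. The importance scores, the feature names `X1` to `X4`, and the function `propose_synthetic_features` are all hypothetical.

```python
import random

OPERATIONS = ["+", "-", "*", "/"]


def propose_synthetic_features(importance, n_candidates, rng):
    """Draw (feature_a, operation, feature_b) triples.

    Assumption: features are sampled in proportion to an importance score,
    and at least two features have a positive score.
    """
    names = list(importance)
    weights = [importance[name] for name in names]
    candidates = []
    for _ in range(n_candidates):
        first = rng.choices(names, weights=weights, k=1)[0]
        second = first
        # Redraw until the two operands differ, so "X1 - X1" is never proposed.
        while second == first:
            second = rng.choices(names, weights=weights, k=1)[0]
        candidates.append((first, rng.choice(OPERATIONS), second))
    return candidates


# Hypothetical scores, e.g. how often each feature was used in earlier trees.
importance = {"X1": 5.0, "X2": 1.0, "X3": 0.0, "X4": 3.0}
rng = random.Random(0)
candidates = propose_synthetic_features(importance, n_candidates=4, rng=rng)
for first, op, second in candidates:
    print(f"{first} {op} {second}")
```

Each proposed triple would then be evaluated as a new column before the next base learner is fitted. A feature with a score of zero, such as `X3` here, is never drawn.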

Project Implementation Method

Importing libraries
Importing and organizing the data
1. Convert the feature column types to float
2. Convert the class label type to int
Data Analysis and Preprocessing
1. Missing Data Analysis
  1. Generate Sparsity Matrix for the missing data
  2. Generate Heat Map for the missing data
2. Data Imputation
  1. Mean Imputation
  2. K-NN
  3. EM (Expectation Maximization)
  4. MICE (Multivariate Imputation by Chained Equations)
3. Dealing with imbalanced data
  1. Oversampling with SMOTE
Data Modeling
1. K-fold cross-validation
2. Models
  1. Gaussian Naive Bayes classifier
  2. Logistic Regression classifier
  3. Decision Tree classifier
  4. Extreme Gradient Boosting classifier
  5. Random Forest classifier
  6. Balanced Bagging classifier
Model Analysis
1. Model ranking
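
The preprocessing and splitting steps listed above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions. It uses mean imputation only, and the oversampler is a SMOTE-style interpolation written by hand rather than a library implementation. The toy data and all function names are hypothetical.

```python
import math
import random


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if value is None else value for j, value in enumerate(row)]
            for row in rows]


def smote_like_oversample(minority, n_new, k, rng):
    """Create n_new synthetic rows by interpolating between minority neighbours.

    Assumes at least two minority rows without missing values.
    """
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def k_fold_indices(n_samples, n_folds, rng):
    """Shuffle the row indices and return one (train, test) pair per fold."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    return [(sorted(set(indices) - set(fold)), sorted(fold)) for fold in folds]


# Hypothetical toy data: two indicators per firm, label 1 = bankrupt.
features = [
    [0.10, 1.2], [0.12, None], [0.30, 0.9], [None, 1.1], [0.25, 1.0],
    [-0.40, 0.2], [-0.35, None], [-0.50, 0.4],
]
labels = [0, 0, 0, 0, 0, 1, 1, 1]

rng = random.Random(42)
imputed = mean_impute(features)

minority = [row for row, label in zip(imputed, labels) if label == 1]
new_rows = smote_like_oversample(minority, n_new=2, k=2, rng=rng)
balanced_features = imputed + new_rows
balanced_labels = labels + [1] * len(new_rows)

splits = k_fold_indices(len(balanced_features), n_folds=5, rng=rng)
print(len(balanced_features), sum(balanced_labels))  # 10 5
print(len(splits), [len(test) for _, test in splits])  # 5 [2, 2, 2, 2, 2]
```

The sketch follows the order of the outline, oversampling before splitting. A common alternative is to oversample only the training part of each fold, so that synthetic rows never end up in a test fold.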

Benefits of the Project

This project will help predict whether a firm is about to go bankrupt. Knowing this sooner allows the firm to take precautionary measures to avoid bankruptcy, or at least to find ways to minimize the damage. This can help firms protect their wealth and status.

Technical Details of Final Deliverable

We explain, step by step, how we achieved benchmark results for bankruptcy prediction. First, we introduce the Polish bankruptcy dataset and describe its features, instances, and data organization. Next, we cover the data preprocessing steps, where we state the problems present in the data, such as missing values and class imbalance, and explain how we dealt with them. We then introduce the classification models we considered and explain how we train them on the data. Finally, we analyze and evaluate the performance of these models using metrics such as accuracy, precision, and recall.
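
As a small illustration of the evaluation metrics named above, the following sketch computes accuracy, precision, and recall from true and predicted labels and ranks two models by recall. The labels, the two model names, and the choice of recall as the ranking criterion are assumptions made for the example. They are not results from this project.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision and recall for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Convention used here: a metric with an empty denominator is 0.0.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels: 1 = bankrupt, 0 = not bankrupt (imbalanced, 2 of 10).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
predictions = {
    "always_healthy": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "cautious": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
}

results = {name: classification_metrics(y_true, y_pred)
           for name, y_pred in predictions.items()}
ranking = sorted(results, key=lambda name: results[name]["recall"], reverse=True)

for name in ranking:
    print(name, results[name])
# cautious {'accuracy': 0.8, 'precision': 0.5, 'recall': 1.0}
# always_healthy {'accuracy': 0.8, 'precision': 0.0, 'recall': 0.0}
```

Both toy models reach the same accuracy, yet only one of them detects any bankrupt firm. This is why precision and recall are reported alongside accuracy on imbalanced data.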

Final Deliverable of the Project

Software System

Core Industry

Finance

Other Industries

Core Technology

Artificial Intelligence (AI)

Other Technologies

Sustainable Development Goals

Decent Work and Economic Growth

Required Resources

Item Name	Type	No. of Units	Per Unit Cost (in Rs)	Total (in Rs)
GPU	Equipment	1	60000	60000
			Total (in Rs)	60000
