MACHINE LEARNING BASED LEUKAEMIA CANCER PREDICTION SYSTEM USING PROTEIN SEQUENTIAL DATA


2025-06-28 16:28:31 - Adil Khan

Project Title

MACHINE LEARNING BASED LEUKAEMIA CANCER PREDICTION SYSTEM USING PROTEIN SEQUENTIAL DATA

Project Area of Specialization

Artificial Intelligence

Project Summary

Summary:

Leukemia, a type of blood cancer, is one of the major problems in health sciences today. It is caused by the neoplastic proliferation of white blood cells (WBC). Several studies have sought to detect it from microscopic images of blood cells, but detection from protein sequence data has not been researched as widely as other techniques. A practical problem is that diagnosis requires a visit to a hematologist, and there were only 10 hematologists in KPK as of 24 November 2021. According to the WHO and the Global Cancer Observatory, leukemia mortality is highest in Asia:

https://gco.iarc.fr/today/data/factsheets/cancers/36-Leukaemia-fact-sheet.pdf

Leukemia is generally detected at a stage where recovery has become very difficult. We are therefore developing a system that detects the cancer from protein sequence data. After collecting leukemia data, we will train several machine learning algorithms on the data set, such as SVM, Random Forest, XGBoost, and logistic regression, and assess the accuracy of each one. The algorithm that delivers the highest accuracy will be embedded in our system, which will classify whether or not a person is affected by the cancer.
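The selection step described in the summary (train several algorithms, measure the accuracy of each, keep the best one) can be sketched as follows. This is a minimal illustration in plain Python, not the project's implementation. The two classifiers, the function names and the toy data are all assumptions made for the example: they stand in for SVM, Random Forest, XGBoost and logistic regression, which would come from the libraries listed under Libraries.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_majority(X, y):
    """Stand-in baseline: always predict the most common training label."""
    label = max(set(y), key=y.count)
    return lambda rows: [label for _ in rows]


def fit_nearest_neighbour(X, y):
    """Stand-in classifier: copy the label of the closest training row."""
    def predict(rows):
        labels = []
        for row in rows:
            closest = min(
                range(len(X)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], row)),
            )
            labels.append(y[closest])
        return labels
    return predict


def select_best(candidates, X_train, y_train, X_test, y_test):
    """Fit every candidate, score it on held-out data, return the winner."""
    scores = {}
    for name, fit in candidates.items():
        predict = fit(X_train, y_train)
        scores[name] = accuracy(y_test, predict(X_test))
    best_name = max(scores, key=scores.get)
    return best_name, scores


# Toy feature vectors and labels (0 = unaffected, 1 = affected); invented data.
X_train = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.9, 0.8], [0.8, 0.9]]
y_train = [0, 0, 0, 1, 1]
X_test = [[0.15, 0.15], [0.85, 0.85], [0.95, 0.9], [0.05, 0.1]]
y_test = [0, 1, 1, 0]

candidates = {
    "majority": fit_majority,
    "nearest_neighbour": fit_nearest_neighbour,
}
best_name, scores = select_best(candidates, X_train, y_train, X_test, y_test)
print(best_name, scores)
```

Scoring on rows that were not used for fitting is what keeps the comparison fair: a model that only memorizes its training rows would otherwise appear to be the most accurate one.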

Project Objectives

Objectives:

As discussed, leukemia is usually detected at a stage where the chances of recovery are minimal. We therefore propose a machine learning based technique that uses protein sequences to identify the genes that cause leukemia. Detecting the cancer early could reduce the mortality rate considerably. If the project is implemented successfully and with high accuracy, it can become a flagship project for health sciences and help compensate for the shortage of hematologists (specialists).

Future Work:

We will integrate our system with a PCR test machine so that it can be used on hospital patients in real time and detect the cancer at very low cost.

Project Implementation Method

Project Implementation

Environment & Language:

To implement the machine learning algorithms we have two options:

MATLAB / Octave

Python

We intend to use Python because it has highly optimized libraries.

Libraries:

Numerous libraries are available for implementing a classification problem, such as:

1. Python standard library

2. NumPy

3. LIBSVM

4. pandas

5. Matplotlib

6. Seaborn

7. scikit-learn

Data Set:

The data set is one of the most important components of a machine learning project: given a data set and suitable algorithms, a wide variety of problems can be solved through machine learning. We will take the data set for chronic myeloid leukemia (CML) from the Universal Protein Resource knowledgebase (UniProtKB) in FASTA file format.
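Before a classifier can use FASTA records, each protein sequence has to be read and turned into numbers. The sketch below parses FASTA text and computes the amino-acid composition of each sequence (the fraction of each of the 20 standard residues). Composition is only one possible encoding and is an assumption here; the proposal does not fix a feature representation. The function names and the two records are invented for illustration and are not real UniProtKB entries.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped; collect and join them later.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def composition(sequence):
    """Fraction of each standard amino acid in the sequence (20 values)."""
    counts = Counter(sequence)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


# Made-up records for illustration only; not real UniProtKB entries.
EXAMPLE_FASTA = """>example_1 hypothetical protein
MKVLA
AGG
>example_2 hypothetical protein
MMKK
"""

records = parse_fasta(EXAMPLE_FASTA)
features = [composition(seq) for _, seq in records]
for (header, seq), vector in zip(records, features):
    print(header, len(seq), vector)
```

Every sequence maps to a vector of the same length regardless of how long the protein is, which is what most classifiers expect as input.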

Implementation Issues and Challenges:

The most difficult challenge is selecting the correct input parameters and finding an optimal fit for the data. With too many parameters the model over-fits: it gives excellent results on the training data set but may produce incorrect output for samples outside it. With too few parameters the model under-fits, because the algorithm does not have enough parameters to determine the correct output. We therefore need to achieve an optimal fit for the algorithm.
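One way to see over-fitting and under-fitting is to compare accuracy on the training data with accuracy on data that was held back. The sketch below uses a k-nearest-neighbour classifier as a stand-in, which is an assumption for illustration and not one of the algorithms named in this proposal. Its setting k controls flexibility: k = 1 memorizes every training point, while k equal to the training set size always predicts the majority label. The data is invented, with two deliberately mislabelled training points.

```python
from collections import Counter


def knn_predict(train_x, train_y, x, k):
    """Majority label among the k training points closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def knn_accuracy(train_x, train_y, xs, ys, k):
    """Fraction of points in (xs, ys) that the k-NN rule labels correctly."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(xs, ys))
    return hits / len(xs)


# Invented one-dimensional data: the true label is 1 when x > 0.5.
# The training points at 0.25 and 0.75 carry deliberately wrong labels.
TRAIN_X = [0.02, 0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
TRAIN_Y = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
VAL_X = [0.11, 0.23, 0.28, 0.41, 0.61, 0.73, 0.78, 0.91]
VAL_Y = [0, 0, 0, 0, 1, 1, 1, 1]

results = {}
for k in (1, 3, len(TRAIN_X)):
    train_acc = knn_accuracy(TRAIN_X, TRAIN_Y, TRAIN_X, TRAIN_Y, k)
    val_acc = knn_accuracy(TRAIN_X, TRAIN_Y, VAL_X, VAL_Y, k)
    results[k] = (train_acc, val_acc)
    print(f"k={k}: train={train_acc:.2f} validation={val_acc:.2f}")
```

On this toy data the most flexible setting (k = 1) is perfect on the training set but reproduces the mislabelled points on the validation set, the least flexible setting ignores the input entirely, and the intermediate setting generalizes best. A large gap between training and validation accuracy is the practical warning sign of over-fitting.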


Benefits of the Project

Benefits:

To our knowledge, this is the first classification system based on protein sequences that can detect the cancer using a data set of the genes involved.

It will increase patients' chances of survival, and the system can be used in hospitals in real time.

Technical Details of Final Deliverable

Technical Details:

Our final deliverable will be an app based on a machine learning algorithm. As future work, we will integrate this app with a PCR test machine. We cannot implement the PCR test machine integration by the final presentation because it requires a high level of expertise and accuracy, as it relates to human life.

Final Deliverable of the Project

Software System

Core Industry

Medical

Other Industries

Health

Core Technology

Artificial Intelligence (AI)

Other Technologies

Sustainable Development Goals

Good Health and Well-Being for People

Required Resources
Item Name                    Type         No. of Units   Per Unit Cost (in Rs)   Total (in Rs)
Matlab / Octave              Equipment    1              0                       0
Anaconda                     Equipment    1              0                       0
VS Code                      Equipment    1              0                       0
Future Work (PCR Machine)    Equipment    1              70000                   70000
Total (in Rs)                                                                    70000
