MACHINE LEARNING BASED LEUKAEMIA CANCER PREDICTION SYSTEM USING PROTEIN SEQUENTIAL DATA
Leukemia (a type of blood cancer) is one of the major problems in health sciences today. It is caused by the neoplastic proliferation of white blood cells (WBC). Several studies have been conducted to detect the cancer using microscopic images of blood cells, but detection from protein sequence data has not been researched as widely as other techniques. A practical problem is that diagnosis requires a visit to a hematologist, and there were only 10 hematologists in KPK as of 24 November 2021. According to the WHO and the Global Cancer Observatory, the mortality rate is highest in Asia; the fact sheet is linked below:
https://gco.iarc.fr/today/data/factsheets/cancers/36-Leukaemia-fact-sheet.pdf

Leukemia is generally detected at a stage where recovery becomes very difficult, so we are developing a system that will detect the cancer from protein sequence data. After collecting leukemia data, we will apply this dataset to several machine learning algorithms, such as SVM, Random Forest, XGBoost, and logistic regression, and assess the accuracy of each one. We will then embed the algorithm that delivers the highest accuracy in our system so that it classifies whether or not a person is affected by the cancer.
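The comparison step described above could look roughly like the following sketch. It assumes scikit-learn is installed and uses synthetic placeholder data from `make_classification` in place of the real leukemia features, so the printed accuracies mean nothing clinically. XGBoost is left out to keep the sketch to a single dependency; it could be added to the `candidates` dictionary in the same way if the `xgboost` package is available.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data (assumption): 200 synthetic samples with 20 numeric
# features standing in for features derived from protein sequences.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Candidate classifiers named in the proposal (XGBoost omitted here).
candidates = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "LogisticRegression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# The model with the highest mean accuracy is the one to embed.
best_name = max(scores, key=scores.get)

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
print("Selected:", best_name)
```

Cross-validation is used here instead of a single train/test split so that the ranking depends less on one particular partition of the data; that is a design choice of this sketch, not something the proposal specifies.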
As discussed, leukemia is usually detected at a stage where the chances of recovery are minimal. We are therefore proposing a machine learning based technique to identify the genes that cause leukemia through protein sequences; if the cancer is detected early, the mortality rate can be reduced substantially. If we succeed in implementing this project with high accuracy, it will become a flagship project for health sciences and can also help compensate for the shortage of hematologists (specialists).
Future Work:
We will integrate our system with the PCR test machine so that it can be used on hospital patients in real time and detect the cancer at very low cost.
Environment & Language:
To implement Machine Learning algorithms we have two options:
Matlab / Octave.
Python.
We intend to use Python, as it has highly optimized libraries.
4.2 Libraries:
We have numerous libraries for implementing a classification problem, such as:
1. Python standard library
2. NumPy
3. LIBSVM
4. Pandas
5. Matplotlib
6. Seaborn
7. Scikit-learn
Data Set:
In machine learning projects, one of the most important components is the dataset: given a dataset and suitable algorithms, a variety of problems can be solved through machine learning. We will take the dataset for chronic myeloid leukemia (CML) from the Universal Protein Resource (UniProtKB) in FASTA file format.
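As a minimal sketch of how FASTA records might be turned into numeric inputs for a classifier, the example below parses FASTA text with the Python standard library and computes each sequence's amino acid composition. The helper names `read_fasta` and `composition` are hypothetical, the two records are made up for illustration and are not real UniProtKB entries, and amino acid composition is only one possible feature encoding, not necessarily the one the project will use.

```python
from collections import Counter
from io import StringIO

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def composition(sequence):
    """Return the fraction of each standard amino acid in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


# Made-up records for illustration only (assumption, not UniProtKB data).
example = StringIO(
    ">demo_1 hypothetical protein\nMKV\nLAA\n"
    ">demo_2 hypothetical protein\nGGGG\n"
)
records = list(read_fasta(example))
features = [composition(seq) for _, seq in records]
```

To read a downloaded file instead, the same generator can be used as `read_fasta(open(path))`, where `path` is wherever the FASTA file was saved. Each resulting feature vector has a fixed length of 20 regardless of sequence length, which is what most scikit-learn classifiers expect.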
4.4 Implementation Issues and Challenges:
The most difficult challenge is selecting the correct input parameters and finding an optimal fit for the data. If the model has too many parameters, it will over-fit: the algorithm will give excellent results on the training dataset, but samples from outside that dataset may be classified incorrectly. If the model has too few parameters, it will under-fit, because the algorithm does not have enough parameters to determine the correct output. We therefore need to achieve an optimal fit for the algorithm.
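The trade-off between under-fitting and over-fitting can be illustrated with a small sketch. It assumes scikit-learn is installed, uses synthetic data with some deliberately noisy labels, and uses the depth of a decision tree as a stand-in for "number of parameters"; the decision tree is chosen only because its capacity is easy to vary, not because the proposal names it.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder data (assumption); flip_y adds label noise so that
# memorising the training set does not generalise perfectly.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Depth 1 is a very small model, depth 3 a moderate one, and None lets the
# tree grow until it fits the training data completely.
results = {}
for depth in (1, 3, None):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    results[depth] = (
        model.score(X_train, y_train),  # accuracy on data the model has seen
        model.score(X_test, y_test),    # accuracy on held-out data
    )

for depth, (train_acc, test_acc) in results.items():
    print(f"max_depth={depth}: train={train_acc:.3f}, test={test_acc:.3f}")
```

A large gap between training and held-out accuracy suggests over-fitting, while low accuracy on both suggests under-fitting. Comparing the two numbers across model sizes is one common way to look for the optimal fit.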
Methods:

Benefits:
This is a first-of-its-kind classification system based on protein sequences, with which the cancer can also be detected using the dataset of the involved genes.
This will increase patients' chances of survival, and the system can be used in hospitals in real time.
Technical Details:
Our final deliverable will be a machine learning based app. In the future, we will embed this app in the PCR test machine. We cannot implement the PCR test machine integration by the final presentation because it requires a high level of expertise and accuracy, as it relates to human life.
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| Matlab / Octave | Equipment | 1 | 0 | 0 |
| Anaconda | Equipment | 1 | 0 | 0 |
| VS Code | Equipment | 1 | 0 | 0 |
| Future Work (PCR Machine) | Equipment | 1 | 70000 | 70000 |
| Total (in Rs) | | | | 70000 |