Classification models for heart disease prediction using feature selection PCA

2025-06-28 16:25:49 - Adil Khan

Project Title

Classification models for heart disease prediction using feature selection PCA

Project Area of Specialization

Artificial Intelligence

Project Summary

The prediction of cardiac disease helps practitioners make more accurate decisions regarding patients' health. Machine learning (ML) therefore offers a way to reduce and understand the symptoms related to heart disease. The aim of this work is to propose a dimensionality reduction method and to identify the features of heart disease by applying a feature selection technique. The data used for this analysis is the Heart Disease dataset from the UCI Machine Learning Repository.

Project Objectives

The classification models combined with dimensionality reduction pursue three primary objectives: (i) to learn the best feature representation of the dataset used; (ii) to validate the performance of PCA in conjunction with a feature selection technique; and (iii) to identify the classification model that achieves the best performance.
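As a rough illustration of the dimensionality reduction step named in objective (ii), the sketch below computes the first principal component of a small dataset using only the Python standard library. It is not the project's implementation: the helper names (`center`, `covariance`, `first_component`, `project`) and the toy data are assumptions made for this example, and power iteration is just one simple way to obtain the leading component.

```python
import math

def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration.

    Assumes the covariance matrix is non-zero and that the starting vector
    is not orthogonal to the leading eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(rows, component):
    """Project centered rows onto a single component (one score per row)."""
    return [sum(x * c for x, c in zip(r, component)) for r in rows]

# Toy 2-D data invented for illustration: every point lies on y = 2x.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
centered = center(data)
pc1 = first_component(covariance(centered))
scores = project(centered, pc1)
```

Because the toy points lie on a single line, one component captures all of their variance, and `scores` is a one-dimensional representation of the two original columns. On real data, the reduced scores would be passed to the classifiers being compared.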

Project Implementation Method

A subset of features was used to create an algorithm relevant to clinical situations. The features considered relevant were the clinical variables AGE, SEX, CP, and TRESTBPS; the routine test data CHOL, FBS, and RESTECG; the exercise electrocardiography test features THALACH, EXANG, SLOPE, and OLDPEAK; and the non-invasive test features THAL and CA. The label was NUM.
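The grouping above can be written down as a small lookup structure. The following is a minimal sketch: the feature and label names come from the text, while the group keys (`clinical`, `routine_tests`, and so on) and the helper functions are hypothetical names chosen for this example.

```python
# Feature names as listed in the text; the group keys are illustrative labels.
FEATURE_GROUPS = {
    "clinical": ["AGE", "SEX", "CP", "TRESTBPS"],
    "routine_tests": ["CHOL", "FBS", "RESTECG"],
    "exercise_ecg": ["THALACH", "EXANG", "SLOPE", "OLDPEAK"],
    "non_invasive": ["THAL", "CA"],
}
LABEL = "NUM"

def all_features(groups=FEATURE_GROUPS):
    """Flatten the groups into one ordered list of feature names."""
    return [name for names in groups.values() for name in names]

def split_record(record, groups=FEATURE_GROUPS, label=LABEL):
    """Split one patient record (a dict keyed by column name) into
    a feature vector and its label."""
    features = [record[name] for name in all_features(groups)]
    return features, record[label]
```

Keeping the groups explicit makes it easy to train on one group at a time, or to check that all 13 input features are present before fitting a model.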

Benefits of the Project

In the analysis, chi-square feature selection yielded features of anatomical and physiological relevance, such as cholesterol, maximum heart rate, chest pain, features related to ST depression, and heart vessels. Our method can be employed in many real-life applications and in other medical diagnoses to analyze large amounts of data and identify the risk factors involved in different diseases. Our main limitation is that the small sample size makes it difficult to generalize these findings on heart disease. In future work, we plan to apply our method to a larger dataset and to analyze other diseases with different feature selection techniques.
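To make the chi-square step concrete, the sketch below scores categorical features against a class label with the Pearson chi-square statistic and ranks them. It is a standard-library illustration, not the project's code: the function names and the toy values are invented for this example, and continuous features such as cholesterol would need to be binned before this contingency-table form applies.

```python
from collections import Counter

def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic of a categorical feature against class labels."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    feature_totals = Counter(feature_values)
    label_totals = Counter(labels)
    score = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in label_totals.items():
            # Expected count if the feature and the label were independent.
            expected = f_count * c_count / n
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score

def rank_features(columns, labels):
    """Return feature names sorted by descending chi-square score."""
    scores = {name: chi_square_score(values, labels)
              for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy values invented for illustration only (not taken from the UCI dataset).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "EXANG": [0, 0, 0, 0, 1, 1, 1, 1],  # perfectly aligned with the label
    "FBS":   [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
}
ranking = rank_features(columns, labels)
```

A higher score means a stronger association between the feature and the label, so keeping the top-ranked names is one simple way to obtain a reduced feature subset.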

Technical Details of Final Deliverable

Our model outperformed those in the literature. This matters when practitioners can work with only a third or a quarter of the available features yet still need results competitive with those obtained from the full feature set. Our method helps to remove unnecessary patient attributes and reduce the amount of data.

Final Deliverable of the Project

Software System

Core Industry

Health

Other Industries

Core Technology

Artificial Intelligence (AI)

Other Technologies

Sustainable Development Goals

Gender Equality

Required Resources

Item Name      Type            No. of Units   Per Unit Cost (in Rs)   Total (in Rs)
Printing       Miscellaneous   550            10                      5500
Stationery     Miscellaneous   30             50                      1500
Panaflex       Miscellaneous   4              500                     2000
Cloud Server   Equipment       2              14000                   28000
GPU Systems    Equipment       2              20000                   40000
Total (in Rs)                                                         77000
