Classification models for heart disease prediction using feature selection PCA
2025-06-28 16:25:49 - Adil Khan
Project Area of Specialization: Artificial Intelligence
Project Summary
Predicting cardiac disease helps practitioners make more accurate decisions about patients' health. Machine learning (ML) offers a way to narrow down and better understand the symptoms related to heart disease. This work proposes a dimensionality reduction method and identifies the features most relevant to heart disease by applying a feature selection technique. The data used for this analysis come from the Heart Disease dataset in the UCI Machine Learning Repository.
Project Objectives
Combining classification models with dimensionality reduction serves three primary objectives: (i) to learn the best feature representation of the dataset; (ii) to validate the performance of PCA in conjunction with a feature selection technique; and (iii) to identify the classification model that achieves the best performance.
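The three objectives can be sketched as a single scikit-learn pipeline. This is a minimal illustration under stated assumptions, not the project's actual configuration: the order of the steps (chi-square selection before PCA), the values `k=6` and `n_components=3`, the choice of logistic regression, and the synthetic data from `make_classification` are all placeholders chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in with 13 features (assumption: NOT the UCI data).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=5, random_state=0
)

pipeline = make_pipeline(
    MinMaxScaler(),                     # chi2 requires non-negative inputs
    SelectKBest(chi2, k=6),             # feature selection (k is hypothetical)
    PCA(n_components=3),                # dimensionality reduction (hypothetical size)
    LogisticRegression(max_iter=1000),  # one candidate classifier among many
)

# 5-fold cross-validated accuracy of the whole pipeline.
scores = cross_val_score(pipeline, X, y, cv=5)
mean_accuracy = scores.mean()
print(f"mean accuracy over 5 folds: {mean_accuracy:.3f}")
```

Swapping the final step for another estimator and comparing `mean_accuracy` across candidates is one way to approach objective (iii).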
Project Implementation Method
A subset of features was used to build an algorithm relevant to clinical situations. The features fall into four groups: the clinical variables AGE, SEX, CP, and TRESTBPS; the routine test data CHOL, FBS, and RESTECG; the exercise electrocardiography test features THALACH, EXANG, SLOPE, and OLDPEAK; and the non-invasive tests THAL and CA. The target label was NUM.
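The feature groups above can be written down as a small schema together with a parser. The sketch below assumes the comma-separated layout and column order of the UCI `processed.cleveland.data` file, with `?` marking missing values; the binarization `NUM > 0` is a common convention and an assumption here, since the text only states that NUM is the label. The two sample rows are made up for illustration and are not real patient records.

```python
import csv
import io

# Column order assumed from the UCI "processed.cleveland.data" file.
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]

# Feature groups as described in the text.
FEATURE_GROUPS = {
    "clinical": ["age", "sex", "cp", "trestbps"],
    "routine_tests": ["chol", "fbs", "restecg"],
    "exercise_ecg": ["thalach", "exang", "slope", "oldpeak"],
    "non_invasive": ["thal", "ca"],
}

def parse_rows(text):
    """Parse CSV text into dicts; '?' becomes None (missing value)."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        record = {}
        for name, raw in zip(COLUMNS, fields):
            raw = raw.strip()
            record[name] = None if raw == "?" else float(raw)
        rows.append(record)
    return rows

def binary_label(record):
    """Assumed convention: NUM == 0 means no disease, NUM > 0 means disease."""
    return 1 if record["num"] > 0 else 0

# Made-up illustrative rows (hypothetical values).
sample = (
    "55,1,2,130,240,0,1,155,0,1.0,2,0,3,0\n"
    "61,0,4,150,290,1,2,120,1,2.5,2,?,7,2\n"
)
rows = parse_rows(sample)
labels = [binary_label(r) for r in rows]
```

Rows containing `None` would still need to be dropped or imputed before modeling; that choice is left open here.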
Benefits of the Project
In the analysis, the chi-square test selected features of anatomical and physiological relevance, such as cholesterol, maximum heart rate, chest pain, features related to ST depression, and heart vessels. Our method can be employed in many real-life applications and in other medical diagnoses to analyze large amounts of data and identify the risk factors involved in different diseases. Our main limitation is that the small sample size makes it difficult to generalize these findings on heart disease. In future work, we plan to apply our method to a larger dataset and to analyze other diseases with different feature selection techniques.
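To make the chi-square ranking concrete, the sketch below scores each feature against the class label using only the standard library. It treats non-negative feature values as counts and compares the per-class sums with what would be expected if the feature were independent of the class. The function names `chi2_scores` and `top_k` and the toy data are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

def chi2_scores(X, y):
    """Chi-square statistic per feature; X must hold non-negative values."""
    n_samples = len(X)
    n_features = len(X[0])
    class_counts = defaultdict(int)
    observed = defaultdict(lambda: [0.0] * n_features)
    for row, label in zip(X, y):
        class_counts[label] += 1
        for j, value in enumerate(row):
            observed[label][j] += value  # per-class sum of each feature
    totals = [sum(observed[c][j] for c in class_counts)
              for j in range(n_features)]
    scores = []
    for j in range(n_features):
        score = 0.0
        for c, count in class_counts.items():
            # Expected sum if the feature were independent of the class.
            expected = totals[j] * count / n_samples
            if expected > 0:
                score += (observed[c][j] - expected) ** 2 / expected
        scores.append(score)
    return scores

def top_k(names, scores, k):
    """Return the k feature names with the highest scores."""
    ranked = sorted(zip(names, scores), key=lambda p: p[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Toy data: the first feature tracks the label, the second is constant.
X_demo = [[1, 1], [1, 1], [0, 1], [0, 1]]
y_demo = [1, 1, 0, 0]
demo_scores = chi2_scores(X_demo, y_demo)
best = top_k(["informative", "constant"], demo_scores, 1)
```

In the toy data the informative feature scores 2.0 and the constant feature scores 0.0, so only the former would be kept.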
Technical Details of Final Deliverable
Our model outperformed those in the literature. This matters when practitioners can work with only one-third to one-quarter of the available features and still need results competitive with those obtained from the full feature set. Our method helps remove unnecessary patient attributes and reduces the amount of data.
Final Deliverable of the Project: Software System
Core Industry: Health
Other Industries:
Core Technology: Artificial Intelligence (AI)
Other Technologies:
Sustainable Development Goals: Gender Equality
Required Resources

| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| Printing | Miscellaneous | 550 | 10 | 5500 |
| Stationery | Miscellaneous | 30 | 50 | 1500 |
| Panaflex | Miscellaneous | 4 | 500 | 2000 |
| Cloud Server | Equipment | 2 | 14000 | 28000 |
| GPU Systems | Equipment | 2 | 20000 | 40000 |
| Total (in Rs) | | | | 77000 |