Health Detection of Plants using ML
2025-06-28 16:27:33 - Adil Khan
Project Area of Specialization: Artificial Intelligence
Project Summary
Accurate prediction of plant health is crucial for avoiding massive agricultural losses and protecting food security, especially as climate change worsens environmental conditions. Manual methods of plant health assessment, however, are time-consuming and error-prone. The main objective of our project is therefore to design a reliable binary machine learning classifier, built with open-source libraries and a suitable dataset, that accurately determines the health status of a given plant.
Both hyperspectral and RGB images are considered as dataset types. Hyperspectral images capture more detailed information about plant health and thus help identify diseases at an early stage. They are used in conjunction with vegetation indices, which are a powerful metric for assessing photosynthetic activity and hence the overall health of a plant.
Each machine learning model for this application has its own pros and cons, so the final choice depends on the type of dataset and the crops in question. Convolutional neural networks (CNNs) are the most popular choice for detecting plant diseases in both color and hyperspectral images. Regression techniques have also been applied to hyperspectral images, especially of cotton crops, with reported accuracies of 90% for linear regression models and 80% for partial least squares regression.
Owing to the limited availability of hyperspectral data, this project uses hyperspectral images of cotton farms acquired by Hyperion, which to our knowledge has not been used in any prior machine learning based approach for plant health detection. The proposed methodology is therefore novel, and our project aims to investigate which techniques can be applied to raw, unlabeled, remotely sensed data of this nature to achieve reasonable accuracy in plant health assessment. Normalized Difference Vegetation Index (NDVI) values calculated from multispectral images of the same cotton farms will be used to validate the results of our model.
As this is a relatively unexplored domain that may pose technical challenges, an approach based on color images of rice crops is also under consideration as an alternative. The model for the color images is a convolutional neural network that is currently under development. One of these two approaches will be demonstrated at the end, depending on feasibility and constraints.
Project Objectives
The main goal is to investigate the feasibility and performance of using remotely sensed, raw hyperspectral data that is labeled manually using vegetation indices calculated from the corresponding multispectral imagery. In case of insurmountable technical challenges, RGB images of rice crops will instead be used to build a CNN model that accurately identifies diseased crops.
In short, we aim to extract as much information as possible from a suitable dataset of a specific plant and then train a selected model to predict its health status accurately.
The sub-objectives are briefly discussed below:
1) Obtain a sufficiently large dataset of images of healthy and diseased plants, i.e., color images for rice and hyperspectral images for cotton.
2) Perform feature engineering and image analysis, then select a suitable ML model that analyzes a given image and classifies the plant as either healthy or not healthy.
3) Evaluate the model and optimize it to make accurate predictions about the health of a specific crop.
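The evaluation step in the third sub-objective can be illustrated with a minimal sketch. It assumes predictions are encoded as 1 for "not healthy" and 0 for "healthy"; that encoding and the function name `binary_metrics` are our own illustrative choices, not part of the project code.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for a binary classifier.

    Assumed encoding (illustrative): 1 = not healthy, 0 = healthy.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)

    # Confusion-matrix counts
    tp = int(np.sum(y_true & y_pred))    # diseased, predicted diseased
    tn = int(np.sum(~y_true & ~y_pred))  # healthy, predicted healthy
    fp = int(np.sum(~y_true & y_pred))   # healthy, predicted diseased
    fn = int(np.sum(y_true & ~y_pred))   # diseased, predicted healthy

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

if __name__ == "__main__":
    print(binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

Reporting precision and recall alongside accuracy is useful here because healthy and diseased samples may be unevenly represented in a small dataset.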
Project Implementation Method
Hyperspectral:
Hyperspectral data is not readily available from open sources, especially in labeled form. After extensively exploring a number of options, including Earth Engine, we settled on the hyperspectral image dataset from the Earth Observing-1 (EO-1) satellite provided by USGS.
Training and testing samples that pertain only to cotton crops have been extracted using knowledge shared by domain experts from the company watersprint.io, combined with a fair amount of trial and error in manual searches. These samples correspond to cotton farms in Southern Punjab. Each pixel of these images will serve as one sample in our dataset, together with the vegetation indices corresponding to it.
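One way to picture the per-pixel dataset described above is the following sketch. It assumes the hyperspectral image is already loaded as a NumPy array of shape (rows, cols, bands) and that a vegetation-index map of shape (rows, cols) has been co-registered to it; the names `cube_to_samples` and `index_map` are hypothetical.

```python
import numpy as np

def cube_to_samples(cube, index_map):
    """Flatten a hyperspectral cube into one feature row per pixel.

    cube      : array of shape (rows, cols, bands), assumed already loaded
    index_map : array of shape (rows, cols), e.g. a vegetation index
    Returns an array of shape (n_valid_pixels, bands + 1), where the last
    column is the vegetation index. Pixels with NaN/inf values are dropped.
    """
    cube = np.asarray(cube, dtype=np.float64)
    index_map = np.asarray(index_map, dtype=np.float64)
    rows, cols, bands = cube.shape
    if index_map.shape != (rows, cols):
        raise ValueError("index_map must match the cube's spatial shape")

    spectra = cube.reshape(rows * cols, bands)        # one spectrum per pixel
    index_col = index_map.reshape(rows * cols, 1)     # matching index value
    features = np.hstack([spectra, index_col])

    valid = np.isfinite(features).all(axis=1)         # drop no-data pixels
    return features[valid]

if __name__ == "__main__":
    demo_cube = np.arange(24, dtype=float).reshape(2, 3, 4)
    demo_index = np.ones((2, 3))
    print(cube_to_samples(demo_cube, demo_index).shape)
```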
QGIS, a free and open-source application for studying and analyzing geospatial data, is being used to pre-process this data. This mainly involves noise removal followed by calculation of NDVI values. NDVI is a measure of a plant's photosynthetic activity and an important metric for plant health; NDVI values calculated from Landsat 7 multispectral images of the same cotton farms covered by our Hyperion data will be used to label our dataset. Once the dataset is labeled, we will apply preprocessing techniques such as resizing, augmentation and mean normalization to the finalized dataset to improve results.
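For reference, NDVI is computed per pixel as (NIR - Red) / (NIR + Red); on Landsat 7 ETM+ the red band is band 3 and the near-infrared band is band 4. The sketch below shows the calculation and a simple threshold-based labeling in NumPy rather than QGIS. The threshold of 0.4 and the label encoding are placeholder assumptions for illustration only; the project's actual labeling rule is not specified here.

```python
import numpy as np

def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); NaN where NIR + Red == 0."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    # Only divide where the denominator is non-zero; other pixels stay NaN
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

def label_from_ndvi(ndvi_map, threshold=0.4):
    """Label pixels from NDVI.

    Assumed encoding (illustrative): 1 = healthy, 0 = not healthy,
    -1 = no data. The default threshold is a placeholder, not a value
    taken from the project.
    """
    ndvi_map = np.asarray(ndvi_map, dtype=np.float64)
    no_data = np.isnan(ndvi_map)
    # Replace NaN with a dummy value so the comparison is well defined
    safe = np.where(no_data, -1.0, ndvi_map)
    labels = np.where(safe >= threshold, 1, 0)
    labels[no_data] = -1
    return labels

if __name__ == "__main__":
    nir_band = np.array([[0.5, 0.3], [0.0, 0.2]])
    red_band = np.array([[0.1, 0.3], [0.0, 0.4]])
    print(label_from_ndvi(ndvi(nir_band, red_band)))
```

In practice the threshold would need to be chosen with domain experts, since typical NDVI values vary with crop type and growth stage.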
The most appropriate ML model will then be developed based on the size and nature of the dataset.
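The mean normalization step mentioned in the preprocessing description above can be sketched as follows. This assumes the common definition (x - mean) / (max - min), applied per feature, with the statistics estimated on the training split only; the function names are hypothetical.

```python
import numpy as np

def fit_mean_normalizer(train):
    """Estimate per-feature mean and range from training samples.

    train : array of shape (n_samples, n_features)
    """
    train = np.asarray(train, dtype=np.float64)
    mean = train.mean(axis=0)
    span = train.max(axis=0) - train.min(axis=0)
    span[span == 0] = 1.0  # constant features: avoid division by zero
    return mean, span

def apply_mean_normalizer(x, mean, span):
    """Apply (x - mean) / (max - min) using statistics from the training set."""
    return (np.asarray(x, dtype=np.float64) - mean) / span

if __name__ == "__main__":
    train = np.array([[0.0, 10.0], [2.0, 10.0], [4.0, 10.0]])
    mean, span = fit_mean_normalizer(train)
    print(apply_mean_normalizer(train, mean, span))
```

Reusing the training statistics for the test split keeps information from the test data out of the preprocessing.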
Color-Based Imaging:
For the approach based on RGB images, rice crops are being considered owing to the ready availability of image data and the proven effectiveness of ML based approaches in accurately identifying rice diseases from such images. A dataset of 400 rice leaf images has been normalized and is being used to train a CNN that consists of 2 convolutional layers and uses max pooling to reduce the spatial dimensions of the feature maps. This model is currently being tuned and extended to act as an effective and accurate classifier for healthy and diseased images.
We are experimenting with our Python code on Google Colab, using libraries such as Keras, TensorFlow and NumPy, with Matplotlib for visualization of results.
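To illustrate how max pooling reduces dimensionality, here is a minimal NumPy sketch of 2x2 max pooling with stride 2, which matches the default behavior of the Keras `MaxPooling2D` layer. The pool size is an assumption, since the configuration used in the project's CNN is not stated, and the function name is our own.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on an array of shape (height, width, channels).

    Trailing rows/columns that do not fill a complete 2x2 window are
    dropped, as with 'valid' padding.
    """
    feature_map = np.asarray(feature_map)
    h, w, c = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2, :]
    # Split each spatial axis into (blocks, 2) and take the max of each block
    blocks = trimmed.reshape(h2, 2, w2, 2, c)
    return blocks.max(axis=(1, 3))

if __name__ == "__main__":
    x = np.arange(16).reshape(4, 4, 1)
    pooled = max_pool_2x2(x)
    print(x.shape, "->", pooled.shape)
```

Each pooling step halves the height and width of the feature map, so the number of values passed to later layers drops by a factor of four per channel.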
Benefits of the Project
As mentioned above, this project uses machine learning. The model will be trained on an image dataset of a specific plant that includes both healthy and unhealthy examples. TensorFlow, Keras and Google Colab will be used to train the deep convolutional neural network and analyze the rice data. Once trained, the model will be able to carry out an accurate health assessment for a particular plant species.
The hyperspectral approach is also being explored. It is currently in the preprocessing stage, and the main challenges that might be encountered include limited hyperspectral cotton data, the absence of a labeled dataset and limited hardware resources. Despite these challenges, the methodology adopted for classifying hyperspectral cotton images as healthy or not healthy has considerable potential. The coordinates of the cotton farms are provided to us by a company named Watersprint. Hyperspectral analysis gives much more detail than a normal image and can be used to extract important information about the health of the crop. The main objective of our research is to see how far we can go with hyperspectral data and obtain good accuracy. If the data is insufficient to train our model properly, we will shift to the traditional approach of health classification using color images of rice and a CNN. The choice of machine learning model will depend on the finalized dataset and on hardware resource constraints. The overall aim is to make the most of the limited resources available to us to complete this project successfully.
Our audience broadly includes people who want to assess the health of a selected crop in a specified area. Researchers can also benefit from our findings once we have completed experimentation and research on hyperspectral images with the proposed ML model. Our research also aims to be useful for botanists, agribusinesses and crop producers who want to understand the underlying factors that affect crop health.
Technical Details of Final Deliverable
To our knowledge, the Hyperion dataset has not been used in any prior machine learning based approach for plant health detection, which makes our proposed methodology novel. Because this is a relatively unexplored domain that may pose technical challenges, our project also considers color-based imaging as an alternative to hyperspectral imaging. RGB images are limited in their ability to detect the early onset of diseases; we will try to compensate for this by tuning existing models so that they capture more detail and offer accuracy somewhat comparable to that achieved with hyperspectral images.
For the approach based on RGB images, we found a suitable rice dataset: 400 rice leaf images have been normalized and are being used to train a CNN with 2 convolutional layers and max pooling. The CNN is set up on Google Colab in Python, using Keras, TensorFlow, NumPy and Matplotlib, and classifies rice as healthy or not healthy. It is still in the experimentation phase and its accuracy will be improved. If the hyperspectral approach proves infeasible, we will proceed with this approach. Either way, one of the two approaches described will be completed and presented at the end.
Final Deliverable of the Project: Software System
Core Industry: IT
Other Industries: Agriculture
Core Technology: Artificial Intelligence (AI)
Other Technologies: none listed
Sustainable Development Goals: Zero Hunger; Industry, Innovation and Infrastructure
Required Resources

| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| Google Colab Pro Subscription (3 months for 3 students) | Equipment | 9 | 1601 | 14409 |
| Total (in Rs) | | | | 14409 |