Hardware Implementation of Deep Convolutional Neural Networks for Computer Vision
2025-06-28 16:27:32 - Adil Khan
Project Area of Specialization: Artificial Intelligence

Project Summary

Our project focuses on the hardware implementation of deep convolutional neural networks for computer vision. We aim to deploy a state-of-the-art image segmentation model, Mask R-CNN, on an FPGA.
This prototype can be used to solve real-world problems in multiple domains. Our prototype will include an FPGA connected with a camera and a display screen. It will perform real-time processing to detect objects and generate a high-quality segmentation mask for each instance.
A hardware-based approach to implementing Mask R-CNN will ensure energy efficiency while achieving performance comparable to commercially available edge-device solutions such as Nvidia's Jetson Nano kits.
Our prototype can be used for multiple applications, including low-cost security solutions, autonomous driving, healthcare, agriculture, and retail.
Project Objectives

Our main objective is to build a low-cost, energy-efficient solution for deploying real-time embedded object detection systems. Current solutions such as GPU-based edge devices are expensive and consume far more power than a dedicated hardware-based solution.
Ideally, our prototype will match or improve upon the accuracy of traditional software-based methods while consuming less power and costing less when manufactured on silicon at scale. These benefits, however, come at the cost of reduced flexibility, an inherent limitation of any hardware-based implementation of an algorithm.
Another major objective of our project is to complete our literature review. This is required to give the reader a clear understanding of the work already done in this domain and to accurately convey our results.
To achieve these objectives, we will implement both solutions: a hardware-based solution on an FPGA and a software-based solution on an edge device. Afterward, we will compare the two approaches and share our findings on their pros, cons, and potential pitfalls during implementation.
Project Implementation Method

The main goal of our project is to optimize the Mask R-CNN algorithm for real-time object detection, reducing the computational cost and increasing power efficiency while keeping the accuracy of the network at a suitable level.
The Mask R-CNN model predicts the class label, bounding box, and segmentation mask for each object in an image. Our design flow will comprise the following steps:
- Prepare the model configuration parameters.
- Build the Mask R-CNN model architecture.
- Read the input image from the integrated camera.
- Select the CNN (convolutional neural network) model and structure.
- Design and optimize the OpenCL kernel.
- Deploy the design on the FPGA.
- Detect the objects in the image.
- Visualize the results.
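The first step of the flow above can be captured as a small configuration sketch. The field names and defaults below are illustrative assumptions, not taken from a specific Mask R-CNN implementation:

```python
from dataclasses import dataclass

# Hypothetical configuration for the Mask R-CNN deployment pipeline.
# All names and values are illustrative placeholders.
@dataclass
class MaskRCNNConfig:
    num_classes: int = 81            # e.g. 80 COCO classes + background
    input_height: int = 480          # camera frames may be downscaled on the FPGA
    input_width: int = 640
    backbone: str = "resnet50"       # CNN feature extractor chosen in the flow
    detection_threshold: float = 0.5 # minimum confidence to report a detection
    mask_threshold: float = 0.5      # binarization threshold for instance masks

config = MaskRCNNConfig()
print(config.backbone, config.num_classes)
```

Keeping these parameters in one place makes it easier to regenerate both the software baseline and the hardware build from the same settings.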
The proposed hardware architecture will consist of an ARM-based processing system (PS) and programmable logic (PL) on a single device. Data from the camera enters the FPGA and undergoes input processing (deserialization, etc.); if needed, the resolution can then be reduced to enable larger sliding windows or to lower memory bandwidth requirements.
Only the greyscale component of the camera signal is sent to external memory, where the entire image is stored. This image is then processed by the FPGA. The ARM processor will run the parts of the code that do not require hardware acceleration; the remaining parts will run on the programmable logic.
The input data reordering module will rearrange the pixels and feed them to the processing array. The CNN operates at a higher clock frequency than the memory reads, which allows multiple sliding-window sizes. We will compare the results with an edge device to show that the FPGA is more power efficient and speeds up image recognition.
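As a software sketch of the input-reordering stage, the following shows how a greyscale frame could be rearranged into the sliding windows the processing array consumes. The window size and stride are assumptions for illustration:

```python
import numpy as np

# Software model of the input-reordering module: extract all sliding windows
# from a greyscale frame. On the FPGA this would be done with line buffers.
def sliding_windows(frame: np.ndarray, win: int = 3, stride: int = 1) -> np.ndarray:
    h, w = frame.shape
    rows = (h - win) // stride + 1
    cols = (w - win) // stride + 1
    out = np.empty((rows, cols, win, win), dtype=frame.dtype)
    for r in range(rows):
        for c in range(cols):
            out[r, c] = frame[r * stride:r * stride + win,
                              c * stride:c * stride + win]
    return out

frame = np.arange(25, dtype=np.uint8).reshape(5, 5)  # toy 5x5 greyscale frame
wins = sliding_windows(frame)
print(wins.shape)  # (3, 3, 3, 3)
```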
Benefits of the Project

There are many edge devices available today for implementing computer vision solutions, the most well-known being the Jetson Nano. The cost associated with these edge devices is high, and they also require more energy than a hardware-based approach.
In computer vision, object detection is the most well-known and thoroughly researched problem. Our goal is to provide a hardware solution for this problem. This approach will ensure a lower manufacturing cost than GPU-based edge devices at the cost of flexibility.
The hardware approach will require less energy allowing for battery-powered solutions to last longer while maintaining/exceeding the accuracy achieved on edge devices. There are many applications for an embedded, real-time, and high-efficiency object detection system. Some of these include:
- Low-cost security solutions: A wide range of security applications in video surveillance are based on object detection, for example, detecting people in restricted or dangerous areas, suicide prevention, or automating inspection tasks in remote locations with computer vision.
- Autonomous Driving: Self-driving cars depend on object detection to recognize pedestrians, traffic signs, other vehicles, and more. For example, Tesla’s Autopilot AI heavily utilizes object detection to perceive environmental and surrounding threats such as oncoming vehicles or obstacles.
- Healthcare Industry: Object detection has allowed for many breakthroughs in the medical community. Because medical diagnostics rely heavily on the study of images, scans, and photographs, object detection involving CT and MRI scans has become extremely useful for diagnosing diseases, for example with ML algorithms for tumor detection.
- Agriculture Industry: Object detection is used in agriculture for tasks such as counting, animal monitoring, and evaluation of the quality of agricultural products. Damaged products can be detected while they are processed using machine learning algorithms.
- Retail industry: Human counting systems in retail stores can be used to gather information about how customers spend their time. This data can be used to improve customer experience by optimizing the store layout and making operations more efficient. A popular use case is the detection of queues to reduce waiting time in retail stores.
All these solutions can be brought to market faster and more cheaply if our hardware prototype matches or improves upon the accuracy of traditional software-based methods while consuming less power.
Technical Details of Final Deliverable

Our final deliverable will be a prototype consisting of an FPGA, a camera, and an HDMI screen. We will also compare the results of our product with a traditional edge device.
Our first step is image segmentation. Here, the image captured by the camera is segmented based on the color range of the reference object. We receive 8 bits each for the red, green, and blue channels and use only the upper 4 bits of each channel.
The object is highlighted by the mask built from the captured image. In the second step, a counter counts the number of pixels that lie within the color range of the reference object. In the final stage, this count is compared with a threshold value.
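The three stages described above can be modeled in software as follows. The reference color range and the count threshold are illustrative assumptions:

```python
import numpy as np

# Stage 1: build a mask from a color range using only the upper 4 bits of
# each 8-bit RGB channel. Stage 2: count the pixels inside the range.
# Stage 3: compare the count against a threshold.
def detect_reference_object(img: np.ndarray, lo, hi, count_threshold: int) -> bool:
    msb = img >> 4                      # keep upper 4 bits of each channel
    lo, hi = np.asarray(lo), np.asarray(hi)
    mask = np.all((msb >= lo) & (msb <= hi), axis=-1)
    return int(mask.sum()) >= count_threshold

# Toy 2x2 RGB image: one red-ish pixel, three dark pixels.
img = np.array([[[200, 16, 16], [10, 10, 10]],
                [[5, 5, 5], [0, 0, 0]]], dtype=np.uint8)
print(detect_reference_object(img, lo=(10, 0, 0), hi=(15, 3, 3), count_threshold=1))
```

On the FPGA the same comparison reduces to a 4-bit range check per channel and a running counter, which maps directly to simple combinational logic.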
The image captured by the camera arrives in Bayer format and is passed to the FPGA, which converts it into the proper format to calculate the value of each pixel. The camera and HDMI controllers interface the VmodCAM module and the LCD with the FPGA.
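A minimal software model of the Bayer conversion step is sketched below, assuming an RGGB pattern and simple per-cell demosaicing; real FPGA pipelines typically use bilinear or edge-aware interpolation instead:

```python
import numpy as np

# Convert a raw Bayer frame (RGGB pattern assumed) to RGB by taking the
# R and B samples of each 2x2 cell and averaging its two G samples.
def demosaic_rggb(raw: np.ndarray) -> np.ndarray:
    h, w = raw.shape
    rgb = np.empty((h // 2, w // 2, 3), dtype=raw.dtype)
    rgb[..., 0] = raw[0::2, 0::2]                              # R sample
    rgb[..., 1] = raw[0::2, 1::2] // 2 + raw[1::2, 0::2] // 2  # mean of both G samples
    rgb[..., 2] = raw[1::2, 1::2]                              # B sample
    return rgb

raw = np.array([[100, 50], [60, 200]], dtype=np.uint8)  # one RGGB cell
print(demosaic_rggb(raw))  # [[[100  55 200]]]
```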
A sliding switch on the FPGA board is reserved for toggling between the raw video capture and the segmented video. Most of the processing is done in parallel on the FPGA, so the performance of this model is better than that of other platforms such as DSPs and PCs. The output will support resolutions up to 1600 × 1200 over a 24-bit parallel bus in processed RGB.
We will implement the same algorithm on an edge device and compare the efficiency of both models (FPGA and edge device) to show that the FPGA is more efficient in terms of power, speed, and cost.
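One simple way to frame this comparison is throughput per unit power (frames per second per watt). The figures below are placeholders to show the metric, not measurements:

```python
# Efficiency metric for the FPGA vs. edge-device comparison.
def fps_per_watt(fps: float, power_w: float) -> float:
    return fps / power_w

fpga = fps_per_watt(fps=20.0, power_w=5.0)     # hypothetical FPGA figures
jetson = fps_per_watt(fps=30.0, power_w=10.0)  # hypothetical edge-device figures
print(fpga, jetson, fpga > jetson)
```

Under these placeholder numbers the FPGA would deliver fewer raw frames per second but more frames per watt, which is the trade-off the project aims to demonstrate.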
Final Deliverable of the Project: HW/SW integrated system
Core Industry: IT
Other Industries: Education, Medical, Energy, Security, Telecommunication
Core Technology: Artificial Intelligence (AI)
Other Technologies: Others
Sustainable Development Goals: Industry, Innovation and Infrastructure; Sustainable Cities and Communities; Peace, Justice and Strong Institutions

Required Resources

| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| Artix-7™ Field Programmable Gate Array (FPGA) from Xilinx® | Equipment | 1 | 45000 | 45000 |
| Display: BenQ GW2270H | Equipment | 1 | 15000 | 15000 |
| FPGA Interfaceable Camera | Equipment | 1 | 4000 | 4000 |
| Cables and Extensions | Equipment | 1 | 3000 | 3000 |
| Logistics Expenses | Miscellaneous | 1 | 3000 | 3000 |
| Printing and Stationery Expenses | Miscellaneous | 1 | 4000 | 4000 |
| Overhead Expenses | Miscellaneous | 1 | 2000 | 2000 |
| Total (in Rs) | | | | 76000 |