Ensemble of Deep Learning and Local Features for Robbery Detection

2025-06-28 16:32:26 - Adil Khan

Project Title: Ensemble of Deep Learning and Local Features for Robbery Detection

Project Area of Specialization: Artificial Intelligence

Project Summary

We hear about robberies almost every other day, but rarely that law enforcement was notified in time or managed to stop the act. Usually it is only after the robbers have left that the people on the premises are free to move and alert local law enforcement. A CCTV camera, however, watches and usually records the entire act. Often a person monitors the CCTV feed and could detect the robbery, but this approach is expensive and prone to significant human error: the guard can fall asleep, go on a break, be bribed, or even be hurt by the robbers.

Our project aims to automate this process by detecting robberies using only the CCTV video feed as input. Frames are extracted from the input video and processed by our detection module, which is built from a combination of three computer vision and image processing techniques and outputs a binary classification of whether the frame represents a robbery or not.

Project Objectives

Camera-based surveillance systems have become a necessary part of our daily lives. Whether through closed-circuit cameras at traffic signals or security cameras at malls, our lives are constantly monitored and sometimes recorded for our security. The main drawback of a surveillance system is that a person has to monitor the footage constantly. This not only makes surveillance more expensive but also leaves it prone to human error and bias. We propose to automate this process using the power of neural networks alongside computer vision techniques to detect whether a robbery is taking place in the footage.

Project Implementation Method

First, the input video is broken into frames. These video frames are processed and combined into bundles (bags), which are then fed to three different detection pipelines:

1) 3DCNN: The bags of frames are fed into the 3DCNN, which is trained to distinguish anomalous events from normal events.

2) Optical Flow: We use optical flow to detect motion patterns in the bag of frames. It is a point-based motion detection technique that captures both motion and its direction, shown as a vector pointing in the direction of motion. The detected motion patterns are compared with learnt patterns that relate to a robbery, and the bag is classified as robbery-related or not.

3) YOLO Object detection: Firearms are an important feature of a robbery; in our dataset, about 85-90% of the robbery footage contains a pistol. We therefore use the YOLO object detection algorithm to detect firearms in the footage.

The final step is to combine the outputs of all three methods into a single binary value: we multiply each output by a weight and average the results to obtain one final binary output.
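The weighted combination of the three pipeline outputs can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function name `fuse_scores`, the example weights, the assumption that each score lies in [0, 1], and the 0.5 decision threshold are all hypothetical, since the text only states that the outputs are weighted and averaged.

```python
def fuse_scores(scores, weights, threshold=0.5):
    """Combine per-pipeline scores into one binary decision.

    scores    -- outputs of the three pipelines, each assumed to lie in [0, 1]
    weights   -- one non-negative weight per pipeline (hypothetical values)
    threshold -- assumed cut-off applied to the weighted average
    Returns (fused_score, is_robbery), where is_robbery is 0 or 1.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted average: dividing by the weight sum keeps the result in [0, 1].
    fused = sum(s * w for s, w in zip(scores, weights)) / total_weight
    return fused, int(fused >= threshold)


# Order: 3D CNN, optical flow, YOLO firearm detector (illustrative numbers only).
fused, is_robbery = fuse_scores([0.9, 0.6, 1.0], [0.5, 0.2, 0.3])
print(round(fused, 2), is_robbery)
```

Normalizing by the sum of the weights means the weights do not have to add up to one, and the threshold is what turns the averaged score into the binary output.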

Benefits of the Project

In conventional surveillance systems, human errors are frequent and can lead to huge losses in certain cases. Since there are usually a number of cameras to be monitored, in most cases the operator has to cycle through all of them, which can lead to a robbery being missed. Involvement of the person performing the surveillance in the robbery itself is also common; low salary and few or no facilities are among the most common reasons. Our solution addresses these problems by automating the surveillance process and detecting whether a robbery is taking place in the footage.

Technical Details of Final Deliverable

1. 3D Convolutional Neural Network

A 3D CNN differs from a regular CNN in that its convolutions span a third dimension, hence the name 3D. The third dimension is time, which makes the technique widely used for video feature extraction, since it can compare and contrast consecutive frames. The input is a bag of frames, 16 in our case. The model learns the features and patterns common to the frames of one bag and then moves on to the next bag.
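Grouping frames into bags of 16 can be sketched as below. The helper name `make_bags` is hypothetical, and two details are assumptions not stated in the text: bags do not overlap, and a trailing group with fewer than 16 frames is dropped.

```python
def make_bags(frames, bag_size=16):
    """Split a sequence of frames into consecutive, non-overlapping bags.

    Assumption: an incomplete trailing bag is discarded so that every bag
    passed to the 3D CNN has exactly `bag_size` frames.
    """
    if bag_size <= 0:
        raise ValueError("bag_size must be positive")
    last_start = len(frames) - bag_size
    return [frames[i:i + bag_size] for i in range(0, last_start + 1, bag_size)]


# Integers stand in for decoded video frames here.
bags = make_bags(list(range(40)))
print(len(bags), [len(b) for b in bags])
```

With 40 stand-in frames this produces two bags of 16 and discards the last 8 frames; an overlapping sliding window would be an equally plausible alternative.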

2. Optical Flow

Optical flow is a technique for observing which objects in an image are moving and in what direction. It detects changes in brightness between frames and draws a vector to represent the direction of each change. Compared with neural networks, optical flow has some advantages. For example, when a deep model extracts motion and other patterns from the frames, we do not really know which features it has learnt, nor do we see them in action; all we know is that the model has learnt some features from the input and built a template for the input class. Optical flow, on the other hand, tells us which object is moving and in which direction, providing better insight into what we are doing.
Another reason for using optical flow alongside the 3DCNN is to improve overall accuracy. Since we do not know exactly which features the deep model has learnt from the input (that is the nature of deep neural networks), it may generate false positives that we have no way of correcting directly. So we use optical flow to reduce the false positives.
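One way to turn flow vectors into a comparable motion pattern is a magnitude-weighted histogram of directions. The sketch below is an assumption about how such a summary could look, not the project's method: it presumes the per-point flow vectors `(dx, dy)` have already been computed by some optical flow routine (for example a Lucas-Kanade tracker), and the name `flow_direction_histogram`, the bin count, and the `min_magnitude` noise cut-off are hypothetical.

```python
import math


def flow_direction_histogram(vectors, bins=8, min_magnitude=1.0):
    """Summarize (dx, dy) flow vectors as a normalized direction histogram.

    Each vector votes for the bin containing its angle, weighted by its
    magnitude. Vectors shorter than `min_magnitude` are treated as noise.
    Note: in image coordinates y usually grows downward, so angles are
    measured in that convention.
    """
    hist = [0.0] * bins
    for dx, dy in vectors:
        magnitude = math.hypot(dx, dy)
        if magnitude < min_magnitude:
            continue
        angle = math.atan2(dy, dx) % (2 * math.pi)  # map to [0, 2*pi)
        index = int(angle / (2 * math.pi) * bins) % bins
        hist[index] += magnitude
    total = sum(hist)
    # Normalize so histograms from bags with different motion energy compare.
    return [h / total for h in hist] if total > 0 else hist


# Two clear motions plus one tiny vector that is filtered out as noise.
print(flow_direction_histogram([(3.0, 0.5), (-0.5, 4.0), (0.1, 0.1)], bins=4))
```

Unlike features inside a deep model, each bin here has a direct reading (how much motion went in that direction), which matches the interpretability argument made above.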

3. YOLO Object Detection

You Only Look Once (YOLO) is one of the most widely used object detection models. Other detection systems apply some sort of classifier or localizer to the image at different segments, sizes and scales. YOLO tackles detection completely differently: it applies a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities.
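To feed the detector's result into the ensemble, the per-box probabilities have to be reduced to a single firearm score per frame. The sketch below shows one possible reduction under stated assumptions: detections are represented as plain dictionaries with `label`, `confidence` and `box` keys, which is a hypothetical format and not the output of any particular YOLO library, and the class names and the 0.25 confidence cut-off are illustrative.

```python
FIREARM_LABELS = frozenset({"pistol", "gun", "rifle"})  # hypothetical class names


def firearm_score(detections, labels=FIREARM_LABELS, min_confidence=0.25):
    """Return the highest confidence among firearm detections, or 0.0 if none.

    detections -- iterable of dicts with "label", "confidence" and "box" keys
                  (an assumed format used only for this illustration)
    """
    confidences = [
        d["confidence"]
        for d in detections
        if d["label"] in labels and d["confidence"] >= min_confidence
    ]
    return max(confidences, default=0.0)


# Illustrative detections for one frame; boxes are (x, y, width, height).
frame_detections = [
    {"label": "person", "confidence": 0.95, "box": (40, 30, 120, 260)},
    {"label": "pistol", "confidence": 0.82, "box": (110, 140, 35, 20)},
    {"label": "pistol", "confidence": 0.10, "box": (300, 200, 30, 18)},
]
print(firearm_score(frame_detections))
```

Taking the maximum reflects the idea that a single confident firearm detection is enough to raise the frame's score; other reductions, such as averaging over a bag of frames, would also be reasonable.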

Final Deliverable of the Project: Software System

Type of Industry: Security

Technologies: Artificial Intelligence (AI)

Sustainable Development Goals: Good Health and Well-Being for People, Peace and Justice Strong Institutions

Required Resources

Item Name	Type	No. of Units	Per Unit Cost (in Rs)	Total (in Rs)
CCTV Cameras	Equipment	3	15000	45000
Miscellaneous	Miscellaneous	1	10000	10000
			Total (in Rs)	55000
