Suspicious Speech Tracking Using Machine Learning

2025-06-28 16:36:13 - Adil Khan

Project Title

Suspicious Speech Tracking Using Machine Learning

Project Area of Specialization

Artificial Intelligence

Project Summary

Manually monitoring every voice call is a tedious job for law enforcement authorities, and the difficulty increases manyfold when terrorists adopt new code words for their operations. Keeping abreast of new keywords while monitoring all calls manually is an enormous problem for law enforcement agencies. Researchers have made many efforts to automate this process through machine learning techniques, applied mostly to English and other widely used international languages. Pashto remains one of the languages on which little research has been done. Systems for Urdu and English already exist in the literature, but Pashto is still a problem for the agencies to cope with. Therefore, Suspicious Speech Tracking Using Machine Learning (SSTRUM) aims to develop a machine-learning-based prototype capable of detecting suspicious words in the Pashto language. This will help automate call tracing and thus assist agencies and corporate companies in countering suspicious activity.

The main aim is to build SSTRUM as a standalone device using a Raspberry Pi. The device will take live audio as input and process it to track suspicious words that are already stored in the dataset. We will create our own dataset, a contribution to Pashto language resources that will be beneficial in the long run, as very little work has been done on this language. The whole process will be shown on SSTRUM's purpose-built graphical user interface (GUI). Moreover, each user and admin will have his/her own login ID, which will help maintain privacy. The system will be built so that any company or agency can either add to the dataset or supply its own dataset of selected words that it wants to monitor.

Project Objectives

Suspicious Speech Tracking Using Machine Learning (SSTRUM) has the following objectives:

The main aim is to build a standalone device that can be used anywhere in security agencies or corporate settings to track suspicious words, which will help automate the process.
The device will be built on a Raspberry Pi 4 and will be capable of taking in live audio.
A Pashto language dataset will be built that is sufficient to train and deploy SSTRUM, and the system will be able to be updated with a different dataset.
The system will be trained so that the accuracy of the whole suspicious-word tracking procedure stays above 70 percent.
We aim to make the system adaptable so that it can be used on low-end PCs and devices.
The graphical user interface (GUI) will be kept as user-friendly as possible to enable easy handling.
Every user’s credentials will be kept safe and private from other users.
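
The 70 percent accuracy objective above can be checked once a labeled test set exists. The following is a minimal sketch, assuming accuracy means the fraction of test clips whose predicted word matches the reference label; the proposal does not define the metric, and the function names and labels here are hypothetical.

```python
def accuracy(predicted, reference):
    """Fraction of clips whose predicted label equals the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


def meets_target(predicted, reference, target=0.70):
    """True when accuracy is strictly above the target (70 percent by default)."""
    return accuracy(predicted, reference) > target


# Placeholder labels for illustration only, not project data.
reference = ["word_a", "word_b", "word_c", "word_a"]
predicted = ["word_a", "word_b", "word_a", "word_a"]
print(accuracy(predicted, reference))      # 0.75
print(meets_target(predicted, reference))  # True
```

A per-word breakdown may be more informative than a single number, since a few frequently confused words can hide behind a good overall score.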

Project Implementation Method

SSTRUM will be a Python-based system with a user-friendly graphical user interface, running on a Raspberry Pi 4 based standalone device. The user interface will allow the user to log in either as an administrator or as a general user, which makes it adaptable to every organization that uses it. The administrator will have additional options, including the power to remove any user found to be a threat. The administrator will also have access to the dataset and can update it or train the system on a specific dataset.
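
The two-role login described above could be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the proposal does not say how credentials are stored, so the sketch assumes salted PBKDF2 hashes from Python's standard `hashlib` and a hypothetical in-memory `UserStore`.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for the Raspberry Pi


def hash_password(password, salt=None):
    """Return (salt, digest) using salted PBKDF2-HMAC-SHA256."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


class UserStore:
    """Hypothetical in-memory store; a real device would persist this."""

    def __init__(self):
        self._users = {}  # login ID -> (salt, digest, role)

    def add_user(self, login_id, password, role="user"):
        if role not in ("admin", "user"):
            raise ValueError("role must be 'admin' or 'user'")
        salt, digest = hash_password(password)
        self._users[login_id] = (salt, digest, role)

    def authenticate(self, login_id, password):
        """Return the role on success, None on failure."""
        record = self._users.get(login_id)
        if record is None:
            return None
        salt, digest, role = record
        _, candidate = hash_password(password, salt)
        # Constant-time comparison avoids leaking how many bytes matched.
        return role if hmac.compare_digest(candidate, digest) else None

    def remove_user(self, acting_id, target_id):
        """Only an administrator may remove a user."""
        record = self._users.get(acting_id)
        if record is None or record[2] != "admin":
            raise PermissionError("only an administrator can remove users")
        self._users.pop(target_id, None)
```

Storing only the salt and digest means that one user's password cannot be read back by another user or by the administrator.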

Both the user and the admin will have access to the core functionalities of SSTRUM. After logging in with their credentials, they will be shown a screen with multiple options for supplying the audio they want to audit. This can be a recorded WAV file or live audio from a microphone. The audio file size will be limited for fast processing.
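
As an illustration of the input limit mentioned above, the sketch below checks a WAV file with Python's standard `wave` module before accepting it. The limit is expressed as a duration, and both that choice and the value of `MAX_SECONDS` are assumptions; the proposal gives no number.

```python
import os
import tempfile
import wave

MAX_SECONDS = 60.0  # assumed limit; the proposal does not state one


def wav_duration_seconds(path):
    """Duration of a WAV file, computed from frame count and sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def accept_upload(path, max_seconds=MAX_SECONDS):
    """Return (accepted, duration); reject unreadable or over-long files."""
    try:
        duration = wav_duration_seconds(path)
    except (wave.Error, EOFError, OSError):
        return False, 0.0
    return duration <= max_seconds, duration


# Demo: write one second of 16 kHz mono silence, then validate it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)        # 16-bit samples
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)

print(accept_upload(demo_path))  # (True, 1.0)
```

Live microphone input would need the same limit applied while recording, for example by stopping capture once the maximum duration is reached.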

Once the user has added the file, the system will wait for the user to start the process and then proceed with the tracking. It will track the spoken words, performing all the steps from feature extraction to matching against the dataset, and then determine whether the audio contains any words that can be marked as suspicious. An alert message will then pop up showing the exact word that was marked suspicious along with the exact time interval at which it was spoken.
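
The final matching and alert step described above might look like the sketch below. It assumes an upstream recognizer (not shown, and not specified in the proposal) that yields each recognized word with its start and end time in seconds. The `Alert` class, the function name, and the sample words are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    word: str
    start_s: float
    end_s: float

    def message(self):
        return f"Suspicious word '{self.word}' at {self.start_s:.2f}s to {self.end_s:.2f}s"


def find_alerts(recognized, suspicious_words):
    """Return one Alert per recognized word that is on the watch list.

    recognized: iterable of (word, start_seconds, end_seconds) tuples.
    """
    watch = {w.strip().lower() for w in suspicious_words}
    return [
        Alert(word, start, end)
        for word, start, end in recognized
        if word.strip().lower() in watch
    ]


# Placeholder recognizer output for illustration only.
recognized = [("hello", 0.00, 0.40), ("codeword", 0.55, 1.10), ("bye", 1.30, 1.60)]
for alert in find_alerts(recognized, ["codeword"]):
    print(alert.message())  # Suspicious word 'codeword' at 0.55s to 1.10s
```

Keeping the watch list separate from the recognizer is one way to let an agency swap in its own list of words without retraining.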

Benefits of the Project

SSTRUM will have various advantages over traditional tracking methods and is intended to be an effective autonomous solution. Some of its features are listed below:

It will be a standalone device that can be installed anywhere easily.
It will be an adaptable system that can be trained on different datasets for different agencies or companies.
The final product will be able to detect suspicious words in real-time audio.
The biggest benefit will be a significant decrease in response time: the system will alert stakeholders to review the flagged audio rather than listen to hours-long clips.
It will have a secure database for every individual user, making it safe to store and track sensitive calls.
Multiple users will be able to log in and use it on a single device.
It is, as of now, the first Pashto-based autonomous suspicious speech recognition system.
The graphical user interface will be kept simple and user-friendly so that it can be used in a wide range of organizations.
It will have multiple application areas, including but not limited to law enforcement agencies, corporate sectors, banks, telecom companies, and government organizations.

Technical Details of Final Deliverable

The final deliverable will be a standalone device built on a Raspberry Pi 4 (4 GB variant) for quicker processing. The device will have a touchscreen LCD and a 5500 mAh battery, making it portable. The SSTRUM system will run on this device, with the option either to import saved WAV files or to take live audio input.

First, the user will have to authenticate with his/her credentials as an administrator or a general user on the login page of our GUI. After authentication, the user will have access to his/her own compartment, where recordings can be added for automated tracking. As of now, the user will be able to detect a limited set of suspicious words, but the dataset and the list of suspicious words can be extended, as the system is built to be flexible and easily altered according to the user's need.

Currently, our dataset consists of 3300 audio samples of 55 different words spoken by 60 different speakers. It can be expanded, as the project is designed to allow such upgrades. The dataset also includes phonetic transcriptions, which make it easier to train the system with a phonetic dictionary. Once a suspicious word is found in the audio, an alert will be shown on the screen and the user will be able to check the instant at which it was spoken, allowing its context to be judged.
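
The dataset figures above are consistent with each of the 60 speakers recording each of the 55 words once (55 × 60 = 3300), although the proposal does not state this explicitly. Under that assumption, a speaker-independent train/test split can be sketched as below, so that no test speaker is heard during training. The index layout, speaker IDs, and file paths are hypothetical.

```python
import random


def split_by_speaker(samples, test_fraction=0.2, seed=0):
    """Split samples so that no speaker appears in both train and test.

    samples: list of (speaker_id, word, path) tuples.
    """
    speakers = sorted({s[0] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])
    train = [s for s in samples if s[0] not in test_speakers]
    test = [s for s in samples if s[0] in test_speakers]
    return train, test


# Synthetic index with the same shape as the dataset described above.
samples = [
    (f"spk{s:02d}", f"word{w:02d}", f"spk{s:02d}/word{w:02d}.wav")
    for s in range(60)
    for w in range(55)
]
train, test = split_by_speaker(samples)
print(len(samples), len(train), len(test))  # 3300 2640 660
```

Splitting by speaker rather than by clip gives a more realistic estimate of how the system would perform on callers it has never heard.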

Final Deliverable of the Project

HW/SW integrated system

Core Industry

IT

Other Industries

Security

Core Technology

Others

Other Technologies

Artificial Intelligence (AI)

Sustainable Development Goals

Industry, Innovation and Infrastructure; Peace and Justice Strong Institutions

Required Resources

Item Name	Type	No. of Units	Per Unit Cost (in Rs)	Total (in Rs)
Raspberry Pi 4 (4 GB)	Equipment	1	15000	15000
Raspberry Pi 3D-printed casing	Equipment	1	5000	5000
7-inch serial touch display	Equipment	1	9000	9000
Battery, 5500 mAh	Equipment	1	6000	6000
Soldering iron	Equipment	1	1200	1200
Glue gun	Equipment	1	1200	1200
Glue gun sticks	Equipment	5	30	150
Soldering wire	Equipment	1	300	300
HDMI cable	Equipment	1	200	200
SD card	Equipment	1	1500	1500
VGA-HDMI converter	Equipment	1	400	400
Connecting wires	Equipment	5	75	375
Transport	Miscellaneous	1	3000	3000
Printing	Miscellaneous	1	2000	2000
			Total (in Rs)	45325
