Offensive Language Detection Using Machine Learning

2025-06-28 16:28:41 - Adil Khan

Project Title

Offensive Language Detection Using Machine Learning

Project Area of Specialization: Artificial Intelligence

Project Summary

Offensive Language Detection Using Machine Learning (OLDUM) aims to develop a prototype system that uses machine learning to detect offensive words in the Pashto language. It is intended to help social media applications and websites automate the screening of audio/voice notes and thereby stop offensive activity.

Project Objectives

This project will act as a prototype for law enforcement agencies and social media platforms to detect offensive speech in phone calls and audio files. Through this system, we want to help law enforcement and social media companies trace such calls and stop the use of offensive language and cyberbullying.

The scope of our FYP is limited to selected words only. We will cover a limited set of Pashto words, which will be enough to train and deploy OLDUM. Initially, we will train OLDUM on isolated words and connected words. After the FYP prototype is complete, the system can be upgraded to handle spontaneous speech.

Project Implementation Method

The system records suspicious words and takes call audio as input. It monitors calls for the suspicious words stored in the dataset and marks a recording as suspicious so that the user can review it.

A dataset of suspicious and non-suspicious words will be created.

When the system is given new audio, it should match the words in the audio against the existing dataset. The dataset can be updated later. The system will process one call at a time. The dataset will be limited to selected words, and the system will be trained on those words, although there will be an option to expand the dataset as needed. The system will need high processing power, so the hardware specifications must be chosen accordingly.
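The matching and flagging step described above can be illustrated with a minimal sketch. It assumes the speech model has already turned one call into a list of recognized word labels; that recognition stage is not shown. The names WordDataset, ReviewResult and review_recording, and the placeholder labels such as "word_a", are hypothetical and exist only for illustration. They are not part of the OLDUM design.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WordDataset:
    """Hypothetical container for the suspicious-word list (can be expanded later)."""
    suspicious: set[str] = field(default_factory=set)

    def add(self, word: str) -> None:
        # Normalize so that lookups are case- and whitespace-insensitive.
        self.suspicious.add(word.strip().lower())


@dataclass
class ReviewResult:
    suspicious: bool          # True if the recording should be reviewed by the user
    matched_words: list[str]  # recognized words that were found in the dataset


def review_recording(recognized_words: list[str], dataset: WordDataset) -> ReviewResult:
    """Flag one recording (one call at a time) if any recognized word is in the dataset.

    `recognized_words` is assumed to come from a separate speech-recognition step.
    """
    normalized = (w.strip().lower() for w in recognized_words)
    matched = [w for w in normalized if w in dataset.suspicious]
    return ReviewResult(suspicious=bool(matched), matched_words=matched)


if __name__ == "__main__":
    dataset = WordDataset()
    dataset.add("word_a")  # placeholder label, not a real Pashto word
    dataset.add("word_b")

    result = review_recording(["hello", "WORD_A", "thanks"], dataset)
    print(result.suspicious, result.matched_words)
```

Keeping the word list in a set makes each lookup cheap, and new words can be added without changing the flagging logic, which fits the stated option to expand the dataset later.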

Benefits of the Project

Keeping abreast of new keywords and monitoring all calls manually is a huge problem for law enforcement agencies and social applications. Researchers have made many efforts to digitize this process systematically through machine learning techniques. Pashto is still difficult for the agencies to cope with, because the systems that already exist in the literature are based on Urdu and English.

The project benefits a less-resourced language like Pashto by making the social media environment more user-friendly.

Technical Details of Final Deliverable

The final deliverable will consist of software (installed on a Raspberry Pi) and hardware.

The system will listen for specific Pashto words (audio) through a microphone and mark each as offensive or not.

The software will have a login prompt where the user first registers and then uploads audio to check for offensive words.

Final Deliverable of the Project: HW/SW integrated system
Core Industry: Security
Other Industries: Telecommunication
Core Technology: Artificial Intelligence (AI)
Other Technologies: Others
Sustainable Development Goals: Peace, Justice and Strong Institutions

Required Resources

Item Name                 Type            No. of Units   Per Unit Cost (in Rs)   Total (in Rs)
Raspberry Pi 4 Model B    Equipment       1              25000                   25000
Mic                       Equipment       2              4000                    8000
Extra                     Miscellaneous   1              7000                    7000
Total (in Rs)                                                                    40000
