Offensive Language Detection Using Machine Learning
2025-06-28 16:28:41 - Adil Khan
Project Area of Specialization: Artificial Intelligence
Project Summary
Offensive Language Detection Using Machine Learning (OLDUM) aims to develop a prototype system that uses machine learning to detect offensive words in the Pashto language. It is intended to help social media applications and websites automate the review of audio/voice notes and so stop offensive activity.
Project Objectives
This project will serve as a prototype that law enforcement agencies and social media platforms can use to detect offensive speech in phone calls and audio files. Through this system, we want to help law enforcement and social media companies trace such calls and stop the use of offensive language and cyberbullying.
The scope of our FYP is limited to selected words only. We will cover a limited set of Pashto words, sufficient to train and deploy OLDUM. Initially, we will train OLDUM on isolated words and connected words. After the FYP prototype is complete, the system can be upgraded to handle spontaneous speech.
Project Implementation Method
The system records suspicious words and takes call audio as input. Calls are monitored for the suspicious words stored in the dataset, and a recording that contains any of them is marked as suspicious so that the user can review it.
A dataset of suspicious and non-suspicious words will be created.
When the system is given new audio, it should match the words in the audio against the existing dataset. The dataset can be updated later. The system will process one call at a time. The dataset will be limited to selected words, and the system will be trained on those words, although it will have an option to expand the dataset as needed. The system will need high processing power, so the hardware specifications must be chosen accordingly.
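The matching and flagging step described above can be sketched as follows. This is a minimal Python illustration, not the project's implementation: it assumes a speech recognizer (not shown) has already turned the call audio into a list of words, and the names `SuspiciousWordDataset` and `review_recording`, as well as the placeholder words, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SuspiciousWordDataset:
    """Hypothetical in-memory store of suspicious words (an assumption)."""
    words: Set[str] = field(default_factory=set)

    def add(self, word: str) -> None:
        # Normalise so matching ignores case and surrounding whitespace.
        self.words.add(word.strip().lower())


def review_recording(recognized_words: List[str],
                     dataset: SuspiciousWordDataset) -> Dict[str, object]:
    """Mark one recording as suspicious if any recognized word is in the dataset."""
    hits = [w for w in recognized_words if w.strip().lower() in dataset.words]
    return {"suspicious": bool(hits), "matched_words": hits}


# Placeholder entries stand in for the selected Pashto words.
dataset = SuspiciousWordDataset()
dataset.add("word_a")
dataset.add("word_b")  # the dataset can be extended later in the same way

# One call at a time: the list would come from the recognizer.
result = review_recording(["hello", "word_a", "goodbye"], dataset)
```

Here `result["suspicious"]` tells the reviewer whether the recording needs attention, and `result["matched_words"]` lists the words that triggered the flag.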
Benefits of the Project
Keeping abreast of new keywords and monitoring all calls manually is an enormous problem for law enforcement agencies and social applications. Researchers have made many efforts to automate this process with machine learning techniques. Pashto remains difficult for the agencies to cope with, whereas Urdu- and English-based systems already exist in the literature.
The project benefits a less-resourced language such as Pashto by helping make the social media environment more user friendly for its speakers.
Technical Details of Final Deliverable
The final deliverable will consist of software (installed on a Raspberry Pi) and hardware.
The system will listen for specific Pashto words (audio) through a microphone and mark each one as offensive or not.
The software will have a login prompt where a user first registers and then uploads audio to check it for offensive words.
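Marking a spoken word as offensive or not is a two-class classification problem. The proposal does not name a model, so the sketch below swaps in a simple nearest-centroid classifier purely for illustration. It assumes each recorded word has already been converted to a fixed-length feature vector (feature extraction is not shown), and the function names and toy numbers are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Average the feature vectors of one class, dimension by dimension."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def train(examples: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Build one centroid per label from labelled training vectors."""
    return {label: centroid(vectors) for label, vectors in examples.items()}


def classify(model: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Toy two-dimensional features, invented for illustration only.
training_data = {
    "offensive": [[1.0, 1.0], [1.2, 0.8]],
    "not_offensive": [[-1.0, -1.0], [-0.8, -1.2]],
}
model = train(training_data)
label = classify(model, [0.9, 1.1])
```

A real prototype would replace the toy vectors with features computed from the microphone or uploaded audio, and would likely need a stronger model than this baseline.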
Final Deliverable of the Project: HW/SW integrated system
Core Industry: Security
Other Industries: Telecommunication
Core Technology: Artificial Intelligence (AI)
Other Technologies: Others
Sustainable Development Goals: Peace, Justice and Strong Institutions
Required Resources

| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| Raspberry Pi 4 Model B | Equipment | 1 | 25000 | 25000 |
| Microphone | Equipment | 2 | 4000 | 8000 |
| Extra | Miscellaneous | 1 | 7000 | 7000 |
| Total (in Rs) | | | | 40000 |