Roman Urdu Hate Speech Detection


2025-06-28 16:34:51 - Adil Khan

Project Title

Roman Urdu Hate Speech Detection

Project Area of Specialization

Artificial Intelligence

Project Summary

Hate speech is commonly defined as any communication that disparages a target group of people based on some characteristic such as race, colour, identity, gender, sexual orientation, ethnicity, religion, or profession.

Hate speech falls into two classes: hate speech that should be regulated or prohibited by law, and hate speech that is harmful but falls outside the bounds requiring state action and regulation.

Hate speech detection is the automated task of determining whether a piece of text contains hate speech. However, no such work has yet been done for the language widely used in our country: Roman Urdu.

The proposed project will provide a platform containing statistics on hate speech collected and identified, in textual form, from social media platforms (Facebook, Twitter, Instagram, YouTube comments) and several websites, such as Siasat.pk, Jang Roznama, Siasat.pk (Urdu), iJunoon, Y This News (Indian news forum), Roman Urdu News (Pakistan news and business news forum), Ahlul Hadees (a Roman Urdu Islamic blog), and Ahnaf Media Services.

Anyone will be able to see how much hateful content is being circulated in Roman Urdu, and of which type. This will give an idea of the community's mentality and critical thinking. The top targets of hate speech will be identified, and this information will then be used to inform intolerance prevention campaigns at both local and national levels.

Project Objectives

By April 2021, we will detect Roman Urdu hate speech content on social media and websites through our proposed project. We have 7 weeks, at 100 hours per week, to complete our research and develop a platform for Roman Urdu hate speech detection.

The main objective is to raise public awareness of the hate speech circulating on social media and similar platforms. Everyone will have a way to learn about hateful content. The system can also be applied to any website or blog to filter hateful content out of its posts and comments.

Industry Objectives

As discussed above, the purpose of the proposed project is to detect and remove hateful content from any platform. From an industrial point of view, we could sell our project to any organization that wants to monitor negative opinion about itself. For instance, hatebase.org works on hate speech detection in 95+ languages but does not cover Roman Urdu hate speech. This project could therefore become part of any existing project that needs Roman Urdu support.

Industries working on this type of project may help us understand and collect the data. For example, we can contact organizations that have already collected Roman Urdu data, since data that is already prepared for training is more helpful for data scientists. Secondly, our research will be helpful for beginners, because simple prepared data makes it easier to train a model.

Research Objectives

To research and identify the percentage of hate speech on different social websites from text written in Roman Urdu, we will collect data from all the proposed sources and work on it by applying different machine learning techniques.

The research will determine what type of content is most frequently targeted by hate and what the main sources of hateful content are. It will help in understanding the basis of hateful situations.
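The percentage of hate per source mentioned above can be sketched as a simple aggregation. This is only a minimal illustration: the function name `hate_percentage_by_source`, the record layout, and the sample records are assumptions and not part of the proposal.

```python
from collections import Counter

def hate_percentage_by_source(records):
    """Return {source: percent of posts labeled as hate} for (source, is_hate) pairs."""
    totals = Counter()
    hateful = Counter()
    for source, is_hate in records:
        totals[source] += 1
        if is_hate:
            hateful[source] += 1
    return {src: 100.0 * hateful[src] / totals[src] for src in totals}

# Hypothetical labeled records: (source name, label from an annotator or model).
records = [
    ("Twitter", True), ("Twitter", False), ("Twitter", False), ("Twitter", True),
    ("YouTube", True), ("YouTube", False), ("YouTube", False), ("YouTube", False),
]
percentages = hate_percentage_by_source(records)
print(percentages)
```

The same counting approach could be extended with a target-category field to rank the most frequent targets of hate.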

Academic Objectives

Having studied Artificial Intelligence and Machine Learning, we will implement these techniques in our project. For machine learning programs, data is the most important part.

After the model is trained and working well, we will deploy this project as a web application.

Project Implementation Method

We have divided our project into small, achievable modules (activities), listed below, according to our project members' current knowledge:

Testing of each module

Benefits of the Project

Technical Details of Final Deliverable

Each activity consists of sub-activities or tasks that must be performed; these are the milestones of the project. Like any other project, ours has the following key milestones and deliverables:

Feasibility Survey 

We conducted an online survey on the feasibility of our project and found it feasible, with some risks.

Current State of the Art

Analyzing and comparing existing projects and apps currently operating in the market. A few of these are Hatebase, Hatebusters, and Haternet.

Functional Requirements 

Defining the functional requirements along with the stakeholders of our project and their importance for the project. 

Literature Review 

Conducting an in-depth study of the research already done in the domain of our project; approximately 8 research papers will be studied, covering work in different languages (English, Spanish, Arabic, and Turkish).

Dataset

Collecting a dataset using different scraping techniques. Estimated dataset size: 15k.

Data Cleaning 

This includes the removal of stop words, stemming, lemmatization, and finally tokenization.
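A minimal sketch of part of this cleaning step, assuming whitespace tokenization and a small hand-picked stop word list. `STOP_WORDS` and `clean_text` are illustrative names and the word list is incomplete. Stemming and lemmatization are left out because they would need a Roman Urdu-specific resource that the proposal does not name.

```python
import re

# Illustrative Roman Urdu stop words (an assumed, incomplete list).
STOP_WORDS = {"ka", "ki", "ke", "ko", "se", "hai", "hain", "aur", "mein", "yeh"}

URL_RE = re.compile(r"https?://\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")

def clean_text(text):
    """Lowercase, strip URLs and non-letters, split on whitespace, drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # remove links before punctuation is stripped
    text = NON_LETTER_RE.sub(" ", text)  # keep only a-z and whitespace
    tokens = text.split()
    return [tok for tok in tokens if tok not in STOP_WORDS]

print(clean_text("Yeh log BOHAT bure hain!!! https://example.com"))
```

Roman Urdu has no standard spelling, so a real pipeline would probably also need some spelling normalization; that is not shown here.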

Data Mapping 

Transforming the collected data into a form on which the model can be trained.
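One common way to do this mapping is a bag-of-words representation, in which each text becomes a vector of token counts. The proposal does not say which representation will be used, so the sketch below is an assumption; `build_vocabulary` and `vectorize` are hypothetical helper names.

```python
def build_vocabulary(token_lists):
    """Assign each distinct token an index, in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab

def vectorize(tokens, vocab):
    """Return a count vector; tokens missing from the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for tok in tokens:
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] += 1
    return vec

# Toy, already-cleaned documents (illustrative only).
docs = [["log", "bure"], ["log", "ache", "log"]]
vocab = build_vocabulary(docs)
vectors = [vectorize(d, vocab) for d in docs]
print(vocab)
print(vectors)
```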

Data Training 

The selected model is then trained on the collected and cleaned data.
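The proposal does not name the model, so the following is only an illustrative stand-in: a small multinomial Naive Bayes classifier with add-one smoothing, written in plain Python. The labels "hate" and "neutral", the function names, and the toy samples are all assumptions.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(samples):
    """Fit a multinomial Naive Bayes model on (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"labels": label_counts, "words": word_counts, "vocab": vocab}

def predict(model, tokens):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    total_docs = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["labels"].items():
        score = math.log(n_docs / total_docs)  # log prior
        n_words = sum(model["words"][label].values())
        for tok in tokens:
            if tok in model["vocab"]:  # skip words never seen in training
                count = model["words"][label][tok]
                score += math.log((count + 1) / (n_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy, hand-made training samples (illustrative only).
samples = [
    (["log", "bure"], "hate"),
    (["nafrat", "log"], "hate"),
    (["log", "ache"], "neutral"),
    (["shukriya", "dost"], "neutral"),
]
model = train_naive_bayes(samples)
print(predict(model, ["bure", "nafrat"]))
print(predict(model, ["ache", "dost"]))
```

With the estimated 15k samples, a library implementation would likely be more practical than this hand-written version.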

Testing 

Testing the model on the training data, on the testing data, and then on independent data.
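The proposal does not state which metrics will be reported. As a hedged sketch, the helper below computes accuracy, precision, recall, and F1 for the "hate" class; the same function could be run on each of the three data sets, and a large gap between training and testing scores would suggest overfitting. The name `evaluate` and the sample labels are assumptions.

```python
def evaluate(y_true, y_pred, positive="hate"):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical gold labels and model predictions for five texts.
y_true = ["hate", "hate", "neutral", "neutral", "hate"]
y_pred = ["hate", "neutral", "neutral", "hate", "hate"]
metrics = evaluate(y_true, y_pred)
print(metrics)
```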

Final Deliverable of the Project

Software System

Core Industry

IT

Other Industries

Legal, Telecommunication

Core Technology

Artificial Intelligence (AI)

Other Technologies

Blockchain, NeuroTech

Sustainable Development Goals

Peace, Justice and Strong Institutions

Required Resources
Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs)
GPU | Equipment | 1 | 50000 | 50000
Domain | Miscellaneous | 1 | 2500 | 2500
Advertisement | Miscellaneous | 3 | 1500 | 4500
Total (in Rs) | | | | 57000
