Speech Emotion Recognition For the Urdu Language


2025-06-28 16:29:37 - Adil Khan

Project Title

Speech Emotion Recognition For the Urdu Language

Project Area of Specialization

Information & Communication Technology

Project Summary

Emotions play an extremely important role in human mental life; they are a medium for expressing one's perspective or mental state to others. Emotion recognition has become a rapidly growing research domain in recent years. Unlike humans, machines lack the ability to perceive and show emotions. Speech Emotion Recognition (SER) can be defined as the extraction of the emotional state of a speaker from his or her speech signal. In this project we are working on Speech Emotion Recognition for the Urdu language across four emotions: neutral, anger, happiness, and sadness. Urdu is the national language of Pakistan and is spoken by most Pakistanis as either a first or second language.

Data (speech recordings) is collected from individuals using a specialized recording setup. The eGeMAPS (extended Geneva Minimalistic Acoustic Parameter Set) feature set is used to represent audio signals as acoustic features, which are then used to train a support vector machine classifier to predict the emotion category of an audio signal. The classifier is trained, validated, and tested using 4-fold stratified cross-validation.

Project Objectives

Objective 1: To gain a basic understanding of speech emotion detection and of how to create a novel dataset for speech emotion recognition.

Objective 2: To create a large dataset of audio recordings in which subjects elicit different kinds of emotions based on scripted speech.

Objective 3: To train machine learning models that can differentiate between different types of human emotions.

Project Implementation Method

General process flow diagram for Project Implementation

[Image: 'Speech Emotion Recognition For the Urdu Language'_1659395284.png]

Process flow diagram for Project Implementation: Feature Engineering and Machine Learning

[Image: 'Speech Emotion Recognition For the Urdu Language'_1659395285.png]

Benefits of the Project

We, as Pakistanis, need to develop local solutions for local problems.

There are several Speech Emotion Recognition datasets for many languages, but the most popular dataset for Urdu contains only 400 examples; a larger dataset is needed to advance the body of knowledge on speech emotion recognition for the Urdu language. Our FYP seeks to create such a dataset.

Emotion sensing technology can help employees make better decisions, improve their focus and performance in the workplace, and adopt healthier and more productive working styles.

Technical Details of Final Deliverable

Extraction of Audio Features

We shall use acoustic features as defined in the eGeMAPS (extended Geneva Minimalistic Acoustic Parameter Set) feature set [1].

This is a standard acoustic feature set used to recognize emotions. GeMAPS is based on an automatic extraction system which extracts an acoustic parameter set from an audio waveform without manual interaction or correction.

This feature set contains 88 features.

Examples of eGeMAPS features include pitch, loudness, and zero-crossing rate.
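To make such low-level descriptors concrete, here is a minimal pure-Python sketch of one of them, the zero-crossing rate. openSMILE computes these descriptors internally; this standalone function is only an illustration of the idea:

```python
import math

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)

# a 100 Hz sine sampled at 8 kHz crosses zero twice per cycle,
# so its rate is roughly 200 crossings over 7999 sample pairs
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(8000)]
print(zero_crossing_rate(tone))
```

Higher-pitched or noisier signals cross zero more often, which is one reason this descriptor carries information about vocal excitement.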

For eGeMAPS feature extraction, we have used the openSMILE toolkit [3].

openSMILE (open-source Speech and Music Interpretation by Large-space Extraction) is a toolkit for audio feature extraction and classification of speech and music signals.

openSMILE is widely applied in automatic emotion recognition for affective computing [1, 2, 3].

Classification

We shall use K-fold stratified cross-validation, which divides the Dev partition into training and validation splits; here we have kept K = 4.

4-fold means that each part is used once to evaluate the performance of the classifier.
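The splitting scheme can be sketched with scikit-learn's `StratifiedKFold`; the balanced labels and the 88-column feature matrix below are hypothetical stand-ins for our data, not the actual recordings:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# hypothetical stand-in data: 80 balanced examples of the four emotions,
# each represented by 88 eGeMAPS-like features
y = np.array([0, 1, 2, 3] * 20)
X = np.random.default_rng(0).normal(size=(80, 88))

skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    # stratification keeps every class at the same proportion in each split
    print(f"fold {fold}: validation class counts = {np.bincount(y[val_idx])}")
```

Because the folds are stratified, every validation split keeps the four emotion classes in the same proportion as the full dataset.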

For classification we shall use Linear SVC (Support Vector Classification).

The test partition is used to optimize the cost parameter (C) of the Linear SVC.

We will use the Python programming language and Google Colab to write and execute Python code, as they are well suited to machine learning and data analysis.

Consent Forms

We will ask each subject to utter sentences in different emotions, i.e., Happy, Sad, Angry, and Neutral.

We will explain the objectives of our project to each subject and obtain their consent. Most subjects will give written consent; from those who initially give only verbal consent, we shall obtain written consent as well.

References

[1] F. Eyben, K. R. Scherer, B. Schuller, J. Sundberg, E. André, C. Busso, L. Devillers, J. Epps, P. Laukka, S. S. Narayanan, and K. P. Truong, "The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing," IEEE Transactions on Affective Computing, April 2016.

[2] S. Latif, A. Qayyum, M. Usman, and J. Qadir, "Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages," 2018.

[3] F. Eyben, M. Wöllmer, and B. Schuller, "openSMILE: The Munich Versatile and Fast Open-Source Audio Feature Extractor," in Proc. ACM Multimedia, 2010.

Final Deliverable of the Project: HW/SW integrated system
Core Industry: Telecommunication
Other Industries: IT, Media
Core Technology: Artificial Intelligence (AI)
Other Technologies: Others, Big Data
Sustainable Development Goals: Good Health and Well-Being for People; Industry, Innovation and Infrastructure; Sustainable Cities and Communities

Required Resources

Item Name               Type           No. of Units   Per Unit Cost (Rs)   Total (Rs)
BOYA BY-MM1 Microphone  Equipment      1              2500                 2500
Tripod Stand            Equipment      1              3000                 3000
Consent form printing   Miscellaneous  2000           5                    10000

Total (Rs): 15500
