Speech Emotion Recognition For the Urdu Language
Emotions play an extremely important role in human mental life. They are a medium for expressing one's perspective or mental state to others. Emotion recognition has become a rapidly growing research domain in recent years. Unlike humans, machines lack the ability to perceive and show emotions. Speech Emotion Recognition (SER) can be defined as the extraction of the emotional state of a speaker from his or her speech signal. In this project we are working on Speech Emotion Recognition for the Urdu language across four emotions: neutral, anger, happiness, and sadness. Urdu is the national language of Pakistan and is spoken by most Pakistanis as either a first or second language.
Data (speech recordings) is collected from individuals using a specialized recording setup. The eGeMAPS (extended Geneva Minimalistic Acoustic Parameter Set) feature set is used to represent audio signals as acoustic features, which are then used to train a support vector machine classifier to predict the emotion category of an audio signal. The classifier is trained, validated, and tested using 4-fold stratified cross-validation.
Objective 1: To gain a basic understanding of speech emotion detection and of how to create a novel dataset for speech emotion recognition.
Objective 2: To create a large dataset of audio recordings in which subjects elicit different kinds of emotions based on scripted speech.
Objective 3: To train machine learning models that can differentiate between different human emotions.
General process flow diagram for Project Implementation

Process flow diagram for Project Implementation: Feature Engineering and Machine Learning

We, as Pakistanis, need to develop local solutions for local problems.
There are several datasets for Speech Emotion Recognition in many languages, but the most popular dataset for Urdu contains only 400 examples [2]; a larger dataset is needed to advance the body of knowledge on speech emotion recognition for the Urdu language. Our FYP seeks to create such a dataset.
Emotion-sensing technology can help employees make better decisions, improve their focus and performance in the workplace, and adopt healthier and more productive working styles.
Extraction of Audio Features
We shall use acoustic features as defined in the eGeMAPS (extended Geneva Minimalistic Acoustic Parameter Set) feature set [1].
This is a standard acoustic feature set used to recognize emotions. GeMAPS is based on an automatic extraction system which extracts an acoustic parameter set from an audio waveform without manual interaction or correction.
This feature set has 88 features.
Examples of eGeMAPS features include pitch, loudness, and zero-crossing rate.
For eGeMAPS feature extraction, we have used the openSMILE toolkit [3].
openSMILE (open-source Speech and Music Interpretation by Large-space Extraction) is an open-source toolkit for audio feature extraction and for the classification of speech and music signals.
openSMILE is widely applied in automatic emotion recognition for affective computing [1,2,3]
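As a minimal sketch, the 88 eGeMAPS functionals can be extracted with the opensmile Python wrapper for the toolkit; the file name sample.wav below is a placeholder for one of our recordings.

```python
# A minimal sketch of eGeMAPS extraction using the opensmile Python wrapper
# (pip install opensmile); "sample.wav" is a placeholder file name.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,       # extended GeMAPS set
    feature_level=opensmile.FeatureLevel.Functionals,  # 88 functionals per clip
)

features = smile.process_file("sample.wav")  # pandas DataFrame of shape (1, 88)
print(features.shape)
print(smile.feature_names[:5])  # e.g. pitch- and loudness-related functionals
```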
Classification
We shall use K-fold stratified cross-validation, which divides the Dev partition into training and validation splits; here we have set K = 4.
With 4 folds, each part is used exactly once to validate the performance of the classifier.
For classification we shall use a Linear SVC (Support Vector Classification).
The validation splits are used to optimize the cost parameter C of the Linear SVC.
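A minimal sketch of this setup with scikit-learn follows; X (the eGeMAPS feature matrix) and y (the four emotion labels) are placeholder names for the Dev partition data.

```python
# A minimal sketch of 4-fold stratified cross-validation with a Linear SVC,
# tuning the cost parameter C; X and y are placeholders for the Dev
# partition's eGeMAPS features and emotion labels.
from sklearn.model_selection import StratifiedKFold, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
model = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))

grid = GridSearchCV(
    model,
    param_grid={"linearsvc__C": [0.001, 0.01, 0.1, 1, 10]},
    cv=cv,
    scoring="accuracy",
)
grid.fit(X, y)  # each fold serves once as the validation split
print(grid.best_params_, grid.best_score_)
```

StandardScaler is included in the pipeline because SVMs are sensitive to the scale of the input features.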
We will use the Python programming language and Google Colab to write and execute our code, as both are well suited to machine learning and data analysis.
Consent Forms
We will ask them to utter sentences in different emotions, i.e., happy, sad, angry, and neutral.
We will explain the objectives of our project to each subject and obtain their consent (written consent from most subjects and verbal consent from the rest; we shall follow up to obtain written consent from those subjects as well).
References
[1] F. Eyben, K. Scherer, B. Schuller, J. Sundberg, E. André, C. Busso, L. Devillers, J. Epps, P. Laukka, S. Narayanan, and K. Truong, "The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing," IEEE Transactions on Affective Computing, Apr. 2016.
[2] S. Latif, A. Qayyum, M. Usman, and J. Qadir, "Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages," 2018.
[3] F. Eyben, M. Wöllmer, and B. Schuller, "openSMILE: The Munich Versatile and Fast Open-Source Audio Feature Extractor," in Proc. ACM Multimedia, 2010.
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| BOYA BY MM1 Microphone | Equipment | 1 | 2500 | 2500 |
| Tripod Stand | Equipment | 1 | 3000 | 3000 |
| Consent form printing | Miscellaneous | 2000 | 5 | 10000 |
| Total (in Rs) | | | | 15500 |