Acoustic modelling using deep learning for Quran recitation assistance

Project Title

Project Area of Specialization

Artificial Intelligence

Project Summary

This project is the first step toward a Qur’an recitation assistance system that tells users whether they are reciting the Qur’an correctly. A traditional Automatic Speech Recognition (ASR) system comprises several modules: the acoustic model, the pronunciation dictionary, the language model and the search decoder. This project develops the acoustic model for the recitation system. An acoustic model relates the phonemes of a language to their spectral information. The key goal is to design and train an acoustic model that converts a recited ayah into its corresponding phonemes by analyzing speech patterns with deep learning methods.

Project Objectives

The aim of the project is to help spread the correct method of reciting the Holy Qur'an. The technical objectives of the project are:

To collect and transcribe a total of 10 hours of audio data from multiple Qaris to build the dataset.
To develop a module that extracts useful features from the raw speech signal as Mel-Frequency Cepstral Coefficients (MFCCs).
To train an acoustic model that maps the extracted feature vectors to their corresponding phonemes.
To develop a web-based software application, based on the MVC pattern, that uses this acoustic model.

Project Implementation Method

The project comprises two modules: the feature extraction module and the acoustic model.

The feature extraction module takes a speech waveform as input and applies a series of steps to extract MFCC features. The input speech must be sampled at 22050 Hz with a 16-bit depth for better performance. The raw speech is sliced into one-second clips. Each clip is pre-emphasized, split into overlapping 25 ms frames with a 10 ms stride, windowed, and transformed with the short-time Fourier transform, which preserves the temporal order of the frames. The resulting spectrum is mapped onto the Mel scale, and the log magnitude of this Mel spectrum is transformed to the quefrency domain to obtain the MFCCs. The final output is transformed with a technique such as LDA (linear discriminant analysis) to reduce the dimensionality of the feature vectors. The MFCC extraction process will be implemented as an Octave script.
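As a rough illustration of the steps described above, the following is a minimal NumPy sketch of an MFCC pipeline. It is written in Python rather than Octave purely for illustration and is not the project's implementation. The pre-emphasis coefficient (0.97), FFT size (1024), number of Mel filters (26), number of cepstral coefficients (13) and the Hamming window are assumed values that the proposal does not specify. The sketch takes the log of the Mel-filtered power spectrum, and the LDA step is omitted.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=22050, frame_ms=25, stride_ms=10,
         n_fft=1024, n_filters=26, n_ceps=13, pre_emph=0.97):
    """Return an array of shape (n_frames, n_ceps). All defaults except the
    sample rate, frame length and stride are assumptions for illustration."""
    signal = np.asarray(signal, dtype=float)
    # 1. Pre-emphasis: boost high frequencies.
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # 2. Slice into overlapping frames (25 ms long, 10 ms apart).
    frame_len = sample_rate * frame_ms // 1000
    stride = sample_rate * stride_ms // 1000
    n_frames = 1 + (len(emphasized) - frame_len) // stride
    idx = np.arange(frame_len)[None, :] + stride * np.arange(n_frames)[:, None]
    # 3. Window each frame (Hamming window assumed).
    frames = emphasized[idx] * np.hamming(frame_len)
    # 4. Short-time Fourier transform: one power spectrum per frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # 5. Mel filterbank energies, then log compression.
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_mel = np.log(mel_energy + 1e-10)
    # 6. DCT-II (unnormalized) along the filter axis gives the cepstral coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n[None, :] + 1) / (2.0 * n_filters))
    return log_mel @ basis.T

if __name__ == "__main__":
    t = np.arange(22050) / 22050.0               # one-second test tone
    features = mfcc(np.sin(2 * np.pi * 440.0 * t))
    print(features.shape)
```

For a one-second clip at 22050 Hz, a 25 ms frame is 551 samples and a 10 ms stride is 220 samples, so this sketch yields 98 frames of 13 coefficients each.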

These feature vectors, along with their transcriptions, are fed into the acoustic model to learn a mapping from spectral information to phonemes. Traditionally, Gaussian mixture models were used to estimate probabilities and hidden Markov models were used for the mapping. We will instead use a deep learning approach, namely convolutional neural networks (CNNs), to train this model, as CNNs have been reported to perform well in acoustic modelling. The CNN will be implemented in Python using relevant libraries.
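To make the idea of mapping feature frames to phonemes more concrete, here is a small NumPy sketch of the forward pass of a convolutional model over time. It is only an assumption about what such a network could look like, not the project's architecture: the layer sizes, the kernel width of 5, the 40-phoneme inventory and the helper names (init_params, frame_posteriors) are all hypothetical, the weights are random and untrained, and a real implementation would most likely use a deep learning library with a training loop.

```python
import numpy as np

def conv1d_same(x, weights, bias):
    """Convolve over the time axis with zero padding ("same" output length).
    x: (T, F) feature frames; weights: (K, F, C) with odd K; bias: (C,)."""
    k = weights.shape[0]
    pad = k // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))
    out = np.empty((x.shape[0], weights.shape[2]))
    for t in range(x.shape[0]):
        window = padded[t:t + k]  # K neighbouring frames centred on frame t
        out[t] = np.tensordot(window, weights, axes=([0, 1], [0, 1])) + bias
    return out

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row maximum for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_params(n_features=13, n_hidden=32, n_phonemes=40, kernel=5, seed=0):
    """Random, untrained weights; all sizes here are hypothetical."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(0.0, 0.1, (kernel, n_features, n_hidden)),
        "b1": np.zeros(n_hidden),
        "w2": rng.normal(0.0, 0.1, (1, n_hidden, n_phonemes)),
        "b2": np.zeros(n_phonemes),
    }

def frame_posteriors(features, params):
    """Return a (T, n_phonemes) array of per-frame phoneme probabilities."""
    hidden = np.maximum(conv1d_same(features, params["w1"], params["b1"]), 0.0)  # ReLU
    logits = conv1d_same(hidden, params["w2"], params["b2"])
    return softmax(logits)

if __name__ == "__main__":
    fake_mfcc = np.random.default_rng(1).normal(size=(98, 13))  # stand-in for real features
    posteriors = frame_posteriors(fake_mfcc, init_params())
    print(posteriors.shape, posteriors.argmax(axis=1)[:10])
```

Each output row is a probability distribution over the assumed phoneme set for one frame. Training would adjust the weights so that the most probable phoneme in each frame matches the transcription.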

Once these two modules have been developed, an MVC-based web application will be built to use the model. The application will take input from the user, use the trained model to predict the recited phonemes, and display them back to the user. The web application will use AngularJS on the front end and Python on the back end.
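Before the predicted phonemes can be displayed, per-frame predictions have to be turned into a readable phoneme sequence. The proposal does not say how this is done; one simple possibility, sketched below, is to merge runs of identical frame labels and drop silence. The function name collapse_frames, the "sil" silence label and the example labels are all hypothetical.

```python
from itertools import groupby

def collapse_frames(frame_labels, silence="sil"):
    """Merge consecutive identical frame labels and drop the silence label."""
    return [label for label, _ in groupby(frame_labels) if label != silence]

if __name__ == "__main__":
    frames = ["sil", "b", "b", "i", "i", "i", "s", "sil", "m", "m"]
    print(collapse_frames(frames))
```

A limitation of this simple rule is that a phoneme genuinely repeated back to back would be merged into one, so a fuller system would need a proper decoding step.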

Benefits of the Project

The beneficiaries of the project are not limited to the people of Pakistan but include Muslims all around the world. Users of the complete application will be able to validate their recitation without human assistance, which helps spread knowledge of the Qur’an among the masses. The application's architecture will be designed so that, with appropriate training, it can be adapted to other Surahs, reciters and dialects, enabling others to extend it.

Technical Details of Final Deliverable

The final deliverable will comprise:

A feature extraction module that generates feature vectors from audio waveforms.
The trained model that predicts the phonemes given the feature vectors.
A web application that uses these two modules to provide a phoneme transcription of the recited verse.
An instruction manual for using the model and the application.

Final Deliverable of the Project

Software System

Core Industry

Education

Other Industries

Core Technology

Artificial Intelligence (AI)

Other Technologies

Sustainable Development Goals

Quality Education

Required Resources

Item Name	Type	No. of Units	Per Unit Cost (in Rs)	Total (in Rs)
GPU GeForce RTX 2060 AMP	Equipment	1	65000	65000
Printing	Miscellaneous	1	2000	2000
Microphone	Equipment	1	2500	2500
			Total (in Rs)	69500
