Adil Khan 11 months ago
AdiKhanOfficial #FYP Ideas

Speaker Identification using deep learning


Project Title

Speaker Identification using deep learning

Project Area of Specialization

Artificial Intelligence

Project Summary

Voice is a basic, arguably the most important, part of everyday human life. Humans talk, and they talk a lot. The question this project sets out to answer is: who is speaking? Speaker identification is the process of automatically determining which individual is talking in a voice sample. It relies on speaker-specific information carried in the voice signal, such as frequency content and the characteristics of the vocal tract, to identify the person correctly. A neural network is trained on a dataset containing the voices of different people; given a new voice sample, the network compares its features against those of the voices it was trained on to identify the speaker. Speaker identification has many applications, such as telephone banking, access control for confidential information, information services, changing passwords and credentials over the telephone network, reservation services, access to remote computers, and database access services.

We are going to implement a deep neural network that has an input layer, an output layer, and multiple hidden layers in between. The purpose of this network is to correctly identify the speaker by name. First, the required features will be extracted from the voice samples. These extracted features are then given to the neural network as training data. A test voice sample will then be given to the system to check whether the neural network correctly identifies the speaker.
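
To make the layered structure concrete, the following is a minimal sketch of a forward pass through a small fully connected network with a softmax output, written in plain Python. The layer sizes, the ReLU activation, and the names `dense`, `init_layer`, and `forward` are illustrative assumptions, not the project's actual architecture. The weights are random, so the result is only a well-formed probability distribution over speakers, not a meaningful prediction.

```python
import math
import random


def dense(inputs, weights, biases):
    """One fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(features, layers):
    """Hidden layers use ReLU; the final layer uses softmax."""
    activation = features
    for weights, biases in layers[:-1]:
        activation = relu(dense(activation, weights, biases))
    out_weights, out_biases = layers[-1]
    return softmax(dense(activation, out_weights, out_biases))


rng = random.Random(0)
# Assumed sizes: 13 input features, two hidden layers, 4 enrolled speakers.
sizes = [13, 16, 16, 4]
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
probs = forward([rng.random() for _ in range(13)], layers)
predicted_speaker = probs.index(max(probs))  # index of the most likely speaker
```

Training would adjust the weights so that the largest probability falls on the correct speaker; that step is omitted here.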

A voice signal varies along many dimensions, and differences in these variables make each person sound different. In speaker identification we take all of these variations into account. If we achieve 90% accuracy, the system can be used in security applications such as users changing their credit card PIN (Personal Identification Number) over a telephone network. Below this level, it can still be used for biometric verification such as recording attendance in universities by voice. The system will add an extra layer of security to existing systems and give an extra edge to our LEAs (Law Enforcement Agencies) in tackling criminal and terrorist activities.

Project Objectives

  • The fundamental objective is to design and implement a speaker recognition system using a convolutional neural network.
  • To design software that can help forensic experts, lawyers and government agencies.
  • To identify the speaker from a voice sample.
  • To achieve higher accuracy and efficiency in speaker recognition.
  • To improve the ability of the identification process to handle a large number of speakers.
  • To develop a real time speaker recognition system.
  • To discriminate between the given speaker and all others.
  • To know the speaker’s identity.

Project Implementation Method

The project implementation method covers the functional and non-functional requirements that we are going to implement, the stakeholders list, the tools and technologies, the block diagram, and the work breakdown structure diagram.

Functional Requirements

FR 01: Input Voice Signal

Req. No | Functional Requirements
FR 1-1 | The speaker identification system should prompt the user to input a voice signal.
FR 1-2 | The system should only accept voice signals in .wav format.
FR 1-3 | The voice file should be shown on the screen once uploaded.
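
As an illustration of the .wav restriction in FR 1-2, here is a small sketch that checks both the file extension and the file header using Python's standard `wave` module. The function name `load_wav` and its return shape are assumptions made for this example, not part of the proposal.

```python
import wave
from pathlib import Path


def load_wav(path):
    """Return (sample_rate, n_channels, raw_frames) for a .wav file.

    Raises ValueError if the extension or the file header is not WAV.
    """
    path = Path(path)
    if path.suffix.lower() != ".wav":
        raise ValueError(f"expected a .wav file, got {path.name!r}")
    try:
        with wave.open(str(path), "rb") as wav:
            frames = wav.readframes(wav.getnframes())
            return wav.getframerate(), wav.getnchannels(), frames
    except (wave.Error, EOFError) as exc:
        # The extension was right but the contents are not a readable WAV file.
        raise ValueError(f"{path.name!r} is not a valid WAV file") from exc
```

Checking the header as well as the extension guards against a renamed file in another format slipping through to the feature extraction stage.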

FR 02: Signal Preprocessing

Req. No | Functional Requirements
FR 2-1 | The system should remove background noise to reduce its effect on feature extraction.
FR 2-2 | The system should convert analogue voice signals to digital signals.
FR 2-3 | The system should apply a high-pass speech filter to increase the energy of the high-frequency components.
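
One common way to realise the high-pass step in FR 2-3 is a first-order pre-emphasis filter, y[n] = x[n] - a * x[n-1]. The sketch below assumes that filter and the conventional coefficient a = 0.97; the proposal does not state which filter or coefficient will actually be used.

```python
def pre_emphasis(samples, coefficient=0.97):
    """First-order high-pass filter: y[n] = x[n] - coefficient * x[n-1]."""
    if not samples:
        return []
    filtered = [samples[0]]  # the first sample has no predecessor
    for previous, current in zip(samples, samples[1:]):
        filtered.append(current - coefficient * previous)
    return filtered


# A constant (low-frequency) signal is attenuated after the first sample,
# while a rapidly alternating (high-frequency) signal is amplified.
constant = pre_emphasis([1.0, 1.0, 1.0, 1.0])
alternating = pre_emphasis([1.0, -1.0, 1.0, -1.0])
```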

FR 03: Feature Extraction

Req. No | Functional Requirements
FR 3-1 | The system should extract MFCCs (Mel-frequency cepstral coefficients) from the input voice signal.
FR 3-2 | The system should draw a spectrogram (a visual representation) of the voice signal.
FR 3-3 | The system should display all the extracted features on screen.
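
Mel-frequency cepstral coefficients are built on the mel scale, which spaces frequency bands more densely at low frequencies than at high ones. The sketch below shows only that one ingredient: a widely used Hz-to-mel formula and the centre frequencies of a bank of mel-spaced filters. The filter count (26) and the frequency range (0 to 8000 Hz) are assumed values for illustration. A full feature extractor would also need framing, windowing, a Fourier transform, and a discrete cosine transform, which are not shown.

```python
import math


def hz_to_mel(hz):
    """A common mel-scale formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, low_hz, high_hz):
    """Centre frequencies (Hz) of bands spaced evenly on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(n_filters)]


# Assumed settings: 26 filters covering 0 Hz to 8000 Hz.
centers = mel_center_frequencies(n_filters=26, low_hz=0.0, high_hz=8000.0)
```

Because the spacing is even in mel rather than in Hz, neighbouring centre frequencies sit close together at the low end and spread out towards the high end.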

FR 04: Save and Store Features

Req. No | Functional Requirements
FR 4-1 | The system should extract features from the voices stored in the database.
FR 4-2 | The system should draw a spectrogram (a visual representation) of the voices stored in the database.
FR 4-3 | The system should store all the extracted features.
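
The proposal does not name a database, so the following sketch assumes SQLite (through Python's standard `sqlite3` module) and stores each feature vector as JSON text. The table layout, the function names, and the sample values are all hypothetical.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the feature store; ':memory:' keeps it in RAM."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features ("
        "speaker TEXT NOT NULL, sample TEXT NOT NULL, vector TEXT NOT NULL, "
        "PRIMARY KEY (speaker, sample))"
    )
    return conn


def save_features(conn, speaker, sample, vector):
    """Store one feature vector, replacing any earlier one for that sample."""
    conn.execute(
        "INSERT OR REPLACE INTO features (speaker, sample, vector) "
        "VALUES (?, ?, ?)",
        (speaker, sample, json.dumps(vector)),
    )
    conn.commit()


def load_features(conn):
    """Return (speaker, vector) pairs for every stored sample."""
    rows = conn.execute(
        "SELECT speaker, vector FROM features ORDER BY speaker, sample"
    )
    return [(speaker, json.loads(vector)) for speaker, vector in rows]


conn = open_store()
save_features(conn, "speaker_a", "a_01.wav", [0.12, -0.40, 0.88])
save_features(conn, "speaker_b", "b_01.wav", [0.75, 0.10, -0.33])
stored = load_features(conn)
```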

FR 05: Feature Matching

Req. No | Functional Requirements
FR 6-1 | The system should match the MFCC features of the input voice against those of the voices in the database.
FR 6-2 | The system should match the spectrogram of the input voice against those of the voices in the database.

FR 06: Generate Result

Req. No | Functional Requirements
FR 7-1 | The system should calculate the similarity between the input voice and the voices in the database.
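
The proposal leaves the similarity measure open. As a simple stand-in for the neural network's comparison, the sketch below scores an input feature vector against enrolled vectors with cosine similarity and keeps the three best matches. The speaker names, the three-dimensional vectors, and the function names are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_matches(query, enrolled, k=3):
    """Return the k enrolled speakers most similar to the query, best first."""
    scores = [(name, cosine_similarity(query, vector))
              for name, vector in enrolled.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


# Hypothetical enrolled feature vectors, one per speaker.
enrolled = {
    "speaker_a": [1.0, 0.0, 0.0],
    "speaker_b": [0.0, 1.0, 0.0],
    "speaker_c": [1.0, 1.0, 0.0],
    "speaker_d": [0.0, 0.0, 1.0],
}
ranking = top_matches([0.9, 0.1, 0.0], enrolled)
```

Here `ranking` lists `speaker_a` first, then `speaker_c`, then `speaker_b`, since the query vector points almost entirely along the first feature.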

Non-Functional Requirements

  • Performance - The average system response time will be less than 5 seconds.
  • Security - The system should only allow authorized users to enter through the login module.
  • Defect Maintenance - There must be no more than 1 critical defect after the system is released.
  • Reliability - All data must be recoverable within 30 minutes if the system suffers a server crash.
  • Usability - The system will provide detailed help and manuals for the speaker recognition process.

Stakeholders List

  1. Developers
  2. Project Supervisor
  3. Final year project committee
  4. Users

Tools and Technology

Following are the tools and technologies that we are going to use:

  • Python: This language is well suited to machine learning and deep learning, and supports the object-oriented paradigm.
  • PyCharm: An IDE for the Python language.
  • Deep Neural Network: A type of neural network with multiple hidden layers.
  • Github: A reposito


Benefits of the Project

Some benefits of the project are given below:

  • Exploring a field in which very little work has been done in Pakistan.
  • Promoting a culture of solving complex problems with neural networks.
  • Telephone banking - checking whether the person speaking to the operator is the authorized user by comparing his/her voice with the stored voice sample.
  • Giving authorized persons access to remote computers.
  • Control of confidential information - giving access only to the person who is authorized to use and see it, using voice biometrics.
  • Biometric verification of teachers and students in universities and schools.
  • Letting users change the passwords and credentials of personal accounts over the telephone network by authenticating them through their voice.
  • In criminal investigations, comparing the voices of suspects with recorded voice samples to help catch them.
  • Helping LEAs (Law Enforcement Agencies) tackle terrorist activities by comparing voice biometrics.
  • Accessing databases over the telephone network by authenticating and identifying the person.

Technical Details of Final Deliverable

The final deliverable will consist of the software and a user manual for its users:

Software

The software has two components, a front end and a back end. The front end contains the GUI (graphical user interface), and the back end contains the neural network, the database, and all the other components needed to run the software.

Front end:

The front end provides the graphical user interface through which the user interacts with the software. The GUI will have all the necessary options: adding a voice sample and labelling it, a button to start training the model, a screen to show the spectrogram of the voice, another button to select the test voice sample, and a button to compare that sample with all the samples on which the model has been trained. A graph shows the probability of similarity between the voices; it displays only the top three voices most similar to the test sample.

Back end:

The back end will contain a database and a neural network. The database will be used to store the variables from the extracted features. First the features will be extracted from the voice samples, and then those extracted feature variables are passed to the neural network for training. The neural network has multiple hidden layers through which the features are processed, and the model trains itself to correctly compare the stored voice samples with the test voice sample.

User Manual

The user manual will contain all the necessary information on how to run the software. It is a printed document that details the minimum requirements to run the software, the technical details, and the training a person requires beforehand to run the software.

Final Deliverable of the Project

Software System

Type of Industry

IT, Security

Technologies

Artificial Intelligence (AI), Big Data

Sustainable Development Goals

Industry, Innovation and Infrastructure; Peace, Justice and Strong Institutions

Required Resources

