Real-time Emotion Detection using Speech, Text and Visual Image Data
Abstract:
Sentiment analysis aims to detect positive, neutral, or negative feelings in text, whereas emotion analysis aims to detect and recognize the specific types of feelings expressed in text, such as anger, disgust, fear, happiness, sadness, and surprise. Emotions play an important role in how we think and behave: the emotions we feel each day can compel us to act and influence the decisions we make about our lives, both large and small.
We have chosen to diversify our data sources according to the type of data considered. For text input, we use the stream-of-consciousness dataset gathered in a study by Pennebaker and King [1999]. It consists of 2,468 daily writing submissions from 34 psychology students (29 women and 5 men, aged 18 to 67, with a mean age of 26.4). For audio, we use the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). This database contains 7,356 files (total size: 24.8 GB) recorded by 24 professional actors (12 female, 12 male). The speech recordings include calm, happy, sad, angry, fearful, surprise, and disgust expressions, and the song recordings include calm, happy, sad, angry, and fearful emotions. All conditions are available in three modality formats: audio-only (16-bit, 48 kHz .wav), audio-video (720p H.264, AAC 48 kHz, .mp4), and video-only (no sound). For visual data, we use the popular FER2013 Kaggle challenge dataset, which consists of 48×48-pixel grayscale images of faces.
We analyze facial, vocal, and textual emotions with a machine-learning-based Android application, and we are exploring state-of-the-art models in multimodal sentiment analysis. Using text, audio, and visual input data, we develop an ensemble model that gathers the information from all of these sources and displays it in a clear and interpretable way on an Android device.
Introduction:
Human beings convey their feelings and states of mind through different types of emotions, which are expressed through facial expressions, text, or speech. By understanding these emotions, we can find out what a particular person is feeling at a given moment and what he or she thinks about a certain topic or situation.
Recent advances in neural networks and machine learning have made emotional interaction between machines and humans possible. A machine that understands the emotions of the person interacting with it can return more relevant results and other useful information.
With this in mind, we propose a Real-Time Emotion Recognition System (RTERS) that predicts a person's emotions from his or her facial expressions, text, or speech.
Aim and Objectives:
During the 1970s, psychologist Paul Ekman identified six basic emotions that he suggested are universally experienced in all human cultures: happiness, sadness, disgust, fear, surprise, and anger. We classify our data into these six emotion classes. The classes are the same for text, image, and vocal data; only the processing and the model differ between modalities.
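Because the datasets use different label sets, each one has to be mapped onto the six shared classes. The sketch below shows one way to do this for the audio and image datasets. The code tables follow the published FER2013 labels and the RAVDESS file-naming convention. How labels outside the six classes (FER2013's Neutral, RAVDESS's neutral and calm) are handled is not stated in this document, so the sketch assumes they are dropped by returning `None`. The names `EKMAN_CLASSES`, `ravdess_label`, and `fer2013_label` are hypothetical.

```python
import os
from typing import Optional

# The six shared classes from the text (Ekman's basic emotions).
EKMAN_CLASSES = ("happiness", "sadness", "disgust", "fear", "surprise", "anger")

# FER2013 integer labels: 0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral.
# Assumption: Neutral has no shared class and maps to None.
FER2013_TO_EKMAN = {0: "anger", 1: "disgust", 2: "fear", 3: "happiness",
                    4: "sadness", 5: "surprise", 6: None}

# RAVDESS emotion code (third field of the file name): 01=neutral, 02=calm,
# 03=happy, 04=sad, 05=angry, 06=fearful, 07=disgust, 08=surprised.
# Assumption: neutral and calm have no shared class and map to None.
RAVDESS_TO_EKMAN = {"01": None, "02": None, "03": "happiness", "04": "sadness",
                    "05": "anger", "06": "fear", "07": "disgust", "08": "surprise"}


def ravdess_label(filename: str) -> Optional[str]:
    """Map a RAVDESS file name such as '03-01-06-01-02-01-12.wav' to a shared class."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    fields = stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"not a RAVDESS file name: {filename}")
    return RAVDESS_TO_EKMAN[fields[2]]


def fer2013_label(code: int) -> Optional[str]:
    """Map a FER2013 integer label to a shared class (None for Neutral)."""
    return FER2013_TO_EKMAN[code]
```

Samples that map to `None` would simply be skipped when building the training sets. Keeping them as a seventh class is an equally valid alternative design.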

Figure 1: Global Emotion Detection and Recognition Market
How We Processed the Different Types of Data:

Figure 2: Dealing with different types of data

Figure 3: Flow Diagram for Naive Bayes Fusion
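Figure 3 names Naive Bayes fusion without giving its equations. The sketch below uses the standard product rule as an assumption. If the text, speech, and face inputs are conditionally independent given the emotion c, then P(c | x_1, …, x_M) ∝ P(c) ∏ P(x_m | c) ∝ ∏ P(c | x_m) / P(c)^(M−1). Here each P(c | x_m) is the class-probability output of one modality's model. The function name, the uniform default prior, and the example probabilities are hypothetical and illustrative only.

```python
import math
from typing import List, Optional, Sequence


def naive_bayes_fusion(posteriors: Sequence[Sequence[float]],
                       prior: Optional[Sequence[float]] = None,
                       eps: float = 1e-12) -> List[float]:
    """Fuse per-modality class posteriors under a conditional-independence assumption.

    posteriors[m][c] is P(class c | input of modality m); every row must list
    the same classes in the same order.
    """
    n_modalities = len(posteriors)
    n_classes = len(posteriors[0])
    if prior is None:
        prior = [1.0 / n_classes] * n_classes  # assumption: uniform class prior
    log_scores = []
    for c in range(n_classes):
        # log of prod_m P(c | x_m) / P(c)^(M-1); eps guards against log(0)
        score = sum(math.log(p[c] + eps) for p in posteriors)
        score -= (n_modalities - 1) * math.log(prior[c] + eps)
        log_scores.append(score)
    top = max(log_scores)  # subtract the max before exp for numerical stability
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative (made-up) outputs of the text, speech and face models.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]
text_p = [0.10, 0.05, 0.05, 0.60, 0.10, 0.10]
speech_p = [0.05, 0.05, 0.10, 0.50, 0.20, 0.10]
face_p = [0.05, 0.05, 0.05, 0.70, 0.05, 0.10]
fused = naive_bayes_fusion([text_p, speech_p, face_p])
print(EMOTIONS[max(range(len(fused)), key=fused.__getitem__)])  # prints: happiness
```

Working in log space keeps the product of many small probabilities from underflowing. Dividing by the prior (M−1) times keeps the prior from being counted once per modality.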
Applications:
Methodology:
Methodology for Speech Emotion Analysis:
Feature extractor:
In this step, the algorithm loads the recorded audio file and extracts the following features from it:
Data processing:
This section processes all the extracted features into a dataset for prediction.
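The list of speech features did not survive in this document, so the sketch below only illustrates the processing step under an assumption. Two common frame-level features, short-time RMS energy and zero-crossing rate, are computed on overlapping frames. They are then summarized by their mean and standard deviation, so every recording yields a vector of the same length. The input is assumed to be a mono waveform already decoded to floats in [−1, 1]. The frame length, hop size, and function names are hypothetical.

```python
import math
from typing import List, Sequence


def frame_signal(samples: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split a waveform into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, hop)]


def rms_energy(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def mean_std(values: Sequence[float]) -> List[float]:
    """Population mean and standard deviation of a list of frame values."""
    mean = sum(values) / len(values)
    return [mean, math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))]


def utterance_vector(samples: Sequence[float], frame_len: int = 1024, hop: int = 512) -> List[float]:
    """Fixed-length vector [mean RMS, std RMS, mean ZCR, std ZCR] for one recording."""
    if frame_len < 2 or hop < 1:
        raise ValueError("frame_len must be >= 2 and hop >= 1")
    frames = frame_signal(samples, frame_len, hop)
    if not frames:
        raise ValueError("recording is shorter than one frame")
    energy = [rms_energy(f) for f in frames]
    zcr = [zero_crossing_rate(f) for f in frames]
    return mean_std(energy) + mean_std(zcr)
```

Each recording in the training set would become one such row of the dataset. Richer spectral features can be appended to the same vector in the same way.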
Emotion Prediction:
Once the processed data is obtained, it is compared with the data obtained during model training, and the closest match is taken as the emotion predicted by the algorithm.
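The prediction step is described as matching the new data against the training data and choosing the closest result. The document does not name the classifier. One reading that fits this description is a nearest-centroid classifier, sketched below: average the training vectors of each emotion, then predict the emotion whose average lies closest in Euclidean distance. The function names are hypothetical, and a nearest-neighbour search over all training vectors would be an equally valid reading.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def fit_centroids(vectors: Sequence[Sequence[float]], labels: Sequence[str]) -> Dict[str, List[float]]:
    """Average the training feature vectors of each emotion label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids: Dict[str, List[float]], vec: Sequence[float]) -> str:
    """Return the label whose centroid is closest (Euclidean distance) to vec."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))
```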
Methodology for Text Emotion Analysis:
Feature extractor:
In this step, the algorithm loads the input text file and extracts the following features from it:
Data processing:
This section processes all the extracted features into a dataset for prediction.
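The text features are likewise not listed in this document. A common baseline, assumed here, is a bag-of-words term-frequency vector over a vocabulary built from the training essays. The sketch uses a deliberately simple lowercase regular-expression tokenizer, and the function names are hypothetical. Each frequency is divided by the document's total token count, including words outside the vocabulary, so long and short texts are comparable.

```python
import re
from collections import Counter
from typing import Dict, List, Sequence

TOKEN_RE = re.compile(r"[a-z']+")  # assumption: letters and apostrophes only


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def build_vocabulary(corpus: Sequence[str], min_count: int = 1) -> Dict[str, int]:
    """Map each word seen at least min_count times in the training corpus to an index."""
    counts = Counter(tok for doc in corpus for tok in tokenize(doc))
    kept = sorted(tok for tok, n in counts.items() if n >= min_count)
    return {tok: i for i, tok in enumerate(kept)}


def term_frequency_vector(text: str, vocab: Dict[str, int]) -> List[float]:
    """Relative frequency of each vocabulary word in text (unknown words are ignored)."""
    vec = [0.0] * len(vocab)
    tokens = tokenize(text)
    for tok in tokens:
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] += 1.0
    if tokens:
        vec = [v / len(tokens) for v in vec]
    return vec
```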
Emotion Prediction:
Once the processed data is obtained, it is compared with the data obtained during model training, and the closest match is taken as the emotion predicted by the algorithm.
Methodology for Real Time Face Emotion Analysis:
Feature extractor:
In this step, the algorithm loads the captured image file and extracts the following features from it:
| Feature | Description | Size |
|---|---|---|
| Xm1 | Width and height of mouth | 1 × 1 |
| Xm2 | Distance between nose and mouth | 1 × 1 |
| Xse | Error between mouth and template | 6 × 1 |
| Xe1 | Distance between the two eyebrows | 1 × 1 |
| Xe2 | Distance between eye and eyebrow | 1 × 1 |
| Xe3 | Distance between nose and eye (left side) | 1 × 1 |
| Xe4 | Distance between nose and eye (right side) | 1 × 1 |
| Xse | Error between eye and template | 4 × 1 |
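The distance features in the table can be computed directly from facial landmark coordinates. The sketch below assumes that some face-landmark detector has already returned named (x, y) points; the landmark names and the function name are hypothetical. It covers only the pure distance features Xm2 and Xe1 to Xe4. Xm1 and the two template-error features need definitions that this document does not give. Xe2 is measured on the left side only, which is also an assumption.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def geometric_features(lm: Dict[str, Point]) -> Dict[str, float]:
    """Euclidean distances between hypothetical named landmarks, keyed by table name."""
    return {
        "Xm2": math.dist(lm["nose_tip"], lm["mouth_center"]),           # nose to mouth
        "Xe1": math.dist(lm["left_brow_inner"], lm["right_brow_inner"]),  # between eyebrows
        "Xe2": math.dist(lm["left_eye"], lm["left_brow"]),              # eye to eyebrow
        "Xe3": math.dist(lm["nose_tip"], lm["left_eye"]),               # nose to left eye
        "Xe4": math.dist(lm["nose_tip"], lm["right_eye"]),              # nose to right eye
    }
```

Raw pixel distances change with face size and camera distance, so in practice they are usually normalized, for example by the distance between the eyes, before they are compared.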
Data processing:
This section processes all the extracted features into a dataset for prediction.
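The features in the table live on very different scales: single distances next to multi-element template errors. Before any distance-based matching, a usual processing step is to standardize each feature with statistics from the training set only. This step is an assumption; the document does not describe it. The sketch below computes per-feature means and standard deviations. Constant features get a standard deviation of 1 to avoid division by zero. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_standardizer(train: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and (population) standard deviation of the training vectors."""
    n = len(train)
    dims = len(train[0])
    means = [sum(row[d] for row in train) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((row[d] - means[d]) ** 2 for row in train) / n
        stds.append(math.sqrt(var) or 1.0)  # constant feature: avoid divide-by-zero
    return means, stds


def standardize(vec: Sequence[float], means: Sequence[float], stds: Sequence[float]) -> List[float]:
    """Z-score a feature vector using training statistics."""
    return [(v - m) / s for v, m, s in zip(vec, means, stds)]
```

The same means and standard deviations must be reused at prediction time, so the statistics would be stored with the trained model.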
Emotion Prediction:
Once the processed data is obtained, it is compared with the data obtained during model training, and the closest match is taken as the emotion predicted by the algorithm.
To our knowledge, no available system analyzes all three types of data. Existing systems work on a single type of data, and for speech in particular there is no good-quality Android application that predicts emotion from speech with good accuracy. We will provide an accurate and responsive Android application that processes all three types of data and recognizes emotions in real time. In the future, we plan to make it cross-platform to support both the iOS and Android operating systems.
Benefits:
Requirement Specification:
This chapter gives a detailed description of the software and hardware, the client's interaction with the software, and the back-end processing requirements.
The requirement specification consists of two parts: functional and non-functional requirements.
Non Functional Requirements:
PERFORMANCE:
Performance depends heavily on the user's internet connection: the better the connection, the faster the user receives the results.
RELIABILITY:
The accuracy of the results depends on the quality of the recorded audio file and on the user's speech characteristics. It also depends on the quality of the user's text and on image conditions such as the background.
AVAILABILITY:
The system will be accessible through the app 24/7, except during maintenance downtime.
CAPACITY:
The system will store only a log of registered users and the data collected during model training for use in prediction.
SECURITY:
The system will protect user privacy. Recorded audio files, images, and text submitted for analysis will be deleted immediately once the system has generated the results.
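The deletion requirement can be enforced on the server side with a context manager. It writes each upload to a temporary file and removes that file when analysis finishes, even if the analysis raises an error. This is a sketch assuming a Python back end, which the document does not specify. The name `ephemeral_upload` is hypothetical.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def ephemeral_upload(data: bytes, suffix: str = "") -> Iterator[str]:
    """Write an uploaded payload to a temporary file and always delete it afterwards."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        yield path  # the caller runs the emotion model on `path` here
    finally:
        # Runs after normal completion and also if the analysis raised an exception.
        if os.path.exists(path):
            os.remove(path)
```

A request handler would wrap the model call, for example `with ephemeral_upload(audio_bytes, suffix=".wav") as path: result = analyze_audio(path)`, where `analyze_audio` stands for a hypothetical speech model. Only `result` outlives the block.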
USABILITY:
Users only need the app, which is designed to be easy to use and understand.
Framework of Speech-Based Emotion Recognition:

Figure 4: Design 1 for Speech
Framework of Text-Based Emotion Recognition:

Figure 5: Design 2 for Text
Framework of Visual Image-Based Emotion Recognition:

Figure 6: Design 3 for Visual Image
Figure 7: Use Case Diagram for Emotion Detection
Activity Flow Analysis:

Figure 8: Activity Analysis Diagram for Emotion Detection
Class Based Analysis of Emotion Detection:

Figure 9: Class Analysis Diagram for Emotion Detection
| Elapsed time since start of project (days, weeks, months, or quarters) | Milestone | Deliverable |
|---|---|---|
| Month 1 | Complete Software Requirements Specification and Design | yes |