Adil Khan 9 months ago
AdiKhanOfficial #FYP Ideas

ROMANIZED SINDHI TEXT FOR POS TAGGING AND SENTIMENT ANALYSIS


Project Title

ROMANIZED SINDHI TEXT FOR POS TAGGING AND SENTIMENT ANALYSIS

Project Area of Specialization

Artificial Intelligence

Project Summary

Artificial intelligence is advancing rapidly. It is transforming our world day by day: socially, economically, and politically. Artificial intelligence involves a variety of technologies and tools.

Some of the technologies relevant to this project are:

Natural Language Processing (NLP) is the intersection of computer science, linguistics, and machine learning. The field focuses on communication between computers and humans in natural language, that is, on making computers understand and generate human language.

A POS tag (or part-of-speech tag) is a special label assigned to each token (word) in a text corpus to indicate the part of speech and often also other grammatical categories such as tense, number (plural/singular), case, etc.

Sentiment Analysis is one of the most popular NLP techniques. It involves taking a piece of text (e.g., a comment, a review, or a document) and determining whether its sentiment is positive, negative, or neutral.

Project Objectives

Aims:

The aim of our research is to develop a platform for the Sindhi language that helps people from different regions communicate with each other easily.

Objectives:

  1. To create a dataset for the Romanized Sindhi language.
  2. To preprocess the data.
  3. To apply an appropriate POS tagger for Romanized Sindhi.
  4. To split the dataset into training and test sets for use with the Naive Bayes classifier and TF-IDF.
  5. To classify the text for sentiment analysis.
  6. To measure model accuracy using the Naive Bayes classifier and TF-IDF.

Project Implementation Method

Text Documentation:

The text documentation phase covers how we collected the data for our research. Here we collected groups of Romanized Sindhi texts as unigrams, bigrams, trigrams, and general n-grams.
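As a minimal sketch of what unigram, bigram, and trigram grouping means, the function below slides a window of size n over a token list. The function name `ngrams` and the placeholder tokens are assumptions for illustration, not part of the original project.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) found in a token list."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A list shorter than n simply yields no n-grams.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Placeholder tokens stand in for Romanized Sindhi words (hypothetical data).
tokens = ["w1", "w2", "w3", "w4"]
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

Tuples are used so that the n-grams can serve directly as dictionary keys when counting frequencies later.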

Data Preprocessing:

In the data preprocessing phase, we read and store the datasets and transform the raw data into a useful and understandable format.
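The source does not list the exact preprocessing steps. As a sketch, assuming the usual steps of Unicode normalization, lowercasing, punctuation removal, and whitespace tokenization, a hypothetical `preprocess` helper might look like this:

```python
import re
import unicodedata


def preprocess(text):
    """Normalize one raw line of Romanized text and split it into tokens."""
    # Fold compatibility characters and lowercase for consistent matching.
    text = unicodedata.normalize("NFKC", text).lower()
    # Replace every character that is neither a word character nor
    # whitespace (i.e. punctuation and symbols) with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    # split() with no argument also collapses repeated whitespace.
    return text.split()


tokens = preprocess("  Hello,   World!! ")
```

Romanized text has no standard spelling, so a real pipeline would probably also need a spelling-normalization step; that is left out here.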

Feature Selection:

Feature selection is one of the most important phases of our research. In this phase, we take the Romanized Sindhi text produced by the previous phase, data preprocessing, and select the features used for classification.

Part-Of-Speech (POS) Tagger:

The Part-Of-Speech (POS) tagger is a central phase of our research. The POS tagger assigns a grammatical tag to every word in a collection of words.
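The source does not say which tagging method was used. As a minimal sketch, assuming a simple lexicon lookup with a fallback tag for unknown words, tagging could work as below. The lexicon entries are illustrative Romanized spellings (Romanization varies between writers), and the tag names and the `tag` function are hypothetical.

```python
# Hypothetical mini-lexicon: Romanized word -> POS tag (illustrative only).
LEXICON = {
    "maan": "PRON",   # "I"
    "kitab": "NOUN",  # "book"
    "sutho": "ADJ",   # "good"
    "aahe": "VERB",   # "is"
}


def tag(tokens, lexicon=LEXICON, default="UNK"):
    """Pair each token with its tag from the lexicon, or a default tag."""
    return [(token, lexicon.get(token.lower(), default)) for token in tokens]


tagged = tag(["Kitab", "sutho", "aahe", "xyz"])
```

A lookup tagger like this cannot resolve words whose tag depends on context; it is only a baseline against which a trained tagger could be compared.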

Training Dataset:

In the training dataset phase, we created a dataset for the Romanized Sindhi language and trained the model on it.

Sentiment Classification:

Sentiment classification is the main phase of our research. In this phase, Romanized Sindhi text is classified as carrying either positive or negative sentiment.
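To illustrate how a Naive Bayes classifier assigns a positive or negative label, here is a small self-contained sketch. It assumes the multinomial variant with add-one (Laplace) smoothing, which the source does not specify; the class name `NaiveBayes` and the placeholder tokens are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for w in tokens:
                if w not in self.vocab:
                    continue  # words never seen in training are ignored
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Placeholder tokens stand in for Romanized Sindhi words (hypothetical data).
docs = [["p1", "p2"], ["p1", "p3"], ["n1", "n2"], ["n1", "n3"]]
labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(docs, labels)
prediction = model.predict(["p1", "unseen"])
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together.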

Benefits of the Project

Benefits of research Work:

Classifying text as positive or negative sentiment is useful in various applications, such as:

Social media monitoring

Customer feedback

Brand monitoring and reputation management

Customer Support

Product Analysis

Technical Details of Final Deliverable

The aim of our research is to develop a model that will help the Sindhi language in its digital existence. Our work focused on training on a Romanized Sindhi dataset to perform sentiment classification using the Naïve Bayes classifier, a supervised machine learning algorithm, and Term Frequency - Inverse Document Frequency (TF-IDF) in Python. We built a 500-word dataset of Romanized Sindhi; the training set consists of 80% of the data and the test set of the remaining 20%. The main problem is that people from different regions can speak and understand the Sindhi language but cannot read it in writing, because the scripts used to write Sindhi differ between regions. Romanized Sindhi offers an easier way to read written Sindhi and helps people from different regions communicate with each other in writing, for example by e-mail, letter, or chat.
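The 80/20 split described above can be sketched as follows. This is a minimal illustration, assuming a seeded random shuffle before splitting; the function name `train_test_split`, its parameters, and the 500 placeholder samples are hypothetical and not taken from the project.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and split off a test portion."""
    shuffled = list(samples)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# 500 placeholder (text, label) pairs stand in for the real dataset.
samples = [
    (f"text_{i}", "positive" if i % 2 == 0 else "negative")
    for i in range(500)
]
train, test = train_test_split(samples)
```

With 500 samples this yields 400 training items and 100 test items.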

When we tested our Romanized Sindhi dataset in Python using Term Frequency - Inverse Document Frequency (TF-IDF) vectorization to classify text as positive or negative sentiment, the model reached 67% accuracy.
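For readers unfamiliar with TF-IDF, the sketch below computes one common textbook variant: term frequency is the count divided by the document length, and inverse document frequency is log(N / df). The source does not say which variant or library was used (library implementations often apply extra smoothing), so the function `tfidf` and the placeholder tokens are assumptions.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Placeholder tokens stand in for Romanized Sindhi words (hypothetical data).
vectors = tfidf([["a", "b"], ["a", "c"]])
```

A term that occurs in every document, such as "a" here, gets weight zero, which is why TF-IDF down-weights very common words.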

Using the Naïve Bayes classifier for the same positive or negative sentiment classification, the model reached 76% accuracy.
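Accuracy figures like these are presumably the fraction of test items whose predicted label matches the true label. A minimal sketch of that calculation, with made-up labels and a hypothetical `accuracy` function:

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the prediction equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy of an empty test set")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up labels for illustration only; not the project's results.
truth = ["positive", "negative", "positive", "negative"]
predicted = ["positive", "negative", "negative", "negative"]
score = accuracy(truth, predicted)
```

On a small test set, each single misclassification shifts the score noticeably, so reported percentages are best read together with the test-set size.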

Final Deliverable of the Project

Software System

Core Industry

IT

Other Industries

Core Technology

Artificial Intelligence (AI)

Other Technologies

Sustainable Development Goals

Industry, Innovation and Infrastructure

Required Resources

Item Name               Type            No. of Units   Per Unit Cost (in Rs)   Total (in Rs)
Samsung A72             Equipment       1              70000                   70000
Binding Thesis Books    Miscellaneous   7              800                     5600
Stationery              Miscellaneous   25             80                      2000
Photocopy Rough work    Miscellaneous   3              300                     900
Total (in Rs)                                                                  78500