Named Entity Recognition for Balochi
Balochi, our mother tongue and the language spoken by Baloch people living in Pakistan and around the globe, has not yet been digitalized. By digitalization, we mean making a human language understandable to computers. Balochi is written in a script similar to those of Urdu, Arabic, and other Arabic-script languages. Digitalizing a language requires some basic tasks to be completed before more complex ones can be performed; these include Part-of-Speech tagging, Named Entity Recognition, word embeddings, and dictionary and translation management. We have chosen Named Entity Recognition, the process of identifying names of various kinds (locations, addresses, numbers, dates, times) in a given text. Such identification will later help in extracting meaningful content and values from raw text. Named Entity Recognition (NER) is a sub-task of Natural Language Processing (NLP). Its purpose is to locate the named entities mentioned in a text and classify them into predefined categories, such as person names, organizations, locations, times, dates, currencies, and percentages. Within text analysis as a whole, NER belongs to the field of unknown word recognition. It is an important component of various NLP tasks, such as information retrieval and machine translation. The main idea behind our project is to build an AI model that identifies named entities in Balochi text.
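NER output is commonly encoded with BIO tags: `B-` marks the first token of an entity, `I-` a following token of the same entity, and `O` a token outside any entity. The minimal sketch below assumes this scheme (the proposal does not specify one) and shows how per-token tags are grouped back into categorized entities. The labels `PER`, `LOC`, and `DATE` and the function name `bio_to_spans` are hypothetical, and English glosses stand in for Balochi tokens.

```python
from typing import List, Optional, Tuple

def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, category) pairs."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []        # tokens of the entity being built
    label: Optional[str] = None    # category of that entity
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new entity starts (a stray I- tag is treated leniently as B-).
            if current:
                spans.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)  # continue the current entity
        else:
            # "O": the token is outside any entity, so close the open one.
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans

if __name__ == "__main__":
    # Hypothetical example sentence (English glosses, hand-assigned tags).
    tokens = ["Ahmed", "Khan", "went", "to", "Gwadar", "on", "12", "March", "2021"]
    tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O", "B-DATE", "I-DATE", "I-DATE"]
    print(bio_to_spans(tokens, tags))
    # [('Ahmed Khan', 'PER'), ('Gwadar', 'LOC'), ('12 March 2021', 'DATE')]
```

Encoding entities at the token level lets a tagger make one decision per token while still recovering multi-word names such as a person's full name.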
The main objective of our project is to digitalize the Balochi language, that is, to make it understandable to machines.
NER can be developed using three approaches: rule-based, machine learning, and hybrid. A rule-based system is difficult to develop because its developer must know the language and encode its grammar rules by hand. The machine learning approach uses statistical NLP tools to train the NER system from annotated data. The hybrid approach combines the rule-based and statistical approaches. We train our model using a supervised learning algorithm.
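To make the three approaches concrete, here is a minimal Python sketch of a hybrid tagger. Hand-written rules (a small gazetteer and digit patterns) fire first; tokens they do not cover fall back to a supervised most-frequent-tag baseline learned from an annotated corpus. That baseline is swapped in purely for illustration, since the proposal does not name its supervised algorithm. The gazetteer entries, tag names, function names, and training sentences are hypothetical, flat (non-BIO) tags are used for brevity, and English glosses stand in for Balochi words.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# --- Rule-based component (hypothetical rules, for illustration only) ---
# A tiny gazetteer of place names in Latin transliteration.
GAZETTEER: Dict[str, str] = {"Gwadar": "LOC", "Quetta": "LOC", "Turbat": "LOC"}
# \d also matches Arabic-Indic digits, but the 19/20 year prefix is ASCII-only.
YEAR_RE = re.compile(r"^(?:19|20)\d{2}$")
NUMBER_RE = re.compile(r"^\d+$")

def rule_tag(token: str) -> Optional[str]:
    """Return a tag if a hand-written rule fires, else None."""
    if token in GAZETTEER:
        return GAZETTEER[token]
    if YEAR_RE.match(token):
        return "DATE"
    if NUMBER_RE.match(token):
        return "NUM"
    return None

# --- Statistical component: supervised most-frequent-tag baseline ---
def train_most_frequent_tag(corpus: List[List[Tuple[str, str]]]) -> Dict[str, str]:
    """Learn, for each training token, the tag it carries most often."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        for token, tag in sentence:
            counts[token][tag] += 1
    return {token: c.most_common(1)[0][0] for token, c in counts.items()}

# --- Hybrid: rules first, learned model as fallback, "O" for unseen tokens ---
def hybrid_tag(tokens: List[str], model: Dict[str, str]) -> List[str]:
    tags = []
    for token in tokens:
        tag = rule_tag(token)
        tags.append(tag if tag is not None else model.get(token, "O"))
    return tags

if __name__ == "__main__":
    # Hypothetical annotated corpus.
    corpus = [
        [("Karim", "PER"), ("went", "O"), ("to", "O"), ("Quetta", "LOC")],
        [("Mahnaz", "PER"), ("visited", "O"), ("Turbat", "LOC")],
    ]
    model = train_most_frequent_tag(corpus)
    print(hybrid_tag(["Mahnaz", "visited", "Gwadar", "in", "2021"], model))
    # ['PER', 'O', 'LOC', 'O', 'DATE']
```

Rules give precision where a closed list or a fixed pattern exists, while the learned component covers everything else; a stronger statistical model could replace the baseline without changing `hybrid_tag`.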
This project is important in the context of Natural Language Processing: we are specifically performing NER for Balochi. Its outcome will help digitalize Balochi for the modern digital world. The project will later serve as a supporting tool for larger NLP tasks, such as information retrieval or retrieving specific content from any given Balochi text. It will familiarize computers with the various named entities of the Balochi language, and the resulting model can later be reused in other projects.
Our project, Named Entity Recognition for Balochi (NERB), identifies named entities such as persons, locations, addresses, dates, numerical values, and financial values in Balochi text and classifies them into predefined categories. The model is trained with a supervised machine learning algorithm on a predefined training set. After training, it takes a sample text, tokenizes it, identifies the named entities among the tokens, and tags each one with its category.
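The tokenize, identify, and tag steps described above might look like the following minimal sketch. It assumes whitespace tokenization with punctuation split off as separate tokens (including the Arabic comma, question mark, and full stop used in Arabic-script text), and it accepts any per-token tagging function, so a plain lookup table stands in here for the trained model. The function names, tags, and example text are hypothetical, and an English gloss stands in for Balochi input.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List

# Assumed punctuation: Latin marks plus the Arabic comma (U+060C), Arabic
# question mark (U+061F) and Arabic full stop (U+06D4).
TOKEN_RE = re.compile(r"[^\s.,!?;:\u060C\u061F\u06D4]+|[.,!?;:\u060C\u061F\u06D4]")

def tokenize(text: str) -> List[str]:
    """Split on whitespace and emit each punctuation mark as its own token.

    Simplification: a decimal such as "12.5" is split into "12", ".", "5".
    """
    return TOKEN_RE.findall(text)

def extract_entities(text: str, tag_token: Callable[[str], str]) -> Dict[str, List[str]]:
    """Tokenize text, tag every token, and group non-"O" tokens by category.

    Each token is tagged on its own, so multi-token names are not merged.
    """
    entities: Dict[str, List[str]] = defaultdict(list)
    for token in tokenize(text):
        tag = tag_token(token)
        if tag != "O":
            entities[tag].append(token)
    return dict(entities)

if __name__ == "__main__":
    # Hypothetical stand-in for a trained model: a plain lookup table.
    toy_model = {"Karim": "PER", "Gwadar": "LOC", "2021": "DATE"}
    text = "Karim went to Gwadar in 2021\u06D4"
    print(extract_entities(text, lambda t: toy_model.get(t, "O")))
    # {'PER': ['Karim'], 'LOC': ['Gwadar'], 'DATE': ['2021']}
```

Keeping the tagger as a parameter separates the pipeline from the model, so the lookup table can be swapped for whatever the supervised training step produces.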
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| 50 Mbps internet connection | Equipment | 6 | 5230 | 31380 |
| Modem | Equipment | 2 | 3000 | 6000 |
| Twisted-pair cable | Equipment | 300 | 20 | 6000 |
| Ethernet cable | Equipment | 100 | 30 | 3000 |
| Grammarly Premium license | Software | 4 | 3000 | 12000 |
| 8 GB RAM | Equipment | 2 | 5400 | 10800 |
| Printing | Miscellaneous | 10 | 100 | 1000 |
| Stationery | Miscellaneous | 10 | 100 | 1000 |
| Overhead | Miscellaneous | 6 | 1300 | 7800 |
| Total (Rs) | | | | 78980 |