Arabic Parts of Speech Tagging
2025-06-28 16:30:17 - Adil Khan
Project Area of Specialization: Software Engineering
Project Summary
Arabic Part-Of-Speech Tagging is a software tool that combines morphological analysis with a Hidden Markov Model (HMM) and relies on the structure of the Arabic sentence. Morphological analysis reduces the size of the tag lexicon by segmenting Arabic words into their prefixes, stems, and suffixes, which is possible because Arabic is a derivational language. The HMM represents the Arabic sentence structure so that the logical linguistic sequencing is taken into account. For these purposes, a tagging system is proposed that represents the main Arabic parts of speech hierarchically, so it can be expanded easily whenever needed. Each tag in this system represents a possible state of the HMM, and the transitions between tags (states) are governed by the syntax of the sentence.
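The prefix, stem, and suffix segmentation described in the summary can be sketched as a greedy longest-match over affix lists. This is only an illustration: the affix lists, the `segment` function, and the `min_stem` parameter are assumptions made for this example and are not taken from the project, which would need a full morphological lexicon.

```python
# Hypothetical affix lists, for illustration only. A real analyzer would use
# a complete morphological lexicon rather than these few entries.
PREFIXES = ["وال", "بال", "ال", "و", "ب", "ل"]
SUFFIXES = ["ون", "ات", "ين", "ها", "ة"]


def segment(word, min_stem=2):
    """Split a word into (prefix, stem, suffix) by greedy longest match.

    An affix is only stripped if at least `min_stem` letters remain,
    which avoids reducing short words to nothing.
    """
    prefix = ""
    for candidate in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(candidate) and len(word) - len(candidate) >= min_stem:
            prefix = candidate
            break

    rest = word[len(prefix):]
    suffix = ""
    for candidate in sorted(SUFFIXES, key=len, reverse=True):
        if rest.endswith(candidate) and len(rest) - len(candidate) >= min_stem:
            suffix = candidate
            break

    stem = rest[:len(rest) - len(suffix)]
    return prefix, stem, suffix


if __name__ == "__main__":
    print(segment("المعلمون"))  # article + stem + plural ending
    print(segment("كتب"))       # no affix from the lists applies
```

Because the lexicon then only needs to store stems and a small set of affixes, it stays much smaller than a list of every inflected word form.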
Project Objectives
- Objective 1: The Arabic language is important to Muslims around the world because the Quran was revealed to Prophet Muhammad (P.B.U.H) in Arabic, and understanding its teachings matters to every Muslim. A non-Arab who does not know Arabic needs to learn it in order to understand the Holy Quran. Arabic is also an international language used in many countries. This project identifies the parts of speech and the type of each Arabic sentence to make the language easier to understand.
- Objective 2: This software makes it easier for Muslims to learn Arabic. To understand the language, a person should know the basic structure of an Arabic sentence, and students have difficulty performing ‘Tarkeeb’ on Arabic sentences. The basic feature of this project is to perform ‘Tarkeeb’ of an Arabic sentence, which helps students recognize its parts of speech. The sentence is broken into words, morphological analysis is applied to each word, and the HMM is then used to represent the sentence structure so that the logical linguistic sequencing is taken into account.
- Objective 3: The intended outcome is to make Arabic accessible to every Muslim, so that anyone can study and understand the language wherever they are and correct their mistakes in reading Arabic verses. Such mistakes can be identified once a person knows how an Arabic sentence is formed.
HIDDEN MARKOV MODEL:
The statistical approach is based on the Hidden Markov Model. An HMM builds its model from a set of input sequences and is defined over a set of states s1, s2, ...; it works like a probabilistic finite state machine. It has two kinds of states, hidden states and visible states, denoted W and V respectively. In POS tagging, the hidden states are the tags and the visible states are the words. The statistical approach relies on the internal structure of the Arabic sentence. When Arabic text is entered, the system first recognizes the morphological characteristics of each word. Using the inner linguistic structure of the sentence then lets us recognize the logical sequence of words and, as a result, their corresponding tags. Because the probability of a word occurring depends on the word before it, an HMM is a suitable statistical model for keeping track of this history. A linguistic study is conducted to determine the Arabic sentence structure by identifying the main forms of nominal and verbal sentences. Every state of the HMM represents a possible tag in the lexicon, and the transitions between states are governed by the syntax of the sentence.
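As a minimal sketch of how an HMM assigns tags, the example below decodes the most probable tag sequence with the Viterbi algorithm. The text does not say which decoding algorithm the project uses, so Viterbi is an assumption here (it is the usual choice for HMM tagging). The tag names ISM, FAIL and HARF follow the project's three categories; all probabilities, the `unseen` smoothing value, and the tiny vocabulary are invented for illustration, not estimated from a corpus.

```python
def viterbi(words, states, start_p, trans_p, emit_p, unseen=1e-6):
    """Return the most probable tag sequence for `words` under a first-order HMM.

    `unseen` is a small probability given to words a tag has never emitted.
    """
    if not words:
        return []

    # Each column maps a tag to (best path probability ending here, previous tag).
    columns = [
        {s: (start_p[s] * emit_p[s].get(words[0], unseen), None) for s in states}
    ]
    for word in words[1:]:
        previous = columns[-1]
        column = {}
        for s in states:
            # Pick the predecessor tag that maximizes the path probability.
            best_prev = max(states, key=lambda p: previous[p][0] * trans_p[p][s])
            score = previous[best_prev][0] * trans_p[best_prev][s]
            column[s] = (score * emit_p[s].get(word, unseen), best_prev)
        columns.append(column)

    # Backtrack from the best final tag.
    tag = max(states, key=lambda s: columns[-1][s][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


# Toy model: all numbers below are made up for the example.
STATES = ["ISM", "FAIL", "HARF"]
START_P = {"ISM": 0.4, "FAIL": 0.5, "HARF": 0.1}
TRANS_P = {
    "ISM": {"ISM": 0.4, "FAIL": 0.2, "HARF": 0.4},
    "FAIL": {"ISM": 0.7, "FAIL": 0.1, "HARF": 0.2},
    "HARF": {"ISM": 0.8, "FAIL": 0.1, "HARF": 0.1},
}
EMIT_P = {
    # "ذهب" is ambiguous: the verb "went" or the noun "gold".
    "ISM": {"ذهب": 0.1, "الولد": 0.45, "المدرسة": 0.45},
    "FAIL": {"ذهب": 1.0},
    "HARF": {"إلى": 1.0},
}

if __name__ == "__main__":
    sentence = ["ذهب", "الولد", "إلى", "المدرسة"]
    print(viterbi(sentence, STATES, START_P, TRANS_P, EMIT_P))
    # ['FAIL', 'ISM', 'HARF', 'ISM']
```

In this toy model the first word can be either a verb or a noun, and the transition probabilities (a verbal sentence followed by its subject) are what resolve it to FAIL.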
RULE BASED METHOD:
The rule-based method assigns POS tags based on rules. For example, in Arabic we can have a rule saying that words ending with a particular suffix or starting with a particular prefix must be tagged as nouns. Rule-based techniques can be used alongside lexicon-based approaches to tag words that are absent from the training corpus but appear in the test data.
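A rule-based tagger with a lexicon lookup in front of it could look like the sketch below. The specific rules and lexicon entries are assumptions chosen for illustration (the definite article "ال" and a final "ة" commonly mark nouns); they are not the project's actual rule set, and real rules would need exceptions.

```python
import re

# Hypothetical lexicon of words already seen with a known tag.
LEXICON = {"ذهب": "FAIL", "إلى": "HARF"}

# Hypothetical affix rules, tried in order when the lexicon has no entry.
RULES = [
    (re.compile(r"^ال"), "ISM"),  # starts with the definite article
    (re.compile(r"ة$"), "ISM"),   # ends with ta marbuta
]


def tag_word(word, default="UNKNOWN"):
    """Tag a word from the lexicon if present, otherwise by the first matching rule."""
    if word in LEXICON:
        return LEXICON[word]
    for pattern, tag in RULES:
        if pattern.search(word):
            return tag
    return default


if __name__ == "__main__":
    for w in ["ذهب", "الولد", "إلى", "المدرسة"]:
        print(w, tag_word(w))
```

Checking the lexicon first means the rules only act as a fallback for words that were never seen, which matches the role described above.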
Benefits of the Project
- Dictionaries of the Arabic language are available on the internet, but a tool that explains sentence formation and recognizes parts of speech (a POS tagger) for Arabic has not yet been developed. People have difficulty understanding the meaning of a sentence, how it is formed, and which part of speech each word is.
- The main purpose of this project is to help students and other users understand the language more efficiently by providing a complete description of each sentence. Users will be able to learn different areas of the Arabic language and apply what they learn effectively. The project aims to move the learning of Arabic from the traditional way to a technical way.
- This application is especially useful for Madrassah students who are studying to become an Aalim, and for anyone interested in learning the Arabic language.
- Project Deliverable 1: Build a library of two- to three-word sentences in English and, after analyzing it, a corresponding library of two- to three-word Arabic sentences. Then build a tokenizer that splits a whole sentence into words. The library then returns the stemmed text, from which Ism, Fail and Herf (noun, verb and particle) are identified.
- Project Deliverable 2: After the two- to three-word sentence library, build a database of Arabic letters and words, their meanings, and their tagged sentences.
- Project Deliverable 3: Connect the database to the library so that it returns a stemmed sentence every time a new Arabic sentence is entered.
- Project Deliverable 4: Initially target two- or three-word sentences, then try to handle complex sentences by simplifying them.
- Project Deliverable 5: Integrate all submodules and make the final product.
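The tokenizer in Deliverable 1 could start from something as small as the sketch below. It assumes that diacritics (harakat) and the tatweel character should be removed before splitting and that only runs of Arabic letters count as words; both choices, and the names `tokenize`, `DIACRITICS` and `ARABIC_WORD`, are assumptions for this example.

```python
import re

# Harakat (U+064B to U+0652) and tatweel (U+0640) are removed before splitting.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# A token is a run of Arabic letters (U+0621 to U+064A).
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")


def tokenize(sentence):
    """Split an Arabic sentence into words, dropping diacritics and punctuation."""
    return ARABIC_WORD.findall(DIACRITICS.sub("", sentence))


if __name__ == "__main__":
    print(tokenize("ذهب الولد إلى المدرسة."))
```

The resulting word list is what the morphological analysis and the tagger would then consume, one word at a time.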