Handwritten Urdu Script Recognition Using Deep Learning
Urdu is the national language of Pakistan and the 10th most spoken language in the world, with more than 230 million total speakers. The Urdu script consists of thirty-eight basic letters. Urdu shares its alphabet with Persian and Arabic, and with it the challenges those languages face in character recognition. For example, the letter Gaaf can be misrecognized as the letter Kaaf because the two have very similar geometric characteristics. Urdu letters can be divided into two groups:
Joiner and
Non-Joiner
Joiner letters connect to other letters at the start, middle, or end of a word, while non-joiners appear in isolated form. Recognizing joiner letters is much more complex and challenging than recognizing non-joiners because joined letters overlap.
Figure 1 - List of non-joiner and joiner alphabets of Urdu script (a) Non-joiner, (b) Joiner alphabets in Urdu script
Our aim is therefore to build a system that recognizes Urdu text efficiently and produces output with acceptable accuracy. The advantage of handwritten Urdu text recognition is that it converts handwritten files into digital documents. It will also benefit people who want to understand or learn Urdu, since they can translate the digital Urdu documents into their native language.
Since the advent of computer vision and pattern recognition, handwritten scripts have been scanned, processed, and converted into searchable text. Handwriting recognition is an active research area in pattern recognition and is regarded as one of its most challenging. Complex and irregular writing styles, variation in shape and font, and the similarity of individual characters all make recognition more difficult. The popular techniques used in handwritten script recognition systems are Optical Character Recognition (OCR) and Intelligent Character Recognition (ICR), which is built on top of OCR. The first OCR system for Latin characters was built in 1940, and handwritten script recognition was later developed for many other popular languages. Development of character recognition for Urdu started in the early 2000s, but the lack of research has left very few tools available; those that exist recognize printed Urdu and are used to digitize printed books. Almost no work has been done on recognizing handwritten Urdu script. One of the main reasons for this slow progress is that Urdu script is cursive.
Urdu comprises 36 letters, which can be categorized into two groups: joiner and non-joiner. Joiner letters take different forms at the beginning, middle, or end of a word, whereas non-joiner letters keep the same form regardless of their position in a word. There are also more than 10 commonly used fonts for Urdu script. These properties make handwritten script recognition even more challenging.
The Handwritten Urdu Recognition system lets a user photograph handwritten Urdu script with a mobile phone, recognizes the text in the image, and converts it into editable digital text. This requires an efficient handwritten Urdu recognition technique that achieves good accuracy. The system will be built on an artificial neural network, a technique that achieves high accuracy in image classification. The project has been undertaken because no comparable implementation is available for this area yet. The model can be used throughout Pakistan, for example at Pakistan Post and for digitizing handwritten notes and official documents written in Urdu.
Software technologies to be used:
1. Flutter - for mobile application development. It will serve as the front end, where the user captures or selects one or more images and uploads them to the server.
2. Python - for server-side development. It will serve as the back end that processes the images received from the mobile application.
3. TensorFlow - to build the neural network that processes Urdu script images.
4. Pandas
5. NumPy
6. Scikit-learn
Algorithms and Data Structures to be used:
1. A neural network algorithm will be used to convert handwritten Urdu script images into editable digital text.
2. Queues/priority queues will be used as data structures. When a user sends an image to the server, the job is enqueued and waits for its turn to be processed.
3. Arrays will be used frequently throughout the project code.
4. Matrices will be used to store image pixels.
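The queueing and matrix ideas above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's implementation. The names `RecognitionJob`, `JobQueue`, `recognize_stub`, and `process_all` are hypothetical, the priority scheme (lower number served first) is an assumption, and the recognizer is a placeholder that only reports the dimensions of the pixel matrix.

```python
import itertools
import queue
from dataclasses import dataclass, field


@dataclass(order=True)
class RecognitionJob:
    # Lower priority values are served first (an assumption for this sketch).
    priority: int
    # Sequence number keeps first-in, first-out order among equal priorities.
    sequence: int
    # Grayscale image stored as a matrix: a list of rows of 0-255 pixel values.
    pixels: list = field(compare=False)
    user_id: str = field(compare=False, default="")


class JobQueue:
    """Hypothetical server-side queue of pending recognition jobs."""

    def __init__(self):
        self._queue = queue.PriorityQueue()
        self._counter = itertools.count()

    def enqueue(self, pixels, user_id, priority=1):
        job = RecognitionJob(priority, next(self._counter), pixels, user_id)
        self._queue.put(job)

    def dequeue(self):
        return self._queue.get_nowait()

    def __len__(self):
        return self._queue.qsize()


def recognize_stub(pixels):
    """Placeholder for the neural network; returns the matrix shape only."""
    rows = len(pixels)
    cols = len(pixels[0]) if rows else 0
    return f"<image {rows}x{cols}>"


def process_all(jobs):
    """Serve jobs in priority order and collect (user_id, result) pairs."""
    results = []
    while len(jobs) > 0:
        job = jobs.dequeue()
        results.append((job.user_id, recognize_stub(job.pixels)))
    return results
```

In this sketch, jobs with the same priority are served in arrival order, which matches the plain-queue behaviour described above; a distinct priority value is only needed if some requests should jump ahead.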
Requirements:
Data – the main requirement for the project is the data. We will need to collect handwritten Urdu scripts written by multiple people. The Urdu sentences need to be written on horizontal margins on the paper.
Virtual Server – a virtual server with specific requirements will also be required to run the Neural Network System.
Fact-finding techniques being used:
Research - researching related projects, related research papers, related publications, and survey papers.
Sampling - sampling of existing datasets.
A specialized virtual server will run our neural network model. On the user's side, a mobile phone is required to capture an image of the handwritten Urdu script. The image is sent to the virtual server, which processes it, produces the digital text of the Urdu script, and returns that text to the phone that made the request.
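The request and response cycle described above could look roughly like the sketch below. It rests on assumptions the proposal does not specify: the image is assumed to arrive as a base64 string in a JSON field named `image`, the reply format is invented for illustration, and `handle_request` is a hypothetical name. The recognizer is passed in as a parameter so the sketch does not depend on any particular model.

```python
import base64
import json


def handle_request(request_body, recognizer):
    """Decode an uploaded image, run the recognizer, and build a JSON reply.

    request_body: JSON string with a base64 "image" field (assumed format).
    recognizer: callable taking raw image bytes and returning Urdu text.
    """
    try:
        payload = json.loads(request_body)
        image_bytes = base64.b64decode(payload["image"], validate=True)
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, missing field, or invalid base64 data.
        return json.dumps({"status": "error", "message": str(exc)})
    text = recognizer(image_bytes)
    # ensure_ascii=False keeps Urdu characters readable in the reply.
    return json.dumps({"status": "ok", "text": text}, ensure_ascii=False)
```

The mobile application would then read the `text` field of the reply and show it to the user as editable text.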
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| Crucial RAM 16GB CT16G4DFRA266 | Equipment | 1 | 6300 | 6300 |
| ASUS GeForce RTX 2060 SUPER Overclocked 8G EVO GDDR6 Dual-Fan | Equipment | 1 | 55000 | 55000 |
| Neural Compute Stick | Equipment | 1 | 8700 | 8700 |
| Total (in Rs) | | | | 70000 |