Adil Khan 9 months ago
AdiKhanOfficial #FYP Ideas

Handwritten Urdu Script Recognition Using Deep Learning


Project Title

Handwritten Urdu Script Recognition Using Deep Learning

Project Area of Specialization

Computer Science

Project Summary

Urdu is the national language of Pakistan and the 10th most widely spoken language in the world, with more than 230 million speakers worldwide. The Urdu script consists of thirty-eight basic letters. Urdu shares much of its alphabet with Persian and Arabic, and with it the challenges those scripts pose for character recognition. For example, the letter Gaaf can be misrecognized as the letter Kaaf because the two have nearly the same geometric shape. Urdu letters can be divided into two groups:

  • Joiner

  • Non-joiner

Joiner letters connect to neighboring letters at the start, middle, or end of a word, while non-joiners appear in isolated form. Recognizing joiner letters is much more complex and challenging than recognizing non-joiners, because joined letters overlap one another.


Figure 1 - List of non-joiner and joiner alphabets of Urdu script (a) Non-joiner, (b) Joiner alphabets in Urdu script

Our aim is therefore to build a system that recognizes Urdu text efficiently and produces output with acceptable accuracy. The advantage of handwritten Urdu text recognition is that it converts handwritten files into digital documents. It will also benefit people who want to understand or learn Urdu, since they can translate the digital Urdu documents into their native language.


 

Project Objectives

Since the advent of computer vision and pattern recognition, handwritten scripts have been scanned, processed, and converted into searchable text. Handwriting recognition remains an active research area in pattern recognition and is regarded as one of its most challenging problems. Complex and irregular writing styles, variation in shape and font, and the similarity of individual characters all make recognition harder. The popular techniques for handwritten script recognition are Optical Character Recognition (OCR) and Intelligent Character Recognition (ICR), which is built on top of OCR. The first OCR system, which recognized Latin characters, was built in 1940. Handwritten script recognition was later developed for many other popular languages. Development of character recognition for Urdu started in the early 2000s, but the lack of research has left very few tools available; those that exist recognize printed Urdu and are used to digitize printed Urdu books. Almost no work has been done on recognizing handwritten Urdu script. One of the main reasons for this slow progress is that Urdu script is cursive.

The Urdu alphabet comprises 36 letters, which can be categorized into two groups: joiners and non-joiners. Joiner letters take different forms at the beginning, middle, and end of a word, whereas non-joiner letters keep the same form regardless of their position in a word. There are also more than 10 commonly used Urdu fonts. These properties make handwritten script recognition even more challenging.

The Handwritten Urdu Recognition system lets you take a picture of handwritten Urdu script with your mobile phone, recognizes the text in the image, and converts it into editable digital text. This requires an efficient handwritten Urdu recognition technique that achieves good accuracy. The system will be built using an artificial neural network, a technique that provides high accuracy in image classification. The project has been undertaken because no implementation at this level is available for this area yet. The model can be used almost anywhere in Pakistan, for example at Pakistan Post and for digitizing handwritten notes and official documents written in Urdu.

Project Implementation Method

Software technologies to be used:

1. Flutter - for mobile application development. It will serve as the front end, where the user captures or selects one or more images and uploads them to the server.

2. Python - for server-side development. It will serve as the back end that processes the images received from the mobile application.

3. TensorFlow - to create the neural network that processes images of Urdu script.

4. Pandas

5. NumPy

6. Scikit-learn
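
As a rough illustration of how NumPy might fit into this stack, the sketch below holds a grayscale page as a 2D matrix, scales it for a neural network, and marks the inked pixels. The function names, the threshold of 128, and the toy 3x4 "image" are assumptions made for this example and are not part of the project.

```python
import numpy as np

def normalize(gray):
    """Scale 0-255 grayscale values to floats in [0, 1]."""
    return np.asarray(gray, dtype=np.float32) / 255.0

def to_ink_mask(gray, threshold=128):
    """Return a 0/1 matrix where 1 marks dark (inked) pixels.

    The threshold is an assumed value; real scans may need tuning.
    """
    gray = np.asarray(gray, dtype=np.uint8)
    return (gray < threshold).astype(np.uint8)

# Toy 3x4 "image": 255 is white paper, low values are ink.
page = np.array(
    [[255, 255, 255, 255],
     [255,   0,  10, 255],
     [255, 255, 255, 255]],
    dtype=np.uint8,
)

scaled = normalize(page)   # float matrix, same shape as the page
ink = to_ink_mask(page)    # 1 where the pen touched the paper
```

A matrix like `scaled` is the kind of input an image classifier would typically consume, while `ink` is convenient for simple layout analysis.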

Algorithms and Data Structures to be used:

1. A neural network will be used to convert images of handwritten Urdu script into editable digital text.

2. Queues/priority queues will be used as data structures. When a user sends an image to the server, the job is enqueued and waits for its turn to be processed.

3. Arrays will be used frequently throughout the project code.

4. Matrices will be used as a data structure to store image pixels.
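
The job queue described above could look roughly like the sketch below, which uses Python's standard `queue.PriorityQueue`. The `JobQueue` class, the file names, and the rule that lower numbers are served first are assumptions for illustration only.

```python
import itertools
import queue

class JobQueue:
    """Hypothetical queue of recognition jobs, first-in first-out within a priority."""

    def __init__(self):
        self._queue = queue.PriorityQueue()
        self._counter = itertools.count()  # tie-breaker that preserves arrival order

    def enqueue(self, image_id, priority=1):
        # Lower priority numbers are served first (an assumed convention).
        self._queue.put((priority, next(self._counter), image_id))

    def dequeue(self):
        _priority, _order, image_id = self._queue.get()
        return image_id

    def __len__(self):
        return self._queue.qsize()

jobs = JobQueue()
jobs.enqueue("scan-001.png")
jobs.enqueue("scan-002.png")
jobs.enqueue("urgent.png", priority=0)

# Drain the queue in the order the server would process the jobs.
order = [jobs.dequeue() for _ in range(len(jobs))]
```

The counter keeps uploads with equal priority in arrival order, so a plain first-in first-out queue is just the special case where every job uses the default priority.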

Requirements:

1. Data - the main requirement for the project is data. We will need to collect handwritten Urdu scripts written by multiple people. The Urdu sentences need to be written along horizontal lines on the paper.

2. Virtual server - a virtual server with specific requirements will also be needed to run the neural network system.
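
Collecting sentences in horizontal rows makes it easier to cut a scanned page into individual text lines before recognition. The project description does not say how this would be done; the sketch below assumes a horizontal projection profile, a common approach that sums the ink in every pixel row and treats empty rows as gaps between lines. The function name and the toy page are hypothetical.

```python
import numpy as np

def split_lines(ink, min_ink=1):
    """Return (start_row, end_row) pairs, end exclusive, one per text line.

    `ink` is a 2D matrix that is nonzero wherever a pixel contains ink.
    """
    row_sums = np.asarray(ink).sum(axis=1)   # amount of ink in each pixel row
    has_ink = row_sums >= min_ink
    lines = []
    start = None
    for row, flag in enumerate(has_ink):
        if flag and start is None:
            start = row                      # a text line begins
        elif not flag and start is not None:
            lines.append((start, row))       # the text line just ended
            start = None
    if start is not None:                    # line touches the bottom edge
        lines.append((start, len(has_ink)))
    return lines

# Toy page: two bands of ink separated by blank rows.
page = np.zeros((8, 6), dtype=np.uint8)
page[1:3, 1:5] = 1
page[5:7, 0:4] = 1

bands = split_lines(page)
```

Each pair in `bands` can then be used to crop one line, for example `page[start:end]`, before it is passed to the recognizer.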

Benefits of the Project

The system lets a user photograph handwritten Urdu script with a mobile phone and receive the text in an editable digital format. No implementation at this level is available for this area yet, which is why the project has been undertaken. The model can be used almost anywhere in Pakistan, for example at Pakistan Post and for digitizing handwritten notes and official documents written in Urdu.

Technical Details of Final Deliverable

Fact-finding techniques being used:

1. Research - reviewing related projects, research papers, publications, and survey papers.

2. Sampling - sampling of existing datasets.

We will use a dedicated virtual server to run our neural network model. On the user's side, a mobile phone captures an image of the handwritten Urdu script and sends it to the virtual server. The server processes the image, produces the digital text of the Urdu script, and returns it to the phone that made the request.
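
The round trip between phone and server could be sketched as follows. The JSON field names, the base64 encoding of the image, and the `fake_recognizer` stand-in for the trained model are all assumptions for illustration; the project description does not specify a wire format.

```python
import base64
import json

def build_request(image_bytes, filename="page.jpg"):
    """Client side: wrap the captured image in a JSON payload."""
    return json.dumps({
        "filename": filename,
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
    })

def handle_request(payload, recognize):
    """Server side: decode the image, run the recognizer, reply with JSON."""
    request = json.loads(payload)
    image_bytes = base64.b64decode(request["image_b64"])
    text = recognize(image_bytes)
    # ensure_ascii=False keeps the Urdu text readable in the response body.
    return json.dumps(
        {"filename": request["filename"], "text": text},
        ensure_ascii=False,
    )

def fake_recognizer(image_bytes):
    """Placeholder for the neural network; always returns the same word."""
    return "اردو"

payload = build_request(b"raw image bytes go here")
response = json.loads(handle_request(payload, fake_recognizer))
```

In a real deployment the payload would travel over HTTP and `fake_recognizer` would be replaced by a call into the trained model.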


Final Deliverable of the Project

Software System

Core Industry

IT

Other Industries

Education

Core Technology

Artificial Intelligence (AI)

Other Technologies

Others

Sustainable Development Goals

Quality Education

Required Resources

Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs)
Crucial RAM 16GB CT16G4DFRA266 | Equipment | 1 | 6300 | 6300
ASUS GeForce RTX 2060 SUPER Overclocked 8G EVO GDDR6 Dual-Fan | Equipment | 1 | 55000 | 55000
Neural Compute Stick | Equipment | 1 | 8700 | 8700
Total (in Rs) | | | | 70000
If you need this project, please contact me at contact@adikhanofficial.com