AI Enabled Urdu OCR for Mobile Devices
Urdu is written in a cursive script, and this style of writing has been a major obstacle to building a well-established Optical Character Reader (OCR) for the language. An OCR for cursive scripts is essential to preserving languages that have defined the cultural identity of many nations. Urdu is one of the most widely spoken languages in the world, yet the absence of an efficient OCR has severely hampered its progress: most content written in Urdu, especially in the Nastaliq script, exists only on paper. Technological advancement has not reached this body of literature, as no proper tool exists to convert printed text into digital, editable files. We therefore propose a mobile OCR application that enables users and Urdu enthusiasts to convert printed Urdu text into an editable format and store it digitally. The project requires an efficient deep learning model that can recognize scanned text and reproduce it accurately in digital form. Data was collected from the Urdu Printed Text Images (UPTI) corpus; the scanned images were pre-processed, segmented, and then passed through ligature extraction. Feature extraction followed, after which the images were sent for classification and recognition. The dataset is used to train and test the model for later use in the application, and the best-performing model will serve as the application's backend. The application will primarily be an Android app capable of scanning a document and converting it into an editable file; it acts as an interface to the engine, taking document images as input and producing an editable text document as output.
i. The primary goal of the proposed project is to build an efficient and accurate ligature segmentation technique for an Urdu OCR system, one that addresses the overlapping of ligatures caused by the diagonal nature of the Nastaliq script.
ii. Create a user interface suitable for use at any scale by anyone who wishes to use the OCR. An efficient interface that addresses consumers' needs can strongly influence the popularity of the tool and awareness of its existence.
A sequential approach has been taken in developing the project. The group members began by dedicating time to a literature review; the research papers gave us insight into this growing area of research and experimentation, and clarified the path we would take to create and train a successful model for our OCR application. We then began data collection and became familiar with the Urdu Printed Text Images (UPTI) corpus. Pre-processing was carried out to obtain a cleaner, more reliable dataset for training and testing. It was followed by segmentation, in which the images were split line by line and then ligature by ligature. This enabled feature extraction, which in turn supported classification and recognition. The resulting dataset is being used to train the model, which was developed in parallel to achieve maximum optimization. So far the work has progressed smoothly. The second half of the project timeline will cover model testing and the creation of the OCR's interface; the application UI will be designed for a smooth flow and ease of use for the end user. After application development, a successful integration of the model and the application will mark the completion of the project.
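The line-by-line segmentation step above can be sketched in code. The following is a minimal illustration only, not the project's actual implementation: it assumes a binarized page (ink = 1, background = 0) and splits it into line images using a horizontal projection profile, treating ink-free rows as gaps between text lines.

```python
import numpy as np

def segment_lines(binary_img):
    """Split a binarized page (text = 1, background = 0) into line images.
    Rows whose ink count is zero are treated as gaps between text lines."""
    profile = binary_img.sum(axis=1)          # total ink per row
    in_line, start, spans = False, 0, []
    for row, ink in enumerate(profile):
        if ink > 0 and not in_line:           # entering a text line
            in_line, start = True, row
        elif ink == 0 and in_line:            # leaving a text line
            in_line = False
            spans.append((start, row))
    if in_line:                               # line runs to page bottom
        spans.append((start, len(profile)))
    return [binary_img[a:b] for a, b in spans]

# Toy page: two "text lines" separated by blank rows.
page = np.zeros((10, 8), dtype=int)
page[1:3, :] = 1    # first line
page[6:8, 2:6] = 1  # second line
lines = segment_lines(page)
print(len(lines))   # → 2
```

The same projection idea applied column-wise within a line is one common way to isolate ligatures, although overlapping Nastaliq ligatures generally require more sophisticated handling.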
Despite Urdu's significance, its literature and content remain on paper. In the age of digitization, where most reading material is available on screen and in editable text files, Urdu's catalogue is largely absent. The lack of efficient and robust OCRs has made the content largely inaccessible and often forgotten. This has also held back the level and quality of Urdu literature taught in schools and colleges; younger generations are mostly oblivious to the significance and richness of their own national language.
The presence of an Urdu OCR would usher in a new era for the language: an Urdu renaissance that not only makes current material digitally accessible but also invites the rediscovery of content lost over the years. Imagine a platform where you can scroll through millions of Urdu novels, books, papers and compilations and download them to your device for leisure reading or education. This would make Urdu literature more appealing to younger audiences and nurture young writers and poets, not only in urban areas but also in rural ones, where internet access can help young people become familiar with the language. It would also open doors for research into Urdu literature, how it has shaped our history, and what more we can learn about our past and possible future.
The project is built around a deep learning model trained to take images as input. The model follows a hybrid approach in which both a CNN and an RNN play crucial roles in producing accurate output. The CNN element incorporates VGG16, a 16-layer-deep neural network (VGGNet) with around 138 million parameters, composed of 13 convolutional layers, 5 max-pooling layers, and 3 fully connected layers. VGG16 is an image-classification network that achieves roughly 92.7% top-5 accuracy across ImageNet's 1000 categories; it is one of the most popular architectures for image classification and lends itself well to transfer learning. The RNN component is a Bidirectional LSTM (BiLSTM): a sequence-processing model consisting of two LSTMs, one reading the input from beginning to end and the other from end to beginning, so the learning algorithm sees the data in both directions. This is followed by a Connectionist Temporal Classification (CTC) layer, a type of neural network output useful for sequence problems such as handwriting and speech recognition where the timing varies. Using CTC means no aligned dataset is needed, which makes training more straightforward. CTC rests on three major concepts: a special blank token, the merging of repeated characters, and a loss computed by summing the probabilities of all alignments that collapse to the target text.
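The collapse rule at the heart of CTC (merge consecutive duplicates, then drop the blank token) can be illustrated with a few lines of Python. This is a sketch of the general CTC convention, not the project's code; the `-` blank symbol is an assumption for readability.

```python
def ctc_collapse(alignment, blank="-"):
    """Map a per-timestep CTC alignment to its output string:
    merge consecutive duplicates, then drop blanks."""
    out, prev = [], None
    for ch in alignment:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)

print(ctc_collapse("aa-b--bb"))  # → "abb"
```

Note that the blank between the two runs of `b` is what allows a doubled letter to survive collapsing; without it, `bb` would merge into a single `b`.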
Once the CRNN is trained, we want it to produce output for unseen text images; put differently, we want the most likely text given the CRNN's output matrix. One method would be to examine every potential output text, but that is impractical from a computational point of view. The best path algorithm is used to overcome this issue.
It consists of the following two steps: first, take the most likely character at each timestep of the output matrix; second, merge consecutive duplicates and remove the blanks.
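Both steps can be sketched end to end. The snippet below is a minimal illustration under assumed shapes (a timesteps × classes probability matrix with the blank at index 0), not the project's decoder: step 1 is an argmax over each row, step 2 is the collapse of repeats and blanks.

```python
import numpy as np

def best_path_decode(probs, alphabet, blank=0):
    """Greedy ('best path') CTC decoding.
    Step 1: take the most likely class at each timestep.
    Step 2: merge consecutive repeats and remove the blank."""
    path = probs.argmax(axis=1)               # step 1
    decoded, prev = [], blank
    for idx in path:                          # step 2
        if idx != prev and idx != blank:
            decoded.append(alphabet[idx - 1])
        prev = idx
    return "".join(decoded)

# Toy output matrix: 5 timesteps over {blank, 'a', 'b'}.
probs = np.array([
    [0.1, 0.8, 0.1],   # 'a'
    [0.1, 0.8, 0.1],   # 'a' (repeat, merged)
    [0.8, 0.1, 0.1],   # blank
    [0.1, 0.1, 0.8],   # 'b'
    [0.1, 0.1, 0.8],   # 'b' (repeat, merged)
])
print(best_path_decode(probs, "ab"))  # → "ab"
```

Best path is fast but approximate: it considers only the single most probable alignment, whereas the true most likely text sums probability over every alignment that collapses to it.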
The model will eventually be integrated into an Android application that serves as the interface between the user and the model.
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| Document Camera | Equipment | 1 | 30000 | 30000 |
| Computational platform (GPU) | Equipment | 1 | 40000 | 40000 |
| Miscellaneous | Miscellaneous | 1 | 10000 | 10000 |
| Total (in Rs) | | | | 80000 |