OCR for Urdu in Nastaleeq font
2025-06-28 16:34:18 - Adil Khan
Project Area of Specialization
Software Engineering
Project Summary
There is a growing demand for software that can recognize characters in scanned paper documents, since a large number of newspapers and books exist only in printed form. There is strong interest in storing the information from these paper documents on a computer and reusing it later. OCR for Urdu aims to digitize hand-printed data and make the digitized text editable.
The project aims to deliver the following features:
- Scanning the image and segmenting the text.
- Pre-processing the segmented data.
- Recognition of words.
- Creating an editable text file.
- Text to audio conversion for the visually impaired.
- Mobile OCR application for the Android platform.
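The segmentation and pre-processing steps listed above can be illustrated with a small sketch. The source does not say which methods the project uses, so the fixed-threshold binarization and the horizontal projection profile below are assumptions chosen for simplicity, and the function names `binarize` and `segment_lines` are hypothetical. The page is represented as a plain list of grayscale rows so the example runs without any image library.

```python
from typing import List, Tuple


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Map each grayscale pixel (0-255) to 1 for ink (dark) or 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_lines(binary: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each run of rows that contain ink.

    This is a horizontal projection profile: a row belongs to a text line
    when its ink-pixel count reaches min_ink (an assumed tuning parameter).
    """
    lines: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(binary):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                      # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, y))       # the current text line ends
            start = None
    if start is not None:                  # ink ran to the bottom edge
        lines.append((start, len(binary)))
    return lines


if __name__ == "__main__":
    # Tiny synthetic page: two dark bands separated by blank rows.
    page = [
        [255, 255, 255, 255],
        [0, 255, 255, 255],
        [255, 0, 0, 255],
        [255, 255, 255, 255],
        [255, 255, 0, 255],
    ]
    print(segment_lines(binarize(page)))
```

A projection profile only separates lines that have a clear blank gap between them. Nastaleeq text is cursive and tends to slope along the baseline, so segmenting individual ligatures on real pages would probably need more than this.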
The main objectives are:
- Digitizing paper documents.
- Automated Data Entry.
- Assisting the visually impaired.
- Saving the time spent manually entering printed data into a computer.
- Saving space as reams of paper documents will be converted to digital text files.
A user who needs OCR installs the Android application or uses the desktop software. The user then selects an image containing Urdu text to be converted into an editable text file. After selecting the image, the user can edit it (crop, rotate, etc.). Once the image is uploaded, the OCR processes it and extracts the text, which is then saved as an editable text file.
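The workflow described above (load an image, extract the text, save an editable file) might look roughly like the sketch below. It assumes the `opencv-python` and `pytesseract` packages and Tesseract's Urdu language data (`urd`) are installed. The source lists OpenCV, Tesseract, and Python but does not describe how they are combined, so the Otsu binarization step and the function names `extract_urdu_text` and `save_editable_text` are illustrative assumptions, not the project's actual code.

```python
from pathlib import Path


def extract_urdu_text(image_path: str) -> str:
    """Read an image, binarize it with Otsu's method, and run Tesseract on it."""
    # Imported lazily so the file helper below works without the OCR libraries.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None when the file cannot be read
        raise FileNotFoundError(f"Could not read image: {image_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu picks the threshold automatically; the 0 passed here is ignored.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # "urd" is Tesseract's language code for Urdu.
    return pytesseract.image_to_string(binary, lang="urd")


def save_editable_text(text: str, out_path: str) -> Path:
    """Write the recognized text as UTF-8 so the Urdu script stays editable."""
    path = Path(out_path)
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    import sys

    # Usage: python ocr_sketch.py page.png page.txt (both paths are placeholders)
    recognized = extract_urdu_text(sys.argv[1])
    print(save_editable_text(recognized, sys.argv[2]))
```

Writing the output explicitly as UTF-8 matters here because platform default encodings do not always handle Urdu script.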
Benefits of the Project
- Higher Productivity
- Cost Reduction
- High Accuracy
- Increased Storage Space
- 100% Text-searchable Documents
- Massively Improves Customer Service
- Makes Documents Editable
To develop the Urdu OCR, we used the following:
SOFTWARE
- Microsoft Windows
- OpenCV
- Tesseract
- Python IDE
- Android OS
- Autodesk Fusion 360
- Android Studio
HARDWARE
- Android-based mobile phone with fair resolution.
- Laptop with a graphics card.
- Book
- Book Stand
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| 16 GB RAM | Equipment | 1 | 12000 | 12000 |
| GPU (GTX 1050ti) | Equipment | 1 | 20000 | 20000 |
| SSD (Samsung EVO 860) | Equipment | 2 | 9500 | 19000 |
| Printing of Standee | Miscellaneous | 1 | 800 | 800 |
| Book Stand | Miscellaneous | 2 | 3000 | 6000 |
| Brochures | Miscellaneous | 30 | 50 | 1500 |
| Contact Cards | Miscellaneous | 20 | 20 | 400 |
| Printing of Urdu OCR shirts | Miscellaneous | 2 | 500 | 1000 |
| Total | | | | 60700 |