Pro Urdu Lingo


2025-06-28 16:28:51 - Adil Khan

Project Title

Pro Urdu Lingo

Project Area of Specialization

Artificial Intelligence

Project Summary

PRO URDU LINGO is an AI-based model that checks Urdu grammar by applying rules learned during training. The model is trained on data taken from Urdu newspaper stories in four categories:

Business & Economics
Science & Technology
Entertainment
Sports

These categories help to build a suitable dataset for the Urdu NLP task. PRO URDU LINGO will be a web-based application for news editors, publishers, and writers working in Urdu. It will check the grammar of a sentence and return the corrected sentence as output.

Project Objectives

PRO URDU LINGO is an AI web-based application that aims to help writers working in Urdu produce correct sentences, using a model trained on a dataset of past Urdu newspaper stories from different sectors.

Project Implementation Method

The model is trained in Jupyter Notebook, a web application for creating and sharing documents that contain code, visualizations, and text. It can be used for data science, statistical modeling, machine learning, and much more.

The methodology includes:

Preparing Dataset: We collected data from newspapers covering different sectors using Beautiful Soup, a Python library for web scraping. This produced a large dataset of headlines, details, dates, and categories, stored as CSV files with UTF-8 encoding so that Urdu text can be processed correctly.
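
The storage step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the column names follow the fields listed above, while the function names, the file name `urdu_news.csv`, and the sample rows are assumptions made up for the example. In the real pipeline the rows would come from the Beautiful Soup scraper.

```python
import csv
import os
import tempfile

# Column names taken from the fields listed in the text.
FIELDS = ["headline", "details", "date", "category"]


def save_articles(rows, path):
    """Write article records to a CSV file encoded as UTF-8."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def load_articles(path):
    """Read the CSV back into a list of dictionaries."""
    with open(path, "r", encoding="utf-8", newline="") as f:
        return [dict(row) for row in csv.DictReader(f)]


# Made-up placeholder rows (not real scraped data).
SAMPLE_ROWS = [
    {
        "headline": "پاکستان نے میچ جیت لیا",
        "details": "ٹیم نے آخری اوور میں ہدف حاصل کیا۔",
        "date": "2025-01-01",
        "category": "Sports",
    },
    {
        "headline": "نیا موبائل فون متعارف",
        "details": "کمپنی نے نیا ماڈل پیش کیا۔",
        "date": "2025-01-02",
        "category": "Science & Technology",
    },
]


def round_trip(rows):
    """Save rows to a temporary CSV file and load them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "urdu_news.csv")  # hypothetical file name
        save_articles(rows, path)
        return load_articles(path)
```

Stating `encoding="utf-8"` explicitly on both the write and the read avoids depending on the platform's default encoding, which is a common cause of corrupted Urdu text.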

Preprocessing: The collected dataset is used to train deep learning models such as seq2seq (sequence-to-sequence). Seq2Seq is an encoder-decoder method for machine translation and language processing that maps an input sequence to an output sequence, using special tokens and attention values. The idea is to use two RNNs that work together: the encoder reads the input sequence, and the decoder, guided by special start and end tokens, predicts each next element of the output from the elements that precede it.
Testing: The model will be tested on content provided by different news editors, and its accuracy will be measured from those results.
Integration: Once satisfactory accuracy is achieved, the model will be integrated with the web application through an API.
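
The way a seq2seq model uses special tokens to predict the next element of a sequence can be illustrated with a small data-preparation sketch. It is an assumption-laden example, not the project's code: the token names (`<s>`, `</s>`, `<pad>`, `<unk>`), the whitespace tokenizer, the function names, and the sample sentence pair are all hypothetical. It shows only how an (incorrect, corrected) sentence pair could be turned into integer sequences for an encoder and a decoder; the RNNs themselves are not shown.

```python
# Hypothetical special tokens; a real model may use different ones.
PAD, START, END, UNK = "<pad>", "<s>", "</s>", "<unk>"


def tokenize(sentence):
    """Split on whitespace (a simplification for Urdu text)."""
    return sentence.split()


def build_vocab(sentences):
    """Map each token to an integer id, reserving ids 0-3 for special tokens."""
    vocab = {PAD: 0, START: 1, END: 2, UNK: 3}
    for sentence in sentences:
        for token in tokenize(sentence):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(sentence, vocab, max_len):
    """Wrap a sentence in START/END ids and pad it to max_len."""
    tokens = tokenize(sentence)[: max_len - 2]  # leave room for START and END
    ids = [vocab[START]] + [vocab.get(t, vocab[UNK]) for t in tokens] + [vocab[END]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


def make_pair(source, target, vocab, max_len):
    """Build encoder input, decoder input, and decoder target for one pair.

    The decoder target is the decoder input shifted left by one position,
    so at every step the model is asked to predict the next token.
    """
    encoder_input = encode(source, vocab, max_len)
    full_target = encode(target, vocab, max_len + 1)
    return encoder_input, full_target[:-1], full_target[1:]


# Made-up example pair: a verb that disagrees in gender, and its correction.
PAIRS = [("لڑکی سکول جاتا ہے", "لڑکی سکول جاتی ہے")]
VOCAB = build_vocab([s for pair in PAIRS for s in pair])
ENC_IN, DEC_IN, DEC_TARGET = make_pair(PAIRS[0][0], PAIRS[0][1], VOCAB, max_len=8)
```

Here `DEC_IN` starts with the start token and `DEC_TARGET` holds the same sequence one step ahead, which is the usual teacher-forcing setup for training the decoder.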

Benefits of the Project

It is capable of correcting sentences that appear in newspaper text.
It helps the media industry save time when writing news by reducing the need for manual rechecking.

Technical Details of Final Deliverable

Jupyter Notebook
VS Code
Python 3
NumPy
pandas
Kaggle
Google Colab
HTML5

Final Deliverable of the Project: Software System
Core Industry: IT
Other Industries: Education
Core Technology: Artificial Intelligence (AI)
Other Technologies:
Sustainable Development Goals: Quality Education
Required Resources:
