Pro Urdu Lingo
2025-06-28 16:28:51 - Adil Khan
Project Area of Specialization
Artificial Intelligence
Project Summary
PRO URDU LINGO is an AI-based model for Urdu grammar that predicts grammatical rules learned during training. The model is trained on stories taken from different Urdu newspapers in four categories:
- Business & Economics
- Science & Technology
- Entertainment
- Sports
These categories help produce a suitable dataset for the Urdu NLP task. PRO URDU LINGO will be a web-based application for news editors, publishers, and writers working in Urdu. It checks the grammar of a sentence and returns the corrected sentence as output.
Project Objectives
PRO URDU LINGO is an AI web-based application that aims to help Urdu writers produce correct sentences, using a model trained on a dataset of past Urdu newspaper stories from different sectors.
Project Implementation Method
The whole application is trained in Jupyter Notebook, a web application for creating and sharing documents that contain code, visualizations, and text. It can be used for data science, statistical modeling, machine learning, and much more.
The methodology includes:
- Preparing Dataset: We collected data from newspapers in different sectors using Beautiful Soup, a Python web-scraping library. This gave us a large dataset of headlines, details, dates, and categories, stored as CSV with UTF-8 encoding so that Urdu text is processed correctly.
- Preprocessing: The collected dataset is used to train deep learning models such as Seq2Seq (sequence to sequence), an encoder-decoder method for machine translation and language processing that maps an input sequence to an output sequence, with a tag and an attention value. The idea is to use two RNNs, an encoder and a decoder, that work together with a special token and try to predict the next state sequence from the previous sequence.
- Testing: The model will be tested on content provided by different news editors, and its accuracy will be measured from those results.
- Integration: Once satisfactory accuracy is achieved, the model will be integrated with the web application through an API.
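The Seq2Seq step in the methodology can be illustrated with a minimal sketch of how one training pair might be prepared for an encoder-decoder model. The token names (`<sos>`, `<eos>`, and so on), the whitespace tokenization, and the example sentence pair are assumptions made for illustration; the project does not specify them.

```python
# Special tokens. The names are assumptions; the project only mentions "a special token".
PAD, SOS, EOS, UNK = "<pad>", "<sos>", "<eos>", "<unk>"


def build_vocab(sentences):
    """Map every whitespace-separated token to an integer id."""
    vocab = {PAD: 0, SOS: 1, EOS: 2, UNK: 3}
    for sentence in sentences:
        for token in sentence.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Convert tokens to ids, falling back to the unknown-token id."""
    return [vocab.get(token, vocab[UNK]) for token in tokens]


def make_training_example(source, target, vocab):
    """Return (encoder_input, decoder_input, decoder_target) as id lists.

    The decoder input is the target shifted right by one position, so at
    each step the decoder sees the previous token and predicts the next one.
    """
    source_tokens = source.split()
    target_tokens = target.split()
    encoder_input = encode(source_tokens, vocab)
    decoder_input = encode([SOS] + target_tokens, vocab)
    decoder_target = encode(target_tokens + [EOS], vocab)
    return encoder_input, decoder_input, decoder_target


# Hypothetical pair: a sentence with a gender-agreement error and its correction.
incorrect = "وہ لڑکی اسکول جاتا ہے"
correct = "وہ لڑکی اسکول جاتی ہے"

vocab = build_vocab([incorrect, correct])
enc_in, dec_in, dec_out = make_training_example(incorrect, correct, vocab)
```

The id lists produced here would then be padded and batched before being fed to the two RNNs; that part depends on the deep learning framework chosen and is left out.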
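The testing step does not say which accuracy measure is used. One simple possibility, shown here purely as an assumption, is sentence-level exact-match accuracy: the share of model outputs that equal the editor-approved sentence after whitespace normalization. The function name and the sample sentences are hypothetical.

```python
def normalize(sentence):
    """Collapse runs of whitespace so spacing differences are not counted as errors."""
    return " ".join(sentence.split())


def sentence_accuracy(predictions, references):
    """Fraction of predictions that match their reference sentence exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return matches / len(references)


# Hypothetical model outputs and editor-approved references.
references = [
    "وہ لڑکی اسکول جاتی ہے",
    "ہم کل بازار گئے تھے",
    "یہ کتاب بہت اچھی ہے",
    "بچے باغ میں کھیل رہے ہیں",
]
predictions = [
    "وہ لڑکی اسکول جاتی ہے",
    "ہم کل  بازار گئے تھے",   # extra space only, still counted as a match
    "یہ کتاب بہت اچھا ہے",    # agreement error left uncorrected
    "بچے باغ میں کھیل رہے ہیں",
]

accuracy = sentence_accuracy(predictions, references)
```

Exact match is strict: a sentence with a single remaining error counts as fully wrong. A token-level or edit-distance measure would give partial credit and may suit some evaluations better.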
- It is fully capable of correcting sentences that appear in newspapers.
- It saves the media industry time when writing news, since no separate rechecking is required.
- Jupyter Notebook
- VS Code
- Python3
- NumPy
- pandas
- Kaggle
- Google Colab
- HTML5
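With the Python tooling listed above, the dataset layout described in the methodology (headline, details, date, and category in a UTF-8 CSV) might be written and read back as in the sketch below. It covers only the storage step, using the standard-library `csv` module rather than Beautiful Soup or pandas; the file name, column names, and sample records are assumptions made for illustration.

```python
import csv
import os
import tempfile

# Column names are assumptions based on the fields named in the methodology.
FIELDS = ["headline", "details", "date", "category"]


def write_articles(path, articles):
    """Write article records to a CSV file encoded as UTF-8."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(articles)


def read_articles(path):
    """Read article records back as a list of dictionaries."""
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))


# Hypothetical records; real rows would come from the scraper.
articles = [
    {
        "headline": "پاکستان نے میچ جیت لیا",
        "details": "قومی ٹیم نے آخری اوور میں فتح حاصل کی۔",
        "date": "2024-01-15",
        "category": "Sports",
    },
    {
        "headline": "نئی ٹیکنالوجی متعارف",
        "details": "کمپنی نے نیا موبائل فون پیش کیا۔",
        "date": "2024-01-16",
        "category": "Science & Technology",
    },
]

path = os.path.join(tempfile.mkdtemp(), "urdu_news.csv")
write_articles(path, articles)
loaded = read_articles(path)
```

Passing `encoding="utf-8"` explicitly on both write and read avoids depending on the platform default, which is a common cause of corrupted Urdu text on some systems.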