Using Deep Neural Networks and Data Mining to Convert Roman Urdu to Urdu
2025-06-28 16:36:32 - Adil Khan
Project Area of Specialization: Artificial Intelligence
Project Summary
Roman Urdu is now widely used for written communication in Pakistan, because many people find Urdu script difficult to read and type. Our goal is to make this easier. We will research methods for improving the accuracy of a mobile application that translates Roman Urdu sentences into Urdu-script sentences. To that end, we will apply dropout and enlarge the dataset. Dropout is a technique for reducing overfitting that works by randomly dropping some of the unit activations in a layer of a deep neural network during training. We will also develop an Android application that translates Roman Urdu sentences into Urdu-script sentences.
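To make the dropout idea concrete, here is a minimal sketch in plain Python. It is not the project's implementation: it assumes the common "inverted dropout" variant, in which surviving activations are rescaled during training so that nothing needs to change at inference time. The function name `dropout` and its parameters are illustrative.

```python
import random


def dropout(activations, drop_prob, training=True, rng=None):
    """Apply inverted dropout to a list of unit activations (illustrative sketch)."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    # At inference time (or with drop_prob == 0) the layer is left unchanged.
    if not training or drop_prob == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - drop_prob
    # Each unit is zeroed with probability drop_prob; survivors are scaled by
    # 1 / keep_prob so the expected activation stays the same.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


if __name__ == "__main__":
    layer_output = [0.2, 1.5, -0.7, 0.9]
    print(dropout(layer_output, drop_prob=0.5, rng=random.Random(0)))
    print(dropout(layer_output, drop_prob=0.5, training=False))
```

In a real translation model the same operation would normally come from the deep learning framework's built-in dropout layer rather than hand-written code.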
• The application would help people type Urdu script, saving them time and effort, and would make it easier for them to learn the script. It is therefore necessary that the application produce translations close to human quality.
Reading and typing Urdu script is inconvenient in day-to-day tasks, and a large majority of the population uses Roman Urdu to communicate. This motivated us to develop a tool that translates Roman Urdu to Urdu, to help people who are not comfortable reading or typing Urdu script.
Project Objectives
Following are the goals we aim to achieve by the end of the project:
• To get our research paper published in a good journal with an impact factor.
• To achieve a BLEU (Bilingual Evaluation Understudy) score of 70.
• To increase the dataset from 0.1 million to 3 million using GANs, web/data scraping, and/or crowdsourcing.
• To implement dropout and observe whether it improves translation quality.
• To develop an Android app that translates Roman Urdu sentences into Urdu-script sentences.
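Since the BLEU target is stated on a 0-100 scale, the following sketch shows how such a score can be computed for a single sentence. It is a simplified illustration, not the project's evaluation code: it assumes whitespace tokenization, a single reference, n-grams up to length 4, and no smoothing. BLEU is normally reported over a whole test corpus, usually with an established toolkit. The helper names `ngrams` and `bleu` are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU on a 0-100 scale (single reference, no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped matches: an n-gram is credited at most as often as it
        # appears in the reference.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled to 0-100.
    return 100.0 * bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "آپ کا کیا حال ہے"
    print(bleu("آپ کا کیا حال ہے", reference))
    print(bleu("آپ کا کیا حل ہے", reference))
```

Because no smoothing is applied, a short sentence with even one wrong word can score zero, which is one reason corpus-level scores are the usual basis for targets such as 70.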
Project Implementation Method
• Literature review (reading research papers and understanding the problems and research gaps)
• Self-learning:
a. CS-224n course for NLP using Deep Learning
b. CS-231n course for Deep learning
• Increase the dataset from 0.1 million to 1 million.
• Observe the effect of implementing dropout in the translation model.
• Target a BLEU score of 50-60.
• Collect 3 million data samples and train the model.
• Improve the BLEU score up to 70.
Benefits of the Project
• Good translation quality.
• Handles rare words.
• Handles unseen words.
• Scalable – to all ages and all topics.
• Runs in real time with minimum delay.
Technical Details of Final Deliverable
Generic in nature
• Scalable: Roman Urdu words that have several spellings must be translated to the same Urdu word, and the application must not be specific to a domain or age group. For example, both “jab” and “jub” must be translated as جب.
• Unseen rare words: The application should be able to translate words that it receives as input for the first time.
• Context awareness: For example, the Roman Urdu sentence “kya hal hai?” would commonly be translated as کیا حل ہے. According to the context, however, it must be translated as کیا حال ہے.
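The spelling-variant requirement can be illustrated with a small rule-based sketch. This is only a toy baseline under stated assumptions, not the neural approach the project proposes: it assumes that variants such as “jab” and “jub” differ only in their vowels, and it uses a hypothetical one-entry lexicon (`LEXICON`). The function names `skeleton` and `lookup` are illustrative.

```python
import re

# Hypothetical mini-lexicon keyed by a vowel-free "skeleton"; illustration only.
LEXICON = {"jb": "جب"}


def skeleton(word):
    """Reduce a Roman Urdu word to its first letter plus its remaining consonants."""
    word = word.lower()
    # Keep the first letter as-is and drop Latin vowels from the rest,
    # so that "jab" and "jub" both become "jb".
    return word[:1] + re.sub(r"[aeiou]", "", word[1:])


def lookup(word):
    """Return the Urdu word for a Roman Urdu spelling, or None if it is unknown."""
    return LEXICON.get(skeleton(word))


if __name__ == "__main__":
    for spelling in ("jab", "jub", "Jab"):
        print(spelling, "->", lookup(spelling))
```

A lookup like this returns nothing for unseen words and cannot use context, as the “kya hal hai?” example requires, which is why the deliverable relies on a learned translation model instead.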
Tools/Technology
From a learning perspective, we will learn many new tools and techniques used in natural language processing and deep learning. The most important ones include sequence-to-sequence (S2S) models, LSTMs, CNNs, RNNs, GANs, ResNets, LaTeX, GitHub, and Android.
Final Deliverable of the Project: Software System
Type of Industry: IT
Technologies: Artificial Intelligence (AI)
Sustainable Development Goals: Quality Education
Required Resources
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| ZOTAC GAMING GeForce RTX 2070 AMP Graphics Card, ZT-T20700D-10P | Equipment | 1 | 70000 | 70000 |
| ZOTAC GAMING GeForce RTX 2070 AMP Graphics Card, ZT-T20700D-10P | Miscellaneous | 1 | 8500 | 8500 |
| Total (in Rs) | | | | 78500 |