Urdu DB Pedia

2025-06-28 16:36:31 - Adil Khan

Project Title

Urdu DB Pedia

Project Area of Specialization

Artificial Intelligence

Project Summary

Following the success of the Web of documents (the World Wide Web), there has been great enthusiasm for creating a Web of Data by publishing data in a form that software programs can readily understand. In 2014, there were more than 1,000 publicly available datasets containing more than 900,000 documents. Among these datasets, DBpedia stands out as the central hub of Linked Open Data (LOD), because it provides a vast amount of information and most other datasets in the LOD cloud link to it.
    
This is the Urdu DBpedia project. It aims to extract structured knowledge from Wikipedia and the English DBpedia and make that knowledge freely available on the Web using Semantic Web and Linked Data technologies. The project will extract knowledge from the English-language edition of Wikipedia and then compare it with the Urdu DBpedia to find out which attributes are missing.
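The comparison step can be illustrated with a minimal Python sketch. It assumes that the English and Urdu records for the same entity have already been extracted and keyed by shared ontology property names. The function name `find_missing_attributes` and the sample records are hypothetical and are not part of the project's code.

```python
from typing import Dict


def find_missing_attributes(english: Dict[str, str], urdu: Dict[str, str]) -> Dict[str, str]:
    """Return properties that have a value in the English record but are
    absent (or empty) in the Urdu record of the same entity.

    Assumes both records are already keyed by the same ontology property names.
    """
    return {
        prop: value
        for prop, value in english.items()
        if value.strip() and not urdu.get(prop, "").strip()
    }


# Hypothetical records for one entity.
english_record = {"birthPlace": "Sialkot", "birthDate": "1877-11-09", "occupation": "Poet"}
urdu_record = {"birthPlace": "سیالکوٹ", "occupation": ""}

print(find_missing_attributes(english_record, urdu_record))
# → {'birthDate': '1877-11-09', 'occupation': 'Poet'}
```

The returned values are candidates for filling the gaps in the Urdu record. They would still need translation or transliteration before they are published.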

The Urdu DBpedia project comprises the following main areas:

1) Structured Data Extractors & Transformers, which extract entities, entity relationship types, and entity relationships from Wikipedia documents.

2) Deployment of Linked Open Data, which makes entity relationships available on the Web in Linked Open Data form.
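The following minimal Python sketch shows one way entity relationships could be published in Linked Open Data form. It serializes them as N-Triples, a standard line-based RDF syntax. The resource namespace `http://example.org/ur-dbpedia/resource/` is a placeholder assumption. The property names reuse the DBpedia ontology namespace. The helper names (`resource_iri`, `literal`, `to_ntriples`) are hypothetical.

```python
from typing import Iterable, Tuple

# Placeholder base IRI (assumption): a deployed Urdu DBpedia would use its own namespace.
RESOURCE_NS = "http://example.org/ur-dbpedia/resource/"
# Namespace of the DBpedia ontology, whose property names are reused here.
ONTOLOGY_NS = "http://dbpedia.org/ontology/"


def resource_iri(title: str) -> str:
    """Build a DBpedia-style resource IRI: article title with spaces replaced by underscores."""
    return "<" + RESOURCE_NS + title.strip().replace(" ", "_") + ">"


def literal(value: str, lang: str = "ur") -> str:
    """Build a language-tagged N-Triples literal, escaping reserved characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"@{lang}'


def to_ntriples(subject: str, facts: Iterable[Tuple[str, str, bool]]) -> str:
    """Serialize (property, value, is_entity) facts about one subject.

    is_entity=True links the value as another resource; otherwise it becomes a literal.
    """
    lines = []
    for prop, value, is_entity in facts:
        obj = resource_iri(value) if is_entity else literal(value)
        lines.append(f"{resource_iri(subject)} <{ONTOLOGY_NS}{prop}> {obj} .")
    return "\n".join(lines)


# Usage with hypothetical facts about one entity.
print(to_ntriples("Lahore", [("country", "Pakistan", True), ("nickname", "باغوں کا شہر", False)]))
```

Linking `Pakistan` as a resource rather than as a plain string is what lets other LOD datasets connect to the entity.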

Project Objectives

The main objective is to extract structured knowledge, identify the missing attributes, and synchronize the knowledge extracted from the English and Urdu Wikipedias. A further objective is to provide Wikipedia knowledge in a form compatible with tools for business intelligence and analytics, entity extraction, natural language processing, reasoning and inference, machine learning services, and artificial intelligence in general.

Project Implementation Method

This phase involves the following steps:

1) Training in web crawling.
2) Extraction of Wikipedia infoboxes by crawling on a cloud server.
3) Generation of classes for the Urdu DBpedia ontology.
4) Mapping to DBpedia classes and properties.
5) Experimentation and testing of the project.
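The class-generation and mapping steps can be sketched in Python as lookups from infobox templates and raw attribute names to DBpedia ontology classes and properties. The mapping tables, the function name `map_infobox`, and the fallback class `"Thing"` are illustrative assumptions. They are not the project's actual mappings, which would be maintained with wiki-based tools.

```python
from typing import Dict, Tuple

# Hypothetical mapping tables (assumptions, for illustration only).
# (infobox template, raw attribute name) -> DBpedia ontology property
PROPERTY_MAPPINGS: Dict[Tuple[str, str], str] = {
    ("Infobox person", "birth_date"): "birthDate",
    ("Infobox person", "birth_place"): "birthPlace",
    ("Infobox settlement", "population_total"): "populationTotal",
}
# infobox template -> DBpedia ontology class
CLASS_MAPPINGS: Dict[str, str] = {
    "Infobox person": "Person",
    "Infobox settlement": "Settlement",
}


def map_infobox(template: str, attributes: Dict[str, str]) -> Tuple[str, Dict[str, str], Dict[str, str]]:
    """Return (ontology class, mapped properties, unmapped raw attributes).

    Templates without a class mapping fall back to the generic class "Thing".
    """
    entity_class = CLASS_MAPPINGS.get(template.strip(), "Thing")
    mapped: Dict[str, str] = {}
    unmapped: Dict[str, str] = {}
    for name, value in attributes.items():
        prop = PROPERTY_MAPPINGS.get((template.strip(), name.strip()))
        if prop is not None:
            mapped[prop] = value
        else:
            unmapped[name] = value  # candidates for new mappings
    return entity_class, mapped, unmapped


entity_class, mapped, unmapped = map_infobox(
    "Infobox person", {"birth_place": "Sialkot", "birth_date": "1877-11-09", "spouse": "(unknown)"}
)
print(entity_class, mapped, unmapped)
```

Keeping the unmapped attributes separate, rather than discarding them, shows where the ontology or the mapping tables still need new entries. Urdu infobox templates would need their own table rows.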

Benefits of the Project

A significant percentage of the information stored in the English DBpedia is not available in the Urdu DBpedia, which makes the English DBpedia a valuable and exclusive source of information. The Urdu DBpedia does not contain all the information stored in other DBpedia editions such as the English DBpedia, but only a minimal subset of it.

Extracting knowledge from the English DBpedia and Wikipedia and making it available on the Web in structured form will benefit all Urdu DBpedia readers, whether they work in education, medicine, agriculture, industry, or infrastructure.

Technical Details of Final Deliverable

The project involves web crawling of the English and Urdu Wikipedias. Comparing the two will identify the missing attributes. The project uses Android Java, PHP, and an implementation of Apache Nutch.

The project will add semantics to the data (mapping) by relating data from Wikipedia articles to elements of the Urdu DBpedia ontology using wiki-based tools. The extraction process will read a Wikipedia page that contains an infobox and extract its attribute-value pairs.
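The infobox extraction step can be sketched in Python by working directly on a page's wikitext. The sketch locates the first `{{Infobox ...}}` template call, matches its braces, and splits it on top-level `|` characters. It is a simplified stand-in for the project's extractor. The function names are hypothetical, and the `marker` parameter is an assumption, because Urdu Wikipedia templates may use different names. The sketch does not handle `{{{...}}}` template parameters, HTML comments, or `<ref>` tags. A production system would more likely rely on an existing wikitext parser or the DBpedia Extraction Framework.

```python
from typing import Dict, List, Optional, Tuple


def _template_end(text: str, start: int) -> Optional[int]:
    """Given the index of an opening '{{', return the index just past its matching '}}'."""
    depth, i = 0, start
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    return None  # unbalanced braces


def _split_top_level(body: str) -> List[str]:
    """Split on '|' characters that are not nested inside {{...}} or [[...]]."""
    parts: List[str] = []
    current: List[str] = []
    depth, i = 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def extract_infobox(wikitext: str, marker: str = "{{Infobox") -> Tuple[str, Dict[str, str]]:
    """Return (template name, attribute-value pairs) of the first infobox, or ("", {})."""
    start = wikitext.find(marker)
    if start == -1:
        return "", {}
    end = _template_end(wikitext, start)
    if end is None:
        return "", {}
    parts = _split_top_level(wikitext[start + 2:end - 2])
    pairs: Dict[str, str] = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep and key.strip():  # skip positional (unnamed) parameters
            pairs[key.strip()] = value.strip()
    return parts[0].strip(), pairs


# Hypothetical wikitext sample.
sample = """{{Infobox person
| name = Muhammad Iqbal
| birth_date = {{birth date|1877|11|9}}
| birth_place = [[Sialkot]], [[Punjab]]
}}
'''Muhammad Iqbal''' was a poet and philosopher."""

print(extract_infobox(sample))
```

Nested templates and wiki links are kept as raw values. Cleaning them, for example turning `[[Sialkot]]` into a resource link, would be a separate step.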

Final Deliverable of the Project

Software System

Type of Industry

IT

Technologies

Artificial Intelligence (AI), Cloud Infrastructure, Others

Sustainable Development Goals

Quality Education; Industry, Innovation and Infrastructure

Required Resources
Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs)
Java Training | Miscellaneous | 1 | 5,000 | 5,000
Web Crawling Training: Using Python to Access Web Data | Miscellaneous | 1 | 5,000 | 5,000
Printing | Equipment | 7 | 200 | 1,400
Databricks Amazon Cloud Server | Equipment | 1 | 40,000 | 40,000
Total (in Rs) | | | | 51,400
