Urdu DB Pedia
2025-06-28 16:36:31 - Adil Khan
Project Area of Specialization: Artificial Intelligence

Project Summary
Following the success of the web of documents (the World Wide Web), there has been great enthusiasm for creating a Web of Data by publishing data in a form that software programs can easily understand.
In 2014, there were more than 1,000 publicly available datasets containing more than 900,000 documents. Among these datasets, DBpedia stands out as the central hub of Linked Open Data (LOD) because it provides a vast amount of information and most other datasets in the LOD cloud link to DBpedia.
Urdu DBpedia is a project to extract structured knowledge from Wikipedia and the English DBpedia and make it freely available on the Web using Semantic Web and Linked Data technologies.
The project will extract knowledge from the English-language edition of Wikipedia and then compare it with the Urdu DBpedia to identify which attributes are missing.
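The comparison step described above can be sketched as a simple set difference over the attributes extracted for the same entity in both editions. The attribute dicts below are hypothetical illustrations, not actual DBpedia data:

```python
# Sketch of the attribute-comparison step, assuming infobox data for the
# same entity has already been extracted from both language editions.

def missing_attributes(english_attrs: dict, urdu_attrs: dict) -> dict:
    """Return the attribute-value pairs present in the English data
    but absent from the Urdu data."""
    return {k: v for k, v in english_attrs.items() if k not in urdu_attrs}

# Hypothetical sample data for one entity in each edition.
english = {"birth_place": "Lahore", "occupation": "Poet", "death_date": "1938"}
urdu = {"birth_place": "لاہور"}

print(missing_attributes(english, urdu))
# {'occupation': 'Poet', 'death_date': '1938'}
```

In practice the keys would first need to be aligned across languages (see the mapping step below in the implementation method), since Urdu infobox attribute names rarely match their English counterparts literally.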
The Urdu DBpedia project comprises two main areas:
1) Structured Data Extractors & Transformers:
which extract entities, entity relationship types, and entity relationships from Wikipedia documents.
2) Deployment of Linked Open Data:
which makes entity relationships available on the Web as Linked Open Data.
Project Objectives
The main objective is to extract structured knowledge, identify the missing attributes, and synchronize the knowledge extracted from the English and Urdu editions of Wikipedia. A further objective is to provide Wikipedia knowledge in a form compatible with tools for business intelligence and analytics, entity extraction, natural language processing, reasoning and inference, machine learning services, and artificial intelligence in general.
Project Implementation Method
This phase involves:
1) Training in web crawling
2) Extraction of Wikipedia infoboxes by crawling on a cloud server
3) Generation of classes for the Urdu DBpedia ontology
4) Mapping to DBpedia classes and properties
5) Experimentation and testing of the project
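The mapping step above can be illustrated as a lookup table that translates raw Urdu infobox attribute names into DBpedia ontology property URIs. The mapping entries below are hypothetical examples, not taken from an actual DBpedia mappings file:

```python
# Illustrative sketch of mapping raw Urdu infobox attribute names to
# DBpedia ontology properties. The table below is a made-up example.

INFOBOX_TO_ONTOLOGY = {
    "پیدائش": "http://dbpedia.org/ontology/birthDate",
    "جائے پیدائش": "http://dbpedia.org/ontology/birthPlace",
    "پیشہ": "http://dbpedia.org/ontology/occupation",
}

def map_attributes(raw_attrs: dict) -> dict:
    """Translate raw infobox keys to ontology property URIs,
    dropping attributes that have no known mapping."""
    return {
        INFOBOX_TO_ONTOLOGY[k]: v
        for k, v in raw_attrs.items()
        if k in INFOBOX_TO_ONTOLOGY
    }

# Hypothetical input: one mapped key, one unknown key that gets dropped.
mapped = map_attributes({"پیشہ": "شاعر", "نامعلوم": "x"})
print(mapped)
```

In the real DBpedia project these mappings are maintained community-wide on a mappings wiki rather than hard-coded, but the translation logic is essentially this lookup.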
A significant percentage of the information stored in the English DBpedia is not available in the Urdu DBpedia, which makes the English DBpedia a valuable and exclusive source of information.
It should also be noted that the Urdu DBpedia does not contain all the information stored in other DBpedia editions such as the English DBpedia, but only a small subset.
Extracting knowledge from the English DBpedia and Wikipedia and making it available on the Web in structured form will benefit all Urdu DBpedia readers, whether they work in education, medicine, agriculture, industry, or infrastructure.
The project involves web crawling of the English and Urdu editions of Wikipedia; after comparing them, the missing attributes will be identified. It involves Android Java, PHP, and the implementation of Apache Nutch. The project will provide semantics to the data (mapping) by relating data from Wikipedia articles to elements of the Urdu DBpedia ontology using wiki-based tools. The extraction process will read a Wikipedia page that contains an infobox and extract its attribute-value pairs.
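The infobox-extraction step described above can be sketched as follows. A production extractor would use a full wikitext parser; this minimal version handles only simple `| key = value` lines, and the sample wikitext is a made-up illustration:

```python
import re

# Sketch of extracting attribute-value pairs from a Wikipedia infobox.
# Handles only simple "| key = value" lines, not nested templates.

SAMPLE_WIKITEXT = """
{{Infobox person
| name        = Allama Iqbal
| birth_place = Sialkot
| occupation  = Poet
}}
"""

def extract_infobox(wikitext: str) -> dict:
    """Return attribute-value pairs from '| key = value' infobox lines."""
    pairs = re.findall(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$", wikitext, re.MULTILINE)
    return dict(pairs)

print(extract_infobox(SAMPLE_WIKITEXT))
# {'name': 'Allama Iqbal', 'birth_place': 'Sialkot', 'occupation': 'Poet'}
```

Real infoboxes contain nested templates, wiki links, and references, which is why the DBpedia extraction framework relies on a proper parser rather than regular expressions; the regex here is only meant to convey the attribute-value structure being targeted.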
Final Deliverable of the Project: Software System
Type of Industry: IT
Technologies: Artificial Intelligence (AI), Cloud Infrastructure, Others
Sustainable Development Goals: Quality Education; Industry, Innovation and Infrastructure

Required Resources

| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| Java Training | Miscellaneous | 1 | 5000 | 5000 |
| Web Crawling Training: Using Python to Access Web Data | Miscellaneous | 1 | 5000 | 5000 |
| Printing | Equipment | 7 | 200 | 1400 |
| Databricks Amazon Cloud Server | Equipment | 1 | 40000 | 40000 |
| **Total (in Rs)** | | | | **51400** |