Topic Driven Concepts Extraction In Unstructured Text
2025-06-28 16:36:24 - Adil Khan
Project Area of Specialization: Artificial Intelligence
Project Summary
Research and development in Artificial Intelligence, and in natural language processing (NLP) in particular, is under way throughout the world. Researchers study text to understand its general meaning, find patterns within it, and develop deep-learning models based on those patterns. In that context, Topic Driven Concepts Extraction in Unstructured Text is an effort to apply and extend our understanding of widely used approaches, techniques, and models in NLP in order to extract concepts, in the form of key phrases representative of the topics of an unstructured text. To do this, we will use multiple approaches, including but not limited to statistical techniques and measures, machine-learning algorithms, and deep-learning models. Upon completion, the system is intended to extract accurate and precise concepts from large unstructured texts and to help fields that deal with such texts, including education, journalism, and the judiciary. A further aim of this effort is to contribute to the research and development in natural language processing being carried out worldwide.
Project Objectives
- To explore multiple natural language processing techniques and methods for extracting precise topic-driven concepts.
- To allow users to extract topics and relevant concepts at runtime.
- To create an API that can be integrated with other systems to extract concepts.
- To contribute to the field of natural language processing by researching existing methods and techniques and obtaining better, more precise results with them.
Analysis:
- Initial investigation of multiple textual datasets.
- Selecting a relevant dataset, i.e. SQuAD2.0.
- Investigating multiple key approaches and natural language processing techniques and methods.
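As an illustration of the dataset step above, the following is a minimal sketch of reading article titles and paragraph texts from a SQuAD2.0 JSON file. It relies on the published SQuAD2.0 layout (`data` → `paragraphs` → `context`); the helper names `load_squad` and `iter_contexts` are hypothetical, and treating the article title as a rough topic label is an assumption, not something the proposal specifies.

```python
import json


def load_squad(path):
    """Parse a SQuAD2.0 JSON file from disk (the path is supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def iter_contexts(squad):
    """Yield (article title, paragraph text) pairs from parsed SQuAD2.0 data."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            # Assumption: the article title serves as a coarse topic label.
            yield article["title"], paragraph["context"]
```

Each yielded paragraph can then be passed to whichever extraction method is under investigation.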
Design:
- Designing relevant artifacts
- Designing front-end user interface of the system.
Development:
- Applying the researched methods and NLP techniques on the dataset.
- Developing back-end APIs to process the text based on the methods.
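The proposal does not fix a particular extraction method, so the following is only a minimal sketch of one simple statistical approach: candidate phrases are taken as runs of words between stopwords or punctuation, and each phrase is ranked by the summed frequency of its words. The stopword list and the names `candidate_phrases` and `extract_concepts` are illustrative assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real system would use a fuller one.
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "in", "on", "to", "is", "are",
    "was", "were", "for", "with", "by", "as", "at", "it", "this", "that",
    "from", "be", "can", "which",
}

# Matches a word (optionally hyphenated) or a single punctuation character.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*|[^\sa-z0-9]")


def candidate_phrases(text):
    """Return maximal runs of consecutive non-stopword words as phrases."""
    phrases, current = [], []
    for token in TOKEN_RE.findall(text.lower()):
        # Stopwords and punctuation both act as phrase boundaries.
        if token in STOPWORDS or not token[0].isalnum():
            if current:
                phrases.append(" ".join(current))
            current = []
        else:
            current.append(token)
    if current:
        phrases.append(" ".join(current))
    return phrases


def extract_concepts(text, top_k=5):
    """Rank candidate phrases by the summed document frequency of their words."""
    phrases = candidate_phrases(text)
    word_freq = Counter(word for phrase in phrases for word in phrase.split())
    scores = {
        phrase: sum(word_freq[word] for word in phrase.split())
        for phrase in phrases
    }
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [phrase for phrase, _ in ranked[:top_k]]
```

This scoring favors longer phrases, since every additional word adds to the sum; the machine-learning and deep-learning approaches named in the summary would replace or refine this ranking step.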
Testing:
- Testing and comparing the result metrics and accuracy.
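The proposal does not name the metrics to be compared. Assuming exact-match precision, recall, and F1 between extracted phrases and a reference set are among them, the comparison could be sketched as follows; the function name `phrase_prf` is hypothetical.

```python
def phrase_prf(predicted, gold):
    """Exact-match precision, recall, and F1 between two collections of phrases."""
    # Normalize case and surrounding whitespace, and drop duplicates.
    pred = {phrase.strip().lower() for phrase in predicted}
    ref = {phrase.strip().lower() for phrase in gold}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Exact matching is strict: a prediction such as "language processing" earns no credit against the reference "natural language processing", so a partial-match variant may be worth comparing as well.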
Deployment:
- Deploying the system on a live server for public use.
Benefits for Users:
- Users can use the system to extract relevant material or information from the text they provide.
- Users save time by extracting the important information with the system instead of reading through the whole text.
Benefits for R&D:
- Researchers can use the system as a basis for further research on new methods and techniques.
Final Deliverable: A web application built on a Python and Flask tech stack.
Flask: For backend processing
Jinja: Templating engine for the front-end user interface.
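To make the back-end role of Flask concrete, here is a minimal sketch of a JSON endpoint that accepts text and returns extracted concepts. The route `/api/concepts`, the `text` and `concepts` field names, and the placeholder `extract_concepts` function are all assumptions for illustration; the proposal does not define the API.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def extract_concepts(text, top_k=5):
    """Placeholder extractor (hypothetical): returns the longest distinct words."""
    words = {word.strip(".,;:!?").lower() for word in text.split()}
    words.discard("")
    return sorted(words, key=lambda word: (-len(word), word))[:top_k]


@app.route("/api/concepts", methods=["POST"])
def concepts():
    """Accept JSON like {"text": "..."} and return {"concepts": [...]}."""
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}
    text = payload.get("text", "")
    if not isinstance(text, str) or not text.strip():
        return jsonify(error="Field 'text' is required."), 400
    return jsonify(concepts=extract_concepts(text))
```

Keeping the extraction logic in a plain function, separate from the route, would let the same code serve both this API and the Jinja-rendered pages.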
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| Server Hosting (10GB + .com domain) | Equipment | 1 | 7000 | 7000 |
| Documentation Printing | Miscellaneous | 1 | 3000 | 3000 |
| Internet | Equipment | 2 | 1500 | 3000 |
| Miscellaneous | Miscellaneous | 1 | 5000 | 5000 |
| RAM | Equipment | 2 | 3000 | 6000 |
| SSD | Equipment | 1 | 6000 | 6000 |
| Total (in Rs) | | | | 30000 |