News Archive and Retrieval
2025-06-28 16:34:16 - Adil Khan
Project Area of Specialization: Computer Science
Project Summary
The product is a web-based application called “NEWS ARCHIVE AND RETRIEVAL”. The purpose of the system is to save time for people seeking articles from the past by showing them news articles from different local and international newspapers on one platform. Beyond displaying news articles, the system adds further functionality on top of them.
In this product, the focus is on displaying news articles from 1947 onwards. Readers can search articles by category, year, or author name, or run a keyword search for any incident. When the user clicks on an article, related news articles from other newspapers are shown. The system also generates a word cloud from the articles matching the search query, which the user can use to modify the query.
The system stores all records related to news articles in a database. The system is implemented in Python.
The main objective of this project is to provide a single platform where interested people can find news articles from the past, eliminating the need to scroll through several archives or web pages in search of old news articles. Users can type a few words and all related articles will be displayed, which makes searching for past news very easy.
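The word cloud step can be illustrated with a minimal sketch. It assumes the cloud is driven by plain term frequencies over the text of the articles returned for a query; the source does not say how terms are weighted. The function name `word_cloud_weights`, the small `STOP_WORDS` set, and the sample headlines are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real deployment would need a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was", "for", "by", "with", "at"}

def word_cloud_weights(articles, top_n=20):
    """Return the top_n (term, count) pairs across the given article texts."""
    counts = Counter()
    for text in articles:
        # Lowercase the text and keep alphabetic tokens only.
        tokens = re.findall(r"[a-z]+", text.lower())
        # Skip stop words and very short tokens.
        counts.update(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    return counts.most_common(top_n)

# Hypothetical search results for the query "flood".
results = [
    "Flood relief efforts continue in the province",
    "Province announces flood relief fund",
    "Relief camps set up for flood victims",
]
weights = word_cloud_weights(results, top_n=3)
```

The counts would then set the font size of each term in the rendered cloud, and clicking a term could append it to the query.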
The project will provide the following services.
- This is a web application that can be used by any user.
- There will be two types of users: registered users and unregistered users.
- Registered users will have some privileges over unregistered users.
- This software will allow both registered and unregistered users to search for articles.
- Users can search articles by entering a query, by category, by year, and by author.
- The crawled articles will be stored in the database and retrieved according to the user's search.
- The articles will be from the past: from the year 1947 onwards.
- Registered users will be able to write comments, modify their query, and view their history in addition to searching articles.
- Registered users can edit and delete their account.
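The search options listed above could be combined into a single database filter. The sketch below builds a MongoDB-style filter document from the optional criteria. The field names `body`, `category`, `year`, and `author` and the function `build_article_filter` are assumptions for illustration, since the source does not describe the schema.

```python
import re

FIRST_YEAR = 1947  # The archive covers articles from 1947 onwards.

def build_article_filter(query=None, category=None, year=None, author=None):
    """Build a MongoDB-style filter dict from optional search criteria."""
    conditions = {}
    if query:
        # Case-insensitive match on the article text; re.escape keeps user input literal.
        conditions["body"] = {"$regex": re.escape(query), "$options": "i"}
    if category:
        conditions["category"] = category
    if year is not None:
        if year < FIRST_YEAR:
            raise ValueError("the archive starts at 1947")
        conditions["year"] = year
    if author:
        conditions["author"] = author
    return conditions

# Example: keyword search restricted to one year.
example = build_article_filter(query="flood", year=1971)
```

With PyMongo, a dict like this could be passed to `collection.find(...)`. A regex scan is the simplest option; a text index would likely be a better fit for a large archive.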
Tools and Techniques
The following tools and techniques are used for developing this project:
- Spyder IDE will be used for development.
- ArgoUML for creating UML diagrams.
- Web crawling will be used to crawl the articles from the internet.
- Scrapy (web crawling framework).
- Algorithms to apply Information Retrieval techniques.
- MS Word.
- MongoDB database will be used.
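The source does not name the Information Retrieval techniques it uses. As one common illustration, the sketch below ranks related articles with TF-IDF weights and cosine similarity, using only the standard library. All function names and the sample headlines are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Weight = relative term frequency * log inverse document frequency.
        vectors.append({t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def related(index, docs, top_n=2):
    """Return indices of the documents most similar to docs[index]."""
    vectors = tfidf_vectors(docs)
    scores = [(cosine(vectors[index], v), i) for i, v in enumerate(vectors) if i != index]
    return [i for score, i in sorted(scores, reverse=True)[:top_n] if score > 0]

docs = [
    "Flood relief fund announced for the province",
    "Province sets up flood relief camps",
    "Cricket team wins the test series",
    "Cricket board announces new test captain",
]
```

Calling `related(0, docs, top_n=1)` selects the other flood article. In practice the vectors would be precomputed at crawl time instead of being rebuilt on every click.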
Languages
Python will be used as the programming language.
HTML, CSS, JavaScript, and Bootstrap will be used for front-end development.
Benefits of the Project
The project will be helpful for students, researchers, social activists, lawyers, and others whose work requires information (articles) from the past. At present, no platform provides news articles from 1947 onwards free of charge. Google Archive was one such source, but it no longer exists, and all the other platforms require a subscription to access the information. This project will reduce the searching overhead and will be available to everyone at no cost. Users will be able to access articles from 1947 onwards on a single platform.
Technical Details of Final Deliverable
"News Archive and Retrieval" will be a web-based application that runs on Android smartphones, laptops, and tablets that support web browsers. The devices must be connected to the internet.
The final app will be a software system.
Final Deliverable of the Project: Software System
Core Industry: IT
Other Industries: Media
Core Technology: Big Data
Other Technologies: Artificial Intelligence (AI)
Sustainable Development Goals: Quality Education
Required Resources

| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| GPU | Equipment | 1 | 20500 | 20500 |
| RAM | Equipment | 1 | 15000 | 15000 |
| Printing | Miscellaneous | 8 | 300 | 2400 |
| Buying Subscriptions to access data | Miscellaneous | 5 | 1500 | 7500 |
| Total (in Rs) | | | | 45400 |