NEWSER
2025-06-28 16:34:16 - Adil Khan
Project Area of Specialization: Artificial Intelligence
Project Summary
NEWSER (NEWS crawlER) is a web-based system for people who like to read newspapers online. It gives them a mix of news from many authentic newspapers in one place and lets them search it by date, author name, or city. It also lets them search news by keyword, a feature that online newspapers rarely offer. It will generate crime rate reports for cities based on the news from those cities. The system could later be extended with user accounts that track reader activity, such as which articles a user reads most often, so that only articles matching that user's interests are shown after login.
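The proposal does not say how a crime rate report is computed from news. One possible reading, sketched below under stated assumptions, is a news-based proxy: the share of each city's articles that mention a crime keyword. The keyword list, the `city` and `text` field names, and the function names are all hypothetical, and this measures crime coverage rather than an official crime statistic.

```python
import re
from collections import Counter

# Hypothetical keyword list: the proposal does not say how crime news is detected.
CRIME_KEYWORDS = {"robbery", "murder", "theft", "kidnapping", "assault", "snatching"}


def is_crime_article(text):
    """Flag an article as crime news if any crime keyword appears in it."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return not CRIME_KEYWORDS.isdisjoint(words)


def crime_report(articles):
    """Return, per city, the fraction of that city's articles flagged as crime news.

    Assumes each article is a dict with hypothetical "city" and "text" keys.
    """
    totals = Counter()
    crimes = Counter()
    for article in articles:
        city = article["city"]
        totals[city] += 1
        if is_crime_article(article["text"]):
            crimes[city] += 1
    return {city: crimes[city] / totals[city] for city in totals}
```

A keyword match is crude (it flags "theft" in an article about a theft law, for example); the text-classification step mentioned under the implementation method would be a natural replacement for `is_crime_article`.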
Project Objectives
The main objective is to build a web-based system that saves news readers time in several ways: by offering filtered searches (by city, date, author name, or keyword) and by showing news related to the article they are currently reading. It also aims to provide a statistical analysis of crime rates in selected cities.
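Since the articles are to be kept in MongoDB, the filtered searches could be expressed as a single query document built from whichever fields the reader fills in. This is a minimal sketch, assuming hypothetical field names `city`, `authors` (a list of names), `publish_date` (stored as a date), and `text`; the proposal does not define a schema.

```python
import re
from datetime import date, datetime, timedelta
from typing import Optional


def build_filter(
    city: Optional[str] = None,
    day: Optional[date] = None,
    author: Optional[str] = None,
    keyword: Optional[str] = None,
) -> dict:
    """Build a MongoDB query document from whichever search fields were given."""
    query = {}
    if city:
        # Hypothetical field: assumes each article was tagged with a city when stored.
        query["city"] = city
    if author:
        # Assumes "authors" holds a list of names; MongoDB matches an array
        # field against a scalar if any element equals it.
        query["authors"] = author
    if day:
        # Match every article published on that calendar day.
        start = datetime(day.year, day.month, day.day)
        query["publish_date"] = {"$gte": start, "$lt": start + timedelta(days=1)}
    if keyword:
        # Case-insensitive substring match; re.escape stops user input from
        # being treated as a regular expression.
        query["text"] = {"$regex": re.escape(keyword), "$options": "i"}
    return query
```

With PyMongo the result can be passed straight to `collection.find(...)`, for example `collection.find(build_filter(city="Lahore", keyword="flood"))`. A `$regex` scan does not use a text index, so a larger archive would likely need the ranked keyword search described for the pull model instead.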
Project Implementation Method
Technologies used:
- Python3
- Flask Framework
- Natural Language Processing (NLP) techniques such as TF-IDF for the pull model (keyword search)
- MongoDB database
- Newspaper3k API
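To show how TF-IDF can drive the pull model, the sketch below ranks stored articles by the summed TF-IDF weight of the query terms each one contains. It is written in plain Python so that it runs without extra packages; `keyword_search`, `tokenize`, and `top_k` are illustrative names, and the smoothed IDF term is the same formula scikit-learn's `TfidfVectorizer` uses by default. A production system would more likely use that library or a search index.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_search(query, documents, top_k=5):
    """Return indices of documents ranked by the summed TF-IDF weight of the query terms."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(query))
    scored = []
    for index, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if counts[term]:
                tf = counts[term] / len(tokens)
                # Smoothed IDF, so a term found in every document still scores above zero.
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1
                score += tf * idf
        if score > 0:
            scored.append((score, index))
    # Highest score first; the stable sort keeps ties in document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [index for _, index in scored[:top_k]]
```

For example, searching "flood" over the three headlines "Flood hits Lahore", "Cricket team wins in Karachi", and "Flood relief reaches Lahore and Multan flood victims" returns the first and third articles, with the shorter headline ranked first because "flood" makes up a larger share of its words.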
First, news articles will be scraped from different news websites using the Newspaper3k library for Python. Each article's title, text, image, author name, video link (if any), and publication date will be stored in a CSV file. Different NLP techniques will then be applied to these articles for information retrieval and text mining, and machine learning will be used to implement the “related posts” feature. TF-IDF is expected to be used for text classification and text similarity, unless a better and computationally cheaper model is found.
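The storage step can be sketched as mapping each parsed article onto one CSV row. In real use each object would be a Newspaper3k `Article` after `article.download()` and `article.parse()`, whose `title`, `text`, `top_image`, `authors`, `movies`, and `publish_date` attributes hold the fields listed above. The column names and the `"; "` separator for list fields are assumptions made for illustration.

```python
import csv
import io

# Column order for the CSV file; "image" and "video" hold URLs.
FIELDS = ["title", "text", "image", "authors", "video", "publish_date"]


def article_to_row(article):
    """Map a downloaded and parsed newspaper.Article onto one CSV row."""
    return {
        "title": article.title,
        "text": article.text,
        "image": article.top_image,
        # Newspaper3k returns authors and videos as lists; join them into one cell.
        "authors": "; ".join(article.authors),
        "video": "; ".join(article.movies),  # empty when the article has no video
        # publish_date is a datetime, or None when no date could be extracted.
        "publish_date": article.publish_date.isoformat() if article.publish_date else "",
    }


def articles_to_csv(articles):
    """Return CSV text: a header row followed by one row per article."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for article in articles:
        writer.writerow(article_to_row(article))
    return buffer.getvalue()
```

To save the result, open the file with `newline=""` (for example `open("articles.csv", "w", newline="", encoding="utf-8")`) so the line endings written by the `csv` module are kept unchanged.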
People who read newspapers online, especially elderly readers, would want a platform where they can read a story and its related coverage from different newspapers in one place. If they want news by a particular author, city, or date, they will find it all in one place. If they want to search the archive by keyword (the pull model) instead of browsing page after page, the system will help them. They can also learn about crime rates in a particular city through the same web system. Having all of these services on one screen will save them a great deal of time.
This application will store and analyze news articles from different online newspapers. It will show news filtered by author name, date, or city, and will also support keyword search (the pull model). While a user is reading an article, it will find related articles from other newspapers (if available) and show them in a separate section. Crime rates for selected cities will also be shown to users.
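The related-articles section can be sketched with TF-IDF vectors and cosine similarity: each article becomes a sparse map from term to weight, and the articles whose vectors point in the most similar direction to the current one are shown. The function names and the `min_score` cut-off are hypothetical choices for illustration, not part of the proposal.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty articles
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_articles(current_index, documents, top_k=3, min_score=0.1):
    """Indices of the articles most similar to documents[current_index].

    min_score is a hypothetical threshold that hides weak matches.
    """
    vectors = tfidf_vectors(documents)
    current = vectors[current_index]
    scored = [
        (cosine(current, vec), i)
        for i, vec in enumerate(vectors)
        if i != current_index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for score, i in scored[:top_k] if score >= min_score]
```

This sketch keeps every word, so common words such as "in" can create weak false matches; removing stop words before computing the vectors would likely improve the results.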
Final Deliverable of the Project: Software System
Type of Industry: IT
Technologies: Artificial Intelligence (AI), Others
Sustainable Development Goals: Industry, Innovation and Infrastructure
Required Resources

| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| GPU | Equipment | 1 | 70000 | 70000 |
| Printing and others | Miscellaneous | 5 | 1000 | 5000 |
| Total | | | | 75000 |