Project Title

Big Data Framework for Efficient Link Prediction

Project Area of Specialization

Artificial Intelligence

Project Summary

Apache Hadoop and Apache Spark have gained prominence in the last few years as scalable big data processing frameworks. Although Apache Spark is more efficient than Apache Hadoop thanks to its in-memory computation, operations such as Broadcast and Shuffle prevent many machine learning and evolutionary algorithms from achieving significant speedup. Broadcasts and Actions are the only mechanisms of communication between partitions; this communication improves diversity and keeps partitions from getting stuck in local optima, but it also incurs network overhead and therefore degrades performance. We aim to develop a library that works on top of Apache Spark and manages the trade-off between network communication and performance gain more effectively. To verify the proposed library operations, we would compare the performance of ML and evolutionary algorithms on the standard Apache Spark framework with and without them, using standard benchmarks for experimentation.
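The trade-off described above can be illustrated with a small sketch. The following is a plain-Scala simulation, not Spark code and not the proposed library: each inner Vector stands in for one partition, and a hypothetical `migrationInterval` parameter controls how often partitions exchange their best individual (the step that on Spark would require a broadcast or an action). The OneMax fitness function, the mutation scheme, and all names are assumptions chosen purely for illustration.

```scala
import scala.util.Random

// Illustrative simulation only: "islands" stand in for Spark partitions.
object MigrationTradeoffSketch {
  type Individual = Vector[Int]

  // Toy fitness (OneMax): number of 1-bits; higher is better.
  def fitness(ind: Individual): Int = ind.sum

  // One local generation: flip one random bit per individual and
  // keep the child only if it is at least as fit as its parent.
  def evolve(pop: Vector[Individual], rng: Random): Vector[Individual] =
    pop.map { parent =>
      val i = rng.nextInt(parent.length)
      val child = parent.updated(i, 1 - parent(i))
      if (fitness(child) >= fitness(parent)) child else parent
    }

  // Evolves all islands for `generations` steps. Every `migrationInterval`
  // generations the globally best individual replaces the worst one on
  // each island; this is the simulated communication round.
  // Returns (best fitness found, number of communication rounds).
  def run(numIslands: Int, popSize: Int, genomeLen: Int,
          generations: Int, migrationInterval: Int, seed: Long): (Int, Int) = {
    val rng = new Random(seed)
    var islands: Vector[Vector[Individual]] =
      Vector.fill(numIslands, popSize)(Vector.fill(genomeLen)(rng.nextInt(2)))
    var rounds = 0
    for (g <- 1 to generations) {
      islands = islands.map(pop => evolve(pop, rng))
      if (g % migrationInterval == 0) {
        val best = islands.flatten.maxBy(ind => fitness(ind))
        islands = islands.map { pop =>
          val worstIdx = pop.indices.minBy(i => fitness(pop(i)))
          pop.updated(worstIdx, best)
        }
        rounds += 1
      }
    }
    (islands.flatten.map(ind => fitness(ind)).max, rounds)
  }
}
```

Calling `MigrationTradeoffSketch.run(4, 10, 32, 100, 1, 42L)` performs 100 exchange rounds, while a `migrationInterval` of 10 performs only 10. Comparing the returned best fitness across intervals gives a rough feel for how much solution quality is traded for reduced communication.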

Project Objectives

The project aims to achieve speedup by reducing network communication, and so better meet the demand for improved performance on Big Data platforms. The library would make algorithms more suitable for execution on Big Data frameworks, delivering acceptable performance while maintaining accuracy, and would thereby encourage the scalable use of ML algorithms.

Project Implementation Method

Implementation using

Apache Spark

Scala

Apache Maven

Apache HDFS

Scala Eclipse IDE

Benefits of the Project

Technical Details of Final Deliverable

Library in Scala/Spark that would be runnable on Apache Spark clusters

Library Manual

Library Documentation

Final Deliverable of the Project

Software System

Type of Industry

Technologies

Artificial Intelligence (AI), Big Data

Sustainable Development Goals

Decent Work and Economic Growth, Partnerships to achieve the Goal

Required Resources

Item Name	Type	No. of Units	Per Unit Cost (in Rs)	Total (in Rs)
Databricks on Amazon AWS server (usage price based on BTU)	Equipment	1	58000	58000
			Total (in Rs)	58000
