Adil Khan 11 months ago
AdiKhanOfficial #FYP Ideas

Big Data framework for efficient link prediction


Project Title

Big Data framework for efficient link prediction

Project Area of Specialization

Artificial Intelligence

Project Summary

Apache Hadoop and Apache Spark have gained popularity over the last few years as scalable big data processing frameworks. Apache Spark is more efficient than Apache Hadoop because of its in-memory computation, but operations such as Broadcast and Shuffle prevent many machine learning and evolutionary algorithms from achieving a significant speedup. Broadcasts and Actions are the only mechanisms for communication between partitions. This communication improves diversity and keeps partitions from getting stuck in local optima, but it also adds network overhead and therefore degrades performance. We aim to develop a library that works on top of Apache Spark and handles the trade-off between network communication and performance gain more effectively. To verify the proposed library operations, we will compare the performance of ML and evolutionary algorithms on the standard Apache Spark framework with and without them, using standard benchmarks for the experiments.
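The trade-off described in the summary can be illustrated with a small simulation. The sketch below is not the proposed library and does not use Spark at all: it models partitions as plain Scala collections ("islands") that evolve locally and only exchange their best candidate every syncEvery generations, which stands in for a collect followed by a broadcast. The object name IslandSketch, the sphere objective, the mutation step of 0.1 and all parameter names are assumptions made for illustration only.

```scala
import scala.util.Random

// Hypothetical, Spark-free sketch of partitions that communicate only occasionally.
object IslandSketch {
  // Objective to minimise; the sphere function is a stand-in benchmark (assumption).
  def fitness(x: Vector[Double]): Double = x.map(v => v * v).sum

  // One local generation: mutate each candidate and keep the child only if it is better.
  def evolve(pop: Vector[Vector[Double]], rng: Random): Vector[Vector[Double]] =
    pop.map { parent =>
      val child = parent.map(v => v + rng.nextGaussian() * 0.1)
      if (fitness(child) < fitness(parent)) child else parent
    }

  // Returns (best fitness found, number of synchronisation rounds).
  // syncEvery must be at least 1.
  def run(partitions: Int, popSize: Int, dims: Int,
          generations: Int, syncEvery: Int, seed: Long): (Double, Int) = {
    val rng = new Random(seed)
    var islands: Vector[Vector[Vector[Double]]] =
      Vector.fill(partitions, popSize)(Vector.fill(dims)(rng.nextDouble() * 10 - 5))
    var syncs = 0
    for (g <- 1 to generations) {
      // Local work on each island: no communication between partitions.
      islands = islands.map(pop => evolve(pop, rng))
      if (g % syncEvery == 0) {
        // "Collect" the global best, then "broadcast" it to every island,
        // where it replaces that island's worst candidate.
        val best = islands.flatten.minBy(x => fitness(x))
        islands = islands.map { pop =>
          val worstIdx = pop.indices.maxBy(i => fitness(pop(i)))
          pop.updated(worstIdx, best)
        }
        syncs += 1
      }
    }
    (islands.flatten.map(x => fitness(x)).min, syncs)
  }
}
```

Calling IslandSketch.run(4, 5, 3, 20, 5, 42L) synchronises four times over 20 generations, while syncEvery = 20 synchronises once. Comparing the returned fitness against the number of synchronisation rounds is one simple way to see how communication frequency and solution quality pull against each other.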

Project Objectives

The objective is to achieve a speedup while reducing network communication, so that the demand for better performance on Big Data platforms can be met. The library would make Big Data frameworks better suited to running these algorithms, with acceptable performance and no loss of accuracy, and would thereby encourage the use of ML algorithms at scale.

Project Implementation Method

Implementation using:

Apache Spark

Scala

Apache Maven

Apache HDFS

Scala Eclipse IDE

Benefits of the Project

The project delivers a speedup while reducing network communication, which helps meet the demand for better performance on Big Data platforms. With the library, algorithms can run on Big Data frameworks with acceptable performance and no loss of accuracy, which encourages the use of ML algorithms at scale.

Technical Details of Final Deliverable

Library in Scala/Spark that would be runnable on Apache Spark clusters

Library Manual

Library Documentation

Final Deliverable of the Project

Software System

Type of Industry

IT

Technologies

Artificial Intelligence (AI), Big Data

Sustainable Development Goals

Decent Work and Economic Growth, Partnerships to achieve the Goal

Required Resources

Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs)
DataBricks-Amazon AWS Server Price Usage Price based on BTU | Equipment | 1 | 58000 | 58000
Total (in Rs) | | | | 58000