Bigdata framework for efficient link prediction
Apache Hadoop and Apache Spark have gained prominence in the last few years as scalable big data processing frameworks. Although Apache Spark is more efficient than Apache Hadoop thanks to its in-memory computation, operations such as broadcast and shuffle keep many machine learning and evolutionary algorithms from achieving a significant speedup. Broadcasts and actions are the only mechanisms for communication between partitions; this communication improves diversity and keeps partitions from getting stuck in local optima. However, communication between partitions incurs network overhead and therefore degrades performance. We aim to develop a library that works on top of Apache Spark and handles the trade-off between network communication and performance gain more effectively. To verify the proposed library operations, we would compare the performance of ML and evolutionary algorithms on the standard Apache Spark framework with and without them, using standard benchmarks for experimentation.
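The trade-off described above can be illustrated with a minimal sketch in plain Scala. It is not the proposed library and does not use Spark: each inner `Vector` stands in for one partition, and a periodic exchange of the globally best individual stands in for a collect-then-broadcast round. All names (`IslandModelSketch`, `run`, `migrationInterval`), the sphere benchmark, and the mutation scheme are assumptions chosen for illustration only.

```scala
import scala.util.Random

object IslandModelSketch {
  type Individual = Vector[Double]

  // Sphere function: a common minimisation benchmark, optimum 0 at the origin.
  def fitness(x: Individual): Double = x.map(v => v * v).sum

  // One local generation on a single simulated partition:
  // mutate every individual and keep whichever of parent and child is better.
  def evolve(pop: Vector[Individual], rng: Random): Vector[Individual] =
    pop.map { parent =>
      val child = parent.map(_ + rng.nextGaussian() * 0.1)
      if (fitness(child) < fitness(parent)) child else parent
    }

  // Runs the simulation and returns (best fitness found, communication rounds used).
  // A smaller migrationInterval means more communication between partitions.
  def run(partitions: Int, popSize: Int, dim: Int, generations: Int,
          migrationInterval: Int, seed: Long): (Double, Int) = {
    // A single shared generator keeps the sketch deterministic; on a real
    // cluster each partition would need its own generator.
    val rng = new Random(seed)
    var islands: Vector[Vector[Individual]] =
      Vector.fill(partitions, popSize)(Vector.fill(dim)(rng.nextDouble() * 10 - 5))
    var rounds = 0

    for (g <- 1 to generations) {
      // Local work: no data crosses partition boundaries here.
      islands = islands.map(evolve(_, rng))

      if (g % migrationInterval == 0) {
        // Stand-in for a collect followed by a broadcast: find the global
        // best and replace the worst individual on every partition with it.
        val globalBest = islands.flatten.minBy(fitness)
        islands = islands.map { pop =>
          val worstIdx = pop.indices.maxBy(i => fitness(pop(i)))
          pop.updated(worstIdx, globalBest)
        }
        rounds += 1
      }
    }
    (islands.flatten.map(fitness).min, rounds)
  }

  def main(args: Array[String]): Unit = {
    // Same budget of generations, different communication frequency.
    for (interval <- Seq(10, 50)) {
      val (best, rounds) = run(4, 10, 5, 100, interval, seed = 42L)
      println(s"interval=$interval rounds=$rounds bestFitness=$best")
    }
  }
}
```

In this sketch the number of communication rounds is simply `generations / migrationInterval`, so the interval is the knob that trades network cost against how quickly good solutions spread between partitions. On a real Spark cluster each round would additionally pay serialization and network latency, which the simulation does not model.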
Speedup will be achieved by reducing network communication, which helps meet the demand for better performance on Big Data platforms. The library would make Big Data frameworks better suited to running these algorithms with acceptable performance while maintaining accuracy, thereby encouraging the use of ML algorithms at scale.
Implementation using
Apache Spark
Scala
Apache Maven
Apache HDFS
Scala Eclipse IDE
Library in Scala/Spark that would be runnable on Apache Spark clusters
Library Manual
Library Documentation
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| Databricks on Amazon AWS server (usage price based on BTU) | Equipment | 1 | 58000 | 58000 |
| Total (in Rs) | | | | 58000 |