Classification of Network Data using Machine Learning
The ability to process, analyze, and evaluate network data and to identify anomalous patterns in it responds to growing demands in many networking domains, such as corporate and academic networks. Developing a scalable, fault-tolerant, and resilient monitoring system that can handle data at massive scale is a nontrivial challenge. We present a novel framework for network traffic anomaly detection using machine learning algorithms.
This project deals with anomaly detection in network traffic data using artificial intelligence and machine learning techniques.
However, the result will not be a run-time (real-time) system. It will be an AI model that can distinguish a normal packet from an anomalous one. The model could be deployed at run time, but that would require implementing additional run-time specifications, which are NOT part of this project.
1. Investigate Fraud and Anomalies as They Occur:
Anomaly detection, alone or coupled with prediction, can be an effective means of catching fraud and discovering unusual activity in large and complex networks. It is crucial for banking security, medicine, marketing, the natural sciences, and manufacturing, all of which depend on smooth and secure operations.
2. Cheaper & Faster:
Anomaly detection has the potential to add significant business value. Big data has made it feasible, integrations with existing delivery mechanisms and advances in delivery models have made it easier to adopt, and advances in machine learning and deep learning have made it cheaper. It also lets decision makers manage by exception and enables e-commerce businesses to respond faster than ever before.
3. Providing Additional Security:
The main objective of this project is to build a complete simulated environment in which threats can be detected, identified, and blocked. It will be based entirely on AI: we will use real network data and its features to train the models, and then build a pipeline for reinforcement learning.
4. Pipeline for Continuous Learning:
A pipeline for continuous learning is essential because the world, and the patterns in data, change rapidly. Continuous learning will therefore be put in place so that the model keeps updating as new real-time data arrives.
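As a rough illustration of continuous learning, a model's notion of "normal" can be updated incrementally as new traffic arrives instead of being retrained from scratch. The sketch below is an assumption, not the project's pipeline: the class `OnlineBaseline`, the single numeric feature, and the z-score threshold of 3.0 are all hypothetical. It uses Welford's online algorithm to keep a running mean and variance and flags values far from them.

```python
import math


class OnlineBaseline:
    """Hypothetical continuously updated baseline for one numeric traffic feature.

    Welford's online algorithm updates the mean and variance one observation
    at a time, so past traffic never has to be stored or re-read.
    """

    def __init__(self, threshold: float = 3.0) -> None:
        self.n = 0                  # observations seen so far
        self.mean = 0.0             # running mean
        self.m2 = 0.0               # running sum of squared deviations
        self.threshold = threshold  # assumed z-score cut-off for "anomalous"

    def update(self, x: float) -> None:
        """Fold one new observation into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        """Sample standard deviation (0.0 until at least two values are seen)."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_anomaly(self, x: float) -> bool:
        """True if x lies more than `threshold` standard deviations from the mean."""
        s = self.std()
        return s > 0.0 and abs(x - self.mean) / s > self.threshold


if __name__ == "__main__":
    baseline = OnlineBaseline()
    for value in [100, 102, 98, 101, 99, 103, 97]:  # hypothetical normal values
        baseline.update(value)
    print(baseline.is_anomaly(100))  # False: typical value
    print(baseline.is_anomaly(500))  # True: far outside the learned range
```

In a continuous pipeline, `update` would typically be called only on traffic judged normal, so the baseline drifts with legitimate changes in the network rather than absorbing attacks.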
First, we will obtain the data and gather all requirements in the best way possible before starting the software part. We will begin with exploratory data analysis and build a data pipeline that ensures the data is well cleaned and that all values conform to their data types and to the relevant standards. After the analysis, we will apply SMOTE (Synthetic Minority Over-sampling Technique) if the data turns out to be imbalanced. We will then build our AI model, trying both classification and clustering techniques and, if required, semi-supervised machine learning. We will try different classification algorithms such as Neural Networks, Random Forest, and Gradient Boosting, and can then use a Voting technique to combine them and see which algorithm performs best.
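The idea behind SMOTE can be shown in a few lines: new minority-class samples are created by interpolating between an existing minority sample and one of its nearest minority-class neighbours. This is a minimal sketch for illustration, assuming purely numeric feature vectors; in practice a library implementation such as imbalanced-learn's `SMOTE` would normally be used. The function name `smote` and the sample points are hypothetical.

```python
import random
from typing import List

Vector = List[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority-class samples, SMOTE style.

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest neighbours from the same class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # indices of the k nearest minority neighbours of `base` (excluding itself)
        others = [j for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda j: _sq_dist(base, minority[j]))[:k]
        partner = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical two-feature attack records (the minority class)
    attacks = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [0.9, 1.9]]
    for point in smote(attacks, n_new=3, k=2):
        print(point)
```

Oversampling should be applied only to the training data (or training folds), never to the test set, otherwise synthetic points leak into the evaluation.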
The concept of anomaly detection is of great significance and is a growing field of cyber security. Because malware in network traffic changes dynamically, traditional tools and techniques are failing to protect networks from attack penetration, and more and more organizations have become vulnerable to Internet attacks and intrusions. Organizations big and small spend money on security and buy devices with pre-built procedures and signatures (anti-images) to detect anomalies, but if a zero-day attack arrives they are still vulnerable. This is where AI comes in. It provides an additional security element that does not simply match the data against known signatures; rather, it looks for outlier patterns and then makes the call about each packet.
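The difference between signature matching and outlier detection can be sketched with a toy example. This is only an illustration under stated assumptions: the signature list, the request sizes, and the 3-standard-deviation threshold are hypothetical, and a real model would use many features rather than one. A request that matches no stored signature slips past the first check but is flagged by the second because it is statistically unusual.

```python
from statistics import mean, stdev
from typing import List, Set


def signature_match(payload: str, signatures: Set[str]) -> bool:
    """Signature-based check: flag only payloads containing a known pattern."""
    return any(sig in payload for sig in signatures)


def outlier_score(value: float, history: List[float]) -> float:
    """Outlier-based check: standard deviations between value and normal history."""
    sigma = stdev(history)
    return abs(value - mean(history)) / sigma if sigma > 0 else 0.0


if __name__ == "__main__":
    known_signatures = {"/etc/passwd", "cmd.exe"}       # hypothetical signature list
    normal_sizes = [520.0, 480.0, 510.0, 495.0, 505.0]  # hypothetical normal request sizes
    # A new request matching no stored signature but far larger than usual
    payload, size = "GET /new-exploit HTTP/1.1", 9000.0
    print(signature_match(payload, known_signatures))  # False: signature list misses it
    print(outlier_score(size, normal_sizes) > 3.0)     # True: outlier check flags it
```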
DATA COLLECTION:
First, we’ll collect our data. We obtained the data for our final year project from an open-source dataset, linked below:
https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/
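Once the CSV files are downloaded, a first useful step is to load them and count the records in each class, since that count decides whether SMOTE is needed. The sketch below uses only Python's `csv` module; the column names (`dur`, `proto`, `sbytes`, `label`) and the function name `load_flows` are assumptions for illustration, so check them against the header of the file actually downloaded.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, TextIO, Tuple


def load_flows(handle: TextIO, label_column: str = "label") -> Tuple[List[Dict[str, str]], Counter]:
    """Read flow records from CSV and count how many rows fall in each class.

    `label_column` is an assumption; point it at whichever column in the
    downloaded file marks normal versus attack records.
    """
    rows = list(csv.DictReader(handle))
    counts = Counter(row[label_column] for row in rows)
    return rows, counts


if __name__ == "__main__":
    # Tiny in-memory stand-in for a downloaded CSV file (hypothetical values)
    sample = io.StringIO(
        "dur,proto,sbytes,label\n"
        "0.1,tcp,496,0\n"
        "0.2,udp,1762,1\n"
        "0.05,tcp,300,0\n"
    )
    rows, counts = load_flows(sample)
    print(len(rows), dict(counts))  # 3 {'0': 2, '1': 1}
```

For a real file, the handle would come from something like `open("training.csv", newline="")`, where the file name is again hypothetical.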
EDA (EXPLORATORY DATA ANALYSIS):
Our next step will be exploratory data analysis (EDA) of the collected data.
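Part of EDA is checking that every value matches its column's expected data type and finding missing entries before modelling. A minimal sketch, assuming rows are dictionaries of strings (as `csv.DictReader` produces) and that `""`, `"-"` and `"nan"` mark missing values; the marker set and the function name `profile_columns` are assumptions.

```python
from typing import Dict, List

MISSING_MARKERS = {"", "-", "nan", "NaN"}  # assumed placeholders for missing values


def profile_columns(rows: List[Dict[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count missing, numeric and text values per column.

    A column expected to be numeric that still has "text" entries, or any
    column with many "missing" entries, needs cleaning before modelling.
    """
    summary: Dict[str, Dict[str, int]] = {}
    for row in rows:
        for column, raw in row.items():
            stats = summary.setdefault(column, {"missing": 0, "numeric": 0, "text": 0})
            value = (raw or "").strip()
            if value in MISSING_MARKERS:
                stats["missing"] += 1
                continue
            try:
                float(value)
                stats["numeric"] += 1
            except ValueError:
                stats["text"] += 1
    return summary


if __name__ == "__main__":
    rows = [
        {"sbytes": "496", "service": "-"},
        {"sbytes": "abc", "service": "http"},  # "abc" is a data-type problem
    ]
    for column, stats in profile_columns(rows).items():
        print(column, stats)
```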
FEATURE SELECTION:
We will use Random Forest feature importance to select the most informative features.
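Random Forest importance is usually taken from a library (for example scikit-learn's `feature_importances_` on `RandomForestClassifier`), which averages the decrease in Gini impurity that each feature produces over all splits and trees. The sketch below is a deliberately simplified stand-in, not a random forest: it scores each feature by the best Gini decrease achievable with one threshold split, which shows the impurity idea behind the ranking. The feature names and data are hypothetical.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split_gain(values: List[float], labels: List[int]) -> float:
    """Largest Gini impurity decrease from a single threshold split on one feature."""
    parent, n, best = gini(labels), len(labels), 0.0
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_features(rows: List[List[float]], labels: List[int],
                  names: List[str]) -> List[Tuple[str, float]]:
    """Rank features by single-split Gini gain, most informative first."""
    scores = [
        (name, best_split_gain([row[j] for row in rows], labels))
        for j, name in enumerate(names)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    rows = [[1.0, 5.0], [2.0, 3.0], [8.0, 4.0], [9.0, 5.0]]  # hypothetical flows
    labels = [0, 0, 1, 1]                                      # 0 = normal, 1 = attack
    print(rank_features(rows, labels, ["sbytes", "ttl"]))
```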
SELECTED FEATURE SPLITTING:
Train/test split, with k-fold cross-validation on the training data.
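The splitting step can be sketched with index lists: hold out a test set once, then run k-fold cross-validation on the remaining training samples so each fold is used for validation exactly once. This is a minimal sketch; the 20% test fraction, `k = 4`, the seed, and the function names are assumptions. For imbalanced traffic, stratified folds that preserve the normal/attack ratio (as scikit-learn's `StratifiedKFold` does) would usually be preferable.

```python
import random
from typing import List, Tuple


def holdout_split(n_samples: int, test_fraction: float = 0.2,
                  seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices once and hold out a fraction of them as the test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(n_samples: int, k: int = 5,
                   seed: int = 42) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, validation) position lists; each position validates exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    positions = list(range(n_samples))
    random.Random(seed).shuffle(positions)
    # the first n_samples % k folds get one extra sample
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        validation = positions[start:start + size]
        train = positions[:start] + positions[start + size:]
        splits.append((train, validation))
        start += size
    return splits


if __name__ == "__main__":
    train_ids, test_ids = holdout_split(20)  # test_ids stay untouched until final evaluation
    for fold, (fit_pos, val_pos) in enumerate(k_fold_indices(len(train_ids), k=4)):
        fit_ids = [train_ids[p] for p in fit_pos]  # map positions back to sample ids
        val_ids = [train_ids[p] for p in val_pos]
        print(f"fold {fold}: {len(fit_ids)} train, {len(val_ids)} validation")
```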
SUPERVISED LEARNING:
Neural networks
Decision tree
XGBoost
AdaBoost
SVM
Gaussian Naive Bayes
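The Voting technique mentioned in the methodology can combine the predictions of the classifiers listed above. A minimal hard-voting sketch, assuming each trained model has already produced a list of 0/1 labels (0 = normal, 1 = anomaly) for the same samples; the model names and predictions are hypothetical. scikit-learn offers the same idea as `VotingClassifier(voting="hard")`. A per-model accuracy is included so the individual algorithms can also be compared.

```python
from collections import Counter
from typing import Dict, List


def hard_vote(predictions: Dict[str, List[int]]) -> List[int]:
    """Majority vote over several models' predicted labels, sample by sample.

    On a tie, Counter.most_common returns the label encountered first,
    i.e. the one predicted by the earliest model in `predictions`.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    per_model = list(predictions.values())
    n = len(per_model[0])
    if any(len(p) != n for p in per_model):
        raise ValueError("all models must predict the same number of samples")
    return [Counter(p[i] for p in per_model).most_common(1)[0][0] for i in range(n)]


def model_accuracy(predictions: Dict[str, List[int]], y_true: List[int]) -> Dict[str, float]:
    """Fraction of correct predictions for each individual model."""
    return {
        name: sum(p == t for p, t in zip(preds, y_true)) / len(y_true)
        for name, preds in predictions.items()
    }


if __name__ == "__main__":
    preds = {  # hypothetical outputs from three trained models
        "decision_tree": [0, 1, 1, 0],
        "xgboost": [0, 1, 0, 0],
        "svm": [1, 1, 1, 0],
    }
    y_true = [0, 1, 1, 0]
    print(hard_vote(preds))              # [0, 1, 1, 0]
    print(model_accuracy(preds, y_true))
```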
VALIDATION WITH TEST DATA:
We’ll use accuracy, precision, F1 score, and a confusion matrix to measure the quality of the predictions.
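These measures all derive from the confusion matrix, as the sketch below shows. It assumes binary labels with 1 = anomaly (the positive class) and 0 = normal; the function names and example labels are hypothetical, and scikit-learn's `sklearn.metrics` module provides equivalent functions. Recall is computed too because F1 is the harmonic mean of precision and recall. With imbalanced traffic, accuracy alone can look high even when most attacks are missed, which is why precision and F1 matter.

```python
from typing import Dict, List


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Dict[str, int]:
    """Binary confusion matrix counts, treating 1 as anomaly (positive) and 0 as normal."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if p == 1:
            counts["tp" if t == 1 else "fp"] += 1
        else:
            counts["fn" if t == 1 else "tn"] += 1
    return counts


def classification_scores(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 derived from the confusion counts."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    accuracy = (c["tp"] + c["tn"]) / total if total else 0.0
    precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0]  # hypothetical ground truth
    y_pred = [1, 0, 0, 1, 1, 0]  # hypothetical model output
    print(confusion_counts(y_true, y_pred))  # {'tp': 2, 'fp': 1, 'tn': 2, 'fn': 1}
    print(classification_scores(y_true, y_pred))
```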
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| Arduino | Equipment | 1 | 1500 | 1500 |
| ESP-8266-01 microcontroller | Equipment | 1 | 1000 | 1000 |
| Batteries | Equipment | 5 | 150 | 750 |
| D1 Mini Datalogger Shield | Equipment | 1 | 1000 | 1000 |
| microSD card (16GB+) | Equipment | 1 | 1500 | 1500 |
| USB SD card reader | Equipment | 1 | 500 | 500 |
| Resistors | Equipment | 1 | 500 | 500 |
| Wires | Equipment | 1 | 500 | 500 |
| Board (breadboard or Veroboard) | Equipment | 1 | 500 | 500 |
| Final box | Equipment | 1 | 1000 | 1000 |
| Type A to micro USB cable | Equipment | 1 | 1000 | 1000 |
| Printing | Miscellaneous | 4 | 750 | 3000 |
| Book binding | Miscellaneous | 4 | 500 | 2000 |
| Total (in Rs) | | | | 14750 |