Generating malware samples using Generative Adversarial Networks (GANs)
2025-06-28 16:32:43 - Adil Khan
Project Area of Specialization: Artificial Intelligence
Project Summary
In the emerging world of the Internet of Things (IoT), malware continues to threaten the growth of technology, and detecting it remains difficult. Many malware detection methods based on deep learning exist, but they share a common weakness: identifying new malware variants. Most existing models are good at detecting samples from the malware families they were trained on but cannot identify new families, so they must be retrained on those families before they can detect new variants. This raises the problem of data availability. Such models need a large volume of relevant training data to give acceptable results, and they perform poorly when that data is lacking. The purpose of this research is to devise a novel technique, called GANS-WAR, that generates malware samples using Generative Adversarial Networks (GANs). The generated samples can be used to build large training datasets containing different types of malware. The result is a heterogeneous malware environment, that is, one that holds multiple types of malware that the system is able to detect. These datasets can then be used to train malware detectors and to support future research. The dataset used for our model is drawn from several reputable sources, such as VirusShare, VirusTotal, and the Microsoft malware dataset, and contains malware variants for heterogeneous architectures, for example Windows-32, Windows-64, ARM, MIPS, and 8086.
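As background for readers new to GANs, the standard GAN training objective is written out below. The summary does not state which loss GANS-WAR uses, so this is the generic formulation and only an assumption about the project's setup. Here $x$ stands for a real malware sample from the training data, $z$ for random noise, $G$ for the generator, and $D$ for the discriminator.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator is trained to tell real samples from generated ones, while the generator is trained to produce samples the discriminator accepts as real.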
Project Objectives
The main objective of this project is to devise a novel model that produces malware samples in the form of images. These samples can be used in industrial applications that require malware data, such as the anti-virus industry.
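The proposal does not say how a sample becomes an image. One common approach, assumed here purely for illustration, is to lay the raw bytes of a file out row by row as a grayscale grid, one byte per pixel. The function names and the default width of 64 are hypothetical, and the example uses harmless placeholder bytes.

```python
import math
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 64) -> List[List[int]]:
    """Lay raw bytes out row by row as a 2-D grid of 0-255 pixel values.

    Assumption: one byte maps to one grayscale pixel; the last row is
    zero-padded so every row has exactly `width` pixels.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = max(1, math.ceil(len(data) / width))
    padded = data + bytes(height * width - len(data))  # zero padding
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def grayscale_to_bytes(image: List[List[int]], original_length: int) -> bytes:
    """Flatten the grid back to bytes and drop the zero padding."""
    flat = bytes(pixel for row in image for pixel in row)
    return flat[:original_length]


if __name__ == "__main__":
    sample = bytes(range(10))            # harmless placeholder bytes
    image = bytes_to_grayscale(sample, width=4)
    for row in image:
        print(row)
    assert grayscale_to_bytes(image, len(sample)) == sample
```

Keeping the original length alongside the image makes the mapping reversible, which matters if generated images are later converted back to byte sequences.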
Project Implementation Method
The main focus of this project is to develop a way of generating malware samples that can be used to train other malware detectors or classifiers. The system can be deployed on high-performance computers at anti-virus research and development centers, where it will provide large amounts of data for building efficient machine-learning-based malware detection products.
Benefits of the Project
This project can benefit anti-virus vendors that are working to build strong, efficient anti-virus products but lack data. As technology grows and network speeds increase, many IoT devices are being deployed. Because these devices are relatively new, malware data for them is not available, so products that need a good amount of data lack the resources to be built. Using this system, anti-virus vendors can produce their own malware sample datasets on which to train and test their systems at relatively low cost. This saves the time and large costs otherwise required to gather, clean, and analyze data from multiple sources.
Technical Details of Final Deliverable
The final deliverable will be a trained model that generates malware samples for 32-bit Portable Executable (PE) files only, for now. The scope is limited by the lack of computational power for training the model on datasets as large as ours. Nevertheless, the final system can be extended to multiple architectures in the future, as required by anti-virus vendors.
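Because the source dataset mixes several architectures while the deliverable targets 32-bit PE files only, a pre-filtering step of some kind is implied. The sketch below shows one way such a filter might check a file header, based on the published PE/COFF layout. The function names are hypothetical, the proposal does not describe its own filtering, and the demo builds a synthetic header in place of a real sample.

```python
import struct

IMAGE_FILE_MACHINE_I386 = 0x014C   # COFF Machine value for 32-bit x86
IMAGE_FILE_MACHINE_AMD64 = 0x8664  # COFF Machine value for x86-64
PE32_MAGIC = 0x010B                # optional-header magic for PE32
PE32_PLUS_MAGIC = 0x020B           # optional-header magic for PE32+ (64-bit)


def is_pe32(data: bytes) -> bool:
    """Return True if `data` begins with a 32-bit x86 PE header."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew field
    # Need: signature (4) + COFF header (20) + optional-header magic (2).
    if pe_offset + 26 > len(data):
        return False
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        return False
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    (magic,) = struct.unpack_from("<H", data, pe_offset + 24)
    return machine == IMAGE_FILE_MACHINE_I386 and magic == PE32_MAGIC


def make_synthetic_header(machine: int, magic: int) -> bytes:
    """Build a minimal, non-executable header for demonstration only."""
    header = bytearray(0x40 + 26)
    header[:2] = b"MZ"
    struct.pack_into("<I", header, 0x3C, 0x40)       # PE header at 0x40
    header[0x40:0x44] = b"PE\0\0"
    struct.pack_into("<H", header, 0x44, machine)    # COFF Machine
    struct.pack_into("<H", header, 0x40 + 24, magic) # optional-header magic
    return bytes(header)


if __name__ == "__main__":
    x86 = make_synthetic_header(IMAGE_FILE_MACHINE_I386, PE32_MAGIC)
    x64 = make_synthetic_header(IMAGE_FILE_MACHINE_AMD64, PE32_PLUS_MAGIC)
    print(is_pe32(x86), is_pe32(x64))
```

Checking both the Machine field and the optional-header magic avoids accepting files whose two fields disagree. Samples for ARM, MIPS, or 64-bit Windows would be rejected by this check and left for a future extension.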
Final Deliverable of the Project: Software System
Core Industry: IT
Other Industries: Security
Core Technology: Artificial Intelligence (AI)
Other Technologies: Big Data
Sustainable Development Goals: Industry, Innovation and Infrastructure
Required Resources

| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| MSI GeForce RTX 2060 | Equipment | 1 | 70000 | 70000 |
| Total (in Rs) | | | | 70000 |