Authors

Hazem Mostafa Mahmoud Samy

Amer Mohamed Amer Mohamed

Mohamed Amin Abdelaziz Mohamed

Adham Samir Mohamed ElShehaat

Dr. Eslam Amer

Eng. Youssef Talaat

Publishing Date

2 December 2021

Abstract

The rapid growth of malicious software poses an immense threat, because data exchange is no longer limited to personal daily transactions but is deeply embedded in large enterprises and organizations. The purpose of this project is to develop a new approach to detecting mimicry malware, which disguises itself as legitimate software in order to bypass conventional antivirus programs, most of which are signature-based. The proposed antivirus detects malicious software through dynamic analysis combined with machine learning techniques, allowing it to evolve and adapt as malware itself keeps changing.

1.1 Purpose of this document

The main aim of this document is to outline the specifications and main functionalities of the system and to illustrate the stages of development it will go through. It provides a high-level overview of the intended final software product and a low-level overview of how the model detects malware using the dynamic approach.

1.2 Scope of this document

This document provides the detailed functional and non-functional requirements as well as the main functionalities of our system, whose task is detecting mimicry malware on the Windows operating system. It also provides in-depth descriptions of the system’s architecture, its processes, and the different stages the system goes through.

1.3 System Overview

Our input is the sequence of API calls made by a program, which is treated as a dynamic feature.

In the processing stage, we’ll use TF-IDF (term frequency-inverse document frequency) to calculate how relevant each word (an API call from the sequence) is to a document in a collection of documents.
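
The TF-IDF step can be illustrated with a minimal sketch. It assumes that each "document" is the API call sequence recorded for one program and each "word" is a single API call name; the function name `tf_idf`, the unsmoothed formula, and the sample traces are hypothetical and chosen only for illustration, not taken from the project.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document.

    documents: list of token lists. Assumption: each token is an API call
    name and each list is the call sequence recorded for one program.
    """
    n_docs = len(documents)
    # Document frequency: in how many sequences does each API call appear?
    df = Counter()
    for doc in documents:
        df.update(set(doc))

    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        # Term frequency times inverse document frequency (no smoothing).
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical traces, invented for illustration only.
traces = [
    ["CreateFileW", "WriteFile", "CloseHandle"],
    ["CreateFileW", "ReadFile", "CloseHandle"],
    ["VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread", "CloseHandle"],
]
scores = tf_idf(traces)
```

In this sketch, a call that occurs in every trace (here `CloseHandle`) receives a weight of zero, while a call that is specific to one trace receives a higher weight. A production system would more likely rely on an existing library implementation, which typically applies smoothing.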

Then we’ll apply word embedding using the Word2Vec algorithm to find which API calls are most closely related to each other.

The above techniques will produce two matrices, which will be fed into a swarm intelligence algorithm, specifically ant colony optimization, to produce a new weighted matrix.

We’ll use the NetworkX library to build one directed weighted graph for malware and one for goodware; each graph will serve as the pattern that represents its class.
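
As a rough illustration of what such a graph could contain, the sketch below derives weighted directed edges from API call sequences. It assumes, purely for illustration, that an edge weight is the number of times one call directly follows another; in the system described here the weights would come from the ant colony optimization matrix instead. The helper `transition_edges` and the sample traces are hypothetical.

```python
from collections import Counter

def transition_edges(sequences):
    """Count consecutive API-call pairs across all sequences.

    Returns a sorted list of (source, target, weight) tuples, where weight
    is the number of times `target` directly followed `source`.
    Assumption: raw transition counts stand in for the real edge weights.
    """
    counts = Counter()
    for seq in sequences:
        # zip(seq, seq[1:]) yields each pair of neighbouring calls.
        counts.update(zip(seq, seq[1:]))
    return [(src, dst, w) for (src, dst), w in sorted(counts.items())]

# Hypothetical malware traces, invented for illustration only.
malware_traces = [
    ["VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread"],
    ["OpenProcess", "VirtualAlloc", "WriteProcessMemory"],
]
malware_edges = transition_edges(malware_traces)
```

With NetworkX installed, a list in this form can be loaded into a `networkx.DiGraph` through its `add_weighted_edges_from` method.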

We’ll use graph embedding techniques to compare the graph of a suspicious program with the two graphs built previously, and classify the program as either malware or goodware according to which graph it is more similar to.
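
The comparison idea can be shown with a deliberately simplified sketch. It does not implement a graph embedding; as a stand-in, it uses a weighted Jaccard similarity over edge weights, and it assumes each graph is given as a dictionary mapping `(source, target)` pairs to non-negative weights. The function names and the sample graphs are hypothetical.

```python
def weighted_jaccard(graph_a, graph_b):
    """Similarity in [0, 1] between two graphs given as {(src, dst): weight}.

    Stand-in for a graph-embedding comparison; assumes non-negative weights.
    """
    edges = set(graph_a) | set(graph_b)
    shared = sum(min(graph_a.get(e, 0), graph_b.get(e, 0)) for e in edges)
    total = sum(max(graph_a.get(e, 0), graph_b.get(e, 0)) for e in edges)
    return shared / total if total else 0.0

def classify(suspect, malware_graph, goodware_graph):
    """Label the suspect by whichever reference graph it resembles more.

    Ties fall back to "goodware" in this sketch.
    """
    m = weighted_jaccard(suspect, malware_graph)
    g = weighted_jaccard(suspect, goodware_graph)
    return "malware" if m > g else "goodware"

# Hypothetical graphs, invented for illustration only.
malware_graph = {
    ("VirtualAlloc", "WriteProcessMemory"): 2,
    ("WriteProcessMemory", "CreateRemoteThread"): 1,
}
goodware_graph = {
    ("CreateFileW", "WriteFile"): 3,
    ("WriteFile", "CloseHandle"): 3,
}
suspect = {
    ("VirtualAlloc", "WriteProcessMemory"): 1,
    ("CreateFileW", "WriteFile"): 1,
}
label = classify(suspect, malware_graph, goodware_graph)
```

An embedding-based comparison would replace `weighted_jaccard` with a distance between learned graph vectors, while the decision rule in `classify` could stay the same.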

The extracted features will be fed into a three-layer MLP (multilayer perceptron) for classification.
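
A forward pass through such a network might look like the sketch below. It reads "three layers" as an input layer, one hidden layer, and one output unit, which is an assumption; the layer sizes, the ReLU and sigmoid activations, and the hand-picked weights in `params` are likewise hypothetical and only serve to make the example deterministic.

```python
import math

def relu(values):
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """Fully connected layer: one row of `weights` per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def mlp_forward(features, params):
    """Input layer -> one hidden ReLU layer -> one sigmoid output unit."""
    hidden = relu(dense(features, params["w_hidden"], params["b_hidden"]))
    (logit,) = dense(hidden, params["w_out"], params["b_out"])
    # Sigmoid maps the logit to a score in (0, 1), read here as P(malware).
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical, hand-picked weights; a real model would learn these.
params = {
    "w_hidden": [[1.0, -1.0], [-1.0, 1.0]],
    "b_hidden": [0.0, 0.0],
    "w_out": [[2.0, -2.0]],
    "b_out": [0.0],
}
# Hypothetical two-element feature vector, e.g. two similarity scores.
p_malware = mlp_forward([0.9, 0.1], params)
```

A score above 0.5 would be read as malware in this sketch. In practice the weights would be learned from labelled samples with a machine learning library rather than set by hand.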

1.4 System Scope

The system will perform malware analysis through a dynamic approach rather than the traditional static approach used in most antivirus programs. The API call sequences of the selected program will be processed with NLP techniques and then passed to machine learning and deep learning models to identify mimicry malware.