Hazem Mostafa Mahmoud Samy

Amer Mohamed Amer Mohamed

Mohamed Amin Abdelaziz Mohamed

Adham Samir Mohamed ElShehaat

Dr. Eslam Amer

Eng. Youssef Talaat

Publishing Date

November 8, 2021


The rapid growth of malicious software poses a serious threat, because data exchange is no longer limited to everyday personal transactions but is deeply embedded in the operations of large enterprises and organizations. The purpose of this project is to develop a new approach to detecting mimicry malware, which disguises itself as legitimate software in order to bypass conventional antivirus products, most of which are signature-based. The proposed antivirus uses dynamic analysis combined with machine learning techniques to detect malicious software, allowing it to evolve and adapt as malware itself keeps changing.

1.1 Background

Malicious programs are spreading rapidly, with more than 1 billion programs tallied as of 2021. The rate has not slowed either: roughly 560,000 new malware variants appear every day. Malware can damage a computer and steal private information, which is devastating for both business and personal devices. A conventional antivirus relies on signature analysis: it compiles a database of known threats and compares the program in question against that database. This method cannot keep pace with the sheer number of newly identified malware programs. The project will instead take a dynamic approach in which a program's API call sequences are analyzed, which is better suited to detecting viruses and mimicry malware.
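The limitation of signature matching described above can be illustrated with a minimal sketch. It assumes, as a simplification, that a "signature" is the SHA-256 hash of the whole file; real antivirus products typically match byte patterns inside files as well. The function names and the sample bytes are hypothetical and serve only as an illustration.

```python
import hashlib


def sha256_signature(data: bytes) -> str:
    """Return the SHA-256 hex digest, used here as a stand-in signature."""
    return hashlib.sha256(data).hexdigest()


def build_signature_db(known_malware):
    """Build a set of signatures from samples already known to be malicious."""
    return {sha256_signature(sample) for sample in known_malware}


def is_known_threat(data: bytes, signature_db) -> bool:
    """Flag a file only if its signature is already in the database."""
    return sha256_signature(data) in signature_db


# Hypothetical sample bytes; not taken from any real malware.
known_sample = b"MZ hypothetical malicious payload"
db = build_signature_db([known_sample])

# A variant that differs by a single appended byte.
variant = known_sample + b"\x00"

print(is_known_threat(known_sample, db))  # True: exact match in the database
print(is_known_threat(variant, db))       # False: the new variant is missed
```

Under this assumption, any change to the file produces a different signature, so every new variant has to be added to the database before it can be detected.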

1.2 Motivation

The term “virus” first appeared in the mid-1980s. It was defined by Fred Cohen, who described it in his 1986 Ph.D. dissertation as “A program that can infect other programs by modifying them to include a, possibly evolved, version of itself.” Before the term was introduced, however, the first documented virus had already appeared in the early 1970s under the name “Creeper Worm”. It was an experiment by Bob Thomas, who worked at BBN Technologies at the time. The virus would copy itself to other systems and print the message “I’m the creeper, catch me if you can”. Shortly afterward, Ray Tomlinson, a colleague at the same company, developed a counterpart to the “Creeper Worm” and named it “The Reaper”. The Reaper not only moved across networks but also detected and deleted the “Creeper Worm”, so it is considered the first antivirus. The first malware released in a PC-based environment was “Brain”, released in 1986. It was created by two Pakistani brothers to test loopholes in their company’s software. As viruses evolved, the antivirus industry evolved as well. In 1987, McAfee, one of the first antivirus companies, was founded by the British-American computer programmer John McAfee, who released his first antivirus program, “VirusScan”.
Over the last 15 years, technological advancements have sharply increased the speed at which malware spreads. This makes malware an important problem to address, since it affects individuals as well as companies: it can drain resources and steal private information. It may appear to be a trivial problem on an individual scale, but at the scale of companies and large enterprises, leakage of customer information or denial of service can lead to serious financial and legal problems.
The current solution to the malware problem is the well-known static technique used by most popular antivirus products, in which the antivirus detects a virus’s signature and blocks it. A virus signature is defined as “a set of unique data, or bits of code, that allow it to be identified”, so it acts as the fingerprint of the virus. Once a signature is found, the antivirus removes or quarantines the suspected file. While this may seem an efficient solution for currently known viruses, its obvious weakness is the number of new viruses released daily, reported to be more than 560,000. As a result, antivirus products must be updated frequently with new data to maintain their accuracy. A possible improvement is to lean more on a dynamic approach, which focuses on the behavior of the malware within the system and how it interacts with the system rather than on its signature.

1.3 Problem Statement

The problem addressed by this project is the detection of mimicry malware using dynamic analysis. The dataset consists of the API calls made by programs on the Windows operating system. Based on the API call sequence patterns that represent a program’s behavior, the system classifies programs as goodware or malware and predicts mimicry malware.
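To make the idea of classifying by API call sequence pattern concrete, the following is a minimal sketch rather than the project's actual model, which this section does not specify. It assumes that sequences are represented as bigrams (pairs of consecutive calls) and classified with a small naive Bayes model using add-one smoothing. The training traces are invented toy data, and the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(calls, n=2):
    """Return the length-n windows over an API call sequence."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def train(samples, n=2):
    """Count n-grams per label from (calls, label) pairs."""
    counts, label_counts, vocab = {}, Counter(), set()
    for calls, label in samples:
        grams = ngrams(calls, n)
        counts.setdefault(label, Counter()).update(grams)
        label_counts[label] += 1
        vocab.update(grams)
    return counts, label_counts, vocab


def classify(calls, model, n=2):
    """Return the label with the highest naive Bayes log-score."""
    counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in counts:
        score = math.log(label_counts[label] / total)
        denom = sum(counts[label].values()) + len(vocab)
        for gram in ngrams(calls, n):
            # Add-one smoothing keeps unseen n-grams from zeroing the score.
            score += math.log((counts[label][gram] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy traces (assumption): real traces would come from a sandbox run.
training = [
    (["CreateFileW", "ReadFile", "CloseHandle"], "goodware"),
    (["CreateFileW", "WriteFile", "CloseHandle"], "goodware"),
    (["OpenProcess", "VirtualAllocEx", "WriteProcessMemory",
      "CreateRemoteThread"], "malware"),
    (["OpenProcess", "VirtualAllocEx", "WriteProcessMemory",
      "CreateRemoteThread", "CloseHandle"], "malware"),
]
model = train(training)

# A mimicry-style trace: benign-looking file access followed by injection calls.
mimic = ["CreateFileW", "ReadFile", "OpenProcess", "VirtualAllocEx",
         "WriteProcessMemory", "CreateRemoteThread"]
benign = ["CreateFileW", "ReadFile", "CloseHandle"]

print(classify(mimic, model))   # malware
print(classify(benign, model))  # goodware
```

In this toy setting the mimicry-style trace is still labeled as malware, because the injection-related bigrams outweigh the benign prefix. A real system would need a much larger labeled dataset and a properly evaluated model.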