DRLDO: A Novel DRL-based De-obfuscation System for Defence Against Metamorphic Malware

Mohit Sewak; Sanjay K. Sahay; Hemant Rathore

doi:10.14429/dsj.71.15780

Authors

Mohit Sewak, Security and Compliance Research, Microsoft, Hyderabad, https://orcid.org/0000-0001-8375-5713
Sanjay K. Sahay, Department of Computer Science & Information, Goa Campus, BITS Pilani, Goa - 403 726, https://orcid.org/0000-0002-4640-2107
Hemant Rathore, Department of Computer Science & Information, Goa Campus, BITS Pilani, Goa - 403 726, https://orcid.org/0000-0001-7298-0210

DOI:

https://doi.org/10.14429/dsj.71.15780

Keywords:

Adversarial Artificial Intelligence, Deep Reinforcement Learning, Metamorphic malware, De-obfuscation

Abstract

In this paper, we propose a novel mechanism to normalise metamorphic and obfuscated malware down to the opcode level, and hence create an advanced metamorphic malware de-obfuscation and defence system. We name this system DRLDO, for deep reinforcement learning based de-obfuscator. With DRLDO included as a sub-component, an existing Intrusion Detection System (IDS) could be augmented with defensive capabilities against ‘zero-day’ attacks from obfuscated and metamorphic variants of existing malware. This is important not only because no existing system uses advanced DRL to intelligently and automatically normalise obfuscation down to the opcode level, but also because DRLDO does not mandate any changes to the existing IDS. It does not even require the IDS classifier to be retrained on a new dataset containing obfuscated samples. Hence DRLDO could be easily retrofitted into any existing IDS deployment. We designed and developed the system, and conducted experiments to evaluate it against multiple simultaneous attacks from obfuscations generated from malware samples in a standardised dataset that contains multiple generations of malware. Experimental results show that DRLDO was able to make otherwise undetectable obfuscated variants of the malware detectable by an existing pre-trained malware classifier. The detection probability was raised to 0.6, well above the cut-off mark, so that the classifier could detect the obfuscated malware unambiguously. Further, the de-obfuscated variants generated by DRLDO achieved a very high correlation (≈ 0.99) with the base malware. This observation validates that the DRLDO system is actually learning to de-obfuscate and not exploiting a trivial trick.
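The abstract reports a correlation of ≈ 0.99 between de-obfuscated variants and the base malware. The following is a minimal sketch of how such a check could be computed, assuming (this is not stated in the abstract) that each sample is represented as an opcode-frequency vector and that the correlation is Pearson's. The vocabulary, the toy opcode sequences, and the function names are all hypothetical, and the "de-obfuscated" sample is a hand-made stand-in, not output of the DRLDO agent.

```python
from collections import Counter
from math import sqrt


def opcode_frequency_vector(opcodes, vocabulary):
    """Count how often each vocabulary opcode occurs in a sample (assumed feature form)."""
    counts = Counter(opcodes)
    return [counts[op] for op in vocabulary]


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length numeric vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: a tiny opcode vocabulary and three opcode sequences.
VOCABULARY = ["mov", "push", "pop", "add", "nop", "jmp"]
base = ["mov", "push", "add", "mov", "pop", "jmp"]
obfuscated = base + ["nop"] * 6          # junk-instruction insertion
deobfuscated = [op for op in obfuscated if op != "nop"]  # stand-in, not DRLDO output

base_vec = opcode_frequency_vector(base, VOCABULARY)
obf_vec = opcode_frequency_vector(obfuscated, VOCABULARY)
deobf_vec = opcode_frequency_vector(deobfuscated, VOCABULARY)

corr_obfuscated = pearson_correlation(base_vec, obf_vec)
corr_deobfuscated = pearson_correlation(base_vec, deobf_vec)
```

In this toy case the junk-padded sample correlates poorly with the base sample, while the stand-in de-obfuscated sample correlates strongly. That is the kind of comparison the reported correlation figure describes, though the paper's actual feature representation and metric may differ.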
