Tools & Techniques for Malware Analysis and Classification

Ekta Gandotra
Divya Bansal
Sanjeev Sofat

Abstract

Ever-evolving malware continues to flood the Internet at an alarming rate, which makes it challenging for security organizations and anti-malware vendors to devise effective solutions. It is therefore imperative to study automated tools and techniques for quick detection of malware, so that any impact on the target can be limited or prevented. The code or behavioural patterns obtained from malware analysis can be used to classify new malware samples into existing families and to flag those that exhibit unknown behaviour and thus need closer manual inspection. This paper provides a comprehensive review of the techniques and tools currently employed for malware analysis and classification. It compares tools and techniques for collecting malware, analyzing samples statically and dynamically to extract features, and finally classifying them using machine learning methods. It also presents examples from the literature that analyze executables to extract useful features and apply machine learning to discriminate malicious software from benign software.

How to Cite
Ekta Gandotra, Divya Bansal, & Sanjeev Sofat. (2016). Tools & Techniques for Malware Analysis and Classification. International Journal of Next-Generation Computing, 7(3), 176–197. https://doi.org/10.47164/ijngc.v7i3.118
