Research of machine and deep learning methods application for host-level intrusion detection and classification

Dainius Čeponis

Doctoral dissertation

Dissertations are not being sold



The growing number of intrusions at the information system level requires more sophisticated methods to combat cyber-attacks. In the context of this thesis, an information system is understood as a host, i.e. an end system such as a personal computer or server, but not network equipment or traffic. Companies can expect significant losses if such intrusions are not recognised and averted. Intrusion detection systems (IDS) and antivirus (AV) applications are the main approaches to combating host-level cyber-attacks. The problem is that the most popular solutions rely on signature-based methods, which are incapable of detecting new and emerging attacks.

Anomaly-based IDS are designed to detect zero-day attacks with much higher accuracy than signature-based systems. Typically, these systems use statistical or machine learning methods to perform the task. Unfortunately, an enormous amount of data is required to train and validate anomaly-based systems. Currently, host-based intrusion detection systems (HIDS) lack such data in comparison with network-based intrusion detection systems (NIDS), so a new extensive dataset would contribute to host-based intrusion detection research.

This dissertation consists of an introduction, four main chapters and general conclusions. The first chapter introduces existing intrusion and malware detection methods as well as approaches to data collection. Existing datasets and the machine learning (ML) and deep learning (DL) methods currently used in HIDS are reviewed at the end of chapter one. The second chapter proposes a robust method for generating datasets of malicious activity for anomaly-based HIDS training and introduces the generated Attack-Caused Windows System Calls Traces Dataset (AWSCTD) and its characteristics. The third chapter investigates the applicability of ML methods to intrusion and malware detection using the newly presented host-level dataset. Chapter four discusses the application of vanilla and advanced DL methods trained with the newly generated dataset: new DL models are proposed and compared with recognised state-of-the-art models.

The experiments and analysis performed demonstrated that virtualisation technologies allow effective automation of dataset generation in cases where data-generating systems must be securely isolated. Simple ML methods are not sufficient for the host-level intrusion and malware detection task compared to DL methods because of their comparatively low (90–92%) accuracy. Furthermore, the proposed static single-flow DL model outperformed recognised state-of-the-art models in the intrusion detection task. Lastly, a sequence of the first 600 system calls from Windows applications achieves more than 95% detection accuracy, which is enough to perform the majority of anomaly-based intrusion and malware detection tasks adequately.
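
As an illustration of the last finding, the following minimal sketch shows one way a system call trace could be cut to its first 600 calls and turned into a fixed-length integer sequence suitable as input to an ML or DL classifier. It is not the dissertation's actual preprocessing code: the function names, the padding and unknown-call indices, and the example call names are assumptions made for this example only.

```python
from typing import Dict, List

MAX_CALLS = 600  # sequence length reported in the abstract
PAD_ID = 0       # hypothetical index used to pad short traces
UNK_ID = 1       # hypothetical index for calls not seen when building the vocabulary


def build_vocabulary(traces: List[List[str]]) -> Dict[str, int]:
    """Map each distinct system call name to an integer id (starting at 2)."""
    vocab: Dict[str, int] = {}
    for trace in traces:
        for call in trace:
            if call not in vocab:
                vocab[call] = len(vocab) + 2  # 0 and 1 are reserved
    return vocab


def encode_trace(trace: List[str], vocab: Dict[str, int],
                 max_calls: int = MAX_CALLS) -> List[int]:
    """Keep only the first `max_calls` calls and pad shorter traces."""
    ids = [vocab.get(call, UNK_ID) for call in trace[:max_calls]]
    ids.extend([PAD_ID] * (max_calls - len(ids)))
    return ids


# Illustrative traces; the call names are examples, not dataset content.
traces = [
    ["NtOpenFile", "NtReadFile", "NtClose"],
    ["NtOpenFile"] * 700,
]
vocab = build_vocabulary(traces)
encoded = [encode_trace(t, vocab) for t in traces]
```

Every encoded trace has the same length, so traces of different sizes can be batched together; which classifier consumes them is left open here.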


Book details

186 p.