@markus0412

Flow-based benchmark data sets for intrusion detection

, , , , and . Proceedings of the 16th European Conference on Cyber Warfare and Security (ECCWS), page 361--369. ACPI, (2017)

Abstract

Anomaly based intrusion detection systems suffer from a lack of appropriate evaluation data sets. Often, existing data sets may not be published due to privacy concerns or do not reflect actual and current attack scenarios. In order to overcome these problems, we identify characteristics of good data sets and develop an appropriate concept for the generation of labelled flow-based data sets that satisfy these criteria. The concept is implemented based on OpenStack, thus demonstrating the suitability of virtual environments. Virtual environments offer advantages compared to static data sets by easily creating up-to-date data sets with recent trends in user behaviour and new attack scenarios. In particular, we emulate a small business environment which includes several clients and typical servers. Network traffic is generated by scripts which emulate typical user activities like surfing the web, writing emails, or printing documents on the clients. These scripts follow some guidelines to ensure that the user behaviour is as realistic as possible, also with respect to working hours and lunch breaks. The generated network traffic is recorded in unidirectional NetFlow format. For generating malicious traffic, attacks like Denial of Service, Brute Force, and Port Scans are executed within the network. Since origins, targets, and timestamps of executed attacks are known, labelling of recorded NetFlow data is easily possible. For inclusion of actual traffic, which has its origin outside the OpenStack environment, an external server with two services is deployed. This server has a public IP address and is exposed to real and up-to-date attacks from the internet. We captured approximately 32 million flows over a period of four weeks and categorized them into five classes. Further, the chronological sequence of the flows is analysed and the distribution of normal and malicious traffic is discussed in detail. The main contribution of this paper is the demonstration of a novel approach to use OpenStack as a basis for generating realistic data sets that can be used for the evaluation of network intrusion detection systems.

Links and resources

Tags

community