Machine-learning based intrusion detection

Downloading the Datasets

Download the 1999 DARPA IDS Dataset and the 1999 KDD Dataset by running

cd data

This takes about 30 minutes, depending on your internet connection. It downloads the inside tcpdump files from the DARPA dataset (~18 GB), organized into training and test sets, as well as a sample of the KDD dataset.
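The downloaded tcpdump files are packet captures. As a rough sketch, assuming they use the classic libpcap file format with microsecond timestamps (not pcapng), their packet records can be walked with only the Python standard library. The names `iter_pcap` and `packets_from_bytes` are hypothetical and are not part of this repository.

```python
import io
import struct
from typing import BinaryIO, Iterator, List, Tuple

def iter_pcap(f: BinaryIO) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp_seconds, raw_packet_bytes) for each record in a libpcap stream."""
    # Global header: magic, version_major, version_minor, thiszone, sigfigs, snaplen, network (24 bytes).
    header = f.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    # The magic number is written in the capturing host's byte order, so it reveals the endianness.
    magic = struct.unpack("<I", header[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"
    elif magic == 0xD4C3B2A1:
        endian = ">"
    else:
        raise ValueError("not a microsecond-resolution libpcap file")
    # Per-packet record header: ts_sec, ts_usec, incl_len, orig_len (16 bytes).
    record = struct.Struct(endian + "IIII")
    while True:
        rec_header = f.read(record.size)
        if len(rec_header) < record.size:
            break  # end of file (or a truncated trailing record header)
        ts_sec, ts_usec, incl_len, _orig_len = record.unpack(rec_header)
        data = f.read(incl_len)
        if len(data) < incl_len:
            break  # truncated final packet
        yield ts_sec + ts_usec / 1e6, data

def packets_from_bytes(blob: bytes) -> List[Tuple[float, bytes]]:
    """Convenience wrapper for small in-memory captures."""
    return list(iter_pcap(io.BytesIO(blob)))
```

Because `iter_pcap` yields one record at a time, it can stream through a multi-gigabyte capture without loading it into memory, e.g. `with open(path, "rb") as f: n = sum(1 for _ in iter_pcap(f))`.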

1999 DARPA Evaluation Labels

A description of how evaluation is performed for the DARPA dataset, as well as the ground-truth files, can be found on the DARPA Dataset Documentation page.

Experiment Files

Our experiments are organized as Python files in the root of the repository. Each experiment is explained below.

Checking Results is a simple script for checking the results of each experiment.

usage: [-h] [--thresh THRESH] [--plot] [--table TABLE]
                        results_file attacks_file

positional arguments:
  results_file     the results.csv file
  attacks_file     the actual attacks file

optional arguments:
  -h, --help       show this help message and exit
  --thresh THRESH  range of thresholds to try. Format: start:stop:num_points,
                   default: 0.5:0.5:1
  --plot           make plots
  --table TABLE    make table using the specified threshold
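The `--thresh` format resembles the arguments of `numpy.linspace`. Below is a minimal sketch of how such a spec could be expanded and swept, assuming inclusive endpoints (so the default `0.5:0.5:1` yields the single threshold 0.5), that each result carries an anomaly score, and that a score at or above the threshold counts as an alarm. The function names and the score/label representation are hypothetical; the actual script may parse `results.csv` and the attacks file differently.

```python
from typing import List, Sequence, Tuple

def parse_thresh(spec: str) -> List[float]:
    """Expand 'start:stop:num_points' into evenly spaced thresholds (endpoints inclusive)."""
    start_s, stop_s, num_s = spec.split(":")
    start, stop, num = float(start_s), float(stop_s), int(num_s)
    if num < 1:
        raise ValueError("num_points must be at least 1")
    if num == 1:
        return [start]  # same convention as numpy.linspace with num=1
    step = (stop - start) / (num - 1)
    # Append `stop` exactly so rounding error cannot shift the last threshold.
    return [start + i * step for i in range(num - 1)] + [stop]

def rates_at(scores: Sequence[float], labels: Sequence[int], thresh: float) -> Tuple[float, float]:
    """Return (detection rate, false-alarm rate); label 1 marks an attack."""
    tp = fp = pos = neg = 0
    for score, label in zip(scores, labels):
        alarm = int(score >= thresh)
        if label:
            pos += 1
            tp += alarm
        else:
            neg += 1
            fp += alarm
    detection_rate = tp / pos if pos else 0.0
    false_alarm_rate = fp / neg if neg else 0.0
    return detection_rate, false_alarm_rate

def sweep(scores: Sequence[float], labels: Sequence[int],
          spec: str = "0.5:0.5:1") -> List[Tuple[float, float, float]]:
    """Evaluate every threshold in `spec`, returning (threshold, detection, false alarm) rows."""
    return [(t, *rates_at(scores, labels, t)) for t in parse_thresh(spec)]
```

Sweeping a range such as `0:1:21` gives the points of an ROC-style curve, while a single threshold corresponds to one row of the table that `--table` produces.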

Generating Plots

The plots we used in the poster and paper were generated using the scripts in plotting/.

Running Tests

To run tests locally, run

python -m unittest discover

from the root folder of the repository.
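`python -m unittest discover` searches the start directory (the current directory by default) for modules whose names match the default pattern `test*.py` and runs the `unittest.TestCase` subclasses it finds. Here is a minimal sketch of such a module, e.g. saved as a hypothetical `test_threshold.py`; the `is_attack` helper is illustrative only and does not come from this repository.

```python
import unittest

def is_attack(score: float, thresh: float = 0.5) -> bool:
    """Hypothetical helper under test: flag a score at or above the threshold."""
    return score >= thresh

class IsAttackTest(unittest.TestCase):
    def test_score_above_threshold_is_flagged(self):
        self.assertTrue(is_attack(0.9))

    def test_score_below_threshold_is_not_flagged(self):
        self.assertFalse(is_attack(0.1))

    def test_threshold_is_inclusive(self):
        # A score exactly at the threshold counts as an alarm.
        self.assertTrue(is_attack(0.5, thresh=0.5))
```

No `unittest.main()` call is needed, because discovery imports the module and collects the test cases itself.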