Master's Thesis · Dalhousie University · 2022

Insider Threat Detection with GANs

Insider threats are rare by definition. That rarity is exactly what makes them so hard to detect — there's almost no data to train on. This thesis used Generative Adversarial Networks to manufacture that data.

GANs · WCGAN-GP · Machine Learning · Cybersecurity · Python · Keras · TensorFlow · Random Forest · CERT Dataset
April 2022

The Problem

You Can't Train on Data That Doesn't Exist

Insider threats — disgruntled IT admins sabotaging infrastructure, employees exfiltrating customer data before jumping to a competitor — are a serious and growing concern. They're also extraordinarily rare in any given dataset.

The CERT Insider Threat Dataset is the standard benchmark for this problem. It's a synthetic dataset of 18 months of user activity across a 1,000–4,000 person organization. The class distribution makes the challenge immediately obvious.

CERT R4.2 — Class Distribution

Normal users: 307,057 samples
Insider Scenario 2: 861 samples
Insider Scenario 1: 85 samples
Insider Scenario 3: 20 samples

Insider Scenario 3 has 20 samples against 307,057 normal records — a 15,000:1 imbalance. A classifier that predicts "normal" for everything gets 99.7% accuracy and catches zero insiders.
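The arithmetic behind that claim is worth making concrete. A minimal sketch (class counts taken from the table above):

```python
# Why accuracy is useless here: a classifier that always predicts
# "normal" on CERT R4.2's class distribution.
counts = {
    "normal": 307_057,
    "scenario_2": 861,
    "scenario_1": 85,
    "scenario_3": 20,
}
total = sum(counts.values())
accuracy = counts["normal"] / total   # the always-"normal" predictor
insiders_caught = 0                   # it never flags a single insider
print(f"accuracy = {accuracy:.3%}, insiders caught = {insiders_caught}")
```

This is why the thesis reports macro-averaged F1 rather than accuracy: macro-F1 weights each class equally, so the 20-sample class counts as much as the 307,057-sample one.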

Framework

Three-Stage Pipeline

The thesis proposes an end-to-end insider threat detection system with three components, held constant across all experiments so augmentation strategies could be compared fairly.

01

User Behaviour Representation

Raw logs (HTTP, email, file access, USB, logon) are parsed into daily feature vectors per user. A 30-day trailing window computes percentiles of each day's activity against the user's own history — automatically normalizing for individual baselines. Two feature sets were tested: 100 features (top by Random Forest importance) and 504 features (full set).
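The trailing-window percentile idea can be sketched in a few lines. This is a hypothetical reconstruction of the normalization described above, not the thesis's own code; the function name and the neutral score for the warm-up period are assumptions:

```python
import numpy as np

def percentile_features(daily_counts: np.ndarray, window: int = 30) -> np.ndarray:
    """Score each day's raw activity count against the user's own trailing
    30-day history, as a percentile in [0, 100]."""
    out = np.full(len(daily_counts), 50.0)  # neutral until history exists
    for t in range(window, len(daily_counts)):
        history = daily_counts[t - window:t]
        # fraction of the user's own recent days with less activity than today
        out[t] = 100.0 * np.mean(history < daily_counts[t])
    return out

# A sudden USB-activity spike scores near the 100th percentile for that user,
# even if the same absolute count would be routine for a heavier user.
usb_events = np.array([1, 2, 1, 0, 2] * 7 + [40])
spike_score = percentile_features(usb_events)[-1]
```

Because each user is compared only to their own history, the representation is baseline-free: the same feature pipeline works for a quiet accountant and a busy sysadmin.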

02

Data Augmentation

Four strategies were compared: Baseline (undersample normal class only), SMOTE (linear interpolation between minority samples), CGAN (conditional GAN), and WCGAN-GP (Wasserstein Conditional GAN with Gradient Penalty). The GAN is trained to generate new insider samples per scenario, growing small classes up to a target count n.
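The "grow each class to a target count n" step is strategy-agnostic, which is what makes the comparison fair. A minimal sketch, with hypothetical names (the dummy jitter generator stands in for a trained WCGAN-GP sampler or a SMOTE interpolator):

```python
import numpy as np

def augment_to_target(X_min: np.ndarray, generate_fn, n_target: int) -> np.ndarray:
    """Grow one minority class to n_target rows. generate_fn(n) can be any
    of the compared strategies; this interface is an assumption, not the
    thesis's API."""
    n_needed = max(0, n_target - len(X_min))
    if n_needed == 0:
        return X_min
    return np.vstack([X_min, generate_fn(n_needed)])

# Dummy generator: resample real rows with small Gaussian jitter
rng = np.random.default_rng(0)
X_scenario3 = rng.normal(size=(20, 5))        # 20 real Scenario-3 samples
jitter = lambda n: (X_scenario3[rng.integers(0, 20, n)]
                    + 0.01 * rng.normal(size=(n, 5)))
X_aug = augment_to_target(X_scenario3, jitter, n_target=100)
```

Only the generator changes between experiments; the feature set, target counts, and downstream classifier stay fixed.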

03

Classification

Four classifiers were evaluated: Random Forest, SVM, Logistic Regression, XGBoost. Random Forest consistently outperformed the others and was selected for the full suite of experiments. All results reported as mean ± variance across 10 independent runs.
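The evaluation protocol (10 independent runs, macro-F1, mean ± variance) can be sketched as follows. Synthetic data stands in for the CERT features here; the split sizes and hyperparameters are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Imbalanced 3-class stand-in for the CERT feature vectors
X, y = make_classification(n_samples=2000, n_classes=3, n_informative=6,
                           weights=[0.90, 0.07, 0.03], random_state=0)

scores = []
for seed in range(10):  # 10 independent runs, each with its own split and seed
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_tr, y_tr)
    # macro-F1 weights every class equally, including the rarest
    scores.append(f1_score(y_te, clf.predict(X_te), average="macro"))

print(f"macro-F1 = {np.mean(scores):.3f} ± {np.var(scores):.4f}")
```

Reporting variance matters here: with only 20 support samples in the rarest class, a single lucky or unlucky split can swing the headline number substantially.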

Architecture

Why WCGAN-GP

GANs are notoriously hard to train — mode collapse, vanishing gradients, and no reliable metric to assess progress during training. Three architectures were attempted before landing on a stable solution.

Failed

Vanilla GAN

Training was too unstable to produce usable results. No empirical data captured.

Unstable

CGAN

Discriminator loss collapsed to zero by epoch 6000. Generated samples did not match real distributions.

Stable

WCGAN-GP

Converged reliably across every run and every parameterization. t-SNE confirmed realistic sample distributions.

Why WCGAN-GP is different

Standard GANs train against a Jensen-Shannon divergence signal. When the generator and real distributions don't overlap — as they don't at initialization — the gradient vanishes and training stalls. WCGAN-GP replaces this with the Wasserstein distance (Earth-Mover distance), which provides a useful gradient even when the distributions are far apart. A gradient penalty enforces the required Lipschitz constraint more stably than the weight clipping used in the original WGAN. The result: training converged reliably across every run and every parameterization tested — including scaling to 20,000 epochs.
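The gradient-penalty term is the heart of the stability gain. A hedged TensorFlow sketch of the standard WGAN-GP penalty for a conditional critic (the critic's input signature here is an assumption, not the thesis's implementation):

```python
import tensorflow as tf

def gradient_penalty(critic, real, fake, labels):
    """WGAN-GP penalty: sample points on straight lines between real and
    fake batches and push the critic's gradient norm there toward 1,
    softly enforcing the 1-Lipschitz constraint."""
    batch = tf.shape(real)[0]
    eps = tf.random.uniform([batch, 1], 0.0, 1.0)
    interp = eps * real + (1.0 - eps) * fake
    with tf.GradientTape() as tape:
        tape.watch(interp)
        score = critic([interp, labels])  # conditional critic sees labels too
    grads = tape.gradient(score, interp)
    norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=1) + 1e-12)
    return tf.reduce_mean((norm - 1.0) ** 2)
```

Because the penalty is a soft constraint computed on interpolated samples, it avoids the capacity loss and exploding/vanishing gradients that hard weight clipping introduces.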

Results

Performance Across Three Datasets

The model was trained exclusively on CERT R4.2, then evaluated against R5.2 (one new insider scenario) and R6.2 (two new scenarios, far fewer total insider events) to test robustness.

Metrics are macro-averaged F1 across all classes — including the smallest minority classes — using the WCGAN-GP DeepV1 strategy on the 504-feature set.

CERT R4.2 (trained on this dataset): 0.911
CERT R5.2 (+1 new insider scenario): 0.617
CERT R6.2 (+2 new scenarios, far fewer insider events): 0.614

Minority class detection

WCGAN-GP was particularly strong on Scenario 3 — the class with only 20 training samples. Recall improved to 0.94 on the 504-feature set vs. 0.86 for the baseline.

Novel scenario generalization

The classifier trained on R4.2 successfully detected Scenario 5 in R6.2 — a threat type it had never seen during training — demonstrating real generalization.

Contributions

What Was New

  • First application of WCGAN-GP to insider threat data augmentation.
  • First evaluation of R4.2-trained models against R5.2 and R6.2 for cross-organization robustness.
  • All results reported as mean ± variance across 10 runs — prior work used single runs on a dataset with high variance due to tiny insider class support.
  • Evaluated augmentation up to 100,000 generated samples per class. Prior work stopped at 5,500.
  • Comprehensive breakdown of detection performance per insider scenario — not just aggregate binary metrics.

Honest Assessment

What Didn't Work

Insider Scenarios 2 and 4 had poor detection rates on R5.2 and R6.2. The exact cause isn't clear — the CERT dataset changes its underlying behavioural models between versions, and some scenarios may behave differently enough to break the classifier's learned patterns.

The SMOTE baseline consistently underperformed all other strategies. The t-SNE visualization revealed why: SMOTE generates samples by linear interpolation between existing points, which works poorly for high-dimensional, non-linear insider behaviour distributions.
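SMOTE's limitation is visible directly in its sampling rule: every synthetic point lies on a straight segment between two real minority points. A minimal sketch (function name and k are illustrative, not the thesis's code):

```python
import numpy as np

def smote_sample(X_min: np.ndarray, rng: np.random.Generator, k: int = 5) -> np.ndarray:
    """One SMOTE draw: pick a minority point, one of its k nearest minority
    neighbours, and interpolate linearly between them."""
    i = rng.integers(len(X_min))
    dists = np.linalg.norm(X_min - X_min[i], axis=1)
    neighbours = np.argsort(dists)[1:k + 1]   # k nearest, excluding self
    j = rng.choice(neighbours)
    lam = rng.random()                        # lam in [0, 1)
    return X_min[i] + lam * (X_min[j] - X_min[i])

rng = np.random.default_rng(0)
X_scenario3 = rng.normal(size=(20, 6))        # stand-in minority class
synthetic = smote_sample(X_scenario3, rng)
```

If the true minority distribution is a curved manifold in feature space, those straight segments cut through off-distribution regions — exactly the artifact the t-SNE plots exposed.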

Interactive

GAN Training Visualization

An animated t-SNE visualization showing generated samples converging on real insider distributions across 5,000 training epochs — coming soon.
