Master's Thesis · Dalhousie University · 2022
Insider Threat
GANs
Insider threats are rare by definition. That rarity is exactly what makes them so hard to detect — there's almost no data to train on. This thesis used Generative Adversarial Networks to manufacture that data.
The Problem
You Can't Train on Data That Doesn't Exist
Insider threats — disgruntled IT admins sabotaging infrastructure, employees exfiltrating customer data before jumping to a competitor — are a serious and growing concern. They're also extraordinarily rare in any given dataset.
The CERT Insider Threat Dataset is the standard benchmark for this problem. It's a synthetic dataset of 18 months of user activity across a 1,000–4,000 person organization. The class distribution makes the challenge immediately obvious.
CERT R4.2 — Class Distribution
Insider Scenario 3 has 20 samples against 307,057 normal records — a 15,000:1 imbalance. A classifier that predicts "normal" for everything gets 99.7% accuracy and catches zero insiders.
Framework
Three-Stage Pipeline
The thesis proposes an end-to-end insider threat detection system with three components, held constant across all experiments so augmentation strategies could be compared fairly.
User Behaviour Representation
Raw logs (HTTP, email, file access, USB, logon) are parsed into daily feature vectors per user. Each day's activity is scored as a percentile against a 30-day trailing window of that user's own history, which automatically normalizes for individual baselines. Two feature sets were tested: a 100-feature set (the top features by Random Forest importance) and the full 504-feature set.
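The percentile-of-own-history idea can be sketched in a few lines. This is illustrative Python, not the thesis code; the function names and the neutral 0.5 score for a user with no history yet are my own assumptions:

```python
from collections import deque

def percentile_rank(window, value):
    """Fraction of the trailing window at or below today's value."""
    if not window:
        return 0.5  # assumption: neutral score before any history accumulates
    return sum(1 for v in window if v <= value) / len(window)

def daily_features(counts, window_size=30):
    """Turn one user's raw daily activity counts into
    percentile-of-own-history features, then add today to the window."""
    window = deque(maxlen=window_size)
    feats = []
    for c in counts:
        feats.append(percentile_rank(window, c))
        window.append(c)
    return feats
```

A sudden spike (say, a day of 100 file copies after weeks of 1-3) maps to a percentile near 1.0 regardless of the user's absolute baseline, which is the point of the representation.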
Data Augmentation
Four strategies were compared: Baseline (undersample normal class only), SMOTE (linear interpolation between minority samples), CGAN (conditional GAN), and WCGAN-GP (Wasserstein Conditional GAN with Gradient Penalty). The GAN is trained to generate new insider samples per scenario, growing small classes up to a target count n.
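The augmentation step itself is generator-agnostic. A minimal sketch of growing each scenario class to a target count n, with a `generate` callback standing in for whichever strategy is plugged in (SMOTE, CGAN, WCGAN-GP) — all names here are hypothetical, not from the thesis:

```python
def augment_to_target(samples_by_class, target_n, generate):
    """Grow each minority class up to target_n samples.

    generate(label, k) is a stand-in for any synthetic-sample source
    and must return k new samples for the given class label.
    Classes already at or above target_n are left untouched.
    """
    out = {}
    for label, samples in samples_by_class.items():
        need = max(0, target_n - len(samples))
        out[label] = list(samples) + generate(label, need)
    return out
```

Holding this loop fixed while swapping the `generate` implementation is what makes the four strategies directly comparable.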
Classification
Four classifiers were evaluated: Random Forest, SVM, Logistic Regression, and XGBoost. Random Forest consistently outperformed the others and was selected for the full suite of experiments. All results are reported as mean ± variance across 10 independent runs.
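The mean ± variance protocol is a small loop around the train/evaluate cycle. A sketch assuming a `train_and_score(seed)` callable (hypothetical name) that trains one model with the given seed and returns its F1 score:

```python
import statistics

def evaluate(train_and_score, n_runs=10):
    """Repeat the full train/evaluate cycle with different seeds
    and summarize as (mean, sample variance), as in the thesis."""
    scores = [train_and_score(seed) for seed in range(n_runs)]
    return statistics.mean(scores), statistics.variance(scores)
```

With only 20 samples in the smallest class, a single run's score can swing widely on the train/test split; reporting the spread across 10 runs is what makes comparisons between augmentation strategies trustworthy.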
Architecture
Why WCGAN-GP
GANs are notoriously hard to train — mode collapse, vanishing gradients, and no reliable metric to assess progress during training. Three architectures were attempted before landing on a stable solution.
Vanilla GAN
Training was too unstable to produce usable results. No empirical data captured.
CGAN
Discriminator loss collapsed to zero by epoch 6,000. Generated samples did not match the real distributions.
WCGAN-GP
Converged reliably across every run and every parameterization. t-SNE confirmed realistic sample distributions.
Why WCGAN-GP is different
The standard GAN objective is equivalent to minimizing the Jensen-Shannon divergence. When the generated and real distributions don't overlap, as they don't at initialization, that divergence saturates, the gradient vanishes, and training stalls. WCGAN-GP replaces it with the Wasserstein distance (Earth-Mover distance), which provides a useful gradient even when the distributions are far apart. A gradient penalty enforces the critic's Lipschitz constraint more stably than the weight clipping used in the original WGAN. The result: training converged reliably across every run and every parameterization tested, including scaling to 20,000 epochs.
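A quick way to see why the Wasserstein signal survives non-overlapping support: for equal-sized 1-D samples, the empirical W1 distance is the mean gap between matched quantiles, and it grows linearly with separation instead of saturating. A minimal sketch (illustrative, not thesis code):

```python
def w1_distance(a, b):
    """Empirical 1-D Wasserstein-1 distance for equal-sized samples:
    mean absolute difference between sorted samples (matched quantiles)."""
    assert len(a) == len(b), "equal sample sizes assumed for simplicity"
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)
```

Moving the generated samples twice as far from the real ones doubles the distance, so the critic keeps supplying a direction for the generator to follow; a JS-based discriminator gives an essentially flat (vanishing-gradient) signal once the two distributions stop overlapping.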
Results
Performance Across Three Datasets
The model was trained exclusively on CERT R4.2, then evaluated against R5.2 (one new insider scenario) and R6.2 (two new scenarios, far fewer total insider events) to test robustness.
Metrics are macro-averaged F1 across all classes — including the smallest minority classes — using the WCGAN-GP DeepV1 strategy on the 504-feature set.
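Macro-averaging is what makes the 20-sample class matter: each class's F1 gets equal weight in the mean, so a model that ignores insiders scores badly no matter how well it handles the normal class. A minimal sketch from per-class (TP, FP, FN) counts (illustrative, not thesis code):

```python
def f1(tp, fp, fn):
    """Per-class F1 from true-positive, false-positive, false-negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def macro_f1(per_class_counts):
    """Unweighted mean of per-class F1: a 20-sample insider class
    counts exactly as much as 300k normal records."""
    scores = [f1(*counts) for counts in per_class_counts]
    return sum(scores) / len(scores)
```

A classifier that nails the majority class but misses a minority class entirely is capped at a macro-F1 well below 1.0, which is exactly the behaviour the 99.7%-accuracy degenerate baseline hides.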
Minority class detection
WCGAN-GP was particularly strong on Scenario 3 — the class with only 20 training samples. Recall improved to 0.94 on the 504-feature set vs. 0.86 for the baseline.
Novel scenario generalization
The classifier trained on R4.2 successfully detected Scenario 5 in R6.2 — a threat type it had never seen during training — demonstrating real generalization.
Contributions
What Was New
- First application of WCGAN-GP to insider threat data augmentation.
- First evaluation of R4.2-trained models against R5.2 and R6.2 for cross-organization robustness.
- All results reported as mean ± variance across 10 runs — prior work used single runs on a dataset with high variance due to tiny insider class support.
- Evaluated augmentation up to 100,000 generated samples per class. Prior work stopped at 5,500.
- Comprehensive breakdown of detection performance per insider scenario — not just aggregate binary metrics.
Honest Assessment
What Didn't Work
Insider Scenarios 2 and 4 had poor detection rates on R5.2 and R6.2. The exact cause isn't clear — the CERT dataset changes its underlying behavioural models between versions, and some scenarios may behave differently enough to break the classifier's learned patterns.
The SMOTE baseline consistently underperformed all other strategies. The t-SNE visualization revealed why: SMOTE generates samples by linear interpolation between existing points, which works poorly for high-dimensional, non-linear insider behaviour distributions.
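SMOTE's core step is exactly that linear interpolation, and a minimal sketch makes the limitation concrete: every synthetic point lands on the straight segment between two real minority samples, so for a curved or disconnected insider-behaviour manifold the interpolants can fall off-distribution. Illustrative code, not the thesis implementation:

```python
import random

def smote_sample(x, neighbors):
    """SMOTE's core step: pick one of x's minority-class neighbors
    and interpolate linearly at a random point along the segment."""
    nb = random.choice(neighbors)
    lam = random.random()  # interpolation factor in [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, nb)]
```

A GAN, by contrast, learns the distribution itself and is not constrained to the convex hull of existing samples — which matches the t-SNE observation that WCGAN-GP samples track the real clusters while SMOTE samples smear between them.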
Interactive
GAN Training Visualization
An animated t-SNE visualization showing generated samples converging on real insider distributions across 5,000 training epochs — coming soon.