Data Privacy (Spring 2026) lab syllabus

Group policy

Philosophy

These labs complement in-class quizzes and reflection check-ins. They are designed to be semi-open:

  1. Tooling allowed: You may use documentation and starter code to reduce boilerplate, but you are responsible for correctness and explanations.
  2. Analysis-driven: The “answer” is rarely just code. It is an experimental result, a plot, or a trade-off analysis that proves you understand why the code works.
  3. Iteration required: Most tasks include an “optimizer’s gap” where the first attempt is sub-optimal. Iterate to refine the results.

Lab 1: Privacy attacks (the offense)

Topic: Membership inference and data extraction
Deliverable: Jupyter notebook

  1. Task 1 (CTF Extraction): Extract a hidden flag (UVA{...}) from a “Black Box” model provided by CorpXYZ.
  2. Task 2 (MIA): Implement a membership inference attack using loss/perplexity thresholds to identify which model was trained on a specific canary.
  3. Task 3 (Iteration gap - red teaming):
    • Scenario: The model now has a simple filter that blocks the exact secret string.
    • Challenge: Generate 3 “stealthy” prompts using different strategies (e.g., semantic variations) that bypass the filter but still trick the model into revealing the secret.
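The loss-threshold attack in Task 2 can be prototyped before touching a real model. The sketch below (my own toy example, not provided starter code) assumes you already have per-example losses for known members and non-members; the attack simply flags low-loss examples as likely training members and sweeps a threshold:

```python
import numpy as np

def mia_predict(losses, threshold):
    """Predict membership: examples with loss below the threshold are
    flagged as likely training members (models fit members better)."""
    return losses < threshold

# Toy data: member losses are lower on average than non-member losses.
rng = np.random.default_rng(0)
member_losses = rng.normal(loc=1.0, scale=0.3, size=200)
nonmember_losses = rng.normal(loc=2.0, scale=0.3, size=200)

losses = np.concatenate([member_losses, nonmember_losses])
labels = np.concatenate([np.ones(200), np.zeros(200)])  # 1 = member

# Sweep thresholds and keep the one with the best attack accuracy.
thresholds = np.linspace(losses.min(), losses.max(), 100)
accs = [(mia_predict(losses, t) == labels).mean() for t in thresholds]
best_t = thresholds[int(np.argmax(accs))]
print(f"best threshold={best_t:.2f}, attack accuracy={max(accs):.2f}")
```

In the lab you would replace the synthetic Gaussians with the model's actual loss (or perplexity) on the canary versus held-out text.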

Lab 2: Re-identification & reconstruction (microdata vs. statistics)

Topic: Singling-out, linkage, reconstruction; DP counts as defense
Deliverable: Jupyter notebook

  1. Part 1 (Singling-out): Compute equivalence class sizes for QI tuples; report the k=1 count/fraction and the rarest QI combinations.
  2. Part 2 (Linkage / attribute disclosure): Join a synthetic directory table (IDs + QIs) to the microdata; report unique match rate and basic disclosure stats.
  3. Part 3 (Reconstruction + DP): Reconstruct sensitive bits from published subgroup counts using Z3; then add Laplace/Geometric noise (vary $\epsilon$) and re-run to observe feasibility/instability and utility error.
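For the defense half of Part 3, the Laplace mechanism on subgroup counts is only a few lines. A minimal sketch (the counts here are made up for illustration): with disjoint subgroups, adding or removing one person changes one count by at most 1, so the L1 sensitivity is 1 and the noise scale is $1/\epsilon$.

```python
import numpy as np

def laplace_counts(counts, epsilon, rng):
    """Release subgroup counts under epsilon-DP via the Laplace mechanism.
    Disjoint subgroups => each person affects one count => sensitivity 1,
    so the per-count noise scale is 1/epsilon."""
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=len(counts))
    return np.asarray(counts, dtype=float) + noise

rng = np.random.default_rng(42)
true_counts = np.array([120, 45, 8, 3])  # hypothetical subgroup counts

for eps in (0.1, 1.0, 10.0):
    noisy = laplace_counts(true_counts, eps, rng)
    err = np.abs(noisy - true_counts).mean()
    print(f"epsilon={eps:<4} mean abs error={err:.2f}")
```

Feeding these noisy counts back into the Z3 reconstruction is what makes the solver's constraints inconsistent (or the solutions unstable) as $\epsilon$ shrinks.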

Lab 3: Private training (the defense)

Topic: DP-SGD, privacy accountants
Deliverable: Jupyter notebook

  1. Task A (The Accountant): Calculate the noise_multiplier for a target $(\epsilon, \delta)$ and epoch count.
  2. Task B (Defense): Fine-tune the mini-GPT using Opacus.
  3. Task C (The Audit): Run your Lab 1 attack against your Lab 3 model. (Reference implementation provided).
  4. Task D (Iteration gap - performance competition):
    • Challenge: DP-SGD is highly sensitive to hyperparameters.
    • Goal: Achieve the highest possible validation accuracy for $\epsilon=3.0$.
    • Iteration: Iterate to tune learning rate, batch size, and max grad norm.
  5. Task E (Debug Challenge): Identify the privacy leakage in a provided DP-SGD training loop.
  6. Analysis: Compare the “Loss Landscapes” of non-private vs. private training.
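Before debugging someone else's DP-SGD loop (Task E), it helps to know what one correct step looks like. This is a pure-numpy sketch of the per-example clip / sum / add-noise / average pattern that Opacus automates; the gradient values are synthetic:

```python
import numpy as np

def dp_sgd_step(per_example_grads, max_grad_norm, noise_multiplier, rng):
    """One DP-SGD aggregation step: clip each example's gradient to
    max_grad_norm in L2, sum, add Gaussian noise calibrated to the clip
    norm times the noise multiplier, then average over the batch."""
    norms = np.linalg.norm(per_example_grads, axis=1)
    scale = np.minimum(1.0, max_grad_norm / (norms + 1e-12))
    clipped = per_example_grads * scale[:, None]
    summed = clipped.sum(axis=0)
    noise = rng.normal(0.0, noise_multiplier * max_grad_norm,
                       size=summed.shape)
    return (summed + noise) / len(per_example_grads)

rng = np.random.default_rng(0)
grads = rng.normal(size=(32, 10)) * 5.0  # batch of 32 per-example gradients
g = dp_sgd_step(grads, max_grad_norm=1.0, noise_multiplier=1.1, rng=rng)
print("noisy averaged gradient shape:", g.shape)
```

Common leaks to look for in Task E: clipping the *averaged* gradient instead of per-example gradients, or scaling the noise by the batch-mean norm rather than the fixed clip norm.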

Lab 4: Secure multi-party computation (MPC)

Topic: Secure inference, arithmetic secret sharing
Deliverable: Jupyter notebook

  1. Task A (Warmup): Implement a private linear layer ($Y = XW + B$) using CrypTen.
  2. Task B (Iteration gap - the softmax bottleneck):
    • Challenge: Exact softmax is slow/unstable in MPC.
    • Optimization: Design and test different Polynomial Approximations (e.g., $x^2+x$, Taylor series) for the attention mechanism.
  3. Task C (Performance): Measure the latency and communication overhead of a full Transformer block in MPC.
  4. Task D (Debug Challenge): Find the security flaw in a naive secret sharing implementation.
  5. Analysis: Discuss the “Non-Linearity Tax” – why are LLMs specifically hard to run in MPC compared to CNNs?
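For Task B, a quick plaintext prototype shows why polynomial approximations are attractive: a truncated Taylor series for $e^x$ uses only additions and multiplications, which map directly onto arithmetic secret shares. A sketch (the logits below are my own small example; accuracy degrades quickly once logits leave a narrow range around 0):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def poly_softmax(x, degree=2):
    """MPC-friendly stand-in: replace exp with a truncated Taylor series
    around 0 (1 + x + x^2/2 + ...), then normalize. Only adds/muls, so it
    runs directly on arithmetic secret shares."""
    e = np.ones_like(x)
    term = np.ones_like(x)
    for k in range(1, degree + 1):
        term = term * x / k
        e = e + term
    e = np.maximum(e, 1e-6)  # keep scores positive before normalizing
    return e / e.sum(axis=-1, keepdims=True)

scores = np.array([[0.5, -0.2, 0.1, 0.3]])  # small attention logits
exact = softmax(scores)
approx = poly_softmax(scores, degree=2)
print("max abs error:", np.abs(exact - approx).max())
```

Measuring this error as a function of logit scale and polynomial degree is a good warm-up for the communication/accuracy trade-off you benchmark in Task C.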

Technical setup and infrastructure

1. The “mini transformer”

2. Compute resources