Data Privacy (Spring 2026) lab syllabus
Group policy
- Team size: Each lab should be done in a group of 2.
- Rotation: You are encouraged to form a new group for each lab. This helps you network and exposes you to different coding styles and perspectives.
- Alternative: You may keep the same partner for Labs 1 & 2, and then switch to a new partner for Labs 3 & 4.
- Individual work: You are permitted to work individually, but you will be graded by the same standards as a group of two.
- Submission: Group members must submit identical results (notebooks/reports) and will receive identical grades.
Philosophy
These labs complement in-class quizzes and reflection check-ins. They are designed to be semi-open:
- Tooling allowed: You may use documentation and starter code to reduce boilerplate, but you are responsible for correctness and explanations.
- Analysis-driven: The “answer” is rarely just code. It is an experimental result, a plot, or a trade-off analysis that proves you understand why the code works.
- Iteration required: Most tasks include an “iteration gap” where the first attempt is sub-optimal. Iterate to refine your results.
Lab 1: Privacy attacks (the offense)
Topic: Membership inference and data extraction
Deliverable: Jupyter notebook
- Task 1 (CTF Extraction): Extract a hidden flag (UVA{...}) from a “Black Box” model provided by CorpXYZ.
- Task 2 (MIA): Implement a membership inference attack using loss/perplexity thresholds to identify which model was trained on a specific canary.
- Task 3 (Iteration gap - red teaming):
- Scenario: The model now has a simple filter that blocks the exact secret string.
- Challenge: Generate 3 “stealthy” prompts using different strategies (e.g., semantic variations) that bypass the filter but still trick the model into revealing the secret.
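The loss-threshold idea behind Task 2 can be sketched in a few lines of dependency-free Python. The losses, labels, and threshold below are illustrative placeholders, not outputs of the lab's actual model; the point is only the decision rule: trained-on examples tend to have lower loss, so thresholding per-example loss separates members from non-members.

```python
# Minimal loss-threshold MIA sketch (toy numbers, not the lab model).

def mia_predict(losses, threshold):
    """Predict membership: loss below threshold -> member (True)."""
    return [loss < threshold for loss in losses]

def attack_accuracy(losses, labels, threshold):
    """Fraction of examples whose predicted membership matches the truth."""
    preds = mia_predict(losses, threshold)
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Toy data: members (seen in training) have lower loss than non-members.
member_losses = [0.4, 0.6, 0.5]
nonmember_losses = [2.1, 1.8, 2.4]
losses = member_losses + nonmember_losses
labels = [True] * 3 + [False] * 3

acc = attack_accuracy(losses, labels, threshold=1.0)
print(acc)  # 1.0 -- threshold 1.0 perfectly separates this toy data
```

In practice you would sweep the threshold on a calibration set (or use perplexity instead of raw loss); the lab's canary setup makes the separation far noisier than this toy example.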
Lab 2: Re-identification & reconstruction (microdata vs. statistics)
Topic: Singling-out, linkage, reconstruction; DP counts as defense
Deliverable: Jupyter notebook
- Part 1 (Singling-out): Compute equivalence class sizes for QI tuples; report the k=1 count/fraction and the rarest QI combinations.
- Part 2 (Linkage / attribute disclosure): Join a synthetic directory table (IDs + QIs) to the microdata; report unique match rate and basic disclosure stats.
- Part 3 (Reconstruction + DP): Reconstruct sensitive bits from published subgroup counts using Z3; then add Laplace/Geometric noise (vary $\epsilon$) and re-run to observe feasibility/instability and utility error.
Lab 3: Private training (the defense)
Topic: DP-SGD, privacy accountants
Deliverable: Jupyter notebook
- Task A (The Accountant): Calculate the noise_multiplier for a target $(\epsilon, \delta)$ and epoch count.
- Task B (Defense): Fine-tune the mini-GPT using Opacus.
- Task C (The Audit): Run your Lab 1 attack against your Lab 3 model. (Reference implementation provided).
- Task D (Iteration gap - performance competition):
- Challenge: DP-SGD is highly sensitive to hyperparameters.
- Goal: Achieve the highest possible validation accuracy for $\epsilon=3.0$.
- Iteration: Tune the learning rate, batch size, and max grad norm.
- Task E (Debug Challenge): Identify the privacy leakage in a provided DP-SGD training loop.
- Analysis: Compare the “Loss Landscapes” of non-private vs. private training.
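The core of DP-SGD (Task B), which Opacus implements for you, is per-example gradient clipping followed by calibrated Gaussian noise. A dependency-free sketch of one step, with made-up gradients and hyperparameters (Opacus's real implementation vectorizes this over parameter tensors):

```python
import math
import random

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def clip(grad, max_norm):
    """Scale one example's gradient so its L2 norm is at most max_norm."""
    norm = l2_norm(grad)
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in grad]

def dp_sgd_step(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(col) for col in zip(*clipped)]
    sigma = noise_multiplier * max_norm  # noise std is calibrated to the clip norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    n = len(per_example_grads)
    return [x / n for x in noisy]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]   # toy per-example gradients
step = dp_sgd_step(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Note why Task D's tuning is delicate: the clip norm bounds each example's influence (too small and signal is destroyed, too large and the noise must grow), and larger batches average the fixed noise over more examples.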
Lab 4: Secure multi-party computation (MPC)
Topic: Secure inference, arithmetic secret sharing
Deliverable: Jupyter notebook
- Task A (Warmup): Implement a private linear layer ($Y = XW + B$) using Crypten.
- Task B (Iteration gap - the softmax bottleneck):
- Challenge: Exact softmax is slow/unstable in MPC.
- Optimization: Design and test different Polynomial Approximations (e.g., $x^2+x$, Taylor series) for the attention mechanism.
- Task C (Performance): Measure the latency and communication overhead of a full Transformer block in MPC.
- Task D (Debug Challenge): Find the security flaw in a naive secret sharing implementation.
- Analysis: Discuss the “Non-Linearity Tax”: why are LLMs harder to run in MPC than CNNs?
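The arithmetic secret sharing underlying Task A can be illustrated without Crypten: split a value into two additive shares modulo a fixed modulus, so that adding two secrets is just each party adding its own shares locally. The prime modulus here is a toy choice for illustration (Crypten actually computes over a 64-bit integer ring), and the values are placeholders:

```python
import random

P = 2**61 - 1  # toy prime modulus, chosen only for this sketch

def share(secret, rng):
    """Split `secret` into two additive shares that sum to it mod P."""
    s0 = rng.randrange(P)          # a uniformly random share reveals nothing
    s1 = (secret - s0) % P
    return s0, s1

def reconstruct(s0, s1):
    return (s0 + s1) % P

rng = random.Random(42)
a0, a1 = share(12, rng)
b0, b1 = share(30, rng)

# Each party adds its own shares locally; addition needs no communication.
c0, c1 = (a0 + b0) % P, (a1 + b1) % P
print(reconstruct(c0, c1))  # 42
```

This is also where Task D's naive-implementation flaw typically hides: if a share is not drawn uniformly from the whole ring (e.g., from too small a range), a single share leaks information about the secret. Multiplication, and hence softmax, is what requires interaction, which is the root of Task B's bottleneck.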
Technical setup and infrastructure
- Model: A 5-10M parameter GPT-2 style model (provided in utils/model.py).
- Dataset: TinyStories.
- Codebase: Use the provided skeletons in the labs/ directory.
Compute resources
- Recommended: Google Colab (Free Tier).
- Local: Requires PyTorch + Opacus + Crypten.