# Open InfoSecCAT

**Open Information-Security Computerized Adaptive Testing dataset** — an
anonymized item-response dataset from a real **50-item, 4-option multiple-choice
computer-security exam** taken by **276 students**.

Released to support reproduction of the real-data feasibility analysis in the
accompanying paper on precision-guaranteed adaptive testing (IRT /
Fisher-information stopping rules).

## Files

| File | Rows | Description |
|------|------|-------------|
| `responses.csv` | 276 + header | one row per examinee; `student` (anonymized code) + 50 columns (items `1`–`50`), each cell = chosen option `a`–`d`, **blank = item omitted** |
| `key.csv` | 50 + header | `question_id, correct_answer, num_choices` — the scoring key |
| `LICENSE` | — | MIT License legal text |
| `CITATION.cff` | — | machine-readable citation metadata |

Score a cell as correct iff it equals `key.correct_answer` for that item;
blank/omitted counts as incorrect.

## Quick start

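```python
import csv

# newline="" is the csv-module convention; `with` closes the files promptly.
with open("key.csv", newline="") as f:
    key = {r["question_id"]: r["correct_answer"].strip().lower()
           for r in csv.DictReader(f)}
with open("responses.csv", newline="") as f:
    rows = list(csv.DictReader(f))

def score(r, i):
    """Binary score for student row r, item i (blank -> 0)."""
    return int((r[i] or "").strip().lower() == key[i])
```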
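For IRT work it is usually handier to have the whole 0/1 matrix at once. The following is a minimal sketch of the scoring rule stated above, assuming the column names listed in the Files table. The helper names `load_scored` and `proportion_correct` are illustrative and are not shipped with the dataset.

```python
import csv

def load_scored(responses_path="responses.csv", key_path="key.csv"):
    """Return (student_codes, item_ids, matrix) with matrix[s][j] in {0, 1}."""
    with open(key_path, newline="") as f:
        key = {r["question_id"].strip(): r["correct_answer"].strip().lower()
               for r in csv.DictReader(f)}
    with open(responses_path, newline="") as f:
        rows = list(csv.DictReader(f))
    items = list(key)  # item order as listed in key.csv
    students = [r["student"] for r in rows]
    # A blank (omitted) cell never equals a key letter, so it scores 0.
    matrix = [[int((r.get(i) or "").strip().lower() == key[i]) for i in items]
              for r in rows]
    return students, items, matrix

def proportion_correct(matrix):
    """Classical item difficulty: share of examinees who got each item right."""
    n = len(matrix)
    return [sum(col) / n for col in zip(*matrix)]
```

Calling `load_scored()` from the repository root should yield one row per examinee and one column per item, in the item order given by `key.csv`.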

## Anonymization

- The only personal identifier in the source data — the institutional
  `student_id` — has been **removed** and replaced with a sequential code
  `S001…S276`.
- Row order was **shuffled** (fixed seed) so the anonymized code cannot be
  mapped back to the original ID ordering.
- No names, timestamps, demographics, or free text were ever present.
- **Item wording is not included** — only item numbers and the answer key — so
  the live exam content is not exposed.
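
The relabel-and-shuffle steps listed above can be sketched as follows. This is an illustration only, not the project's actual script: the function name `anonymize`, the seed value, and the choice to shuffle before assigning codes are all assumptions.

```python
import random

def anonymize(rows, seed=0, id_field="student_id"):
    """Shuffle rows with a fixed seed, then swap the ID for a sequential code.

    Assumptions: rows are dicts, `id_field` names the identifier column,
    and `seed` is a placeholder (the real seed is not published).
    """
    shuffled = list(rows)                  # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed -> repeatable order
    width = max(3, len(str(len(shuffled))))
    out = []
    for n, row in enumerate(shuffled, start=1):
        clean = {k: v for k, v in row.items() if k != id_field}
        out.append({"student": f"S{n:0{width}d}", **clean})
    return out
```

Assigning the codes after the shuffle means that `S001` carries no information about where a student appeared in the source ordering.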

The transformation is label-and-order only: the scored response matrix is
identical to the source up to row order (verified: per-item correct counts and
the full multiset of scored rows match), so all IRT results reproduce exactly.
Maintainers with access to the private source can regenerate the files with
`../anonymize_data.py`.
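
A check of the kind described above (matching per-item correct counts and matching multisets of scored rows) could look like the sketch below. The function name `same_up_to_row_order` is hypothetical, and both inputs are assumed to be lists of 0/1 rows.

```python
from collections import Counter

def same_up_to_row_order(source_matrix, anon_matrix):
    """True if two 0/1 matrices contain the same rows, ignoring row order."""
    # Per-item correct counts (column sums).
    per_item_src = [sum(col) for col in zip(*source_matrix)]
    per_item_anon = [sum(col) for col in zip(*anon_matrix)]
    # Multiset of full scored rows; Counter equality ignores ordering.
    rows_src = Counter(tuple(r) for r in source_matrix)
    rows_anon = Counter(tuple(r) for r in anon_matrix)
    return per_item_src == per_item_anon and rows_src == rows_anon
```

The multiset comparison is the stronger of the two tests, since equal row multisets imply equal column sums. The per-item comparison is kept because it is cheap and easy to inspect by eye.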

## Notes / limits

- N = 276 is at the lower edge for stable 2PL calibration; item-parameter
  standard errors are non-trivial.
- The 50-item bank is shallow and runs ~2.4 logits easier than this cohort, so
  it is intentionally **off-target** — that mismatch is the point of the
  feasibility demonstration.
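- Multiple-choice guessing is present (blind guessing on a 4-option item succeeds
  25% of the time) but unmodeled under 2PL, which has no guessing parameter;
  fitting a 3PL model would need roughly 1000 or more examinees.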
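
To see why an off-target bank matters for Fisher-information stopping rules, the standard 2PL item information function, a² · P · (1 − P), can be evaluated at the reported offset. The sketch below assumes a discrimination of `a = 1.0` purely for illustration; it does not use the dataset's calibrated parameters.

```python
import math

def p_correct_2pl(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information_2pl(theta, a, b):
    """Fisher information of one 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Item matched to the examinee vs. an item 2.4 logits too easy (a = 1 assumed).
on_target = item_information_2pl(theta=0.0, a=1.0, b=0.0)
off_target = item_information_2pl(theta=2.4, a=1.0, b=0.0)
print(round(on_target, 3), round(off_target, 3))  # 0.25 0.076
```

Under these assumptions an item 2.4 logits too easy carries roughly 30% of the information of a well-matched one, so more items are needed to reach the same standard error.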

## License & citation

Released under the **MIT License** — see `LICENSE`. You are free to use, modify,
and redistribute the data. If you use this dataset, please cite it via
`CITATION.cff` and cite the accompanying paper.