Open InfoSecCAT
Anonymized Information-Security MCQ Item-Response Dataset
- 276 examinees
- 50 items · 4 options
- binary-scorable
- MIT License
- v1.0.0
An anonymized item-response dataset from a real 50-item, four-option multiple-choice computer-security examination taken by 276 students. It is released to support reproduction of the real-data feasibility analysis in the accompanying paper on precision-guaranteed adaptive testing (IRT / Fisher-information stopping rules).
Download
The release consists of the following files:
| File | Size | Description |
|---|---|---|
| responses.csv | 29 KB | One row per examinee: student (anonymized code) + 50 columns (items 1–50), each cell = chosen option a–d; blank = item omitted. |
| key.csv | 431 B | question_id, correct_answer, num_choices — the scoring key. |
| README.md | 2.6 KB | Full documentation: schema, scoring, anonymization, and limits. |
| LICENSE | 1.0 KB | MIT License legal text. |
| CITATION.cff | 1.6 KB | Machine-readable citation metadata. |
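As a quick sanity check on the layout described in the table, the sketch below parses a responses-style CSV and confirms that every item cell is either blank or one of a–d. The `student` column name follows the table above; the two-item inline sample and the helper name `check_responses` are illustrative assumptions, not part of the release.

```python
import csv
import io

ALLOWED = {"", "a", "b", "c", "d"}  # blank means the item was omitted

def check_responses(handle, id_column="student"):
    """Return (n_examinees, n_items); raise ValueError on an unexpected cell.

    Assumes one column holds the anonymized code and every other
    column is an item, as described in the file table.
    """
    reader = csv.DictReader(handle)
    item_columns = [c for c in reader.fieldnames if c != id_column]
    n_rows = 0
    for row in reader:
        for col in item_columns:
            cell = (row[col] or "").strip().lower()
            if cell not in ALLOWED:
                raise ValueError(
                    f"{row[id_column]}, item {col}: unexpected value {cell!r}"
                )
        n_rows += 1
    return n_rows, len(item_columns)

# Tiny inline stand-in for responses.csv (hypothetical values).
sample = io.StringIO("student,1,2\nS001,a,\nS002,d,b\n")
shape = check_responses(sample)
print(shape)
```

Given the counts stated above, running the same check on the real file, for example `check_responses(open("responses.csv", newline=""))`, would be expected to report 276 examinees and 50 items.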
Scoring
Score a cell as correct iff it equals key.correct_answer for that item; a blank/omitted cell counts as incorrect.
import csv

with open("key.csv", newline="") as f:
    key = {r["question_id"]: r["correct_answer"] for r in csv.DictReader(f)}
with open("responses.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Binary score for student row r, item i (blank -> 0):
#   int(r[i].strip().lower() == key[i])
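The snippet above can be extended into a full 0/1 matrix. The following is a minimal sketch, assuming the item column headers in responses.csv match the `question_id` values in key.csv (the header format is not spelled out here); the inline sample data and the function name `score_matrix` are hypothetical.

```python
import csv
import io

def score_matrix(response_rows, key):
    """Score each row against the key; blank or omitted cells score 0."""
    items = list(key)  # keep the key's item order
    matrix = []
    for row in response_rows:
        scored_row = []
        for i in items:
            cell = (row.get(i) or "").strip().lower()
            # A blank cell is never correct, whatever the key says.
            scored_row.append(int(bool(cell) and cell == key[i].strip().lower()))
        matrix.append(scored_row)
    return items, matrix

# Hypothetical stand-ins for key.csv and responses.csv.
key_csv = io.StringIO(
    "question_id,correct_answer,num_choices\n1,a,4\n2,c,4\n3,b,4\n"
)
resp_csv = io.StringIO("student,1,2,3\nS001,a,c,\nS002,b,c,b\n")

key = {r["question_id"]: r["correct_answer"] for r in csv.DictReader(key_csv)}
rows = list(csv.DictReader(resp_csv))
items, scored = score_matrix(rows, key)

totals = [sum(r) for r in scored]                            # number correct per examinee
p_values = [sum(col) / len(scored) for col in zip(*scored)]  # proportion correct per item
print(totals, p_values)
```

The per-item proportions are a convenient first check before any IRT calibration, since items that nearly everyone answers correctly contribute little to separating examinees.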
Anonymization
- The only personal identifier in the source data, the institutional student_id, has been removed and replaced with a sequential code S001–S276.
- Row order was shuffled (fixed seed) so the anonymized code cannot be mapped back to the original ID ordering.
- No names, timestamps, demographics, or free text were ever present.
- Item wording is not included — only item numbers and the answer key — so the live exam content is not exposed.
The transformation is label-and-order only: the scored response matrix is bit-for-bit identical to the source (verified — per-item correct counts and the full scored-row multiset match), so all IRT results reproduce exactly.
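The anonymization script itself is not part of the file list, so the following is only an illustrative sketch of a label-and-order transformation of this kind, together with the two invariance checks named above (per-item correct counts and the multiset of scored rows). The seed value, the toy data, and the helper names are all assumptions.

```python
import random
from collections import Counter

def anonymize(records, seed=12345):
    """Shuffle rows with a fixed seed, then relabel them S001, S002, ...

    `records` is a list of (student_id, scored_row) pairs. The seed is a
    placeholder, not the value used for the actual release.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return [(f"S{n:03d}", row) for n, (_, row) in enumerate(shuffled, start=1)]

def item_counts(records):
    """Number of correct responses per item."""
    return [sum(col) for col in zip(*(row for _, row in records))]

def row_multiset(records):
    """Multiset of scored rows, ignoring labels and order."""
    return Counter(tuple(row) for _, row in records)

# Toy source data with made-up IDs and scores.
source = [("id-A", [1, 0, 1]), ("id-B", [0, 0, 1]), ("id-C", [1, 1, 1])]
released = anonymize(source)

same_counts = item_counts(source) == item_counts(released)
same_rows = row_multiset(source) == row_multiset(released)
print([code for code, _ in released], same_counts, same_rows)
```

Because only labels and row order change, both checks hold for any seed; they would fail only if a response cell were altered along the way.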
Notes & limits
- N = 276 is at the lower edge for stable 2PL calibration; item-parameter standard errors are non-trivial.
- The 50-item bank is shallow and sits about 2.4 logits below this cohort's ability level, so it is intentionally off-target; that mismatch is the point of the feasibility demonstration.
- Multiple-choice guessing is present but unmodeled under 2PL (3PL would need ~1000+ examinees).
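To make the off-target point concrete, the sketch below evaluates the standard 2PL response probability and item Fisher information, `I(theta) = a^2 * P(theta) * (1 - P(theta))`, for an item matched to the examinee and for one that is 2.4 logits too easy. The discrimination `a = 1` is an assumption chosen for illustration, not an estimate from this dataset.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

a = 1.0      # assumed discrimination (illustrative only)
theta = 0.0  # examinee ability on an arbitrary origin

on_target = item_information(theta, a, b=theta)         # difficulty equals ability
off_target = item_information(theta, a, b=theta - 2.4)  # item 2.4 logits easier

print(round(on_target, 3), round(off_target, 3))
```

Under these assumptions the easier item carries about 30% of the information of a matched one, so more items are needed to reach the same standard error.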
License & citation
Released under the MIT License (see LICENSE) — you are free to use, modify, and redistribute the data. If you use this dataset, please cite it via CITATION.cff and cite the accompanying paper:
- Dataset: Piromsopa, K., & Aksharanandana, P. (2026). Open InfoSecCAT: Anonymized Information-Security MCQ Item-Response Dataset (Version 1.0.0) [Data set].
- Accompanying paper: Piromsopa, K., & Aksharanandana, P. (2026). "Minimal Tests, Reliable Grades: Finite Item-Count Guarantees for Adaptive Examination under Item Response Theory."