Open InfoSecCAT · Item-Response Dataset

An anonymized item-response dataset from a real 50-item, four-option multiple-choice computer-security examination taken by 276 students. It is released to support reproduction of the real-data feasibility analysis in the accompanying paper on precision-guaranteed adaptive testing (IRT / Fisher-information stopping rules).

Download

Download dataset ZIP · 8 KB

Or browse the individual files:

File	Size	Description
responses.csv	29 KB	One row per examinee: `student` (anonymized code) + 50 columns (items `1`–`50`), each cell = chosen option `a`–`d`; blank = item omitted.
key.csv	431 B	`question_id, correct_answer, num_choices` — the scoring key.
README.md	2.6 KB	Full documentation: schema, scoring, anonymization, and limits.
LICENSE	1.0 KB	MIT License legal text.
CITATION.cff	1.6 KB	Machine-readable citation metadata.

Scoring

Score a cell as correct iff it equals key.correct_answer for that item; a blank/omitted cell counts as incorrect.

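import csv

# Scoring key: question_id -> correct option, normalized to lowercase.
with open("key.csv", newline="") as f:
    key = {r["question_id"]: r["correct_answer"].strip().lower()
           for r in csv.DictReader(f)}

# One dict per examinee, keyed by "student" and the item numbers "1".."50".
with open("responses.csv", newline="") as f:
    rows = list(csv.DictReader(f))

def score(r, i):
    """Binary score for student row r, item i (blank -> 0)."""
    return int((r[i] or "").strip().lower() == key[i])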
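
The per-cell rule above extends to a full 0/1 response matrix and to classical per-item proportions correct. The following is a minimal sketch of one way to do that, not part of the released files: the helper names `score_rows` and `item_p_values` are illustrative, and the two inline CSV strings are made-up stand-ins with the same column layout as `key.csv` and `responses.csv` so that the block runs without the dataset.

```python
import csv
import io

def score_rows(rows, key):
    """Return {student: [0/1 per item]} with items in key order; blank -> 0."""
    items = list(key)
    return {
        r["student"]: [int((r.get(i) or "").strip().lower() == key[i]) for i in items]
        for r in rows
    }

def item_p_values(scored):
    """Proportion of examinees answering each item correctly, in key order."""
    vectors = list(scored.values())
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Toy stand-ins (made up) that mimic the layout of the real files.
KEY_CSV = "question_id,correct_answer,num_choices\n1,a,4\n2,c,4\n3,b,4\n"
RESP_CSV = "student,1,2,3\nS001,a,c,d\nS002,a,,b\nS003,b,c,b\n"

key = {r["question_id"]: r["correct_answer"].strip().lower()
       for r in csv.DictReader(io.StringIO(KEY_CSV))}
rows = list(csv.DictReader(io.StringIO(RESP_CSV)))

scored = score_rows(rows, key)      # S002 left item 2 blank, so it scores 0 there
p_values = item_p_values(scored)
```

To run this on the real data, replace the two `io.StringIO(...)` readers with file handles opened on `key.csv` and `responses.csv`.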

Anonymization

The only personal identifier in the source data — the institutional student_id — has been removed and replaced with a sequential code S001–S276.
Row order was shuffled (fixed seed) so the anonymized code cannot be mapped back to the original ID ordering.
No names, timestamps, demographics, or free text were ever present.
Item wording is not included — only item numbers and the answer key — so the live exam content is not exposed.
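
The relabel-and-shuffle steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the script that produced the release: the function name `anonymize`, the seed value, and the choice to shuffle before assigning codes are all assumptions, since the page states only that a fixed seed was used.

```python
import random

def anonymize(rows, id_field="student_id", seed=0):
    """Shuffle rows with a fixed seed, then replace the ID with S001, S002, ...

    `seed=0` is a placeholder; the seed used for the release is not published.
    """
    shuffled = list(rows)                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)      # deterministic for a given seed
    out = []
    for n, row in enumerate(shuffled, start=1):
        responses = {k: v for k, v in row.items() if k != id_field}
        out.append({"student": f"S{n:03d}", **responses})
    return out

# Made-up example rows; the IDs here are not real student IDs.
source_rows = [
    {"student_id": "A9", "1": "a", "2": "c"},
    {"student_id": "B7", "1": "b", "2": ""},
    {"student_id": "C5", "1": "a", "2": "d"},
]
released_rows = anonymize(source_rows, seed=123)
```

Because codes are assigned after the shuffle, the numeric part of a code carries no information about where the row sat in the source file.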

The transformation is label-and-order only: the scored response matrix is bit-for-bit identical to the source (verified — per-item correct counts and the full scored-row multiset match), so all IRT results reproduce exactly.
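
A check of this kind can be sketched as below, assuming both the source and the released data have already been scored into lists of 0/1 rows. The function names are illustrative and the matrices are made-up examples; this is not the verification script used for the release.

```python
from collections import Counter

def per_item_correct(matrix):
    """Number of correct responses per item (column sums of a 0/1 matrix)."""
    return [sum(col) for col in zip(*matrix)]

def same_scored_data(source, released):
    """True if per-item correct counts and the multiset of scored rows both match.

    Equal row multisets already imply equal column sums; the per-item check is
    kept as a separate, easier-to-read diagnostic.
    """
    return (per_item_correct(source) == per_item_correct(released)
            and Counter(map(tuple, source)) == Counter(map(tuple, released)))

# Made-up matrices: `shuffled` is a row permutation of `original`.
original = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [1, 0, 1]]
shuffled = [[1, 1, 0], [1, 0, 1], [1, 0, 1], [0, 1, 1]]
altered  = [[1, 1, 0], [1, 0, 1], [1, 0, 0], [0, 1, 1]]

ok = same_scored_data(original, shuffled)
bad = same_scored_data(original, altered)
```

Comparing multisets rather than ordered lists is what makes the check insensitive to the row shuffle while still catching any change to a scored response.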

Notes & limits

N = 276 is at the lower edge for stable 2PL calibration; item-parameter standard errors are non-trivial.
The 50-item bank is shallow and runs ~2.4 logits easier than this cohort, so it is intentionally off-target — that mismatch is the point of the feasibility demonstration.
Multiple-choice guessing is present but unmodeled under 2PL (fitting a 3PL model would need roughly 1,000 or more examinees).
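
To make the off-target and guessing points above concrete, the sketch below evaluates the standard 2PL response probability and Fisher information, plus the 3PL variant with a lower asymptote. The parameter values are illustrative assumptions (discrimination `a = 1`, a single item, guessing floor `c = 0.25` for four options), not estimates from this dataset.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def p_3pl(theta, a, b, c):
    """3PL probability: lower asymptote c accounts for guessing."""
    return c + (1.0 - c) * p_2pl(theta, a, b)

# Illustrative values only: one item with a = 1, b = 0.
on_target = info_2pl(0.0, 1.0, 0.0)    # examinee ability equals item difficulty
off_target = info_2pl(2.4, 1.0, 0.0)   # item is 2.4 logits easier than the examinee

# With four options, blind guessing gives a floor of 1/4 under 3PL.
floor = p_3pl(-50.0, 1.0, 0.0, 0.25)
```

Under these assumptions an on-target item yields information 0.25, while the same item placed 2.4 logits below the examinee yields roughly 0.076, so more items are needed to reach a given standard error when the bank is too easy.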

License & citation

Released under the MIT License (see LICENSE): you are free to use, modify, and redistribute the data. If you use this dataset, please cite both the dataset (metadata in CITATION.cff) and the accompanying paper.

Dataset:
Piromsopa, K., & Aksharanandana, P. (2026).
Open InfoSecCAT: Anonymized Information-Security MCQ
Item-Response Dataset (Version 1.0.0) [Data set].

Accompanying paper:
Piromsopa, K., & Aksharanandana, P. (2026).
"Minimal Tests, Reliable Grades: Finite Item-Count
Guarantees for Adaptive Examination under Item
Response Theory."