RISC-V Bare-Metal I/O Starter
Supplementary material for Computer Architecture: Design and Analysis (2026 edition) · Krerk Piromsopa, Ph.D. · Chulalongkorn University
This starter project provides the scaffolding for the I/O Activity (Appendix D.4). You will implement a UART driver for a bare-metal RISC-V system — first using a polling approach, then converting it to an interrupt-driven design — and measure the CPU utilization difference between them.
The target is the virt machine in QEMU, which includes a
NS16550A-compatible UART and a PLIC (Platform-Level
Interrupt Controller). No physical hardware is required.
Downloads
Prerequisites
1 — RISC-V toolchain
| OS | Command |
|---|---|
| Debian / Ubuntu | sudo apt install gcc-riscv64-unknown-elf (use with 32-bit flags below) |
| macOS | brew install riscv-gnu-toolchain |
| Windows | Use WSL2 (Ubuntu) and follow the Debian instructions |
If your package manager provides only riscv64-unknown-elf-gcc, edit the Makefile
and set CROSS = riscv64-unknown-elf. The flags -march=rv32ima -mabi=ilp32
already force 32-bit output, so the 64-bit toolchain works fine.
2 — QEMU
Quick Start
Expected output once Exercise 1 is complete and QEMU is running:
Platform Memory Map (QEMU virt)
| Device | Base Address | Notes |
|---|---|---|
| DRAM | 0x8000_0000 | 128 MB default; ELF loaded here |
| UART0 (NS16550A) | 0x1000_0000 | 8-bit registers; byte-wide access |
| PLIC | 0x0C00_0000 | UART0 = source 10 |
| CLINT | 0x0200_0000 | mtime, mtimecmp, msip |
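The memory map above could be captured in a small header along the following lines. This is a minimal sketch, not the contents of platform.h: every macro and function name here is hypothetical. The PLIC register offsets follow the layout in the RISC-V PLIC specification (priority at base + 4 × source, enable words at base + 0x2000 + 0x80 × context, threshold and claim/complete at base + 0x200000 + 0x1000 × context), and the sketch assumes context 0 corresponds to hart 0 in machine mode.

```c
#include <stdint.h>

/* Base addresses from the QEMU virt memory map (illustrative names). */
#define DRAM_BASE   0x80000000UL
#define UART0_BASE  0x10000000UL
#define PLIC_BASE   0x0C000000UL
#define CLINT_BASE  0x02000000UL

#define UART0_IRQ   10u   /* PLIC source number for UART0 */

/* Address of the 32-bit priority register for an interrupt source. */
static inline uintptr_t plic_priority_addr(uint32_t source)
{
    return PLIC_BASE + 4u * source;
}

/* Address of the enable word that holds the bit for `source` in `context`. */
static inline uintptr_t plic_enable_addr(uint32_t context, uint32_t source)
{
    return PLIC_BASE + 0x2000u + 0x80u * context + 4u * (source / 32u);
}

/* Bit mask for `source` within its enable word. */
static inline uint32_t plic_enable_bit(uint32_t source)
{
    return 1u << (source % 32u);
}

/* Address of the priority threshold register for `context`. */
static inline uintptr_t plic_threshold_addr(uint32_t context)
{
    return PLIC_BASE + 0x200000u + 0x1000u * context;
}

/* Claim/complete register sits directly after the threshold register. */
static inline uintptr_t plic_claim_addr(uint32_t context)
{
    return plic_threshold_addr(context) + 4u;
}
```

Under these assumptions, the UART0 priority register lands at 0x0C00_0028 and its enable bit is bit 10 of the first enable word for context 0.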
UART Register Summary
| Offset | Name | Key Bits |
|---|---|---|
| +0 | RBR (read) / THR (write) | bits 7:0 = received / transmit data |
| +1 | IER | bit 0 = ERBFI (RX interrupt enable) |
| +3 | LCR | bit 7 = DLAB; 0x03 = 8N1 |
| +5 | LSR | bit 0 = DR (RX data ready); bit 5 = THRE (TX ready) |
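The register table above might translate into C roughly as follows. This is a hedged sketch with hypothetical names (`uart_init_sketch`, the `UART_*` macros); it is not the suggested solution from uart.c. It takes the UART base as a pointer argument so the same code can be exercised against an ordinary byte array on a host machine, and it assumes the emulated UART does not need a baud divisor programmed.

```c
#include <stdint.h>

/* Register offsets from the summary table (byte-wide registers). */
#define UART_RBR 0u   /* receive buffer (read)  */
#define UART_THR 0u   /* transmit holding (write) */
#define UART_IER 1u   /* interrupt enable */
#define UART_LCR 3u   /* line control */
#define UART_LSR 5u   /* line status */

/* Bit definitions from the same table. */
#define UART_IER_ERBFI 0x01u  /* RX interrupt enable */
#define UART_LCR_DLAB  0x80u  /* divisor latch access bit */
#define UART_LCR_8N1   0x03u  /* 8 data bits, no parity, 1 stop bit */
#define UART_LSR_DR    0x01u  /* RX data ready */
#define UART_LSR_THRE  0x20u  /* TX holding register empty */

/* Byte-wide accessors; volatile keeps the compiler from caching MMIO reads. */
static inline void uart_write_reg(volatile uint8_t *base, unsigned off, uint8_t v)
{
    base[off] = v;
}

static inline uint8_t uart_read_reg(volatile uint8_t *base, unsigned off)
{
    return base[off];
}

/* Hypothetical init for polling mode: interrupts off, 8N1, DLAB cleared. */
static void uart_init_sketch(volatile uint8_t *base)
{
    uart_write_reg(base, UART_IER, 0x00);
    uart_write_reg(base, UART_LCR, UART_LCR_8N1);
}
```

On the target, the base pointer would be the UART0 address from the memory map cast to `volatile uint8_t *`.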
Exercise Overview
Implement uart_init(), uart_putc(), and uart_getc()
in uart.c. The driver polls the LSR status bits in a loop. Measure the fraction of
CPU time spent busy-waiting versus transferring data.
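One way to picture the polling loops and the busy-wait measurement is the sketch below. The function names and the spin counters are hypothetical and are not taken from uart.c. Counting loop iterations is only a rough proxy for wasted cycles; reading a cycle counter around the loop would be a more precise alternative.

```c
#include <stdint.h>

#define LSR_OFFSET 5u      /* line status register offset */
#define LSR_DR     0x01u   /* bit 0: RX data ready */
#define LSR_THRE   0x20u   /* bit 5: TX holding register empty */

static uint32_t tx_spins;  /* loop iterations spent waiting for THRE */
static uint32_t rx_spins;  /* loop iterations spent waiting for DR */

/* Busy-wait until the transmitter is ready, then write one byte to THR. */
static void uart_putc_polled(volatile uint8_t *base, char c)
{
    while ((base[LSR_OFFSET] & LSR_THRE) == 0)
        tx_spins++;
    base[0] = (uint8_t)c;
}

/* Busy-wait until a byte has arrived, then read it from RBR. */
static char uart_getc_polled(volatile uint8_t *base)
{
    while ((base[LSR_OFFSET] & LSR_DR) == 0)
        rx_spins++;
    return (char)base[0];
}
```

Comparing `tx_spins` and `rx_spins` against the number of characters transferred gives a first estimate of how much of the loop time is pure waiting.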
Enable the UART RX interrupt via the PLIC, write a trap handler, and implement a ring-buffer ISR. The main loop computes Fibonacci numbers while the ISR enqueues received characters. Compare CPU cycles per character with Exercise 1.
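The ring buffer shared between the ISR and the main loop could look something like this. It is a minimal single-producer, single-consumer sketch with hypothetical names and an assumed buffer size of 64 bytes; the starter code may structure it differently. It relies on the ISR being the only writer of `rx_head` and the main loop being the only writer of `rx_tail`, on a single hart.

```c
#include <stdint.h>
#include <stdbool.h>

#define RX_BUF_SIZE 64u   /* must be a power of two for the index mask */

static volatile uint8_t  rx_buf[RX_BUF_SIZE];
static volatile uint32_t rx_head;  /* advanced only by the ISR */
static volatile uint32_t rx_tail;  /* advanced only by the main loop */

/* Called from the ISR. Returns false and drops the byte if the buffer is full. */
static bool rx_enqueue(uint8_t byte)
{
    uint32_t head = rx_head;
    if (head - rx_tail == RX_BUF_SIZE)
        return false;
    rx_buf[head & (RX_BUF_SIZE - 1u)] = byte;
    rx_head = head + 1u;   /* publish only after the data is stored */
    return true;
}

/* Called from the main loop. Returns false if no byte is waiting. */
static bool rx_dequeue(uint8_t *out)
{
    uint32_t tail = rx_tail;
    if (tail == rx_head)
        return false;
    *out = rx_buf[tail & (RX_BUF_SIZE - 1u)];
    rx_tail = tail + 1u;
    return true;
}
```

The indices run freely and are masked only on access, so full and empty states can be told apart without sacrificing a slot. The main loop would call `rx_dequeue()` between Fibonacci iterations, while the trap handler calls `rx_enqueue()` with the byte read from RBR.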
Fill in the comparison table in the textbook: CPU cycles per character, average CPU utilization at 5 char/s and 100 kchar/s, worst-case RX latency, and code size.
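For the utilization columns, a simple model is utilization = (cycles per character × characters per second) / clock frequency. The helper below is a sketch of that arithmetic only; the function name is hypothetical, and the clock frequency you plug in is an assumption you must choose yourself, since QEMU does not model real cycle timing.

```c
/* Fraction of CPU time consumed by I/O under a simple linear model.
 * The result is clamped to 1.0 because the CPU cannot be more than fully busy. */
static double cpu_utilization(double cycles_per_char,
                              double chars_per_sec,
                              double cpu_hz)
{
    double u = cycles_per_char * chars_per_sec / cpu_hz;
    return u > 1.0 ? 1.0 : u;
}
```

As an illustration with made-up numbers, 200 cycles per character at 5 char/s on an assumed 10 MHz clock gives a utilization of 0.0001, while the same cost at 100 kchar/s would exceed the available cycles and clamps to 1.0.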
If your platform includes a DMA controller, modify the TX path to use bulk DMA transfers and repeat the comparison. Determine at what input rate DMA becomes more efficient than the interrupt-driven approach.
File Structure
Start with platform.h to understand the register
layout, then work through uart.c top-to-bottom following the TODO comments.
Each TODO includes a suggested implementation in comments so you can check your reasoning.