S3.0 Multicore processor

S3.0 is a multicore processor that uses the S2.1 instruction set. Each core has its own "program" memory, but all cores share the "data" memory. Instruction fetches therefore never conflict, because each core reads from its own program memory. Data memory is shared, so access conflicts can occur there. Initially, however, we assume there are no data memory access conflicts.

Each core has its own hardware and software interrupts. The interrupt vector of core N is at M[1000+N].

When programming a multicore processor, we write the assembly code for all cores in one file, so it is similar to writing a single-core program. A special instruction, "cid r1" (core id), returns the core number in r1. This instruction is used to tell the cores apart, so that different cores can be assigned to execute different parts of the code. Here is an example:

:main
   cid r3             ;; core number in r3
   eq r2 r3 #0
   jt r2 program1
   jmp program2
:program1
   ;; this is run by core 0
   ....
:program2
   ;; this is run by core 1
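
The branch on the core id can also be traced in a small Python model. This is only an illustrative sketch: the function name main and the returned labels mirror the assembly example, and the two-core loop is an assumption made for the illustration.

```python
def main(core_id):
    """Every core enters the same code; only the core id differs."""
    r3 = core_id          # cid r3         (core number)
    r2 = int(r3 == 0)     # eq r2 r3 #0    (1 if this is core 0)
    if r2:                # jt r2 program1
        return "program1"
    return "program2"     # jmp program2

# Assumed two-core configuration, as in the example.
paths = [main(core_id) for core_id in range(2)]
print(paths)
```

Core 0 takes the jump to program1 and every other core falls through to the jmp, so with more than two cores all cores except core 0 would share program2.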

Coordinating multiple cores

To make multiple cores work together we need semaphores (as on a single core) and a new instruction, "sync", that synchronises all cores.

First, how do we block the current process? A processor can stop itself, but how does it then resume? Once a processor has stopped, it can only be woken up by an external signal, such as an interrupt.

A new instruction, "wait for interrupt" (wfi), behaves as if the core interrupts itself and then goes into a sleep state:

R[31] = next PC, stop execution

When an interrupt occurs, the core behaves as if it returns from an interrupt and continues execution:

PC = R[31], continue execution
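
The two register transfers above can be modelled in a few lines of Python. This is a minimal sketch, not the simulator's code: the class Core and its field names are hypothetical, and "next PC" is assumed to be PC + 1 (one word per instruction). An interrupt arriving while the core is awake is outside the scope of this model and is simply ignored.

```python
class Core:
    """Hypothetical model of one core, just enough to show wfi."""

    def __init__(self):
        self.pc = 0
        self.r = [0] * 32       # R[0]..R[31]
        self.asleep = False

    def wfi(self):
        self.r[31] = self.pc + 1   # R[31] = next PC (assumed to be PC + 1)
        self.asleep = True         # stop execution

    def interrupt(self):
        if self.asleep:
            self.pc = self.r[31]   # PC = R[31]
            self.asleep = False    # continue execution

core = Core()
core.pc = 7          # arbitrary address of the wfi instruction
core.wfi()
saved = core.r[31]
core.interrupt()
```

After the interrupt the core resumes at the instruction following wfi, which is the behaviour the semaphore relies on.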

With this instruction we can implement a semaphore as follows:

wait(sem)
   M[sem]--                         decrement count
   if( M[sem] < 0 )
      enqueue current process
      block current process (wfi)

Blocking the current process is achieved by "wfi".

signal(sem)
   if( M[sem] < 1 )
      M[sem]++                      increment count
   if( M[sem] <= 0 )
      p = dequeue()
      send interrupt to p (intx p)

Sending an interrupt to p uses a new instruction, "intx r1", where p = R[r1] is the core number, 0..NC-1 (NC is the number of cores).
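
The wait and signal pseudocode can be exercised with a small Python model. This is a sketch under stated assumptions: the class Semaphore, the set named sleeping (cores currently in wfi) and the two-core scenario are all hypothetical, and each call runs to completion without interference, in line with the assumption of no data memory access conflicts.

```python
from collections import deque

class Semaphore:
    def __init__(self, count):
        self.count = count       # plays the role of M[sem]
        self.queue = deque()     # cores blocked on this semaphore

sleeping = set()                 # cores currently in wfi

def wait(sem, core):
    sem.count -= 1                    # M[sem]--
    if sem.count < 0:
        sem.queue.append(core)        # enqueue current process
        sleeping.add(core)            # block it (wfi)

def signal(sem):
    if sem.count < 1:
        sem.count += 1                # M[sem]++
    if sem.count <= 0:
        p = sem.queue.popleft()       # p = dequeue()
        sleeping.discard(p)           # wake core p (intx p)

sem = Semaphore(1)
wait(sem, 0)                     # core 0 enters, count becomes 0
wait(sem, 1)                     # core 1 blocks, count becomes -1
blocked = set(sleeping)
signal(sem)                      # core 0 leaves and wakes core 1
after_first_signal = (sem.count, set(sleeping))
signal(sem)                      # core 1 leaves, count returns to 1
```

The guard M[sem] < 1 means the count never rises above 1, so the pseudocode as written behaves like a binary semaphore.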

What happens if "intx r1" is sent to a core that is not in "wfi"? For a consistent meaning, core p should behave as if it had been interrupted. To keep things simple, however, the interrupt can just be ignored when core p is not in "wfi". We must be careful not to send "intx" to a core that has already stopped (rather than executing "wfi"), as it would continue by executing an unknown instruction.

Process synchronisation

We assume all-process synchronisation. Each process executes the "sync" instruction and puts itself to sleep. When all cores have executed "sync", every core is woken up and continues. If only some of the processes need to be synchronised, the mechanism becomes more complicated: we need to know which processes, and how many of them, want to synchronise. A core that is idle can execute "sync" so that it participates properly with the active cores when they "sync". We can also synchronise a pair of cores using a semaphore. Here is an example:

p1()                    p2()
   i = 0                   i = 10
   while i < 5             while i < 20
      print i                 print i
      i++                     i++
   sync                    sync
   stop                    stop

p1 runs a loop to print 0..4. p2 runs a loop to print 10..19. p1 reaches "sync" before p2 and waits there. When p2 catches up at "sync", both cores proceed to "stop".

The implementation of sync uses "runflag[core]" to control the starting and stopping of cores. Here is the pseudocode (NC is the number of cores; the current core is k):

sync()
   runflag[k] = 0                 // stop this core
   for(a=0, i=0; i<NC; i++)       // check all cores
      if(runflag[i] == 0) a++
   if(a == NC)                    // all cores have stopped
      for(i=0; i<NC; i++)
         runflag[i] = 1           // start all cores
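
To see the runflag mechanism at work, the p1/p2 example can be run in a lock-step Python model. This is a sketch, not the simulator's implementation: the generator process, the driver run and the choice of one loop iteration per simulated step are assumptions made for the illustration.

```python
NC = 2
runflag = [1] * NC

def sync(k):
    runflag[k] = 0                                   # stop this core
    a = sum(1 for i in range(NC) if runflag[i] == 0) # count stopped cores
    if a == NC:                                      # all cores have stopped
        for i in range(NC):
            runflag[i] = 1                           # start all cores

def process(k, start, end, out):
    for i in range(start, end):
        out.append(i)          # print i
        yield                  # one loop iteration per simulated step
    sync(k)
    yield                      # the core is held here until runflag[k] is 1
    out.append("stop")

def run():
    out = [[], []]
    procs = [process(0, 0, 5, out[0]), process(1, 10, 20, out[1])]
    stop_step = [None] * NC
    step = 0
    while None in stop_step:
        step += 1
        for k in range(NC):
            if stop_step[k] is not None or runflag[k] == 0:
                continue       # finished, or waiting at sync
            try:
                next(procs[k])
            except StopIteration:
                stop_step[k] = step
    return out, stop_step

out, stop_step = run()
print(out[0], out[1], stop_step)
```

In this model core 0 finishes its loop first and is held with runflag[0] = 0. When core 1 executes sync, it is the last core to arrive, so it resets every flag and both cores reach "stop" on the same step.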

Using the simulator (sim30)

The simulator has the following commands:

a - show all cores
g - go
t - single step
b ads - set breakpoint
c n - focus core n
s [rn,mn,pc] v - set
d ads n - dump
r - show register
q - quit
h - help

Most commands are similar to those of sim21. To avoid information overload when using the simulator, we "focus" on one core at a time. The command "c 0" sets the focus to core 0 (call it the current core). The internal state displayed is then that of the current core. The command "a" displays information for all cores. On every step each core executes one instruction, so the simulation really does run the cores concurrently.

When one core executes "stop" it halts, but the other cores continue. The command "g" starts execution from the current PC (with the focus on the current core) and runs until "stop". If another core is still running, we can refocus on that core and continue with "g". Here is an example in which core0 stops before core1.

C:\s30\test>sim30 count.obj

      load program, last address 23

      >c 0

      >g

      <1> [11] <2> [12] <3> [13] <4> [14] <5> [15]
      stop, execute 40 instructions

      >c 1

      >g

      [16] [17] [18] stop, execute 63 instructions

      >

core0 runs a program that counts 1..5 (shown as <1> ...) concurrently with core1, which runs a program that counts 11..18 (shown as [11] ...). First we focus on core0; after "g" it runs to "stop". Then we refocus on core1 and "g" from the current PC. core1 continues to run until "stop".

Here is another example. First we enter "a" to display both cores. Then "t" single-steps both cores. Initially both cores execute the same code (observe that their PCs are the same). After a few steps they diverge and execute different parts of the code (core0 at PC 10, core1 at PC 18).

C:\s30\test>sim30 count.obj

      load program, last address 23

      >a

      >t

      core 0: PC   0 trap 5

      r1:0 r2:0 r3:0 r4:0 r5:0 r6:0 r7:0 r8:0 r9:0

      r20:0 r21:0 r22:0 r23:0 r24:0 r25:0 r26:0 r27:0 r28:0 r29:0

      core 1: PC   0 trap 5

      r1:0 r2:0 r3:0 r4:0 r5:0 r6:0 r7:0 r8:0 r9:0

      r20:0 r21:0 r22:0 r23:0 r24:0 r25:0 r26:0 r27:1 r28:0 r29:0

      >t

      . . .

      >t

      core 0: PC  10 eq r6 r5 #5

      r1:1 r2:1 r3:0 r4:0 r5:1 r6:0 r7:0 r8:0 r9:0

      r20:0 r21:0 r22:0 r23:0 r24:0 r25:0 r26:0 r27:0 r28:0 r29:0

      core 1: PC  18 mv r1 r5

      r1:11 r2:0 r3:0 r4:0 r5:11 r6:0 r7:0 r8:0 r9:0

      r20:0 r21:0 r22:0 r23:0 r24:0 r25:0 r26:0 r27:1 r28:0 r29:0

      >

The simulation has two modes: single step and trace. Single step executes one instruction and returns to the user. Trace executes until "stop" or until a breakpoint is reached. ("t" single step, "g" trace)

The display can be focused on one core or show all cores. With one core in focus, the trace stops when that core stops. With all cores shown, the trace stops when all cores have stopped. ("c 0" focus on core 0, "a" all cores)

The interrupt vector of each core is stored at M[1000+cid], so the interrupt vector of core0 is M[1000], that of core1 is M[1001], and so on. Initially all interrupts are disabled. A single interrupt timer always runs in the background. When it times out, an interrupt event occurs and the processor jumps to the interrupt service routine. The time-out is set by DEL in the simulator, which must be recompiled to change it. T is the global clock; it ticks once per instruction execution. The timer times out every DEL instructions.
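
The vector addresses and the timer period can be worked through with a short Python sketch. The names IV_BASE, iv_address and timeouts are hypothetical, DEL = 100 is a made-up value (the real one is fixed in the simulator source), and the model assumes T starts at 0 and the timer fires whenever T reaches a multiple of DEL.

```python
IV_BASE = 1000                 # interrupt vectors start at M[1000]

def iv_address(cid):
    """Address of the interrupt vector of core cid: M[1000+cid]."""
    return IV_BASE + cid

def timeouts(DEL, steps):
    """Values of the global clock T at which the timer times out."""
    fired = []
    T = 0
    for _ in range(steps):
        T += 1                 # T ticks once per instruction execution
        if T % DEL == 0:       # assumed: time-out every DEL instructions
            fired.append(T)
    return fired

vectors = [iv_address(cid) for cid in range(2)]
fired = timeouts(DEL=100, steps=350)   # DEL=100 is an assumed value
print(vectors, fired)
```

With these assumed numbers, 350 executed instructions give three timer interrupts, at T = 100, 200 and 300.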

When "wfi" is executed, the PC of the current core stops advancing, but the core continues to listen for the interrupt signal. When an interrupt occurs, that core continues its execution. In single-step mode, the simulator continues to step and shows T, but no instruction is executed.

Example programs

count.txt     use two cores to run two processes
wait.txt          use wfi and intx to synchronise two cores
sync.txt         use semaphore to synchronise two cores
sync2.txt       use sync to sync two cores

Summary

There are new instructions to support collaboration between cores. All of them must be real instructions; they cannot be written as a sequence of other instructions because they control the starting and stopping of cores.

   cid r1
   wfi
   intx r1
   sync

Update for s30-2 package

R[0] is now free, unlike in S2.1 where it is always zero. The return address is now in R[31] instead of in a special register. These instructions are now built in (not in "trap"):

      ei, di, pushm, popm, cid, wfi, intx, sync.

Some instructions have been renamed:


      pushm is savr, popm is resr.

Some instructions are no longer needed: savt, rest (as R[31] can be moved explicitly with mv). Some OS support functions are still trap functions: newsem, wait, signal. Finally, interrupts now work for all cores.

last update 1 May 2016