S3.0 Multicore processor

S3.0 Instruction Set

S3.0 is a multicore processor with the S2.1 instruction set. Each core has its own "program" memory, but all cores share the "data" memory. Instruction fetches therefore never conflict, since each core reads from its own program memory. Accesses to the shared data memory can conflict; initially, however, we assume that no data memory access conflict occurs.

Each core has its own hardware and software interrupts. The interrupt vector of core N is at M[1000+4*N]. Each core supports four interrupt signals.
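
The address arithmetic above can be sketched in a few lines of Python. The base address and the spacing of four words per core come from the text. Placing int1..int3 in the three words after int0 is an assumption of this sketch, and the function name is hypothetical.

```python
VECTOR_BASE = 1000      # from the text: core N's vector is at M[1000 + 4*N]
INTS_PER_CORE = 4       # four interrupt signals per core

def vector_address(core, intnum=0):
    """Return the address of the memory word that holds the ISR address.

    The int0 address follows the text. Putting int1..int3 in the next
    three words is an assumption made for illustration.
    """
    if not 0 <= intnum < INTS_PER_CORE:
        raise ValueError("interrupt number must be 0..3")
    return VECTOR_BASE + INTS_PER_CORE * core + intnum

print(vector_address(0))   # 1000
print(vector_address(1))   # 1004
```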

When we write a program for a multicore processor, we write the assembly code for all cores in one file, so it is similar to writing a program for a single core. A special instruction, "cid r1" (core id), returns the core number in the given register. It is used to tell the cores apart so that different cores can be assigned to execute different parts of the code. Here is an example:

:main
   cid r3            ; core number in r3
   eq r2 r3 #0
   jt r2 program1
   jmp program2
:program1
   ; this is run by core 0
   ....
:program2
   ; this is run by core 1

Coordinating the cores

To make the cores work collaboratively we need semaphores (as on a single core) and a new all-core "sync" instruction for synchronizing cores. Semaphores are used to communicate between cores.

How do we block the current process? A processor needs to stop itself and later resume. The "wait for interrupt" instruction achieves this. When a processor stops, it can be woken up by an interrupt signal, which can be issued by a software interrupt instruction.

The instruction "wait for interrupt" (wfi) forces the processor into a sleep state. R[31] is used to save the "continuation point" (return address).

R[31] = next PC, stop execution

When an interrupt occurs, the core behaves as on a return from an interrupt.

PC = R[31], continue execution

Sending an interrupt to another core (to match its wfi) is done by a new instruction, "intx #c", where c is the core number 0..NC-1 (NC is the number of cores).

What happens if "intx #c" is sent to a core that is not in "wfi"? Core c should behave as if it is interrupted. To simplify, we just ignore the signal if that core is not in "wfi". Still, we have to be careful not to send "intx" to a core that has already terminated (and is not in "wfi"), as it would treat the signal as an interrupt and continue to execute unintended instructions as an interrupt service routine.
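
The wfi/intx behaviour described so far can be modelled in Python as a minimal sketch. The class and function names are hypothetical, "next PC" is assumed to be PC+1, and the jump through the interrupt service routine is left out, so waking a core simply restores PC from R[31]. The sketch uses the simplification stated above: a signal to a core that is not in "wfi" is ignored.

```python
class Core:
    """Hypothetical model of one core: PC, registers, and a sleep flag."""
    def __init__(self, cid):
        self.cid = cid
        self.pc = 0
        self.r = [0] * 32          # R[31] holds the continuation point
        self.sleeping = False

    def wfi(self):
        # R[31] = next PC, stop execution (next PC assumed to be PC + 1)
        self.r[31] = self.pc + 1
        self.sleeping = True

def intx(cores, c):
    """Send int0 to core c. Returns True if the core was woken up."""
    target = cores[c]
    if not target.sleeping:
        return False               # simplification: ignore the signal
    target.sleeping = False
    target.pc = target.r[31]       # PC = R[31], continue execution
    return True

cores = [Core(0), Core(1)]
cores[0].pc = 7
cores[0].wfi()
print(cores[0].sleeping, cores[0].r[31])   # True 8
print(intx(cores, 0), cores[0].pc)         # True 8
print(intx(cores, 0))                      # False (core 0 is no longer in wfi)
```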

"intx #c" is different from "int #n" because "intx #c" generates an interrupt signal (int0) to another core but "int #n" generates (int0..3) of its own core.

Process synchronisation

All-core synchronisation

We assume all-process synchronisation. Each process executes the "sync" instruction and puts itself to sleep. When all cores have executed "sync", every core is woken up to continue. If only some processes need to be synchronised, the mechanism becomes more complicated: we need to know which processes, and how many of them, want to synchronise. An idle core can execute "sync" so that it participates properly with the other active cores when they "sync". A pair of cores can also be synchronised using a semaphore. Here is an example of all-core synchronisation:

   p1()                 p2()
     i = 0                i = 10
     while i < 5          while i < 20
       print i              print i
       i++                  i++
     sync                 sync
     stop                 stop

Assume p1 and p2 run on different cores. p1 runs a loop that prints 0..4. p2 runs a loop that prints 10..19. p1 reaches the end sooner than p2, but it waits there at "sync". When p2 catches up and executes "sync", both cores proceed to stop.

The implementation of sync uses "runflag[core]" to control the start/stop of cores. Here is the pseudo code (NC is the number of cores, the current core is k):

sync()
   runflag[k] = 0               // stop this core
   a = 0
   for(i=0; i<NC; i++)          // check all cores
      if(runflag[i] == 0) a++
   if(a == NC)                  // all cores have stopped
      for(i=0; i<NC; i++)
         runflag[i] = 1         // continue execution
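
The pseudo code can be tried out with a small lock-step simulation in Python. This is a sketch under assumptions, not the simulator's code: each core is a generator, one loop iteration counts as one time step, and the names process and run_barrier are hypothetical. Only sync follows the pseudo code directly.

```python
NC = 2  # number of cores in this sketch

def sync(runflag, k):
    """Core k stops; the last core to arrive restarts every core."""
    runflag[k] = 0                                  # stop this core
    if sum(1 for f in runflag if f == 0) == NC:     # all cores have stopped
        for i in range(NC):
            runflag[i] = 1                          # continue execution

def process(runflag, k, start, end, out):
    """Program of core k; each yield ends one simulated time step."""
    i = start
    while i < end:
        out.append(i)                               # print i
        i += 1
        yield
    sync(runflag, k)
    yield
    while runflag[k] == 0:                          # asleep until all cores sync
        yield
    out.append(f"core{k} stop")

def run_barrier():
    runflag = [1] * NC
    out = []
    live = [process(runflag, 0, 0, 5, out), process(runflag, 1, 10, 20, out)]
    while live:
        for p in list(live):                        # step every live core once
            try:
                next(p)
            except StopIteration:
                live.remove(p)
    return out

print(run_barrier()[-3:])   # [19, 'core0 stop', 'core1 stop']
```

Core 0 finishes its loop first, but its "stop" appears only after core 1 has printed 19 and executed sync.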

Pair of cores synchronisation

Using "wfi" and the inter-core interrupt "intx #core", two cores can synchronise. One core executes "wfi" to wait for an interrupt. The other core issues "intx #core" to raise int0 on #core. When the first core is interrupted, it behaves as if interrupt 0 has occurred and jumps to the interrupt service routine. The interrupt service routine can be an empty routine that just returns. This creates a synchronisation point for both cores.

   interrupt()     // empty ISR

   p1()                 p2()
     i = 0                i = 10
     while i < 5          while i < 20
       wfi                  intx #p1
       print i              print i
       i++                  i++
     stop                 stop
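
The pairing of "wfi" with "intx" in this example can be traced with a lock-step sketch in Python. The assumptions are loud ones: every statement takes one time step, core 0 is stepped before core 1 within a step, the empty ISR is skipped, and an "intx" to a core that is not in "wfi" is ignored. All names are hypothetical.

```python
class CoreState:
    def __init__(self):
        self.sleeping = False

def p1(me, out):
    i = 0
    while i < 5:
        me.sleeping = True          # wfi: sleep until int0 arrives
        yield
        out.append(i)               # print i
        yield
        i += 1                      # i++
        yield

def p2(other, out):
    i = 10
    while i < 20:
        if other.sleeping:          # intx #p1 (ignored if p1 is not in wfi)
            other.sleeping = False
        yield
        out.append(i)               # print i
        yield
        i += 1                      # i++
        yield

def run_pair():
    out = []
    c0, c1 = CoreState(), CoreState()
    live = [(c0, p1(c0, out)), (c1, p2(c0, out))]
    while live:
        stepped = False
        for entry in list(live):
            state, prog = entry
            if state.sleeping:      # a core in wfi executes nothing
                continue
            stepped = True
            try:
                next(prog)
            except StopIteration:   # the core executed stop
                live.remove(entry)
        if not stepped:             # every remaining core is asleep
            break
    return out

print(run_pair())
# [0, 10, 1, 11, 2, 12, 3, 13, 4, 14, 15, 16, 17, 18, 19]
```

Under these timing assumptions the two loops stay in step for five iterations. After p1 stops, the remaining "intx" signals from p2 are ignored.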

Using the simulator (sim30)

The simulator has the following commands:

a - show all cores
g - go
t - single step
b ads - set breakpoint
c n - focus core n
s [rn,mn,pc] v - set
d ads n - dump
r - show register
q - quit
h - help

Most commands are similar to sim21. To avoid information overload when using the simulator, we "focus" on one core at a time. The command "c 0" sets the focus to core 0 (called the current core). The internal state displayed is that of the current core. The command "a" displays information for all cores. All cores execute one instruction each at the same time, so the simulation really runs the cores concurrently.

When one core executes "stop" it halts, but the other cores continue. The command "g" starts execution from the current PC (with the focus on the current core) and runs until "stop". If another core is still running, we can refocus on that core and continue with "g". Here is an example in which core0 stops before core1.

C:\s30\test>sim30 count.obj
      load program, last address 23
      >c 0
      >g
      1 +11 2 +12 3 +13 4 +14 5 +15 stop, clock 36
      >c 1
      >g
      +16 +17 +18 stop, clock 71
      >

Core0 runs a program that counts 1..5 concurrently with core1, which runs a program that counts 11..18 (shown as +11...). First we focus on core0; after "g" it runs to "stop". Then we refocus on core1 and "g" from the current PC. Core1 continues to run until "stop".

Here is another example. First, we enter "a" to show both cores. Then "t" single-steps both cores. Initially both cores execute the same code (observe that their PCs are the same). After a few steps they diverge and execute different parts of the code (core0 at PC 10, core1 at PC 18).

C:\s30\test>sim30 count.obj
      load program, last address 23
      >a
      >t
      core 0: PC   0 cid r3
      r1:0 r2:0 r3:0 r4:0 r5:0 r6:0 r7:0 r8:0 r9:0
      r20:0 r21:0 r22:0 r23:0 r24:0 r25:0 r26:0 r27:0 r28:0 r29:0
      core 1: PC   0 cid r3
      r1:0 r2:0 r3:1 r4:0 r5:0 r6:0 r7:0 r8:0 r9:0
      r20:0 r21:0 r22:0 r23:0 r24:0 r25:0 r26:0 r27:1 r28:0 r29:0
      >t
      . . .
      >t
      core 0: PC  10 eq r6 r5 #5
      r1:1 r2:1 r3:0 r4:0 r5:1 r6:0 r7:0 r8:0 r9:0
      r20:0 r21:0 r22:0 r23:0 r24:0 r25:0 r26:0 r27:0 r28:0 r29:0
      core 1: PC  18 mov r1 r5
      r1:11 r2:0 r3:0 r4:0 r5:11 r6:0 r7:0 r8:0 r9:0
      r20:0 r21:0 r22:0 r23:0 r24:0 r25:0 r26:0 r27:1 r28:0 r29:0
      >

The simulator has two modes: single step and trace. Single step executes one instruction and returns to the user. Trace executes until "stop" or until a breakpoint is reached ("t" is single step, "g" is trace).

The display can be focused on one core or show all cores. With one core in focus, the trace stops when that core stops. With all cores shown, the trace stops when all cores have stopped ("c 0" focuses on core 0, "a" shows all cores).
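
The stopping rule for trace mode can be sketched as follows. This is not the simulator's code: the function name is hypothetical, breakpoints are left out, and each core is reduced to the clock value at which it executes "stop". The values 36 and 71 are taken from the count.obj session shown earlier.

```python
def trace(stop_clock, focus=None):
    """Run all cores in lock step and return the clock at which trace ends.

    stop_clock[c] is the clock at which core c executes "stop".
    focus=None models the "a" display (all cores); focus=c models "c n".
    """
    running = [True] * len(stop_clock)
    clock = 0
    while True:
        clock += 1
        for c, t in enumerate(stop_clock):
            if running[c] and clock >= t:
                running[c] = False          # core c has executed stop
        if focus is None:
            if not any(running):            # all cores have stopped
                return clock
        elif not running[focus]:            # the focus core has stopped
            return clock

print(trace([36, 71], focus=0))   # 36
print(trace([36, 71]))            # 71
```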

The interrupt vector of each core is stored at M[1000+4*cid]. So the core0 interrupt 0 vector is M[1000], the core1 interrupt 0 vector is M[1004], and so on. Initially all interrupts are disabled. One interrupt timer always runs in the background; when it times out (the period is set by DEL in the simulator, which must be recompiled to change it), an interrupt event occurs and the processor jumps to the interrupt service routine. T is the global clock; it ticks once per instruction executed. The timer times out every DEL instructions.
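
The timer behaviour can be illustrated with a short Python sketch. The value of DEL used here is made up for the example, the function name is hypothetical, and it is an assumption that the timer keeps counting while interrupts are disabled and that the event is simply dropped in that case.

```python
def timer_events(n_instructions, DEL, enabled=True):
    """Return the clock values T at which a timer interrupt is taken.

    T ticks once per executed instruction; the timer times out every
    DEL instructions. With interrupts disabled no event is taken.
    """
    events = []
    for T in range(1, n_instructions + 1):
        if T % DEL == 0 and enabled:
            events.append(T)
    return events

print(timer_events(10, 4))                  # [4, 8]
print(timer_events(10, 4, enabled=False))   # []
```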

When "wfi" is executed, the PC of the current core stops advancing, but the core continues to listen for the interrupt signal. When an interrupt occurs, that core continues its execution. In single-step mode the simulator continues to step and shows T, but no instruction is executed while the core waits.

Example programs

count.txt         use two cores to run two processes
count-sync.txt    use two cores to run two processes and sync
count-intx.txt    use wfi/intx to sync two cores
suma-rz.txt       use multiple cores to calculate the sum of an array
                  (uses the rz36 compiler; the assembler file must be edited slightly after compiling)

Summary

There are new instructions to support collaboration between cores. All of them must be real instructions; they cannot be written as a sequence of other instructions because they control the start/stop of cores.

   cid r1
   intx #n
   wfi
   sync

Tools: s30-4 package

The package includes an assembler and a simulator for S30. The changes from S21 to S30 are as follows. The return address of an interrupt is now in R[31] instead of the special register RetAds. These instructions are now built in:

      pushm, popm, cid, wfi, intx, sync.

last update 18 Nov 2022