S3.0  Multicore processor


S3.0 is a multicore processor that uses the S2.1 instruction set.  Each core has its own "program" memory, but all cores share one "data" memory.  Instruction fetches therefore never conflict, since each core reads from its own program memory.  Because data memory is shared, access conflicts can occur there.  For now, however, we assume that no data memory access conflicts arise.

Each core has its own hardware and software interrupts.  The interrupt vector for interrupt K (K = 0..3) of core N (N = 0..maxcore) is stored at M[1000 + 4*N + K].
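The address formula can be checked with a small calculation.  The following Python sketch is for illustration only: the function name vector_address and the constant names are hypothetical and are not part of the S3.0 tools.  It assumes four interrupts per core and a table base of 1000, as stated above.

```python
VECTOR_BASE = 1000        # base address of the vector table (from the text)
INTERRUPTS_PER_CORE = 4   # interrupts K = 0..3

def vector_address(core, k):
    """Return the data-memory address holding the vector of interrupt k on a core."""
    if core < 0:
        raise ValueError("core number must be non-negative")
    if not 0 <= k < INTERRUPTS_PER_CORE:
        raise ValueError("interrupt number must be 0..3")
    return VECTOR_BASE + INTERRUPTS_PER_CORE * core + k

# Print the vector addresses of a two-core system.
for core in range(2):
    print(core, [vector_address(core, k) for k in range(INTERRUPTS_PER_CORE)])
```

For two cores this lists addresses 1000..1003 for core 0 and 1004..1007 for core 1, so the vectors of different cores never overlap.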

When programming a multicore processor, we write the assembly code for all cores in one file, so it is similar to writing a single-core program.  A special instruction, "cid r1" (core id), returns the core number in r1.  It is used to tell the cores apart so that different cores can be assigned to execute different parts of the code.  Here is an example:

:main
    cid r3        ; core number in r3
    eq r2 r3 #0
    jt r2 program1
    jmp program2

:program1         ; this is run by core 0
    ....         

:program2         ; this is run by every other core (core 1 when there are two cores)

Coordinating multiple cores

To make the cores work together we need semaphores (as on a single core) and a new all-core "sync" instruction, which synchronises every core.  Two further instructions, waitx and sigx, synchronise a pair of cores (they imitate a semaphore).

How do we block the current process?  A core needs to stop itself and then be woken up by another process.  Once a core has stopped, it can only be woken up by an external signal.  The new instruction "waitx" puts the core into a wait state.  It is woken up when another core sends it an "intercore signal" with the instruction "sigx #c", where c is the number of the core to wake.

Here is an example of how to synchronise two cores.  Count1 runs on core 0 and Count2 runs on core 1.  Each iteration of the Count1 loop waits for an external signal, which comes from "sigx #0" in Count2.  The Count1 loop is therefore synchronised with the Count2 loop (they print their numbers alternately).  We put a short delay at the start of Count2 to make sure that Count1 reaches "waitx" before Count2 executes "sigx #0".

.code 0
    cid r3
    eq r2 r3 #0
    jt r2 count1
    jmp count2

:count1            ; count 1..5
    mov r1 #0
    st r1 cnt1   
:loop1   
    waitx             ; ****
    ld r5 cnt1
    add r5 r5 #1
    st r5 cnt1
    trap r5 #print   ; increment cnt1 and print
    eq r6 r5 #5
    jf r6 loop1
    trap r0 #stop


:count2            ; count 11..20

    nop
    nop
    nop
    nop            ; delay
    mov r1 #10
    st r1 cnt2
:loop2
    sigx #0           ;  ****
    ld r5 cnt2
    add r5 r5 #1
    st r5 cnt2
    mov r2 #43       ; +
    trap r2 #printc

    trap r5 #print   ; increment cnt2 and print
    eq r6 r5 #20
    jf r6 loop2
    trap r0 #stop
.end

If you run the above program, the output looks like this:

C:\s30\test>sim30 -2 count-sigx.obj
2 cores, load program, last address 30
>g
1 +11 2 +12 3 +13 4 +14 5 +15 core 0 stop, clock 54
+16 +17 +18 +19 +20 core 1 stop, clock 101
>

What happens if "sigx" is sent to a core that is not in "waitx"?  The signal is simply ignored.  We must also be careful not to send "sigx" to a core that has already stopped, because that core will continue executing and run unknown instructions.
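These two cases can be made concrete with a small model.  The Python sketch below is an assumption, not the simulator's code: it supposes that each core has a single run flag (1 = executing, 0 = halted), that "waitx" and "stop" both clear the flag, and that "sigx" simply sets the flag of the target core.  The function names are chosen to mirror the instructions and are otherwise hypothetical.

```python
def make_cores(nc):
    """Return the run flags of nc cores, all initially executing."""
    return [1] * nc

def waitx(runflag, k):
    runflag[k] = 0            # core k halts and waits for a signal

def stop(runflag, k):
    runflag[k] = 0            # in this model a stopped core looks the same as a waiting one

def sigx(runflag, c):
    """Signal core c; return True if the signal actually restarted a halted core."""
    woke = runflag[c] == 0
    runflag[c] = 1            # no visible effect if core c was already running
    return woke

runflag = make_cores(2)
print(sigx(runflag, 0))       # False: core 0 is running, the signal is ignored
waitx(runflag, 0)
print(runflag[0])             # 0: the earlier signal was not remembered
print(sigx(runflag, 0))       # True: core 0 was waiting and is woken up
stop(runflag, 0)
print(sigx(runflag, 0))       # True: a stopped core is restarted too (the hazard)
```

Under this assumption a signal sent too early is lost, which is why Count2 starts with a delay, and a signal sent to a stopped core is indistinguishable from a legitimate wake-up.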

Process synchronisation

To perform all-core synchronisation, each core executes the "sync" instruction, which puts it into a wait state.  Once all cores have executed "sync", every core is woken up and continues.  If only some of the cores need to be synchronised, it is better to use "waitx/sigx".  Here is an example:

  p1()                        p2()
     i = 0                       i = 10
     while i < 5                 while i < 20
         print i                     print i
         i++                         i++
     sync                        sync
     stop                        stop

p1 runs a loop that prints 0..4.  p2 runs a loop that prints 10..19.  p1 reaches the end sooner than p2 but waits there at "sync".  When p2 catches up and executes "sync", both cores proceed to stop.

The implementation of sync uses "runflag[core]" to control the starting and stopping of cores.  Here is the pseudocode (NC is the number of cores, k is the current core):

sync() 
   runflag[k] = 0             // stop this core
   for(a=0, i=0; i<NC; i++)   // check all cores
      if(runflag[i] == 0) a++
   if(a == NC)                // all cores have stopped
      for(i=0; i<NC; i++)
         runflag[i] = 1       // start all cores
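The pseudocode translates almost line for line into a runnable form.  The following Python sketch is only an illustration of the same logic; passing runflag as a list argument and returning a boolean are choices made here, not features of the simulator.

```python
def sync(runflag, k):
    """Core k arrives at the barrier; return True if this arrival released all cores."""
    runflag[k] = 0                            # stop this core
    if all(flag == 0 for flag in runflag):    # have all cores stopped?
        for i in range(len(runflag)):
            runflag[i] = 1                    # start all cores
        return True
    return False

runflag = [1, 1]
print(sync(runflag, 0), runflag)   # False [0, 1]: core 0 waits for core 1
print(sync(runflag, 1), runflag)   # True [1, 1]: the last arrival restarts both cores
```

The first call corresponds to p1 reaching "sync" early and waiting; the second corresponds to p2 catching up.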

Using the simulator (sim30)

The simulator has the following commands:

a - show all cores
g - go
t - single step
b ads - set breakpoint
c n - focus core n
s [rn,mn,pc] v - set
d ads n - dump
r - show register
q - quit
h - help

Most commands are similar to those of sim21.  To avoid information overload when using the simulator, we "focus" on one core at a time.  The command "c 0" sets the focus to core 0 (called the current core).  Internal state is then displayed for the current core only.  The command "a" displays information for all cores.  The cores execute their instructions in lockstep, one instruction per core at the same time, so the simulation really does run the cores concurrently.

When one core executes "stop" it halts, but the other cores continue.  The command "g" starts execution from the current PC (with the focus on the current core) and runs until "stop".  If another core is still running, we can refocus on that core and continue with "g".  Here is an example, "count.txt", in which core 0 stops before core 1.  The "-2" option specifies the number of cores used in the simulation.

C:\s30\test>as30 count.txt
C:\s30\test>sim30 -2 count.obj
2 cores, load program, last address 24
>c 0
>g
1 +11 2 +12 3 4 +13 5 +core 0 stop, clock 36
14
>c 1
>g
+15 +16 +17 +18 +19 +20 core 1 stop, clock 87
>


The simulation has two modes: single step and trace.  Single step ("t") executes one instruction and returns to the user.  Trace ("g") executes until "stop" or until a breakpoint is reached.

Core 0 runs a program that counts 1..5 concurrently with core 1, which runs a program that counts 11..20.  First we focus on core 0; after "g" it runs to "stop".  Then we refocus on core 1 and "g" from the current PC.  Core 1 continues to run until "stop".

The display can be focused on one core or can show all cores ("c 0" focuses on core 0, "a" shows all cores).  When focused on one core, the trace stops when that core stops.  When showing all cores, the trace stops when all cores have stopped.
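The stopping rule for trace can be written as a single predicate.  This Python sketch is a hypothetical helper, not sim30 code; it ignores breakpoints and represents "all cores" mode by passing no focus.

```python
def trace_should_stop(stopped, focus=None):
    """stopped[i] is True once core i has executed "stop".

    focus is the number of the focused core, or None in all-core mode.
    """
    if focus is None:
        return all(stopped)       # all-core mode: wait for every core
    return stopped[focus]         # focus mode: only the focused core matters

stopped = [True, False]           # core 0 has stopped, core 1 is still running
print(trace_should_stop(stopped, focus=0))   # True
print(trace_should_stop(stopped, focus=1))   # False
print(trace_should_stop(stopped))            # False
```

This matches the count.txt session: with the focus on core 0, "g" returns as soon as core 0 stops even though core 1 is still running.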

Here is another example.  First we enter "a" to see both cores.  Then "t" single-steps both cores.  You can see that both cores initially execute the same code (their PCs are identical).  After a few steps they diverge and execute different parts of the code (core 0 at PC 4, core 1 at PC 3).

C:\Users\prabhas\Dropbox\s30\test>sim30 -2 count.obj
2 cores, load program, last address 24
>a
mode show all cores
>t
T 1
core 0: PC   0 cid r3
r1:0 r2:0 r3:0 r4:0 r5:0 r6:0 r7:0 r8:0 r9:0
r20:0 r21:0 r22:0 r23:0 r24:0 r25:0 r26:0 r27:0 r28:0 r29:0
core 1: PC   0 cid r3
r1:0 r2:0 r3:1 r4:0 r5:0 r6:0 r7:0 r8:0 r9:0
r20:0 r21:0 r22:0 r23:0 r24:0 r25:0 r26:0 r27:0 r28:0 r29:0
>t
...
>t
T 3
core 0: PC   2 jt r2 4
r1:0 r2:1 r3:0 r4:0 r5:0 r6:0 r7:0 r8:0 r9:0
r20:0 r21:0 r22:0 r23:0 r24:0 r25:0 r26:0 r27:0 r28:0 r29:0
core 1: PC   2 jt r2 4
r1:0 r2:0 r3:1 r4:0 r5:0 r6:0 r7:0 r8:0 r9:0
r20:0 r21:0 r22:0 r23:0 r24:0 r25:0 r26:0 r27:0 r28:0 r29:0
>t
T 4
core 0: PC   4 mov r1 #0
r1:0 r2:1 r3:0 r4:0 r5:0 r6:0 r7:0 r8:0 r9:0
r20:0 r21:0 r22:0 r23:0 r24:0 r25:0 r26:0 r27:0 r28:0 r29:0
core 1: PC   3 jmp 13
r1:0 r2:0 r3:1 r4:0 r5:0 r6:0 r7:0 r8:0 r9:0
r20:0 r21:0 r22:0 r23:0 r24:0 r25:0 r26:0 r27:0 r28:0 r29:0
>t
...


T is the global clock; it ticks once per instruction execution.  When "waitx" is executed, the PC of the current core stops advancing, but the core continues to listen for the intercore signal.  When the signal arrives, the core continues its execution.  In single-step mode the simulator keeps stepping and showing T for a waiting core, but no instruction is executed on it.
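The relation between the global clock and a waiting core can be sketched as follows.  This Python model is a simplification made for illustration: the dictionary fields are hypothetical, and every instruction is assumed to advance the PC by one (jumps are ignored).

```python
def step(cores, t):
    """Advance the global clock by one tick.

    Each running core executes one instruction; a waiting core executes nothing.
    Returns the new clock value and the ids of the cores that executed.
    """
    executed = []
    for core in cores:
        if core["run"]:
            core["pc"] += 1           # simplification: no jumps
            executed.append(core["id"])
    return t + 1, executed

cores = [{"id": 0, "pc": 5, "run": False},   # core 0 is blocked in waitx
         {"id": 1, "pc": 9, "run": True}]    # core 1 is running
t = 0
t, ran = step(cores, t)
print(t, ran, cores[0]["pc"], cores[1]["pc"])   # 1 [1] 5 10
```

The clock advances on every step, while the PC of the waiting core stays where it is until its run flag is set again.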

Example programs

count.txt          two cores run two processes
count-sigx.txt     use waitx/sigx to synchronise two cores
count-sync.txt     use sync to synchronise two cores

Summary

There are new instructions to support collaboration between cores.  All of them must be real instructions; they cannot be written as a sequence of other instructions because they control the starting and stopping of cores.

   cid r1
   waitx
   sigx #c
   sync

Update for s30-6 package

R[0] is now a free register, unlike in S2.1, where it is always zero.  The return address of an interrupt is now kept in R[31] instead of in a special register.  These instructions are now built in: ei, di, pushm, popm, cid, waitx, sigx, sync.  Finally, interrupts work on all cores.

Tools

s30-6.zip

30 Oct 2017

Farewell to King Bhumibol, Rama 9

last update 6 Nov 2017