som v5 test

baseline (v19v)

output of aes4

57 2 220 25
37 220 17 106
132 9 133 11
29 251 151 50

bubble 20     (no output)  noi   3597
aes4          (no output)  noi  10254
matmul2 4x4   (no output)  noi   4785
queen2 92 sol (no output)  noi 133352

running time under profiler
queen2 100x hp  310 ms

(x1000)
 13,335 switch op
  2,464 ldx
  1,443 stx
    744 jmp
    205 jne
  2,464 jlt
    205 call
        inside call
          411  pass param
        1,234  save reg
    205 ret
        inside ret
        1,234  restore reg
  1,930 add
  1,703 sub
    196 mov
  1,771 efor

note:  call/ret saving/restoring reg is expensive.  one call costs 5 times as much as any other instruction.


now modify interp.c to v5 (tx-code, 3 int) and profile (not yet the t2-code) hp n6400 core2 t7200 2G ram 1G

queen2-v5-noout.obj 100x (ms)

327 280 296  avg 301

conclusion so far: changing from "struct" to 3 int does not slow down the interpreter.

8 Oct 2009

y[11] = x   doesn't work 

C:\prabhas\bag\som\som-v5\som>som42a -x som5.obj
load test.txt
Warning: gv new global
+sq
(to sq ({ (= #1 (* #1 (+ 2 3 )))(= gv 3 )(= ([ #2 11 )#1 )))
7 Fun 0 3 0
10 Mul 3 1 405
13 Mov 1 3 0
16 Mov 55 403 0
19 Stx 1 76 411  <<  what is here?
22 Ret 64 3 0
25 nop 0 0 0

also the problem with opt. mov after bop

x = x * x
gv = 3

gv interferes.

9 Oct 2009

more debug

pass is incorrect when nested calls

to sq x y =
  x & y 
to main | a =
  sq a (sq 11 22)


3 Call 0 0 main
6 Sys #13 0 0
9 Fun 2 3 sq
12 And 3 1 2
15 Ret 3 3 0
18 Fun 0 1 main
21 Pass #1 1 0     <<  here
24 Pass #1 #11 0   <<  here
27 Pass #2 #22 0
30 Call 0 0 sq
33 Pass #2 64 0
36 Call 0 0 sq
39 Ret 64 1 0

matmul2.txt

to mul3 a b | s c d =
  s = 0
  d = a
  if b < 0 c = 0-b else c = b
  while c > 0
    if c & 1 s = s + d    << this line has error   
    d = d + d   
    c = c >> 1
  if b < 0 s = 0-s
  s   
 
if c & 1

30 Jmp 45 0 0   << while c > 0
33 nop 6 4 401  << if c & 1  becomes unintelligible
36 Add 3 3 5    << s = s + d
39 Add 5 5 5    << d = d + d
42 Shr 4 4 #1   << c = c >> 1
45 Jgt 33 4 #0

looks like (optJmp) combines logic "and" with jf!

improve cascade jump

from queen2.txt

111 Mov 2 #-1 0
114 Sub 5 Q #1
117 Jmp 228 0 0   for i 0 Q-1
120 Ldx 7 col 2   
123 Jlt 141 7 1   col[i] >= level <1>
126 Add 9 1 2
129 Ldx 10 d45 9  
132 Ge 8 10 1     d45[level+i]>=level <2>    
135 Mov 64 8 0
138 Jmp 144 0 0
141 Mov 64 #0 0   <1>
144 Jf 168 64 0   <2> jf to <3>
147 Add 14 1 Q
150 Sub 13 14 #1
153 Sub 12 13 2
156 Ldx 15 d135 12
159 Ge 11 15 1   d135[level+Q-1-i]>=level <4>
162 Mov 64 11 0
165 Jmp 171 0 0
168 Mov 64 #0 0   <3>
171 Jf 228 64 0   <4>

@123 jxx 141
...
@141 mov #0
@144 jf 168

should be short circuit to

@123 jxx 168    as mov #0, jf  always jump.
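
a minimal sketch in C of the short circuit above. The record layout, the opcode names and thread_jumps are hypothetical (not from the v5 sources), and only Jlt is handled. It assumes the flag register is not read at the new target before it is written again, which holds in the queen2 listing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes and record layout, not taken from interp.c. */
enum { OP_MOV, OP_JF, OP_JLT, OP_OTHER };

typedef struct {
    int addr;        /* address as printed in the listing */
    int op;
    int a1, a2, a3;  /* operands in listing order */
    int a2_imm;      /* nonzero when a2 is an immediate such as #0 */
} Ins;

/* Index of the instruction at addr, or -1 if it is not in the array. */
static int find(const Ins *code, size_t n, int addr) {
    for (size_t i = 0; i < n; i++)
        if (code[i].addr == addr) return (int)i;
    return -1;
}

/* One pass: a Jlt that lands on "Mov r #0" followed by "Jf T r"
   is retargeted to T, since that Jf is always taken.
   Returns the number of jumps retargeted. */
int thread_jumps(Ins *code, size_t n) {
    assert(code != NULL || n == 0);
    int changed = 0;
    for (size_t i = 0; i < n; i++) {
        if (code[i].op != OP_JLT) continue;
        int t = find(code, n, code[i].a1);
        if (t < 0 || (size_t)t + 1 >= n) continue;
        const Ins *m = &code[t];
        const Ins *j = &code[t + 1];
        if (m->op == OP_MOV && m->a2_imm && m->a2 == 0 &&
            j->op == OP_JF && j->a2 == m->a1) {
            code[i].a1 = j->a1;   /* jump straight to T */
            changed++;
        }
    }
    return changed;
}
```

calling thread_jumps repeatedly until it returns 0 follows a whole cascade: on the listing above the first pass moves @123 from 141 to 168, the second from 168 to 228.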

code optimization

before opt. (compiler v5)

bubble (noi)   3765   (no output)
matmul2 (noi)  5112   (no output)
queen2 (noi) 201849   (print 92)
aes4            ---   (still buggy!)
  
compare to v19v

bubble  v19 has call1, v5 has one surplus pass (call swap)
matmul2  
  v5 has surplus: mov retval x, ret retval (mul3)
  opt to ret x but if it is a merge, must diffuse ret
  v19 has call1 (call inita, initb, mul3)
queen2
  v19 has better short jump (jxx to mov #0 jf) and call1

aes4  must have something to do with initialising some array in DS. (as som42b has no runimm)


opt in v19v

optRet:  mov retval x, ret retval
         => ret x
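
a sketch of optRet as a peephole pass in C. Ins, opt_ret and the is_target flag are hypothetical names; RETVAL = 64 is read off the listings. The pass leaves the pair alone when the ret is a jump target, which is the "merge" case noted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes and record layout. */
enum { OP_NOP, OP_MOV, OP_RET, OP_OTHER };
#define RETVAL 64    /* register 64, as it appears in the listings */

typedef struct {
    int op;
    int a1, a2, a3;  /* operands in listing order */
    int is_target;   /* hypothetical flag: some jump lands on this instruction */
} Ins;

/* "Mov RETVAL x" followed by "Ret RETVAL n" becomes "Nop" and "Ret x n".
   Skipped when the Ret is a jump target, because another path may
   arrive there with its own value in RETVAL.
   Returns the number of pairs folded. */
int opt_ret(Ins *code, size_t n) {
    assert(code != NULL || n == 0);
    int folded = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        Ins *m = &code[i];
        Ins *r = &code[i + 1];
        if (m->op == OP_MOV && m->a1 == RETVAL &&
            r->op == OP_RET && r->a1 == RETVAL && !r->is_target) {
            r->a1 = m->a2;                 /* ret x */
            m->op = OP_NOP;                /* the mov is no longer needed */
            m->a1 = m->a2 = m->a3 = 0;
            folded++;
        }
    }
    return folded;
}
```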

after doing call2 (max 2 param)

bubble 20 (no out)  noi   3385
matmul2   (no out)  noi   4760
queen2    (92)      noi 199792

14 Oct 2009
(for democracy!)

after doing short cut jmp (improv macro and)

queen2    (92)      noi 154676

still inferior to v19v code generator

the difference is:

v19v

     26 Ldx 4 2 1004
     27 Jlt 59 4 1       
     28 Add 4 1 2
     29 Ldx 4 4 1005
     30 Jlt 59 4 1        <<<

v5

120 Ldx 7 col 2
123 Jlt 225 7 1
126 Add 9 1 2
129 Ldx 10 d45 9        <<
132 Ge 8 10 1           <<
135 Mov 64 8 0          <<<<  redundant mov
138 Jmp 144 0 0         <<
141 Mov 64 #0 0
144 Jf 162 64 0

with eliminating redundant mov

queen2    (92)      noi 145748

v19v has the following "improv" to combine logic and jmp to jf.

   logic jmp to jf.y $z => jinvlog.y jmp.z  (8)
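
a sketch of the "jinvlog" part of rule (8) in C: pick the jump that is taken when the logic op is false. The opcode names, inv_jump and fuse are hypothetical; only Ge, Jlt, Jgt and Jne are visible in the listings. It assumes integer compares and that the logic result is used by nothing except the jf.

```c
#include <assert.h>

/* Hypothetical opcode names. */
typedef enum { OP_EQ, OP_NE, OP_LT, OP_LE, OP_GT, OP_GE,
               OP_JEQ, OP_JNE, OP_JLT, OP_JLE, OP_JGT, OP_JGE } Op;

typedef struct { Op op; int a1, a2, a3; } Ins;

/* Jump opcode that is taken exactly when the logic op yields false. */
Op inv_jump(Op logic) {
    switch (logic) {
    case OP_EQ: return OP_JNE;
    case OP_NE: return OP_JEQ;
    case OP_LT: return OP_JGE;
    case OP_LE: return OP_JGT;
    case OP_GT: return OP_JLE;
    case OP_GE: return OP_JLT;
    default:
        assert(!"not a logic op");
        return logic;
    }
}

/* "Ge t x y" feeding "Jf Y t" becomes the single jump "Jlt Y x y". */
Ins fuse(Ins logic, int y) {
    Ins j = { inv_jump(logic.op), y, logic.a2, logic.a3 };
    return j;
}
```

on "Ge 8 10 1" with jf target 219 this gives "Jlt 219 10 1".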

with this optimisation

queen2    (92)     noi  133368
aes5      (no out) noi   10105

<I have modified aes4.txt so that all static array are now alloc during run>

another rule that should be done is 
  mov v x, ret v -> ret x

(but I will leave it to the next release)

15 Oct 2009

with the last rule (8), queen2 is almost perfect at 133368 (equal to v19v).  There is no code left to improve.  However, there is a "wasted" jump left as an artifact of applying the optimisation rules:

129 Ldx 10 d45 9
132 Jlt 219 10 1
135 Jmp 144 0 0         <<  wasted jump    
138 Mov 64 #0 0         <<  become nop
141 Jf 219 64 0         <<  become nop
144 Add 14 1 Q

lines 135, 138, 141 should be erased, which saves one jump.
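
a sketch in C of spotting such a jump: a Jmp that only skips instructions already turned into nop. Ins, the opcode names and drop_wasted_jumps are hypothetical. The sketch only marks the jump as nop; erasing the lines would also need every jump target after them to be re-addressed, which is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes and record layout. */
enum { OP_NOP, OP_JMP, OP_OTHER };

typedef struct {
    int addr;        /* address as printed in the listing */
    int op;
    int a1, a2, a3;  /* operands in listing order */
} Ins;

/* A Jmp whose target is the first non-nop instruction after it does
   no useful work, so it is turned into a nop as well.
   Returns the number of jumps dropped. */
int drop_wasted_jumps(Ins *code, size_t n) {
    assert(code != NULL || n == 0);
    int dropped = 0;
    for (size_t i = 0; i < n; i++) {
        if (code[i].op != OP_JMP) continue;
        size_t j = i + 1;
        while (j < n && code[j].op == OP_NOP) j++;   /* skip the nops */
        if (j < n && code[j].addr == code[i].a1) {
            code[i].op = OP_NOP;
            code[i].a1 = 0;
            dropped++;
        }
    }
    return dropped;
}
```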

I count the number of "jmp" in queen2 to see how much it will improve noi.  ("jmp" occurs only in this situation).

queen2 (92) noi 133368, jmp 7443 (5.6%)
if the above suggestion is applied then noi 125925
this is better than hand-coded v19 129933!  (from som-v19/doc/compare-performance.txt)

I also rewrite genex to separate it into two functions: genex2 and genex1 (in gen2-s.txt)

e := exp, asg, cntl

genex1 handles exp only
genex2 handles asg and cntl

this breaks one big function into two.  At first, I did it to simplify my thinking, as asg and cntl return retval/nil while exp can return a singleton (lv/gv/num/temp).  I thought it might make the structure of genex different.  As it turns out, the two are much the same as the one big function.

using new genex2  compile queen2.txt  noi 127181
                          aes4.txt    noi 318928
      old genex           queen2.txt  noi 127605
                          aes4.txt    noi 319665

very similar.  merge (gen2-s.txt) back to the old gencode-s.txt.

17 Oct 2009

choose to use gencode-s.txt.  gen2-s.txt does not offer any advantage.   update gencode-s.txt to accommodate improvement in gen2-s.txt (but it is still one function genex, not two).

to merge  v5-vm to som5.obj I choose to use lex2.c and relocate tokstring (from 1, which disrupts registers 0..63, to 250).  Here is the list of system areas occupied by the compiler.

0..63   registers (vm)

100     ?
101     mode (interactive, compile, execute)
102     tokvalue
103     tokcol
104     line

110..149  som_src  (input file name)
150..199  som_lst  (listing file)
200..249  som_obj  (object file)
250..299  tokstring
390..700  small constants

1000..MAXDS  data segment
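
a small C sketch that checks a candidate area against the list above. Area, first_collision and map_is_disjoint are hypothetical names. The data segment is left out because MAXDS is not given here, and the length of the old tokstring (50 cells, 1..50) is an assumption taken from the size of 250..299.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { const char *name; int lo, hi; } Area;

/* Areas copied from the list above (bounds inclusive). */
static const Area som_areas[] = {
    {"registers",         0,  63},
    {"cells 100..104",  100, 104},
    {"som_src",         110, 149},
    {"som_lst",         150, 199},
    {"som_obj",         200, 249},
    {"tokstring",       250, 299},
    {"small constants", 390, 700},
};
enum { N_AREAS = sizeof som_areas / sizeof som_areas[0] };

/* Two inclusive ranges share at least one address. */
static int overlaps(int lo1, int hi1, int lo2, int hi2) {
    return lo1 <= hi2 && lo2 <= hi1;
}

/* Name of the first listed area that [lo, hi] touches, or NULL. */
const char *first_collision(int lo, int hi) {
    assert(lo <= hi);
    for (int i = 0; i < N_AREAS; i++)
        if (overlaps(lo, hi, som_areas[i].lo, som_areas[i].hi))
            return som_areas[i].name;
    return NULL;
}

/* 1 if no two listed areas overlap, else 0. */
int map_is_disjoint(void) {
    for (int i = 0; i < N_AREAS; i++)
        for (int j = i + 1; j < N_AREAS; j++)
            if (overlaps(som_areas[i].lo, som_areas[i].hi,
                         som_areas[j].lo, som_areas[j].hi))
                return 0;
    return 1;
}
```

first_collision(1, 50) reports the registers area, which is the disruption that moving tokstring to 250 avoids.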

to test this new allocation

reloc  tokstring to 250, use som42b.  change
1.  in lex2.c  

#define tokstring_ads	250

2.  in som42.txt (token-s.txt)

to initlex =
  ...
  tokstring = 250	

then recompile  som42.txt -> som43.obj

use som43.obj with new som42b

20 Oct 2009

1) bug: in codegen, when parameters are function calls.
two calls will both use RETVAL, which collides.
to solve this, the first RETVAL is moved to another reg.
However, it is not easy to recognise the situation as the second call can be a macro.  for the time being, if the second call is a macro, we assume it requires RETVAL (which is pessimistic).  should be improved in the next release.

compile som6.txt (full compiler) by som5.obj (som4.2 object of som5 compiler) noi 3280065

2) bug: without freeing and reusing temporary registers, genex uses 80 locals!  This makes the registers "overflow" into RETVAL, as restoring the registers overwrites RETVAL.
fix: do freeing and reusing.  now genex uses 18 locals.
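
a minimal sketch in C of freeing and reusing temporary registers. RegPool, pool_init, new_temp and free_temp are hypothetical names, not the routines in gencode-s.txt; the limit of 63 is the one from these notes.

```c
#include <assert.h>

#define MAXREG 63    /* register limit from the notes */

typedef struct {
    int next;                     /* next never-used register */
    int free_stack[MAXREG + 1];   /* registers released for reuse */
    int nfree;
    int high;                     /* highest register handed out so far */
} RegPool;

/* first_temp is the first register after the parameters and named locals. */
void pool_init(RegPool *p, int first_temp) {
    p->next = first_temp;
    p->nfree = 0;
    p->high = first_temp - 1;
}

/* Prefer a released register; take a fresh one only when none is free. */
int new_temp(RegPool *p) {
    int r;
    if (p->nfree > 0) {
        r = p->free_stack[--p->nfree];
    } else {
        assert(p->next <= MAXREG);    /* would overflow the register file */
        r = p->next++;
    }
    if (r > p->high) p->high = r;
    return r;
}

/* Release a temporary once its value has been consumed. */
void free_temp(RegPool *p, int r) {
    assert(p->nfree <= MAXREG);
    p->free_stack[p->nfree++] = r;
}
```

with this scheme the number of locals follows the deepest nesting of live temporaries rather than the number of expressions compiled.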

note: comparing queen2-v18.lst and queen2.lst (from v5 compiler) v5 still uses a bit more registers.

v5 with som6.obj (compiled by som42b + som5.obj som6.txt) now can compile som6.txt and reaches a fix-point (the output som6'.obj is exactly the same as som6.obj using the same source).

-------------------------
what are the steps that lead to this point?
(how to cross compile across the vm, som4.2 to som-v5)
(read v5-crosscompile.txt)

----------------------------
Bugs during the process:

1)  passing a function call as a parameter to a function. The code generator uses RETVAL to keep the result of every function call. This causes a collision on the register.  fix by moving the result to a new register.

2)  bug in vm, tRet returns M[c] (not a as I intended). fixed.

3)  overflow registers (limit 63).  need freeing and reusing temporary registers. fixed.

22 Oct 2009
 
Now, try to enable "runim". Bug in "syscall eval" because
1) cannot pass the filename from som to compiler to "loadfile". This is because a reg must be used to pass filename. 
2) To use a reg, the current context (all reg used) must be saved and restored for the execution of "eval" in user space.

23 Oct 2009  (Piya-Maharaj-day)

(read v5-runimm.txt)

Now, the loadfile works properly.  Compiling the whole som5 compiler source takes 1781169 inst. (noi).  (compiling with som 4.2a uses 3102330).

now start to do "immediate line" compiling

bug is in "updatesym", where the value of a global variable is updated directly (to avoid having to runim).  The line
  M[ref] = cdr e
This line (M[getRef ref] = cdr e) was introduced since som 4.0.

The behaviour is that this line stamped M[200] and corrupted the object file name (obj_file string).  So when trying to "outobj", the system crashed.

The source of the bug is hard to trace.  After "updatesym" is confirmed to be the source, it is analysed as follows.

(read v5-relocate-static.txt)

Now immediate lines are compiled correctly. This is tested on "aes4.txt".  Compiling som5.txt takes noi 1776760.

Final step is to compile som5.txt that has immediate lines. The log5.txt records this, noi 1777927.

Interactive mode

bug:  it does not work.  It does not load lib2.som.

24 Oct 2009

analysis of bug:  lex for interactive mode needs to return to caller when each line is ended.  

fix: set lexmode properly.

(read v5-vm-and-compiler-mode.txt)

25 Oct 2009

Improvement

allocation of registers can be improved.  Many functions use far too many registers.

atoi 11
strpack 11
findsym 8
evalBop 19 ***
genex 14
genbop 10
gencase 14
genfun 9
reloc 10
(from log5.txt  25 Oct 2009 evening)

1.  inspecting evalBop, every case is an expression (not asg) and consumes one new register.  Because gencase does:
  v = final genex ...
we should "freev" in "final".

the result is good: many functions are now at the optimum.

func      before  after
atoi        11      9
strpack     11      9
findsym      8      7
evalBop     19      4 ***
genex       14     13
genbop      10     10
gencase     14     13
genfun       9      8
reloc       10      9

2.  inspecting genfor-loop  it already releases "end" reg.

26 Oct 2009
