executable size analysis
compare v5 to v42

code size (int)

benchmark	som42b		v5

bubble		147		140
matmul2		351		278
queen2		309		278
aes4		881		758
lexer		2001		1650
parser		5313		4819
pgen		2983		2601

(v5 is 12% smaller)
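
The 12% figure can be reproduced from the table above. A minimal Python sketch; the variable names are illustrative, the numbers are copied from the table:

```python
# code size (int) per benchmark, copied from the table: (som42b, v5)
sizes = {
    "bubble":  (147, 140),
    "matmul2": (351, 278),
    "queen2":  (309, 278),
    "aes4":    (881, 758),
    "lexer":   (2001, 1650),
    "parser":  (5313, 4819),
    "pgen":    (2983, 2601),
}

total_som = sum(som for som, _ in sizes.values())
total_v5 = sum(v5 for _, v5 in sizes.values())
reduction = 1 - total_v5 / total_som   # fraction by which v5 is smaller

print(total_som, total_v5, f"{reduction:.1%}")   # 11985 10524 12.2%
```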

comparing listings (assembly code)
number of instructions (static)

		som42a	v5
bubble
  init		12	7
  show		15	9
  swap		8	6
  sort		27	14
  main		9	8

Some detailed analysis of the output code

to swap a b | t =
  t = data[a]
  data[a] = data[b]
  data[b] = t

som42a

60 Fun swap 2
62 Ldy data 1
64 Put 3
66 Ldy data 2
68 Sty data 1
70 Get 3
72 Sty data 2
74 Ret 3

v5

57 Fun 2 4 swap
60 Ldx 3 data 1
63 Ldx 4 data 2
66 Stx 4 data 1
69 Stx 3 data 2
72 Ret 0 4 0

v5 output is perfect: two loads and two stores, with no temporary (som42a needs Put 3 / Get 3 for t).  It cannot be reduced any further.

enum
  20 maxdata

    for j 0 maxdata-2
      if data[j+1] < data[j]
        swap j j+1

som42a

88 Lit 18
90 Put 4
92 Lit 0
94 Put 2
96 Jmp 122
98 Ldy data 2
100 Put 5
102 Lit 1
104 Add 2
106 Ldya data
108 Lt 5
110 Jf 120
112 Push 2
114 Lit 1
116 Add 2
118 Call swap
120 Inc 2
122 Jle 98 4

v5

84 Mov 2 #-1 0
87 Jmp 108 0 0
90 Add 3 2 #1
93 Ldx 4 data 3
96 Ldx 3 data 2
99 Jge 108 4 3
102 Add 3 2 #1
105 Call 2 3 swap
108 Efor 90 2 #18


matmul2
  mul3		28	18
  inita		23	11
  initb		23	11
  matmul	48	23
  main		15	12
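
The per-function counts in the two tables above (bubble and matmul2) can be summed to see the overall ratio. A small Python sketch; the dictionary layout is only for illustration, the counts are copied from the tables:

```python
# static instruction counts per function: (som42a, v5)
counts = {
    "bubble": {
        "init": (12, 7), "show": (15, 9), "swap": (8, 6),
        "sort": (27, 14), "main": (9, 8),
    },
    "matmul2": {
        "mul3": (28, 18), "inita": (23, 11), "initb": (23, 11),
        "matmul": (48, 23), "main": (15, 12),
    },
}

totals = {}
for bench, funcs in counts.items():
    som = sum(s for s, _ in funcs.values())
    v5 = sum(v for _, v in funcs.values())
    totals[bench] = (som, v5)
    print(f"{bench}: som42a={som} v5={v5} ratio={v5 / som:.2f}")

# bubble: som42a=71 v5=44 ratio=0.62
# matmul2: som42a=137 v5=75 ratio=0.55
```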

Some detailed analysis of the output code

  if b < 0 c = 0-b else c = b

som42a

16 Get 2
18 Lti 0
20 Jf 30
22 Lit 0
24 Sub 2
26 Put 4
28 Jmp 60
30 Get 2
32 Put 4

v5 

18 Jge 27 2 #0
21 Sub 4 #0 2
24 Jmp 48 0 0
27 Mov 4 2 0

: index i j = (mul3 i N) + j

ax[index i j] = i

som42a 

104 Push 1
106 Lit 4
108 Call mul3
110 Add 2
112 Put 5
114 Get 1
116 Sty ax 5

v5

78 Call 1 #4 mul3
81 Add 3 64 2
84 Stx 1 ax 3

s = s + (mul3 ax[iN+k]  bx[kN+j])

som42a

226 Get 5
228 Add 4
230 Ldya ax
232 Push 0
234 Get 6
236 Add 2
238 Ldya bx
240 Call mul3
242 Add 3
244 Put 3

v5 

165 Add 8 5 4
168 Ldx 9 ax 8
171 Add 8 6 2
174 Ldx 10 bx 8
177 Call 9 10 mul3
180 Add 3 3 64

Summary

The number of instructions of v5 (static) is much smaller than that of som42a, for three reasons:

1)  v5 is three-address: the destination and the two sources are in one instruction, whereas for som42a two instructions are required.
2)  some instructions in v5 take more arguments than in som42a, for example, "call" and "efor".
3)  v5 makes no distinction between global, local and immediate operands (especially immediate), which makes the code shorter because there is no need to load an immediate into a register. It can be seen clearly in this code:

som42a

88 Lit 18
90 Put 4
...
120 Inc 2
122 Jle 98 4

vs v5

108 Efor 90 2 #18
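
The two loop encodings can be modelled in Python to check that they visit the same index values. This is only a sketch: it assumes that Efor increments its register and jumps back while the register is less than or equal to the immediate limit, which is inferred from the listing (the counter starts at #-1) and not taken from a v5 specification. The function names are hypothetical.

```python
def som42a_loop(limit=18):
    """Model of the som42a loop control: Lit/Put ... Inc/Jle."""
    visited = []
    local4 = limit          # Lit 18 ; Put 4  (limit kept in a local)
    j = 0                   # Lit 0 ; Put 2
    while j <= local4:      # Jmp 122 ; Jle 98 4  (test before the body)
        visited.append(j)   # loop body
        j += 1              # Inc 2
    return visited


def v5_loop(limit=18):
    """Model of the v5 loop control: Mov/Jmp ... Efor (assumed semantics)."""
    visited = []
    r2 = -1                 # Mov 2 #-1 0
    while True:             # Jmp 108 goes straight to Efor
        r2 += 1             # Efor 90 2 #18: assumed to increment r2 ...
        if r2 > limit:      # ... and fall through once r2 exceeds #18
            break
        visited.append(r2)  # loop body at address 90
    return visited


print(som42a_loop() == v5_loop())   # True: both visit 0..18
```

In the full listings, loop control takes seven instructions in som42a (Lit, Put, Lit, Put, Jmp, Inc, Jle) and three in v5 (Mov, Jmp, Efor).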

v5 is 12% smaller than som42a.  If v5.1 (64-bit) is 2/3 the size of v5 then v5.1 will be about 41% smaller than som42a (0.88 x 2/3 = 0.587).
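
The projection is a product of two ratios and can be checked directly. A short Python sketch; the 2/3 factor is the assumption stated above, and the variable names are illustrative:

```python
v5_vs_som = 1 - 0.12                  # v5 is 12% smaller than som42a
v51_vs_v5 = 2 / 3                     # assumed: v5.1 is 2/3 the size of v5
v51_vs_som = v5_vs_som * v51_vs_v5    # v5.1 size relative to som42a

print(f"{v51_vs_som:.3f} {1 - v51_vs_som:.1%}")   # 0.587 41.3%
```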

12 Dec 2010


  