I've noticed that the function prologue and epilogue are needed to set up the stack and to save off the non-volatile registers. I've come up with these two forms for the prologue, depending on the number of registers (N) to save off:
Form 1:
ai sp, -regsize              cycles:14+0  bytes:4
mov reg, *sp+                cycles:14+8  bytes:2  (one per saved register)
ai sp, -regsize-framesize    cycles:14+0  bytes:4
In general, bytes = 8+2N: 10, 12, 14, 16, 18, 20

Form 2:
ai sp, -regsize-framesize    cycles:14+0  bytes:4
mov reg, *X(sp)              cycles:14+8  bytes:4  (one per saved register)
In general, bytes = 4+4N: 8, 12, 16, 20, 24, 28
So use form 1 only when we have three or more registers to save. With two registers both forms come to 12 bytes, and with one register form 2 is smaller.
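The size comparison can be written down as a small selection routine. This is only a sketch in C of the arithmetic above, not code from the compiler: the function names are made up for illustration, and sending ties to form 2 is an assumption based on it needing one fewer instruction.

```c
/* Byte counts from the figures above: each AI is 4 bytes,
   "mov reg, *sp+" is 2 bytes, the indexed mov is 4 bytes.
   nregs is the number of non-volatile registers to save. */
int form1_bytes(int nregs) { return 8 + 2 * nregs; }
int form2_bytes(int nregs) { return 4 + 4 * nregs; }

/* Hypothetical helper: returns 1 or 2 for the smaller prologue.
   A tie (nregs == 2) goes to form 2, which uses one fewer AI. */
int pick_form(int nregs)
{
    return form1_bytes(nregs) < form2_bytes(nregs) ? 1 : 2;
}
```

With these definitions, pick_form returns 2 for one or two registers and 1 from three registers up, which matches the rule stated above.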
The epilogue is the same for both forms:
ai sp, framesize
mov *sp+, reg
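To convince myself that one epilogue really does undo both prologues, here is a toy model in C. It is a sketch under several assumptions that are not spelled out above: registers are 2 bytes wide, the stack grows downward, the offset X in form 2 is framesize plus 2 bytes per register slot, and the epilogue's mov is a pop that loads each register back from the stack. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

/* Toy machine: a small word-addressed stack and a byte-offset stack pointer. */
typedef struct {
    uint16_t mem[STACK_BYTES / 2];
    int sp;
} Machine;

static void store(Machine *m, int addr, uint16_t v) { m->mem[addr / 2] = v; }
static uint16_t load(const Machine *m, int addr)    { return m->mem[addr / 2]; }

/* Form 1: ai sp,-regsize / N post-increment stores / ai sp,-regsize-framesize */
void prologue_form1(Machine *m, const uint16_t *regs, int n, int framesize)
{
    int regsize = 2 * n;
    m->sp -= regsize;
    for (int i = 0; i < n; i++) { store(m, m->sp, regs[i]); m->sp += 2; }
    m->sp -= regsize + framesize;
}

/* Form 2: ai sp,-regsize-framesize / N indexed stores.
   Assumed offset: X = framesize + 2*i for the i-th register. */
void prologue_form2(Machine *m, const uint16_t *regs, int n, int framesize)
{
    m->sp -= 2 * n + framesize;
    for (int i = 0; i < n; i++) store(m, m->sp + framesize + 2 * i, regs[i]);
}

/* Shared epilogue: ai sp,framesize, then pop each register. */
void epilogue(Machine *m, uint16_t *regs, int n, int framesize)
{
    m->sp += framesize;
    for (int i = 0; i < n; i++) { regs[i] = load(m, m->sp); m->sp += 2; }
}

/* Runs one prologue form, clobbers the registers, runs the epilogue.
   Returns 1 if the registers and stack pointer end up where they started.
   Supports up to 6 registers, matching the size tables above. */
int round_trip(int form, int n, int framesize)
{
    Machine m;
    uint16_t saved[6], regs[6];
    memset(&m, 0, sizeof m);
    m.sp = STACK_BYTES;
    for (int i = 0; i < n; i++) saved[i] = regs[i] = (uint16_t)(0x1000 + i);
    if (form == 1) prologue_form1(&m, regs, n, framesize);
    else           prologue_form2(&m, regs, n, framesize);
    for (int i = 0; i < n; i++) regs[i] = 0xDEAD;  /* function body clobbers them */
    epilogue(&m, regs, n, framesize);
    return m.sp == STACK_BYTES && memcmp(saved, regs, 2u * (size_t)n) == 0;
}
```

Both forms leave the saved registers in the same slots just above the frame, which is why the same epilogue works for either one.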
The plan is to not use a frame pointer, and to not save off the stack pointer as part of a call. This saves us a ton of space over the course of a program, since we save four bytes on the stack and at least four instructions per function. The drawback of this design is that we lose the ability to use "alloca" or derive a call tree during debugging. I don't like this, but it's a good tradeoff.
I tried adding parallel CC0 checks in some instructions, but it just made a mess of the resulting code (about 2-3 times bigger, lots of redundant moves). I'll come back to this later.
I also checked for arguments passed on the stack. Looks like work needs to be done on the caller and callee side. Poop.