The AtariAge people were concered about stack performance, and I was second-gussing myself, so I did cycle timings for 7 or eight different ways of stack setup and teardown. I'm glad I did his, since I found a form which saves 2 bytes and 12 cycles per call. I also found that there are problems with the current code.
For small nvreg usage, the old values are stored at the bottom of the stack. For large usage, the old values are stored a the top of the stack. The epiloge assumes old values are at the top of the stack. The stack offsets assume old values are at the top of the stack.
The faster form has the old values at the bottom of the stack. If I keep the old smaller form that means that the epilogue and the stack offset calcs will need to be changed. Ick. And the port documentation. Poo.
Also, stack-vs-nvreg calculations show that using the stack is smaller for less than three uses per function. It's faster too for less than five uses per function. So the allocator needs fixing too. Bah.