I've noticed a few places where a register is loaded with a constant, used for a single operation, then discarded. Like this:
li r1, 27
mpy r1, r4
I wonder if it would be better to put these constants in the data section, and do something like this:
mpy @const_27, r4
Let's check it out!
li r1, 27 - 4 bytes, 12+4+4=20 clocks
mpy r1, r4 - 2 bytes, 52+4=56 clocks
total: 6 bytes, 76 clocks
.data const_27: data 27 - 2 bytes, 0 clocks
mpy @const_27, r4 - 4 bytes (opcode word plus address word), 52+4+8=64 clocks
total: 6 bytes, 64 clocks
Hmm, pretty good... same size, 15% faster. I wasn't expecting that. This could be better if the constant is used in other places too, since the 2-byte data word is then shared and the effective bytes used for each instruction go down. (Average size is between 6 and 4 bytes per operation, asymptotic to 4, versus a fixed 6 for the li version.) The speedup is also more drastic for quicker instructions (like movb: 20+14+4+8=46 clocks vs. 26 clocks, 43% faster).
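As a quick sanity check on the clock arithmetic above, here is a small Python sketch. The function and variable names are made up for illustration, and the clock figures are simply the ones tallied in this post, not independently measured. The results land close to the percentages quoted above.

```python
def percent_faster(old_clocks, new_clocks):
    """Percentage of clocks saved going from old_clocks to new_clocks."""
    return 100.0 * (old_clocks - new_clocks) / old_clocks

# Clock totals copied from the tallies in this post (assumed correct as quoted).
mpy_before = (12 + 4 + 4) + (52 + 4)   # li r1, 27 followed by mpy r1, r4
mpy_after = 52 + 4 + 8                 # mpy @const_27, r4
movb_before = 20 + 14 + 4 + 8          # slower movb sequence, as quoted
movb_after = 26                        # faster movb sequence, as quoted

print(f"mpy:  {mpy_before} -> {mpy_after} clocks, "
      f"{percent_faster(mpy_before, mpy_after):.1f}% faster")    # 15.8% faster
print(f"movb: {movb_before} -> {movb_after} clocks, "
      f"{percent_faster(movb_before, movb_after):.1f}% faster")  # 43.5% faster
```

The same helper can be reused for any other instruction pair once its clock counts are known.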
I should look at ways to optimize memset and memcpy sequences too. There's got to be a way to take advantage of this there.