Saturday, December 4, 2010

I found another optimization that's worth doing. During initialization of local byte arrays, this sequence is emitted:

                            Bytes  Clocks
  li   r2, >30 * 256          4     12+4
  movb r2, @2(r10)            4     14+4
  li   r5, >78 * 256          4     12+4
  movb r5, @3(r10)            4     14+4
                             ---    ----
                              16     68

I can squeeze out a couple of bytes and a few clocks by loading both values into a single register and swapping its bytes between the two stores:

                            Bytes  Clocks
  li   r2, >30 * 256 + >78    4     12+4
  movb r2, @2(r10)            4     14+4
  swpb r2                     2     10
  movb r2, @3(r10)            4     14+4
                             ---    ----
                              14     62
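
As a sanity check, here is a small Python sketch that models only the three instructions involved and confirms that both sequences store the same two bytes. It is not part of the compiler. The `run` function, the tuple layout, and the `ORIGINAL` and `COMBINED` names are made up for illustration, and the byte and clock figures are copied from the listings rather than computed.

```python
# Toy model of the three TMS9900 instructions used in the listings.
# Each step is (opcode, operands, size_in_bytes, clocks). Sizes and clocks
# are copied from the listings; this model does not derive them.

def run(program):
    regs = {}   # 16-bit register values, keyed by register name
    mem = {}    # bytes stored, keyed by offset from r10
    size = clocks = 0
    for op, operands, nbytes, cycles in program:
        if op == "li":        # load a 16-bit immediate
            reg, value = operands
            regs[reg] = value & 0xFFFF
        elif op == "movb":    # store the register's high byte at offset(r10)
            reg, offset = operands
            mem[offset] = regs[reg] >> 8
        elif op == "swpb":    # swap the register's high and low bytes
            (reg,) = operands
            regs[reg] = ((regs[reg] << 8) | (regs[reg] >> 8)) & 0xFFFF
        else:
            raise ValueError("unmodelled opcode: " + op)
        size += nbytes
        clocks += cycles
    return mem, size, clocks

ORIGINAL = [
    ("li",   ("r2", 0x30 * 256), 4, 12 + 4),
    ("movb", ("r2", 2),          4, 14 + 4),
    ("li",   ("r5", 0x78 * 256), 4, 12 + 4),
    ("movb", ("r5", 3),          4, 14 + 4),
]

COMBINED = [
    ("li",   ("r2", 0x30 * 256 + 0x78), 4, 12 + 4),
    ("movb", ("r2", 2),                 4, 14 + 4),
    ("swpb", ("r2",),                   2, 10),
    ("movb", ("r2", 3),                 4, 14 + 4),
]

if __name__ == "__main__":
    for name, program in (("original", ORIGINAL), ("combined", COMBINED)):
        mem, size, clocks = run(program)
        print(name, {k: hex(v) for k, v in mem.items()}, size, clocks)
```

The model only compares the stored bytes. The combined sequence leaves r2 byte-swapped and never loads r5, so it presumably only applies when neither register value is needed afterwards.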

That's roughly a ten percent gain in both space (16 bytes down to 14) and time (68 clocks down to 62), which isn't too bad. Unfortunately, I can't figure out a way to implement this right now, so I may have to come back to it later.
