I found another optimization that's worth doing. During initialization of local byte arrays, this sequence is emitted:
                        Bytes  Clocks
li r2, >30 * 256          4     12+4
movb r2, @2(r10)          4     14+4
li r5, >78 * 256          4     12+4
movb r5, @3(r10)          4     14+4
                         ---    ----
                          16     68
I can squeeze out a few bytes by doing this instead:
                            Bytes  Clocks
li r2, >30 * 256 + >78        4     12+4
movb r2, @2(r10)              4     14+4
swpb r2                       2     10
movb r2, @3(r10)              4     14+4
                             ---    ----
                              14     62
This saves 2 bytes (12.5%) and 6 clocks (about 9%), so roughly a ten percent gain in time and space, which isn't too bad. Unfortunately, I can't figure out a way to implement this right now. I may have to come back to it later.
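To convince myself the two sequences really store the same bytes, here is a minimal sketch in Python that models only the data flow of the three instructions involved. The function names (`li`, `movb`, `swpb`, `init_separate`, `init_combined`) are hypothetical helpers made up for this illustration; this is not how the compiler would implement the optimization, just a check of the idea that a byte move from a register uses its high byte and that a byte swap exposes the second constant.

```python
def li(byte_hi, byte_lo=0):
    """Model LI: build a 16-bit immediate with byte_hi in the upper byte."""
    return ((byte_hi & 0xFF) << 8) | (byte_lo & 0xFF)


def movb(reg):
    """Model MOVB from a register: the byte stored is the register's high byte."""
    return (reg >> 8) & 0xFF


def swpb(reg):
    """Model SWPB: exchange the high and low bytes of a 16-bit register."""
    return ((reg << 8) | (reg >> 8)) & 0xFFFF


def init_separate(b0, b1):
    """Original sequence: one LI/MOVB pair per byte, using two registers."""
    r2 = li(b0)
    first = movb(r2)
    r5 = li(b1)
    second = movb(r5)
    return first, second


def init_combined(b0, b1):
    """Proposed sequence: one LI holding both bytes, with a SWPB between stores."""
    r2 = li(b0, b1)
    first = movb(r2)
    r2 = swpb(r2)
    second = movb(r2)
    return first, second
```

Calling `init_separate(0x30, 0x78)` and `init_combined(0x30, 0x78)` should yield the same pair of stored bytes, which is the property the optimization relies on. The combined form also frees up the second register.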