Monday, February 28, 2011

I've been completely occupied by JITC testing at work and haven't looked at TI stuff in almost a month, but I've got a little time right now.

It looks like GCC favors MEM-REG-MEM copies rather than MEM-MEM copies. I've noticed in testing that M-M copies are often split up, and M-R-M copies are never combined. This happens regardless of how instruction costs are set up or how the C code is written.

That makes sense, since most modern machines prefer this kind of operation. I just need to add a peephole to fix it. I'm accumulating quite a few peepholes, which is normally a sign of band-aids over a deeper problem. However, since I can't see anything actually wrong at this point, and my target is old enough to be weird by modern standards, I'll try not to worry about that too much.

$ wc -l 2k_chess.s
Before peephole: 1462
After peephole: 1440

So I optimized out 22 instructions, or 1.5% of the line count. Not very impressive, but it was easy to get.

Wednesday, February 9, 2011

I've been away from TI stuff for a while; work and home life were pretty hectic, and they take priority. I've got another month of madness coming up, but I should have some TI time available for a little bit.

That 2K chess program has a lot of goodies hiding in it. It's exposed quite a few weaknesses in the compiler, and it's small enough to be manageable if I need to isolate some bits.

Things I've found so far:

1) The fake registers show up in output code. That's bad.
movb r1, @73(r10)
mov @94(r10), r2
szc FAKE_R0_LOW, r2

2) Also, there's a form that looks like "copy memory byte to sign extended register" which is a convoluted mess.
movb @105(r10), r2
swpb r2
mov FAKE_R2_LOW, r2
swpb r2
sra r2, 8

3) Additionally, all memory-to-memory copies get needlessly split up into two instructions.
movb @48(r10), r7
movb r7, @5(r5)

4) There's this mess:
movb r3, r3
swpb r3
sla r3, >8
mov r3, r5

5) What the heck is this supposed to be?
movb @M+1, r2
inv r2
inv r2
szcb r2, @L

6) Here's an opportunity to remove an extra "swpb". If either R1 or R2 is dead, we can reorder operations and remove an instruction.
mov r1, r2
swpb r2
movb r2, @L
swpb r2

I'll start with problem three, the memory-to-memory copies. That should not be happening, and it should be easy to find. (I'm sure I'll regret saying that...)

RTL of a typical example:

(insn 1755 795 796 146 2k_chess.c:117 (set (reg:QI 14 r7)
(mem/c:QI (plus:HI (reg/f:HI 20 r10)
(const_int 48 [0x30])) [4 %sfp+48 S1 A8])) 68 {movqi} (nil))

(insn 796 1755 797 146 2k_chess.c:117 (set (mem/s:QI (plus:HI (reg:HI 10 r5)
(const_int 5 [0x5])) [0 .Y+0 S1 A8])
(reg:QI 14 r7)) 68 {movqi} (nil))

This starts out as a single instruction and gets split in 172.ira.

From 168.asmcons:

(insn 796 795 797 139 2k_chess.c:117 (set (mem/s:QI (plus:HI (reg/v/f:HI 64 [ a ])
(const_int 5 [0x5])) [0 .Y+0 S1 A8])
(reg/v:QI 66 [ Y ])) 68 {movqi} (nil))

From 172.ira:

Reloads for insn # 796
Reload 0: reload_in (HI) = (reg/v/f:HI 64 [ a ])
reload_in_reg: (reg/v/f:HI 64 [ a ])
reload_reg_rtx: (reg:HI 10 r5)
Reload 1: reload_out (QI) = (mem/s:QI (plus:HI (reg/v/f:HI 64 [ a ])
(const_int 5 [0x5])) [0 .Y+0 S1 A8])
NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional
reload_out_reg: (mem/s:QI (plus:HI (reg/v/f:HI 64 [ a ])
(const_int 5 [0x5])) [0 .Y+0 S1 A8])
Reload 2: reload_in (QI) = (reg/v:QI 66 [ Y ])
reload_in_reg: (reg/v:QI 66 [ Y ])
reload_reg_rtx: (reg:QI 14 r7)

Handy info for reading RTL dumps:
(insn ....

Now for problem five, the double "inv":

From 172.ira:

(insn 1776 857 859 156 2k_chess.c:54 (set (reg:QI 4 r2 [343])
(mem/c/i:QI (const:HI (plus:HI (symbol_ref:HI ("M") [flags 0x2] <var_decl 0xb752a058 M>)
(const_int 1 [0x1]))) [2 M+1 S1 A8])) 68 {movqi} (nil))

(insn 859 1776 1304 156 2k_chess.c:54 (set (reg:QI 4 r2 [343])
(not:QI (reg:QI 4 r2 [343]))) 47 {one_cmplqi2} (nil))

(insn 1304 859 1305 156 2k_chess.c:54 (set (reg:QI 4 r2 [343])
(not:QI (reg:QI 4 r2 [343]))) 47 {one_cmplqi2} (nil))

(insn 1305 1304 1171 156 2k_chess.c:54 (set (mem/c/i:QI (symbol_ref:HI ("L") <var_decl 0xb752a4d0 L>) [0 L+0 S1 A8])
(and:QI (mem/c/i:QI (symbol_ref:HI ("L") <var_decl 0xb752a4d0 L>) [0 L+0 S1 A8])
(not:QI (reg:QI 4 r2 [343])))) 41 {*} (nil))

Derived from these instructions in 128.expand:

(insn 859 858 860 2k_chess.c:54 (set (reg:QI 343)
(not:QI (mem/c/i:QI (reg/f:HI 342) [2 M+1 S1 A8]))) -1 (nil))

(insn 860 859 0 2k_chess.c:54 (set (mem/c/i:QI (symbol_ref:HI ("L") <var_decl 0xb752a4d0 L>) [0 L+0 S1 A8])
(and:QI (reg/v:QI 66 [ Y ])
(reg:QI 343))) -1 (nil))