I've been completely occupied by JITC testing at work, and haven't looked at TI stuff in almost a month. But I've got a little time right now.
It looks like GCC favors MEM-REG-MEM copies rather than MEM-MEM copies. I've noticed in testing that M-M copies are often split up, and M-R-M copies are never combined. This happens regardless of how instruction costs are set up, or how the C code is constructed.
That makes sense, since most modern machines prefer this kind of operation. I just need to put a peephole in to fix this. I'm getting quite a few peepholes, which is normally a sign of band-aids over a deeper problem. However, since I can't see anything actually wrong at this point, and my target is old enough to be weird by modern standards, I'll try not to worry about that too much.
$ wc -l 2k_chess.s
Before peephole: 1462
After peephole: 1440
So I optimized out 22 instructions or 1.5% of the line count. Not very impressive, but it was easy to get.
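As a rough illustration of what a peephole like this does, here is a sketch in Python that works on assembly text rather than in GCC's machine-description syntax. It is not the peephole actually added to the compiler. The function name `merge_byte_copies` and the `dead_after` liveness table are assumptions made up for this example; a real peephole would get liveness information from the compiler itself.

```python
import re

# Matches "movb <src>, <dst>" with arbitrary spacing.
MOVB = re.compile(r"^\s*movb\s+([^,\s]+)\s*,\s*(\S+)\s*$")
REG = re.compile(r"^r\d+$")


def merge_byte_copies(lines, dead_after):
    """Fold 'movb mem, rX' followed by 'movb rX, mem' into 'movb mem, mem'.

    dead_after is a hypothetical liveness table: it maps the index of the
    second instruction to the set of registers that are unused afterwards.
    """
    out = []
    i = 0
    while i < len(lines):
        first = MOVB.match(lines[i])
        second = MOVB.match(lines[i + 1]) if i + 1 < len(lines) else None
        if first and second:
            src, tmp = first.groups()
            tmp2, dst = second.groups()
            if (REG.match(tmp) and tmp == tmp2
                    and not REG.match(src) and not REG.match(dst)
                    # Conservative: skip if the destination address
                    # might be formed from the temporary register.
                    and tmp not in dst
                    and tmp in dead_after.get(i + 1, set())):
                out.append(f"\tmovb {src}, {dst}")
                i += 2
                continue
        out.append(lines[i])
        i += 1
    return out


asm = ["\tmovb @48(r10), r7", "\tmovb r7, @5(r5)", "\tswpb r2"]
print(merge_byte_copies(asm, {1: {"r7"}}))
```

The merge is only applied when the temporary register is listed as dead, since otherwise later code could still depend on the byte left in it.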
Monday, February 28, 2011
Wednesday, February 9, 2011
I've been away from TI stuff for a while. Work and home life were pretty hectic, and they take priority. I've got another month of madness coming up, but I should have some TI time available for a little bit.
That 2K chess program has a lot of goodies hiding in it. It's exposed quite a few weaknesses in the compiler, and it's small enough to be manageable if I need to isolate some bits.
Things I've found so far:
1) The fake registers show up in output code. That's bad.
inv FAKE_R0_LOW
movb r1, @73(r10)
mov @94(r10), r2
szc FAKE_R0_LOW, r2
2) Also, there's a form that looks like "copy memory byte to sign extended register" which is a convoluted mess.
movb @105(r10), r2
swpb r2
mov FAKE_R2_LOW, r2
swpb r2
sra r2, 8
3) Additionally, all memory-to-memory copies get needlessly split up into two instructions.
movb @48(r10), r7
movb r7, @5(r5)
4) There's this mess
movb r3, r3
swpb r3
sla r3, >8
mov r3, r5
5) What the heck is this supposed to be?
movb @M+1, r2
inv r2
inv r2
szcb r2, @L
6) Here's an opportunity to remove an extra "swpb". If either R1 or R2 is dead, we can reorder operations and remove an instruction.
mov r1, r2
swpb r2
movb r2, @L
swpb r2
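One way to read item 6 is sketched below with a toy model of the three instructions involved. The shortened sequences `R1_DEAD` and `R2_DEAD` are a guess at the reordering meant there, not something taken from the compiler, and the model ignores status flags entirely. The `run` helper and the tuple encoding of instructions are invented for this example.

```python
def run(program, regs, mem):
    """Execute a tiny subset of TMS9900 on copies of regs and mem."""
    regs, mem = dict(regs), dict(mem)
    for op, src, dst in program:
        if op == "mov":        # word copy, register to register
            regs[dst] = regs[src]
        elif op == "swpb":     # swap the two bytes of a register
            v = regs[src]
            regs[src] = ((v << 8) | (v >> 8)) & 0xFFFF
        elif op == "movb":     # store the high byte of a register to memory
            mem[dst] = regs[src] >> 8
        else:
            raise ValueError(op)
    return regs, mem


# The sequence from item 6.
ORIGINAL = [("mov", "r1", "r2"), ("swpb", "r2", None),
            ("movb", "r2", "L"), ("swpb", "r2", None)]
# Assumed rewrite when r1 is dead afterwards: r1 may be left swapped.
R1_DEAD = [("mov", "r1", "r2"), ("swpb", "r1", None), ("movb", "r1", "L")]
# Assumed rewrite when r2 is dead afterwards: no copy is needed at all.
R2_DEAD = [("swpb", "r1", None), ("movb", "r1", "L"), ("swpb", "r1", None)]

start = {"r1": 0x12AB, "r2": 0}
ref_regs, ref_mem = run(ORIGINAL, start, {"L": 0})
for name, prog, live in (("r1 dead", R1_DEAD, "r2"), ("r2 dead", R2_DEAD, "r1")):
    regs, mem = run(prog, start, {"L": 0})
    # Only memory and the register that is still live have to match.
    print(name, mem == ref_mem and regs[live] == ref_regs[live])
```

Both shortened forms store the same byte to L as the original and leave the surviving register unchanged, using three instructions instead of four.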
I'll be starting out on problem three, the memory-to-memory copies. That should not be happening, and should be easy to find. (I'm sure I'll regret saying that...)
RTL of a typical example:
(insn 1755 795 796 146 2k_chess.c:117 (set (reg:QI 14 r7)
(mem/c:QI (plus:HI (reg/f:HI 20 r10)
(const_int 48 [0x30])) [4 %sfp+48 S1 A8])) 68 {movqi} (nil))
(insn 796 1755 797 146 2k_chess.c:117 (set (mem/s:QI (plus:HI (reg:HI 10 r5)
(const_int 5 [0x5])) [0.Y+0 S1 A8])
(reg:QI 14 r7)) 68 {movqi} (nil))
This gets split from a single instruction in 172.ira
From 168.asmcons:
(insn 796 795 797 139 2k_chess.c:117 (set (mem/s:QI (plus:HI (reg/v/f:HI 64 [ a ])
(const_int 5 [0x5])) [0.Y+0 S1 A8])
(reg/v:QI 66 [ Y ])) 68 {movqi} (nil))
From 172.ira:
Reloads for insn # 796
Reload 0: reload_in (HI) = (reg/v/f:HI 64 [ a ])
ALL_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 0)
reload_in_reg: (reg/v/f:HI 64 [ a ])
reload_reg_rtx: (reg:HI 10 r5)
Reload 1: reload_out (QI) = (mem/s:QI (plus:HI (reg/v/f:HI 64 [ a ])
(const_int 5 [0x5])) [0.Y+0 S1 A8])
NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional
reload_out_reg: (mem/s:QI (plus:HI (reg/v/f:HI 64 [ a ])
(const_int 5 [0x5])) [0.Y+0 S1 A8])
Reload 2: reload_in (QI) = (reg/v:QI 66 [ Y ])
REAL_REGS, RELOAD_FOR_INPUT (opnum = 1)
reload_in_reg: (reg/v:QI 66 [ Y ])
reload_reg_rtx: (reg:QI 14 r7)
-------------------
handy info for reading RTL dumps:
(insn ....
-------------------
What the heck is this supposed to be?
movb @M+1, r2
inv r2
inv r2
szcb r2, @L
From 172.ira:
(insn 1776 857 859 156 2k_chess.c:54 (set (reg:QI 4 r2 [343])
(mem/c/i:QI (const:HI (plus:HI (symbol_ref:HI ("M") [flags 0x2] <var_decl 0xb752a058 M>)
(const_int 1 [0x1]))) [2 M+1 S1 A8])) 68 {movqi} (nil))
(insn 859 1776 1304 156 2k_chess.c:54 (set (reg:QI 4 r2 [343])
(not:QI (reg:QI 4 r2 [343]))) 47 {one_cmplqi2} (nil))
(insn 1304 859 1305 156 2k_chess.c:54 (set (reg:QI 4 r2 [343])
(not:QI (reg:QI 4 r2 [343]))) 47 {one_cmplqi2} (nil))
(insn 1305 1304 1171 156 2k_chess.c:54 (set (mem/c/i:QI (symbol_ref:HI ("L") <var_decl 0xb752a4d0 L>) [0 L+0 S1 A8])
(and:QI (mem/c/i:QI (symbol_ref:HI ("L")) [0 L+0 S1 A8])
(not:QI (reg:QI 4 r2 [343])))) 41 {*tms9900.md:870} (nil))
(not:QI (reg:QI 4 r2 [343])))) 41 {*tms9900.md:870} (nil))
Derived from this instruction in 128.expand:
(insn 859 858 860 2k_chess.c:54 (set (reg:QI 343)
(not:QI (mem/c/i:QI (reg/f:HI 342) [2 M+1 S1 A8]))) -1 (nil))
(insn 860 859 0 2k_chess.c:54 (set (mem/c/i:QI (symbol_ref:HI ("L")) [0 L+0 S1 A8])
(and:QI (reg/v:QI 66 [ Y ])
(reg:QI 343))) -1 (nil))
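Going by the dumps above, the two back-to-back one_cmplqi2 insns operate on the same register, so they cancel and the pair looks removable. The sketch below checks that identity for every 16-bit value and shows a text-level cleanup that drops such pairs. The helper names `inv` and `drop_double_inv` are made up for this example, and the cleanup assumes nothing depends on the status flags set by the removed instructions.

```python
def inv(x):
    """One's complement of a 16-bit word, like the TMS9900 INV instruction."""
    return ~x & 0xFFFF


def drop_double_inv(lines):
    """Remove adjacent identical 'inv <operand>' pairs from assembly text."""
    out = []
    for line in lines:
        words = line.split()
        if out and words[:1] == ["inv"] and out[-1].split() == words:
            out.pop()          # the second inv undoes the first: drop both
        else:
            out.append(line)
    return out


# Inverting twice gives back the original value for every 16-bit word.
print(all(inv(inv(x)) == x for x in range(0x10000)))

asm = ["movb @M+1, r2", "inv r2", "inv r2", "szcb r2, @L"]
print(drop_double_inv(asm))
```

With the pair gone, what remains is a byte load followed by szcb, which clears in L the bits that are set in the loaded byte.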