Wednesday, September 25, 2013

I've been busy getting up to speed with a new job lately, but I finally got some time to do some TI stuff.

While I was away, Tursi found a few bug in GCC that need to be addressed. These range from simple fixes to potentially long and painful sessions tracking down where a bytewise operation gets broken. Most of these were submitted by Tursi on the AtariAge forum, so thank you for finding these.

Let's get cracking!

First up is an easy fix for jeq/jlt and jeq/jgt pairs. There are no instructions to express "jump if arithmetically greater/lesser or equal to" like there are for logical comparisons. Those would be JLE and JHE. Instead, we must chain two conditional jumps together to get the same result. The conditional flags are not affected by any of the jump instructions, so we won't get weird behavior by doing this. I made an effort to encourage GCC to not use these combinations when possible, but sometimes there's just no avoiding the situation.

For simplicity's sake, I'll only refer to the jeq/jlt pair here. The jeq/jgt pair has the same issue, just with a different test.

I originally had arbitrarily chosen to test for equality first then for a lesser value. Everything worked, so I moved on to other issues and didn't think about it. However Tursi recommended doing the lesser test first. The idea here is that the less-then test will match a larger set of possible conditions than the equality test, so by doing that one first, the second test may not be needed and performance is increased.

Next up, we're missing an ANDI instruction in the compilation of this snippet:

extern int byte2hex[];
void fast_hex_out(int x) {
        unsigned char z=(x&0xff);
        unsigned int dat = byte2hex[z];
        VDPWD = dat>>8;
        VDPWD = dat&0xff;
}

This compiles to:

fast_hex_out
* Should be "andi r1, >00FF" here
        a    r1, r1
        mov  @byte2hex(r1), r2
        mov  r2, r3
        li   r1, >8C00
        movb r3, *r1
        swpb r2
        movb r2, *r1
        b    *r11

So, after digging into the RTL, we see that at one point, GCC intended to put that instruction there, but it got stripped out.

From step 182r.peephole2:
(insn 38 37 10 2 tursi4.c:15 (and (reg:HI 1 r1 [30])
        (const_int 255 [0xff])) -1 (nil))

Wait a second... THIS INSTRUCTION DOES NOT SET ANYTHING! IT'S TOTALLY USELESS! Of course it's being deleted.
This is the result of a bad peephole. GCC would never do anything this dumb on its own. Backing up a bit, we find the pre-peephole instructions:

From 172r.ira:
(insn 7 4 9 2 tursi4.c:15 (set (reg:QI 1 r1 [orig:27 x+1 ] [27])
        (subreg:QI (reg:HI 1 r1) 1)) 73 {movqi} (nil))

(insn 9 7 10 2 tursi4.c:15 (set (reg:HI 1 r1 [30])
        (zero_extend:HI (reg:QI 1 r1 [orig:27 x+1 ] [27]))) 75 {zero_extendqihi2} (nil))

This translates to:
   (char)r1 = (char)((int)r1)
   (int)r1 = (int)((char)r1)

With all this in mind, I found the broken peephole optimization. When I was looking at it, the problem was obvious. I feel really silly right now.

No comments:

Post a Comment