Monday, March 28, 2011

Pretty much all attempts to get byte operations working in a general way have failed. I'm now forced to try rewriting the compiler to handle this case. I don't like this at all, since it will make forward portability a problem, but I don't see what else I can do.

All byte subregs are optimized away in 159r.combine.

Check out combine.c::combine_instructions and try_combine

Thursday, March 24, 2011

Apparently, GCC is trying to put the entire use of HI:r322 into class REAL_REGS and QI:r69[x]. Again, GCC assumes that bytes and words use the same order and can be used interchangeably. I have yet to find a good way to deal with the messed-up TI register format. GRR.

Sunday, March 20, 2011

OK, back to 8-bit registers. If you remember, the last problem I hit here was that I found an instruction which was failing to allocate a register. I mentioned that the problem was in the not_usable allocation mask.

To make things easier to explain, I've copied the instruction below.

2k_chess.c:123: error: unable to find a register to spill in class ‘REAL_REGS’
this is the insn:
(insn 740 739 742 126 2k_chess.c:113 (set (reg/v:QI 69 [ x ])
(subreg:QI (reg:HI 6 r3 [322]) 1)) 70 {movqi} (expr_list:REG_DEAD(reg:HI 6 r3 [322])
(nil)))

This instruction was MOVQI, which should take any register as an operand. For some reason, GCC is convinced it needs to use the REAL_REGS class for pseudoreg r322. The problem is that we're trying to allocate a two-byte value, but we are limited by the class to only using one register.

reload failure for reload 1

Reloads for insn # 740
Reload 0: REAL_REGS, RELOAD_FOR_OUTPUT (opnum = 0)
reload_out_reg: (reg/v:QI 69 [ x ])
Reload 1: reload_in (HI) = (reg:HI 6 r3 [322])
reload_out (QI) = (reg/v:QI 69 [ x ])
REAL_REGS, RELOAD_OTHER (opnum = 1)
reload_in_reg: (reg:HI 6 r3 [322])
reload_out_reg: (reg/v:QI 69 [ x ])

I suspect the problem is in allocating a register for pseudoregister 322. The subreg is causing problems. I wish GCC would let me make a generic method to handle subregs. It would make my life so much easier.

Friday, March 18, 2011

I may have to admit defeat on the transfer register scheme. While it works great for simple programs, it causes a lot of problems for more complex ones. Also, I've seen cases where the set and use of the transfer register are far apart, which would prevent optimizing it away. I'd need a stateful process to clean up the output assembly. If I could do that, I wouldn't need these elaborate schemes in the first place.

So I guess I'm back to using 8-bit registers again. Oh well.

Thursday, March 17, 2011

OK, the initial modifications to allow the transfer register scheme are done, and I've verified that things are working pretty well so far. Observe.

Input:
char test1(int a)
{
return(a);
}

Output:
test1
mov r1, FAKE_TR
movb FAKE_TR, r1
b *r11

I've moved the transfer register to the head of the allocation list, so that should result in peephole-friendly output.

Unfortunately, 2k_chess.c crashes the compiler. Poop. This may be more involved than I thought.

Sunday, March 13, 2011

I've been kicking around the idea of using a single fake register to transfer byte and word data around. The idea is that real registers cannot convert between bytes and words directly, so all conversions must go through the fake register. This fake register cannot do any actual work, and we can optimize it away during the peepholes.

I don't know if this will work any better than the other attempts, but I've got a good feeling about it.

Saturday, March 12, 2011

After about a month and a half of 100-hour weeks, I can finally get back to a normal schedule. So that means getting back to the crash.

I'm looking at the not_usable mask during register allocation. I think that's what is messing me up.

It is. GCC is assuming a word-sized quantity where a byte-sized one is called for. I've got all these fake registers, and prohibit QI usage in non-fake registers. Since GCC is confused about type and size, a register cannot be allocated.

I'm really getting frustrated about these byte-related issues.

Tuesday, March 1, 2011

I found the cause of the repeated "inv" sequences seen below:
movb @M+1, r2
inv r2
inv r2
szcb r2, @L

There was no commutative form for an expression like "A=A&(~B)", so GCC added extra "inv" instructions to fit the one form it had. By adding the commutative form "A=(~B)&A", GCC was able to recognize the redundant work and eliminate it. We now have much nicer output here:
szcb @M+1, r2
movb r2, @L

Unfortunately, this could be made smaller yet:
szcb @M+1, @L

This is the same reluctance to use memory-to-memory instructions I found earlier. I suppose I could make more peepholes for this, but that would explode the pattern count. I would need an optimization matching a pattern like "Rn=FUNC(); mem=Rn" for every instruction in the system. That's a lot of peepholes. At that point, it might be better to try to find a different way to encourage memory-to-memory operations.

Adding this same logic to word-sized not-and operations resulted in an error:

2k_chess.c:122: error: unable to find a register to spill in class ‘REAL_REGS’
this is the insn:
(insn 740 739 742 126 2k_chess.c:112 (set (reg/v:QI 69 [ x ])
(subreg:QI (reg:HI 6 r3 [322]) 1)) 70 {movqi} (expr_list:REG_DEAD (reg:HI 6 r3 [322])
(nil)))
2k_chess.c:122: confused by earlier errors, bailing out

OK... I didn't see that coming. Also, R3 should be available since it would otherwise be free at the end of the instruction. What's in the area of this instruction?

(insn 739 738 740 126 2k_chess.c:112 (set (reg:HI 322)
(and:HI (not:HI (reg:HI 362 [ prephitmp.228 ]))
(reg:HI 322))) 40 {*not_andhi} (nil))

(insn 740 739 742 126 2k_chess.c:112 (set (reg/v:QI 69 [ x ])
(subreg:QI (reg:HI 322) 1)) 70 {movqi} (expr_list:REG_DEAD (reg:HI 322)
(nil)))

Hmm, not good. That's the logic I just fixed.

Note to self: I modified reload1.c to use debug output. This will cause a crash later if not removed.