Saturday, September 10, 2011

I've had a thought about these comparison patterns. If I make a pattern which only allows comparisons using registers, I can force GCC to use the efficient forms.

I'm removing the 32-bit comparisons to see what happens. Ideally, GCC will break this operation down into two comparisons. If that works, The .md file will be just a little bit smaller, which makes maintenance easier.

Here's an example with 32-bit values:
int compare_long_mem_eq_0(long *a) {return(*a == 0 ? 7: 9);}

Original output:
mov r1, r3
mov r2, r4
ci r3, >0
jne $+>4
ci r4, >0
jeq L2
li r1, >9
b *r11
jmp L6
li r1, >7
b *r11

Results after removing "tstsi" and "cmpsi" patterns
mov *r1, r2
soc @>2(r1), r2
mov r2, r1 <-- Not necessary, work on that later
jeq L51
li r1, >9
jmp L52
li r1, >7
b *r11

This is quite a bit better, and is the worst-looking output of all comparison forms. Most of the time, we can do the compare in one instruction. Typically, ORing the two words together.

Eliminating the 32-bit comparisons works really well, but causes the compiler to crash in some cases. It turns out that in the peephole optimization for comparing values to +-1 and +-2, I used a "general_operand" type, which allows memory references as well as registers. Unfortunately, I used "peephole2_reg_dead" to determine if that optimization should be used. That function explodes if called upon to check anything other than registers. If GCC tries to use one of those patterns when comparing a memory value, we will get a crash.

This was fixed by only allowing registers for the optimization pattern, and all is well. Honestly, that's what it should have been all along.

As a side effect of this work, the code that determines which instructions affect the condition flags has been rewritten from scratch. That code was originally taken from the PDP-11, and modified slightly. Now that I'm more familiar with the GCC internals, I can see where the flaws are. The new code is smaller, easier to follow as well as more correct.

Since the 32-bit comparison work went so well, I thought I'd monkey with the 16-bit comparisons. Instead of worrying about scratch registers, I'll just force all comparisons to be done in registers. Basically what will happen is that values will be copied into registers for the comparison, and later steps will realize that the move itself will set the comparison flags, resulting in clean looking output.

Sadly, that doesn't work out. Much more effort is done to set up the comparison than what I had before. Usually, there are several extra MOVs and no effort is made to use the additional optimizations. Oh well. What I have is still pretty good.

No comments:

Post a Comment