Wednesday, September 7, 2011

So I've been looking into this code, which is used to set a VDP address:

movb r9, @>8C02 <-- should be copying low byte
mov r9, r2
ori r2, >4000
movb r2, @>8C02

I turned on all debug output to dig into this sequence and follow it through the compiler passes.

At some point during register allocation, this instruction is being deleted:
(insn 36 35 39 4 lucien2_5.c:18 (set (reg:QI 2 r2 [orig:40 i+1 ] [40])
(subreg:QI (reg/v:HI 9 r9 [orig:26 i ] [26]) 1)) 72 {movqi} (nil))

I'm not sure why; that looks fine to me. Instead, GCC assumes the wrong byte usage, eventually resulting in "movb r9, @>8C02".
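To make the expected behavior concrete, here is a minimal C sketch of which bytes should reach >8C02 and which byte the bad instruction actually sends. The function names are hypothetical, and the 14-bit address limit is my assumption about the VDP. The sketch relies on MOVB with a register source taking the most significant byte of that register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the two bytes the sequence is meant to send
   to the VDP address port (>8C02), in order. */
void vdp_write_address_bytes(uint16_t addr, uint8_t out[2])
{
    assert(addr < 0x4000);                     /* assumed 14-bit VRAM address */
    out[0] = (uint8_t)(addr & 0xFF);           /* first: low byte */
    out[1] = (uint8_t)((addr | 0x4000) >> 8);  /* second: high byte after "ori r2, >4000" */
}

/* What "movb rN, @>8C02" sends: MOVB with a register source uses the
   most significant byte of the register. */
uint8_t movb_register_source(uint16_t reg)
{
    return (uint8_t)(reg >> 8);
}
```

For an address of >1234, the first byte should be >34, but "movb r9, @>8C02" sends >12.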

The subreg expression seems to be removed entirely, and is replaced with an instruction like "mov r2, r2", as seen below:

Reloads for insn # 36
Reload 0: reload_in (QI) = (reg:QI 9 r9)
REAL_REGS, RELOAD_FOR_INPUT (opnum = 1), can't combine
reload_in_reg: (subreg:QI (reg/v:HI 9 r9 [orig:26 i ] [26]) 1)
reload_reg_rtx: (reg:QI 2 r2 [orig:40 i+1 ] [40])

Later this instruction is deleted, and since the subreg expression is lost, we use the wrong byte. So I need to find where this is being done.

Call tree for problem location:
ira
reload
alter_reg
df_ref_change_reg_with_loc
df_ref_change_reg_with_loc_1
reload_as_needed
subst_reloads

OK, here's the problem. The correct subreg expression needed for proper byte handling is removed in subst_reloads, but that code does not make the determination to do the removal. There is a data structure containing operations to apply for each instruction, which subst_reloads dutifully applies. I need to find the code which makes the decisions. Which means I need to start over again. Ugh.

Once again:
find_reloads
push_reload

The decision to do register substitution is made in find_reloads, but that determination comes from a default handler. As I understand it, if a set instruction is used, but the input argument is to be reloaded, and that reload has not yet been made, the input is replaced with the output argument in an attempt to later remove this instruction. I now need to find why GCC needs to replace the subreg argument, since that looks perfectly fine to me. That decision looks to be somewhere in push_reload.

Nope. That is done in find_reloads. This is the most obtuse code I've looked at in a long time. The find_reloads function by itself is over 2000 lines of twisty logic. It's taken almost two weeks just to answer what I thought was a simple question.

It turns out that if there is a subreg expression as an instruction operand, GCC forces that expression to be reloaded, with the hope that it can be removed later. I have added a check to not do that in the case where a byte-to-word or word-to-byte expression is used. This preserves the subreg expression until instruction output, where we can emit the correct code.
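As a rough model of what that subreg means and what kind of expression the new check targets, here is a small standalone C sketch. This is not the actual GCC change. Both function names are hypothetical, and it assumes big-endian byte numbering, so byte 1 of a 16-bit register is the low half (matching the "i+1" in the dump above).

```c
#include <assert.h>
#include <stdint.h>

/* Model of (subreg:QI (reg:HI r) byte), assuming big-endian byte
   numbering: byte 0 is the high half, byte 1 is the low half. */
uint8_t subreg_qi(uint16_t word, unsigned byte)
{
    assert(byte <= 1);
    return byte == 0 ? (uint8_t)(word >> 8) : (uint8_t)(word & 0xFF);
}

/* Hypothetical predicate for the idea behind the check: is this a
   byte-to-word or word-to-byte subreg? Sizes are in bytes. */
int is_byte_word_subreg(unsigned outer_size, unsigned inner_size)
{
    return (outer_size == 1 && inner_size == 2)
        || (outer_size == 2 && inner_size == 1);
}
```

With r9 holding >1234, subreg_qi(0x1234, 1) gives >34, the byte insn 36 was supposed to move into r2.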

So, at long last, we get good-looking code for this sequence:

mov r9, r2
swpb r2
movb r2, @>8C02
mov r9, r2
ori r2, >4000
movb r2, @>8C02
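To double-check that sequence, here is a tiny C emulation of the six instructions. It is only a sketch: the port log structure and function names are hypothetical, and MOVB from a register is modeled as sending the register's most significant byte.

```c
#include <assert.h>
#include <stdint.h>

#define PORT_LOG_MAX 4

/* Hypothetical record of the bytes written to >8C02, in order. */
struct vdp_port_log { uint8_t bytes[PORT_LOG_MAX]; int count; };

/* swpb: swap the two bytes of a 16-bit register. */
static uint16_t swpb(uint16_t r)
{
    return (uint16_t)((r << 8) | (r >> 8));
}

/* movb rN, @>8C02: send the most significant byte of the register. */
static void movb_to_port(uint16_t r, struct vdp_port_log *log)
{
    assert(log->count < PORT_LOG_MAX);
    log->bytes[log->count++] = (uint8_t)(r >> 8);
}

void set_vdp_address_fixed(uint16_t r9, struct vdp_port_log *log)
{
    uint16_t r2;
    r2 = r9;                /* mov  r9, r2       */
    r2 = swpb(r2);          /* swpb r2           */
    movb_to_port(r2, log);  /* movb r2, @>8C02   (low byte of r9) */
    r2 = r9;                /* mov  r9, r2       */
    r2 |= 0x4000;           /* ori  r2, >4000    */
    movb_to_port(r2, log);  /* movb r2, @>8C02   (high byte with >40 set) */
}
```

Calling set_vdp_address_fixed(0x1234, &log) on an empty log records >34 followed by >52, so the low byte now goes out first.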

I'm cleaning out the TONS of debug output and path tracing code, and moving on to the next problem.

1 comment:

  1. I just checked the source of "find_reloads", it is really frightening. I also read some of the RTL doc. All I can say is I'm really in admiration of the work you have done so far.