Posted at AtariAge:
That chess program was so insane, I needed to see what the compiler would do with it. Aside from a few unexpected fake register accesses (grr! fixed by hand) and failing to handle the giant lookup table (not surprising), it was pretty much drama-free.
The resulting code with -O2 optimizations: 1463 lines of assembly. Are there errors? Could be... But I have no motivation to compare against that C code. At first glance, it looks right.
Here's the readelf dump of the resulting object file:
eric@compaq:~/dev/tios/src/temp$ tms9900-readelf -S 2k_chess.o
There are 8 section headers, starting at offset 0xefc:
Section Headers:
  [Nr] Name       Type     Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]            NULL     00000000 000000 000000 00      0   0  0
  [ 1] .text      PROGBITS 00000000 000034 000e54 00  AX  0   0  2
  [ 2] .rela.text RELA     00000000 001e70 000da4 0c      6   1  4
  [ 3] .data      PROGBITS 00000000 000e88 000041 00  WA  0   0  2
  [ 4] .bss       NOBITS   00000000 000ec9 000cb2 00  WA  0   0  1
  [ 5] .shstrtab  STRTAB   00000000 000ec9 000031 00      0   0  1
  [ 6] .symtab    SYMTAB   00000000 00103c 000b00 10      7 145  4
  [ 7] .strtab    STRTAB   00000000 001b3c 000333 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
So that's 3668 bytes of code (.text section), and 3315 bytes of data (.data + .bss sections). Of course, that data size is about 128 MB shy, since the giant lookup table didn't make it in.
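As a sanity check on those totals, here's a small C sketch that redoes the arithmetic from the Size column of the dump. The function names are made up for illustration; only the three hex sizes come from the readelf output.

```c
#include <stdio.h>

/* Section sizes copied from the Size column of the readelf dump. */
enum {
    TEXT_SIZE = 0x000e54,
    DATA_SIZE = 0x000041,
    BSS_SIZE  = 0x000cb2
};

/* Bytes of executable code: just the .text section. */
unsigned code_bytes(void) { return TEXT_SIZE; }

/* Bytes of data: initialized .data plus zero-filled .bss. */
unsigned data_bytes(void) { return DATA_SIZE + BSS_SIZE; }

/* Print both totals in decimal. */
void print_totals(void)
{
    printf("code: %u bytes, data: %u bytes\n", code_bytes(), data_bytes());
}
```

Note that .bss takes no room in the object file itself (it is NOBITS), so it only counts toward the RAM the program would need.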
Still, I'm really impressed with how things are shaping up so far.
Friday, January 21, 2011
Thursday, January 20, 2011
On AtariAge, Astharot posted some crazy C code for a chess program, wondering if it could run on the TI. The code looks like it was from a coding contest attempting to squeeze something neat into 2K of source code. Since this looked insane, I figured it would be worth trying it against the compiler to see what popped out.
This initially caused compiler crashes, since I did all my testing against simple programs. It looks like I missed checking for more complex cases like constant memory expressions (ex: symbol+offset). There are also some lingering problems with fake registers in the 16-bit operations.
I think that since I allow any register to be used with those operations (with the caveat that all 16-bit values must start on an even register), GCC feels free to access the low byte by using the fake registers. Fixing this could be tricky. I don't think I can make a register class like I did for the 8-bit operations, since multiple registers are used for these values. This is also a problem for 32-bit values, so I don't want to use special case code to handle this. If I did that, the machine description file would at least double in size.
I'll have to think about this for a bit.
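To make the even-register caveat concrete, here's a rough C model. It assumes the numbering that the RTL dumps in the January 3 post suggest (26 is r13, 8 is r4, 9 is FAKE_R4_LOW): real register rN is hard register 2*N, and its fake low-byte half is 2*N + 1. The function names are invented for this sketch, and none of it is the actual machine description code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* The TMS9900 has 16 workspace registers, R0..R15. */
enum { NUM_REAL_REGS = 16 };

/* Assumed numbering: odd hard register numbers are the fake low-byte halves. */
bool is_fake_reg(int regno) { return (regno & 1) != 0; }

/* True if a 16-bit value may start at this hard register number. */
bool ok_for_16bit(int regno)
{
    return regno >= 0 && regno < 2 * NUM_REAL_REGS && !is_fake_reg(regno);
}

/* Write a readable name for a hard register number into buf. */
void reg_name(int regno, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (is_fake_reg(regno))
        snprintf(buf, len, "FAKE_R%d_LOW", regno / 2);
    else
        snprintf(buf, len, "r%d", regno / 2);
}
```

Under this model a 16-bit value in r4 occupies hard registers 8 and 9, so "the low byte of r4" has a perfectly good register number of its own, which is presumably why GCC feels entitled to name it.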
Sunday, January 16, 2011
After tons of frustration and research, I've finally made some good progress. I've given up on using 16-bit registers, and I'm back to fake 8-bit registers. That's the only way I can force GCC to track the size of the data value stored in a register. All attempts using 16-bit registers resulted in sadness and despair.
I've added a new register class for the real registers, and made sure that all byte operations can only use those registers. This guarantees that the fake registers will never appear in the compiled output. I've made an exception for the move byte operations. That is required for data to be copied out of the fake registers. I'm pretty happy about the whole thing.
I've also made a few optimizations. The array initialization optimizations I mentioned before have been implemented. We can now save two whole bytes of code per two bytes of array initialization. Hooray!
I've also added code to potentially squeeze out an unnecessary move operation in some obscure cases. This won't really be noticeable in most cases, but it makes me feel better.
Now that I'm past this nasty hurdle, I can get back to optimization and code coverage.
Sunday, January 9, 2011
So, I'm trying to get better truncations. Let's see what GCC is doing right now.
I'm looking at this line:
unsigned char a = (val & 0x0F) + '0';
Which eventually turns into this: (R1=val, R13=a)
mov r1, r3
andi r3, >F
ai r3, >30
mov r3, @14(r10)
movb @15(r10), r13
This is the RTL for the final "mov-movb" bit of this sequence.
128r.expand
(insn 38 37 0 printf.c:13 (set (reg/v:QI 45 [ a ])
(subreg:QI (reg:HI 51) 1)) -1 (nil))
133r.vregs
(insn 38 37 39 4 printf.c:13 (set (reg/v:QI 45 [ a ])
(subreg:QI (reg:HI 51) 1)) 68 {movqi} (nil))
150r.bypass
(insn 38 37 39 3 printf.c:13 (set (reg/v:QI 45 [ a ])
(subreg:QI (reg:HI 51) 1)) 68 {movqi} (expr_list:REG_DEAD (reg:HI 51)
(nil)))
This is unchanged till 171r.subregs_of_mode_init, which makes sense I suppose. At that point this gets split into the two move instructions. Unfortunately, there is nothing obvious in here which would prevent using memory for transfers.
Just for kicks, I removed the CANNOT_CHANGE_MODE_CLASS macro. Let's see what happens. The fact that the TMS9900 stores an 8-bit quantity in the high byte will be lost, but maybe I can trace what happens.
128r.expand
(insn 38 37 0 printf.c:13 (set (reg/v:QI 45 [ a ])
(subreg:QI (reg:HI 51) 1)) -1 (nil))
168r.asmcons
(insn 38 37 39 3 printf.c:13 (set (reg/v:QI 45 [ a ])
(subreg:QI (reg:HI 51) 1)) 68 {movqi} (expr_list:REG_DEAD (reg:HI 51)
(nil)))
172r.ira:
(insn 38 37 81 3 printf.c:13 (set (reg/v:QI 4 r4 [orig:45 a ] [45])
(reg:QI 4 r4 [orig:51+1 ] [51])) 68 {movqi} (nil))
Once we get to initial register allocation, this instruction has been transformed into a NOP, since GCC thinks the byte quantity will be left in the low byte of R4.
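The byte-placement problem behind that NOP can be modelled in a few lines of C. This is only a sketch: the register type, the tiny frame array, and the function names are all stand-ins invented here, and the offsets 14 and 15 are simply the ones from the listing earlier in this post. It relies on two facts about the TMS9900: it is big-endian, and byte instructions like movb work on the high byte of a register.

```c
#include <stdint.h>

/* Model of a 16-bit workspace register. */
typedef uint16_t reg16;

/* Tiny stand-in for the stack frame addressed through r10. */
static uint8_t frame[32];

/* mov rS, @off(r10): store a 16-bit word, high byte first (big-endian). */
void store_word(unsigned off, reg16 value)
{
    frame[off]     = (uint8_t)(value >> 8);
    frame[off + 1] = (uint8_t)(value & 0xFF);
}

/* movb @off(r10), rD: load one byte into the HIGH byte of rD,
   leaving the low byte of rD as it was. */
reg16 load_byte_high(unsigned off, reg16 old)
{
    return (reg16)((frame[off] << 8) | (old & 0x00FF));
}

/* The byte that a later byte instruction would see in this register. */
uint8_t byte_operand(reg16 r) { return (uint8_t)(r >> 8); }

/* int-to-char through the mov/movb round trip from the listing. */
reg16 truncate_via_memory(reg16 word)
{
    store_word(14, word);           /* mov  r3, @14(r10)  */
    return load_byte_high(15, 0);   /* movb @15(r10), r13 */
}

/* int-to-char treated as a no-op: the value stays in the low byte. */
reg16 truncate_as_nop(reg16 word) { return word; }
```

With a word value of 0x0035 (the digit '5'), the round trip leaves 0x35 where byte instructions look for it, while the no-op version leaves 0x00 there and the real value stranded in the low byte.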
So after a lot of testing and trial-and-error, I think I need to admit defeat on getting int-to-char working properly for 16-bit hard registers. GCC cannot be convinced to handle QI mode differently for the subregs. Depending on how I set up the test code, the point at which the hard subreg gets lost changes. This implies that the assumption is built into a lot of places. I might be able to optimize away conversions through the mov-movb sequence, but I need to do testing to make sure the stack will be suitably reduced, and that GCC doesn't try to reuse the value in memory later. Remember, there are no REG_DEAD flags for memory.
It's been a long day and I'm frustrated. No more for today.
Friday, January 7, 2011
Thursday, January 6, 2011
Monday, January 3, 2011
Wow, I've apparently hit the mother lode of horrible broken behavior here. Not only has this printf.c code triggered all the earlier problems, it's also caused an event I was afraid of for a while, but foolishly convinced myself would never happen.
If you remember, way back when I was working on function epilogues, I made a fake PC register and added all the support needed to use it. The fear at the time was what would happen if GCC decided to use that fake register for something. In all my earlier testing, that never happened, and I convinced myself that things were good and moved on. Sadly, this round of testing has uncovered a case where that fake PC was being used.
I'm not allowed to mark the fake PC as reserved due to its use in the epilogue. Doing that reservation causes an instant compiler crash. What I can do, however, is prevent GCC from putting that register in the allocation pool. That allows the fake register to be used for the epilogue, but nothing else. This is technically bad form, but given all the contortions needed so far to get the TMS9900 backend working, I'm not losing sleep over anything.
Long story short: the fake PC register can no longer be used for anything except epilogue work.
So now that that's out of the way, let's see if I can fix the fake register usage...
Here's the original C code, which converts larger values to A-F, "a_base" is either "x" or "X":
if(a > '9') a += a_base - '9' - 1;
This is compiled with -O1, which results in horrible code.
The resulting assembly ("a" is stored in the high byte of R13):
mov r13, r4 # Copy "a" to temp
mov r4, r13 # Copy temp back to a (for some reason)
swpb r13 # Copy "a" to low byte of R13 (wrong setup for cb)
li r4, >39 * 256 # R4 = '9' in high byte
cb FAKE_R4_LOW, r4 # Insn 39. This is wrong for so many reasons...
RTL dumps of this instruction during compilation:
172r.ira: This looks pretty good
(insn 39 81 40 3 printf.c:14 (set (cc0)
(compare (reg/v:QI 26 r13 [orig:62 a ] [62])
(reg:QI 8 r4))) 7 {cmpqi} (nil))
185r.cprop_hardreg
insn 39: replaced reg 26 with 9
--- CLIP ---
(insn 39 81 40 3 printf.c:14 (set (cc0)
(compare (reg:QI 9 FAKE_R4_LOW [orig:62 a ] [62])
(reg:QI 8 r4))) 7 {cmpqi} (expr_list:REG_DEAD (reg:QI 8 r4)
(nil)))
In 185r.cprop_hardreg, the correct decision to use R13 was changed to use FAKE_R4. Why the heck did that happen?
As part of the 185 step, GCC attempts to reuse parts of old registers to try to eliminate some work. Unfortunately, I was missing a CANNOT_CHANGE_MODE_CLASS macro which would prevent invalid register use. I think it's OK now, but I'm done for today. Test tomorrow.
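For reference, the C line quoted in this post and the "(val & 0x0F) + '0'" line from the January 9 post fit together as a nibble-to-hex-digit conversion. Here's a minimal sketch of that, wrapped in a function whose name is made up. One loud assumption: for the arithmetic to land on a-f or A-F, a_base has to hold 'a' or 'A' by the time this line runs, so that is what the sketch takes as its parameter. How the real printf.c derives that from the "x"/"X" format character isn't shown in these posts.

```c
/* Convert the low nibble of val to an ASCII hex digit.
   a_base is ASSUMED to be 'a' or 'A' (see the note above). */
unsigned char hex_digit(unsigned int val, unsigned char a_base)
{
    unsigned char a = (val & 0x0F) + '0';   /* '0'..'9', then ':'..'?' */
    if (a > '9')
        a += a_base - '9' - 1;              /* shift ':'..'?' onto the letters */
    return a;
}
```

The comparison against '9' is the byte compare that insn 39 is supposed to implement, which is why getting "a" into the right half of the right register matters so much here.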
Sunday, January 2, 2011
Disaster averted. This problem was caused by a bug in the GO_IF_LEGITIMATE_ADDRESS macro. The macro is supposed to check for valid addressing modes, but there should be two variants of it: one for when it is not known if a pseudoreg is a real register, and one for when it is known. I only had one macro, which effectively meant GCC did not care if real registers were ever used. That allowed the memory location to be used in a post-increment setting. All fixed now, back to testing.
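The two-variant idea can be sketched as a pair of plain C predicates. This is a simplified model, not the real macro: the enum, the function names, and the value chosen for FIRST_PSEUDO_REGISTER are all assumptions made for illustration. In GCC of this vintage the strict variant is conventionally selected by defining REG_OK_STRICT, but the sketch just uses two functions.

```c
#include <stdbool.h>

/* Hypothetical value for this model only: hard registers are numbered
   below this, pseudo registers at or above it. */
enum { FIRST_PSEUDO_REGISTER = 33 };

/* What sits underneath a post-increment in this model. */
enum base_kind { BASE_REG, BASE_MEM };

/* Non-strict check, for use before register allocation: a pseudo is
   accepted because it may still end up in a real register. */
bool base_ok_nonstrict(enum base_kind kind, int regno)
{
    (void)regno;   /* any register, hard or pseudo, is fine for now */
    return kind == BASE_REG;
}

/* Strict check, for use once allocation is settled: only a hard register
   is acceptable, so a spilled pseudo living on the stack is rejected. */
bool base_ok_strict(enum base_kind kind, int regno)
{
    return kind == BASE_REG && regno < FIRST_PSEUDO_REGISTER;
}
```

With only the lenient check available, a pseudo that later gets spilled to a stack slot still looks like a legal post-increment base, which matches the invalid address that showed up in the crash.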
Important note for later:
When adding code which depends on using debug options, don't be surprised when things segfault and crash if those options aren't used. I forgot to remove all the test code I put into df-scan.c, which caused mayhem and hours of digging when I started testing with the normal makefiles.
Don't do that again.
I found something disturbingly familiar. Remember the "sra r2, 8", then empty lines and gibberish? It's back. Unfortunately, I can't for the life of me remember what I did to fix that.
Problem text:
movb *r9+, r2
sra r2, 8
--- CLIP ---
mov r2, r2
Problem solved! There was a missing return in the "*movqihisx1" form. This seems to have had the same result as returning an uninitialized pointer. When GCC attempted to dereference it, all the junk was printed. I'll make sure no more of those pop up.
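The shape of that bug is easy to show in isolation. Below is a hypothetical stand-in for the C fragment of an instruction pattern; the function name and the template strings are invented for illustration and are not the real "*movqihisx1" code. The point is only that every path has to hand back a template string.

```c
/* Hypothetical output routine: returns the assembler template to print
   for a byte-to-word sign extension. */
const char *output_extend_byte(int src_is_memory)
{
    if (src_is_memory)
        return "movb %1, %0\n\tsra %0, 8";
    /* Dropping this final return is the kind of mistake described above:
       in C, using the result of a non-void function that fell off its end
       is undefined behavior, so the caller may get a garbage pointer. */
    return "sra %0, 8";
}
```

Compiling the backend sources with -Wall should flag a missing return of this sort, since -Wall includes GCC's -Wreturn-type warning.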
After more testing, I've found a C function which allows fake registers to be emitted into the output while using -O1. I'll need to look at this more closely tomorrow. I don't like the fact that old bugs seem to be popping up lately.
Saturday, January 1, 2011
After much digging around, I found the instruction which is causing the crash:
(post_inc:HI (mem/c:HI (plus:HI (reg/f:HI 20 r10)
(const_int 12 [0xc])) [0 %sfp+12 S2 A16]))
GCC is expecting this to be a register:
(mem/c:HI (plus:HI (reg/f:HI 20 r10)
(const_int 12 [0xc])) [0 %sfp+12 S2 A16])
This is derived from the C line "p = format". Due to spilled register usage, "p" is stored on the stack. So here's the evolution of that instruction before it breaks.
128r.expand: This is the initial definition of this instruction. This is just an RTL expression at this point.
(insn 30 29 0 printf.c:22 (set (reg/v/f:HI 66 [ p ])
(reg/v/f:HI 69 [ format ])) -1 (nil))
152r.dse1: At this point, an instruction form has been chosen. In this case "movhi". GCC has also determined that the value of "format" will no longer be used.
(insn 30 28 61 2 printf.c:22 (set (reg/v/f:HI 66 [ p ])
(reg/v/f:HI 69 [ format ])) 69 {movhi} (expr_list:REG_DEAD (reg/v/f:HI 69 [ format ])
(nil)))
159r.combine: Here, an invalid addressing mode is selected. This would be something like "mov (format)+, (p)", but "format" is a memory location.
(insn 30 25 61 2 printf.c:22 (set (reg/v/f:HI 66 [ p ])
(mem/f/c/i:HI (post_inc:HI (reg/v/f:HI 60 [ ap.34 ])) [0 format+0 S2 A16])) 69 {movhi} (expr_list:REG_INC (reg/v/f:HI 60 [ ap.34 ])
(nil)))
So, somewhere in the combine step, GCC went bonkers.