OK, I was looking into the DWARF section problem when I noticed there is a "-dA" command line option for gcc to comment all the emitted DWARF values. This is great, since bouncing between compiler source, emitted assembly and the Dwarf2 specification was getting old. Sadly, I found an assembler problem. GAS is misinterpreting the "*" comment character as an arithmetic operator, so the first word of the comment is parsed as an operand. Any text after that is treated as the start of a new instruction.
eric@compaq:~/dev/tios/src/temp$ cat emw3.s
data 0x1234 * stupid comment
eric@compaq:~/dev/tios/src/temp$ tms9900-as emw3.s
emw3.s: Assembler messages:
emw3.s:1: Error: unknown instruction 'comment'
Not super helpful. I can filter out everything after the first space in the comment, but what about the first word? What if that first word is a valid symbol or number? We could have multiple ways to parse the line. For example:
data COUNT *2 bytes allocated
Is the two supposed to be an operand or a comment? TI allows unadorned comments at the end of the line so the comment could be either "2 bytes allocated" or "bytes allocated". TI got around this by forbidding spaces in expressions. So the two in the line above would be considered part of the comment.
This should be super fun to fix, since the assembler eats a lot of spaces before we get to see the contents of the input file.
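As a rough illustration of TI's "no spaces in expressions" rule, here is a minimal sketch in C. It is not the actual GAS change; the function name operand_field_length is made up for this example, and the single-quote handling is an assumption about how text strings would need to be treated. The idea is that the operand field ends at the first whitespace outside a quoted string, and everything after that is comment.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of GAS: given the text that follows the
 * mnemonic, return how many characters belong to the operand field.
 * Rule assumed here: the field ends at the first whitespace character
 * that is not inside a single-quoted string. The rest is comment. */
size_t operand_field_length(const char *s)
{
    size_t i = 0;
    int in_quote = 0;

    assert(s != NULL);
    while (s[i] != '\0') {
        if (s[i] == '\'') {
            /* A doubled quote ('') toggles twice, so the state is unchanged. */
            in_quote = !in_quote;
        } else if (!in_quote && isspace((unsigned char)s[i])) {
            break;
        }
        i++;
    }
    return i;
}
```

With this rule, the operand field of "COUNT *2 bytes allocated" is just "COUNT", so the two lands in the comment, which matches the TI behavior described above.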
Thursday, September 18, 2014
Tuesday, September 16, 2014
OK, I tracked down the crash to the point where the compiler tries to emit unaligned data longer than one byte. The normal course of operations is to decompose the unaligned value into a series of smaller, hopefully aligned values. Ultimately, this breaks down to emitting a series of individual bytes. That's where the trouble begins.
I had previously forbidden the compiler from allowing subreg expressions which resulted in byte values. This was done in order to support the bytewise instructions. In this case, the code emitter (in varasm.c) has no other strategy to fall back on, and fails an assertion, crashing the compiler.
I was able to get around this by defining fake data types to handle these cases. This beat the compiler into submission enough to output an object file with Dwarf2 info. Unfortunately, it's terrible. The compiler is adding 4-byte relocations for a machine with 2-byte addresses. The assembler choked on that one. Also, the emitted byte order is little-endian which is not helpful either.
By default, GCC is using ".2byte" directives to embed a two-byte value. This is the opposite endianness of TI's "data" directive. This results in Dwarf sections which are utter gibberish. I manually swapped the bytes for some test sections, and that seems to result in a properly-formatted section. Now I have to find a way to decompose all dwarf values into "data" and "byte" values. This may be tricky, since only some of the dwarf generating code can be overridden by the target definition in tms9900.h.
It looks like the TARGET_ASM_INTEGER macro can be defined to let us take over the job of emitting integer data. In fact, there are a couple of other machines which need special attention in this area, so I shamelessly stole some ideas from them. So now the data is the right endianness and nothing is crashing, but tms9900-objdump is complaining about malformed debug sections. Ugh.
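To make the byte-order idea concrete, here is a standalone sketch in C. It is not the real hook from the tms9900 port and does not use any GCC internals; format_integer and its parameters are invented for illustration. It assumes an aligned two-byte value can go out as a single "data" word and that anything else is split into "byte" directives, most significant byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical formatter, not GCC code: write assembler text for an
 * integer of `size` bytes into `out`. Aligned 2-byte values use "data";
 * everything else is emitted as big-endian "byte" directives.
 * Returns the number of characters written, or -1 if `out` is too small. */
int format_integer(char *out, size_t outlen, unsigned long value,
                   unsigned size, int aligned)
{
    size_t used = 0;
    unsigned i;

    assert(out != NULL && size >= 1 && size <= 4);

    if (aligned && size == 2) {
        int n = snprintf(out, outlen, "\tdata 0x%04lX\n", value & 0xFFFFul);
        return (n < 0 || (size_t)n >= outlen) ? -1 : n;
    }

    for (i = 0; i < size; i++) {
        /* Walk from the most significant byte down to the least. */
        unsigned shift = 8u * (size - 1u - i);
        unsigned long b = (value >> shift) & 0xFFul;
        int n = snprintf(out + used, outlen - used, "\tbyte 0x%02lX\n", b);
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

An unaligned 0x1234 comes out as "byte 0x12" followed by "byte 0x34", so the bytes land in memory in the same order the "data" directive would have used.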
Sunday, September 7, 2014
Surprisingly, the only non-conforming thing I could find with the assembler is support for numbered registers. For example "mov 1,2". Now that's been fixed.
There are of course a lot of E/A specific keywords having to do with code location and linking options the assembler does not support, but those don't make sense in this context anyway.
One thing I've wanted to do is get the -g option working correctly to allow mixed source and assembly output files. This would allow faster debugging and make it quick to find code that turns into ugly assembly.
I added config options which should have allowed debug output, but any build done with -g crashes the compiler. Hmm.
Saturday, September 6, 2014
It wasn't any fun, but I finally got division working. Sample output at the end of this. The main problem is that proper signed division uses a lot of code. My intent was to provide the optimizer with enough information to pare this down to the minimum number of required operations.
There were a few problems though. I was trying to implement signed division using register constraints and lots of scratch registers for the temporaries, but that whole approach does not work.
At the time when the compiler is deciding which patterns to use for the initial RTL representation, it does not look at operand constraints. This means that having special behavior for constants or restricting division to valid data locations will not work. The compiler just takes whatever RTL is defined for an operation and adds it to the instruction stream. If that happens to be incorrect, the user will get compiler crashes or badly broken code.
One test I ran was just a short loop with dividends between -10 and 10. The broken division code turned this into an infinite loop with an oscillating dividend. Not even close to working code.
Eventually I got this to work by using instruction expanders. This lets the compiler figure out which branches are needed for a given input and use registers optimally. It took five more temporary registers and meant expanding the single division instruction into eleven different ones, including a wrapper around unsigned division, which itself is not much fun. A lot of these steps get optimized away if not needed.
Fortunately, no one needs to care about this but me.
Have some sample code:
eric@compaq:~/dev/tios/src/temp$ cat div_signed.c
int div(int a, int b) {return a/b;}
eric@compaq:~/dev/tios/src/temp$ cat div_signed.s
pseg
even
def div
div
mov r1, r5
xor r2, r5
abs r2
clr r3
mov r1, r4
abs r4
div r2, r3
mov r3, r1
inv r5
jlt $+4
neg r1
b *r11
even
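For anyone who would rather read C than assembly, the listing above works out to roughly the following. This is a hand-written sketch of the same sign-and-magnitude approach, not compiler output. The name div_signed and the fixed 16-bit types are assumptions for the example, and division by zero is simply ruled out with an assert.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of signed 16-bit division built on unsigned division.
 * Comments point at the matching instructions in the listing above. */
int16_t div_signed(int16_t a, int16_t b)
{
    uint16_t sign, ua, ub, q;

    assert(b != 0);

    /* mov r1,r5 / xor r2,r5: top bit is set when the signs differ */
    sign = (uint16_t)((uint16_t)a ^ (uint16_t)b);

    /* abs r4 / abs r2: take magnitudes with unsigned math to avoid overflow */
    ua = (a < 0) ? (uint16_t)(0u - (uint16_t)a) : (uint16_t)a;
    ub = (b < 0) ? (uint16_t)(0u - (uint16_t)b) : (uint16_t)b;

    /* clr r3 / div r2,r3: unsigned divide of the magnitudes */
    q = (uint16_t)(ua / ub);

    /* neg r1: negate the quotient when the operand signs differ */
    if (sign & 0x8000u)
        q = (uint16_t)(0u - q);

    /* Converting back to int16_t assumes the usual two's complement wrap;
     * -32768 / -1 does not fit in 16 bits either way. */
    return (int16_t)q;
}
```

The quotient truncates toward zero, the same as C's "/" operator, so -10 / 3 gives -3.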
Next up, improved adherence to Editor/Assembler syntax and conventions.