Wednesday, December 17, 2014

At last, I can see some progress. I put in some code in August to handle a problem where word-to-byte conversions were getting dropped. I handled this by inserting a swpb instruction to do the conversion first before doing the operation.

The problem here seems to be that this sequence was being performed on an instruction where the source and destination are the same register. This caused the instruction to be repeated. This might help explain things:

Original:
r27 = subreg_low_part(r27)

After reload:
r27 = subreg_low_part(r27)
r27 = subreg_low_part(r27)

The code in reload pulls the subreg expression out and does that first before doing any other operations. The problem is that in this case, r27 is virtual.

There was an earlier modification to prevent instructions like this from being reloaded. So maybe that's the cause of this problem.

Thursday, December 11, 2014

There's another problem that was found on AtariAge. In this one, global variables which are not initialized are not being declared as global. Additionally, for initialized data, the label is emitted before the section directive. It's a mess.


eric@compaq:~/dev/tios/src/temp$ cat emw4.c
int no_init[10];
int init_zero[10] = {0};
int init_val[] = {1,2,3,4};
eric@compaq:~/dev/tios/src/temp$ cat emw4.s
    cseg
# No "def no_init", symbol will not be visible to other modules

no_init
    bss 20

    def    init_zero

    def    init_zero
init_zero
    bss 20

    def    init_val
    dseg
    even


# The "def init_val" line should be here
init_val
    data    1
    data    2
    data    3
    data    4

Saturday, December 6, 2014

It took a bit, but I got the compiler to emit some information:


From unwind-dw2-fde.c.164r.subreg2:
(insn 41 40 213 7 ../../../libgcc/../gcc/unwind-dw2-fde.c:612 (set (reg:QI 22 [ prephitmp.500 ])
        (subreg:QI (reg/v:HI 27 [ encoding ]) 1)) 73 {movqi} (nil))
From unwind-dw2-fde.c.172r.ira:
Reloads for insn # 41
Reload 0: reload_in (QI) = (subreg:QI (reg/v:HI 27 [ encoding ]) 1)
        ALL_REGS, RELOAD_FOR_INPUT (opnum = 1)
        reload_in_reg: (subreg:QI (reg/v:HI 27 [ encoding ]) 1)
        reload_reg_rtx: (reg:QI 14 r14 [orig:22 prephitmp.500 ] [22])



(insn 229 40 238 7 ../../../libgcc/../gcc/unwind-dw2-fde.c:612 (set:QI (reg:QI 27)
        (subreg:QI (reg:HI 27) 1)) 73 {movqi} (nil))
../../../libgcc/../gcc/unwind-dw2-fde.c: In function ‘classify_object_over_fdes’:
../../../libgcc/../gcc/unwind-dw2-fde.c:645: internal compiler error: in reload_cse_simplify_operands, at postreload.c:396
Please submit a full bug report,


It looks like the problem is that the instruction is trying to validate R27 for use in strict mode. Since R27 is not a real register, the instruction fails. The instruction needs to be transformed to use a memory address. I'm not quite sure how to do this and why it's not being done now. My guess is that the changes I did to handle subregs is causing something else to break.

I checked the usage of R27, and there's no weird dependency forcing it to stay in a register. Hmm. More research is required here.

Friday, December 5, 2014

OK, I've done something horribly wrong. The latest patch fails to build libgcc. The initial problem was reported by TheMole on AtariAge. I just got too confident and didn't test with a full build.

I was able to replicate this depressingly easily. The problem is that the offending line declares a function prototype which uses the FILE macro from stdio.h, but does not include that header file. This apparently is not a problem when compiling GCC, but libgcc is not so forgiving. It turns out that the problem prototype is not really needed there, so we can comment that out and get a bit farther.

Until we hit the next problem:

insn does not satisfy its constraints:
(insn 229 40 238 7 ../../../libgcc/../gcc/unwind-dw2-fde.c:612 (set:QI (reg:QI 27)
        (subreg:QI (reg:HI 27) 1)) 73 {movqi} (nil))
../../../libgcc/../gcc/unwind-dw2-fde.c: In function ‘classify_object_over_fdes’:
../../../libgcc/../gcc/unwind-dw2-fde.c:645: internal compiler error: in reload_cse_simplify_operands, at postreload.c:396
Please submit a full bug report,
with preprocessed source if appropriate.
See for instructions.
make[1]: *** [unwind-dw2-fde.o] Error 1
make[1]: Leaving directory `/home/eric/dev/tios/src/gcc_installer/temp/build/gcc-4.4.0/build/tms9900/libgcc'

Somehow the stupid byte conversion stuff got back in there. Grr.

Thursday, December 4, 2014

GCC 4.4.0 patch 1.10

Changes this version:

Prevented use of R0 as an address base
Moved jump tables into text segment to free up space for variables
Fixed bug which put initialized data in bss section
Fixed negation of byte quantities
Minor code cleanup

Download: gcc-4.4.0-tms9900-1.10-patch.tar.gz

Binutils 2.19.1 patch 1.7

Changes this version:

Restored ability to have label and code on same line
Minor code cleanup

Download: binutils-2.19.1-tms9900-1.7-patch.tar.gz

I was trying to find more compiler stuff to work on, but I'm out of ideas. There are no more bugs I know about, and no obvious features that are lacking. I guess it's time for another release.

By the way, I tried out position-independent code (-fpic), and it seems to work OK. I didn't attempt anything serious, but the resulting code looks like it should be good.

The floating point stuff I was working on earlier would probably work out best as a separate library, and it might be good to write it in C rather than in assembly. That means scrapping all that code and starting over, even though I'm about 80% done with an assembly version. The problem is that handling denormals is a pain, and I'm spending too much time doing early optimization in assembly rather than getting proper functionality first.

Sunday, November 30, 2014

There was another problem reported that under some circumstances, jump tables were not being initialized correctly. What was actually happening was a lot more annoying to track down.

The problem was that the ASM_OUTPUT_COMMON macro that I stole from some other architecture subverted the code used to track the current segment. The code used to emit text into the assembly file is in varasm.c. In that file is a function called switch_to_section, which is intended to be responsible for all section switches. Each time a variable or code block is emitted, switch_to_section checks to see if the intended section is already in use. If not, the directives needed to switch sections are emitted. If the intended section is already in use, nothing happens.

The ASM_OUTPUT_COMMON macro, which outputs a variable in the .bss section, gratuitously emits a "cseg" directive to change the current segment. Later, the switch_to_section code, not knowing a section change has been made, might not do anything. The result is that objects which should be located in the code or data segments are mistakenly being placed in the bss section.

Like so many GCC problems, once the painful task of tracking down the problem is done, fixes are pretty straightforward. Basically, the fix is to use switch_to_section to change sections, like I should have done in the beginning.
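The failure mode is easier to see in a toy model. The sketch below is not GCC code: `switch_section`, `emit_common_buggy`, `emit_common_fixed` and the `out` buffer are hypothetical stand-ins for switch_to_section, the ASM_OUTPUT_COMMON macro and the assembly file. The only thing taken from the notes above is the idea that a directive written behind the tracker's back leaves the tracker stale.

```c
#include <assert.h>
#include <string.h>

/* Toy model of section tracking. All names here are hypothetical. */
enum section { SEC_NONE, SEC_TEXT, SEC_DATA, SEC_BSS };

static const char *directive[] = { "", "pseg", "dseg", "cseg" };
static enum section current = SEC_NONE;  /* what the tracker believes */
static char out[512];                    /* stands in for the .s file */

static void emit(const char *line)
{
    assert(strlen(out) + strlen(line) + 2 <= sizeof out);
    strcat(out, line);
    strcat(out, "\n");
}

/* Emit a directive only when the tracked section actually changes. */
static void switch_section(enum section s)
{
    if (s != current) {
        emit(directive[s]);
        current = s;
    }
}

/* Buggy: writes the directive directly, so the tracker goes stale. */
static void emit_common_buggy(const char *name)
{
    emit("cseg");
    emit(name);
}

/* Fixed: goes through the tracker like everything else. */
static void emit_common_fixed(const char *name)
{
    switch_section(SEC_BSS);
    emit(name);
}

static void reset(void)
{
    out[0] = '\0';
    current = SEC_NONE;
}
```

With the buggy version, a data object emitted after the common variable gets no "dseg" line because the tracker still believes it is in the data segment. With the fixed version the second "dseg" appears as expected.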

Monday, November 10, 2014

OK, I've been trying to get the floating point library done before fixing the latest round of compiler bugs. Unfortunately, I'm stuck.

I got normalized multiplication working, but to be able to use denormal values in multiplication I will need to overhaul the design and basically start over. Ugh.

I was looking at the softfloat library at www.jhauser.us/arithmetic/SoftFloat.html and it looks like it should work for the TMS9900 if the proper configuration is used. If that's the case, and the resulting code is not too large, my little effort might just be pointless.

Alright, enough fooling around. Lucien over at AtariAge was good enough to send a bunch of bugs he found, so let's get cracking.

First up, negating a byte quantity is done by calling "neg".

This is a problem in a couple of ways. First, if this were invoked on a value stored in memory, one of the neighboring bytes would be corrupted. Even if the value were stored in a register, if the low byte of the negated word is not zero, the resulting byte value will not be correct.

To fix this, I've forced all byte negation to be done in a register. The low byte is cleared, ensuring that after neg is called, the value will be correct.

I thought I fixed that earlier, but apparently not. At least it's done now.
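As a rough illustration of why the low byte matters, here is a small C model of the register case. It assumes the byte value sits in the high byte of a 16-bit register, as with the other byte instructions; the function names are made up for this sketch and do not come from the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Word-sized NEG applied to a 16-bit register. */
static uint16_t neg_word(uint16_t w)
{
    return (uint16_t)(0u - w);
}

/* The byte value is assumed to live in the high byte of the register. */
static uint8_t high_byte(uint16_t w)
{
    return (uint8_t)(w >> 8);
}

/* Naive byte negate: NEG the whole word, junk in the low byte included. */
static uint8_t neg_byte_naive(uint16_t reg)
{
    return high_byte(neg_word(reg));
}

/* Fixed byte negate: clear the low byte first, then NEG. */
static uint8_t neg_byte_fixed(uint16_t reg)
{
    return high_byte(neg_word((uint16_t)(reg & 0xFF00u)));
}

/* Exhaustive check of the fixed version against plain byte negation. */
static void check_all_bytes(void)
{
    for (unsigned b = 0; b < 256; b++)
        for (unsigned junk = 0; junk < 256; junk++)
            assert(neg_byte_fixed((uint16_t)((b << 8) | junk)) == (uint8_t)(0u - b));
}
```

For a register holding 0x05AB, the naive version yields 0xFA (one less than the right answer) while the fixed version yields 0xFB, which is -5 as a byte.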

The other problem was incorrectly using R0 as a base register for addressing memory. This is another easy one. There is a BASE_REG_CLASS macro used to determine if a register may be used as an address base. I changed the value to exclude R0 and all is well. While I was in that area, I noticed some junk code left over from the period when I was using fake registers. All that junk has been removed, and the code is prettier. Yay.

Saturday, October 18, 2014

It turns out that I only needed to implement floating point multiplication, division and exponentiation and some comparison wrappers to be complete.

I've got a working float multiply function, but it's ugly and needs to be streamlined. I also need to check for boundary conditions (infinities and infinitesimals).

Mostly I need to reacquaint myself with this code. It's all written in twisty assembly since I wanted to maximize performance and minimize space. Unfortunately, it takes a while to understand how all the pieces fit together.

Doing all this in assembly may have been a mistake.


Saturday, October 11, 2014

Now that I've sent a release out, I need to find something else to work on for a while. I think I should probably finish the floating point stuff. Right now it's more than halfway done. If I can get that done I can move on to other stuff. Maybe libc? Oh well, I can figure that out later.

Friday, October 10, 2014

Binutils 2.19.1 patch 1.6

Changes this version:

Added support for numeric registers
Correct handling of comments
Added support for dwarf debugging information (-g option)

Download: binutils-2.19.1-tms9900-1.6-patch.tar.gz

GCC 4.4.0 patch 1.9

Changes this version:

Changed order of jumps for less-than-or-equal tests to improve performance
Fixed several integer type conversion bugs
Corrected handling of variable shift by zero bits
Fixed signed division
Added support for dwarf debugging information (-g option)

Download: gcc-4.4.0-tms9900-1.9-patch.tar.gz

Alright, I finally got all the dwarf stuff figured out. The debug_line section was actually built by the assembler, but the contents were malformed. The problem is that the assembler was trying to create a four-byte relocation record. This is a problem since for the tms9900 anything beyond a two byte address in a relocation record is meaningless.

I had implemented a workaround, but that was broken, and placed the relocation in the upper word. This made the actual section size and the size declared in the header disagree. As a result, no source line information ever got parsed. Tracking this down was long and painful.

I spent some time studying the dwarf2 specification, but the debug_line section is implemented as bytecode for a virtual machine, so I wanted to avoid manual translation if I could. I tried to find something which could dump the strings section, but no such luck.

In the end I made a one-line program and stepped my way through the section using the dwarf specification and did some trial-and-error work to fix the broken header values.
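For anyone repeating this kind of check, the fixed part of a DWARF 2 .debug_line header is small enough to decode by hand. The sketch below is not the program mentioned above. It is a hypothetical helper that reads the header fields in big-endian order and compares the declared length with the section size, assuming a section that holds a single compilation unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-size part of a DWARF 2 .debug_line header (32-bit DWARF). */
struct line_header {
    uint32_t unit_length;     /* bytes following this field */
    uint16_t version;
    uint32_t header_length;   /* bytes from here to the line program */
    uint8_t  min_inst_length;
    uint8_t  default_is_stmt;
    int8_t   line_base;
    uint8_t  line_range;
    uint8_t  opcode_base;
};

static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)be16(p) << 16) | be16(p + 2);
}

/* Returns 0 on success, -1 if the section is too short or the declared
   length disagrees with the section size (single unit assumed). */
static int parse_line_header(const uint8_t *sec, size_t size,
                             struct line_header *h)
{
    assert(sec != NULL && h != NULL);
    if (size < 15)
        return -1;
    h->unit_length     = be32(sec);
    h->version         = be16(sec + 4);
    h->header_length   = be32(sec + 6);
    h->min_inst_length = sec[10];
    h->default_is_stmt = sec[11];
    h->line_base       = (int8_t)sec[12];
    h->line_range      = sec[13];
    h->opcode_base     = sec[14];
    /* unit_length does not count its own four bytes. */
    if ((size_t)h->unit_length + 4 != size)
        return -1;
    return 0;
}
```

A length word whose value lands in the wrong half of the four-byte field shows up immediately as a size mismatch, which is the kind of disagreement described above.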

Long story short, we're good now and I can finally prepare another release.

Thursday, October 9, 2014

Well, I've made a mess of things by forcing line number information to be built by gcc. There are lots of interdependencies between the dwarf sections, and what I did left some loose ends resulting in some unknown symbols being found during the assembly process.

Apparently, the preferred method is for the assembler to compose the line number section based on the .file and .loc directives. Now I need to find out why those aren't being built.

Wednesday, October 8, 2014

It took a while, but I found that the code which handles the debug option for gcc is in the set_debug_level function. Specifically, the "write_symbols" variable is used to store the debug level. We can test that to see if the file line directives should be used or not.

One drawback of this is that by enabling source line information with .file and .loc directives, the dwarf .debug_line section is not created. I'm not sure why that is, but this is hard-coded behavior and not configurable. This just sounds dumb, so I'm overriding this. If you want debug sections, you now get all of them. I just like the idea of being able to use objdump to view the source code.

So I'm not sure what else to do right now. It might be time to do a release.

Tuesday, October 7, 2014

Well, that was a lot harder than I expected. The start_line_hook function in the assembler got a major overhaul just to deal with TI comments. But at least we can now properly assemble code again.

I had to implement a grammatical parser to recognize each part of an assembly line in order to see if a given chunk of text was a comment or not. This was needed because a comment starting with an asterisk may be incorrectly merged into an expression ("ai r1, 2 * comment"). The same kind of problem may be seen if a comment follows an instruction using a postincrement argument ("a r1, *r2+ comment") since the intervening space was getting removed and we saw invalid expression errors.

I also had to remove all trailing whitespace caused by the comments because the gas file parser assumes that by the time start_line_hook is called, all comments and extra whitespace have already been removed. The file parser apparently can only handle whitespace one character long.

Ugh. At least the assembler is working again. Now I can get back to the DWARF debugging stuff, using div_signed.c for testing.

OK, looking better. I can compile using the "-g" option to include debugging information, and then use "tms9900-objdump -S" to disassemble and view the source code. Nice.

The problem is that the standard way of viewing mixed source and assembly output is to use "tms9900-gcc -Wa,-adhln -g". If I try this right now, I just see the assembly with no interleaved C code. Hmm.

After a lot of poking around, I got mixed assembly output working, but there are a lot of ".file" and ".loc" directives all over the place to deal with, which a stricter TI assembler will choke on. I want to only emit those when the "-g" option is invoked. Otherwise only straight TI code should be emitted.

Thursday, October 2, 2014

This seems like a small update, but I solved a lot of the problems I was having with the asterisk character. The problem was that gas would eat the spaces around them, making it impossible to see if the asterisk was intended as an operator or a comment delimiter.

In looking through the parsing code, I found that if defined, tc_symbol_chars could be used to indicate which characters may start a symbol. If a symbol character is found, the surrounding whitespace is preserved. This list of characters is used to handle cases like mine for other machines. If not for the comments in the parser code, I would not have even considered using tc_symbol_chars for this.

By the time I found this I was fairly far down the road of performing a parallel parsing of the input files, but preserving all spaces. That would have been a long, error prone mess. This new approach is so much cleaner.

One downside of this approach is that I need to scan through the line to see if it's OK to collapse the surrounding spaces. Another problem is that I had to remove the asterisk from the list of comment characters, since that would override the "used as part of a symbol" option. I'll have to scan each line and handle all comments myself. Oh well. No matter what, this approach will be more correct and much easier to implement than what I was doing.

Thursday, September 18, 2014

OK, I was looking into the dwarf section problem when I noticed there is a "-dA" command line option for gcc to comment all the emitted dwarf values. This is great, since bouncing between compiler source, emitted assembly and the Dwarf2 specification was getting old. Sadly, I found an assembler problem. GAS is misinterpreting the "*" comment character as an arithmetic operator. Any other text is treated as the start of a new instruction.

eric@compaq:~/dev/tios/src/temp$ cat emw3.s
      data 0x1234 * stupid comment
eric@compaq:~/dev/tios/src/temp$ tms9900-as emw3.s
emw3.s: Assembler messages:
emw3.s:1: Error: unknown instruction 'comment'

Not super helpful. I can filter out everything after the first space in the comment, but what about the first word? What if that first word is a valid symbol or number? We could have multiple ways to parse the line. For example:

      data COUNT *2 bytes allocated

Is the two supposed to be an operand or a comment? TI allows unadorned comments at the end of the line so the comment could be either "2 bytes allocated" or "bytes allocated". TI got around this by forbidding spaces in expressions. So the two in the line above would be considered part of the comment.

This should be super fun to fix, since the assembler eats a lot of spaces before we get to see the contents of the input file.
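The TI rule (no spaces inside an expression) can be sketched as a simple field splitter. This is only an illustration and not the gas code: `split_operand` is a hypothetical helper, and it assumes a line with no label and no string literals. Under the strict rule the operand field ends at the first whitespace and everything after it is comment.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: copy the operand field of one source line into
   'operand' and return a pointer to the trailing comment, or to the
   terminating NUL if there is none. Assumes no label, no string
   literals, and no spaces inside the operand field. */
static const char *split_operand(const char *line, char *operand, size_t cap)
{
    const char *p = line;
    size_t n = 0;

    assert(cap > 0);
    while (isspace((unsigned char)*p))          /* leading blanks */
        p++;
    while (*p && !isspace((unsigned char)*p))   /* opcode */
        p++;
    while (isspace((unsigned char)*p))          /* gap before operands */
        p++;
    while (*p && !isspace((unsigned char)*p)) { /* operand field */
        if (n + 1 < cap)
            operand[n++] = *p;
        p++;
    }
    operand[n] = '\0';
    while (isspace((unsigned char)*p))          /* gap before comment */
        p++;
    return p;
}
```

For "data COUNT *2 bytes allocated" this gives the operand "COUNT" and the comment "*2 bytes allocated". Note that it would also cut "mov r1, r5" after the comma, so input written with a space after the comma needs extra handling beyond this sketch.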

Tuesday, September 16, 2014

OK, I tracked down the crash to the point where the compiler tries to emit unaligned data longer than one byte. The normal course of operations is to decompose the unaligned value into a series of smaller, hopefully aligned values. Ultimately, this breaks down to emitting a series of individual bytes. That's where the trouble begins.

I had previously forbidden the compiler from allowing subreg expressions which resulted in byte values. This was done in order to support the bytewise instructions. In this case, the code emitter (in varasm.c) has no other strategy to fall back on, and fails an assertion, crashing the compiler.

I was able to get around this by defining fake data types to handle these cases. This beat the compiler into submission enough to output an object file with Dwarf2 info. Unfortunately, it's terrible. The compiler is adding 4-byte relocations for a machine with 2-byte addresses. The assembler choked on that one. Also, the emitted byte order is little-endian which is not helpful either.

By default, GCC is using ".2byte" directives to embed a two byte value. This is the opposite endianness of TI's "data" directive. This results in Dwarf sections which are utter gibberish. I manually swapped the bytes for some test sections, and that seems to result in a properly-formatted section. Now I have to find a way to decompose all dwarf values into "data" and "byte" values. This may be tricky, since only some of the dwarf generating code can be overridden by the target definition in tms9900.h.

It looks like the TARGET_ASM_INTEGER macro can be defined to let us take over the job of emitting integer data. In fact there are a couple of other machines which need special attention in this area, so I shamelessly stole some ideas from them. So now the data is the right endianness and nothing is crashing, but tms9900-objdump is complaining about malformed debug sections. Ugh.
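The decomposition into "data" and "byte" values might look roughly like the following. This is a standalone sketch, not the actual hook: `emit_integer` and its parameters are invented for illustration, and it writes to a buffer instead of the assembly file. Aligned values go out as big-endian words, everything else as individual bytes, most significant first.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: format 'size' bytes (1 to 4) of 'value' as TI
   directives, most significant byte first. Returns the number of
   characters written to 'buf'. */
static size_t emit_integer(char *buf, size_t cap, uint32_t value,
                           int size, int aligned)
{
    size_t n = 0;

    assert(size >= 1 && size <= 4);
    for (int left = size; left > 0; ) {
        int written;
        if (aligned && left >= 2) {
            /* Two bytes at a time, high word first. */
            unsigned word = (unsigned)(value >> (8 * (left - 2))) & 0xFFFFu;
            written = snprintf(buf + n, cap - n, "\tdata >%04X\n", word);
            left -= 2;
        } else {
            /* Unaligned or odd-sized: fall back to single bytes. */
            unsigned byte = (unsigned)(value >> (8 * (left - 1))) & 0xFFu;
            written = snprintf(buf + n, cap - n, "\tbyte >%02X\n", byte);
            left -= 1;
        }
        assert(written > 0 && (size_t)written < cap - n);
        n += (size_t)written;
    }
    return n;
}
```

An aligned four-byte 0x12345678 comes out as "data >1234" followed by "data >5678", and an unaligned two-byte 0x1234 as "byte >12" followed by "byte >34".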

Sunday, September 7, 2014

Surprisingly, the only non-conforming thing I could find with the assembler is support for numbered registers. For example "mov 1,2". Now that's been fixed. There are of course a lot of E/A specific keywords having to do with code location and linking options the assembler does not support, but those don't make sense in this context anyway.

One thing I've wanted to do is get the -g option working correctly to allow mixed source and assembly output files. This would allow faster debugging and make it quicker to find code that turns into ugly assembly.

I added config options which should have allowed debug output, but any build done with -g crashes the compiler. Hmm.

Saturday, September 6, 2014

It wasn't any fun, but I finally got division working. Sample output at the end of this. The main problem is that proper signed division uses a lot of code. My intent was to provide the optimizer with enough information to pare this down to the minimum number of required operations.

There were a few problems though. I was trying to implement signed division using register constraints and lots of scratch registers for the temporaries, but that whole approach does not work.

At the time when the compiler is deciding which patterns to use for the initial RTL representation, it does not look at operand constraints. This means that having special behavior for constants or restricting division to valid data locations will not work. The compiler just takes whatever RTL is defined for an operation and adds it to the instruction stream. If that happens to be incorrect, the user will get compiler crashes or badly broken code.

One test I ran was just a short loop with dividends between -10 and 10. The broken division code turned this into an infinite loop with an oscillating dividend. Not even close to working code.

Eventually I got this to work by using instruction expanders. This lets the compiler figure out which branches are needed for a given input and uses registers optimally. This needed five more temporary registers and expanding the single division instruction to eleven different ones, including a wrapper around unsigned division which itself is not much fun. A lot of these steps get optimized away if not needed.

Fortunately, no one needs to care about this but me.

Have some sample code:

eric@compaq:~/dev/tios/src/temp$ cat div_signed.c
      int div(int a, int b) {return a/b;}
eric@compaq:~/dev/tios/src/temp$ cat div_signed.s
      pseg
      even
      def div
div
      mov r1, r5
      xor r2, r5
      abs r2
      clr r3
      mov r1, r4
      abs r4
      div r2, r3
      mov r3, r1
      inv r5
      jlt $+4
      neg r1
      b *r11
      even

Next up, improved adherence to Editor/Assembler syntax and conventions.

Wednesday, August 27, 2014

I tried to compare the function and inline methods to see which is the most efficient. Unfortunately, the variables which must be considered are not easy to predict (number of and ratio between division and modulus calls, number of registers to save around function calls).

With much hand waving and shady math, I determined that if there are five or fewer signed division calls, the inline method wins out.

Ultimately, it doesn't matter. I am going to implement this in an inline form, and if it turns out that calling out to a function to perform division is more efficient, the user can write their code to do that. The other reason is that there is no good way for the user to somehow inline division if the function method is worse.

Tuesday, August 26, 2014

Apparently, division has problems too. Here are the failed scenarios:

X/10    -> Results in -1 for all negative X
X/(-10) -> Results in -1 for all negative X, 0 for all positive X
N/X     -> Internal compiler error for all N

The compiler error looks like it will be a pain to track down, so I'm looking at the incorrect math first.

Here's a simple function I'm using for testing, assume a=-100, b=10:

# int divide(int a, int b) {return a/b;}
divide:               # R1=0xFF9C R2=0xA
        mov  r1, r4   # R4=0xFF9C
        seto r3       # R3=0xFFFF
        jlt  $+>4
        clr  r3      
        div  r2, r3   # [R3,R4]/R2 = 0xFFFFFF9C/0xA
        mov  r3, r1   # R3 holds quotient
        b    *r11

If DIV does signed division, we should see a result of 0xFFF5 (-10)
If DIV only does unsigned division, we should have a result of 0x998F. For some reason we see 0xFFFF, weird.

Nope, not weird, I just missed a really important note in the data sheet: "... if the divisor is less than the data contained in Rd ... Rd and Rd+1 are unchanged." I think this is the second time I've really messed up division. Anyway, DIV apparently only does unsigned division, so I need to think about this a bit.
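A small C model makes the 0xFFFF result less mysterious. This is a sketch of DIV as described by the note quoted above, with one assumption on my part: the no-operation case is taken to be "divisor less than or equal to the high word", which is the condition under which the quotient cannot fit in 16 bits. `tms_div` is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Model of an unsigned 32-by-16 divide on the register pair [hi,lo].
   On success the quotient is left in *hi and the remainder in *lo and
   1 is returned. If the quotient cannot fit in 16 bits, both registers
   are left untouched and 0 is returned (assumed overflow rule). */
static int tms_div(uint16_t divisor, uint16_t *hi, uint16_t *lo)
{
    uint32_t dividend;

    assert(hi != 0 && lo != 0);
    if (divisor <= *hi)
        return 0;                        /* also covers divisor == 0 */
    dividend = ((uint32_t)*hi << 16) | *lo;
    *hi = (uint16_t)(dividend / divisor);   /* quotient */
    *lo = (uint16_t)(dividend % divisor);   /* remainder */
    return 1;
}
```

With the sign-extended pair from the test function, the high word is 0xFFFF, the divisor 0xA is smaller, and the registers come back unchanged. The "quotient" read from the high word is then just the 0xFFFF that seto put there.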

div_unsigned:         # return A/B, R1=A, R2=B
        clr  r0       # Clear high word of [R0,R1]
        div  r2, r0   # R0 = [R0,R1]/R2
        mov  r0, r1   # Move result to return position
        b    *r11     # Return result

Probably the best solution here is to find the sign of the quotient, convert both operands to their absolute value, divide, then set the sign of the result. No other approach I've looked at would be smaller or faster than this.

div_signed:           # return A/B, R1=A, R2=B
        mov r1, r3
        xor r2, r3    # Calculate sign of result
        abs r1        # Make arguments positive
        abs r2
        clr r0        # Clear high word
        div r2, r0    # R0 = [R0,R1]/R2
        inv r3
        jlt posval
        neg r0        # Negate result if necessary
posval: mov r0, r1    # Move result to return position
        b   *r11      # Return result
  22 bytes
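The same sign-magnitude approach, written as a C sketch for checking the logic on a host machine. The function names are mine and the 16-bit types stand in for the TMS9900 registers. The unsigned negate in `magnitude` is there so that -32768 does not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute value as an unsigned 16-bit quantity. */
static uint16_t magnitude(int16_t v)
{
    return v < 0 ? (uint16_t)(0u - (uint16_t)v) : (uint16_t)v;
}

/* Signed divide built from one unsigned divide, like div_signed above. */
static int16_t div_signed(int16_t a, int16_t b)
{
    uint16_t sign = (uint16_t)((uint16_t)a ^ (uint16_t)b); /* top bit: negative result */
    uint16_t q;

    assert(b != 0);
    q = (uint16_t)(magnitude(a) / magnitude(b));  /* the unsigned DIV */
    if (sign & 0x8000u)
        q = (uint16_t)(0u - q);                   /* negate if signs differ */
    return (int16_t)q;
}
```

Comparing this against the host compiler's own a/b over a range of inputs is a quick way to confirm the truncate-toward-zero behavior before committing it to assembly.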
       
OK, what about modulo? As of the C 1999 and C++ 2011 standards, the sign of the modulo must match that of the dividend.

Examples:
 103% 10 =  3
 103%-10 =  3
-103% 10 = -3
-103%-10 = -3

For the unsigned case, we can easily modify the division function by returning the modulo stored in R1. For the signed case, it gets trickier. We need to save the sign of the dividend somewhere and use it after the calculation is complete.

mod_unsigned:         # return A%B, R1=A, R2=B
        clr  r0       # Clear high word of [R0,R1]
        div  r2, r0   # R1 = [R0,R1]%R2, return value already in position
        b    *r11     # Return result

mod_signed:           # return A%B, R1=A, R2=B
        mov r1, r3    # Save unmodified dividend
        abs r1        # Make arguments positive
        abs r2
        clr r0        # Clear high word
        div r2, r0    # R1 = [R0,R1]%R2
        inv r3
        jlt posval
        neg r1        # Negate result if needed, already in position
posval: b   *r11      # Return result
  18 bytes

Any way to do a signed divmod, combining these two? Without lots of extra work?

divmod_signed:        # return A/B and A%B, R1=A, R2=B
        mov r1, r3
        xor r2, r3    # Calculate sign of quotient
        mov r1, r4    # Save unmodified dividend
        abs r1        # Make arguments positive
        abs r2
        clr r0        # Clear high word
        div r2, r0    # R0 = [R0,R1]/R2
        inv r3
        jlt posdiv
        neg r0        # Negate quotient if needed
posdiv: inv r4
        jlt posmod
        neg r1        # Negate modulus if needed
posmod: mov r0, r1    # Move result to return position
        b   *r11      # Return result
  30 bytes, 5 registers
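A host-side C sketch of the combined operation, useful for checking the two sign rules together: the quotient takes the XOR of the operand signs and the remainder takes the sign of the dividend. The struct and function names are hypothetical, and both results are returned here instead of being left in registers.

```c
#include <assert.h>
#include <stdint.h>

struct divmod {
    int16_t quot;
    int16_t rem;
};

/* Absolute value as an unsigned 16-bit quantity. */
static uint16_t magnitude(int16_t v)
{
    return v < 0 ? (uint16_t)(0u - (uint16_t)v) : (uint16_t)v;
}

/* One unsigned divide yields both results; only the signs differ. */
static struct divmod divmod_signed(int16_t a, int16_t b)
{
    struct divmod r;
    uint16_t ua = magnitude(a);
    uint16_t ub = magnitude(b);
    uint16_t q, m;

    assert(b != 0);
    q = (uint16_t)(ua / ub);
    m = (uint16_t)(ua % ub);
    if ((a < 0) != (b < 0))
        q = (uint16_t)(0u - q);   /* quotient sign: signs of a and b differ */
    if (a < 0)
        m = (uint16_t)(0u - m);   /* remainder sign: follows the dividend */
    r.quot = (int16_t)q;
    r.rem = (int16_t)m;
    return r;
}
```

For every input pair the identity a == quot * b + rem should hold, which makes a handy sanity check alongside the four modulo examples listed earlier.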

This is really bulky and ugly, but ultimately better than two separate calls. About 25% smaller and maybe 75% faster. It might make sense to move this into a library call because if there is a lot of math, the code could get really big. The overhead for the function call seems fairly small too. I need to do the math and get real numbers though.

Monday, August 25, 2014

I think I found a problem in the shift instructions.

I made a simple function to emit a variable shift instruction (EX: sla r12, r0). I was looking at the edge cases, and realized that if the variable shift value is zero, the instruction will actually shift by 16 bits. The C standard requires that in this case, no shift is to be performed. This is a problem.

I took a look at the shift instructions for Arm and x86, and they shift from 0 to N-1 bits for an N-bit value. The TI instructions instead shift from 1 to N bits. Hmm..

So what I need to do is insert a check every time that a shift of this type is called for. Something like this:

  # Left shift. Shift count in R0, value in R1
  andi r0, >000F  # Mask shift value, check for zero count
  jeq $+4         # If zero, jump over shift
  sla r1, r0      # Shift by non-zero bits

I'm not really happy about this, it adds 6 bytes for every use and lots of clocks. Unfortunately, I don't have a choice if I want to emit correct code.

After testing, this is now working properly.
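The difference between the raw instruction and the guarded sequence can be modeled in a few lines of C. This is only a sketch of the behavior described above (a count of zero in R0 means a 16-bit shift), and the function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a variable SLA: the count is the low four bits of R0,
   and a count of zero means shift by 16. */
static uint16_t ti_sla(uint16_t value, uint16_t r0)
{
    unsigned count = r0 & 0x000Fu;

    if (count == 0)
        count = 16;
    return (uint16_t)((uint32_t)value << count);
}

/* Guarded form, mirroring the andi / jeq / sla sequence. */
static uint16_t guarded_sla(uint16_t value, uint16_t r0)
{
    r0 &= 0x000Fu;            /* andi r0, >000F */
    if (r0 == 0)              /* jeq $+4        */
        return value;
    return ti_sla(value, r0); /* sla r1, r0     */
}

/* The guarded form should match a plain shift for counts 0 through 15. */
static void check_guarded(uint16_t value)
{
    assert(guarded_sla(value, 0) == value);
    for (unsigned n = 1; n < 16; n++)
        assert(guarded_sla(value, (uint16_t)n) ==
               (uint16_t)((uint32_t)value << n));
}
```

Without the guard, a zero count turns 0x1234 into 0. With it, the value passes through untouched.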

Sunday, August 24, 2014

Hey, check it out. I got correct code finally:

Input C code:

unsigned char Round;
#define VDPWD *((volatile char*)0x8C00)
void modulo()
{
  VDPWD = '0' + Round % 10;
}

Output assembly:

modulo:
        movb @Round, r1
        srl  r1, 8
        mov  r1, r2
        clr  r1
        li   r3, >A
        div  r3, r1
        mov  r2, r1
        swpb r1  # instruction is right where it's supposed to be
        ai   r1, >3000
        movb r1, @>8C00
        b    *r11

OK, I'll admit it. At first that doesn't seem too exciting. The thing is that I've been looking for a way to cleanly handle this case for a long, long time. I walked away in frustration because I got stuck on this problem. I just need to test this a bit more and we can call this done.
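To see why that one swpb matters, the output assembly can be traced in C. This is a rough model under a few assumptions: byte operations use the high byte of a register, the low byte loaded by movb is taken to be zero (srl discards it anyway), and the `do_swpb` flag is something I added so the broken variant can be compared.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit register. */
static uint16_t swpb(uint16_t r)
{
    return (uint16_t)((r << 8) | (r >> 8));
}

/* Trace of the modulo() listing. Returns the byte written to VDPWD. */
static uint8_t modulo_trace(uint8_t round, int do_swpb)
{
    uint16_t r1, r2, r3;
    uint32_t dividend;

    r1 = (uint16_t)(round << 8);      /* movb @Round, r1 (high byte)  */
    r1 = (uint16_t)(r1 >> 8);         /* srl  r1, 8                   */
    r2 = r1;                          /* mov  r1, r2                  */
    r1 = 0;                           /* clr  r1                      */
    r3 = 10;                          /* li   r3, >A                  */
    assert(r3 > r1);                  /* DIV proceeds only in this case */
    dividend = ((uint32_t)r1 << 16) | r2;
    r1 = (uint16_t)(dividend / r3);   /* div  r3, r1 (quotient)       */
    r2 = (uint16_t)(dividend % r3);   /*             (remainder)      */
    r1 = r2;                          /* mov  r2, r1                  */
    if (do_swpb)
        r1 = swpb(r1);                /* swpb r1: word to byte position */
    r1 = (uint16_t)(r1 + 0x3000u);    /* ai   r1, >3000               */
    return (uint8_t)(r1 >> 8);        /* movb r1, @>8C00 (high byte)  */
}
```

With the swap, a Round of 37 writes '7'. Without it, the remainder stays in the low byte and the value written is always '0'.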

I was looking through older development notes to see if there was anything else left semi-implemented. I found a note to see if there is any way to remove two jumps in sequence like jlt/jmp.

Here are all the conditional jumps and the logic they use:

Conditional flags:
  L - Logical greater than
  A - Arithmetic greater than
  Q - equal

Arithmetic:
<   ~A & ~Q  jlt
>    A & ~Q  jgt
==   Q       jeq
>=   A | Q   jgt,jeq  \__ Ick, go away.
<=  ~A | Q   jlt,jeq  /
!=  ~Q       jne

Logical:
<   ~L & ~Q  jl
>    L & ~Q  jh
==   Q       jeq
>=   L | Q   jhe
<=  ~L | Q   jle
!=  ~Q       jne

Unfortunately, there is no way to completely implement >= and <= in one instruction for all valid inputs, but if the range of the test values were known, we could occasionally replace the jgt,jeq sequence with a single jhe instruction.

EX: For byte compares, if A=[0..127] and B=[0..127], then these sequences are equivalent:

jlt ADDR  --> jle ADDR
jeq ADDR

jgt ADDR  --> jhe ADDR
jeq ADDR

Unfortunately, there is no way for the compiler to recognize this situation. If the user knows that their range matches this, they can take advantage of the slightly faster code by doing an unsigned compare. Something like this works:

int fast_lt_eq(int a, int b)
{
  return ((unsigned int)a <= (unsigned int)b);
}
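Whether the unsigned trick is safe for a given range is easy to check by brute force on a host machine. The helper below is a hypothetical test aid, not part of the compiler: it counts the pairs in a range where the signed and unsigned comparisons disagree.

```c
#include <assert.h>

static int le_signed(int a, int b)
{
    return a <= b;
}

static int le_unsigned(int a, int b)
{
    return (unsigned int)a <= (unsigned int)b;
}

/* Count the (a, b) pairs in [lo, hi] where the two compares disagree. */
static int count_mismatches(int lo, int hi)
{
    int mismatches = 0;

    assert(lo <= hi);
    for (int a = lo; a <= hi; a++)
        for (int b = lo; b <= hi; b++)
            if (le_signed(a, b) != le_unsigned(a, b))
                mismatches++;
    return mismatches;
}
```

For the range 0 to 127 the count is zero, so the substitution is safe there. As soon as the range includes a negative value the two compares start to disagree, which is why the compiler cannot make this substitution on its own.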

What else...

Way back in June, TheMole on AtariAge reported that they were unable to compile a function whose return value was assigned in an asm block and implicitly typecast as the return value. No problems now.

So at this point, I think I'm up to date on error reports. These are the last items before I move on to new stuff:

1) Test 32-bit shift by variable, I'm not sure I checked that
2) Test assembler for Editor/Assembler adherence for caps and digits for registers, comments, register defines, etc.
3) Check signed vs. unsigned division and multiplication

Friday, August 22, 2014

No more procrastinating, this won't get any easier.

During the UNSPEC_SUBREG experiment I found the places in reload where these problem expressions are identified. If a word-byte conversion is detected, a swpb instruction is performed on the offending register. The embedded subreg will be converted to a word-sized register by the existing code, and we should be OK.

Here is an example instruction before modification:

(insn 9 8 10 2 tursi5.c:6 (set (reg:QI 21 [ D.1197 ])
        (plus:QI (subreg:QI (reg:HI 25) 1)
            (const_int 48 [0x30]))) 59 {addqi3} (expr_list:REG_DEAD (reg:HI 25)
        (nil)))

And after modification:

(insn 22 8 9 2 tursi5.c:6 (set (reg:QI 21 [ D.1197 ])
        (subreg:QI (reg:HI 25) 1)) {movqi3}
(insn 9 8 10 2 tursi5.c:6 (set (reg:QI 21 [ D.1197 ])
        (plus:HI (reg:HI 25)
            (const_int 48 [0x30]))) 59 {addqi3} (expr_list:REG_DEAD (reg:HI 25)
        (nil)))

Unfortunately, I need to tweak this a bit more:

        div  r3, r1
        mov  r2, r1
        swpb r1  # <-- This instruction is finally here
        swpb r1  # \_ These are extras added due to multiple passes, ick
        swpb r1  # /
        ai   r1, >3000
        movb r1, @>8C00
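
To convince myself what the swap is doing, here is a small C model of that sequence. It assumes the usual TMS9900 behavior that byte instructions on a register use the high byte, so a value sitting in the low byte of a word has to be swapped up before a byte operation can see it. The function names (swpb, ai, movb_src, digit_to_vdp) are just for this sketch and only model the registers, not the real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Model of swpb: exchange the high and low bytes of a 16-bit register. */
uint16_t swpb(uint16_t reg)
{
  return (uint16_t)((reg << 8) | (reg >> 8));
}

/* Model of ai: add a 16-bit immediate, wrapping at 16 bits. */
uint16_t ai(uint16_t reg, uint16_t imm)
{
  return (uint16_t)(reg + imm);
}

/* Model of movb with a register source: only the high byte is moved. */
uint8_t movb_src(uint16_t reg)
{
  return (uint8_t)(reg >> 8);
}

/* Follow the listing above: the remainder arrives in the low byte of r1,
   gets swapped swap_count times, then has '0' added in the byte position. */
uint8_t digit_to_vdp(uint16_t remainder, int swap_count)
{
  uint16_t r1 = remainder;       /* mov  r2, r1 */
  int i;
  for (i = 0; i < swap_count; i++)
    r1 = swpb(r1);               /* swpb r1 */
  r1 = ai(r1, 0x3000);           /* ai   r1, >3000 */
  return movb_src(r1);           /* movb r1, @>8C00 */
}

void check_digit_to_vdp(void)
{
  assert(digit_to_vdp(7, 1) == '7');  /* one swap: correct digit */
  assert(digit_to_vdp(7, 3) == '7');  /* three swaps: same result, wasted cycles */
  assert(digit_to_vdp(7, 0) == '0');  /* no swap: the digit never reaches the high byte */
}
```

In this model any odd number of swaps gives the right character, which is why the extra pair is wasteful rather than wrong.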

I also need to make sure that I add code that properly handles all cases with embedded subregs.

So we have a few conditions to handle:

  Subreg as operation destination:
  (operation) (subreg R1) (reg) ->
    (operation) (R1) (reg)
    swpb R1 # Move value into byte position

  Subreg as operation source:
  (operation) (reg) (subreg) ->
    swpb R1 # Prepare R1 for next operation
    (operation) (R1) (reg)
    swpb R1 # If needed, restore R1 for later use

The first one will probably never happen, due to how the instructions are processed, but the second one is definitely causing problems. It seems to only appear when a word-to-byte conversion is needed, and the value is stored in a register and the register will not be needed after that instruction. Even though that sounds like a fairly rare circumstance, it seems to happen more often than I would like.
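
The source-operand case can be modeled in C as a sanity check. This is only a sketch under assumptions: ab here stands in for a byte add that works on the high byte of each register and leaves the destination's low byte alone, and add_low_byte_of_word is an invented name for the swap, operate, swap-back pattern listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Model of swpb: exchange the high and low bytes of a 16-bit register. */
uint16_t swpb(uint16_t reg)
{
  return (uint16_t)((reg << 8) | (reg >> 8));
}

/* Model of a byte add: high byte of src is added to high byte of dst,
   and the low byte of dst is left untouched. */
uint16_t ab(uint16_t src, uint16_t dst)
{
  uint8_t sum = (uint8_t)((src >> 8) + (dst >> 8));
  return (uint16_t)((sum << 8) | (dst & 0x00FF));
}

/* Subreg as operation source: the wanted value is the low byte of *word_reg. */
void add_low_byte_of_word(uint16_t *word_reg, uint16_t *byte_reg)
{
  *word_reg = swpb(*word_reg);           /* prepare the source for a byte operation */
  *byte_reg = ab(*word_reg, *byte_reg);  /* the operation itself */
  *word_reg = swpb(*word_reg);           /* restore the source for later use */
}

void check_source_subreg(void)
{
  uint16_t word_reg = 0x1205;  /* low byte 0x05 is the value we want */
  uint16_t byte_reg = 0x3000;  /* '0' in the byte position */
  add_low_byte_of_word(&word_reg, &byte_reg);
  assert(byte_reg == 0x3500);  /* '5' in the byte position */
  assert(word_reg == 0x1205);  /* the source survives the round trip */
}
```

If the source register is dead after the instruction, the last swap can be dropped, which is exactly the case that keeps turning up.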

Wednesday, August 20, 2014

There are a few problems here. The obvious one is that the subreg expression is being thrown away. The not-so-obvious one is that once that is fixed, we end up with an invalid instruction:

insn does not satisfy its constraints:
(insn 9 8 11 2 tursi5.c:6 (set (reg:QI 1 r1 [orig:21 D.1197 ] [21])
        (plus:QI (subreg:QI (reg:HI 1 r1) 1)
            (const_int 48 [0x30]))) 59 {addqi3} (nil))

What we need to do is split this into two expressions:
(set (reg:QI 1 r1)
        (subreg:QI (reg:HI 1 r1) 1))
(set (reg:QI 1 r1)
        (plus:QI (reg:QI 1 r1)
            (const_int 48 [0x30]))) 59 {addqi3} (nil))

No problem, right? Well, maybe.

We also need to factor in the liveness of the register. If in this case we had R2 as the output, we might have to find a temp register so that R1 is not modified. Alternatively, we might have to swap R1, do some work and swap it back. Terrible.

On top of that, we might have to implement instructions with every possible variation of arguments. This will quadruple the machine description and make a testing nightmare.

One way out of this particular problem is to make sure that there is a defined mode for every operation. Hopefully, that will minimize the number of these problems. It's just a hunch, but it's the best idea I've got.

Monday, August 18, 2014

Even though I've been slacking off on the TI front, I've still been following the goings-on at AtariAge. A common issue people are dealing with is that there are a lot of error-prone steps to build GCC, causing some people to give up in frustration. This makes me sad.

In order to make things easier, I've made a script which bundles the patch files and does the download, patch and build sequence. This should make it much easier for people to get involved and to use any additional patches that come out.

Sunday, August 10, 2014

Apparently it's been a really long time since I've done TI stuff, but I haven't forgotten about it.

I was looking into the internal function "subst_reloads". This code modifies instructions to do register replacement and expression simplifications determined elsewhere. I was thinking that I could use it to figure out how to restructure expressions involving word-to-byte conversion.

No dice. That function only replaces individual operands, typically replacing memory locations with hard registers. To do this properly, I'll need to muck around with the rtx records directly and manually restructure instructions. Needless to say, I'm not really excited to do this. Unfortunately, I need to find a solution here and all other attempts have failed.

Tuesday, January 14, 2014

I just realized that the i386 processor has aliased registers similar to what I need for the TMS9900. On that machine, [AH,AL] is aliased onto AX. There might be some insights worth stealing here.

Nope, no generated instructions take advantage of the high byte registers (like AH). This register is acceptable for inline assembly, but will not be used in any code produced by GCC. All actions seem to be done with full-width registers. Poop.

Wednesday, January 8, 2014

Well, I've neglected TI stuff for long enough, it's time to get back into it. What took so long was knowing that the first thing I would have to tackle would be the dreaded word-byte conversion problem. This is a ridiculously stubborn problem, and one I've been fighting with since I started this. Ugh.

I'm trying something new here. The idea is to use an UNSPEC instruction to encode a subreg expression. This would be used to tell the compiler "The value is magically converted to byte representation". I assume the UNSPECs will not be disappeared by reload, and the existing code should split expressions like the one I've been fighting with.

For reference, here's the unmodified output from reload:

Reloads for insn # 9
Reload 0: reload_in (QI) = (reg:QI 1 r1)
        reload_out (QI) = (reg:QI 1 r1 [orig:21 D.1197 ] [21])
        PC_REGS, RELOAD_OTHER (opnum = 0), can't combine
        reload_in_reg: (subreg:QI (reg:HI 1 r1 [25]) 1)  <-- This is what we want
        reload_out_reg: (reg:QI 1 r1 [orig:21 D.1197 ] [21])  <-- Subreg expression lost, no good
        reload_reg_rtx: (reg:QI 1 r1 [orig:21 D.1197 ] [21])

Apparently, a naive replacement of the subreg expression with (unspec:QI [(reg:HI 1 r1 [25])] 1) is a dismal failure. There is a sanity check elsewhere in reload that fails, since I'm trying to replace a register location with an undefined action. I may have to split up the original expression somehow. What would that look like?

Oh yeah, I did this before:
(set (reg:QI 1 r1)
        (subreg:QI (reg:HI 1 r1) 1))
(set (reg:QI 1 r1)
        (plus:QI (reg:QI 1 r1)
            (const_int 48 [0x30]))) 59 {addqi3} (nil))

Finding a way to do this in general will be tricky. Even assuming simple expressions, we could have things like:
(set ( ))
(set ( ))
(set ( ))
(set ( ))
(set ( ))
(set ( ))
(set ( ))
(set ( ))

Each of these needs separate splitting. As bad as this is, it could get so much more complicated. So apparently, splitting is no good. I guess I can try to put more effort into the unspec route.

After beating on GCC for a while, I finally got a compile to complete with unspec instructions, but it's not pretty:
        ...
        div  r3, r1            # Modulo is in R2
        mov  r2, r1            # Move result to R1
        movb r1, r2            # Wait a sec, this should be like "swpb r2"
        ai   r2, >3000         # Add '0' to result in byte mode
        fake_set_subreg r1 r2  # This fake insn is like "swpb r2; movb r2, r1", makes no sense here
        movb r1, @>8C00        # Copy to VDPWD
        b    *r11


I'm backing all this out, it won't work.