Thursday, April 29, 2010

Well, this is anticlimactic. It turns out that the register allocator prohibits the use of volatile registers for user-defined variables if the optimization level is one or less (see ira-conflicts.c, ira_build_conflicts). Since I've been doing my tests using "-O1" optimization... mystery solved. And it only took a few days to figure that out.

Monday, April 26, 2010

The missing epilogue problem was caused by the existence of the "return" pattern. With that pattern defined, the epilogue was not always used. So I removed the pattern, but that left a new problem: how to return to the caller at the end of the epilogue. I looked at other architectures, but found no pattern that would work for the TI.

Manually emitting instructions into the output stream did not work: the emitted instructions did not appear in the right place, which would have resulted in non-functional code. I also could not find a good way to build an RTX expression that uses the existing branch instructions for the "b *lr" instruction.

What I ended up doing was creating a fake hard PC register and using a special form of "movhi" to emit the return instruction. This seems to work, but I'm concerned about what will happen if GCC tries to use the fake PC register for actual work.

Monday, April 19, 2010

I've got the stack working properly, but my "hello world" program is being weird. It's allocating R9 unexpectedly and does not include a function epilogue. I've started looking at the debug output, but I'm out of time for now.

So for later edification, start looking at emw.c.172r.ira for register allocation. The epilogue is in emw.c.178r.pro_and_epilogue.

Thursday, April 15, 2010

I'm looking at arguments passed on the stack, so here are some random notes.

My test function:
void a()
{
    zprintf(91,92,93,94,95,96,97,98);
}

Resulting assembly:
ai r10, -6
mov r11, @4(r10)

li r1, 97          --.
mov r1, *r10         | Push arguments to the stack
li r1, 98            |
mov r1, @2(r10)    --'

li r1, 91
li r2, 92
li r3, 93
li r4, 94
li r5, 95
li r6, 96
bl @zprintf
ai r10, 4
mov *r10+, r11
rt

So the stack looks like this in zprintf:
[ volatiles saved by A
[ A's frame
[ zprintf stack arguments
[ volatiles saved by zprintf
[ zprintf's frame
stack pointer

On the callee side, stack arguments are indexed as if from address zero.
I need to fix the sizes so that ELIMINABLE_REGS works for the arg-to-stack calculations.
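
Going by the stack picture above, the offset the callee needs can be sketched as follows. This is only an illustration: the function name, its parameters, and the 2-byte argument slot are assumptions, not code from the port, and where the arithmetic would really live (presumably behind the elimination-offset macros) is a guess.

```c
#include <stdio.h>

/* Hypothetical helper, not part of GCC: byte offset from the callee's stack
   pointer (after its prologue has run) to stack argument number arg_index.
   Assumes the layout from the notes, lowest address first:
   [callee frame][registers saved by callee][stack arguments]. */
int callee_stack_arg_offset(int frame_bytes, int saved_reg_bytes, int arg_index)
{
    const int slot_bytes = 2;   /* assumption: one 16-bit word per argument */
    return frame_bytes + saved_reg_bytes + arg_index * slot_bytes;
}

/* Prints the offsets of the two stack arguments (97 and 98 in the test
   function) for a made-up callee with a 4-byte frame and one saved register. */
void print_example_offsets(void)
{
    int i;
    for (i = 0; i < 2; i++)
        printf("stack arg %d at offset %d from sp\n",
               i, callee_stack_arg_offset(4, 2, i));
}
```

With an empty frame and nothing saved, the first stack argument comes out at offset zero, so any callee that does save registers or reserve a frame has to add those sizes in.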

Wednesday, April 14, 2010

I've noticed that the function prologue and epilogue are needed to set up the stack and to save off the non-volatile registers. I've come up with these forms for the prologue, depending on the number of registers to save off:

Form 1:
ai sp, -regsize              cycles:14+0  bytes:4
mov reg, *sp+                       14+8        2
...
ai sp, -regsize-framesize           14+0        4

In general, for N saved registers: bytes = 8+2N (10, 12, 14, 16, 18, 20 for N = 1..6)
cycles = 28+22N

Form 2:
ai sp, -regsize-framesize    cycles:14+0  bytes:4
mov reg, @X(sp)                     14+8        4
...

In general, for N saved registers: bytes = 4+4N (8, 12, 16, 20, 24, 28 for N = 1..6)
cycles = 14+22N

So use form 1 only when we have three or more registers to save. That is the point where it becomes smaller than form 2, at the cost of 14 extra cycles.
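
The byte and cycle formulas can be checked with a small sketch. The struct and function names are invented for illustration; only the formulas come from the notes, and the tie-break toward form 2 is an assumption based on it always being 14 cycles cheaper.

```c
#include <stdio.h>

/* Cost of a prologue when saving n registers, per the formulas in the notes. */
struct prologue_cost {
    int bytes;
    int cycles;
};

/* Form 1: two "ai" instructions plus one auto-increment "mov" per register. */
struct prologue_cost form1_cost(int n)
{
    struct prologue_cost c;
    c.bytes = 8 + 2 * n;
    c.cycles = 28 + 22 * n;
    return c;
}

/* Form 2: one "ai" instruction plus one indexed "mov" per register. */
struct prologue_cost form2_cost(int n)
{
    struct prologue_cost c;
    c.bytes = 4 + 4 * n;
    c.cycles = 14 + 22 * n;
    return c;
}

/* Returns 1 or 2. Picks the smaller form; on a size tie it picks form 2,
   since form 2 is always 14 cycles faster. */
int pick_prologue_form(int n)
{
    return form1_cost(n).bytes < form2_cost(n).bytes ? 1 : 2;
}

/* Prints both costs and the chosen form for 1 to 6 saved registers. */
void print_cost_table(void)
{
    int n;
    for (n = 1; n <= 6; n++) {
        struct prologue_cost a = form1_cost(n);
        struct prologue_cost b = form2_cost(n);
        printf("N=%d form1: %2d bytes %3d cycles  form2: %2d bytes %3d cycles  use form %d\n",
               n, a.bytes, a.cycles, b.bytes, b.cycles, pick_prologue_form(n));
    }
}
```

At N=2 both forms are 12 bytes, so the faster form 2 wins; from N=3 on, form 1 is smaller.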

The epilogue is the same for both forms:

ai sp, framesize
mov *sp+, reg
...
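
As a sanity check that one epilogue really undoes both prologue forms, here is a rough model that tracks only the stack pointer. The function names are hypothetical, and it assumes every saved register takes one 2-byte word.

```c
#include <stdio.h>

enum { WORD_BYTES = 2 };   /* assumption: each saved register is one 16-bit word */

/* Form 1: drop sp by regsize, store each register with auto-increment
   (which walks sp back up), then drop sp by regsize + framesize. */
int prologue_form1(int sp, int nregs, int framesize)
{
    int regsize = nregs * WORD_BYTES;
    int i;
    sp -= regsize;                      /* ai sp, -regsize */
    for (i = 0; i < nregs; i++)
        sp += WORD_BYTES;               /* auto-increment store, one per register */
    sp -= regsize + framesize;          /* ai sp, -regsize-framesize */
    return sp;
}

/* Form 2: one adjustment; the indexed stores do not move sp. */
int prologue_form2(int sp, int nregs, int framesize)
{
    return sp - (nregs * WORD_BYTES + framesize);   /* ai sp, -regsize-framesize */
}

/* Shared epilogue: pop the frame, then one auto-increment mov per register. */
int epilogue(int sp, int nregs, int framesize)
{
    int i;
    sp += framesize;                    /* ai sp, framesize */
    for (i = 0; i < nregs; i++)
        sp += WORD_BYTES;               /* auto-increment mov, one per register */
    return sp;
}
```

Both prologues leave sp at the same address, which is why a single epilogue can serve either one.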

The plan is to not use a frame pointer, or save off the stack pointer as part of a call. This saves us a ton of space over the course of a program since we save four bytes on the stack, and at least four instructions per function. The drawback of this design is that we lose the ability to use "alloca" or derive a call tree during debugging. I don't like this, but it's a good tradeoff.

I tried adding parallel CC0 checks in some instructions, but it just made a mess of the resulting code (about 2-3 times bigger, lots of redundant moves). I'll come back to this later.

I also checked for arguments passed on the stack. Looks like work needs to be done on the caller and callee side. Poop.

Tuesday, April 6, 2010

At this point, I think I've got instruction generation pretty much complete. There is still some ugliness when converting between words and bytes, but in order to get better code, I would need a lot of peepholes. I'm not prepared to put in that level of effort right now. Pretty good results can be had by writing optimization-friendly C code. For example, demote data values as early as possible, promote as late as possible. In a lot of cases, I cannot make better assembly by hand than what GCC outputs. I've been surprised by how good the output looks.
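
One way to read the "demote early, promote late" advice, as a sketch: keep a value in a byte type for as long as the result only needs a byte, instead of carrying it around as a word and narrowing at the end. The two checksum functions below are invented for illustration, and whether the narrow one actually yields better code on this port is an assumption that has not been measured here.

```c
#include <stdio.h>

/* Wide version: accumulates in a word and narrows only when returning. */
unsigned char checksum_wide(const unsigned char *p, unsigned n)
{
    unsigned int sum = 0;
    while (n--)
        sum += *p++;
    return (unsigned char)sum;
}

/* Narrow version: the accumulator is a byte from the start, so the value is
   demoted immediately and only promoted if a caller needs a wider type. */
unsigned char checksum_narrow(const unsigned char *p, unsigned n)
{
    unsigned char sum = 0;
    while (n--)
        sum = (unsigned char)(sum + *p++);
    return sum;
}

/* Prints both results for a small sample buffer; they should agree. */
void print_checksums(void)
{
    const unsigned char data[] = { 200, 100, 7 };
    printf("wide=%u narrow=%u\n",
           (unsigned)checksum_wide(data, 3),
           (unsigned)checksum_narrow(data, 3));
}
```

Both functions return the same value modulo 256, so the choice between them is purely about what the compiler can do with the narrower type.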

I still want to confirm that the stack is used correctly; I haven't checked that for a while. I also want to ensure that the assembler can use standard TI conventions. Also, the GCC code needs to be cleaned up, since it's currently full of debug code and commented-out experiments.

I also need to get a real blog together, since I have a full GNU toolchain working that other people may be interested in.

By the way, I've successfully compiled and tested a "hello world" C program. I'm happy with how easy it was to put together, but there is still something going on when "main" is invoked. GCC wants to add a call to "__main" at the start of "main". I'm not sure why this is.

Another thing I've been looking into is the ea5 disk format. I'd like to have a tool to convert an ELF file to either a cart or disk format as desired. Also, right now the GROM cart header must be added by special assembly code. This is OK, but I would rather have a tool to add this.

Oh, I also need to get the condition register updates added to the machine description file.