Wednesday, December 29, 2010

Just for the heck of it, I decided to try compiling at -O1, mostly to see what would happen. What happened was a disaster. I used the in-development printf.c file as a test, and I got this:

printf.c: In function ‘printf’:
printf.c:60: internal compiler error: in df_ref_record, at df-scan.c:2799
Please submit a full bug report,
with preprocessed source if appropriate.
See for instructions.
EMW>> df_ref_record : GET_CODE = 41

If -Os is used, there is no problem. Odd.

The code 41 line was added to help debug this mess. What I've run into is that a CONCAT RTX is being used where a register RTX is expected. The assertion fails, and I get the error above. This happens during step 172r.ira. I think a CONCAT is a construct where a value is spread over several registers, but these are only supposed to show up during initial RTX generation. Based on what I've read, this seems a little late for CONCATs. Need more research.

Tuesday, December 14, 2010

I've revisited GAS to improve the error handling, and we now get more helpful messages. The misleading "missing comma" error now only shows up when a comma is actually missing.

The 16-bit compare recipe had the same "compare with general location" problem that I found earlier with 8-bit compares. Now fixed.

Saturday, December 11, 2010

I've found another FAKE_* register which pops up in movb instructions while using -O0. I think I've fixed it, but I need more testing. I also found this problem:

Unrecognized expression: "@4(r8),$25"
/tmp/cc5D59MT.s: Assembler messages:
/tmp/cc5D59MT.s:91: Error: bad expression
/tmp/cc5D59MT.s:91: Error: missing comma seperator

Context:
inc @6(r8)
movb r1, r1
jeq L8
ci @4(r8), >25 <-- problem line
jne L9

The problem is that CI only takes a register as argument 1.

I also need to take a look at GAS; the error message is misleading.

Thursday, December 9, 2010

Today I got printf using a real format string. Right now it only handles "%", "x" and "X", but that's not bad. Now that the format parser is in place, I can expand it to cover the other formats.

For now, I think I'll ignore printf's return value. It's supposed to return the number of bytes written, which isn't hard to do, but I want to keep my job simple for a while first. Right now printf only takes up about 200 bytes, which isn't great, but that's from the compiler. Hand-optimization can improve that a bit. But that's for later.
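To make the shape of such a format parser concrete, here is a minimal sketch in C. It is not the printf.c from this project: the names `mini_format` and `put_hex` are made up for illustration, and it writes into a caller-supplied buffer instead of the console so it can be tried on any host. It understands only "%%", "%x" and "%X", and it does return the byte count, which the real code skips for now.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: append the hex digits of v (no leading zeros)
   to out, never writing past cap - 1 so there is room for the NUL. */
static size_t put_hex(char *out, size_t pos, size_t cap, unsigned v, int upper)
{
    const char *digits = upper ? "0123456789ABCDEF" : "0123456789abcdef";
    char tmp[sizeof(unsigned) * 2];
    size_t n = 0;

    do {                              /* collect digits, lowest first */
        tmp[n++] = digits[v & 0xF];
        v >>= 4;
    } while (v != 0);

    while (n > 0 && pos + 1 < cap)    /* emit them, highest first */
        out[pos++] = tmp[--n];
    return pos;
}

/* Hypothetical mini formatter: handles only "%%", "%x" and "%X".
   Returns the number of bytes written, not counting the NUL. */
size_t mini_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t pos = 0;

    assert(out != NULL && fmt != NULL && cap > 0);
    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%') {            /* ordinary character: copy it */
            if (pos + 1 < cap)
                out[pos++] = *fmt;
            continue;
        }
        fmt++;                        /* look at the conversion letter */
        if (*fmt == '\0')             /* lone '%' at end of string */
            break;
        if (*fmt == 'x' || *fmt == 'X') {
            pos = put_hex(out, pos, cap, va_arg(ap, unsigned), *fmt == 'X');
        } else if (*fmt == '%') {
            if (pos + 1 < cap)
                out[pos++] = '%';
        }
        /* any other conversion is silently skipped in this sketch */
    }
    va_end(ap);
    out[pos] = '\0';
    return pos;
}
```

Calling `mini_format(buf, sizeof buf, "%x %X 100%%", 0x2fu, 0xBEEFu)` leaves "2f BEEF 100%" in `buf`. Adding another conversion means adding one more branch next to the 'x' case.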

More GCC stuff:
I found a bug in subhi3, it needs to handle arbitrary constants, like "ai r0, -5". That's been fixed.
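My reading of the "ai r0, -5" example is that subtracting a constant is emitted as an add-immediate of the negated constant. Assuming that is right, the arithmetic can be checked on a host with a small C model. The function names here are made up for illustration and are not part of the GCC port.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "ai reg, imm": 16-bit add-immediate with wraparound. */
static uint16_t ai(uint16_t reg, uint16_t imm)
{
    return (uint16_t)(reg + imm);
}

/* Subtract a constant by adding its 16-bit two's-complement negation.
   Hypothetical helper; it mirrors what I assume subhi3 should emit. */
uint16_t sub_const(uint16_t reg, uint16_t c)
{
    uint16_t negated = (uint16_t)(0u - c);   /* 5 becomes 0xFFFB, i.e. -5 */
    return ai(reg, negated);
}
```

Because the addition wraps at 16 bits, this holds for any constant, not only small ones.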

Another bug was found in char-to-int casting. I get "sra r2, 8", then several dozen empty lines, then gibberish control codes. Not good. I'll fix that tomorrow.

It also looks like bit shifting is not in the list of GCC-recognized operations that modify the condition flags. Check and fix tomorrow.

Wednesday, December 8, 2010

At this point, I can't see anything else which would make for good optimizations. So it's back to LIBC for me. I'm sure more opportunities will appear as the size of that codebase grows.

I've got a hex printing routine written in C, and it works great. Slightly bulkier than the assembly version I wrote earlier. Ultimately, I think I'll stick with the assembly routines. The C code is more like a real-world test than production code.

I've also confirmed that the variadic va_* functions work. Handy!

I've added a byte count to the cooked string output code. This will be used in printf.

Saturday, December 4, 2010

I found another optimization that's worth doing. During initialization of local byte arrays, this sequence is emitted:

                       Bytes  Clocks
li   r2, >30 * 256       4     12+4
movb r2, @2(r10)         4     14+4
li   r5, >78 * 256       4     12+4
movb r5, @3(r10)         4     14+4
                        ---    ----
                         16     68

I can squeeze out a few bytes by doing this instead:

                           Bytes  Clocks
li   r2, >30 * 256 + >78     4     12+4
movb r2, @2(r10)             4     14+4
swpb r2                      2     10
movb r2, @3(r10)             4     14+4
                            ---    ----
                             14     62

This is about a ten percent gain in time and space, which isn't too bad. Unfortunately, I can't figure out a way to implement this now. I may have to come back to this later.
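As a sanity check that the two sequences store the same bytes, here is a small host-side model in C. It is only a sketch: the helper functions stand in for the instructions of the same name, and it assumes (as the sequences above imply) that movb stores the high byte of the register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of "swpb": exchange the two bytes of a 16-bit register. */
static uint16_t swpb(uint16_t r)
{
    return (uint16_t)(((unsigned)r << 8) | ((unsigned)r >> 8));
}

/* Model of "movb reg, @dst": store the HIGH byte of the register. */
static void movb(uint16_t r, uint8_t *dst)
{
    *dst = (uint8_t)(r >> 8);
}

/* Original sequence: one li + movb pair per byte. */
void init_two_li(uint8_t *dst, uint8_t a, uint8_t b)
{
    assert(dst != NULL);
    uint16_t r2 = (uint16_t)(a * 256);      /* li   r2, a * 256 */
    movb(r2, &dst[0]);                      /* movb r2, @0(dst) */
    uint16_t r5 = (uint16_t)(b * 256);      /* li   r5, b * 256 */
    movb(r5, &dst[1]);                      /* movb r5, @1(dst) */
}

/* Proposed sequence: load both bytes at once, swap between stores. */
void init_li_swpb(uint8_t *dst, uint8_t a, uint8_t b)
{
    assert(dst != NULL);
    uint16_t r2 = (uint16_t)(a * 256 + b);  /* li   r2, a * 256 + b */
    movb(r2, &dst[0]);                      /* movb r2, @0(dst) */
    r2 = swpb(r2);                          /* swpb r2 */
    movb(r2, &dst[1]);                      /* movb r2, @1(dst) */
}
```

With a = >30 and b = >78, both functions leave the bytes >30, >78 in the destination, which matches the example above.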

The initial implementation of the right-shift-and-cast operations only dealt with shift offsets greater than eight, but I realized I can be more general.

Here's a truth table for all possible right shifts and the equivalent general shift I need, including casting.

N  Original pattern      Shifted pattern       Result    Optimization
-  -----------------     -----------------     --------  ------------
0  01234567.89ABCDEF ->  01234567.89ABCDEF ->  89ABCDEF  X<<8  -.
1  01234567.89ABCDEF ->  x0123456.789ABCDE ->  789ABCDE  X<<7   |
2  01234567.89ABCDEF ->  xx012345.6789ABCD ->  6789ABCD  X<<6   |
3  01234567.89ABCDEF ->  xxx01234.56789ABC ->  56789ABC  X<<5   |  X<<(8-N)
4  01234567.89ABCDEF ->  xxxx0123.456789AB ->  456789AB  X<<4   |
5  01234567.89ABCDEF ->  xxxxx012.3456789A ->  3456789A  X<<3   |
6  01234567.89ABCDEF ->  xxxxxx01.23456789 ->  23456789  X<<2   |
7  01234567.89ABCDEF ->  xxxxxxx0.12345678 ->  12345678  X<<1  -'
8  01234567.89ABCDEF ->  xxxxxxxx.01234567 ->  01234567  nop
9  01234567.89ABCDEF ->  xxxxxxxx.x0123456 ->  x0123456  X>>1  -.
A  01234567.89ABCDEF ->  xxxxxxxx.xx012345 ->  xx012345  X>>2   |
B  01234567.89ABCDEF ->  xxxxxxxx.xxx01234 ->  xxx01234  X>>3   |
C  01234567.89ABCDEF ->  xxxxxxxx.xxxx0123 ->  xxxx0123  X>>4   |  X>>(N-8)
D  01234567.89ABCDEF ->  xxxxxxxx.xxxxx012 ->  xxxxx012  X>>5   |
E  01234567.89ABCDEF ->  xxxxxxxx.xxxxxx01 ->  xxxxxx01  X>>6   |
F  01234567.89ABCDEF ->  xxxxxxxx.xxxxxxx0 ->  xxxxxxx0  X>>7  -'
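
The table can be checked exhaustively on a host machine. The sketch below is a C model, not compiler code: the function names are made up, it assumes a byte value lives in the high byte of a 16-bit register, and it uses unsigned (logical) shifts, so the "x" fill bits are zeros. An arithmetic shift would fill with sign bits instead and is not covered here.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: shift right by n, cast to a byte, then place that byte
   in the high byte of a 16-bit register. */
uint16_t shift_then_cast(uint16_t x, unsigned n)
{
    assert(n < 16);
    uint8_t byte = (uint8_t)(x >> n);
    return (uint16_t)((unsigned)byte << 8);
}

/* The single general shift from the table. The low byte is masked
   off because only the high byte matters for a byte value. */
uint16_t single_shift(uint16_t x, unsigned n)
{
    assert(n < 16);
    uint16_t r;
    if (n < 8)
        r = (uint16_t)((unsigned)x << (8 - n));   /* X<<(8-N) */
    else if (n == 8)
        r = x;                                    /* nop */
    else
        r = (uint16_t)(x >> (n - 8));             /* X>>(N-8) */
    return (uint16_t)(r & 0xFF00u);
}

/* Returns 1 if both versions agree for every 16-bit value and every
   shift count from 0 to 15, otherwise 0. */
int table_holds(void)
{
    for (unsigned n = 0; n < 16; n++)
        for (unsigned long x = 0; x <= 0xFFFFul; x++)
            if (shift_then_cast((uint16_t)x, n) !=
                single_shift((uint16_t)x, n))
                return 0;
    return 1;
}
```

For example, with X = >1234 and N = 4, both functions give >2300: the byte >23 sitting in the high half of the register.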