I spent the last couple of days working on libgcc, and got all missing functions done except the division and modulus stuff.
Once this is done, I need to review everything and look for some bugs to fix. I know there's a problem with register counts for the function prologue and epilogue, so that should be interesting.
Sunday, November 25, 2012
Wednesday, November 21, 2012
So for the past few days I've been working on libgcc, making sure the compiler covers every operation up to 32 bits wide.
Missing operations:
Count trailing zero bits
__ctzsi2, __ctzhi2, __ctzqi2
Count leading zero bits
__clzsi2, __clzhi2, __clzqi2
Find index of least significant set bit (counting from one; zero if no bits are set)
__ffssi2, __ffshi2, __ffsqi2
Return one if an odd number of bits are set
__paritysi2, __parityhi2, __parityqi2
Return number of set bits
__popcountsi2, __popcounthi2, __popcountqi2
Signed division of 32-bit values
__divsi3
Unsigned division of 32-bit values
__udivsi3
Calculate modulus of 32-bit values
__modsi3
Calculate unsigned modulus of 32-bit values
__umodsi3
Do both division and modulus calculations
__divmodsi3
Do both unsigned division and modulus calculations
__udivmodsi3
Multiply 32-bit values
__mulsi3
For now, the trapping arithmetic functions will be implemented using the default code. These functions call "abort" when there is an overflow condition, and are only needed in rare cases. The TMS9900 updates an overflow flag which we can use for this, but we can do that work later.
The other routines in libgcc are for floating-point and fixed-point math; odds are they are too big to really use. But again, I can fix that later.
Wednesday, November 7, 2012
I was trying to build libgcc to provide the missing functions, like the __udivsi3 mentioned earlier. Unfortunately, there's a bug in GCC which causes the libgcc build to fail:
eric@compaq:~/dev/tios/toolchain/gcc-4.4.0/libgcc$ make
Makefile:143: ../.././gcc/libgcc.mvars: No such file or directory
make: *** No rule to make target `../.././gcc/libgcc.mvars'. Stop.
After looking into this a bit, it turns out my build directions were lacking. The GNU people always do their builds from a separate directory, and say that any problems arising from building in the source directory will not be fixed.
So, rather than fight the world on this, I'm changing the build instructions. The following commands are to be run from the top level of the GCC source directory.
$ mkdir build
$ cd build
$ ../configure --prefix /home/eric/dev/tios/toolchain --target=tms9900 --enable-languages=c
$ make all-gcc
$ make install
$ mkdir libgcc/build
$ cd libgcc/build
$ ../../../libgcc/configure --prefix /home/eric/dev/tios/toolchain/ --host=tms9900
$ make
$ make install
Sunday, November 4, 2012
After being super busy at work for the last few months, I finally got some time to work on TI stuff again. I was in the middle of something earlier, but heck if I can remember what that was.
One of the people on AtariAge wanted to know if a FAT library would compile OK for the TI. Sounds like a good question, let's find out.
The first pass failed because there are no libc headers for the TI yet. For now, I'm using the i386 headers. As long as I don't try to link anything, it should be fine.
The second pass failed too: an invalid instruction was being generated. This was super tedious to track down.
inv r2
neg >4 * invalid, only registers may be used
inc r2
(insn 418 417 784 fat_access.c:167 (set (reg:SI 2 r2 [230])
(neg:SI (reg:SI 2 r2 [230]))) 68 {negsi2} (nil))
Once the instruction was isolated, the problem was obvious. I was overrunning an RTX array in the NEGSI2 instruction, and the junk found past its end showed up in the final output. With a properly sized array for this instruction, all is well.
total:
[Nr] Name Type Size(Hex) Size(Dec)
[ 1] .text PROGBITS 00005462 = 21602
[ 3] .data PROGBITS 00000030 = 48
[ 4] .bss NOBITS 00000CCA = 3274
Of course, this is missing a lot of other stuff:
puts
printf
__udivsi3
strncpy
strncmp
memcpy
memset
So, yes, this library is compilable for the TI, but it needs about 24KB (code, data and BSS combined), and that's before adding the missing functions above. Probably impractical.
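The 24KB figure is simply the three section sizes from the table added together:

```latex
\underbrace{21602}_{\texttt{.text}} + \underbrace{48}_{\texttt{.data}} + \underbrace{3274}_{\texttt{.bss}} = 24924\ \text{bytes} \approx 24.3\ \text{KiB}
```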
I've seen this pattern in a few places in the output. It would be easy to optimize, but it may be rarely used and not worth special attention.
mov r2, *r1
mov r2, @>2(r1)
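This could be better written as follows, provided r1 isn't needed afterwards (the auto-increment leaves it pointing two bytes further along):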
mov r2, *r1+
mov r2, *r1
As an unrelated note, I thought for a minute that using the X instruction might be useful for the function prologue and epilogue, since we're iterating through registers. Unfortunately, it's not. At most, there are four instructions for either 'logue, and a lot more would be required if X were used. Never mind.
