I've been working for an hour or less most days since the last update, and it didn't make sense to write an update for each day. So here's a summary of what's been happening.
I had originally intended 32-bit multiplies to be implemented in libgcc, but instead GCC tries to cobble something together out of 16-bit multiplies, which gives unexpectedly wrong results for signed multiplication. The problem is that MPY only does unsigned multiplication, but the way the MULHI3 pattern is written, GCC assumes it handles the signed case too. Unfortunately, I could find no way to rewrite that pattern to produce nice-looking code.
The only way I could resolve the situation was to implement a MULSI3 pattern for inline 32-bit multiplies. There was an implementation commented out in the machine description file, and I was able to reduce it to nine instructions and two temporary registers. I'm not happy about this, but I can't think of a better solution.
The TMS9900 libgcc sources have been completely refactored. Initially I had an individual file for each libgcc function. There were quite a lot of them, and I kept seeing default code unexpectedly linked into the test programs. The problem was that libgcc expects all assembly optimizations to live in a single file, pointed to by the LIB1ASMSRC variable in the makefile. Additionally, the LIB1ASMFUNCS variable needs to contain a list of the implemented functions so the default versions will not be linked into the library. This was annoying to track down, since I couldn't find documentation for any of it and had to rely on the code used by other processors. All the libgcc functions have been tested, and work as expected.
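For anyone hitting the same problem, the makefile fragment ends up looking roughly like this. The file and function names below are illustrative, modeled on what other ports do, not the actual contents of my port:

```makefile
# Hypothetical t-tms9900 fragment -- names are examples only.
# All assembly-optimized routines must live in this one file:
LIB1ASMSRC = tms9900/lib1funcs.S
# Each function listed here suppresses the default libgcc version:
LIB1ASMFUNCS = _divhi3 _udivhi3 _modhi3 _umodhi3
```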
I looked at the floating point emulation code, but it's mostly a mess of preprocessor macros and fails to build for 16-bit targets. I think I got an error like "Here's a nickel kid, buy yourself a real processor..." This did not make me happy. My options now are to either refactor the existing code to support 16-bit machines, or write a new library from scratch. If I'm honest, the second option sounds more interesting.
After looking through the GCC documentation, I may have found a way to get rid of the fake PC register using UNSPEC operations. These are machine-specific operations that GCC treats as opaque, identified only by an enumerated constant, and they could be used to handle returns or maybe even byte-to-word conversions. I would need to do some major testing for that one though.
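For reference, an UNSPEC pattern in a machine description looks roughly like the "blockage" insns most ports carry. This is a generic sketch along those lines, not actual tms9900.md code:

```
;; Hypothetical sketch, not from tms9900.md. UNSPEC_VOLATILE wraps
;; an operation GCC should treat as opaque, identified only by an
;; enumerated constant.
(define_c_enum "unspecv" [
  UNSPECV_BLOCKAGE    ; scheduling barrier
])

(define_insn "blockage"
  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
  ""
  ""
  [(set_attr "length" "0")])
```

A return or byte-to-word pattern would follow the same shape, with the real instruction text in place of the empty template.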