At this point, I can't see anything else that would make for a good optimization, so it's back to LIBC for me. I'm sure that as that codebase grows, more opportunities will appear.
I've got a hex printing routine written in C, and it works great. It's slightly bulkier than the assembly version I wrote earlier, though. Ultimately, I think I'll stick with the assembly routines. The C code is more of a real-world test than production code.
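For reference, a hex routine of this kind can be quite small in C. The following is only a sketch of the general idea, not the routine described above: the name hex32_to_str, the 32-bit width, and the choice to format into a caller-supplied buffer are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (name and 32-bit width are assumptions):
 * writes the 8-digit uppercase hex form of value into out,
 * followed by a NUL. out must have room for at least 9 bytes. */
void hex32_to_str(uint32_t value, char *out)
{
    static const char digits[] = "0123456789ABCDEF";

    assert(out != 0);

    /* Fill from the least significant nibble backwards. */
    for (int i = 7; i >= 0; --i) {
        out[i] = digits[value & 0xFu];
        value >>= 4;
    }
    out[8] = '\0';
}
```

Calling hex32_to_str(0xDEADBEEF, buf) with a 9-byte buf leaves the string "DEADBEEF" in buf. Formatting into a buffer rather than printing directly keeps the sketch independent of whatever output routine the platform provides.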
I've also confirmed that the variadic va_* macros work. Handy!
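A quick way to exercise the va_* machinery from <stdarg.h> is a function that sums a counted list of ints. This is just a minimal sketch of such a check; sum_ints is a made-up name, not something from the codebase.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical test function: adds up `count` int arguments. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);

    va_start(ap, count);              /* start after the last named parameter */
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, int);     /* fetch the next argument as an int */
    va_end(ap);

    return total;
}
```

For example, sum_ints(3, 1, 2, 3) returns 6. If the compiler or calling convention mishandled variadic arguments, a check like this would be one of the first things to go wrong.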
I've added a byte count to the cooked string output code. This will be used in printf.
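Standard printf returns the number of characters it wrote, so having the string output routine report a count fits naturally. Below is a rough sketch of the idea only. It assumes that "cooked" means each '\n' is expanded to "\r\n", and the names put_cooked, emit_byte and sink are invented for the example; the real output code may differ on all of these points.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical output sink standing in for the real device
 * (a UART, console, etc.). */
static char sink[64];
static size_t sink_len;

static void emit_byte(char c)
{
    if (sink_len < sizeof sink)
        sink[sink_len++] = c;
}

/* Assumption: "cooked" output expands '\n' to "\r\n".
 * Returns the number of bytes passed to emit_byte. */
size_t put_cooked(const char *s)
{
    size_t count = 0;

    assert(s != NULL);

    for (; *s != '\0'; ++s) {
        if (*s == '\n') {
            emit_byte('\r');   /* extra byte added by cooking */
            ++count;
        }
        emit_byte(*s);
        ++count;
    }
    return count;
}
```

With this sketch, put_cooked("hi\n") returns 4 rather than 3, because the inserted carriage return is counted. Whether a printf built on top should report the cooked or the uncooked count is a design decision left open here.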