After tons of frustration and research, I've finally made some good progress. I've given up on using 16-bit registers, and I'm back to fake 8-bit registers. That's the only way I can force GCC to track the size of the data value stored in a register. All attempts using 16-bit registers resulted in sadness and despair.
I've added a new register class for the real registers, and made sure that all byte operations can only use those registers. This guarantees that the fake registers will never appear in the compiled output. The one exception is the move byte operations, which have to accept the fake registers so that data can be copied out of them. I'm pretty happy about the whole thing.
I've also made a few optimizations. The array initialization optimizations I mentioned before have been implemented. We can now save two whole bytes of code per two bytes of array initialization. Hooray!
I've also added code to potentially squeeze out an unnecessary move operation in some obscure cases. This won't really be noticeable in most cases, but it makes me feel better.
Now that I'm past this nasty hurdle, I can get back to optimization and code coverage.