Well, I finally got multiply working using define_expand, but it's worse than what I had before.
This approach allocates two registers for the multiply result, and doesn't seem to allow one of the input registers to be reused. The registers seem to be allocated at the beginning of the instruction, and any dead registers are marked as available for reuse after the instruction. This results in more registers being used and, in most cases, an extra MOV instruction. The MOV is used to copy one operand into the newly allocated 32-bit registers. This is no good, so I'll just back this out and pretend it never happened.
The good news is that this experience has given me more tools to use for the compiler. That's been handy for MULHI3, which is defined using define_expand and results in nice, clean code.