I've been completely occupied by JITC testing at work, and haven't looked at TI stuff in almost a month. But I've got a little time right now.
It looks like GCC favors MEM-REG-MEM copies rather than MEM-MEM copies. I've noticed in testing that M-M copies are often split up, and M-R-M copies are never combined. This happens regardless of how instruction costs are set up, or how the C code is constructed.
That makes sense, since most modern machines prefer this kind of operation. I just need to put a peephole in to fix this. I'm getting quite a few peepholes, which is normally a sign of band-aids over a deeper problem. However, since I can't see anything actually wrong at this point, and my target is old enough to be weird by modern standards, I'll try not to worry about that too much.
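For anyone curious what the peephole amounts to, here is a rough sketch of the idea in Python, working on assembly text. This is only an illustration: the real thing would live in the GCC machine description, and the "mov src, dst" syntax, the "@name" memory operands, the peephole() function and the dead_after set are all made up for this example. The important assumption is that the temporary register is known to be dead after the second move, which a real compiler would get from liveness analysis.

```python
import re

# Hypothetical two-operand syntax "mov src, dst"; not actual compiler output.
MOV = re.compile(r"^\s*mov\s+(\S+),\s*(\S+)\s*$")


def is_reg(op):
    """True for a bare register operand such as r1 or r12."""
    return re.fullmatch(r"r\d+", op) is not None


def peephole(lines, dead_after):
    """Merge 'mov MEM, rN' followed by 'mov rN, MEM' into 'mov MEM, MEM'.

    dead_after is an assumed input: the set of line indices i where the
    register loaded on line i is never read again after line i + 1.
    """
    out = []
    i = 0
    while i < len(lines):
        first = MOV.match(lines[i])
        second = MOV.match(lines[i + 1]) if i + 1 < len(lines) else None
        if (first and second
                and not is_reg(first.group(1))           # load from memory
                and is_reg(first.group(2))               # into a register
                and second.group(1) == first.group(2)    # store that register
                and not is_reg(second.group(2))          # back to memory
                # be conservative if the destination address uses the temp
                and first.group(2) not in second.group(2)
                and i in dead_after):
            out.append(f"\tmov {first.group(1)}, {second.group(2)}")
            i += 2  # both original instructions are consumed
        else:
            out.append(lines[i])
            i += 1
    return out
```

Calling peephole(["\tmov @a, r1", "\tmov r1, @b"], {0}) collapses the pair into a single "mov @a, @b". With an empty dead_after set nothing is merged, since the register might still be needed later.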
$ wc -l 2k_chess.s
Before peephole: 1462
After peephole: 1440
So I optimized out 22 instructions or 1.5% of the line count. Not very impressive, but it was easy to get.