I've noticed a bug in the memcpy routine. Here's an example of the test I ran: the first test varies the copy length, and the second varies the destination address.
I expect to see output like this:
I actually see this:
I ran another test: using byte copies with the screen buffer slows the system back down, which loses us our 20% bump. So I need to get memcpy (and memset, which uses a similar method) working.
word memcpy test 2, don't append "\n"
So it looks like the lowest-order bit is being ignored for word copies.
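To illustrate that kind of failure, here is a minimal sketch in C of one way a word-based copy can honour bit 0 of its addresses. It is not the routine from this post. The names `memcpy_words`, `copy_is_wrong` and `count_copy_failures` are made up for the example, and it assumes 16-bit words that must sit on even addresses. The copy moves a single byte first when both pointers are odd, uses word moves for the aligned middle, and falls back to bytes when the two pointers differ in parity. The checker sweeps the copy length and both offsets, and uses guard bytes so that a copy landing one byte off is caught.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routine for illustration; not the memcpy from this post. */
void *memcpy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Word moves are only possible when both addresses share bit 0. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & 1u) == 0) {
        /* Both odd: move one byte so that both become even. */
        if (((uintptr_t)d & 1u) != 0 && n > 0) {
            *d++ = *s++;
            n--;
        }
        /* Aligned middle: one 16-bit move per iteration. */
        while (n >= 2) {
            *(uint16_t *)(void *)d = *(const uint16_t *)(const void *)s;
            d += 2;
            s += 2;
            n -= 2;
        }
    }
    /* Trailing byte, or the whole copy when the parities differ. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}

#define BUF_BYTES 64
#define GUARD 0xEE

/* Backed by uint16_t so the buffers start on an even address. */
static uint16_t src_words[BUF_BYTES / 2];
static uint16_t dst_words[BUF_BYTES / 2];

/* Returns 0 if exactly len bytes landed at dst_off and nothing else changed. */
int copy_is_wrong(size_t dst_off, size_t src_off, size_t len)
{
    unsigned char *src = (unsigned char *)src_words;
    unsigned char *dst = (unsigned char *)dst_words;
    size_t i;

    assert(dst_off + len <= BUF_BYTES && src_off + len <= BUF_BYTES);
    for (i = 0; i < BUF_BYTES; i++) {
        src[i] = (unsigned char)(i + 1);   /* values 1..64, never GUARD */
        dst[i] = GUARD;
    }
    memcpy_words(dst + dst_off, src + src_off, len);
    for (i = 0; i < BUF_BYTES; i++) {
        int inside = (i >= dst_off && i < dst_off + len);
        unsigned char want = inside ? src[src_off + (i - dst_off)] : GUARD;
        if (dst[i] != want)
            return 1;
    }
    return 0;
}

/* Sweep lengths and both offsets, counting failing combinations. */
int count_copy_failures(void)
{
    int failures = 0;
    size_t len, dst_off, src_off;

    for (len = 0; len <= 9; len++)
        for (dst_off = 0; dst_off < 4; dst_off++)
            for (src_off = 0; src_off < 4; src_off++)
                failures += copy_is_wrong(dst_off, src_off, len);
    return failures;
}
```

Calling `count_copy_failures()` from a test harness should return 0 for a copy that respects odd addresses. A copy that ignores the lowest-order bit would write over a guard byte or miss the last byte, so it would be counted as a failure.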
After fixing memcpy, timing for the test run over 0-A00:
83.64 first pass, working memcpy