(int y) = ((int)(char x)) << 8

sra r2, 8 12+2*8+4=32

sla r2, 8 12+2*8+4=32

total: 64 clocks, 4 bytes

This could be replaced with:

andi r2, >FF00 14+8+4=26

total: 26 clocks, 4 bytes
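A quick way to check that the two sequences really agree is to brute-force every 16-bit register value. The sketch below is Python, not TMS9900 code, and the helper names (`sra`, `sla`, `shift_pair`, `masked`) are made up for illustration. It assumes `sra` is an arithmetic shift that copies the sign bit and `sla` shifts zeros in from the right.

```python
MASK16 = 0xFFFF  # model of a 16-bit register


def sra(value, count):
    """Arithmetic right shift of a 16-bit value (sign bit is replicated)."""
    signed = value - 0x10000 if value & 0x8000 else value
    return (signed >> count) & MASK16


def sla(value, count):
    """Left shift of a 16-bit value, zeros shifted in, overflow discarded."""
    return (value << count) & MASK16


def shift_pair(value):
    """Model of: sra r2, 8 followed by sla r2, 8."""
    return sla(sra(value, 8), 8)


def masked(value):
    """Model of: andi r2, >FF00."""
    return value & 0xFF00


# Compare both sequences over every possible register value.
mismatches = [v for v in range(0x10000) if shift_pair(v) != masked(v)]
print(len(mismatches))  # prints 0
```

The check only covers the register value; status flags are not modelled here.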

More generally:

(int y) = ((int)(char x)) << N

sra r2, 8 12+2*8+4 = 32

sla r2, N 12+2*N+4 = 16+2*N

total: 48+2N clocks, 4 bytes

For N > 8, this could be replaced with:

sla r2, N-8 12+2*(N-8)+4 = 2*N

andi r2, 0xFFFF<

This looks pretty darn good, about 33% faster on average.
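To put rough numbers on that, here is a small cost model in Python built from the clock formulas above. It is a sketch under a few assumptions: the function names are hypothetical, the `andi` in the general sequence is assumed to cost the same 26 clocks as in the N = 8 case, and only N from 9 to 15 is covered, since that is where `sla r2, N-8` applies.

```python
ANDI_CLOCKS = 14 + 8 + 4  # 26 clocks, taken from the N = 8 example


def shift_clocks(count):
    """Clocks for one shift instruction, using the 12 + 2*count + 4 formula."""
    return 12 + 2 * count + 4


def original_clocks(n):
    """sra r2, 8 followed by sla r2, N: 48 + 2*N clocks."""
    return shift_clocks(8) + shift_clocks(n)


def replacement_clocks(n):
    """sla r2, N-8 followed by andi (assumed 26 clocks): 2*N + 26 clocks."""
    if not 9 <= n <= 15:
        raise ValueError("this model only covers N from 9 to 15")
    return shift_clocks(n - 8) + ANDI_CLOCKS


def saving_percent(n):
    """Clocks saved, as a percentage of the original sequence."""
    before = original_clocks(n)
    return 100.0 * (before - replacement_clocks(n)) / before


for n in range(9, 16):
    print(n, original_clocks(n), replacement_clocks(n), round(saving_percent(n)))
```

Under these assumptions the saving is a flat 22 clocks, which works out to roughly 28% to 33% depending on N.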

Truth table below:

| N | Original pattern | Result | Optimization |
|---|---|---|---|
| 0 | 01234567.xxxxxxxx | xxxxxxxx.01234567 | swpb |
| 1 | 01234567.xxxxxxxx | xxxxxxx0.1234567x | >>7 |
| 2 | 01234567.xxxxxxxx | xxxxxx01.234567xx | >>6 |
| 3 | 01234567.xxxxxxxx | xxxxx012.34567xxx | >>5 |
| 4 | 01234567.xxxxxxxx | xxxx0123.4567xxxx | >>4 |
| 5 | 01234567.xxxxxxxx | xxx01234.567xxxxx | >>3 |
| 6 | 01234567.xxxxxxxx | xx012345.67xxxxxx | >>2 |
| 7 | 01234567.xxxxxxxx | x0123456.7xxxxxxx | >>1 |
| 8 | 01234567.xxxxxxxx | 01234567.xxxxxxxx | nop |
| 9 | 01234567.xxxxxxxx | 1234567x.xxxxxxxx | <<1 |
| A | 01234567.xxxxxxxx | 234567xx.xxxxxxxx | <<2 |
| B | 01234567.xxxxxxxx | 34567xxx.xxxxxxxx | <<3 |
| C | 01234567.xxxxxxxx | 4567xxxx.xxxxxxxx | <<4 |
| D | 01234567.xxxxxxxx | 567xxxxx.xxxxxxxx | <<5 |
| E | 01234567.xxxxxxxx | 67xxxxxx.xxxxxxxx | <<6 |
| F | 01234567.xxxxxxxx | 7xxxxxxx.xxxxxxxx | <<7 |
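The table can be turned into a small selector and checked against the original two-shift sequence. This is a Python sketch with hypothetical names, not compiler code, and it leans on two assumptions that the text does not spell out: the right shifts for N < 8 are arithmetic (`sra`), and the `andi` mask is `(0xFFFF << N) & 0xFFFF`, which reduces to >FF00 for N = 8. The N = 0 row (swpb) is reported by the selector but left out of the equivalence check, because the upper byte would still need sign-extending.

```python
MASK16 = 0xFFFF  # model of a 16-bit register


def sra(value, count):
    """Arithmetic right shift of a 16-bit value (sign bit is replicated)."""
    signed = value - 0x10000 if value & 0x8000 else value
    return (signed >> count) & MASK16


def sla(value, count):
    """Left shift of a 16-bit value, zeros shifted in, overflow discarded."""
    return (value << count) & MASK16


def original(value, n):
    """Model of: sra r2, 8 followed by sla r2, N."""
    return sla(sra(value, 8), n)


def table_shift(n):
    """Return the 'Optimization' column entry for row N of the table."""
    if n == 0:
        return "swpb"
    if n < 8:
        return ">>%d" % (8 - n)
    if n == 8:
        return "nop"
    return "<<%d" % (n - 8)


def optimized(value, n):
    """One shift from the table, then clear the low N bits (assumed mask)."""
    if not 1 <= n <= 15:
        raise ValueError("N = 0 (swpb) is not modelled here")
    if n < 8:
        shifted = sra(value, 8 - n)   # assumption: '>>' means sra
    elif n == 8:
        shifted = value               # nop: the byte is already in place
    else:
        shifted = sla(value, n - 8)
    return shifted & ((MASK16 << n) & MASK16)


def check_all():
    """Compare both sequences for every N in 1..15 and every register value."""
    return all(original(v, n) == optimized(v, n)
               for n in range(1, 16) for v in range(0x10000))
```

Calling `check_all()` walks all 15 shift counts and all 65536 register values, so it takes a second or two to run.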
