I've noticed a few places where we have a sequence where a register is loaded with a constant, used for some operation, then discarded. Like this:
  li r1, 27
  mpy r1, r4
I wonder if it would be better to put these constants in the data section, and do something like this:
  mpy @const_27, r4
Let's check it out!
li r1, 27  - 4 bytes, 12+4+4=20 clocks
mpy r1, r4 - 2 bytes, 52+4=56 clocks
total: 6 bytes, 76 clocks
.data const_27: data 27 - 2 bytes, 0 clocks
mpy @const_27, r4 - 2 bytes, 52+4+8=64 clocks
total: 4 bytes, 64 clocks
Hmm, pretty good: 33% smaller and about 16% faster (12 of 76 clocks saved). I wasn't expecting that. It gets better if the constant is used in other places too, since the 2 bytes of data are then shared and the effective size per operation drops. (Average size is between 4 and 2 bytes per operation, asymptotic to 2.) The speedup is also more drastic for quicker instructions (like movb: 20+14+4+8=46 clocks vs. 26 clocks, 43% faster).
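To double-check the arithmetic above, here is a small C sketch. The function names are made up for this post, and the byte and clock counts fed into them are just the figures quoted above, not fresh measurements.

```c
/* Percentage saved when a cost (bytes or clocks) drops from `before`
   to `after`. */
double percent_saved(double before, double after)
{
    return 100.0 * (before - after) / before;
}

/* Average bytes per operation when one constant of `const_bytes` is
   shared by `uses` instructions, each `insn_bytes` long.
   With the figures above (2-byte instruction, 2-byte constant) this
   starts at 4 for a single use and approaches 2 as `uses` grows. */
double bytes_per_use(int insn_bytes, int const_bytes, int uses)
{
    return insn_bytes + (double)const_bytes / uses;
}
```

For example, percent_saved(76, 64) covers the mpy case and percent_saved(46, 26) the movb case, while bytes_per_use(2, 2, n) shows how sharing one constant between n instructions shrinks the average size.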
I should look at ways to optimize memset and memcpy sequences too; there's got to be a way to take advantage of that.
Monday, May 21, 2012
Saturday, May 19, 2012
I fixed the problem with the use_regs error. The TARGET_PASS_BY_REFERENCE macro had not been changed from its default value. This macro is evaluated to determine whether function parameters which are to be passed by value should instead be silently copied to the stack, with that copy then passed by reference. This is the problem I saw earlier: the structure was being passed by value, but there were not enough registers to hold the bytes of that value.
The default action of TARGET_PASS_BY_REFERENCE is to never silently pass by reference (which causes the problem seen here). I've overridden this to instead use pass-by-reference for data types larger than four bytes or for aggregate data types (structures).
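The rule itself is simple enough to sketch in standalone C. This is only a model of the decision, not the actual hook from the port: the real hook receives GCC's own type and mode arguments, and the arg_info structure and function name here are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the type information the compiler hands
   to the hook. */
struct arg_info {
    size_t size_bytes;    /* size of the argument's type */
    bool   is_aggregate;  /* true for structures and similar types */
};

/* Model of the overridden rule: pass by reference when the argument
   is an aggregate or is larger than four bytes. */
bool pass_by_reference_sketch(const struct arg_info *arg)
{
    return arg->is_aggregate || arg->size_bytes > 4;
}
```

Under this rule a 2-byte int or a 4-byte long still travels in registers, while any structure (no matter how small) or anything wider than four bytes is copied to the stack and passed as a single pointer.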
Saturday, May 12, 2012
I got some free time again to look at GCC stuff, so I'm going through the source code for Nyog Sothep looking at the problems there.
Right now I'm looking at a problem resulting in "internal compiler error: in use_regs, at expr.c:2245"
This is the result of an assert requiring that all registers be valid. This assert is within expr.c::use_regs.
The use_regs function is being called to validate the usage of a record requiring 19 registers, starting at R1. This sounds like an entire structure is being loaded into registers, and not just the element of that structure we are interested in.
This is the operation which is causing problems: has_item(investigators[i],RETURN_SPELL)
All of "investigators[i]" is being passed by value in registers instead of passed by reference using one register. Since we don't have enough registers to hold the entire structure, we get a fatal error.
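Here is a rough C illustration of the difference. The structure layout is invented (the real one from Nyog Sothep isn't shown here); I've assumed 38 bytes so that it matches the 19 sixteen-bit registers from the error, and the has_item bodies are placeholders.

```c
#include <stddef.h>

#define BYTES_PER_REG 2   /* registers are 16 bits wide */

/* Hypothetical layout, sized to match the 19 registers in the error. */
struct investigator {
    unsigned char data[38];
};

/* Number of 16-bit registers needed to hold `bytes` of data. */
size_t regs_needed(size_t bytes)
{
    return (bytes + BYTES_PER_REG - 1) / BYTES_PER_REG;
}

/* By value: the whole structure has to travel with the call. */
int has_item_by_value(struct investigator inv, int item)
{
    return inv.data[item] != 0;
}

/* By reference: only the address travels, which fits in one register
   on a 16-bit target. */
int has_item_by_ref(const struct investigator *inv, int item)
{
    return inv->data[item] != 0;
}
```

regs_needed(sizeof(struct investigator)) comes out to 19, which is more than the sixteen workspace registers the CPU has in total, so the by-value form can never fit.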
Thursday, May 3, 2012
Well, I finally got multiply working using define_expand, but it's worse than what I had before.
This approach allocates two registers for the multiply result, and doesn't seem to allow for reuse of one of the input registers. The registers seem to be allocated at the beginning of the instruction, and any dead registers are marked as available for reuse only after the instruction. This results in more used registers and an extra MOV instruction in most cases. The MOV is used to copy one operand into the newly allocated 32-bit register pair. This is no good, so I'll just back this out and pretend it never happened.
The good news is that this experience has given me more tools to use for the compiler. That's been handy for MULHI3, which is defined using define_expand, and results in nice clean code.
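For reference, this is why two registers come into it at all. The hardware multiply of two 16-bit values yields a 32-bit product spread over a register pair, while MULHI3 (a 16-bit by 16-bit multiply with a 16-bit result) only needs the low word. A minimal C model, with hypothetical names that are not taken from the port:

```c
#include <stdint.h>

/* Model of the register pair holding a 32-bit product. */
struct reg_pair {
    uint16_t high;  /* upper 16 bits of the product */
    uint16_t low;   /* lower 16 bits of the product */
};

/* 16 x 16 -> 32-bit unsigned multiply, split across two 16-bit halves. */
struct reg_pair mpy_full(uint16_t a, uint16_t b)
{
    uint32_t product = (uint32_t)a * (uint32_t)b;
    struct reg_pair r = { (uint16_t)(product >> 16),
                          (uint16_t)(product & 0xFFFFu) };
    return r;
}

/* What a 16-bit multiply keeps: just the low word of the product. */
uint16_t mulhi3_sketch(uint16_t a, uint16_t b)
{
    return mpy_full(a, b).low;
}
```

The high word is dead as soon as the multiply finishes, which is why having it tie up a freshly allocated register (plus a MOV to fill the pair) is so annoying.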
