I tried to compare the function and inline methods to see which is the most efficient. Unfortunately, the variables which must be considered are not easy to predict (number of and ratio between division and modulus calls, number of registers to save around function calls).
With much hand waving and shady math, I determined that if there are five or less signed division calls, the inline method wins out.
Ultimately, it doesn't matter. I am going to implement this in an inline form, and if it turns out that calling out to a function to perform division is more efficient, the user can write their code to do that. The other reason is that there is no good way for the user to somehow inline division if the function method is worse.