ID:2125944
 
It's pretty well known that division is slower than multiplication, so it's often worth avoiding unnecessary divisions. It therefore seems natural to assume that if you need to divide several numbers by the same value, you should compute that value's inverse once, store it, and multiply by the stored inverse instead, right?

In DM, at least, the answer is no.

After Kaiochao remarked that his own testing had surprised him, I verified this using code similar to the following:
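```dm
#define DEBUG // include extra debugging information in the compiled code

mob
	var
		list/data

	verb
		// Fills data with 300,000 random values from 1 to 1000 (never 0).
		Populate()
			set background = 1 // let the loop yield so the server stays responsive

			data = new
			for(var/i = 1 to 300000)
				data += rand(1, 1000)

			world << "Done."

		// n = 2: one division to form the inverse, then two multiplications.
		Test_StoreAndMultiply()
			var
				s
				b1
				b2
				len

			len = data.len

			// Groups of 3 (divisor, value, value); assumes len is a multiple of 3.
			for(var/i = 1 to len step 3)
				s = data[i]
				b1 = data[i+1]
				b2 = data[i+2]

				s = 1 / s // store the inverse...
				b1 *= s   // ...then multiply by it
				b2 *= s

		// n = 2: two divisions by the same value.
		Test_Divide()
			var
				s
				b1
				b2
				len

			len = data.len

			for(var/i = 1 to len step 3)
				s = data[i]
				b1 = data[i+1]
				b2 = data[i+2]

				b1 /= s
				b2 /= s
```
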
That is the n = 2 case (two values divided by the same number); the (horribly ugly) metacode I used to generate it for other values of n is found here.

Using this, I generated the following profiler data (average time per call, five calls of each verb) for n = 2, 3, 5, 7, 9, 11, 25, and 40:
```
n = 2

Profile results (average time)
Proc Name                        Self CPU Total CPU Real Time     Calls
------------------------------- --------- --------- --------- ---------
/mob/verb/Test_StoreAndMultiply     0.115     0.115     0.115         5
/mob/verb/Test_Divide               0.098     0.098     0.098         5

n = 3

Profile results (average time)
Proc Name                        Self CPU Total CPU Real Time     Calls
------------------------------- --------- --------- --------- ---------
/mob/verb/Test_StoreAndMultiply     0.140     0.140     0.140         5
/mob/verb/Test_Divide               0.124     0.124     0.124         5

n = 5

Profile results (average time)
Proc Name                        Self CPU Total CPU Real Time     Calls
------------------------------- --------- --------- --------- ---------
/mob/verb/Test_StoreAndMultiply     0.199     0.199     0.199         5
/mob/verb/Test_Divide               0.181     0.181     0.181         5

n = 7

Profile results (average time)
Proc Name                        Self CPU Total CPU Real Time     Calls
------------------------------- --------- --------- --------- ---------
/mob/verb/Test_StoreAndMultiply     0.249     0.249     0.249         5
/mob/verb/Test_Divide               0.231     0.231     0.231         5

n = 9

Profile results (average time)
Proc Name                        Self CPU Total CPU Real Time     Calls
------------------------------- --------- --------- --------- ---------
/mob/verb/Test_StoreAndMultiply     0.316     0.316     0.316         5
/mob/verb/Test_Divide               0.298     0.298     0.298         5

n = 11

Profile results (average time)
Proc Name                        Self CPU Total CPU Real Time     Calls
------------------------------- --------- --------- --------- ---------
/mob/verb/Test_StoreAndMultiply     0.386     0.386     0.386         5
/mob/verb/Test_Divide               0.370     0.370     0.371         5

n = 25

Profile results (average time)
Proc Name                        Self CPU Total CPU Real Time     Calls
------------------------------- --------- --------- --------- ---------
/mob/verb/Test_StoreAndMultiply     0.774     0.774     0.774         5
/mob/verb/Test_Divide               0.748     0.748     0.748         5

n = 40

Profile results (average time)
Proc Name                        Self CPU Total CPU Real Time     Calls
------------------------------- --------- --------- --------- ---------
/mob/verb/Test_StoreAndMultiply     1.184     1.184     1.184         5
/mob/verb/Test_Divide               1.143     1.143     1.143         5
```


In short, for n = 2, 3, 5, 7, 9, 11, 25, and 40, dividing each time instead of multiplying by the stored inverse was faster by 0.017, 0.016, 0.018, 0.018, 0.018, 0.016, 0.026, and 0.041 seconds per call, respectively. The gap doesn't look linear in n (though the non-linearity could just be noise), but repeated division does seem to gain more as n increases. Without understanding why this happens I can't do much more than extrapolate, but in every case I tested, repeated division beat pre-computing an inverse.

Anyone have insights into why this would be the case?
What does the call order look like? I've had cases where, when two procs/verbs use the same data, the second one called is faster because of REASONS.
In response to MrStonedOne
I called them manually, to give the VM time to "unwind", so to speak. I always called them in the same order, though (whatever order they appeared in on the stat panel), so I can't rule out lingering weirdness. I'll test it tomorrow when I get a chance to sit down at my computer for a bit.
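I forgot I'd meant to come back to this. Mixing up the call order does seem to have some effect (S&M = Test_StoreAndMultiply, Div = Test_Divide; each label gives the order in which the verbs were called):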

```
(S&M -> Div) x5

Profile results (average time)
Proc Name                        Self CPU Total CPU Real Time     Calls
------------------------------- --------- --------- --------- ---------
/mob/verb/Test_StoreAndMultiply     1.181     1.181     1.181         5
/mob/verb/Test_Divide               1.137     1.137     1.137         5

Div x5 -> S&M x5

Profile results (average time)
Proc Name                        Self CPU Total CPU Real Time     Calls
------------------------------- --------- --------- --------- ---------
/mob/verb/Test_StoreAndMultiply     1.166     1.166     1.167         5
/mob/verb/Test_Divide               1.132     1.132     1.132         5

S&M x5 -> Div x5

Profile results (average time)
Proc Name                        Self CPU Total CPU Real Time     Calls
------------------------------- --------- --------- --------- ---------
/mob/verb/Test_StoreAndMultiply     1.165     1.165     1.165         5
/mob/verb/Test_Divide               1.137     1.137     1.137         5

(Div -> S&M) x5

Profile results (average time)
Proc Name                        Self CPU Total CPU Real Time     Calls
------------------------------- --------- --------- --------- ---------
/mob/verb/Test_StoreAndMultiply     1.176     1.176     1.176         5
/mob/verb/Test_Divide               1.138     1.138     1.138         5
```


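Alternating between the two verbs produced the biggest gaps (0.044 and 0.038 seconds), while calling the same verb five times in a row seems to make it quicker, mostly for Test_StoreAndMultiply (gaps of 0.034 and 0.028 seconds), which is interesting. The overall result is about the same, though: Test_Divide was faster in every ordering.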
The reason I think the pre-inverse is worse is that in that routine, you're doing three operations that are either multiplication or division. In the straight division one, you're only doing two.

In other words, the hope of a speedup rests on the premise that two multiplications and a division should be faster than two divisions. If that isn't true, there goes the speed improvement.
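The same premise can be written as a simple cost model for general n. This is a deliberately simplified sketch: it assumes each multiplication costs a fixed $C_m$ and each division a fixed $C_d$ (symbols introduced here), and it ignores the loop, the list reads, and the extra assignment to s in the DM code.

$$
\underbrace{C_d + n\,C_m}_{\text{store inverse}} \;<\; \underbrace{n\,C_d}_{\text{divide each time}}
\quad\Longleftrightarrow\quad
n\,(C_d - C_m) \;>\; C_d
\quad\Longleftrightarrow\quad
n \;>\; \frac{C_d}{C_d - C_m} \qquad (C_d > C_m)
$$

For example, if a division cost twice as much as a multiplication ($C_d = 2C_m$), the inverse would win for any $n > 2$. If $C_d \le C_m$, the left-hand side $n\,(C_d - C_m)$ is never positive, so under this model the stored inverse never pays off, however large n gets.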

If multiplication is intrinsically faster than division, you should see an improvement if you reuse the same inverse more and more times. If you were changing four numbers instead of just two, for instance, you'd be comparing four multiplications and a division to four divisions, and that's more likely to show you a difference if there is one.
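For illustration, the four-value case might look like this in the same style as the original verbs. It is a hand-written sketch, not the output of the metacode mentioned earlier; the verb names Test_StoreAndMultiply4 and Test_Divide4 are hypothetical, and it assumes it is compiled alongside the original code (it reuses the mob's data list filled by Populate()).

```dm
mob
	verb
		// Hypothetical n = 4 version: 1 division + 4 multiplications per group of 5 values.
		Test_StoreAndMultiply4()
			var
				s
				b1
				b2
				b3
				b4
				len

			len = data.len

			// Groups of 5; assumes len is a multiple of 5 (true for Populate()'s 300,000).
			for(var/i = 1 to len step 5)
				s = data[i]
				b1 = data[i+1]
				b2 = data[i+2]
				b3 = data[i+3]
				b4 = data[i+4]

				s = 1 / s // one division to form the inverse...
				b1 *= s   // ...reused for four multiplications
				b2 *= s
				b3 *= s
				b4 *= s

		// Hypothetical n = 4 version: 4 divisions per group of 5 values.
		Test_Divide4()
			var
				s
				b1
				b2
				b3
				b4
				len

			len = data.len

			for(var/i = 1 to len step 5)
				s = data[i]
				b1 = data[i+1]
				b2 = data[i+2]
				b3 = data[i+3]
				b4 = data[i+4]

				b1 /= s
				b2 /= s
				b3 /= s
				b4 /= s
```

Both verbs read exactly the same list entries, so any difference the profiler reports should come from the arithmetic rather than the loop itself.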
In response to Lummox JR
Lummox JR wrote:
If multiplication is intrinsically faster than division, you should see an improvement if you reuse the same inverse more and more times. If you were changing four numbers instead of just two, for instance, you'd be comparing four multiplications and a division to four divisions, and that's more likely to show you a difference if there is one.

I've actually tested it up to and including forty numbers, meaning comparing 40 multiplications + 1 division to 40 divisions, and the latter is still noticeably faster.
That would suggest that either floating-point multiplication and division take the same number of cycles, or there's a mitigating factor like the var being set (although that should be fast).

One thing I did notice in your code is that the inverse test sets s twice; I'd set s = 1/data[i] in the first place, and that should reduce the number of instructions. It's conceivable that this is having an impact on your tests.
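A sketch of that suggestion for the n = 2 case, computing the inverse directly from the list entry so s is assigned only once. The verb name Test_StoreAndMultiply_Direct is hypothetical, and like the original verbs it assumes the mob's data list has been filled by Populate(); whether it actually closes the gap with Test_Divide would still have to be measured with the profiler.

```dm
mob
	verb
		// Hypothetical variant of Test_StoreAndMultiply (n = 2) that sets s only once.
		Test_StoreAndMultiply_Direct()
			var
				s
				b1
				b2
				len

			len = data.len

			// Groups of 3; assumes len is a multiple of 3, as in the original verbs.
			for(var/i = 1 to len step 3)
				s = 1 / data[i] // inverse computed straight from the list entry
				b1 = data[i+1]
				b2 = data[i+2]

				b1 *= s
				b2 *= s
```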