I'm not actually sure if DM does this internally, and if it doesn't then I might quite like to add it.

profile would suggest it already does this:
                         Profile results (total time)
Proc Name Self CPU Total CPU Real Time Calls
--------------------------- --------- --------- --------- ---------
/mob/verb/flippedwhile 28.587 28.587 28.587 500
/mob/verb/normalwhile 28.338 28.338 28.337 500
/mob/verb/checkloopflipping 0.005 56.930 56.929 1


mob
verb/checkloopflipping()
. = 0
for (var/i in 1 to 500)
. += normalwhile()
world << .
for (var/i in 1 to 500)
. += flippedwhile()
world << .

verb/normalwhile()
. = 0
while (. < 500000)
.++

verb/flippedwhile()
. = 0
if (. < 500000)
do
.++
while (. < 500000)


In response to MrStonedOne
Although your profile data is extremely compelling, it turns out while() and for() loops are not flipped. I can't explain why the flipped form in your results performs moderately worse--way more so in fact than the if() alone could account for.
way more so in fact than the if() alone could account for.

It's the scope traversal, is my bet. Traversals in DM's VM are pretty damn costly.
In response to Ter13
I'm not sure I follow; there's no traversal going on that I can see in that code. The . var is very fast to access.
Wait, DM's if statements don't traverse in? Are they jump points?
Ifs are jumps, yes.
That... Does not explain a few things that had led to believe me that ifs were not actually jumps.
Hence my confusion over MSO's results. The impact of the if() should be negligible, and the saved jump instruction over 500K iterations should be significant enough to show up in a profile.
Thanx guys! thats so useful
Anybody mentioned bit flags yet? Because bit flags. Love 'em.
Here's some general purpose things I live by at my job that aren't strictly code related.

-Giving variables good/descriptive names are as good as code commenting if not better in some cases.

- If your choice is between slightly inefficient but simple code and efficient but complex code, go with the simple solution unless the performance actively becomes a problem. Clever, complex solutions are frustrating to debug if you're not the original author or have been away from the code for awhile.

- Good use of #Define can speed code comprehension up a lot.
In response to Techgamer
Techgamer wrote:
- Good use of #Define can speed code comprehension up a lot.

Seconding this. I use preprocessor macros in pif_LongInt (and eventually in pif_BigInt) to differentiate two pieces of code that, while equivalent in function, have semantic differences. It also just makes it easier to grasp what's going on. Compare the Multiply() method from one of my classes before I used preprocessor macros, and just used raw code:
var
// These are the bytes of both src and Int.

src0 = src.block_1 & 0x00FF
src1 = (src.block_1 & 0xFF00) >> 8
src2 = src.block_2 & 0x00FF
src3 = (src.block_2 & 0xFF00) >> 8

Int0 = Int_block_1 & 0x00FF
Int1 = (Int_block_1 & 0xFF00) >> 8
Int2 = Int_block_2 & 0x00FF
Int3 = (Int_block_2 & 0xFF00) >> 8

// Bytes of the final result. We will only use the first byte of each, while
// the second byte will be used as a buffer to temporarily store bits until we
// shift them to the next byte. E.g., byte_1 = XXXX FFFF, where XXXX is data that
// will be moved to the first byte of byte_2.

byte_1 = 0
byte_2 = 0
byte_3 = 0
byte_4 = 0

// A buffer used to temporarily store results of multiplication.
buffer

/// Compute the terms with a coefficient of r**0 == 1. Assuming src0 = Int0 = 0xFF--the largest integer
/// that can be stored in one byte--then byte_1 = src0*Int0 = 0xFE01. Thus, we can just throw it in here
/// and pull the buffer into byte_2.
///
/// Maximum value after this step: 00 00 FE 01

byte_1 = src0*Int0
byte_2 = (byte_1 & 0xFF00) >> 8 // Shift buffer of byte_1 var into byte_2
byte_1 &= 0x00FF // Clear out byte_1 buffer.

/// Compute the terms with a coefficient of r**1. Assuming all terms are 0xFF, the largest integer that
/// can result from src1*Int0 + src0*Int1 is 0x2*(0xFF*0xFF) = 0x2*0xFE01 = 0x1FC02. Thus, there may be
/// overflow.

// Maximum value after this step: 00 00 FE 01 + 00 FE 01 00 = 00 FE FF 01

buffer = src1*Int0
byte_2 += (buffer & 0x00FF)
byte_3 = ((buffer & 0xFF00) >> 8) + ((byte_2 & 0xFF00) >> 8)
byte_2 &= 0x00FF // Clear out byte_2 buffer.

// Maximum value after this step: 00 FE FF 01 + 00 FE 01 00 = 01 FD 00 01

buffer = src0*Int1
byte_2 += (buffer & 0x00FF)
byte_3 += ((buffer & 0xFF00) >> 8) + ((byte_2 & 0xFF00) >> 8)
byte_4 = ((byte_3 & 0xFF00) >> 8)

byte_2 &= 0x00FF // Clear out byte_2 buffer.
byte_3 &= 0x00FF // Clear out byte_3 buffer.

/// Compute the terms with a coefficient of r**2. Assuming all terms are 0xFF, the largest integer
/// that can result from src2*Int0 + src1*Int1 + src0*Int2 = 0x3*(0xFF*0xFF) = 0x3*0xFE01 = 0x2FA03.

// Maximum value after this step: 01 FD 00 01 + FE 01 00 00 = FF FE 00 01

buffer = src2*Int0
byte_3 += (buffer & 0x00FF)
byte_4 += ((buffer & 0xFF00) >> 8) + ((byte_3 & 0xFF00) >> 8)

byte_3 &= 0x00FF
byte_4 &= 0x00FF

// Maximum value after this step: FF FE 00 01 + FE 01 00 00 = (01) FD FF 00 01

buffer = src1*Int1
byte_3 += (buffer & 0x00FF)
byte_4 += ((buffer & 0xFF00) >> 8) + ((byte_3 & 0xFF00) >> 8)

byte_3 &= 0x00FF
byte_4 &= 0x00FF

// Maximum value after this step: FD FF 00 01 + FE 01 00 00 = (01) FC 00 00 01

buffer = src0*Int2
byte_3 += (buffer & 0x00FF)
byte_4 += ((buffer & 0xFF00) >> 8) + ((byte_3 & 0xFF00) >> 8)

byte_3 &= 0x00FF
byte_4 &= 0x00FF

/// And lastly, the terms with a coefficinet of r**3. Assuming all terms are 0xFF, the largest
/// integer than can result from src3*Int0 + src2*Int1 + src1*Int2 + src0*Int3 = 0x4*0xFE01 =
/// 0x3F804.

// Maximum value after this step: FC 00 00 01 + (FE) 01 00 00 00 = (FE) FD 00 00 01.
buffer = src3*Int0
byte_4 += buffer
byte_4 &= 0x00FF

// Maximum value after this step: FD 00 00 01 + (FE) 01 00 00 00 = (FE) FE 00 00 01.
buffer = src2*Int1
byte_4 += buffer
byte_4 &= 0x00FF

// Maximum value after this step: FE 00 00 01 + (FE) 01 00 00 00 = (FE) FF 00 00 01.
buffer = src1*Int2
byte_4 += buffer
byte_4 &= 0x00FF

// Maximum value after this step: FF 00 00 01 + (FE) 01 00 00 00 = (FF) 00 00 00 01.
buffer = src0*Int3
byte_4 += buffer
byte_4 &= 0x00FF

to after I used preprocessor macros, indicating what each operation is doing rather than just writing out the operation.
var
// These are the bytes of both src and Int.

src0 = BYTE_ONE(block_1)
src1 = BYTE_TWO(block_1)
src2 = BYTE_ONE(block_2)
src3 = BYTE_TWO(block_2)

Int0 = BYTE_ONE(Int_block_1)
Int1 = BYTE_TWO(Int_block_1)
Int2 = BYTE_ONE(Int_block_2)
Int3 = BYTE_TWO(Int_block_2)

// Bytes of the final result. We will only use the first byte of each, while
// the second byte will be used as a buffer to temporarily store bits until we
// shift them to the next byte. E.g., byte_1 = XXXX FFFF, where XXXX is data that
// will be moved to the first byte of byte_2.

byte_1 = 0
byte_2 = 0
byte_3 = 0
byte_4 = 0

// A buffer used to temporarily store results of multiplication.
buffer

/// Compute the terms with a coefficient of r**0 == 1. Assuming src0 = Int0 = 0xFF--the largest integer
/// that can be stored in one byte--then byte_1 = src0*Int0 = 0xFE01. Thus, we can just throw it in here
/// and pull the buffer into byte_2.

// Maximum value after this step: 00 00 FE 01

byte_1 = src0*Int0

ADDBUFFER(byte_1, byte_2) // Add byte one's buffer into byte 2.
FLUSH(byte_1) // Clear out byte_1 buffer.

/// Compute the terms with a coefficient of r**1. Assuming all terms are 0xFF, the largest integer that
/// can result from src1*Int0 + src0*Int1 is 0x2*(0xFF*0xFF) = 0x2*0xFE01 = 0x1FC02. Thus, there may be
/// overflow.

// Maximum value after this step: 00 00 FE 01 + 00 FE 01 00 = 00 FE FF 01

buffer = src1*Int0

ADDDATA(buffer, byte_2)
byte_3 = BUFFER(buffer) + BUFFER(byte_2)

FLUSH(byte_2) // Clear out byte_2 buffer.

// Maximum value after this step: 00 FE FF 01 + 00 FE 01 00 = 01 FD 00 01

buffer = src0*Int1

ADDDATA(buffer, byte_2)
byte_3 += BUFFER(buffer) + BUFFER(byte_2)
ADDBUFFER(byte_3, byte_4)

FLUSH(byte_2)
FLUSH(byte_3)

/// Compute the terms with a coefficient of r**2. Assuming all terms are 0xFF, the largest integer
/// that can result from src2*Int0 + src1*Int1 + src0*Int2 = 0x3*(0xFF*0xFF) = 0x3*0xFE01 = 0x2FA03.

// Maximum value after this step: 01 FD 00 01 + FE 01 00 00 = FF FE 00 01

buffer = src2*Int0

ADDDATA(buffer, byte_3)
byte_4 += BUFFER(buffer) + BUFFER(byte_3)

FLUSH(byte_3)
FLUSH(byte_4)

// Maximum value after this step: FF FE 00 01 + FE 01 00 00 = (01) FD FF 00 01

buffer = src1*Int1

ADDDATA(buffer, byte_3)
byte_4 += BUFFER(buffer) + BUFFER(byte_3)

FLUSH(byte_3)
FLUSH(byte_4)

// Maximum value after this step: FD FF 00 01 + FE 01 00 00 = (01) FC 00 00 01

buffer = src0*Int2

ADDDATA(buffer, byte_3)
byte_4 += BUFFER(buffer) + BUFFER(byte_3)

FLUSH(byte_3)
FLUSH(byte_4)

/// And lastly, the terms with a coefficinet of r**3. Assuming all terms are 0xFF, the largest
/// integer than can result from src3*Int0 + src2*Int1 + src1*Int2 + src0*Int3 = 0x4*0xFE01 =
/// 0x3F804.

// Because we add directly to byte_4, from this point on we can drop using the buffer. We do
// continue to flush byte_4 after each step to prevent roll-over or loss of precision causing
// problems.

// Maximum value after this step: FC 00 00 01 + (FE) 01 00 00 00 = (FE) FD 00 00 01.
byte_4 += src3*Int0
FLUSH(byte_4)

// Maximum value after this step: FD 00 00 01 + (FE) 01 00 00 00 = (FE) FE 00 00 01.
byte_4 += src2*Int1
FLUSH(byte_4)

// Maximum value after this step: FE 00 00 01 + (FE) 01 00 00 00 = (FE) FF 00 00 01.
byte_4 += src1*Int2
FLUSH(byte_4)

// Maximum value after this step: FF 00 00 01 + (FE) 01 00 00 00 = (FF) 00 00 00 01.
byte_4 += src0*Int3
FLUSH(byte_4)

It's much clearer what's going on in the latter case than in the former (assuming you can make heads or tails of either of them, of course).
Bitflags? Back in college I came up with this one, that you've probably seen in a few patterns on BYOND:

n & (n-1)

I use it on BYOND as a check to see if a dir is diagonal; or more specifically, if it has more than 1 bit set.

It's part of a more general set of bit twiddles:

// remove the least significant bit from n
n & (n-1)

// fill everything to the right of the LSB with 1
n | (n-1)

// isolate the LSB
n ^ (n & (n-1))


These are bit twiddles I use for autojoining dirs. Going clockwise from north=1, northeast=2, etc. to northwest=128:

// 16-state join (no diagonals)
d & 85

// 47-state join: corners are outer, horizontal, vertical, inner, or filled
d & (((d << 1) & ((d << 7) | (d >> 1))) | 85)

// 161-state join: corners are outer, horizontal, vertical, h-curve, v-curve, inner, filled
d & ((d << 1) | ((d << 7) | (d >> 1)) | 85)

// 82-state join: corners are outer, horizontal, vertical, h-curve, v-curve, and inner=filled
d & ((d << 1) ^ ((d << 7) | (d >> 1)) | 85)

Each of those is about determining how the cardinal directions influence their neighbors. In 47-state joins, two adjacent cardinal dirs have to be set for the diagonal to matter. In 161-state, you only need either cardinal dir to join for the diagonal to matter. 82-state is like 161 where the inner and filled corners are identical. And in 16-state joining, diagonals are irrelevant.

Also I'm fascinated by Gray codes, which are ways of reorganizing a number so that n and n+1 differ by only a single bit. Converting any integer to Gray code is easy:

n ^= n >> 1

Converting back from Gray code to normal (in 16 bits):

n ^= n >> 8
n ^= n >> 4
n ^= n >> 2
n ^= n >> 1

Gray codes come up in electrical engineering, where certain devices might use brushes that make contact with conductors on a ring that indicate different bits. Mechanically, it's possible for some of the brushes to advance to the next position before others, so in going from 7 to 8 you could read any number from 0 to 15. With a Gray code, 7 is expressed as 4 and 8 as 12, and 4 and 12 differ by only a single bit--so the only contact that matters is that one, and there's no chance for a (decoded) reading at that position other than 7 or 8.
Page: 1 2 3 4 5 6