ID:1434058
 
I've made a pretty startling discovery about how BYOND's VM handles proc calls. For one, it seems to bog down when the logic inside a heavily iterated loop is split out into individual proc calls rather than written inline in a single stream. That's not all, though: the profiler seems to report only the time spent in the code inside each proc, not the cost of making the call itself. There needs to be some way to track the time taken by the proc calls themselves, not just the code they execute.

Here's some code to show you what I mean:

#define DEBUG 1


proc/Logic()
	var/i = 0
	i++
	if(i == 1) i--

var/end = 0
var/subloops = 50000
mob/verb
	/* Case 1: Takes up 100+% CPU when subloops == 50000 */
	Case1()
		while(!end)
			for(var/x = 1, x <= subloops, x++)
				Logic()
			sleep(1)
		end = 0

	/* Case 2: Takes up 40-60% CPU when subloops == 50000 */
	Case2()
		while(!end)
			for(var/x = 1, x <= subloops, x++)
				var/i = 0
				i++
				if(i == 1) i--
			sleep(1)
		end = 0

	CaseEnd()
		end = 1

mob/Stat()
	stat("World CPU", world.cpu)

Here's what I take from this situation:


Case 1 is performing 50,000 logic iterations per decisecond (@ 10 FPS)
Case 2 is performing 50,000 logic iterations per decisecond (@ 10 FPS)

Case 1, after 10 seconds of processing, takes up 90-130% CPU.
Case 2, after 10 seconds of processing, takes up 40-60% CPU.


What this suggests is that writing logic inline in a heavily iterated loop is less resource-intensive than moving that same logic into its own proc (i.e. /proc/Logic()). Is this a bug? What exactly is happening here?
Aren't Case 1 and Case 2 essentially equivalent in function?
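
Since the profiler doesn't seem to show the call cost, one rough way to put a number on it might be to time the two variants back to back. This is only a sketch: BenchLogic() and the CompareCallOverhead verb are hypothetical names, and it assumes world.timeofday (which counts in 1/10-second units) is fine-grained enough to show a difference over 1,000,000 iterations.

```dm
// Hypothetical stand-in for Logic(), renamed so it can sit in the same file.
proc/BenchLogic()
	var/i = 0
	i++
	if(i == 1) i--

mob/verb/CompareCallOverhead()
	var/iterations = 1000000

	// Variant 1: the logic lives in its own proc and is called every iteration.
	var/start = world.timeofday
	for(var/k = 1, k <= iterations, k++)
		BenchLogic()
	var/called_ds = world.timeofday - start

	// Variant 2: the same logic written inline in the loop body.
	start = world.timeofday
	for(var/k = 1, k <= iterations, k++)
		var/i = 0
		i++
		if(i == 1) i--
	var/inline_ds = world.timeofday - start

	// Both values are in deciseconds; the difference approximates call overhead.
	// world.timeofday wraps at midnight, so a run across midnight is invalid.
	src << "proc calls: [called_ds] ds, inline: [inline_ds] ds"
```

Neither loop sleeps, so the world will stall while the verb runs. The gap between the two figures is only an estimate of what the calls themselves cost.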
I've always had a feeling that the act of calling a proc uses up a significant amount of CPU. It shows up most clearly in practice in a smooth-movement game running at 30 FPS with many moving objects.

I think I first saw significant evidence of it being an issue when I used a vector system built on preprocessor macros instead of procs. The macro version is significantly faster because the preprocessor expands each macro body straight into the calling code, so there is no proc call at runtime.
//  preprocs
#define vec2(x, y) list(x, y)
#define vec2_add(a, b) vec2(a[1] + b[1], a[2] + b[2])

// procs
// these can't be used at compile-time, either
proc/vec2(x, y) return list(x, y)
proc/vec2_add(a[], b[]) return list(a[1] + b[1], a[2] + b[2])

With the example above, the macro version of vec2_add() was 2.4x faster than the proc version over 1,000,000 tests.
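
A comparison like this could be run along the following lines. This is a sketch only, not the test that produced the figure above: the names VEC2_ADD_MACRO, vec2_add_proc and CompareVec2Add are hypothetical (the macro and the proc have to be renamed to coexist in one file), and timing is assumed to come from world.timeofday.

```dm
// Hypothetical names so the macro and the proc can live side by side.
#define VEC2_ADD_MACRO(a, b) list(a[1] + b[1], a[2] + b[2])

proc/vec2_add_proc(list/a, list/b)
	return list(a[1] + b[1], a[2] + b[2])

mob/verb/CompareVec2Add()
	var/list/a = list(1, 2)
	var/list/b = list(3, 4)
	var/list/result
	var/tests = 1000000

	// Macro: the list() expression is expanded inline, so no proc call happens.
	var/start = world.timeofday
	for(var/k = 1, k <= tests, k++)
		result = VEC2_ADD_MACRO(a, b)
	var/macro_ds = world.timeofday - start

	// Proc: the same expression, reached through a proc call each iteration.
	start = world.timeofday
	for(var/k = 1, k <= tests, k++)
		result = vec2_add_proc(a, b)
	var/proc_ds = world.timeofday - start

	// Times are in deciseconds; result.len is printed so result is actually used.
	src << "macro: [macro_ds] ds, proc: [proc_ds] ds (result length [result.len])"
```

Both loops allocate a new list on every iteration, so that cost appears in both timings. Any difference should come mostly from the proc call itself.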
Good to know!