In my doubly linked list, when I loop through it, the proc var storing the current item being worked on randomly changes to something else, causing runtimes, yet only one thing ever accesses that var, and it's on the up and up.
Code:
// Run through the queue of subsystems to run, running them while balancing out their allocated tick percentage
/datum/controller/master/proc/RunQueue()
    . = 0 //so the caller can detect runtimes by checking the return
    var/datum/subsystem/queue_node //**RELATED VAR DEFINED HERE**
    var/queue_node_flags
    var/queue_node_priority
    var/queue_node_paused
    var/current_tick_budget
    var/tick_percentage
    var/tick_remaining
    var/ran = TRUE //this is right
    var/ran_non_ticker = FALSE
    var/bg_calc //have we switched current_tick_budget to background mode yet?
    var/tick_usage
    //keep running while we have stuff to run and we haven't gone over a tick
    // this is so subsystems paused earlier can use tick time that later subsystems never used
    while (ran && queue_head && world.tick_usage < TICK_LIMIT_MC)
        ran = FALSE
        bg_calc = FALSE
        current_tick_budget = queue_priority_count
        for (queue_node = queue_head; istype(queue_node); queue_node = queue_node.queue_next) //**RUNTIME HERE**
            if (ran && world.tick_usage > TICK_LIMIT_RUNNING)
                break
            queue_node_flags = queue_node.flags
            queue_node_priority = queue_node.queued_priority
            //super special case, subsystems where we can't make them pause midway through
            //if we can't run them this tick (without going over a tick)
            //we bump up their priority and attempt to run them next tick
            //(unless we haven't even run anything this tick, since it's unlikely they will ever be able to run
            // in those cases, so we just let them run)
            if (queue_node_flags & SS_NO_TICK_CHECK)
                if (queue_node.tick_usage > TICK_LIMIT_RUNNING - world.tick_usage && ran_non_ticker)
                    queue_node.queued_priority += queue_priority_count * 0.10
                    queue_priority_count -= queue_node_priority
                    queue_priority_count += queue_node.queued_priority
                    current_tick_budget -= queue_node_priority
                    continue
            if ((queue_node_flags & SS_BACKGROUND) && !bg_calc)
                current_tick_budget = queue_priority_count_bg
                bg_calc = TRUE
            tick_remaining = TICK_LIMIT_RUNNING - world.tick_usage
            if (current_tick_budget > 0 && queue_node_priority > 0)
                tick_percentage = tick_remaining / (current_tick_budget / queue_node_priority)
            else
                tick_percentage = tick_remaining
            CURRENT_TICKLIMIT = world.tick_usage + tick_percentage
            if (!(queue_node_flags & SS_TICKER))
                ran_non_ticker = TRUE
            ran = TRUE
            tick_usage = world.tick_usage
            queue_node_paused = queue_node.paused
            queue_node.paused = FALSE
            last_type_processed = queue_node
            queue_node.fire(queue_node_paused)
            current_tick_budget -= queue_node_priority
            tick_usage = world.tick_usage - tick_usage
            if (tick_usage < 0)
                tick_usage = 0
            if (queue_node.paused)
                queue_node.paused_ticks++
                queue_node.paused_tick_usage += tick_usage
                continue
            queue_node.ticks = MC_AVERAGE(queue_node.ticks, queue_node.paused_ticks)
            tick_usage += queue_node.paused_tick_usage
            queue_node.tick_usage = MC_AVERAGE_FAST(queue_node.tick_usage, tick_usage)
            queue_node.cost = MC_AVERAGE_FAST(queue_node.cost, TICK_DELTA_TO_MS(tick_usage))
            queue_node.paused_ticks = 0
            queue_node.paused_tick_usage = 0
            if (queue_node_flags & SS_BACKGROUND) //update our running total
                queue_priority_count_bg -= queue_node_priority
            else
                queue_priority_count -= queue_node_priority
            queue_node.last_fire = world.time
            queue_node.times_fired++
            if (queue_node_flags & SS_TICKER)
                queue_node.next_fire = world.time + (world.tick_lag * queue_node.wait)
            else if (queue_node_flags & SS_POST_FIRE_TIMING)
                queue_node.next_fire = world.time + queue_node.wait
            else if (queue_node_flags & SS_KEEP_TIMING)
                queue_node.next_fire += queue_node.wait
            else
                queue_node.next_fire = queue_node.queued_time + queue_node.wait
            queue_node.queued_time = 0
            //remove from queue
            queue_node.dequeue()
    CURRENT_TICKLIMIT = TICK_LIMIT_RUNNING
    . = 1 //normal finish, let the caller know we didn't runtime
[00:51:45]undefined variable /datum/controller/master/var/queue_next
[00:51:45]proc name: RunQueue (/datum/controller/master/proc/RunQueue)
[00:51:45] source file: master.dm,325
[00:51:45] usr: null
[00:51:45] src: Master (/datum/controller/master)
[00:51:45] call stack:
[00:51:45]Master (/datum/controller/master): RunQueue()
[00:51:45]Master (/datum/controller/master): Loop()
[00:51:45]Master (/datum/controller/master): StartProcessing()
Basically, according to the runtime, queue_node is randomly getting set to something of type /datum/controller/master. I know better than to trust that, and am assuming it could also be null or some other bad type. But as you can see, that shouldn't be possible; the istype() is right there in the for() header!
Now, wondering what the hell was going on, I added runtime detection to the thing that calls RunQueue(), so the Master Controller now does a soft reset whenever RunQueue() runtimes, logging any bad data it finds:
//resets the queue, and all subsystems, while filtering out the subsystem lists
// called if any of the MC's queue procs runtime or exit improperly.
/datum/controller/master/proc/SoftReset(list/ticker_SS, list/normal_SS, list/lobby_SS)
    . = 0
    world.log << "MC: SoftReset called, resetting MC queue state."
    if (!istype(subsystems) || !istype(ticker_SS) || !istype(normal_SS) || !istype(lobby_SS))
        world.log << "MC: SoftReset: Bad list contents: '[subsystems]' '[ticker_SS]' '[normal_SS]' '[lobby_SS]' Crashing!"
        return
    var/list/subsystemstocheck = subsystems + ticker_SS + normal_SS + lobby_SS
    for (var/thing in subsystemstocheck)
        var/datum/subsystem/SS = thing
        if (!SS || !istype(SS))
            //list(SS) is so if a list makes it into the subsystem list, we remove the list, not its contents
            subsystems -= list(SS)
            ticker_SS -= list(SS)
            normal_SS -= list(SS)
            lobby_SS -= list(SS)
            world.log << "MC: SoftReset: Found bad entry in subsystem list, '[SS]'"
            continue
        if (SS.queue_next && !istype(SS.queue_next))
            world.log << "MC: SoftReset: Found bad data in subsystem queue, queue_next = '[SS.queue_next]'"
            SS.queue_next = null
        if (SS.queue_prev && !istype(SS.queue_prev))
            world.log << "MC: SoftReset: Found bad data in subsystem queue, queue_prev = '[SS.queue_prev]'"
            SS.queue_prev = null
        SS.queued_priority = 0
        SS.queued_time = 0
        SS.paused = 0
    if (queue_head && !istype(queue_head))
        world.log << "MC: SoftReset: Found bad data in subsystem queue, queue_head = '[queue_head]'"
        queue_head = null
    if (queue_tail && !istype(queue_tail))
        world.log << "MC: SoftReset: Found bad data in subsystem queue, queue_tail = '[queue_tail]'"
        queue_tail = null
    queue_priority_count = 0
    queue_priority_count_bg = 0
    world.log << "MC: SoftReset: Finished."
    . = 1
Result:
runtime error:
[04:37:57]undefined variable /datum/controller/master/var/queue_next
[04:37:57]proc name: RunQueue (/datum/controller/master/proc/RunQueue)
[04:37:57] source file: master.dm,325
[04:37:57] usr: null
[04:37:57] src: Master (/datum/controller/master)
[04:37:57] call stack:
[04:37:57]Master (/datum/controller/master): RunQueue()
[04:37:57]Master (/datum/controller/master): Loop()
[04:37:57]Master (/datum/controller/master): StartProcessing()
MC: SoftReset called, resetting MC queue state.
MC: SoftReset: Finished.
(i.e. no bad linked-list state was detected, since it printed none of the lines that would report it)
So one of two things is true: either there is some magical line editing queue_node that my eyes aren't seeing (and if it changed mid-body, the condition in the middle of the for() would not have run again to catch it before the increment runtimes), or there is some BYOND bug at play.
I have looked and looked and looked; no line outside of the for loop's header assigns to that var.
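(For clarity, the for() header evaluates like the while loop below. Note that the increment runs after the body but before the condition is re-checked, so a change made mid-body blows up at queue_node.queue_next before the istype() ever gets another look at it:)

```dm
// sketch only: the for() above, unrolled to show evaluation order
queue_node = queue_head                    //init: runs once
while (istype(queue_node))                 //condition: checked before each run of the body
    //...loop body...
    queue_node = queue_node.queue_next     //increment: runs after the body, before the next istype()
```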
The reason I ask is that you're calling queue_node.dequeue() at the end of the loop, just before reading queue_node.queue_next to get the next node. While I don't see how this could directly cause your issue, this kind of thing always spooks me, because if anything much happens to the node during the dequeue, the queue_next reference could be wrong.
Whenever I loop through lists like this internally, when I know the list may be altered, I grab the next pointer right at the top of the loop body and use that for the iteration.
Again, I don't see how your specific bug could manifest from not doing that, but for safety I'd make that change anyway, since you are altering the queue. If a problem did crop up from this, I'd expect a null reference to appear rather than a complete type change.
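A minimal sketch of what I mean, using the var names from your RunQueue() (queue_node_next is a hypothetical new local):

```dm
var/datum/subsystem/queue_node_next //hypothetical local holding the saved pointer
for (queue_node = queue_head; istype(queue_node); queue_node = queue_node_next)
    queue_node_next = queue_node.queue_next //grab it before the body can alter the queue
    //...rest of the loop body unchanged...
    queue_node.dequeue() //even if this clears queue_next, the saved copy still points at the next node
```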
The next thing I would try in RunQueue() is to figure out at which point queue_node is changing, either by sticking istype() checks in there, or by using a local var that stores queue_node's type at the beginning of the loop and periodically comparing against it (which would obviously be faster).
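Roughly like this; expected_type is a hypothetical local, and you'd sprinkle the comparison between whichever calls you suspect:

```dm
for (queue_node = queue_head; istype(queue_node); queue_node = queue_node.queue_next)
    var/expected_type = queue_node.type //snapshot at the top of the body
    //...then between suspect calls, e.g. right after queue_node.fire():
    if (!istype(queue_node) || queue_node.type != expected_type)
        world.log << "MC: queue_node changed mid-loop, now '[queue_node]'"
```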
You also mentioned a #define; what does it look like?