ID:1823627
 
(See the best response by Ter13.)

Problem description:

At random times on tgstation13's SS13 server, the game enters a state where the "Out of resources" runtime error triggers (the server has 64 GB of RAM, so running out of physical RAM isn't the issue, but running out of 32-bit addressable memory might be). That is followed by a flood of "bad list" runtimes from anything that accesses our global lists, and finally DreamDaemon (DD) hangs completely.

I'm trying to figure out how to hunt this down and determine whether it's a BYOND bug or an id10t error in our coding department.

Best response
It sounds like you've got a leak going on somewhere. Unfortunately, in a project as large as SS13, hunting it down will be exceptionally difficult. Memory debugging in BYOND is basically nonexistent.

In order to catch this kind of thing, you have to have a really deep understanding of what code does what and what objects reference what other objects.

Memory leaks generally originate from circular references, though. Look for anything that uses a large number of lists and simultaneously has circular references with those lists or owning objects.
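To make "circular references" concrete, here is a minimal sketch of the pattern and of breaking it by hand. It assumes, as the post says, that BYOND will not reclaim objects that are kept alive only by references to each other. The types /datum/holder and /datum/part and the procs AddPart() and Dispose() are hypothetical names for illustration; SS13 has its own cleanup conventions, which this does not claim to reproduce.

```dm
// Hypothetical types, for illustration only: a holder owns a list of parts,
// and every part keeps a back-reference to its holder. That is a cycle:
// holder -> parts list -> part -> holder.
/datum/holder
	var/list/parts = list()

/datum/holder/proc/AddPart(datum/part/P)
	parts += P
	P.owner = src // back-reference that closes the cycle

// Hypothetical cleanup proc: call it when you are finished with the holder.
// Nulling each back-reference and emptying the list breaks the cycle, so
// once nothing else refers to these objects they are no longer kept alive
// by each other.
/datum/holder/proc/Dispose()
	for(var/datum/part/P in parts)
		P.owner = null
	parts.Cut()

/datum/part
	var/datum/holder/owner
```

The design choice here is simply that whichever object "owns" the list is responsible for clearing both directions of the relationship before it is dropped.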

One thing I'd suggest: before the server reaches that point, run a debug verb every so often that dumps every type in use along with a count of each.

```dm
proc/CountItems()
	var/datum/D
	var/list/L
	var/atom/A
	var/list/counts = new
	for(A in world) counts[A.type] = (counts[A.type] || 0) + 1
	for(L in world) counts[/list] = (counts[/list] || 0) + 1
	for(D in world) counts[D.type] = (counts[D.type] || 0) + 1
	// you could add some sorting here
	var/F = "usage.txt" // text2file() appends to this file, creating it if needed
	text2file("[time2text(world.realtime)]\n", F)
	for(var/i in counts)
		text2file("[i]\t[counts[i]]\n", F)
	text2file("\n", F) // blank line between dumps
```

Note that looping "in world" will pick up objects that are kept alive only by circular references, so if, as Ter suggests, there's some kind of pernicious reference chain going on, you should see the counts for the types involved keep increasing from one dump to the next.
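Rather than eyeballing successive dumps, you could diff two snapshots and list only the types that grew. This is a minimal sketch assuming a hypothetical variant of the counting proc that returns its type => count list instead of (or as well as) writing it to usage.txt; ReportGrowth() and its min_growth parameter are illustrative names, not existing BYOND or SS13 procs.

```dm
// Hypothetical helper: given two associative lists of type => count taken
// some time apart, return a new list mapping each type whose count grew by
// at least min_growth to the amount it grew.
/proc/ReportGrowth(list/before, list/after, min_growth = 1)
	var/list/grew = list()
	for(var/T in after)
		// a type missing from the earlier snapshot counts as 0
		var/delta = after[T] - (before[T] || 0)
		if(delta >= min_growth)
			grew[T] = delta
	return grew
```

For example, keep the previous snapshot in a global var and, at each dump, pass it and the new snapshot to ReportGrowth(); types that show up in the result dump after dump are the ones worth tracing by hand.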