We didn't plan to disable logging on the main server because it's core to the game. What we're doing is remaking the system to use SQLite.
And just to put it here, these are the numbers when the CPU is choking:

server mem usage:
Prototypes:
obj: 617076 (3963)
mob: 619764 (168)
proc: 1994152 (4076)
str: 3683927 (54936)
appearance: 21733763 (92091)
id array: 2195168 (11884)
map: 12611920 (700,650,5)
objects:
mobs: 874072 (842)
objs: 21388776 (187362)
datums: 1152416 (5558)
lists: 3232864 (46458)
Your appearance and obj counts are quite damn high. As in, higher than SS13's, which is worth noting. But I'm not sure why that'd make any difference.


Here's a question for you... When profiling DURING the CPU increase, are there any procs that seem to be taking up more CPU than normal? In particular Stat() or any others?
Appearances ain't no thang, but that obj count is p high.

I think odds are the CPU choke is due to the update loop.

Every tick, every atom in every client's screen, and every atom in every client's view range is checked for appearance updates.

I think odds are the fact that you guys are using one object per letter for your HUD text may well be an issue.
Most of the HUD is using maptext now; there's very little using the letter-by-letter stuff anymore, just a few small, under-used things here and there.
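For reference, the maptext stuff is basically this pattern (names here are just illustrative, not the actual HUD code): one screen object carries the whole string, so it's a single appearance instead of one /obj per letter.

obj/hud_text
    screen_loc = "1,1"       // hypothetical HUD position
    maptext_width = 200      // room for the whole string in pixels

client/proc/ShowHudText(txt)
    var/obj/hud_text/T = new
    T.maptext = txt          // one appearance for the entire string
    screen += T              // added to this client's screen
    return T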

We don't have anything showing absurd CPU usage during the spikes, and don't use Stat() at all.
Just for comparison's sake, how many objects are there when the world starts up and has one or two players?
server mem usage:
Prototypes:
obj: 617092 (3963)
mob: 619780 (168)
proc: 1994124 (4076)
str: 904549 (18442)
appearance: 4525149 (18103)
id array: 2110344 (9322)
map: 21667756 (700,650,5)
objects:
mobs: 442480 (548)
objs: 6448144 (49362)
datums: 79680 (515)
lists: 612212 (12234)


This is local; there's no way to test low numbers live, since the server always has people on it.

I'll page you the memory dump we have from running this process on a loop as the game ran for a while; I don't wanna post it here because it has user details listed.
Yeah, you guys have a GC problem. Object count is only going up the longer the world is up, which means you guys have circular references not being cleared.
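For illustration, here's a hypothetical case (type names invented, not from your code) of the kind of circular reference that never gets collected on its own:

datum/unit
    var/datum/unit_inventory/inv

datum/unit_inventory
    var/datum/unit/owner

proc/MakeLeak()
    var/datum/unit/U = new
    U.inv = new /datum/unit_inventory
    U.inv.owner = U        // U and U.inv now reference each other
    // when this proc returns and the local var U goes away, neither
    // datum's reference count reaches zero, so neither is ever
    // garbage-collected; only del() or manually nulling inv/owner frees them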
Yeah, I noticed the object count increasing too, except we're not getting memory usage increases and there are no object procs showing anything out of the normal in the profiler, so I'm not sure if that's the problem or just "a problem".
I can assure you that is indeed the problem. It's the only outlier. You have a memory leak very apparently going on.

The objects' procs aren't the issue, otherwise the CPU usage would show up in the profiler. They don't, which means it has to be an internal issue.

The only thing that makes sense is that runaway object leak resulting in excess internal (non-profiled) function overhead.
Wouldn't a leak result in increasing memory usage and not just increasing CPU usage?
The obj count displayed in that output is slightly incorrect, in that the obj count is actually the size of the object pointer array. But it can only get that high if it needs to, which means at some point you had in excess of 180K actual objs in the world. A circular reference would be a huge possible cause. Notice though that the memory is only showing a difference of 15M for objs; for a big server that's not necessarily a perceptible difference, so it doesn't look like a memory leak even though it is. The fact that obj memory is roughly proportional indicates that the obj creation wasn't just a mass one-off event, either, but that they still exist.

With the reliance on del(), every time something is deleted, each and every one of these objs' var lists is searched until all the outstanding refs are found. There's the CPU gremlin.

What I recommend is a diagnostic loop, something to count the used refs:

var/list/types = new
for(var/obj/O)
    types[O.type] = (types[O.type]||0) + 1
world.log << "Obj counts by type:"
for(var/t in types)
    world.log << "[t]: [types[t]]"

If a specific type has a circular ref, it should (either by itself or counting children) show up like a neon sign in the output. If all objs are doing so, or /obj itself does so, that's a sign that something very unseemly is being done with default vars or in obj/New().
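For example (purely hypothetical, not pulled from your code), an obj/New() that registers every object in a global list and never removes it would make every single type climb:

var/list/all_objs = list()

obj/New()
    ..()
    all_objs += src        // every obj ever created stays referenced here

obj/Del()
    all_objs -= src        // forgetting this (or never deleting) leaks them all
    ..()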

Also of note is the ginormous (though smaller) number of lists.
I added the loop to the next update of the code and will check the output when I have output to check.

As for the number of lists, that's really surprising; there is a lot of stuff using them, but I didn't expect it to be that big.
In response to Ter13
Ter13 wrote:
The reason that del is so high is because forcibly deleting objects has to look through every variable of every object in the world and null out references to the deleted ref. The more datums and variables in your world, the longer that search will take.

There are almost no cases where using the del keyword is correct.

If that's what del does, and it's so terribly inefficient, then why don't they keep track of all the references made to the ref and store that data with it, so that lookup can be used to delete the refs quickly? After all, it's bad practice to reference objects directly when adding things to variables anyway (if avoidable), instead adding other info which will allow you to grab a reference to the object (or needed data) later if needed.
In response to Superbike32
Superbike32 wrote:
Ter13 wrote:
The reason that del is so high is because forcibly deleting objects has to look through every variable of every object in the world and null out references to the deleted ref. The more datums and variables in your world, the longer that search will take.

There are almost no cases where using the del keyword is correct.

If that's what del does, and it's so terribly inefficient, then why don't they keep track of all the references made to the ref and store that data with it, so that lookup can be used to delete the refs quickly? After all, it's bad practice to reference objects directly when adding things to variables anyway (if avoidable), instead adding other info which will allow you to grab a reference to the object (or needed data) later if needed.

Lummox believes that usage counters are a memory-hungry approach. I agree that they are a memory-hungry approach, but the CPU benefits are hard to argue with.

Basically, BYOND is "never" going to go x64, according to Tom/Lummox, and as such, we are always working under a low ceiling when it comes to memory usage.
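In the meantime, the usual workaround is to avoid del() wherever possible: clear the references you know about and let the object fall out of scope, so the GC can reclaim it without a world-wide search. A rough sketch, with purely illustrative names:

obj/item
    var/mob/owner          // example back-reference that would keep src alive

    proc/Dispose()
        owner = null       // break the known back-reference
        loc = null         // pull it off the map / out of any container
        // once the caller also drops its own variable pointing at src,
        // nothing references the obj and it is collected with no search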
In response to Nadrew
Nadrew wrote:
I added the loop to the next update of the code and will check the output when I have output to check.

As for the number of lists, that's really surprising; there is a lot of stuff using them, but I didn't expect it to be that big.

Just to keep the details up to date: Nadrew applied fixes to recycling (which did give us a nice performance boost), but they didn't resolve the CPU rising over time. He's now working on the logging issues that Lummox addressed.
So what would cause this? The creation and deletion of objects over time?

I have a server experiencing this heavily, to the point that it cannot stay up consistently for 2 hours without a reboot. I'm guessing this isn't a new problem; it has just become exaggerated (like every other BYOND issue) as more updates roll in.

I got a hefty donation waiting for you, Lummox, if you can figure this out -- if nothing else.
FKI, it would help if you could get some snapshots of the memory stats from DD over time in your game. That will show if a particular kind of object is growing in number more than perhaps you think it is.

Typically, the main cause of CPU usage going up over time is going to be things like infinite spawn loops that don't let their src objects die. Examples:

obj/proc/WorkLoop()
    spawn()
        while(src)
            ...
            sleep(delay)

obj/proc/WorkLoop2()
    ...
    spawn(delay) WorkLoop2()

Where in your code do you have loops like that? That's the first thing you should check.

Basically CPU usage going up steadily is almost always going to come down to something in the game code.
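If you do have loops like that, one way out (sketched with illustrative names, not anyone's actual code) is a single global controller loop instead of one spawned loop per object, so no spawn() is left holding a per-object reference:

var/list/active_workers = list()

proc/WorkController()          // start this once, e.g. from world/New()
    set waitfor = 0
    while(1)
        for(var/obj/worker/O in active_workers)
            O.DoWork()
        sleep(10)

obj/worker
    proc/DoWork()
        // per-tick work; remove src from active_workers when finished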
In response to Lummox JR
I've extensively checked the code for possible mistakes, but there are none. I had all objects being immediately deleted at one point and the problem persisted. Now del() is used relatively sparingly, and still no success.

The profiler also shows no anomalies. This issue seems to become exaggerated the more players are active in the world. I'll reboot the game and have 60 instantly reconnect, and world.cpu lingers around 50. As the player count rises above 90 into 100, so does CPU, and eventually it never stops climbing even if the player count does.

Some folks I've talked to who've experienced the same thing said the problem was only tamed by dialing back the BYOND version. I don't doubt this at all, since I've run the same game in the past, multiple times, with 50-70% more player activity, without this issue. I'm in the process of going back far enough to see how that works.

Some memory stats I grabbed yesterday:

At 40 players:
Prototypes:
obj: 411404 (2786)
mob: 413356 (122)
proc: 2187980 (4344)
str: 925522 (26771)
appearance: 5016130 (18938)
id array: 2477092 (12406)
map: 78154504 (100,100,57)
objects:
mobs: 528768 (455)
objs: 6503360 (30388)
datums: 817524 (14676)
images: 377384 (4360)
lists: 639480 (10198)


At 64 players:
Prototypes:
obj: 411404 (2786)
mob: 413356 (122)
proc: 2187980 (4344)
str: 937020 (26771)
appearance: 8262228 (33913)
id array: 2488452 (13182)
map: 78163612 (100,100,57)
objects:
mobs: 757376 (701)
objs: 11834368 (43725)
datums: 1278980 (14676)
images: 500156 (6276)
lists: 893936 (14676)
I really don't see how rolling back versions would help--especially without an indication as to which version.