ID:2474725
 
BYOND Version:512
Operating System:Windows Server, Linux, Ubuntu
Web Browser:Chrome 74.0.3729.169
Applies to:Dream Daemon
Status: Open

Issue hasn't been assigned a status value.
Descriptive Problem Summary:
I have two problem signatures for Dream Daemon. This problem has been plaguing the newly released Project Crimson Roleplay for two weeks now. We've gone to extensive lengths to fix it on our side: changing the way our game operates, tons of bug fixes and runtime fixes, and running DreamDaemon in verbose mode to find suppressed runtimes to handle. We've changed servers, changed operating systems, and done virtually everything we can to solve this problem of crippling 'lag.'

So to be clear with what we're dealing with, after a couple hours, or even a few hours of running, DreamDaemon begins to take up a massive amount of resources. It maxes out an entire core and runs itself into the dirt-- locking up and eventually crashing. We've checked to ensure nothing is spawning and going out of control, we've checked for infinite loops, we've optimized constantly, and nothing seems to solve it. The game makes extensive use of animations, and we suspect that might have something to do with it, but we're almost entirely certain it's a red herring.

Our debugging efforts have tracked an in-game CPU that rises to 500+, 1500+, eventually 8000+ before DreamDaemon locks up and comes to a complete freeze, then crashes. This happens at random seemingly, without any kind of discernable trigger, at different populations of players, and regardless of the host machine's capacity (We suspected we didn't have enough RAM at first and upgraded servers and RAM and it doesn't matter how much is available, DD kills itself).


Problem signature:
Problem Event Name: APPCRASH
Application Name: dreamdaemon.exe
Application Version: 5.0.512.1470
Application Timestamp: 5ceae321
Fault Module Name: byondcore.dll
Fault Module Version: 5.0.512.1470
Fault Module Timestamp: 5ceae2aa
Exception Code: c0000005
Exception Offset: 000f69d2
OS Version: 6.3.9600.2.0.0.272.7
Locale ID: 1033
Additional Information 1: 54e2
Additional Information 2: 54e23fb558bc27b5426ec592345ce16d
Additional Information 3: aa59
Additional Information 4: aa592dcc388ea38f2a413041580d6077

Read our privacy statement online:
http://go.microsoft.com/fwlink/?linkid=280262

If the online privacy statement is not available, please read our privacy statement offline:
C:\Windows\system32\en-US\erofflps.txt



===========================================================

Problem signature:
Problem Event Name: APPCRASH
Application Name: dreamdaemon.exe
Application Version: 5.0.512.1470
Application Timestamp: 5ceae321
Fault Module Name: byondcore.dll
Fault Module Version: 5.0.512.1470
Fault Module Timestamp: 5ceae2aa
Exception Code: c0000005
Exception Offset: 000f69d2
OS Version: 6.3.9600.2.0.0.272.7
Locale ID: 1033
Additional Information 1: 54e2
Additional Information 2: 54e23fb558bc27b5426ec592345ce16d
Additional Information 3: aa59
Additional Information 4: aa592dcc388ea38f2a413041580d6077

Read our privacy statement online:
http://go.microsoft.com/fwlink/?linkid=280262

If the online privacy statement is not available, please read our privacy statement offline:
C:\Windows\system32\en-US\erofflps.txt


=======================================================

Also have this:

runtime error: Cannot execute null.().
runtime error: Cannot execute null.().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.3991033912().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.3991033912().
runtime error: Cannot execute null.3991033912().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.3991033912().#
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().
runtime error: Cannot execute null.71.223.38.137().

Numbered Steps to Reproduce Problem:

1. Host server.
2. 50+ players online (2-hour runtime).
3. Let DreamDaemon explode on its own.
Code Snippet (if applicable) to Reproduce Problem:
N/A


Expected Results:

Actual Results:

Does the problem occur:
Every time? Or how often?
Every last time.
In other games?
Not that I'm aware of. I think Baystation12 encountered something similar and posted about it, but their post wasn't helpful for us.
In other user accounts?
N/A
On other computers?
On other host machines, yes.

When does the problem NOT occur?
Always occurs

Did the problem NOT occur in any earlier versions? If so, what was the last version that worked? (Visit http://www.byond.com/download/build to download old versions for testing.)
We started with 5.10 I believe, and updated BYOND to no avail.

Workarounds:
Nothing.
It sounds like you're running out of memory, and that's causing a snowball effect that leads to the crash. Keep in mind, Dream Daemon can only use a little under 2 GB of RAM; it is not a 64-bit application. Therefore you probably never needed to upgrade the machine unless your RAM before that was well below what you needed.

The big question here of course is, what's using all that memory? Some memory reports from DD would be helpful. In Windows there's a menu command for this; in Linux it's a signal you send to the server. With that info, it would be possible to start discovering what kinds of objects are using the most memory and, ultimately, why. The best thing to do is to collect these reports over time, so you can track if the info is remaining fairly stable or if something is tracking upwards at an increasing rate--the latter being a sign that maybe your game has some references it should be, but isn't, throwing away. (And a BYOND bug can't be ruled out either, like if some proc weren't decrementing reference counts that it's supposed to.)
How would I extract memory reports from DreamDaemon exactly? I've been keeping track of memory usage on my servers and that's never been a problem-- not on its own.

We've had a Linux server with 3.6 GB of RAM and even that maxed out after a while. On our Windows server, we have less, but our system's own memory and processing power is never maxed out-- and only up to about 35% of it is ever used.

But if you tell me the way to get the memory reports on DreamDaemon itself, I can probably work at the problem or at the very least get you the info you need to determine if it's a DD limitation.
On Linux you'd run

kill -SIGUSR2 [pid]


Where 'pid' is the process ID.

On Windows you'd press ctrl+M in DreamDaemon, or select "Memory Stats" from the "World" menu.
Thank you Nadrew.
Should that be giving a pop up or something? I'm not getting any response from the command
(Using windows, using CTRL+M, and clicking it from dropdown).

Maybe I just don't know where to see the results?
If you have world.log set, the results will go into that file.
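
For anyone following along, a minimal sketch of one way to set world.log from DM so the report has a file to land in. The file name is a placeholder, not something taken from this thread, and the project may already set the log elsewhere.

```dm
// Minimal sketch: send world.log output to a file when the world starts.
// "server_log.txt" is a placeholder name (an assumption for illustration).
/world/New()
	..()
	world.log = file("server_log.txt")
```

With something like this in place, each Ctrl+M (or SIGUSR2 on Linux) should add one "Server mem usage" block to that file, which makes it easier to compare reports over time.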
So I've collected some memory reports.

30 minutes in

Server mem usage:
prototypes:
obj: 678 KB (3,178)
mob: 12 KB (771)
proc: 3.82 MB (3,535)
str: 758 KB (18,299)
appearance: 538 KB (16,731)
filter: 256 B (0)
id array: 7.29 MB (12,499)
map: 30.1 MB (1000,1000,4)
objects:
mobs: 2.7 MB (2,862)
objs: 22 MB (138,724)
datums: 1.2 MB (9,973)
images: 1.36 MB (19,707)
lists: 5.17 MB (86,034)

1 hour in:
Server mem usage:
prototypes:
obj: 678 KB (3,178)
mob: 12 KB (771)
proc: 3.82 MB (3,535)
str: 803 KB (19,443)
appearance: 770 KB (24,022)
filter: 256 B (0)
id array: 7.3 MB (12,499)
map: 30.1 MB (1000,1000,4)
objects:
mobs: 3.13 MB (3,138)
objs: 23 MB (139,855)
datums: 3.07 MB (11,869)
images: 1.9 MB (27,334)
lists: 5.58 MB (92,840)


2 hours in

Server mem usage:
prototypes:
obj: 678 KB (3,178)
mob: 12 KB (771)
proc: 3.82 MB (3,535)
str: 806 KB (19,443)
appearance: 719 KB (22,746)
filter: 256 B (0)
id array: 7.3 MB (12,499)
map: 30.1 MB (1000,1000,4)
objects:
mobs: 2.86 MB (2,921)
objs: 22.8 MB (139,634)
datums: 2.21 MB (10,942)
images: 1.95 MB (28,038)
lists: 5.41 MB (88,558)


3 hours in

Server mem usage:
prototypes:
obj: 678 KB (3,178)
mob: 12 KB (771)
proc: 3.82 MB (3,535)
str: 854 KB (20,659)
appearance: 853 KB (27,081)
filter: 256 B (0)
id array: 7.3 MB (12,499)
map: 30.1 MB (1000,1000,4)
objects:
mobs: 3.76 MB (3,716)
objs: 23.9 MB (142,154)
datums: 3.73 MB (14,126)
images: 2.29 MB (33,056)
lists: 6.4 MB (107,534)


4 hours in

prototypes:
obj: 678 KB (3,178)
mob: 12 KB (771)
proc: 3.82 MB (3,535)
str: 871 KB (20,659)
appearance: 929 KB (29,260)
filter: 256 B (0)
id array: 7.3 MB (12,499)
map: 30.1 MB (1000,1000,4)
objects:
mobs: 4.12 MB (4,032)
objs: 24.1 MB (143,075)
datums: 4.36 MB (15,695)
images: 2.2 MB (30,837)
lists: 6.94 MB (115,461)


4 hours, 25 minutes in
It's worth noting that this was also the time when we felt the game freeze up. DD stopped responding and our CPU usage skyrocketed. Then, DD killed itself and the game dropped with this as the last report-- and the game not being able to move forward between this report and the final kill.
Server mem usage:
prototypes:
obj: 678 KB (3,178)
mob: 12 KB (771)
proc: 3.82 MB (3,535)
str: 864 KB (20,659)
appearance: 909 KB (28,522)
filter: 256 B (0)
id array: 7.3 MB (12,499)
map: 30.1 MB (1000,1000,4)
objects:
mobs: 4.01 MB (3,947)
objs: 23.9 MB (142,807)
datums: 4.27 MB (15,387)
images: 2.06 MB (28,476)
lists: 6.77 MB (112,923)
Memory usage looks surprisingly low here, but what is the system saying the memory for the process is?

A number of things don't get counted in the memory report, and the proc queue is one of those things. The crash you reported in the first post is happening in the proc duplication routine, after proc data is duplicated. However, in that crash world.log should have included an "Out of memory" line to signal that.

The situation happening here suggests something has corrupted the heap, but that's going to be very difficult to suss out. Memory exhaustion would do that, which I think could potentially happen if you overloaded something like the proc queue that doesn't show in these reports.

The fact that this happens only in your game tells me that either memory exhaustion really is the culprit, or there's some functionality that isn't used by many other games but is used by yours, that is causing the issue.
So I've got a couple screenshots.

Our server just ground to a halt and I stopped the server in an effort to ensure all the save files can be written-- but even stopping DD from the stop button is taking ages and DD is frozen. It's probably about to explode. This is our in-game measurement



This is our memory usage on the Windows server (our Linux server has more memory, over double, but encountered the same issues. We swapped back for ease of use and because we suspected a problem with Linux)



And this is our memory usage.





And this is our memory stats
prototypes:
obj: 678 KB (3,178)
mob: 12 KB (771)
proc: 3.82 MB (3,535)
str: 799 KB (19,443)
appearance: 659 KB (20,562)
filter: 256 B (0)
id array: 7.3 MB (12,499)
map: 30.1 MB (1000,1000,4)
objects:
mobs: 3 MB (3,037)
objs: 22.5 MB (139,595)
datums: 2.23 MB (11,471)
images: 1.82 MB (25,646)
lists: 5.48 MB (90,146)


Pretty consistent. So my only question at this point-- I guess, is how can we stop from overloading it? The only thing that comes to mind that other games might not be using is extensive use of animation procs, and other image transformation features from BYOND's most recent installments.
You're definitely running at around the maximum for the server. (Side question: Are you seriously running a newer flavor of Windows on a machine with only 3 GB of RAM? If so, you need to upgrade that machine pronto. 3 GB was sort of adequate for me when I ran XP, but I don't see 7+ or 10 playing nicely on that little memory.)

This being the case, and because your crash happened in a spawn() or sleep() call, I think the first place to look is at your sleep/spawn calls: especially spawn(). I noticed you have a rather large number of objs; are they spawning anything?

I also really want to see what the memory usage looks like over time. It looks as if there's a significant lack of info in that graph, like perhaps you started Task Manager after the problem occurred instead of before. I think you might well see memory tracking upward slowly, and that's a sign of something leaking or your proc queue getting stuffed.
Yeah, it's not our best server-- but the better servers haven't proven to be much better functionally. This isn't our permanent solution, just the easiest when it comes to managing and accessing what we need to see in order to try and fix our issue. We've encountered the same problems on our 32 GB Linux server.


This is after a fresh reboot:



1 hour in:


1 hour 48 minutes in:


This last one was a crash. Seems our memory is definitely scaling up, but somewhere between an hour and almost 2 hours, it either ramped up really quick, or it took on a steady increase over the second hour. I've gotta take a look at the causes, can't tell if it's from player count, activity, or some unruly proc.

It's looking like DD's limit is 500+ MB.

I did a few more observations:

Run 1
61 minutes

120 minutes


Run 2
105 minutes

120 minutes




To answer your questions
We make use of spawn frequently. Our objects are mostly dense tiles on our map, not objects with spawns attached. But I'll be looking into what objects may have spawn() and sleep() attached to them this week. There are 639 spawn calls in our project, so I'll have to look into it after work.
DD's limit is not 500+ MB; it's a little shy of 2 GB. If you're seeing a sudden increase leading to a crash but it doesn't come anywhere near the 2 GB limit, that's a sign of heap corruption.

By any chance did you get an event log with crash data from that newer crash? With heap corruption there's often nothing useful to work with but it can't hurt to see anyway.

Based on the new information my current hypothesis is that something fairly unique to your game is causing heap corruption. (By that I mean your game may be using a feature not often used by other games, or using it in a different way from most, triggering a bug in the engine that normally stays dormant.) Can you think of any features your game uses that others don't use as often, especially newer ones? Regular expressions, visual contents, etc. A full rundown would be very useful.
We use a lot of animations simultaneously, and those are timed with a bunch of spawn() too.
We use a lot of .mp3s as .midis. Figured that would be worth mentioning.
I don't think we do too much, but we definitely built the game on it.
Sorry to double post:
http://www.byond.com/forum/post/2476501
Could this be at all related?
That issue is unlikely to be related, as 1) it's client-side, and 2) if a similar problem were happening in your code on the server end, I would expect to see evidence in the crash log pointing to the server animation routines instead of generic heap corruption. However, #2 is not necessarily a given.

You didn't answer me yet, though, as to whether you got crash details from your most recent crash.

I would also like to see some of your code timing animations with spawn(), to see what it's doing and figure out if that has any bearing on the problem. (It's generally not good form regardless, as it's better if you can let animations handle their own timing.) You could always send me the code and a list of where to find the various animations that use spawns/sleeps.
Oh. No, didn't get any crash details. I'll remove my hourly reboot and let the server run its course so I can try to secure those details.

Here's an example from the code:

/datum/move/cold_stream
	name = "cold stream"
	// desc = "The target is attacked with a sharp chop. Critical hits land more easily."
	attribute = "ice"
	category = "special"
	modpower = 110
	hit = 100
	uses = 5

	// index_number = 2

	use(var/mob/monster/m)
		m.cold_stream(src)

/mob/monster/proc/cold_stream(var/datum/move/move)

	if(current_hp < 1)
		return

	if(world.time < last_animation)
		return

	if(world.time < last_attack + 30)
		return

	spawn()

		if(!move.check_pp()) return
		last_attack = world.time

		last_animation = world.time + 25

		for(var/mob/m1 in view())
			m1 << output("<font color=[COMBAT_COLOR]>[get_name(src)] used cold stream!","chat2.oocoutput")

		for(var/mob/monster/r in get_step(src, dir))

			animate(r, color = list(null, null, null, null, rgb(50, 50, 150)), 3)
			animate(alpha = 255, 25)
			animate(color = null, 3)

			view() << sound('59. cold stream.wav')

			for(var/n = 0, n < 46, n++)
				var/obj/f = new(r.loc)
				f.layer = 6
				f.pixel_x = rand(-96, 96) - 150
				f.pixel_y = rand(-96, 96) + 16
				f.alpha = 255

				if(!saved_icons["ice_beam"])
					var/icon/i = icon('58. Ice Beam.dmi')
					i.Scale(16, 16)
					i.Scale(32, 32)
					saved_icons["ice_beam"] = i
				f.icon = saved_icons["ice_beam"]
				f.icon_state = "3"
				f.alpha = 0

				var/xs = f.pixel_x
				var/ys = rand(-96, 96)
				animate(f, pixel_x = xs + 75, pixel_y = ys / 4, alpha = 255, 2.5)
				animate(pixel_x = xs + 225, pixel_y = (ys / 4) * 3, alpha = 255, 5)
				animate(pixel_x = xs + 300, pixel_y = ys, alpha = 0, 2.5)

				spawn(10) del(f)

				sleep(0.25)

			sleep(3)

			for(var/n = 0, n < 5, n++)
				var/obj/f = new(r.loc)
				f.layer = 6
				f.pixel_x = rand(-32, 32)
				f.pixel_y = rand(-32, 32) + 16
				f.alpha = 255

				if(!saved_icons["ice_beam"])
					var/icon/i = icon('58. Ice Beam.dmi')
					i.Scale(16, 16)
					i.Scale(32, 32)
					saved_icons["ice_beam"] = i
				f.icon = saved_icons["ice_beam"]
				f.icon_state = "3"

				var/matrix/m = matrix()
				m.Scale(1.5, 1.5)

				animate(f, pixel_x = rand(-16, 16), pixel_y = rand(-16, 16) + 16, transform = m, 10)
				animate(alpha = 0, 0.5)
				animate(alpha = 255, 0.5)
				animate(alpha = 0, 0.5)
				animate(alpha = 255, 0.5)

				spawn(15) del(f)

				view() << sound('58. Ice Freeze.wav')

				sleep(1)

			sleep(16)

			r.take_damage(move, src)

			view() << sound('Damage.wav')
One thing I can tell you for sure is that del() isn't the best way to go about this. You really have no reason not to move these objects to null for a soft delete by the garbage collector. Even better, set their alpha to 0 with a time=0 animation step at the end, and move them to null at a slightly later time with a longer sleep/spawn.
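
As a rough illustration of that suggestion, here is a minimal sketch of a short-lived effect object that is hidden by a final zero-length animation step and then moved to null instead of being passed to del(). The proc name spawn_ice_particle, the pixel offsets, and the 15-tick delay are assumptions made for the example, not the game's actual code.

```dm
// Hypothetical helper (name and values are illustrative only).
/proc/spawn_ice_particle(turf/where)
	var/obj/f = new(where)
	f.layer = 6
	f.alpha = 0

	// Fade in, drift, fade out: 10 ticks in total.
	animate(f, pixel_x = 75, alpha = 255, time = 2.5)
	animate(pixel_x = 225, time = 5)
	animate(pixel_x = 300, alpha = 0, time = 2.5)
	// Final zero-length step so the object stays invisible once the
	// sequence has finished.
	animate(alpha = 0, time = 0)

	// Release the object a little after the animation ends.
	spawn(15)
		// Moving it off the map drops the map's reference. Once this
		// spawned block finishes and nothing else refers to f, the
		// garbage collector can reclaim it without a hard del().
		f.loc = null
```

This only helps if nothing else keeps a reference to the object (a list, a var on another datum, and so on); any lingering reference would keep it alive.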

Problem signature:
Problem Event Name: APPCRASH
Application Name: dreamdaemon.exe
Application Version: 5.0.512.1470
Application Timestamp: 5ceae321
Fault Module Name: byondcore.dll
Fault Module Version: 5.0.512.1470
Fault Module Timestamp: 5ceae2aa
Exception Code: c0000005
Exception Offset: 0018f8d0
OS Version: 6.3.9600.2.0.0.272.7
Locale ID: 1033
Additional Information 1: 5861
Additional Information 2: 5861822e1919d7c014bbb064c64908b2
Additional Information 3: 3c39
Additional Information 4: 3c390ead1f6678e1205760b69cf3f2fc

Read our privacy statement online:
http://go.microsoft.com/fwlink/?linkid=280262

If the online privacy statement is not available, please read our privacy statement offline:
C:\Windows\system32\en-US\erofflps.txt


Crash report.
This crash is in a different spot but also indicates that an allocation failed. Do you have any data on how much memory the process was using (according to Task Manager) at the time?
Not for that one, but I'll gather more with all of that relevant information.