ID:2585748
 
BYOND Version:513
Operating System:Linux
Web Browser:Firefox 68.0
Applies to:Dream Daemon
Status: Open (issue hasn't been assigned a status value)
Descriptive Problem Summary:
Dream Daemon 513.1526 running on Linux (Debian, kernel 4.19.0-6-amd64) suffers a complete server lock when under player load. When it occurs, Dream Daemon's CPU usage goes to 100% and the server effectively stops responding. Any attempt to use a verb is ignored (or responds extremely slowly). Sending the "world.restart" command causes the server to undergo some kind of soft restart, but CPU stays at 100% and the world never initializes. Ambiance continues playing. The BYOND profiler stops logging at its previous rate, updating only sporadically and with far less data than it should. The server appears to be running extremely slowly.


Numbered Steps to Reproduce Problem:
This is a little difficult to reproduce, because we can only trigger it under player load. I'm not sure whether it's something one of the players is doing (it doesn't always crash at the same point), a race condition, or something else.

Because it's a result of Dream Daemon locking up, we've taken a core dump of the process and uploaded it here: https://gofile.io/d/eKz38L

If you need another dump, we can reproduce it fairly consistently.
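
For reference, the dump was captured with gcore, roughly along these lines (the PID below is just an example):

pidof DreamDaemon
gcore -o dreamdaemon-hang 3393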

Expected Results:
Dream Daemon doesn't lock up or go to 100% CPU, and at a minimum the .debug status and world.restart commands respond as expected.

Actual Results:
Dream Daemon hard-locks and server commands stop responding.

Does the problem occur:
Every time? Or how often?
Consistently under high player load
In other games?
Using the same / similar codebase
In other user accounts?
Server issue
On other computers?
Server issue, yes

When does the problem NOT occur?
Low player counts

Did the problem NOT occur in any earlier versions? If so, what was the last version that worked? (Visit http://www.byond.com/download/build to download old versions for testing.)
Don't know

Workarounds:
None known yet.

I need something to work with to investigate this. Any chance you can break in with gdb and get a stack trace?
I think I need the symbols to get a stack trace, don't I? The stack traces should be within the gcore capture. I can try attaching to the server with gdb, though most of my experience with this kind of work is in WinDbg.

If the stacks aren't in the core dump, I can attach to the server when it hangs and use the backtrace command on all of the threads, though.
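
Roughly what I plan to run once it hangs again (the PID is just an example):

gdb -p 3393
(gdb) thread apply all bt
(gdb) detach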
There appears to be only one thread - is this correct?
The stack trace is:
Thread 1 (LWP 3393):
#0 0xf7be9357 in ?? ()
#1 0xf7bdfe72 in ?? ()
#2 0xf7bdfe72 in ?? ()
#3 0xf7bdfe72 in ?? ()
#4 0xf7bdfe72 in ?? ()
#5 0xf7c06ed5 in ?? ()
#6 0xf7c07fd0 in ?? ()
#7 0xf7bb5e96 in ?? ()
#8 0xf7cfa69a in ?? ()
#9 0xf7cb5c52 in ?? ()
#10 0x0804afcc in ?? ()
#11 0xf72b3b41 in ?? ()

That's in case you want to manually translate the pointers into addresses.

I'll need to know what the offset addresses are into libbyond.so. With the offsets, I can try tracing things back.
I'm sorry, I'm having trouble getting the offset addresses from the core (using info sharedlibrary - it reports 'No shared libraries loaded at this time.').
I'll need to wait until it crashes again, I think.
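
For next time, I'll also try pointing gdb at the DreamDaemon binary together with the core, which I understand should let it map the shared libraries (the paths below are my guess):

gdb /usr/local/bin/DreamDaemon core.3393
(gdb) info sharedlibrary
(gdb) bt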
Well, that didn't take long.

From        To          Syms Read  Shared Object Library
0xf79b5200  0xf7df3848  Yes (*)    target:/usr/local/lib/libbyond.so
0xf76bf460  0xf7734c48  Yes (*)    target:/usr/local/lib/libext.so
0xf769c3d0  0xf769fcb4  Yes (*)    target:/lib32/librt.so.1
0xf75894e0  0xf763f945  Yes (*)    target:/usr/lib32/libstdc++.so.6
0xf7456170  0xf74de29f  Yes (*)    target:/lib32/libm.so.6
0xf74302d0  0xf74459c5  Yes (*)    target:/usr/lib32/libgcc_s.so.1
0xf74125e0  0xf7421e9f  Yes        target:/lib32/libpthread.so.0
0xf72480e0  0xf7394776  Yes (*)    target:/lib32/libc.so.6
0xf722a130  0xf722b1c4  Yes (*)    target:/lib32/libdl.so.2
0xf7ee6090  0xf7f0150b  Yes (*)    target:/lib/ld-linux.so.2
0xf6a1d300  0xf6a23cc4  Yes (*)    target:/lib32/libnss_files.so.2
0xf6a141c0  0xf6a171f4  Yes (*)    target:/lib32/libnss_dns.so.2
0xf69fc3a0  0xf6a08014  Yes (*)    target:/lib32/libresolv.so.2
                        No         ./libbyond-extools.so


#0 0xf7c6178a in ?? () from target:/usr/local/lib/libbyond.so
#1 0xf7b8453e in ?? () from target:/usr/local/lib/libbyond.so
#2 0xf7b75e72 in ?? () from target:/usr/local/lib/libbyond.so
#3 0xf7b75e72 in ?? () from target:/usr/local/lib/libbyond.so
#4 0xf7b75e72 in ?? () from target:/usr/local/lib/libbyond.so
#5 0xf7b75e72 in ?? () from target:/usr/local/lib/libbyond.so
#6 0xf7b9ced5 in ?? () from target:/usr/local/lib/libbyond.so
#7 0xf7b9dfd0 in ?? () from target:/usr/local/lib/libbyond.so
#8 0xf7b4be96 in ?? () from target:/usr/local/lib/libbyond.so
#9 0xf7c9069a in TimeLib::SystemAlarm() () from target:/usr/local/lib/libbyond.so
#10 0xf7c4bc52 in SocketLib::WaitForSocketIO(long, unsigned char) () from target:/usr/local/lib/libbyond.so
#11 0x0804afcc in ?? ()
#12 0xf7249b41 in __libc_start_main () from target:/lib32/libc.so.6
#13 0x0804a801 in ?? ()
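
If the raw mapping base for libbyond.so would be more useful than what gdb reports, I can also pull it from procfs the next time it hangs (PID is an example):

grep libbyond /proc/3393/maps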
I'm not able to resolve those addresses to the actual offsets in the file. Any way you can find that out?
I think at that point I need copies built with debugging info, don't I? I don't think I built the BYOND binaries running on Linux myself.
No, you don't need debugging info. I just need to know the offset into libbyond.so where each frame in the stack trace is happening. Unfortunately the info above doesn't give me that; if I subtract 0xf79b5200 from the addresses in the stack trace, the addresses don't line up at all.

Alternatively, if you could get gdb to spit out what instruction bytes are around each frame, that might help me figure out what's wrong in the calculation.
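
Something along these lines should do it, using frame #9's address above as an example:

(gdb) x/32xb 0xf7c9069a - 16
(gdb) x/8i 0xf7c9069a - 16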

According to some info I found online, this command in gdb might help:

set print symbol-filename on
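
That is, set it and then re-run the backtrace, and the frames should pick up whatever file/offset annotations gdb can recover:

set print symbol-filename on
bt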
Hello,

I'm sorry, the issue hasn't occurred again yet. According to one of our other devs, it may have been an infinite loop in our code.

If the issue happens again, I'll try running that command.

Thank you for looking at it so far!