We'll try to get a fix out this week.
499.1202 has been released. The primary change here is a fix to the issues found in the valgrind log. Users who have been having issues should retest with the new version.

The issues all stem from a data structure being written/read after deletion, if the connection was closed unexpectedly. The main place this showed up was in map routines, where there was some (but very little) checking for this, which I've now addressed. The other place was in hub communication.

I suspect there could be other lingering cases like this that I may have missed (most likely in completely different areas though), to say nothing of completely unrelated causes. My main concern was addressing the known cases of heap corruption.
We'll test it out shortly and keep you updated. Our longest uptime without a freeze or crash on 1193 is 34 hours (the average is 7 hours), so we're aiming for 48 hours without a corruption-related crash or freeze to confirm whether it's working.

(We're using 1193 because it's the version that gave us the longest run; 1197 and 1201 tend to crash within minutes.)
Froze on 499.1202.

(gdb) attach 19361
Attaching to process 19361
Reading symbols from /usr/local/byond/bin/DreamDaemon...(no debugging symbols found)...done.
Reading symbols from /usr/local/lib/libbyond.so...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/libbyond.so
Reading symbols from /usr/local/lib/libext.so...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/libext.so
Reading symbols from /usr/lib/i386-linux-gnu/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/i386-linux-gnu/libstdc++.so.6
Reading symbols from /lib/i386-linux-gnu/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/i386-linux-gnu/libm.so.6
Reading symbols from /lib/i386-linux-gnu/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/i386-linux-gnu/libgcc_s.so.1
Reading symbols from /lib/i386-linux-gnu/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/i386-linux-gnu/libc.so.6
Reading symbols from /lib/i386-linux-gnu/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/i386-linux-gnu/libdl.so.2
Reading symbols from /lib/i386-linux-gnu/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Loaded symbols for /lib/i386-linux-gnu/libpthread.so.0
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /usr/lib/libmysqlclient.so...done.
Loaded symbols for /usr/lib/libmysqlclient.so
Reading symbols from /lib/i386-linux-gnu/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/i386-linux-gnu/librt.so.1
Reading symbols from /lib/i386-linux-gnu/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/i386-linux-gnu/libnss_files.so.2
Reading symbols from /lib/i386-linux-gnu/libnss_dns.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/i386-linux-gnu/libnss_dns.so.2
Reading symbols from /lib/i386-linux-gnu/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/i386-linux-gnu/libresolv.so.2
0xf6e34c50 in ?? () from /lib/i386-linux-gnu/libc.so.6
(gdb) bt
#0 0xf6e34c50 in ?? () from /lib/i386-linux-gnu/libc.so.6
#1 0xf754e7ac in ProtoStrCompSigned(ProtoStr*, ProtoStr*) () from /usr/local/lib/libbyond.so
#2 0xf7461a2a in DelProtoStr(unsigned long) () from /usr/local/lib/libbyond.so
#3 0xf74ec25e in ServerResizeClientMap(unsigned short, long, long, Value) () from /usr/local/lib/libbyond.so
#4 0xf74b6c08 in ObjectWriteVar(Value, unsigned long, Value) () from /usr/local/lib/libbyond.so
#5 0xf74d6666 in ?? () from /usr/local/lib/libbyond.so
#6 0xf74c55d1 in ?? () from /usr/local/lib/libbyond.so
#7 0xf74d31a9 in ?? () from /usr/local/lib/libbyond.so
#8 0xf74d327b in ?? () from /usr/local/lib/libbyond.so
#9 0xf7569b87 in TimeLib::SystemAlarm() () from /usr/local/lib/libbyond.so
#10 0xf753aeaa in SocketLib::WaitForSocketIO(long, unsigned char) () from /usr/local/lib/libbyond.so
#11 0x0804a41e in ?? ()
#12 0xf6d1a935 in __libc_start_main () from /lib/i386-linux-gnu/libc.so.6
#13 0x08049f01 in ?? ()
Well, that was quick.

GDB:

0xb75ae7b0 in ProtoStrCompSigned(ProtoStr*, ProtoStr*) () from /usr/local/lib/libbyond.so
(gdb) bt
#0 0xb75ae7b0 in ProtoStrCompSigned(ProtoStr*, ProtoStr*) () from /usr/local/lib/libbyond.so
#1 0xb74c1e8f in ?? () from /usr/local/lib/libbyond.so
#2 0xb74e3d50 in MsgReadArg(unsigned short, Value, Value*, unsigned short, NetMsg&, unsigned long&) () from /usr/local/lib/libbyond.so
#3 0xb74e411c in ?? () from /usr/local/lib/libbyond.so
#4 0xb74f74fa in ?? () from /usr/local/lib/libbyond.so
#5 0xb74f85ff in ServerHandleMsg(NetMsg&, unsigned short, unsigned char) () from /usr/local/lib/libbyond.so
#6 0xb754506c in ServerSideLink::HandleMsg(NetMsg*) () from /usr/local/lib/libbyond.so
#7 0xb75983c6 in ClientSocket::ReadMsg() () from /usr/local/lib/libbyond.so
#8 0xb7597fa8 in ?? () from /usr/local/lib/libbyond.so
#9 0xb759b31f in SocketLib::WaitForSocketIO(long, unsigned char) () from /usr/local/lib/libbyond.so
#10 0x0804a41e in ?? ()
#11 0xb6ddabd6 in __libc_start_main (main=0x8049fb0, argc=8, ubp_av=0xbfd46874, init=0x804bc20, fini=0x804bc10, rtld_fini=0xb78f1080 <_dl_fini>, stack_end=0xbfd4686c) at libc-start.c:226
#12 0x08049f01 in ?? ()

SIGUSR2:

*************************************
Caught SIGUSR2, printing diagnostics:

Server port: 2506
Server visibility: invisible
Server reachable by players: yes

...

server mem usage:
Prototypes:
obj: 869092 (6392)
mob: 871380 (143)
proc: 8326028 (14771)
str: 3861757 (78563)
appearance: 5517901 (9858)
id array: 8348840 (28317)
map: 1590606 (255,255,7)
objects:
mobs: 98944 (87)
objs: 13689964 (52448)
datums: 5541232 (52448)
lists: 10393288 (286412)

Backtrace for BYOND 499.1202 on Linux [old glibc]:
Generated at Wed Aug 7 20:04:33 2013

DreamDaemon [0x8048000, 0x0], [0x8048000, 0x804a8fe]
libbyond.so 0x340790, 0x34079e
[0xb78e2000, 0xb78e2410], [0xb78e2000, 0xb78e2410]
libbyond.so 0x340790, 0x34079e
libbyond.so [0xb726e000, 0x0], 0x253e8f
libbyond.so 0x275c30, 0x275d50
libbyond.so [0xb726e000, 0x0], 0x27611c
libbyond.so [0xb726e000, 0x0], 0x2894fa
libbyond.so 0x28a580, 0x28a5ff
libbyond.so 0x2d6fc0, 0x2d706c
libbyond.so 0x32a0f0, 0x32a3c6
libbyond.so [0xb726e000, 0x0], 0x329fa8
libbyond.so 0x32cc40, 0x32d31f
DreamDaemon [0x8048000, 0x0], [0x8048000, 0x804a41e]
libc.so.6 0x16af0, 0x16bd6 (__libc_start_main)
DreamDaemon [0x8048000, 0x8049b40], [0x8048000, 0x8049f01]

End of diagnostics.
*************************************

Interestingly enough, it didn't freeze on any line of the dmb this time, but the freeze is definitely still there and happened just about as quickly as on 1201.

We reverted to 1193 since, well, this one is just unusable right now (same as 1201). That would mean that either the errors we caught in valgrind are still there, or something else is going on.

It would be great if that infinite loop could somehow be caught and handled. I don't have the code, so I can't really guess what the structure is, but if it's something relatively contained, a few checks sprinkled around to see whether it's corrupted could help (a loop counter with a limit in the millions of strings, or in the gigabytes if it's a per-byte loop, would probably do the trick).
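
To illustrate the kind of loop-counter check I mean, here is a rough sketch in Python. I don't have the engine source, so everything here is hypothetical: the chain layout, `walk_chain`, `CorruptionSuspected`, and the `MAX_STEPS` ceiling are made up for illustration and are not BYOND's actual structures.

```python
class CorruptionSuspected(RuntimeError):
    """Raised when a traversal runs far longer than a sane structure allows."""


# Hypothetical ceiling, "in the millions" as suggested above.
MAX_STEPS = 10_000_000


def walk_chain(next_index, start, max_steps=MAX_STEPS):
    """Follow next_index[i] links from start until the -1 terminator.

    Returns the number of nodes visited. If the walk exceeds max_steps,
    assume the links are corrupted (for example, a cycle) and fail loudly
    instead of spinning forever.
    """
    steps = 0
    i = start
    while i != -1:
        steps += 1
        if steps > max_steps:
            raise CorruptionSuspected(
                f"chain exceeded {max_steps} steps; possible cycle"
            )
        i = next_index[i]
    return steps
```

The point is only that a hard failure gives a crash (and a backtrace) at the moment the bad structure is walked, rather than a silent freeze.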

Since it's pretty obvious the memory is still allocated (the application ends up in an infinite loop rather than a segfault), knowing at which point it gets corrupted would probably help you a lot.

I'm worried that the corruption comes from something the players do on the server, so valgrind has very little chance of catching it, since the game is unplayable while valgrind is running.
Hell, I'd rather it asserted or segfaulted when that infinite loop happens than just sit there frozen (at least with a proper crash, the auto-restart can happen right away).

Right now, I'm stuck using a "stillalive" file that the game touches every now and then, and killing the process if it hasn't been touched for x minutes. But it's not exactly as reliable.
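
For anyone who wants to try the same workaround, a minimal sketch of the watchdog side in Python looks something like this. The function names, the 300-second threshold, and the 30-second poll interval are assumptions for illustration, not my exact setup; the game is assumed to touch the file on its own.

```python
import os
import signal
import time


def is_stale(path, max_age_seconds, now=None):
    """Return True if the heartbeat file is missing or has not been
    modified within max_age_seconds."""
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # A missing file counts as stale; start the watchdog only after
        # the game has touched the file at least once.
        return True
    return (now - mtime) > max_age_seconds


def watchdog(path, pid, max_age_seconds=300, poll_seconds=30):
    """Poll the heartbeat file and SIGKILL the process once it goes stale."""
    while True:
        if is_stale(path, max_age_seconds):
            os.kill(pid, signal.SIGKILL)
            return
        time.sleep(poll_seconds)
```

The weak spot is the same one mentioned above: it only notices a freeze after the threshold has passed, and a long-running but legitimate tick could trip it.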
In response to Jey123456
I wonder why yours manages to catch SIGUSR2 - mine doesn't catch anything while frozen.
I have -unsafe_diag enabled.
Well rats, that was disappointing. I think I'll need more valgrind info to go on, then, because the issues your log brought up should all be taken care of in this build.
valgrind is not really an option. It's unplayable, and the problem is most definitely player-induced. I can't tell exactly which action causes it, since it doesn't freeze right away, but I've had the server sit empty for days in a row without a freeze or crash.
So is it clear that the issue occurred between 1193 and 1194, or can that be isolated?
1197 works a treat here..
In response to Tom
The freeze mentioned in my original post occurred in a 498 daemon compiled by 498 dreammaker, after the release of 499. I had never experienced a freeze until roughly the 499 release.

I can't be sure whether it was coincidentally some update I made to the game at the time, though.
No, 1193 also has the problem, and so does 1197, just not to the same extreme.

1193 will often work fine for quite a few hours in a row, whereas 1197 generally only lasts 1-2 hours max, and 1201/1202 have never made it past 30 minutes in my tests so far.

But all 3 share the exact same freeze location in the debugger (the string infinite loop).
I am unsure why 1197 only lasts 1-2 hours for you. Most of BYONDPanels servers are on 1197, especially all the new ones, with the exception of Eternia on 498.1158 (I think). We haven't had any complaints thus far about freezing or any other issues; some clients, including a new one, had over 40 players.

Each server is running Ubuntu 12.10 32-bit.
Believe me, I am as unsure as you are, heh. 1193 is the one that proved to give us some form of stability (but even then, it freezes now and then). 1197 was definitely worse, and 1201-1202 are plain and simply unusable.
I still don't get how such a dramatic difference could be seen between versions of 499. The server code was touched very little throughout that process. Hub connection code was changed a bit, but the one case of possible heap corruption we identified there has been taken care of. I'll look over the hub code some more though to see if there's anything there that could remotely be an issue. That's pretty much the only place I can expect to find significant changes after 1193, since most of the beta changes affected the pager only.

Obviously the core problem predates 499 since the original report was for 498; the sources of possible corruption I've already addressed would fit that description. It could be there are new sources of corruption in newer 499 builds, but I think a more likely explanation is that something is merely exacerbating the existing issue. I don't know what else could be screwing up the heap, though, which is why logs from a tool like valgrind are so critical. That can catch heap corruption as it happens, rather than after when it's too late.
If this helps at all, my friend was attempting to connect but got "Connection failed". The logs clearly showed that he had disconnected and connected within a second, somewhere between immediately and half a second before the crash.

(This was before the new build, but we are still crashing, and this is what happens most of the time.)

PS: The server's SIGUSR2 output shows that the server got stuck at mob/Stat, but not on the mob that connected and disconnected.
Maybe there's another issue related to the map-sending problem, then. I'm looking into ways that I can mitigate any issues like this across the board.
Sun Aug 18 00:24:38 2013
World opened on network port 1213.
Welcome BYOND! (4.0 Public Version 499.1197)
The BYOND hub reports that port 1213 is reachable.
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:160)
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:161)
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:168)
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:169)
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:170)
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:171)
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:174)
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:175)
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:266)
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:269)
BUG: Bad ref (6:40461) in DecRefCount(DM spell_tree.dm:270)
BUG: File not found: /home/dmb/wano0132/logs/chat/2013/08/16/kai7max.html (current directory is /home/dmb/wano0132)
BUG: File not found: /home/dmb/wano0132/logs/chat/2013/08/16/kai7max.html (current directory is /home/dmb/wano0132)
BUG: File not found: /home/dmb/wano0132/logs/chat/2013/08/18/kai7max.html (current directory is /home/dmb/wano0132)
BUG: File not found: /home/dmb/wano0132/logs/chat/2013/08/18/kai7max.html (current directory is /home/dmb/wano0132)
BUG: Bad ref (6:40461) in DecRefCount(DM summon.dm:33)
BUG: Bad ref (6:40461) in DecRefCount(DM summon.dm:33)
BUG: Bad ref (6:40461) in DecRefCount(DM savefiles.dm:311)
BUG: Unexpected hub certificate (65535)
BUG: Unexpected certificate (6)
BUG: Failed to decode message 54,5
BUG: Network connection for Odensity shutting down due to read error. (2,1)
BUG: File not found: /home/dmb/wano0132/logs/chat/2013/08/18/theshadowone.html (current directory is /home/dmb/wano0132)
BUG: File not found: /home/dmb/wano0132/logs/chat/2013/07/24/lokus.html (current directory is /home/dmb/wano0132)
Mon Aug 19 20:38:04 2013
World opened on network port 1213.
Welcome BYOND! (4.0 Public Version 499.1197)
The BYOND hub reports that port 1213 is reachable.
BUG: Crashing due to an illegal operation!

Backtrace for BYOND 499.1197 on Linux:
Generated at Tue Aug 20 00:24:03 2013

DreamDaemon [0x8048000, 0x0], [0x8048000, 0x804a8ce]
libbyond.so 0x2fa610, 0x2fa631
[0x60b000, 0x60b600], [0x60b000, 0x60b600]
libbyond.so 0x2fa610, 0x2fa631
libbyond.so 0x25f520, 0x25f958
libbyond.so 0x262d80, 0x262de1
libbyond.so 0x262f90, 0x26306f
libbyond.so [0x60c000, 0x0], 0x29235b
libbyond.so [0x60c000, 0x0], 0x2c5f6a
libbyond.so [0x60c000, 0x0], 0x2b4c16
libbyond.so 0x2c5360, 0x2c546b
libbyond.so 0x2c54e0, 0x2c5593
libbyond.so [0x60c000, 0x0], 0x2c6a52
libbyond.so [0x60c000, 0x0], 0x2b122c
libbyond.so 0x2c5360, 0x2c546b
libbyond.so 0x2c54e0, 0x2c5593
libbyond.so [0x60c000, 0x0], 0x2c68ec
libbyond.so [0x60c000, 0x0], 0x2b122c
libbyond.so 0x2c5360, 0x2c546b
libbyond.so 0x2c54e0, 0x2c5593
libbyond.so 0x2c7730, 0x2c77fc
libbyond.so 0x261140, 0x261596
libbyond.so 0x2a7400, 0x2a7981
libbyond.so [0x60c000, 0x0], 0x2c7506
libbyond.so [0x60c000, 0x0], 0x2b6481
libbyond.so 0x2c5360, 0x2c546b
libbyond.so 0x2c54e0, 0x2c5593
libbyond.so [0x60c000, 0x0], 0x2c68ec
libbyond.so [0x60c000, 0x0], 0x2b122c
libbyond.so 0x2c5360, 0x2c546b
libbyond.so 0x2c54e0, 0x2c5593
libbyond.so [0x60c000, 0x0], 0x2c6a52
libbyond.so [0x60c000, 0x0], 0x2b122c
libbyond.so 0x2c5360, 0x2c546b
libbyond.so [0x60c000, 0x0], 0x2ce972
libbyond.so [0x60c000, 0x0], 0x2b1132
libbyond.so 0x2c5360, 0x2c546b
libbyond.so 0x2c54e0, 0x2c5593
libbyond.so 0x2c7730, 0x2c77fc
libbyond.so [0x60c000, 0x0], 0x2c90e9
libbyond.so [0x60c000, 0x0], 0x2cea31
libbyond.so [0x60c000, 0x0], 0x2b1132
libbyond.so 0x2c5360, 0x2c546b
libbyond.so 0x2c54e0, 0x2c5593
libbyond.so 0x2c7730, 0x2c77fc
libbyond.so 0x273b60, 0x273c73
libbyond.so [0x60c000, 0x0], 0x2751da
libbyond.so [0x60c000, 0x0], 0x2883ca
libbyond.so 0x289450, 0x2894cf
libbyond.so 0x2d5e60, 0x2d5ef9