ID:1529017
 
BYOND Version:504
Operating System:Linux
Web Browser:Firefox 31.0
Applies to:Dream Daemon
Status: Open

It's not uncommon to see the server go down with this message now:
BUG: Finished erasure with refcount=1 (ref=5:1) DM (:0)
BUG: Crashing due to an illegal operation!


The refcounting error isn't always coupled with an illegal operation, but it seems to have a chance of causing one. Our FreeBSD servers on the same build don't seem to suffer from the crash at all (I have seen it happen, and crash, involving things other than clients, but that seems to be a separate issue).


Is it also possible to have calls check the reference they're giving before they go haywire?
Refcount errors have never been uncommon in large worlds.

Could be a leak or something that never got caught.

If you compile with the debug flag, it should tell you the file and line where that error stems from.
It is compiled with debugging, and it does not.

There's another, much rarer issue where something still inside a turf's contents gets GC'd, leading to a catastrophic failure, but that's a separate issue. Same thing there, though.
ref 5 is a client object.

At some point, Lummox said the cause behind those could be the usage of skin procs in weird places like client/New(), or relating to world/Export().

Maybe add some debug outputs & timestamps right before anything like that, if possible?
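A minimal sketch of what that logging could look like. The helper name debug_mark() and the message format are assumptions for illustration, not anything from this project:

```dm
// Hypothetical helper: writes a timestamped line to the world log.
/proc/debug_mark(msg)
	// time2text() formats world.realtime as a readable timestamp.
	world.log << "[time2text(world.realtime, "YYYY-MM-DD hh:mm:ss")] [msg]"

/client/New()
	debug_mark("client/New() start: [key]")
	. = ..()	// keep the default return value (the mob)
	debug_mark("client/New() end: [key]")
```

The same call could be placed right before any winset() or world/Export() use, so the last line in the log before a refcount error points at the suspect.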
There are winsets in client/New behind a spawn(50), for lack of a better place to put them. world/Export is also used, but I don't think that's the issue.

Of course I know ref 5 is a client object; that's why I said client in the OP.

winset() should never be in client/New() because the user is not fully initialized at that point; they should be in Login(). Nevertheless I wouldn't expect a refcount error as a result.

If you can narrow this down in any way it would be extremely helpful. I've been chasing the client refcount issue for ages and while I've always suspected winset() and related routines, I've never had a solid lead.
Lummox, since these errors are related to clients...

Would a user's Options and Messages output anything relating to this if they have client debug enabled? Or not, since attempted erasure could only happen after they've actually disconnected, assuming it's intended?
In response to Lummox JR
Lummox JR wrote:
winset() should never be in client/New() because the user is not fully initialized at that point; they should be in Login(). Nevertheless I wouldn't expect a refcount error as a result.

If you can narrow this down in any way it would be extremely helpful. I've been chasing the client refcount issue for ages and while I've always suspected winset() and related routines, I've never had a solid lead.

Moved it to Login, let's see if this helps.
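For anyone following along, a rough sketch of that kind of move. The control id "mainwindow" and the title text are placeholders, not the project's actual skin values:

```dm
// Sketch only: "mainwindow" and the title are placeholder values.
/mob/Login()
	..()	// run the default Login() first
	if(client)
		// By Login() the player is attached to this mob, so skin calls
		// are issued here instead of in client/New().
		winset(src, "mainwindow", "title=\"Example\"")
```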
In response to Super Saiyan X
Super Saiyan X wrote:
Would a user's Options and Messages output anything relating to this if they have client debug enabled? Or not, since attempted erasure could only happen after they've actually disconnected, assuming it's intended?

If you're the host you should get these messages, but not if you're connected to a remote server. If the host disconnects (if hosting in DS) then the game is over, so this would never be an issue.

In the case of this bug, "Finished erasure with refcount 1" means the client was deleted, but a reference to it was hanging around somewhere but never found.
I killed off the winset in client/New and the refcount errors are still occurring.
We're getting a very similar crash issue on Linux:
Tue Mar 25 09:29:43 2014
World opened on network port 8000.
Welcome BYOND! (5.0 Version 504.1232)
The BYOND hub reports that port 8000 is reachable.
BUG: Sequence number 55AA expected but 42 received
BUG: Unexpected hub certificate (65535)
BUG: Finished erasure with refcount=1 (ref=5:8) DM (nanomanager.dm:223)
BUG: Bad ref (5:8) in DecRefCount(DM nanomanager.dm:223)
BUG: Unexpected hub certificate (5)


The referenced line is this:
client << browse_rsc(file(path + file)) // send the file to the client

https://github.com/Baystation12/Baystation12/blob/master/code/modules/nano/nanomanager.dm#L223

E: And that proc is called here, inside client/New(). Hmm.
I suspect this is related to file references randomly breaking during client logout. It's pretty hard to replicate, however (completely at random, logout messages get sent to the world instead of a special log file).
Huh, I think I've finally made some headway on understanding this. I believe what's happening is that the reference in question, normally sitting happily on the proc stack, has been popped at this point but is in a temporary var that does not get scanned. The client is being deleted when the output is attempted (because of a disconnect), but the ref that has yet to be decremented is still around temporarily. This situation shouldn't lead to a crash, though. The "Bad ref in DecRefCount" error, on the other hand, is expected. In this case, the bug messages are more or less garbage; they can't cause a real problem.

On the plus side, I think this might still help me understand the crash in light of what I've learned here and the crash message that started this thread. As a followup, I'd like to know, Tobba, if your project has debug info compiled in--which would be required for the file and line number info. If it doesn't, I would highly recommend turning debug info on, so that the spot of the crash can be found. The only other time I see :0 as file/line info is when it's a proc defined in stddef.dm, but there aren't many of those and none of them perform output to a client.
The project is already built with debug info
In response to Tobba
Okay, good to know. That would suggest the culprit is somewhere outside of normal proc code, where it's trying to send a message but failing. Map procs have traditionally been one place this could happen, but I've gone over those quite a lot and would be surprised if any straggler issues remain there.

Is this in threaded mode or no? What I'm seeing in the code actually suggests this would be less likely to happen with threads in play.
Server runs 504, so no.
This is a rare variation
BUG: Finished erasure with refcount=2 (ref=5:15) DM (:0)


It seems the refcount errors are sometimes accompanied by several network errors. Usually the server reads the sequence 0042 (sometimes 0043, occasionally 00DB) and then fails due to a bad sequence number, but sometimes there are compression issues (sometimes even resource batch creation failures). These network errors are ALWAYS accompanied by the refcount error (although not necessarily the other way around).

I'm guessing this is somehow tied into the server barfing logs into the world too
Our servers just started to silently crash without any message or dump. Almost always (there's only one instance where I haven't observed it), this has been at the end of the log.
The most recent beta (506.1241) included some changes to the connection sequence that could possibly have fixed some problems of this sort, but I'm not convinced this is happening only at login.

This issue seems to only impact certain games. I am curious what's going on in client/New(), as doing operations like winsets there has typically been problematic in general, and there might be some clues to be had.
I'm fairly sure the servers updated to the latest beta last night and they both crashed this morning

I made some changes so the code to strip the HUD off a player in Logout won't run on a disconnection. I suspect the crashing comes from doing overly complicated things in Logout, since the server momentarily goes haywire when a player disconnects.
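Roughly, a check like that could be sketched as follows. strip_hud() is a hypothetical stand-in for the real HUD cleanup, and the key test relies on the behavior described in the DM reference for Logout() as I understand it (key still set on a disconnect, already null when the player only switched mobs):

```dm
// Hypothetical stand-in for the project's HUD cleanup code.
/mob/proc/strip_hud()
	return

/mob/Logout()
	// Assumption: a null key means the player moved to another mob,
	// while a non-null key means the client actually disconnected.
	if(!key)
		strip_hud()	// only do the heavier cleanup on a mob switch
	..()
```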