ID:1713463
 
Keywords: 506, list, runtime
Resolved
List data could become corrupted in Cut(), causing later additions to fail
BYOND Version: 506
Operating System: Windows 7 Pro 64-bit
Web Browser: Chrome 39.0.2171.42
Applies to: Dream Daemon
Status: Resolved (507.1265)

This issue has been resolved.
A regression in list handling was introduced in v506.1245.

Please see the dme and project linked below.

Link:
https://www.dropbox.com/s/afcmlvp3arehigy/506_1245_regression_crash.zip?dl=0

Numbered Steps to Reproduce Problem:
1. Place the BYOND /bin and /cfg directories under the byond directory in the project
2. Run bugdemo_runme.bat
3. Click "packQueue" verb
4. Click "drainQueue" verb
5. Click "packQueue" verb

Expected Results:
No output is expected.

Actual Results:
Spam of:
runtime error: bad list
proc name: enqueue (/datum/dynamicQueue/proc/enqueue)
source file: dynamicQueue.dm,52
usr: Volundr (/mob)
src: /datum/dynamicQueue (/datum/dynamicQueue)
call stack:
/datum/dynamicQueue (/datum/dynamicQueue): enqueue(foo (/datum/foo))

Environments in which problem occurs:
The problem occurs consistently on Windows machines. I have not tested it on Linux.

Did the problem NOT occur in any earlier versions? If so, what was the last version that worked?
The last working version was 506.1244.

Workarounds:
None
Did you check 506.1247? That's the final version of 506 and contains a fix to a list issue that was reported by Tobba, in which junk data was showing up in recycled lists. There have been no reports of list problems since then.

I would recommend using the beta 507 anyway. If anything it should be more stable in the internals. The only real "beta" part about it is the webclient behavior.
I have verified the issue persists in 506.1247, as well as all versions of 507.

I tried running our construction mode server on 507, but it hung hard for several minutes and then crashed after a few hours with some bad ref messages in the error log. We downgraded after that and it has been stable since, aside from some really weird list issues.
Tobba has been saying not to use 507 because of instability on Linux, but I figured he was just being Tobba.

I also noticed that 507 sits at 100% CPU utilization where 506 does not.
I'll look into this list issue when I have a chance. As far as I was aware, lists have been stable since 506.1247, so this is definitely news to me.

If you can narrow down which 507 builds show differences in CPU utilization, it might help show which parts of the code contributed to that. The underlying server code really hasn't changed very much in 507.
It took a while to pin this down in our codebase. It has been causing some minor issues in our garbage collector implementation for some time, and might be the source of some bad ref runtime errors as well...

I have a patch applied (from Tobba) to 506 that supposedly makes outgoing packets less aggressively compressed. Tobba indicated that the compression call is being made with aggressive settings, and that this was causing CPU utilization to be unnecessarily high. I'm not sure whether that's the case, or whether he's logged a bug report on it.

I'll see if I can pin down which version of 507 started running hot.
Lummox JR resolved issue with message:
List data could become corrupted in Cut(), causing later additions to fail
Wow, this was a really good find. I had a hard time at first following what your benchmark was doing, but once I saw the issue in action and was able to exhaustively trace what was going on, the problem finally jumped out.
Yay :) Thanks for fixing.

Is there any chance this is what was causing bad ref messages on 507?

I'll run 507 with trace on and see if I can catch the crash.
Yes, it's entirely likely this was the cause of some bad ref messages.
I believe this just randomly happened to us. I can't really provide a lot of information, sorry.
All lists exploded and threw "bad list" runtime errors, quickly filling the log.
The BYOND version was 5.0 Beta Version 507.1267 on Windows.
In response to Aranclanos
You're likely looking at a different issue, so it would merit a new thread. However without any information to investigate there really isn't anything I can do.
Is there any kind of logging we should do if it happens again?
IS THERE ANY KIND OF LOGGING WE SHOULD DO IF IT HAPPENS AGAIN?
BECAUSE IT HAPPENS ONE YEAR LATER STILL MY DEAR
In response to Aranclanos
As I said before, you're dealing with a different issue and it belongs in a new thread; the specific problem from this thread was resolved. For logging, just listing the output from those runtime errors would be a good start.

However, if this only happens in huge projects like SS13, then I expect this is the result of var index mangling, which was partially fixed in several stages of 507, including a final fix in 507.1284. So I would recommend that when 507 comes out, you compile with it and try again.