ID:2320212
 
BYOND Version:511
Operating System:Win Server 2012 R2
Web Browser:None (disabled for all users)
Applies to:Dream Daemon
Status: Open

Issue hasn't been assigned a status value.
DreamDaemon Logs:
BYOND Error:(Sfile.cpp,1138) failed to open file: BYOND Error:(Sfile.cpp,1138) C:\Users\Administrator\Desktop\project_dir\data\memo.sav BYOND Error:(Sfile.cpp,1138) failed to open file: BYOND Error:(Sfile.cpp,1138) C:\Users\Administrator\Desktop\project_dir\data\player_saves \a\player_ckey\info.sav BYOND Error:(Sfile.cpp,1138) failed to open file: BYOND Error:(Sfile.cpp,1138) C:\Users\Administrator\Desktop\project_dir\data\player_saves \a\player_ckey\info.sav BYOND Error:(Sfile.cpp,1138) failed to open file: BYOND Error:(Sfile.cpp,1138) C:\Users\Administrator\Desktop\project_dir\data\player_saves \a\player_ckey\info.sav BYOND Error:(Sfile.cpp,1138) failed to open file: BYOND Error:(Sfile.cpp,1138) C:\Users\Administrator\Desktop\project_dir\data\player_saves \a\player_ckey\info.sav BYOND Error:(Sfile.cpp,1138) failed to open file: BYOND Error:(Sfile.cpp,1138) C:\Users\Administrator\Desktop\project_dir\data\player_saves \s\player_ckey\info.sav BYOND Error:(Sfile.cpp,1138) failed to open file: BYOND Error:(Sfile.cpp,1138) C:\Users\Administrator\Desktop\project_dir\data\player_saves \s\player_ckey\info.sav BYOND Error:(Sfile.cpp,1138) failed to open file: BYOND Error:(Sfile.cpp,1138) C:\Users\Administrator\Desktop\project_dir\data\player_saves \s\player_ckey\info.sav BYOND Error:(Sfile.cpp,1138) failed to open file: BYOND Error:(Sfile.cpp,1138) C:\Users\Administrator\Desktop\project_dir\data\player_saves \m\player_ckey\preferences.sav BYOND Error:(Sfile.cpp,1138) failed to open file: BYOND Error:(Sfile.cpp,1138) C:\Users\Administrator\Desktop\project_dir\data\player_saves \o\player_ckey\preferences.sav

Descriptive Problem Summary:

At some point, seemingly for no reason, BYOND becomes unable to read save files. The error you see above is repeated anytime you try to open a savefile. The problem is not fixed until a server reboot but then there are no issues. The error does not always start with that particular savefile. There doesn't seem to be any trend of runtimes coming before these errors. Sometimes the server will run without hiccup except for this and sometimes there'll be a large number of runtimes before. It's hard to gauge but this problem will happen on average once every 20 restarts at least, but sometimes occurring consecutively or spaced by much more.
I have neither seen nor heard of it happening elsewhere, but like I said I don't think that's indicative of anything.


Numbered Steps to Reproduce Problem:
1. Run server for a long time without hiccup
2. ???
3. Error happens
4. Restart

Expected Results:
A properly loaded savefile.

Actual Results:
A runtime.

Does the problem occur:
Every time? Or how often?
-I would say on average 1 in 20 restarts, but it can be consecutive or far longer in between
In other games?
-No, only in this instance of SS13. I don't think any of our devs have reported this problem, just on the remote. That being said, there are a lot more rounds on the remote than in total local tests plus its not likely that we'd notice.
In other user accounts?
-Unknown, but probably
On other computers?
-Unknown

When does the problem NOT occur?
During a majority of rounds, but still inconsistently.

Did the problem NOT occur in any earlier versions? If so, what was the last version that worked? (Visit http://www.byond.com/download/build to download old versions for testing.)
We have logs on file dating back to October 27th, with the first since then being Nov 8th, and happening seven times total from that point.
The very first time was around 6 months ago.
Here's a screencap of all the instances of "failed to open file" in the runtime logs to give you an idea of the frequency.
http://prntscr.com/hhzwy6
The entire scale is from 10/27/17 to 12/1/17.

Workarounds:
None, other than restart.


I made this at the recommendation of MrStonedOne, there's something similar in the Dev help section if that's a more appropriate place.
Failure to open files is an odd bug for sure. Are you sure there isn't come kind of possible permissions issue in play? I know it seems unlikely; just covering the bases. If for instance it reads off a network share, that going flaky could be an issue.

I'm not sure how I'd even begin to diagnose this.
If it were a permission issue or an antivirus problem wouldn't the issue be more consistent?

Is there something internal to Sfile.cpp that is persistent across all file reads? Our theory was that if there was one particular corrupted savefile it might ruin the reader until a reboot, but we couldn't find any trends in the failed reads. I'm going to write something to log dive to discern online players at the time and look for a correlation, since it's possible the troublemaker file is not reported, and only subsequent files are.

Also, I should mention that this can happen in the middle of a round, not just the beginning.
I'd hypothesize that somewhere in BYOND's back-end, the file is opened in read-write or write mode, and not closed. Whether that's because you persist a reference to a savefile and then try to open it again (in which case BYOND should probably use the existing resource), or whether it's due to a failure on BYOND's end to free up the resource when it's no longer referenced, I can't say---especially not without access to BYOND source-code.

Perhaps there's an uncommon code path that causes the resource to not be freed; i.e. if an exception occurs in the C++ back-end before the reference count is decremented. I'm reaching, but again as a bystander I can do naught but make suggestions based on the nature of the error.
Hmm, an exception might make some sense here. Are there any runtime errors in the log that line up with these situations?
Unfortunately no. We have on file at least one round where the only runtime were these BYOND errors. I’ll ask if I can give you logs access so you can take a look yourself.

Out of curiosity what does Sfile.cpp do on line 1138?
One thing to try, is to just add a WARNING() line next to everything that opens up savefiles so that next time this triggers you can actually see what files were opened up before hand.

you might also have a stack_trace() proc in _HELPERS/unsorted.dm that you can use instead WARNING() so you also get a stack trace
Occasionally the error will show up with a call stack and the likes, but it's been in different places, such that the only common factors is savefiles.

It's come from save_preferences(), save_character(), load_preferences(), open_load_dialog(), load_character(), and possibly others.

The log diving code I wrote didn't dig anything up so that's no help.

Would a try-catch actually catch this or not since it's not a normal runtime? If so, I can throw those everywhere to try and get more info.
2 updates:

Lummox contact me for some logs if you want them, runtime logs and our stuff which is a bit of a hassle to wade through.

I'm gonna write a quick something to load every single savefile to see if it breaks, but I have the impending feeling of doom that it won't be that simple.
what codebase are you running on anyways?
Colonial Marines
We have a very similar problem on the Baystation code; mentioned here: http://www.byond.com/forum/?post=2272686
In response to Xales
Xales wrote:
We have a very similar problem on the Baystation code; mentioned here: http://www.byond.com/forum/?post=2272686

Interesting. Curious - do you have an approximation of the number of save files you guys have on your server? I was hoping it was something as simple as congestion with too many files. We're having this error fairly commonly at ~50,000 save files, but when I deleted user data over a year old and got it down to ~30,000, we got the same error next round.

Next step is to just wipe all character data and start over. Best guess is that a handful of players have corrupted save files and they unintentionally break the server when they try and log in.
tg has never had this issue and we at one point had over 100k save files (most old files from back when we would store 5 save files per client before some noticed the CD command exists within savefiles.)

In response to MrStonedOne
MrStonedOne wrote:
tg has never had this issue and we at one point had over 100k save files (most old files from back when we would store 5 save files per client before some noticed the CD command exists within savefiles.)


Ok, thanks for that perspective. I guess we're on to plan B.

One other thing I wanted to add in case it's relevant:

If I join the game after this issue hits the server and no one's preferences load anymore (including mine when I join), and then I use the Debug > Reload Admins verb, all admins on the server get dumped into regular users and the server console gives an error for the admins.txt with "File is empty", even though it's obviously not.

It's literally ALL files that the server wants to load after this error hits. So, admins.txt is fine UNTIL this error happens, and then not until an attempt to reload it is made. It's like the server just gives up at some point and doesn't want to load anything anymore.

Anyway, thanks for the help troubleshooting, gents.
Just to add some more troubleshooting to this issue, it's looking more and more likely that it's due to a memory leak.

I hard rebooted the server 4 days ago and we haven't had a single one of these file errors since, when before it was multiple times a day and eventually every game round.

My best guess is that the list of players who have logged on and off fills up in the "Players" tab until it becomes too much for the server to handle, which results in these file access errors. We're able to have the server running for weeks at a time, so that list of players sucks up more and more memory and doesn't let go.

Is it possible to have the server dump the oldest player entries in order to make room for new ones? I imagine that would solve the problem.