ID:2204917
 
Resolved
An error in the way a structure was aligned caused the server to believe objs and mobs known to the client had changed even though they hadn't, resulting in increased network traffic in some games.
BYOND Version:511.1369
Operating System:Windows Server 2012 R2
Web Browser:Chrome 56.0.2924.76
Applies to:Dream Daemon
Status: Resolved (511.1370)

Descriptive Problem Summary:
We updated BYOND on the server from 1364 to 1369 on Jan 22 at 5am GMT.

Keep that in mind as you look at this graph:



(note, for outgoing traffic, lower means more traffic).

That big change is between 5am and 6am on the 22nd: 10Mbit/s to 56Mbit/s, the exact hour I updated BYOND.

I'll have somebody from /vg/station come in here. I know they had an issue trying to update, I think to 1367 or 1366.

Incoming also increased, from 700kbit/s to 1000kbit/s.
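For scale, the figures quoted above work out to roughly a 5.6x increase in outgoing traffic and about a 1.4x increase in incoming traffic. A minimal sketch of that arithmetic (the function name is just for illustration):

```python
def increase_factor(before, after):
    """Return how many times larger `after` is than `before` (same units)."""
    return after / before

# Figures quoted in the report above.
outgoing = increase_factor(10, 56)      # Mbit/s, before and after the update
incoming = increase_factor(700, 1000)   # kbit/s, before and after the update

print(f"outgoing: {outgoing:.1f}x, incoming: {incoming:.2f}x")
```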
Well that sucks. Any chance you can narrow down the exact build where this occurred? Otherwise I'm not sure what to go on.

Networking shouldn't have really changed in any significant way between those versions, so my thinking is maybe something changed on the server end (since it appears to be the server) as far as determining how and when to send certain data.
This could be the cause of the excessive lag our players saw on Goonstation when we upgraded from 1364 to 1366. I never really had a chance to delve into it but a downgrade fixed it for all users.
Hmm. Looking just between 1364 and 1366, I did find one issue that looks like it could have some kind of relationship with this problem. That would be id:2177680, affecting how the server perceives HUD changes. However, that's a very limited change, and all it did was add refcounting to the appearance IDs for HUD objects, making appearances possibly stick around longer and reducing churn. (Even then, it would only have an impact in rare cases.) It doesn't look like a likely candidate for causing increased network activity.

It is vitally important that we narrow down what version this started in. Only that will give me enough information to hunt this issue down.
I'll run a test round on the server with 1365 after work today to see

To confirm this, I checked stats, and traffic did go back down after reverting to 1364, so it wasn't the rare/unlikely scenario of bad timing with a DoS or some game code change.
It would also help, if you can get this data, to know if there's a certain point where this happens or if it basically happens as soon as the server starts up.

I really, really doubt id:2177680 is the cause, but knowing the exact build where this occurred will at least let me go through a diff and find out the full list of code changes in play.
This could be the cause of the excessive lag our players saw on Goonstation when we upgraded from 1364 to 1366. I never really had a chance to delve into it but a downgrade fixed it for all users.

Yes, ours saw the same lag.

511.1365 is clean. It's a change between 1365 and 1366 that caused it.
I can't take the server down too often as the players hate it, and this isn't really easy to see otherwise.

Tomorrow after work I'll try 1366 to confirm, but it's likely that if you looked at 1366's diff it would stand out long before I get that chance.
I confirmed it locally by connecting to DD with my LAN IP, then just looking at the process's network usage (I used Process Explorer). It's almost 15x.

1365 does not have the issue.

1366 does have the issue.
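Narrowing a regression down to one build like this is essentially a bisection between a known-good and a known-bad version. A rough sketch of the idea follows. It assumes every build number in the range can be installed and tested, and that once the problem appears it stays in all later builds. The `is_bad` check here is a hypothetical stand-in for actually running a test round and watching traffic.

```python
def first_bad_build(builds, is_bad):
    """Find the first bad build by bisection.

    Assumes `builds` is sorted, builds[0] is known good, builds[-1] is
    known bad, and every build after the first bad one is also bad.
    """
    lo, hi = 0, len(builds) - 1  # builds[lo] is good, builds[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return builds[hi]

# 1364 was known good and 1369 known bad at the start of the thread.
builds = [1364, 1365, 1366, 1367, 1368, 1369]

# Stand-in for a real test round; matches the results reported above
# (1365 clean, 1366 affected) and assumes later builds are affected too.
culprit = first_bad_build(builds, lambda build: build >= 1366)
print(culprit)
```

With only a handful of builds in the range, testing them one at a time works just as well, which is effectively what was done here.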

This is disappointing because I'm not actually seeing anything in the published changelog for that release.

Dream Daemon
Assignments to undefined vars always printed the error to world.log instead of sending the error message as in a normal exception. (Kamuna)

Dream Seeker
Images did not properly obey the KEEP_TOGETHER behavior of their parent atom in some cases. (Kamuna)
The only server things jumping out at me in the diff are that undefined var message, and a change to the way mapped objects (movables known to a client) are handled. The latter was changed to pull out some legacy code and avoid a crash that was happening in some weird, rare cases. Removing that code should not have caused this issue, but I'll double-check and see if maybe I'm missing something.

[edit]
Hrm! I actually right away found something interesting: There's a comparison done that checks pixel offsets, and it should be using an 8-byte alignment--but 1366 took out a var that could potentially have impacted alignment. I wonder if objs are being believed to have different pixel or step offsets when they don't, as a result of this.
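To illustrate how removing one var can break a comparison like that, here is a small sketch using Python's ctypes. The structure and all of its field names are hypothetical (the real BYOND structure is not shown in this thread), and the comparison that reads 8 bytes at a fixed offset is an assumption about what such a check might look like, not the actual server code.

```python
import ctypes

class OldEntry(ctypes.Structure):
    # Hypothetical layout before the change: `legacy` pads the
    # offsets out to byte 8.
    _fields_ = [
        ("obj_id", ctypes.c_uint32),
        ("legacy", ctypes.c_uint32),
        ("pixel_x", ctypes.c_int16),
        ("pixel_y", ctypes.c_int16),
        ("step_x", ctypes.c_int16),
        ("step_y", ctypes.c_int16),
        ("last_seen", ctypes.c_uint32),
    ]

class NewEntry(ctypes.Structure):
    # Same hypothetical layout with `legacy` removed: the offsets
    # now start at byte 4 instead of byte 8.
    _fields_ = [
        ("obj_id", ctypes.c_uint32),
        ("pixel_x", ctypes.c_int16),
        ("pixel_y", ctypes.c_int16),
        ("step_x", ctypes.c_int16),
        ("step_y", ctypes.c_int16),
        ("last_seen", ctypes.c_uint32),
    ]

def offsets_changed(a, b):
    """Compare bytes 8..15 of two entries, assuming (as the old layout
    guaranteed) that the four 16-bit offsets live there."""
    return bytes(a)[8:16] != bytes(b)[8:16]

# Two snapshots of the same object: identical offsets, but an
# unrelated field (`last_seen`) differs between them.
old_a = OldEntry(1, 0, 3, 4, 0, 0, 100)
old_b = OldEntry(1, 0, 3, 4, 0, 0, 101)
new_a = NewEntry(1, 3, 4, 0, 0, 100)
new_b = NewEntry(1, 3, 4, 0, 0, 101)

print(OldEntry.pixel_x.offset, NewEntry.pixel_x.offset)
print(offsets_changed(old_a, old_b))  # reads only the offsets
print(offsets_changed(new_a, new_b))  # reads step_x, step_y and last_seen
```

In the old layout the 8-byte window covers exactly the pixel and step offsets, so unchanged objects compare as equal. In the new layout the same window straddles the step offsets and the next field, so an object whose offsets have not changed can still look different.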
The rate did seem to go up very sharply if I was near lots of objects compared to just space turfs.
Lummox JR resolved issue with message:
An error in the way a structure was aligned caused the server to believe objs and mobs known to the client had changed even though they hadn't, resulting in increased network traffic in some games.
oh hayzuz that made a big diff
Indeed a good find.