ID:101443
 
BYOND Version:475
Operating System:Linux
Web Browser:Firefox 3.6.8
Applies to:Dream Daemon
Status: Deferred

This issue may be low priority or very difficult to fix, and has been put on the back burner for the time being.
Descriptive Problem Summary:

GOA uses a lot of Export communication between multiple servers, for loading savefiles and communication. It can send things rapidly enough that the server becomes marked as a Denial of Service attempt and eventually disconnected.

In addition (and possibly related) the following message is spammed in logs of both servers as soon as they connect:
BUG: Failed to decode message 0,1

Numbered Steps to Reproduce Problem:
1) Start GOA save server
2) Start GOA game server
3) Watch as people logging in cause masses of Export calls between the two.

Code Snippet (if applicable) to Reproduce Problem:


Expected Results:

Actual Results:

Does the problem occur:
Every time? Or how often? Every time
In other games? Probably any game that uses Export to such a large extent, and is fairly popular.
In other user accounts?
On other computers? Really only from the main servers, but it's probably related to the sheer volume of Exports that get sent as people log in, so a simple test isn't very easy.

When does the problem NOT occur?
In earlier BYOND versions

Did the problem NOT occur in any earlier versions? If so, what was the last version that worked? (Visit http://www.byond.com/download/build to download old versions for testing.)
The last version on the servers before update to 475 was 472, which worked fine. If needed I could try the in between versions.

Workarounds:
Use an older BYOND version, or delete ~/.byond/cfg/hostban.txt and restart the servers on that machine to cause it to connect. None known for BUG: spam.
Gah. It's hard to find an adequate point at which DoS attempts can be properly detected without making the detection utterly worthless.
For GOA's purposes, the ability to disable it either at compile time or when starting Dream Daemon would probably be sufficient. Relatively few other games combine such heavy reliance on Export/Topic communication with the sheer number of players needed to make it an issue.
That makes sense. The more ideal solution of course would be for GOA to avoid sending so many contacts to its savefile server at once, but I appreciate that that's probably not a trivial change.
It may be possible to group together a few of them, but most of the calls are pretty much necessary. There are currently about 4 Exports involved between logging in and loading a character. The game can avoid 3 of those if it is directly connected to the database, but that's not really a simple solution as there's no guarantee that the servers will always be able to directly access the database. (And it's disabled on current game servers as for some reason they have a very slow connection to the database even though it's on the same machine as the one they're contacting...)

After the initial rush of people logging in to a server as soon as it starts up, the amount of calls being thrown around ought to go down a bit, but 60+ people logging in in less than a minute causes a nice amount of Export calls.
What I mean about the grouping is, do those calls have to be sequential or can they be bunched in some way that multiple queries could be grouped into the same topic, and multiple responses sent back grouped together? If instead of making the calls directly, there was some kind of world.Export() manager in place, you could let requests collect within some small time period (like a few ticks) and then send them out as a batch. In theory the hardest part about that is just handling the muxing and demuxing, since once those routines are ready you could easily slap them in in place of your world.Export() calls and as an extra block within the regular world/Topic(). I'm not suggesting the whole process is simple of course, but it can be done.
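To make the muxing/demuxing idea concrete, here is a rough sketch of such a batching manager. It is written in Python rather than DM purely for illustration, and everything in it is an assumption: the `ExportBatcher` and `handle_batch` names, the JSON wire format, and the `send` callable standing in for a single world.Export() round trip. In DM the same shape would need its own encoding and a timer to trigger the flush every few ticks.

```python
import json
from typing import Callable, Dict, List, Tuple


class ExportBatcher:
    """Collects requests and sends them to the other server as one message."""

    def __init__(self, send: Callable[[str], str]):
        # `send` is a hypothetical stand-in for one Export round trip:
        # it takes the encoded batch and returns the encoded reply.
        self._send = send
        self._pending: List[Tuple[int, str, dict]] = []
        self._next_id = 0

    def queue(self, topic: str, args: dict) -> int:
        """Queue a request and return the id used to find its reply later."""
        request_id = self._next_id
        self._next_id += 1
        self._pending.append((request_id, topic, args))
        return request_id

    def flush(self) -> Dict[int, object]:
        """Mux all pending requests into one call, then demux the reply."""
        if not self._pending:
            return {}
        batch, self._pending = self._pending, []
        payload = json.dumps(
            [{"id": i, "topic": t, "args": a} for i, t, a in batch]
        )
        reply = json.loads(self._send(payload))
        return {entry["id"]: entry["result"] for entry in reply}


def handle_batch(payload: str, handlers: Dict[str, Callable[[dict], object]]) -> str:
    """Receiving side: run each sub-request and return all results together."""
    replies = []
    for entry in json.loads(payload):
        result = handlers[entry["topic"]](entry["args"])
        replies.append({"id": entry["id"], "result": result})
    return json.dumps(replies)
```

The receiving side corresponds to the extra block in world/Topic(): it unpacks the batch, dispatches each sub-request to the handler it would normally have reached, and returns every result in a single reply tagged with the original request id.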
It's definitely possible, since I've already gotten most of it rather abstracted behind a few calls, since it does have to have those separate modes for talking directly to the database or to another server. Most of the time the responses are direct returns from the other side's world/Topic() though -- I don't respond by sending an Export back in most cases. (This is just a slight difficulty though, it just means I'd need to take care of separating the data back in a slightly different way.) I'll see what I can manage with that, it should at least help keep down the number of requests during the initial login spam.
Just to make sure I have the right idea:

GetCharacterNames(key)
    if(!_char_name_requests_cache)
        _char_name_requests_cache = list(key)
    else
        _char_name_requests_cache += key

    if(!_requesting_char_names)
        _requesting_char_names = 1
        sleep(20)
        var/char_name_requests = _char_name_requests_cache
        _char_name_requests_cache = null
        _char_name_response_cache = params2list(SendInterserverMessageTopic(server_address, "get_chars", list("keys" = char_name_requests)))

    else
        while(!_char_name_response_cache || !_char_name_response_cache[key])
            sleep(10)

    return _char_name_response_cache[key]


SendInterserverMessageTopic is just a simple wrapper around Export that assembles the topic string and adds some security measures.

(Edit: My first version had some issues. This one might end up returning old data if you call it for the same key twice in a row, but the data isn't likely to change that often, so that should be acceptable.)
If I follow your code correctly, I believe that should do the trick, yes.

One option you have to avoid the old data problem is to clear each key out of the cache whenever the cached value is used, assuming this doesn't get called very often for the same key. Since you only need the response cache while waiting for a response for each key, that should solve the issue.
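A minimal sketch of that clear-on-use idea, again in Python for illustration only. The `CharNameResponses` class and its method names are hypothetical; the point is just that reading a key also removes it, so a later request for the same key has to wait for a fresh response instead of seeing the old one.

```python
from typing import Dict, List, Optional


class CharNameResponses:
    """Holds batched responses only until each key has been consumed."""

    def __init__(self) -> None:
        self._responses: Dict[str, List[str]] = {}

    def store(self, batch: Dict[str, List[str]]) -> None:
        """Merge a freshly received batch of responses into the cache."""
        self._responses.update(batch)

    def take(self, key: str) -> Optional[List[str]]:
        """Return the response for `key` and remove it from the cache.

        Returns None when no response for `key` is waiting, which the
        caller would treat as "keep waiting" rather than as stale data.
        """
        return self._responses.pop(key, None)
```

This assumes each response has exactly one waiter per key; if two callers could be waiting on the same key at once, the second would need its own request or a per-key waiter count.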
Ok, so after putting that one in it doesn't appear to be pushing enough to trigger it on the save server, although people are being a bit slower than normal logging in. I still have a few other things I need to apply the same idea to, and it appears that some of the initial data from the save server coming in is registering as a DoS from the game server. I should be able to group all that initial data into a quick single request, but as far as I can tell it should only be 4 Export calls, even though they all happen concurrently.

Also, I'm not seeing the BUG: Failed to decode message anymore.
...and of course, the next time I reboot to merge together that initial information the BUG: Failed to decode message spam comes back. At least I didn't see the suspected DoS message.

Edit: Hmm, that's strange. The hostban.txt came back on both machines even though I checked beforehand for it. There's definitely no logs about suspected DoS from the save server.

Edit 2: ...but now there are? Gotta love how strange this is. There should be a maximum of two Export calls from the saves to the game server during this initial start-up when it complains about detecting a possible DoS... Hmm, maybe weird timing circumstances could cause more. The game takes about a minute to load up, so maybe all the ban check topics are piling up if people try to log in while it's busy loading the map and such...
Ok, so I think I've got an idea of WHY exactly this is happening.

The game takes a few moments to load, as I mentioned in the previous comment.

During this time, people could be trying to login -- it seems to get stuck on connecting from the client's side, at least for a bit.

Each of these login attempts probably causes a call to IsBanned, which (unless the server has direct access to the database) calls an Export out at the save server. Since these are potentially happening before the initialization of everything is complete, they happen before the game has given its information to the savefile server. Then the savefile server, since it hasn't received that information yet, sends an Export back asking for it (currently every single time until it does get some -- I'll have to fix that so it waits at least a few seconds between requests). Those requests in turn each cause another Export back to the saves from the game, and each of those (again, pretty much by an oversight) results in yet another Export back to the game.

So this initial set of ban info requests happening due to people trying to connect while the server is still loading the maps and calling New procs could potentially balloon into a giant storm of Export calls back and forth for each one, very rapidly.

I'm definitely going to fix this, most likely by just denying the login attempts or delaying them until the initial registration of the server is done.
It sounds like even if the DoS check was disabled you'd still have a storm of connections back and forth between the main server and the savefile server. Having some kind of intermediary on that would definitely be a big help in any case.
Yeah, I just fixed the cause of those. It now sets a var 1 second after the server is fully set up, and pauses IsBanned checks until it's set. In addition, the save server now only asks for server information at most once a second, and only resends the initial information if the server wasn't already listed. There's still a small issue related to the OpenPort fix that's causing it to register twice, but that's not a big issue compared to what it was doing.
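For anyone hitting the same kind of storm, the two pieces of that fix look roughly like this. This is a Python sketch rather than the actual DM, and the `StartupGate` and `RateLimiter` names, the injectable `clock`, and the method names are all made up for illustration. The gate opens a fixed delay after setup completes, and the limiter lets a request through at most once per interval.

```python
import time
from typing import Callable, Optional


class StartupGate:
    """Stays closed until a settle delay has passed after setup completes."""

    def __init__(self, clock: Callable[[], float] = time.monotonic,
                 settle_seconds: float = 1.0) -> None:
        self._clock = clock
        self._settle = settle_seconds
        self._ready_at: Optional[float] = None

    def mark_setup_complete(self) -> None:
        """Call once the server has finished loading and registering."""
        self._ready_at = self._clock() + self._settle

    def is_open(self) -> bool:
        """True once ban checks (and their Exports) may proceed."""
        return self._ready_at is not None and self._clock() >= self._ready_at


class RateLimiter:
    """Allows an action at most once per `min_interval` seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic,
                 min_interval: float = 1.0) -> None:
        self._clock = clock
        self._min_interval = min_interval
        self._last: Optional[float] = None

    def allow(self) -> bool:
        """Return True and record the time if enough time has elapsed."""
        now = self._clock()
        if self._last is None or now - self._last >= self._min_interval:
            self._last = now
            return True
        return False
```

On the game side, the ban check would wait (or deny the login) while `is_open()` is false. On the save side, the "please send me your server information" request would only go out when `allow()` returns true, which stops each incoming ban check from triggering its own request back.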

I didn't see DoS detection after fixing that.

As for the "Failed to decode message" errors: given when they appeared, it seems they only show up when the connection was blocked by hostban.txt.
Moving this to deferred, since the original poster found an acceptable workaround and no other users have mentioned an issue with the DoS detection.