ID:120247
 
Resolved
The receiver of world.Export() was sometimes sending back messages that caused the sender to shut down prematurely. The receiver now avoids sending such messages, and the sender is more tolerant of them.
BYOND Version:493
Operating System:Windows 7 Home Premium 64-bit
Web Browser:Firefox 7.0.1
Applies to:Dream Daemon
Status: Resolved (494)

This issue has been resolved.
Descriptive Problem Summary:
Ping topics (world.Export("byond://[ipPort]?ping")) have become unreliable, possibly because my network is under stress. Even after establishing a supposedly persistent connection, ping topics often return null. They do so almost instantly, which suggests that the connection is not timing out but that something is failing somewhere in the communication process. Attempting to use this method to contact remote servers almost never returns an actual result. With success rates this unreliable, the method is of little use for determining whether servers are "alive". If topics aren't getting through at all, that is even worse in situations where vital information needs to be distributed. (I haven't been able to fully test this yet, but topics in general do seem to always register, even ones sent from external sources.)

On a semi-related note, it appears that IPs and ports have been removed from the &format=text information available on hubs (if they were ever there to begin with?), and the server "url" provided there is useless for sending data to.

Numbered Steps to Reproduce Problem:
Attempt to acquire a "ping" from a BYOND server using world.Export("byond://[ipPort]?ping")

Expected Results:
A ping is always returned if the server is alive and the connection can be made

Actual Results:
The ping never has a 100% success rate, even on locally hosted servers (contacted using the external IP), and the success rate is almost 0% on remote servers.

Does the problem occur:
Every time? Or how often? Every time when hosting on my network
In other games? Yes
In other user accounts? Most likely
On other computers? Network stress may be relevant; it used to work on my dedicated server

Workarounds:
Unknown
Works fine for me. I never used byond:// in world.Export(); I wonder if that has any effect.
After some more testing, it appears that this is tied to the load (CPU and/or network usage) on the BYOND server being pinged. Pinging a lower-population server not only has a much higher success rate, but also returns almost instantly. Pinging one with more players not only takes longer to return a "ping", but the majority of the time it just fails outright. Two of the servers I tested were even hosted on the same connection, yet yielded completely different results.

Based on the long-running logs kept by Server Manager, Server X (most populated) and Server Y (empty, but stressed by AI) fail on a regular basis, while the 18 other servers rarely fail a ping. In my tests just now, Server A (locally hosted, 15 online) had a 100% success rate, Server B (locally hosted, 30 online) had about a 50% success rate, and Server C (remotely hosted, 100 online) had a 10% success rate at best.

I still haven't tested whether a topic is received by the server 100% of the time, even when a "ping" isn't returned.

Ocean King wrote:
Works fine for me. I never used byond:// in world.Export(); I wonder if that has any effect.
I tried removing the byond:// as suggested. It didn't seem to make much of a difference.
I just shut down all of my servers and ran another test. It definitely doesn't seem to be related to sender stress, but it does seem to be affected by receiver stress. Even with everything offline and full network speed available, it still got awful results:
21 Attempts: 109 Players
8 Attempts: 108 Players
14 Attempts: 108 Players
21 Attempts: 108 Players
3 Attempts: 108 Players
38 Attempts: 110 Players
1 Attempt: 111 Players
26 Attempts: 111 Players
50 Attempts: 111 Players

I'd also like to point out that this isn't much of a concern for the actual purpose of retrieving a ping value; the concern is what happens when that ping value supposedly marks the server as offline. Even more importantly, the topic data may not be reaching the servers at all. If servers don't receive the necessary information the first time, every time, it will be the downfall of my global distribution systems.
I pinged several of the most populated games/servers on BYOND and got varying results. Some always returned in 1 attempt, some took a few tries (even a single failed attempt would be problematic), and some responded as poorly as my own.
I have done some more testing. Apparently the topic is received by the server every time, even though a ping value is not properly returned in the majority of cases.

This at least means that I can go ahead with my systems for the most part, since the data should hopefully always be delivered. I'll just need to come up with a more consistent method of determining when servers go offline, possibly a ping system that I build myself...

EDIT: After more usage, it appears that the topics aren't received every time after all.
A proper way to ping players would resolve this.
SuperAntx wrote:
A proper way to ping players would resolve this.

Assuming that method works any more reliably than the currently built-in one does... It wouldn't necessarily tell me whether a specific BYOND server was online, only whether I could contact that IP in general.
What I linked was a way to measure the round-trip time for messages sent over a network. If you use it to ping a server and it doesn't time out wouldn't that meet your requirements for testing if a server is there or not?
SuperAntx wrote:
What I linked was a way to measure the round-trip time for messages sent over a network. If you use it to ping a server and it doesn't time out wouldn't that meet your requirements for testing if a server is there or not?

Just because I can ping Joe's computer doesn't mean that he is hosting a server of my game, unless I could ping an IP:Port instead of just an IP address. Even then, I'm not sure that would necessarily indicate the presence of a BYOND server on that port, rather than just the general ability to establish a connection through it.
Even on my new server, which is surely not under stress, this still fails on a regular basis. So maybe stress isn't as relevant as I had thought.
After more usage, it appears that world topics in general just aren't received consistently, which makes them completely worthless. Even messages sent from a PHP script on a web server aren't received reliably, let alone BYOND server-to-server communication.
This would explain why my save server randomly produced corrupt or unsaved files for players of a game I was testing a while back. It used a topic to set up the connection and then sent the savefile. In a lot of cases no file was sent, even though server stress was low. I'd say 20% of all the savefiles transferred to the server either didn't make it or arrived corrupt, even though both instances were running locally.
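One way to tell "my game is running on that IP:Port" apart from "something answers on that port" is a game-specific topic with a fixed reply. The sketch below assumes a hypothetical topic name ("heartbeat") and reply string ("mygame-alive"); neither is built into BYOND. It overrides world/Topic() and falls back to the default handling for everything else, including the built-in "ping".

```dm
// Hypothetical game-specific heartbeat; "heartbeat" and "mygame-alive"
// are illustrative names, not built-in BYOND topics.
world/Topic(T, Addr, Master, Keys)
	if(T == "heartbeat")
		// A fixed, game-specific reply shows that this game answered,
		// not just any process listening on the port.
		return "mygame-alive"
	// Everything else (including the built-in "ping") gets default handling
	return ..()

// Sender side: true only if the remote world returned the expected reply
proc/IsMyGameAlive(address)
	var/reply = world.Export("byond://[address]?heartbeat")
	return reply == "mygame-alive"
```

This still travels over world.Export(), so it would not work around a transport-level failure like the one in this report; it only makes a successful reply more meaningful.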
Are world topics tied to the website/pager systems?
Lummox JR changed status to 'Unverified'
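mob/verb/Ping()
	// Address -> label for each server under test
	var/list/IPs=list("70.34.194.66:2734"="Blank","70.34.194.66:4224"="HU1","67.228.100.130:26400"="SS13","67.60.236.70:2734"="RPU")
	for(var/thisIP in IPs)
		world<<"Ping testing [thisIP]..."
		var/gotPing = null
		var/tryCount=0
		// Retry until a non-null reply arrives; there is no upper bound,
		// so a server that never answers keeps this loop running
		while(gotPing==null)
			tryCount+=1
			// Third argument (Persist=1) keeps the connection open between attempts
			gotPing=world.Export("byond://[thisIP]?ping",null,1)
		world<<"<b>[IPs[thisIP]] pinged in [tryCount] tries ([gotPing])</b>"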


Within that test:
HU1 averaged about 9 tries before returning a result; it sometimes takes over 100 retries
Blank (which is just an empty project hosted on the same machine as HU1) always returns in 1 try
SS13 averages about 15 retries
RPU always returns in 1
All of these occasionally succeed on the first try, but the ones noted above usually fail.
I tried some other servers as well. NEStalgia never seems to return at all, and Dead World gives some really weird results, causing the test server to output blank messages even though it has no reason to...
NEStalgia might be blocking pings or the default world topics for all I know, so I can't say one way or the other on that. The thing is, I think I need a server that actually experiences the problem to see it in action.
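Because the test verb loops until it gets a reply, a server that never answers (as with NEStalgia here) keeps it spinning indefinitely. A bounded variant is sketched below; the proc name PingWithRetries and the default of 10 attempts are hypothetical choices, not part of BYOND.

```dm
// Hypothetical helper: like the test verb, but gives up after max_tries
// attempts so an unresponsive server can't hang the caller.
proc/PingWithRetries(address, max_tries = 10)
	for(var/i = 1 to max_tries)
		// Persist=1 reuses the connection between attempts
		var/reply = world.Export("byond://[address]?ping", null, 1)
		if(!isnull(reply))
			return reply
		sleep(1) // wait 1/10 second before the next attempt
	// Every attempt failed: "possibly offline", not proof that it is down
	return null
```

A caller would use something like `var/r = PingWithRetries("70.34.194.66:4224", 5)` and treat a null result as a prompt to check again later, rather than as a definite shutdown.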
In response to Lummox JR (#16)
Lummox JR wrote:
NEStalgia might be blocking pings or the default world topics for all I know
Probably

The thing is I think I need a server that actually experiences the problem to see it in action.
If you still have the host/source files for Box Zombies, you could probably test with those. It doesn't fail quite as often, and has a low retry count when it does, but it does seem to fail, and even a single failure completely kills the system.
Good news! I was actually able to figure out the cause of this one. It's happening at both ends, and it's partly due to all the flicking. The receiver was sending out some messages the sender didn't care about, and the sender was freaking out. I stopped the responsible message from being sent in inappropriate cases, and the sender will now be less sensitive to unwanted messages.
Lummox JR resolved issue with message:
The receiver of world.Export() was sometimes sending back messages that caused the sender to shut down prematurely. The receiver now avoids sending such messages, and the sender is more tolerant of them.