ID:114665
 
BYOND Version:486
Operating System:Windows 7 Home Basic
Web Browser:Chrome 12.0.742.100
Applies to:Dream Seeker
Status: Unverified

Thus far we've been unable to verify or reproduce this bug. Or, it has been observed but it cannot be triggered with a reliable test case. You can help us out by editing your report or adding a comment with more information.
Descriptive Problem Summary:
The new re-connection thinks you're disconnected way too fast. If the server is experiencing some lag, it gives everyone the message (I'm DEFINITELY sure it isn't the server disconnecting people; it's BYOND itself).

Numbered Steps to Reproduce Problem:
1. Log in to a BYOND game that may have lag spikes every now and then.

2. Randomly get some lag spikes.

3. Get the reconnecting in 10 seconds message.

Expected Results:
Riding through the lag instead of randomly disconnecting everyone all the time.

Actual Results:
reconnects some people on the server

Does the problem occur:
Every time? Or how often? almost every time, it depends how laggy the game is of course.
In other games? Yes
In other user accounts? Yes
On other computers? Yes

When does the problem NOT occur?
Before the "reconnecting in 10 seconds" BYOND update.

Did the problem NOT occur in any earlier versions? If so, what was the last version that worked? It didn't happen in V484 (I think) and any BYOND before that.

Workarounds:
Get annoyed by the stupid reconnect message and let it reconnect you.
We'll need examples of some games that have the issue in order to run some tests, but I'm 99% certain there is no bug here. The games that you think aren't disconnecting people probably are; the code for reconnection is only activated in the event of an unexpected disconnect.
You can try byond://108.11.215.86:12201 which is Naruto GOA. It happens there, and it's only been doing the reconnect thing since a few updates ago. Also, when reconnecting to the game normally, the timeout before it does the 10-second thing is really short, and it keeps doing it.
It's possible you may have been seeing the behavior documented in issue 3191. Please let us know if this still occurs in 487 and if you have a test case to clearly demonstrate the problem.
I can verify this behavior. The last working BYOND version, which does not seem to suffer from it, is 485.

486 - 488 all have the issue. During initial connection, after roughly 10 - 90 secs (varies) connection is dropped. Nothing seems to be wrong, I checked logs and there is simply nothing.

During reconnection, seeker reports a read error followed by a connection failed message.

Rarely it may happen a second time, but if connection survives for at least 3 minutes, it seems the disconnection problem does not happen anymore.

I thought maybe 488 fixed it with mention of the DmiFontsPlus library but the issue is still there. :(
A read error in the connection is unlikely to be related to any of the new features. The only thing I can think of as relevant in 486's notes is that there was a change to browse_rsc(), but that should have zero impact on the actual network data, which is where a read error would occur.

It would help to know what specific games or servers you saw this in, so we could narrow down if it only happens in specific games and perhaps find out if certain ones were responsible for the issue.
I exported a Wireshark log (TXT format) which I'll send to you in email. But to summarize the consistent behavior, I only get hit under one of two conditions:

- At least 4 hosting hours pass after issue, and seeker reconnects.
- On initial dream daemon startup and trying to connect.

After about 90 seconds, the observed packets show:
DD: (FIN, ACK)
DS: (ACK)
DS: (FIN, ACK)
DD: (ACK)
DS: (SYN, ECN, CWR)
DD: (SYN, ACK, ECN)
DS: (ACK)
DD: (FIN, ACK)
DS: (ACK)
DS: (RST, ACK)

The exchange is very quick and disconnection happens internally without warning even when the connection is idle. I hope that maybe others might do some testing as well and you collect enough data to identify where this nasty critter is hiding.
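For anyone else reading the flag summary, here is a minimal sketch of how it can be read mechanically to see which side started each close. It assumes the ten lines above are complete and in order, that DD is Dream Daemon and DS is Dream Seeker, and that a bare SYN marks a new connection attempt. The function name is made up for illustration.

```python
# Flag sequence copied from the capture summary in this post.
# DD = Dream Daemon (server), DS = Dream Seeker (client).
CAPTURE = [
    ("DD", {"FIN", "ACK"}),
    ("DS", {"ACK"}),
    ("DS", {"FIN", "ACK"}),
    ("DD", {"ACK"}),
    ("DS", {"SYN", "ECN", "CWR"}),
    ("DD", {"SYN", "ACK", "ECN"}),
    ("DS", {"ACK"}),
    ("DD", {"FIN", "ACK"}),
    ("DS", {"ACK"}),
    ("DS", {"RST", "ACK"}),
]


def close_initiators(packets):
    """Return the sender of the first FIN or RST in each TCP connection.

    A SYN without ACK is treated as the start of a new connection.
    """
    initiators = []
    closed = False
    for sender, flags in packets:
        if "SYN" in flags and "ACK" not in flags:
            closed = False  # new connection attempt begins here
        elif not closed and ("FIN" in flags or "RST" in flags):
            initiators.append(sender)
            closed = True
    return initiators


print(close_initiators(CAPTURE))
```

Read this way, the first FIN comes from the Dream Daemon side in both the original connection and the reconnect attempt. That only says which host closed the socket first, not why it did.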
I'm not sure the packet exchange can tell me much of anything. Assuming there is a bug here, the only way for me to get to the bottom of it is to isolate the behavior that causes the network issue, presumably caused by something in 486. So for instance I'd need to know more about what kinds of procs the server was using, whether browse_rsc() was in play, etc.
Ok just compiled the simplest of all programs:
mob/Login()
	src << "Hello world"


Hosted on the linux box, opened port 7777, and waited.
After 16 - 17 seconds I was disconnected. Output is as follows:

Hello world
Connection failed.
Reconnecting in 10 seconds...
Network connection shutting down due to read error.
Connection failed.
Interesting. Did the server output any error messages from that run?
No, Lummox, the log is dead empty:

ghtry@aondcrey:~/host/hello$ cat hello.log
ghtry@aondcrey:~/host/hello$ stat hello.log
File: `hello.log'
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 801h/2049d Inode: 9330888 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ ghtry) Gid: ( 1000/ ghtry)
Access: 2011-07-12 12:01:21.000000000 -0500
Modify: 2011-07-12 11:43:03.000000000 -0500
Change: 2011-07-12 11:43:03.000000000 -0500

The version of byond this was tested on is the latest 488.

Server uname -a reads:
Linux aondcrey 2.6.32-5-686 #1 SMP Wed May 18 07:08:50 UTC 2011 i686 GNU/Linux
A little update,

I tested versions 486, 487 and 488 on my wife's 32-bit Windows 7 machine, and I was never disconnected. The issue seems to be Linux-related. Maybe it's the network profiling code or something similar, but from my testing here it is definitely a Linux hosting issue.

The disconnection is somewhat random but if needed I could try my best to pinpoint which byond version is causing the issue. I just figured I'd mention that when hosting on the windows dream daemon, the issue does not exist.
Curious. I don't think any of our UNIX networking code has changed in quite some time, but I'll refocus on what networking stuff might be different in 486 (I doubt any is though).

[edit]
486 changed absolutely nothing in networking, except in our Flash code.
I got your packet log in my email finally and have been going through it, but it's hard to tell what it's saying. Unfortunately because the packet logger is including the TCP/IP headers, it's more difficult to diagnose the actual content.

If I'm reading the log correctly, the server's last transmission is a 515-byte message (actually it's six BYOND messages) in frame 55. The client is then sending a 0-byte response in frame 56, which looks like just an ACK.

The only thing I'm really seeing as a possible clue here is that in frames 57-60 there's some DHCP activity going on, which would tend to suggest one of the machines--I assume the server--is being reassigned to a new IP address. That would make perfect sense if you were on a wireless connection that got interrupted. Otherwise I'm not sure what would trigger the DHCP request but I feel it's pretty likely that's what's killing your connection.
This puzzles me and I must respect your reasoning. What I can not understand is why earlier versions seemed to work.

The network here has a very interesting setup which previously never caused any issues. The computers behind the router have static IPs but indeed running the packet logger for well over an hour with nothing else running, there is an endless stream of DHCPv6 protocol packets.

I'll investigate where this is coming from. I can't read the v6 addresses, so I am unsure where the packets are coming from or going to. The only place DHCP is used is between the router and the modem, for the internet connection. However, I don't feel the DHCP broadcast should even be showing up in the packet logger.

I am still leaning toward it being related to the Linux side of the code, but I am not requesting anything extra from you until I can be absolutely sure that it is nothing on my end. If I can kill the DHCP broadcasts (they should not even be there) and the issue still happens, then it is not a changed IP (which would indeed kill a connection). Hopefully anyone else experiencing issues will bring them forward.

I'd be curious to know maybe if those Naruto servers and such mentioned have linux hosts. I am definitely having issues here and need to eliminate them.
The DHCP broadcast could also be a symptom of a broken connection instead of a cause. In that case killing those broadcasts wouldn't prevent the problem.
I am not ruling myself out, but killing the broadcasts was easily accomplished by turning off the windows 7 DHCP Client service.

With an idle packet logger, I retried the hello world app, waited patiently, and after just over a minute I was disconnected again. But a connection to Dream Daemon hosted on Windows does not have this problem at all.

Rerunning the test, I was cut off from the Linux server after 63 seconds. I wish I could fix it, but I have absolutely nothing showing that it is an issue on my end. I went to the extent of checking the entire array of Linux logs by hand and came up empty-handed.

I ran some ping and connection testing tools from the debian repository which show nothing there either. This is getting really frustrating. :(
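One more check that might help separate a network-path problem from a BYOND one: hold a plain, idle TCP connection open to some other service on the same Linux box and time how long it survives. The sketch below assumes there is such a service to connect to (an SSH port, for example). The function name, host and port are placeholders, and a raw socket like this does not speak the BYOND protocol, so pointing it at Dream Daemon's own port would not be a meaningful test.

```python
import socket
import time


def seconds_until_closed(host, port, limit=300.0):
    """Hold an idle TCP connection and time how long the peer keeps it open.

    Returns the elapsed seconds when the peer closes or resets the
    connection, or None if it is still open after `limit` seconds.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=10) as sock:
        while True:
            remaining = limit - (time.monotonic() - start)
            if remaining <= 0:
                return None
            sock.settimeout(remaining)
            try:
                data = sock.recv(4096)
            except socket.timeout:
                return None  # still connected when the limit ran out
            except ConnectionResetError:
                return time.monotonic() - start
            if not data:  # an empty read means the peer sent FIN
                return time.monotonic() - start

# Example call (hypothetical address and port):
# print(seconds_until_closed("192.168.1.10", 22, limit=180.0))
```

If an idle connection to another service on that machine also drops after a minute or so, the router or the host's network setup becomes the more likely suspect. If it stays up while Dream Seeker still gets cut off, that points back at Dream Daemon on Linux.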
I downloaded gdb via:
apt-get install gdb

This is the GNU Debugger. When I run it for the test program with the appropriate parameters:
gdb --args /usr/local/bin/DreamDaemon (params)
it reports that the executable (DreamDaemon) has no debugging symbols.

Does anyone know a linux program I can run DreamDaemon through to get better info than what has already been presented? I really don't want to host on older unaffected versions as my last resort. Thanks in advance.
I stand corrected: the last working version of BYOND with no issues is 480.1088, which is the version I may stick with for a very long time. The issue has disappeared completely on 480, but 481 and later suffer from it.

I am not sure why, but the issue only affects Linux hosts. It may not affect all Linux hosts, as the only Linux I use is Debian, because I like the package manager and ease of use.
The only changes to the network code that I see in 481 revolve around how the network responds to read errors, and what happens to clients when the server goes through a normal shutdown. None of the changes I see are capable of causing a disconnect that wasn't already in progress.