ID:1904906
 
Applies to: Dream Seeker
Status: Open

Issue hasn't been assigned a status value.
So, for an experiment, I fired up Clumsy to simulate network lag and ran /tg/station13 on a local DD server.

I tested 10ms, 100ms, 250ms, 1000ms and 3000ms. (All set to lag in only one direction, because of oddities of how localhost connections work.)

Once you get to 250ms, the effect of the lag becomes noticeable, but it's not until 1000ms that you can really see the inefficiencies of the BYOND network protocol.

There are still quite a few operations that go: ask for something -> wait for the reply -> ask for more -> wait for the reply, repeating until the client has all the info it wants.

For instance, at 3000ms latency, if I run the verb jump-to-area "engineering", where that area has a lot of objects, the following happens:

3 seconds passes.
DS is told it's being moved to a new loc; it blanks out the shown turfs, then asks for turf info for its new loc.
3 seconds passes.
DS updates with turf info, but not object info; it asks for that information.
3 seconds passes.
DS updates with some of the info on objects in view; it asks for the rest.
3 seconds passes.
DS updates with some more of the info on objects in view; it asks for the rest.
3 seconds passes.
DS finally has all of the info on objects in view.

Basically, it took 5 round-trip operations to process getting moved to a new loc. At 3 seconds of lag, that's 15 seconds total; at a more reasonable 300ms, it's 1.5 seconds total.

Now, if that were pipelined, it could be as low as 3000ms (3 seconds) for the extreme example, and 300ms for the more reasonable one.
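Just to spell that math out, here's a quick back-of-the-envelope script; the round-trip count comes from the example above, not from anything measured off the protocol itself:

    # Rough cost of serialized vs. pipelined requests for the jump-to-area example.
    round_trips = 5  # from the sequence above

    for rtt_ms in (3000, 300):
        serial = round_trips * rtt_ms   # each request waits on the previous reply
        pipelined = rtt_ms              # all requests in flight at once: ~one RTT total
        print(f"RTT {rtt_ms}ms: serialized {serial / 1000}s, pipelined {pipelined / 1000}s")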

Such examples exist all over the place, so I wanted to bring this to attention as an optimization avenue.

The first step would be to reduce the assumed network state to just auth and handshaking. DS shouldn't care about whether or not it asked for info; that way it can handle the data regardless of what commands were queued. All the info the command needs should be implied by the command itself.

Take IRC for example: it is shockingly stateless. The RFC states that, to a client, sending a command and getting a reply should be fully separate. Clients should handle errors like "cannot join #channel, banned" regardless of whether they remember trying to enter that channel.
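To show what I mean by stateless handling, here's a rough Python sketch; the message names and fields are made up for illustration and have nothing to do with BYOND's actual wire format:

    # Hypothetical stateless dispatch: every message carries enough context
    # (coords, ids, reasons) to be applied on its own, so the client never has
    # to remember which requests it sent.
    def handle_message(world, msg):
        if msg["type"] == "turf_update":
            world.setdefault("turfs", {}).update(msg["turfs"])
        elif msg["type"] == "obj_update":
            world.setdefault("objs", {}).update(msg["objs"])
        elif msg["type"] == "error":
            print("server says:", msg["reason"])  # handled even if we never asked

    world = {}
    handle_message(world, {"type": "turf_update", "turfs": {(0, 0): "floor"}})
    handle_message(world, {"type": "error", "reason": "cannot join #channel, banned"})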

The second step would be to then pipeline requests.

Ask for all of the info at once, both turf and object info. Or send one request, then another, without waiting (and DD might need to be set up to send it all at once, rather than wait for the client to request more if it doesn't all fit in one packet).
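A minimal sketch of the difference with plain sockets; the request strings and the use of 127.0.0.1:31337 are just placeholders matching my local test setup, not BYOND's protocol:

    import socket

    # Hypothetical request messages; only the ordering matters here.
    requests = [b"GET_TURFS\n", b"GET_OBJS\n", b"GET_MOBS\n"]
    sock = socket.create_connection(("127.0.0.1", 31337))

    # Serialized would be: send, wait for the reply, send the next -> one RTT each.
    # Pipelined: push every request immediately, then read the replies as they
    # arrive -> roughly one RTT total, no matter how many requests there are.
    for req in requests:
        sock.sendall(req)
    for _ in requests:
        reply = sock.recv(65536)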

And that brings me to the final step: Preempt the client.

There's a good bet DD could be taught what info the client will want, so it could start sending it when it tells the client about the new loc.

The filter I used in Clumsy was:
tcp and outbound and ip.DstAddr >= 127.0.0.1 and ip.DstAddr <= 127.255.255.255 and (tcp.DstPort == 31337)

(31337 is the port I use for local DD instances when testing.)

Another note: if map threading were finished, most of the possible shortfalls of such a system (mainly in the final step) would basically disappear.
Under most circumstances, DD should actually be sending all the turf and object info ahead of time without DS having to ask for anything. (Verb parsing, however, is often another matter.) For the most part, DS's ability to ask about objs and such ought to be deprecated. I'd probably have to study a case in action to see exactly where things are breaking down, because the behavior you described is, I think, abnormal.

DS does ask for icons and sounds after the fact, if it doesn't have them already, but I expect that shouldn't come up much as those are typically included in the original resource download.
In this case, I'm using 508.1293 installed on Windows 7 64-bit, if that helps.

I have like 10 versions of the BYOND zip packages in a folder, so I could check whether this behavior changes in any of them.

(However, it gets harder to pinpoint regressions across different client/server version combos, but I'll leave that up to you.)
The typical approach to maps is basically this:

- DD sends info on a block of turfs, and tells the client to shift its internal map. (The DSified webclient doesn't keep an internal map the same way DS does, which works a little nicer.)
- DD sends info on any new areas encountered.
- DD sends obj and mob updates, in separate messages.
- Any new appearances are sent prior to their IDs in all of the above messages.
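In rough pseudocode, the intended push order looks something like this (heavily simplified, Python for illustration; these aren't the literal messages):

    # Simplified sketch of the send order on a map shift; "wire" just collects
    # what would go out, and the message names are illustrative only.
    wire = []

    def send(kind, payload):
        wire.append((kind, payload))

    def send_map_shift(turf_block, new_areas, obj_updates, mob_updates):
        send("appearances", "any new appearances used below")  # always precede their ids
        send("turf_block", turf_block)      # block of turfs + "shift your internal map"
        send("area_info", new_areas)        # any areas not seen before
        send("obj_updates", obj_updates)    # objs and mobs go out in separate messages
        send("mob_updates", mob_updates)

    send_map_shift("15x15 turf block", ["engineering"], ["obj list"], ["mob list"])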

With all that being the case, I'm stumped as to how DS could respond with requests. Even if the messages got delayed, the objects are received in such an order that DS shouldn't be freaking out about anything being missing. Very weird.
I haven't looked at the internal network transfer during this; I made some assumptions about the flow. But at some point, there is waiting happening between those steps.

It may be that the server is waiting for the client to reply or ack the data, for all I know.

Oddly enough, I only lagged data going from the client to the server, not the server to the client (at least, I think; I'll do some more testing), so this is even more confusing.

Edit: confirmed, lag is as I intended, only applied to messages going from the client to the server, not the reverse.
It's interesting data for sure. I'll have to look into it more when I have a chance.
So I've also confirmed that this is affecting the download of resources...

Even on localhost, only fake-lagging from client to server, a 500ms lag causes the resource download to take AGES. I had to pause the lag on the connection when I reverted to an older code base to test out 504 and 506.

As for the move change detection, I did some more tests with my client.view set to 35, to really see how it updated.

How it seems to work is that all turfs and mobs are shown all at once (except turfs in world.area that aren't of type world.turf).

Then world.area updates, in 2 to 4 line increments, bottom to top, updating both objs and turfs that aren't of type world.turf.

Then everything else updates, in 2 line increments, bottom to top.

Oddly enough, during the world.area update, some random tiles that aren't in world.area will update with objects.

506.1251 was the same. 504.1225 was mostly the same, but it had no separate world.area process, and the object updates seemed to happen in a random order, though otherwise still 2 rows at a time.

My main tests have been on 508.1293 running on my local computer. I also tested with somebody else connecting to my local instance, and with me connecting to the Sybil server on tgstation13.org.

And I did some tests of the Feb 2015 version of our code base (since current master uses 508 language features) using the 506 and 504 builds (for both client and server) that I mentioned above.

Generally I set my client's view to 128 (we have a verb for this; we know it rounds down to 35, but it's nice to pretend), wait for it to update, then teleport to an area of the map I hadn't seen before with lots of objects, like the brig, medbay, or engineering.

I also tried other client.view numbers, like 7 (our default), 14, 20, 25, and 32.
Isn't this what TCP does internally? You might be triggering the packet-loss handling. (Forgive my lack of technical knowledge.)
I did some research:

TCP has what's called a receive window size. This is how much data the sender can send that the receiver hasn't yet acknowledged before it has to start queuing up data to send. (This would also cause blocking/hangs on DD unless it prevents connection blocking.) Both sides of the connection have their own size that the other side has to respect.

This changes dynamically with every acknowledgement, based on current network throughput, round-trip lag, and available memory.
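You can peek at the OS-side starting point for this from Python; this only shows the kernel's default socket buffers, not the live advertised window (Wireshark is the right tool for that):

    import socket

    # Default receive/send buffer sizes for a fresh TCP socket. These are the
    # starting points the OS autotunes from; the window actually advertised on
    # the wire changes with every acknowledgement.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("receive buffer:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF), "bytes")
    print("send buffer:   ", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF), "bytes")
    s.close()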

So here's the odd part...

According to Microsoft TechNet, laggier connections are given a much LARGER window size: https://technet.microsoft.com/en-us/magazine/2007.01.cableguy.aspx

It makes sense: they want to avoid the lag slowing down the transfer rate of the connection (i.e., they want to prevent exactly this kind of thing).

And I popped open Wireshark to test: while connected from my computer to /tg/station server 2 (Sybil), the client's receive window quickly goes from 65KB to 214KB the moment I enable lag.

More than enough to send the required data at a view range of 14. I can also watch the packets go by: the server doesn't start sending more when it gets the TCP acknowledgement packet, but when it gets an 11-byte data packet from the client (BYOND network protocol command number 00 3c) or a 50- to 60-byte packet from the client (command number 00 3d).

As for the resource download:

With the client and lag simulator on my laptop, and the server on my computer, connected over a 100Mbit LAN (it's an old laptop), the TCP window size goes to 260KB. So at 3000ms lag that SHOULD be a transfer rate of about 87KB/s, but if I count how long it takes the splash screen counter to climb 0.1MB, I get 4 to 6 seconds, or 16 to 25KB/s. I can also see the client sending a 00 9f packet and then getting the next chunk of data.

All in all, everything seems to indicate that this is a pipelining issue: too much waiting for responses before moving on to the next step. That compounds the effect of lag on both the transfer rate and the responsiveness of the game.

It's just odd that this isn't supposed to be happening, according to Lum.

I really wonder if this is part of the problem with world.Export() between servers being held up by a minimum of 100ms across a local connection. Something in the networking model really seems to be overcautious about lost and malformed data.

Thanks for the interesting read, StonedOne. This really feels like the garbage collection issue that you guys discovered and we got fixed not too long ago. It seems like something that'll be very fruitful for performance if addressed.
I always welcome seeing BYOND's networking optimized even further, especially if it makes a huge difference in performance.

I'd imagine it'll vastly improve pixel movement?
The Export issue has to do with task scheduling, I'm convinced, so I don't think it's related. However, resource downloads are the one place where one-way lag makes sense to me as a problem, because file transfers are chunked and wait for a response before sending the next chunk. This, I feel, is something I can improve with much larger chunk sizes, or even dynamically increasing chunk size.
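Roughly, that would mean keeping the ack-per-chunk model but letting each round trip carry more, something like this sketch (Python for illustration only; not how the actual transfer code is written):

    # Hypothetical sender that doubles the chunk size after every acknowledged
    # chunk, up to a cap, so high-latency links waste fewer round trips.
    def wait_for_ack(sock):
        sock.recv(1)  # stand-in for whatever ack message the real protocol uses

    def send_file(sock, data, start=32 * 1024, cap=1024 * 1024):
        chunk, offset = start, 0
        while offset < len(data):
            sock.sendall(data[offset:offset + chunk])
            wait_for_ack(sock)
            offset += chunk
            chunk = min(chunk * 2, cap)  # 32KB, 64KB, 128KB, ... up to 1MB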

For the map atoms, I'll have to do more investigation.
For resource downloads, send it all in some standard chunk size, one chunk after another, until the send buffer fills, and let TCP and BYOND's already-built-in sequence number and checksum systems handle any packet loss. TCP will detect and retransmit in 99% of packet loss/corruption cases.

Better yet, split the resource into files and send them all at once, one after another, and anything lost can be handled by the normal resource system of downloading it when it's needed.
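As a sketch of the "just keep the pipe full" version (plain Python sockets; TCP already guarantees ordering and retransmission, so there's nothing extra to do at this layer):

    # Push the whole file through the socket in fixed-size pieces, back to back.
    # sendall() only blocks when the OS send buffer is full, so TCP's own window
    # and retransmission handle all the pacing and loss recovery.
    def stream_file(sock, path, chunk_size=64 * 1024):
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                sock.sendall(chunk)
        # a single "did you get it all?" exchange can still happen at the end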

Why work?
BYOND handles file downloads as pass/fail IIRC, though maybe that can be fixed on a case-by-case basis. This is definitely an area for improvement, so I'll look into it.
How it should go is: send the entire contents of the file to the client, split up into 1024-byte packets but pipelined, and then ask the client if they got it and wait for a response.

This is actually a case where blocking connections might be useful, because you can check whether a send would block and know the buffer is full, basically offloading the window sizing to TCP (because we both know it's gonna do it better and more effectively).
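Roughly what I mean, sketched with a non-blocking socket (Python stand-in; the real code would obviously live in DD):

    import select

    # Hypothetical sender: stuff data into the socket until the OS send buffer
    # is full (i.e. TCP's window says "stop"), then wait for it to drain.
    def pump(sock, data):
        sock.setblocking(False)
        sent = 0
        while sent < len(data):
            select.select([], [sock], [])        # block until writable again
            try:
                sent += sock.send(data[sent:])   # send() reports how much fit
            except BlockingIOError:
                pass                             # buffer still full; wait and retry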

Every time you wait for a reply, you get a transfer speed slowdown that is greater than you realize. Say there is only a 150ms round-trip ping, the connection can transfer at 3MB/s (an average 25Mbit/s residential cable connection), and you set the send size to 32KB.

That's still only 213KB/s max transfer speed. Raise it to a 64KB chunk size, and you get 426KB/s.

To max that 3MB/s connection out at 150ms latency, you'd need a chunk size of roughly 450KB.

At 200ms it would need to be around 600KB, and all I can think about as I calculate this out is that there is literally a system built into the network stack to figure this out, and it's much smarter about it than my calculations are.
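Spelled out (stop-and-wait throughput is roughly chunk size over round-trip time, and the chunk needed to fill a link is bandwidth times RTT; I'm treating 3MB/s as 3000KB/s for round numbers):

    # Stop-and-wait: one chunk per round trip, so throughput ~= chunk_size / RTT.
    def throughput_kb_s(chunk_kb, rtt_ms):
        return chunk_kb / (rtt_ms / 1000.0)

    for chunk_kb in (32, 64):
        print(f"{chunk_kb}KB chunks at 150ms -> {int(throughput_kb_s(chunk_kb, 150))}KB/s")

    # Chunk size needed to keep the link busy = bandwidth * RTT (bandwidth-delay product).
    for rtt_ms in (150, 200):
        print(f"filling 3000KB/s at {rtt_ms}ms needs ~{3000 * rtt_ms // 1000}KB in flight")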

Wouldn't larger packets make more sense, so the network layer could split packets if need be?
Yeah, I kinda pulled 1024 out of my ass, but TCP tends to get split into frames of 1160 bytes, so it makes sense, kinda. Actually, I'll have to google the overhead on a packet; I think that might leave a touch less than 1024 left for the payload.

All in all, you get the idea: pipeline, pipeline, pipeline.
1392 bytes AFAIK is a safe value for a max packet send. PPPoE tends to have an MTU of around 1400 and normal Ethernet MTUs tend to be 1500. Past that you automatically get fragmentation on most residential routers unless they're specifically configured AND the ISP supports larger frames.
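For reference, the leftover payload per segment works out like this (minimum IPv4 and TCP headers are 20 bytes each; TCP options such as timestamps shave off a bit more):

    # Payload per TCP segment = MTU - IPv4 header (20) - TCP header (20), ignoring options.
    for mtu in (1500, 1400):
        print(f"MTU {mtu}: up to {mtu - 20 - 20} bytes of payload per segment")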