ID:2117899
 
Resolved
When a very very large number of objects per turf was in play, updates caused by moving around the map could slow the client down at those times, causing stuttering. Optimizations made to the client and server, and additional info in the message format, will improve this behavior.
BYOND Version:511.1349
Operating System:Windows 7 Ultimate 64-bit
Web Browser:Chrome 51.0.2704.103
Applies to:Dream Daemon & Dream Seeker
Status: Resolved (511.1353)

This issue has been resolved.
Descriptive Problem Summary:
Let me set the stage:

world.fps = 20
world.tick_lag = 0.5 (50ms)
client.fps = 40
client.tick_lag = 0.25 (25ms)
client.glide_size = 0
mob.glide_size = 4
world.icon_size = 32

So, logically, the mob should glide 2px every client tick right? And it should take it 4ds regardless to get to the next tile, right?

nope! Its 1px and 8ds to get to the next tile!?!?

I can confirm that on the client its 1px and not the 2px it's suppose to be, because if i have the code display a message exactly 4ds (icon_size/glide_size = 8, 8 * tick_lag = 4ds) later, the mob will be exactly halfway to the tile, not all the way like it should be.
I'm not able to reproduce this in 511.1350. I was concerned that world.fps being higher also might have some kind of impact on the calculations, but I can see the calculations being done correctly when I look at this in the debugger.
Lummox JR changed status to 'Unverified'
Are you testing this with tile glide movement and not pixel movement.

edit, holy fuck, ya, not reproducing, let me load up the 511 branch of ss13 and see if its fixed there

edit2, Still doing it on the ss13 branch. I'll dig around and see if i can't figure out why, but my guess would be something resetting glide_size back to 0 before it's suppose to.
Yes, for clarification I did definitely test this with regular gliding, not pixel gliding.
I'm stumped, It does it on the ss13 code, but not when reproduced.

I tried to replicate as much of our movement code as i could that relates to delays and assignment of mob.glide_size, and confirmed nothing is fucking with glide_size outside of the pre-move assignment.

after work i'll give this another go, either something is messed up in our code, or some odd edge case we hit is triggering this.
Looks like i might have got this to reproduce, kinda.

I didn't think to just test world.fps = 40 and client.fps = 0, doing so it happens in ss13 as well, so this isn't pixel gliding, and this could most likely be not even a 511 issue.

Knowing that, I put up a test project that is a bit more "busy" in terms of items on screen, and it reproduced. (its the same test project for the client shuttering bug, with ss13's movement code thrown in, and the new dynamic glide size code)

Oddly enough, the client's cpu isn't getting anywhere near 100% of a core, at 0.5 second sampling rate

So here is where it gets even more odd: It only reproduces while holding an arrow key down, even if that arrow key doesn't result in a move. if in ss13, I make my mob drift thru space, the glide is normal while not pressing an arrow key, then the shuttering starts again while pressing an arrow key, (as long as there are enough other objects in view). (while in space, client/Move() ignores the move command if there isn't anything to grab on to/push off of near you) The glide itself changes as well it seems, but I can't quite quantify that.

edit: when holding an arrow key down, it seems to act like the mob's glide_size is 0, when not holding it down, it acts like it's the assigned one.

I'll dig around to see if i can figure a cause from our game code, and if making that not use north()/south()/etc fixes it.
ok, so i never commented on this after doing more testing, thought i did.

Basically, it only happens if there are a lot of objects on the screen AND: Only if you use the default .north/.south verbs.

Test case: https://tgstation13.org/msoshit/shuttershow.zip

Hit tab to switch between north/south verbs and my custom ones.
default is custom. white input bar means default, grey means custom.
And this is the case in 511.1352 as well?
yes.

I had to re-test it to make sure i found the right test case folder, and i'm on 1352 now.

edit: I should point out, that i only tested via hosting via dd, so no idea if ds hosted does it.
Good to know. I'll run some tests later and see what I can find.
So far I'm seeing identical behavior whether client.fps is set or not, and regardless of which verbs are in use. This is just in DS, not DD+DS which I haven't tried yet.

What I'm seeing is that the glide appears to be happening faster than it should. I'll have to check in the debugger to confirm. glide_size is set to 4, and with world.fps=20 that should indeed finish in 0.4s. But it glides, stops, glides, stops, even though Move() is reporting that it's starting on the right tick pretty consistently. With the times Move() is reporting, I would expect the glide to be continuous.
Ya, I noticed that after more extensive testing, I mentioned it in passing in one of my longer comments.

still a bug, just not a client fps bug like i thought. (I haven't tested it with 510)
I believe you're going to find that 510 behaves the same, because the changes to gliding don't appear to be relevant to what's happening here.

If I change your code so that a move north or south takes two steps, timed by a sleep, I'm seeing that the first glide never actually completes on time; there's a delay during which no map updates happen, and then it starts working again. When I drilled down into the debugger, I found that it's encountering a delay.

1003571281: Draw map 118.279999
1003571359: Draw map 119.059998
1003571421: Draw map 119.690002
1003571484: Draw map 120.309998
1003571578: Draw map 121.250000
1003571781: Coords 0122AF40 changed to 1408,1888
1003571781: Draw map 123.279999
1003571875: Draw map 124.220001
1003572125: Coords 0122AF40 changed to 1408,1920
1003572140: Draw map 126.870003
1003572218: Draw map 127.650002
1003572296: Draw map 128.440002
1003572375: Draw map 129.220001
1003572437: Draw map 129.839996
1003572531: Draw map 130.779999
1003572609: Draw map 131.559998
1003572703: Draw map 132.500000

That was recorded in DS running in remote mode, with DD running the server part. I'm not terribly happy with the timing values I'm seeing here as far as consistency, but the main thing to notice is that there's a significant skip happening when the mob moves. Here's the really weird part: Notice the timestamp on the left for the debug output. The "coords changed" message from the mob moving is coming in after whatever is causing the skip, even though the skip is obviously correlated to the mob moving.

I did a little more digging and tracing the message types and times received, and I found that the OBJMAP_INF message is being sent 11 times every time the mob moves. The sizes I was getting for those messages were 611, then 4x 609, 5247, 4x 5109, and finally 2406. In other words, a crapload of data is being sent because of the very large number of objects. DD limits how many obj/mob updates it will send in a single message; it's not so much to save memory as it is for some structural reasons at this point, though I'd eventually like to remove that restriction. (Some of the structural issue is built into the message format itself.) These messages were spaced out along the delay.

Thinking that DD's limit-of-N behavior on those messages might be partially responsible for the long delay, because it has to re-traverse parts of the mapped object list in order to fill the messages, I did some experimenting by upping the value of N--even though I still suspected the problem was also that it simply took more time to handle the sheer number of objects. When I tested further and increased N to near its limit, I found there was still a stutter although the numbers were a little bit better. (The distribution of data in those OBJMAP_INF messages stayed proportionally the same, which I found super odd. I would expect the first messages to be the biggest, but that isn't the case. I could study that question further, but I doubt it's worth the effort.)

Interestingly I did see one minor optimization issue on the server, in that x and y coordinates are sent in the message rather than sending a simple turf ID and letting the client do the job of splitting that up into x,y,z. But on the whole that's probably not hurting anything.

Finally I got to wondering if the delay was strictly on the client end, and it might relate to how the messages are processed afterward. Specifically, the client keeps a really convoluted set of data structures that holds a list of objs and mobs on each turf, within the normal turf range that is. Managing those data structures takes some amount of time, and I found that they were being handled every time an OBJMAP_INF message was received.

As a test, I added a value to the message that says whether more messages are on the way, and then on the client end told it not to do that post-processing if more messages were expected. As a result, the stutter dropped dramatically--though not completely.

In the final analysis, this isn't a glide problem and isn't a beta issue; it's a problem related to the very very very large number of objects in play in that demo. My optimizations however should be nice for all games where the number of objs/mobs sent in a tick exceeds the old per-message limit, so those optimizations will end up in the next version (511.1353).
Lummox JR resolved issue with message:
When a very very large number of objects per turf was in play, updates caused by moving around the map could slow the client down at those times, causing stuttering. Optimizations made to the client and server, and additional info in the message format, will improve this behavior.
Any idea why changing the arrow keys to not use the built in movement verbs was making it much better? (incase you didn't see the line about using tab to switch between the two)
I tried switching between the two and I saw absolutely no difference. However it's possible that the fact that .north and .south end up stacking differently in the queue (because your verbs are instant, and those are not) might make some kind of difference in terms of timing.
thats.... interesting. I saw a massive difference. the glide was almost perfect on the custom verbs, and shuttered massively on the default verbs.

Lower the 200 number near the bottom of the file (it has a call out for you) to reduce the number of objects spawned, say 100 or 50.

Maybe it's getting so high for you that it shutters in both.

edit: ah, ya, commenting that out does make them identical. It also still happens on my laptop if i lower the var/i amount massively, to like 50 (50% chance to spawn done 5 times) (i had it set to spawn so many objects initially because i wanted to make sure it happened to you =P)
this fix seems like a pretty big deal, nice
+MrStonedOne> its about client render preformance
incoming> well yes that is good on all fronts
incoming> but now I reseated my heatsink for nothing

Incoming's laptop thanks you lummox.