ID:1288063
 
BYOND Version:499
Operating System:Windows 7 Pro
Web Browser:Firefox 21.0
Applies to:DM Language
Status: Verified

A member of our crack team of bug testers has verified that this issue is reproducible, and has handed it off to the development team for investigation.
Descriptive Problem Summary:
When using step_to, get_step_to and walk_to if AI mobs can not actually get to you because you're surrounded by other mobs, CPU usage will increase drastically, this does not occur in 489 or earlier versions.

I've also noticed that they'll continue walking towards you, while in version 489 once the path is blocked they'll stop moving altogether.


Numbered Steps to Reproduce Problem:
1. Run the example below with 498/499/490+
2. Notice CPU usage and mob movements after they've surrounded you.
3. Run example in 489 - notice difference.

Code Snippet (if applicable) to Reproduce Problem:
Example can be downloaded here: https://dl.dropboxusercontent.com/u/51176364/ scratch-step_to.zip

Expected Results:

Actual Results:

Does the problem occur:
Every time? Or how often? Every time.
In other games? Yes.
In other user accounts? Yes.
On other computers? Yes, tried 2 computers.

When does the problem NOT occur? BYOND version 489 or earlier versions.

Did the problem NOT occur in any earlier versions? If so, what was the last version that worked? (Visit http://www.byond.com/download/build to download old versions for testing.) The problem did not occur in 489, I've also tried 490, 498 and 499 and the problem did occur with those versions.

Workarounds: Use BYOND version 489 or don't use those functions.

It increases because it is unable to find a path and so it is trying again, as fast as possible.

You have to detect for this yourself (just don't even use walk_to, use step_to so you can make checks in between calls inside a loop).

This is more like a feature request, where we request to have better built-in pathfinding.

Edit: Didn't notice there was different behavior in 489, so that is interesting. Can we have a developer comment on that change?

Fancy Feature Request Suggestion: Have internal pathfinding call something we can overwrite when movement is blocked...

mob/enemy
proc/chase(mob/m)
chasing = m
begin_chase()
proc/begin_chase()
while(chasing)
step_to(src, chasing)
sleep(1)
StepBlocked(atom/a)//called with pathfinding is blocked
//a is the atom that blocked
blocked++
if(blocked > 10)
walk(src, 0)//cancel movement
chasing = null // stop chasing
StepImpossible(atom/a)//called when pathfinding is impossible
//a was the original target
world << "[src] can't get to [a]"
walk(src, 0)
chasing = null
//ranged units could start shooting at this point

You can detect if a step is blocked by doing the following:
var/turf/t = get_step_to(src, chasing)
if(t)
Move(t)
else
// path is blocked.

get_step_to() will return null whenever a path can not be made or blocked.

Although, in versions after 490 it only returns null when the path is blocked, if a path can not be mapped it'll continue until it can't anymore and when it can't find the next turf to hop next to, it the CPU increase occurs.

I have tried with step_to(), walk_to() and get_step_to(). The example uses walk_to() because it's simple, I wanted to show the behavior compared to 489.

EDIT: I figured it's probably important to note that if you do not pass over the Min parameter, it'll be 0 thus the same problem occurs, the mob is right next to you but will continue to attempt getting closer to you and there you notice the CPU increase.
It's obviously something due to pixel movement - 490 was the pixel movement release. They had to do some changes to those procs, might have caused this.
Long story short, we deserve some more procs to handle the pathfinding better.
The reason for the change in behavior would be because 490 switched to A* for its pathfinding from the old, terrible O(N^3) routine which was strictly tile-based. A* has many advantages, but it also unfortunately has the property that it won't give up until the search space has been exhausted or a limit has been hit.

One possible approach to dealing with this would be to allow A* to operate from both ends, so instead of calculating from the source to the destination, it could calculate the other way as well (using the source's size/speed), and bail out if one side failed. Reconciling the two paths might prove iffy as the steps wouldn't usually line up, but perhaps that's not such a big deal.
I look forward for it, my game pretty much relies on the AIs, this will be a great improvement.
In response to Lummox JR
Lummox JR wrote:
One possible approach to dealing with this would be to allow A* to operate from both ends, so instead of calculating from the source to the destination, it could calculate the other way as well (using the source's size/speed), and bail out if one side failed. Reconciling the two paths might prove iffy as the steps wouldn't usually line up, but perhaps that's not such a big deal.

Bump. This sounds good.
Stephen001 changed status to 'Verified'
Bumpy bump. I figure instead of reconciling the two paths, just use the first one ala original pathfinding.
Any chance your problem relates to this bug report?

Basically if a path is blocked it appears the built in pathfinding doesn't give up easily and continue to search for a path resulting in a spike.
I'm certain it's related, so I'll reattach it.

I think this could easily show up in a world hosted in DS, not just DD, because when you host in DS basically it's the exact same thing. (The only real performance difference would be that the client side bogging down would impact the server.)

A memory leak is unlikely. The issue described in the original thread is almost certainly the culprit.

I do want to get this fixed as soon as I can, but in all honesty it's a tough fix and I don't have a plan just yet. The problem is that pathfinding with pixel movement is inherently difficult, and A* is really bad at realizing when it's stuck. What I'll have to do is come up with some kind of plan to bail out or degrade the algorithm under extreme conditions, which should help.

Nav meshes, the ultimate solution for this, are something I don't really plan to build in because of their complexity.
Until a better solution can be found, I'm going to put a Band-Aid on it for now in 508.1291 which should hopefully solve a lot of problems. The following behavior will be changed:

1) In the event of an A* failure, a pixel mover will no longer refuse to move; it will move toward the last candidate spot it considered. So a direction should be returned as long as movement isn't utterly restricted.

2) Failure was originally defined as 1000 steps, period. Now there are global counters to track success and failure. On any failure, the step limit decreases by 20% to a minimum of 100, and the success counter is reset. On 10 consecutive successes, the step limit will go up by 25 again until it reaches the max 1000.

In this way, I hope that pathological situations will better resolve themselves, and incur less of an ongoing hit when they happen.
I'm a little concerned with #1 because I count on null to mean 'path blocked', I'm afraid of jerky movement where a monster tries to reach a player but it can't so it'll go back and fourth not giving me feedback that it can't reach the player.

I can't really imagine how it'd look in pixel movement but that's how I see it in tile based.

In any case, I'll test this when 508.1291 is out, thank you for the band aid, I appreciate it a lot and I'm sure a lot of unaware games would benefit from this as well.

Ideally what I should do is lay some foundation to grab a list of the steps involved and info on whether the attempt succeeded or not. Then developers can make better use of the list for more informed pathing decisions.
Would it be preferable to leave #1 out of the Band-Aid? I thought it'd be nice to at least be able to start moving in the right direction as it would help get out of bad situations.
I think leaving out #1 would probably be best; I would prefer no movement to potentially erroneous movement. If you could differentiate between returning no path available and a* failure we could elegantly fall back on our own secondary pathfinding routines. We would be able to have pathfinding work exactly as we intend it to work while still getting the benefits of the built-in pathfinding.
No path available is A* failure; it's just a question of whether it's a failure to explore every available option exhaustively, or if there's legitimately no path.

I see your point about wanting to avoid a best guess that might be wrong though. I'll get rid of that and just go with the step limit tweaking.
I don't think I know enough what #1 means to accurately say. In my head I picture a mob trying to get through a wall in order to find a path that doesn't exist.

Let's say all 8 directions are covered in a tile based game around the target, would movement never end? It's this instant where getting to the target is impossible that I want feedback on.

However, if it means that it's just moving in the right direction and attempting to find another preferable route then maybe it's a good thing.

In response to Rotem12
If a mob is fully restricted and there's no way out, then the A* algorithm should terminate very early, having run out of possibilities to explore. So in that case that's an obvious fail and would not be impacted by the changes above. (Pixel movement does alter the logic a bit in that it might take longer to realize it's boxed in.)

But in the case where the target is walled in, the mob might not know that as easily. In those cases the algorithm will fail after reaching the step limit; those are also the cases incurring heavy CPU usage. SuperAntx made a good case for not wanting movement in that case, so I think I'll stick with that. There is something to also be said for moving in the best-guess direction, but probably no movement is best. Especially since there's no simple way to differentiate between the two fail states.
Page: 1 2