Before this gets used, I want to make sure it is intended (or at least stable) behavior.
Currently, get_step() with a dir of 0 just gets the turf an atom is on, through any number of nested layers. If my testing is to be trusted, it does this WAY more efficiently than the current method using the empty for loop, etc. I just want to make sure this isn't something that will break before I use it.
To clarify, at one point in time Lummox pointed out:
for(A, A && !isturf(A), A=A.loc); // semicolon is for the empty statement
return A

as how he would obtain an atom's turf. At the time this was around 61% faster than what we were using, so pretty much all SS13 codebases switched over to it. Today, however, Exxion has pointed out that:
/proc/get_turf(A) // Should really be a macro
	return get_step(A,0)

is faster. From my tests it's about 50% faster over 1 million calls. The issue is that the reference entry for get_step() doesn't say you can supply 0 as a direction, so nobody's really sure whether we should go ahead and use it.
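
For anyone wondering what the macro form mentioned in that comment would look like, here is a minimal sketch. It assumes passing 0 as the direction keeps working, which is exactly what this thread is asking about. The outer parentheses are only there so the expansion stays a single expression wherever it is used.

```dm
// Sketch only: macro version of the proc above.
// Relies on get_step(Ref, 0) returning the turf that Ref is on.
#define get_turf(A) (get_step(A, 0))
```

Since it expands in place, it would replace the /proc/get_turf definition rather than sit next to it.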

The closest thing I can see to this being documented is in walk()'s reference entry, where it says:

"To halt walking, call walk(Ref,0)." That's the only time I can remember seeing 0 treated as a direction.
if(isturf(A)) return A
return A && A.locs.len ? A.locs[1] : null
That's something Lummox specifically recommended against, and I believe this is faster anyway.
In response to GinjaNinja32
You mean /atom/movable, surely, since /atom has no locs list.

Edit: Yours is also the worst of the three (Exxion > Lummox > GinjaNinja)

(Where get_turf() is Lummox's, _new is Exxion's, and _bay is yours, Ginja, since you're a Bay dev and IIRC I saw this version in a Bay PR.)
You appear to be correct, though I can't find anywhere he recommended against doing it from anything other than a speed perspective.

My testing says the locs get_turf code I put above takes ~75% of the time the loop does. I can make a small optimisation to bring that down to ~60% (get the list once and read it twice, rather than getting it twice), but your get_step version takes only ~35% the time of the list, more than twice as fast as the original locs version.

If this get_step behaviour is going to be consistent, it is the fastest way I've seen to get a turf from an atom.

edit: More accurate results via profiler.

Nesting   Shallow   Medium   Deep
loop      0.690     1.049    1.375
locs      0.648     0.658    0.678
step      0.319     0.339    0.362

Shallow is an atom on a turf (one level)
Medium is an atom in an atom in an atom on a turf (three levels)
Deep is five levels.
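
If anyone wants to reproduce the nesting setups, something like the sketch below should do it. make_nested and check_nested_turf are hypothetical names made up for illustration, not the code actually used for the numbers above. It builds a chain of plain /obj instances and checks that get_step() with a dir of 0 still resolves to the starting turf.

```dm
// Hypothetical helper: builds a chain of objs `depth` levels deep,
// starting on turf T, and returns the innermost one.
/proc/make_nested(turf/T, depth)
	var/obj/inner = new /obj(T)     // level 1: directly on the turf
	for(var/i in 2 to depth)
		inner = new /obj(inner)     // each further level sits inside the previous obj
	return inner

// Hypothetical test verb: prints whether the innermost obj of a
// five-level chain resolves to the turf the chain was built on.
/mob/verb/check_nested_turf()
	var/turf/T = get_step(src, 0)
	if(!T)
		return
	var/obj/deep = make_nested(T, 5)
	world << "same turf: [get_step(deep, 0) == T]"
```

Passing 1, 3 or 5 as depth corresponds to the Shallow, Medium and Deep columns.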
GinjaNinja32 wrote:
You appear to be correct, though I can't find anywhere he recommended against doing it from anything other than a speed perspective.

The issue is the locs list is created on read, meaning the first read of an atom's locs list since it moved has overhead.

Your testing wouldn't show that unless you created 1000000 atoms.

(and creating these lists needlessly clogs up the list of lists)
In response to MrStonedOne
atom.locs is not an actual list stored anywhere in memory, but has to be built instead from the atom's bounds when the var is read. So a list has to be created on demand.

This implies to me that it's *not* in fact cached; it's created every time the var is read.
When testing, if you really want to know which one is faster, you have to use A/B testing.

Put it in production code, but replace the current proc with one that randomly redirects to one of the two (or more) candidates, using prob() or rand() with switch().

This then lets you see the realistic difference, without things like CPU cache, branch prediction, and BYOND vars that are calculated on read and cached fogging things up.
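
A rough sketch of what that redirect could look like for the two fastest candidates in this thread. get_turf_loop and get_turf_step are hypothetical names for illustration. The idea is that every existing get_turf() call in the live code gets split between the candidates, so the profiler lists each one separately under real usage.

```dm
// Candidate A: the loop version quoted earlier in the thread.
/proc/get_turf_loop(atom/A)
	for(A, A && !isturf(A), A = A.loc); // empty body; walks up loc until a turf or null
	return A

// Candidate B: the get_step() version.
/proc/get_turf_step(atom/A)
	return get_step(A, 0)

// Existing entry point: sends roughly half of all calls to each candidate.
/proc/get_turf(atom/A)
	if(prob(50))
		return get_turf_loop(A)
	return get_turf_step(A)
```

The redirect adds the same overhead to both paths, so compare the candidates against each other rather than against earlier numbers.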
I was sure I saw something that suggested it was cached. Hmm.
Bump. I too am interested in hearing about this.
The locs list is not cached.

get_step(Ref,0) is a really innovative idea; you're right that it's calling an internal routine to get the turf. The downside is, it's also doing a call to LocXYZ() and then XYZLoc() even if there's no direction given, so it's unpacking and then re-packing the coordinates for no reason. (That's two divisions and two multiplications.) However it definitely is stable and intended behavior, so you can rely on this to work. A sanity check for dir 0 would speed this up in that case, but obviously it would just be dead weight on all other get_step() calls.

At the moment there are no other quick shortcuts to the internal GetLocTurf() function.
In response to Lummox JR
Lummox JR wrote:
At the moment there are no other quick shortcuts to the internal GetLocTurf() function.

Could you expose it for us, to avoid the cost of (un)packing the coords without affecting get_step()?
That would be feasible; I'm not sure if it's all that needed, since I'm not aware of any high-impact use case for it. I see MSO has made a feature request for it, though.
Well, there are a lot of general cases in which we try to find the specific turf a mob is located on; this is one of the most used helper procs in any SS13 codebase. But since you want specific use cases, I will give one as an example:

Our say code directly sets a hearer mob to the turf location of all mobs/objects that have the ability to 'hear' (which gets reset after a certain amount of time has passed and something else has been spoken). Then, using hearers() or viewers(), those hearing mobs will pass on what is being said to any mob/object that can hear, which can be within an atom at any level. This results in fully recursive saycode at minimal cost.

I could give a few more examples where I've found brute forcing get_turf() is faster than any other single method. And that was before all the speedups we have incrementally found for making it faster.