Occasionally you have some kind of a loop that does something periodically and you only want one instance of that loop running at a time.

Sometimes you want the last invocation to be dominant:

mob
    var/tmp
        walk_loop = 0
    proc
        Walk(Dir,lag) //most current only
            var/ctr = ++walk_loop //this will start to fail after ~16M iterations
            while(walk_loop==ctr)
                Step(Dir)
                sleep(lag)
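
For instance, with a hypothetical pair of movement verbs (Step() is assumed to be a proc defined elsewhere on the mob), each new call bumps the counter and the superseded loop quietly exits on its next wake-up:

mob/verb
    GoNorth()
        Walk(NORTH, 2)
    GoEast()
        Walk(EAST, 2) // supersedes a running northward loop

Note that sleep() suspends the whole call chain, so each verb call simply rides along with its own loop until it's superseded.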



Sometimes you want the first invocation to be dominant:

mob
    var/tmp
        walk_loop = 0
    proc
        Walk(Dir,lag) //first only
            if(walk_loop) return 0
            walk_loop = 1
            while(walk_loop)
                Step(Dir)
                sleep(lag)
            walk_loop = 0


But there are a lot of problems with these approaches. Try killing a walk loop and starting a new one during the same frame in the second example: the old loop is still asleep inside sleep() when the new call sets walk_loop back to 1, so when it wakes up, its while(walk_loop) check still passes and both loops run at once.
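
Here's a minimal sketch of that failure as a hypothetical verb, assuming a walk loop is already running (walk_loop == 1, asleep in sleep()):

mob/verb/TurnEast()
    walk_loop = 0 // "kill" the running loop; it hasn't woken from sleep() yet
    Walk(EAST, 2) // passes the if(walk_loop) gate and sets walk_loop = 1
    // When the old loop wakes, walk_loop is truthy again, so its
    // while(walk_loop) check still passes: both loops now step the mob.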

We could try using time too.

mob
    var/tmp
        walk_loop = 0
    proc
        Walk(Dir,lag) //most current only
            var/time = walk_loop = world.time
            while(walk_loop==time)
                Step(Dir)
                sleep(lag)


mob
    var/tmp
        walk_loop
    proc
        Walk(Dir,lag) //first only
            if(!isnull(walk_loop)) return 0
            var/time = walk_loop = world.time
            while(walk_loop==time)
                Step(Dir)
                sleep(lag)
            walk_loop = null


There are all kinds of problems here too, because if we start and stop loops during the same frame, world.time hasn't advanced, so two different calls end up holding identical tokens and can't tell each other apart.
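
A hypothetical illustration of the collision, assuming the "most current only" time version above and two calls landing in the same tick:

mob/verb/DoubleWalk()
    spawn() Walk(NORTH, 2) // call A: time = walk_loop = world.time (say, 10)
    spawn() Walk(EAST, 2)  // call B, same tick: time = walk_loop = 10 again
    // A's and B's (walk_loop == time) checks are now indistinguishable,
    // so both loops keep running instead of B superseding A.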

We need some kind of unique value that increments per call, isn't subject to overflow issues like the counter, and is never subject to non-unique values. EVER.

Turns out, DM provides you with something that will do just the trick: args.

Every single proc call gets a unique args list object, and the engine forcibly destroys it the minute the proc ends, so it's as reliable a marker as you can get for checking whether a proc is still running. Just store that args list somewhere outside of the proc, and you can properly gate a function to prevent double-loops:

mob
    var/tmp
        walk_loop
    proc
        Walk(Dir,lag) //first only
            if(!Dir)
                walk_loop = null
                return
            if(walk_loop)
                return

            var/timehack = (walk_loop = args)
            while(walk_loop==timehack)
                Step(Dir) //do walk loop
                sleep(lag)


mob
    var/tmp
        walk_loop
    proc
        Walk(Dir,lag) //most current only
            if(!Dir)
                walk_loop = null
                return
            var/timehack = (walk_loop = args)
            while(walk_loop==timehack)
                Step(Dir) //do walk loop
                sleep(lag)


Using the args list to test loop continuation ensures that no two calls can ever hold the same token, because every call gets its own args object. You can even extract information from the running proc in an abstract way by checking the values within the referenced list.

This is definitely weird, and probably should not be recommended, but this solution is 100% functional and beats the alternative that I wrote to handle timestamps that can never collide:

var
    list/timehack_registry = list()
proc
    register_timehack(id,subid,time=world.time)
        var/list/l = timehack_registry[id]
        if(!l)
            l = list()
            timehack_registry[id] = l
        var/hack = l[subid]
        if(istext(hack))
            var/list/sl = splittext(hack,":")
            if(text2num(sl[1])==time)
                hack = "[time]:[text2num(sl[2])+1]"
            else
                hack = "[time]:0"
        else if(isnull(hack))
            hack = "[time]:0"
        l[subid] = hack
        return hack

    clear_timehack(id,subid,hack)
        var/list/l = timehack_registry[id]
        if(!l)
            return
        if(istext(hack))
            if(hack!=l[subid])
                return
        else if(!isnull(hack))
            return

        l -= subid
        if(!l.len)
            timehack_registry -= id

    get_timehack(id,subid)
        var/list/l = timehack_registry[id]
        if(!l)
            return null
        return l[subid]
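
For comparison, here's a hypothetical walk proc built on that registry (the "walk" id and the \ref[src] subid are arbitrary choices; step() is the built-in movement proc):

mob/proc/RegistryWalk(Dir, lag) //most current only
    var/hack = register_timehack("walk", "\ref[src]")
    while(get_timehack("walk", "\ref[src]") == hack)
        step(src, Dir)
        sleep(lag)
    clear_timehack("walk", "\ref[src]", hack) // no-op if a newer call superseded us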


Yeah... Let's not do this. I have zero interest in the "correct" solution. Use the args hack. It works. It works for a reason. You don't need to overengineer everything to be logical. Don't do what I just did above. Use cached args lists as a control loop tracker.
This is indeed a clever solution. I had to take a look internally to figure out why it works, and whether there was any downside like making the engine force-delete the list.

As it turns out, the args list is a special list type. Its ID, which is the ID of the running proc instance it belongs to, comes from an internal counter that's incremented every time a new proc begins. That counter is four bytes, so it will take a very, very long time to ever loop back around. Thus the args list is a great way of putting that counter to use.
Might it make sense to expose that ID as a normal value? The overhead might not be worth it, but certainly it would be more useful than the useless "expanding" var.

Exposing a 32 bit integer to code that can only handle 32 bit floats is... Questionable.
More byond technical debt.

Threading, external api access, udp, ipv6, return typing, 64bit, access to ints vs floats, linux/mac support, opengl, etc etc etc, all things that can't happen, but need to happen.

Games start out on byond because byond hand-holds you and treats you with kid gloves, but once a game grows up, byond is extremely reluctant to take the kid gloves off and let you fly.
Aye, it'd be nice if we had some sort of strict typing syntax that would allow different numeric types, but we can't realistically burn the whole language and start over.
In response to MrStonedOne
I'm willing to give OpenGL another go sometime. When I tried to work with it the main drawback was that it wreaked havoc with some interface cases.

Full threading is of course impossible; only the partial threading that was done way back was ever workable. I suspect I've corrected some of the major issues that were causing problems way back when, but of course the feature set has improved since then and the map loop in particular has seen some changes, so that might need reevaluating.

Also external APIs I'm definitely cool with working on; it's just a question of what games need exactly that might be feasible to touch.
Full threading is of course impossible

It's not impossible; you just expose some sort of mlock gateway and some sort of thread-launching platform, and let programmers assume their own risks.

The point I'm trying to get across is that the kid gloves are nice when you are starting out, but /tg/ needs them off. We've grown up, and I know we aren't the only codebase on byond that has felt this strain.

That's not to mention that if you do blocking threaded execution of byond operators, you can speed up so much stuff just by running it on more than one thread while having the main thread wait for the child threads.

view() calculation? spawn some threads, split the workload.

need to do anything to a list (copy, merge, add, subtract, expand)? spawn some threads, split the workload.

icon operation? spawn some threads, split the workload.

My experience of DM is that crunching data isn't really a big CPU drain. It's mostly the backend handling of appearances and sending netmessages.

If looping through view(), crunching lists, and performing icon work is eating up your CPU budget, odds are there are better implementations out there for what you are doing. Or you are doing a lot of stuff on the server that a server-directed game has no business doing in the first place.

Seems to me the root of a lot of the complaints about DM's "speed problems" are people trying to do things on the server that just don't belong on the server. --Of course, it makes sense, seeing as we don't have any kind of client-side code authoring capability, but still. Seems like if we're gonna talk about unicorn farts and rainbow sprinkle wet dreams, the ideal solution to most problems would be "simply" writing a client-side DM interpreter / limited API that allows developers to more or less do their own shit and graduate to magic big boy funtimeland.
In response to MrStonedOne
Icon calculations probably would be able to benefit from limited threading, but ultimately I suspect they're not a big enough drain on performance in most games, especially these days, to justify any of that.

view() is an interesting thought. Now in theory no internal data (refcounts, etc.) is ever changing during a view() calculation, so maybe it'd be possible for threads to treat the main program memory as shared and read all the requisite values needed (opacity, lighting) before doing the calculations. The actual calculation part is probably not threadable, at least not in anything like its current format, but I've actually found looking up visibility data tends to be the bigger bottleneck now that I've gotten us away from the old horrible O(N^3) logic. (Being able to do some limited threading to fill view parameters on the client would also be a big deal, as would threaded icon sorting.)

List operations probably won't benefit from threading at all; it'd take a massive list to see any difference.
Just had to add as a note on the original post: The args list currently doesn't work--as far as doing anything with it like reading its length or grabbing values from it--outside of the current proc or its callers. There's a very old TODO note in the code that says it could look through sleeping procs to find the right one, but currently it doesn't do so. Running/sleeping procs aren't indexed in such a way that random-access lookup would be feasible.
If looping through view(), crunching lists, and performing icon work is eating up your CPU budget,

They aren't (edit: icon work is, kinda; getflaticon is expensive as fuck). Those were just the easily parallelizable ideas that came to mind.

Ideally, anything that can be easily parallelized should be. It all comes down to whether you can partition the input easily and whether you can merge the output easily (as well as dealing with edge cases where things need to access entries across the partition).

With the right mutability controls you can get away with parallelizing a lot. Especially since the main thread is still deterministic and blocks for these parallel operations.

Edit: I brought a lot of this up here: http://www.byond.com/forum/?post=1904940

With the right mutability controls you can get away with parallelizing a lot. Especially since the main thread is still deterministic and blocks for these parallel operations.

The problem with the view() example in particular is that there is no mutability control whatsoever within the engine regarding how the map array is updated, meaning that in order to thread it safely and prevent read/write collisions:

1) copy everything that could be in view() to a region of memory that isn't going to be changed by the main thread (just like calling range). This has to be a blocking operation.

2) pass the memory region to the threadworker and tell it to start operating.

3) wait around for the threadworker to finish.

4) finally return the list that the threadworker built.

So we're gonna be piling on to what view() already does a little bit to allow it to filter the list in another thread. The minimum time it can take is how long range() takes.

http://www.byond.com/forum/?post=1409710

The above thread demonstrates that range() is actually already comparable to view(). Meaning that there is very little possible benefit to the idea in this case.

viewers() and hearers(), on the other hand, might be able to be improved... Potentially.


As for your other examples, crunching data in a threadworker, file loading, and icon manipulation aren't inherently problematic per se. File streaming, database queries, and icon operations would be pretty solid ideas for popping off into a threaded workflow.

The list thing could be used well, or it could be used poorly. That's largely up to the developer's own level of competence, so it's really a non-issue to me.

While I can understand your current frustration with the engine (And I share some of it), I don't really have a problem with most of Lummox's priorities ATM. It seems to me that he's realistically trying to maintain an existing product, not chase rainbow sherbert skywhales.
The problem with the view() example in particular is that there is no mutability control whatsoever within the engine regarding how the map array is updated, meaning that in order to thread it safely and prevent read/write collisions:


You have just missed the entire point of everything I just said.

The main thread still blocks.

So only view() runs.

So there is no need to copy the work state, since the only thing view() would need to write to is each thread's internal list of found turfs (to be sorted later).

The only overhead is the sorting, and if you partition the workload up cleverly, you avoid even needing to sort the resulting lists.
In response to MrStonedOne
You don't understand how BYOND works.

If a proc yields via a blocking operation, other things in the scheduler can function.

BYOND's map is stored in an array of turfs. If the view() function pulls from the live array, anything yielded to can cause updates to occur between the time the threaded worker begins and when it ends.

If you don't copy the potential state of that map that view is going to pull from, you can't ensure read/write safety. Meaning view() has to be a blocking operation, or has to at least pull all relevant items from the map before yielding.

Moreover, if anything a threaded view() yielded to changed the state of an atom after being pulled from the map by a threaded operation, its data would not be aligned to the values when view() was actually called. Objects returned by view() might no longer be in view() by the time it returns because of how the scheduler works.

And if the entire damn interpreter has to wait on a single threaded view() call, and the scheduler can't yield at all, threading it only adds additional overhead, because it's not god damn threaded.


I didn't miss the point. You don't know what you are talking about. view() isn't client view sending. Threading was attempted during client view sending. It doesn't happen in the interpreter pass. You've either badly miscommunicated what you mean, or you are legit way off the mark with what you are talking about.


TBH this is getting tedious. If you've got a soapbox for poorly-thought-through, pie-in-the-sky feature requests, I'd request that you find a thread to whinge about them in that's not in Tutorials and Snippets.
You don't understand how BYOND works.

If a proc yields via a blocking operation, other things in the scheduler can function.

Normally, yes.

I said block, not yield.

I know how the scheduler works; I've seen the code for it.

Nothing in my entire idea had this yield back to the scheduler. Comments from both me and Lummox made this clear.
And if the entire damn interpreter has to wait on a single threaded view() call, and the scheduler can't yield at all, threading it only adds additional overhead, because it's not god damn threaded.

Yes, it is threaded.

Taking a task that has work to do, and using threads to make that work take less time by partitioning the workload and spreading it across the other CPUs, is still threading.

Don't let your mind get set in such a narrow view of things.
In response to MrStonedOne
MrStonedOne wrote:
And if the entire damn interpreter has to wait on a single threaded view() call, and the scheduler can't yield at all, threading it only adds additional overhead, because it's not god damn threaded.

Yes, it is threaded.

Taking a task that has work to do, and using threads to make that work take less time by partitioning the workload and spreading it across the other CPUs, is still threading.

Don't let your mind get set in such a narrow view of things.

Wait, you are talking about parallelizing the actual process of the view floodfill? My bad. I thought you were actually talking about offloading the entire calculation and moving on. I had you on a totally different track.

Guess I missed the point.
Well good, now that we are on the same track, we can finally move... off... this... track...? Because, well, that's kinda the end of the discussion. =P