Multicore CPU support

BYOND Forums


ID:2374065

Jun 3 2018, 5:58 pm

Czoaf

Applies to: Dream Daemon

Status: Open

I know there have been some posts about this, but is it possible to add multi-core support to Dream Daemon? Maybe we can get a date? This year, next year, ...

Jun 3 2018, 6:36 pm
Genesismagician	It won't be added at all. This was brought up before and deemed "unfeasible". Why? I have no idea, but it would be nice to see multi-core CPU support and a higher RAM limit.

Jun 3 2018, 9:34 pm
Somepotato	64-bit support would be great, but multicore support isn't as easy as snapping your fingers. It would require a pretty massive rewrite of BYOND.

Jun 4 2018, 7:50 am
MrStonedOne	No it wouldn't.

Jun 5 2018, 2:44 pm

Lummox JR

The answer is a bit complicated. Threading could in theory allow for multicore use, but there are limits.

The main things that are possible to thread out are the frontend interface and the SendMaps() functionality. When we last tried this it was an utter disaster and proved impossible to debug, which is why the threading code has been disabled for some time now. One of the major sticking points that caused crashes, world.Export(), has likely been dealt with, but it's very likely that SendMaps() would need to be revisited, because there are some limits to its concurrency and newer features added since then might have disturbed some things. (As a specific example, the MapObject class keeps track of some things that are refcounted, like appearances. There is a possibility, very uncommon but still real, that some appearances could be deleted while the threaded code is running, as a result of their refcount returning to 0. To avoid this, those refcount changes have to be tracked separately and resolved after the concurrency ends.)
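
The deferred-refcount idea in that parenthetical can be sketched roughly as follows. This is only an illustration in Python, not BYOND's actual code: `AppearanceTable`, `defer_release`, and `resolve_deferred` are hypothetical names, and the assumption is that worker threads only record releases while the main thread applies them once the concurrent phase is over.

```python
import threading
from collections import Counter


class AppearanceTable:
    """Hypothetical refcounted table; not BYOND's real data structure."""

    def __init__(self):
        self.refcounts = {}            # appearance id -> refcount
        self._deferred = Counter()     # releases recorded during the concurrent phase
        self._deferred_lock = threading.Lock()

    def add_ref(self, app_id):
        # Only called from the main thread in this sketch.
        self.refcounts[app_id] = self.refcounts.get(app_id, 0) + 1

    def defer_release(self, app_id):
        # Safe to call from worker threads: only the small pending-release
        # counter is locked; the table itself is never modified here.
        with self._deferred_lock:
            self._deferred[app_id] += 1

    def resolve_deferred(self):
        # Called on the main thread after every worker has been joined.
        deleted = []
        for app_id, count in self._deferred.items():
            self.refcounts[app_id] -= count
            if self.refcounts[app_id] <= 0:
                del self.refcounts[app_id]
                deleted.append(app_id)
        self._deferred.clear()
        return sorted(deleted)


table = AppearanceTable()
for app_id in ("grass", "grass", "wall"):
    table.add_ref(app_id)

# Concurrent phase: workers drop references, but nothing is deleted yet.
workers = [
    threading.Thread(target=table.defer_release, args=(app_id,))
    for app_id in ("grass", "wall")
]
for w in workers:
    w.start()
for w in workers:
    w.join()

during = dict(table.refcounts)        # still the counts from before the phase
deleted = table.resolve_deferred()    # "wall" reaches 0 and is removed only here
```

The point of the design is that no appearance can vanish while a worker might still be reading it; the cost is that the main thread has one extra pass to run after the workers finish.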

Some of the stuff in DD/DS still doesn't function properly with threading at all (namely queries about whether an operation should be allowed), which would require some updates there. A lot of that is still stuff I want to do regardless of threading anyway, just because it's bad form to have the proc waiting on this without sleeping. A number of procs do this, so it's a bit of a project.

Spawning threads for some stuff like MySQL has been discussed, and is something I'm strongly considering. (Caveat: It probably requires re-enabling part of the threading code for the Linux side, even if the remaining threading code isn't re-enabled, because of the way signals are handled.) The idea here is that one thread would handle MySQL calls, and it would communicate with the main program thread to let it know when to wake a sleeping proc.
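
A rough sketch of that arrangement, in Python rather than anything BYOND-specific: one worker thread takes the blocking database calls, and only the main thread ever resumes a sleeping proc. Everything here is an assumption made for illustration, including `db_worker`, `query_proc`, the faked query result, and the use of generators to stand in for procs that sleep.

```python
import queue
import threading


def db_worker(requests, completions):
    """Runs on its own thread; stands in for blocking MySQL calls."""
    while True:
        job = requests.get()
        if job is None:               # shutdown sentinel
            break
        proc_id, sql = job
        # A real worker would call the database here; this just fakes a result.
        completions.put((proc_id, f"result of {sql!r}"))


def query_proc(proc_id, sql, requests, log):
    """A 'proc' written as a generator: yield means 'sleep until woken'."""
    requests.put((proc_id, sql))
    result = yield                    # sleeps here; the main thread sends the result
    log.append((proc_id, result))


requests, completions = queue.Queue(), queue.Queue()
log = []
worker = threading.Thread(target=db_worker, args=(requests, completions))
worker.start()

# Main thread: start two procs, which immediately go to sleep on their query.
sleeping = {}
for proc_id, sql in ((1, "SELECT 1"), (2, "SELECT 2")):
    proc = query_proc(proc_id, sql, requests, log)
    next(proc)                        # run until the proc sleeps
    sleeping[proc_id] = proc

# Main loop: only this thread resumes procs, so game code stays single-threaded.
while sleeping:
    proc_id, result = completions.get()
    proc = sleeping.pop(proc_id)
    try:
        proc.send(result)             # wake the sleeping proc with its result
    except StopIteration:
        pass                          # the proc ran to completion

requests.put(None)
worker.join()
```

The two queues are the only shared state, which is why this kind of threading avoids the problems described in the next paragraph: the worker never touches game objects.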

As far as running actual game code, like procs and such, multithreading is utterly and completely impossible. There's simply no such thing as a safe way for the code to handle things like object var accesses, string creation, etc. without massively degrading performance with lock checks.

Jun 6 2018, 2:42 am

Bmc777

When it comes to threading regular game code, would it not be possible to completely separate threaded code from normal game code through adding a new ancestor data type named something like "threaded"?

This would mean a threaded datum and proc would be defined like so:

/threaded/datum/my_datum
        var/thread = thread_name

Where the built in thread var is required to be defined.
Alternatively, if the architecture is set up in such a way that object definitions cannot happen outside of a thread, a different option could be:

/thread/my_thread/datum/my_datum

For both examples, a thread would be defined as:

/thread/my_thread

Moving forward, I'm going to just go with the second syntax because I think it's cleaner but if the first example would work better I could rewrite the rest to fit that style of defining.

Defining a proc would look like:

/thread/my_thread/proc/my_proc() // "Global" Thread proc
/thread/my_thread/datum/proc/my_proc() // Datum proc

Instantiating a datum would be the same syntax as normal, just with the thread type as a prefix:

/thread/my_thread/datum/my_datum/D = new

Calling a proc would be exactly the same as normal:

D.my_proc()

Where, in this case, my_proc()'s definition is:

/thread/my_thread/datum/my_datum/proc/my_proc()

So far, everything here is being handled completely separate from the main world thread. Zero interaction whatsoever without even the ability to interact with the world thread. This preserves the simplicity of DM coding for the average user who has no interest in threading whatsoever.

If we were to stop here, without the ability to access objects in other threads at all to avoid synchronization issues entirely, things that could potentially cause strange issues (like clients) could simply be impossible to access in a /thread.

But we can go deeper. The ability to access vars and procs that are members of other threads could, and should, be added. This does come with the danger of programmers creating situations where deadlocks can occur though.

There is no safe way to completely prevent deadlocks without massively degrading performance, like you said. This doesn't just apply to DM though. Windows, macOS, and Linux don't prevent deadlocks. Other virtual machine based languages like Java and C# don't prevent deadlocks either. They do give the programmers the tools to help prevent deadlocks. If the programmer decides to utilize multithreading, it is up to them to write their code in such a way that deadlocks cannot occur. This same philosophy should apply to DM multithreading.

Behind the scenes, monitor locks would have to be implemented. Monitor locks are a very lightweight way of providing the ability for programmers to ensure two threads are never accessing the same object at the same time.

The implementation of monitor locks could be a whole discussion on its own, but it really boils down to this. Every object has a lock associated with it. Whichever thread currently owns the lock has the right to access the object the lock is tied to: vars, procs, everything. The only way for another thread to acquire an object's lock is to wait until the lock becomes available, and the lock only becomes available when the thread currently owning it finishes whatever it is doing with the lock's object and releases the lock.
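
As a minimal sketch of that behavior outside DM, the Python below gives a datum-like object one re-entrant lock and has four threads update the same var. The `Monitored` class and the `bump` function are hypothetical, and Python's `threading.RLock` merely stands in for whatever a real monitor implementation would use.

```python
import threading


class Monitored:
    """Hypothetical datum with one monitor lock, as in the description above."""

    def __init__(self):
        self.monitor = threading.RLock()   # re-entrant, so the owner can nest calls
        self.my_var = 0


def bump(datum, times):
    for _ in range(times):
        # Wait for the monitor, own it for the whole read-modify-write,
        # then release it so another thread can take its turn.
        with datum.monitor:
            datum.my_var = datum.my_var + 1


D = Monitored()
threads = [threading.Thread(target=bump, args=(D, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = D.my_var
```

Because every update happens while the monitor is held, no increment is lost and `total` ends up at 40000.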

A DM implementation of synchronized procs could look something like this when calling a proc that exists in other threads and/or making changes to a datum that exists in other threads:

/thread/my_thread/proc/my_proc()
        set synchronized = 1
        D.my_synchronized_proc()
        D.name = "blah"

In this example, set synchronized must be set at the beginning of a proc (like set background). D is just a datum that is named exactly the same across all threads. This proc will wait until D's monitor lock becomes available, then will take ownership of the lock, then will perform D's proc call and var edit, then finally will release D's monitor lock to be used by any other thread that needs it.

Requiring D to be named exactly the same across all threads, without conflicting with names anywhere else, essentially makes it a global variable across all procs. To make this clearer and to simplify the backend processing, a new synchronized type could be used when defining datums, like so:

synchronized/datum/my_datum

A proc belonging to this datum would look like:

synchronized/datum/my_datum/proc/my_proc()

This convention would avoid having to search every datum in existence across all threads when synchronizing. Only the datums of type synchronized would need to be searched. This would be very fast unless the programmer went crazy and defined every datum in existence as synchronized, but that's the programmer's fault.

Under these constraints, /client should really be synchronized/client by default, and the same goes for /world and /verb. Nothing else should require change.

That's my first take on DM multithreading.

Jun 6 2018, 10:29 am

In response to Bmc777

Lummox JR

It's not the proc that has to sync, though; it's the objects it accesses. Any read/write on a global var would have to temporarily lock the var. Any read/write on an object would have to temporarily lock the whole object.

There's no way to say "This code is meant to worry about threading, but this other code is not". In a low-level language you can often tell what things need to be worried about threading, but in a language like DM it would never be possible.

Jun 6 2018, 12:17 pm

Bmc777

In my take on DM multithreading, all objects could potentially be accessed by other threads. There is no saying "this code has to worry about multithreading but this code does not."

To make things less complicated we'll do away with the synchronized type entirely. Now all objects have an associated monitor lock on initialization.

We'll incur some small overhead on object initialization but we will not perform lock checks by default when accessing the object, so no extra runtime overhead.

As the programmer, I am still perfectly capable of doing something like:

var/global/datum/my_datum/D = new

/thread/my_thread1/proc/can_deadlock1()
        D.my_proc()
        D.my_var = 0

/thread/my_thread2/proc/can_deadlock2()
        D.my_proc()
        D.my_var = 1

Doing this will of course at some point probably cause a conflict (strictly speaking a data race rather than a deadlock, since no locks are taken) because locks are not checked by default, but the point is that it's possible because all objects can be accessed from any thread.

To avoid a deadlock the programmer would do this instead:

var/global/datum/my_datum/D = new

/thread/my_thread1/proc/cannot_deadlock1()
        set synchronized = 1
        D.my_proc()
        D.my_var = 0

/thread/my_thread2/proc/cannot_deadlock2()
        set synchronized = 1
        D.my_proc()
        D.my_var = 1

When we add set synchronized = 1 to the beginning of these procs, we're telling DM internally that any object the proc attempts to access needs to have its lock available.

Assuming cannot_deadlock1() is called first, my_thread1 checks to see if the lock associated with D is locked. If not, my_thread1 locks the lock, it now owns the lock and continues processing as normal.

Now when cannot_deadlock2() is called, my_thread2 checks the lock associated with D and sees that it is locked. my_thread2 does not attempt to access D. my_thread2 continues checking the lock over and over.

When cannot_deadlock1() completes, my_thread1 unlocks object D's lock, along with any other objects' locks that were locked because they were accessed within the proc.

my_thread2 was still checking the lock over and over during this time. As soon as the lock is unlocked, my_thread2 locks it and now owns the lock. The same process of completing the proc call and ultimately unlocking the lock occurs.

Performance does not get massively degraded with this approach to multithreading since lock checks only occur when the programmer wants them to occur.
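
The protocol walked through above (lock an object on first access, keep re-checking a busy lock, release everything when the proc ends) can be sketched in Python as below. `Synchronized` is a hypothetical stand-in for the proposed `set synchronized = 1`, and `Datum` and `cannot_deadlock` exist only for this illustration; it models the high-level behavior only and says nothing about engine internals.

```python
import threading
import time


class Datum:
    def __init__(self):
        self.lock = threading.Lock()
        self.my_var = None
        self.history = []


class Synchronized:
    """Context manager standing in for the proposed `set synchronized = 1`."""

    def __init__(self):
        self._held = []

    def access(self, datum):
        # Lock on first access; keep checking "over and over" until it is free.
        if datum not in self._held:
            while not datum.lock.acquire(blocking=False):
                time.sleep(0)          # yield the CPU between checks
            self._held.append(datum)
        return datum

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # Proc finished: release every lock taken during the proc.
        for datum in reversed(self._held):
            datum.lock.release()
        self._held.clear()
        return False


D = Datum()


def cannot_deadlock(value):
    with Synchronized() as sync:
        d = sync.access(D)
        d.history.append(("enter", value))
        d.my_var = value
        d.history.append(("exit", value))


t1 = threading.Thread(target=cannot_deadlock, args=(0,))
t2 = threading.Thread(target=cannot_deadlock, args=(1,))
t1.start()
t2.start()
t1.join()
t2.join()

# Each proc's enter/exit pair is contiguous: the two bodies never interleaved.
pairs = [D.history[i:i + 2] for i in range(0, 4, 2)]
```

With a single shared object this serializes the two procs cleanly. With two or more objects, two procs that take the same locks in opposite orders could still block each other forever under this scheme, so lock ordering would remain the programmer's responsibility.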

Jun 6 2018, 3:43 pm

In response to Bmc777

Lummox JR

You're thinking about this all wrong. What you've been talking about is the high-level behavior: how BYOND could, if it were threadsafe already at the lower levels, try to resolve high-level conflicts. In other words you're looking at all of the components as the abstracted, neat little boxes the language presents them as, but underneath the surface they're anything but.

The low-level guts of the engine are where you need to really worry about conflicts, and those are more numerous and severe. If you get a conflict on low-level structures, you don't just have a deadlock; you get a crash. (Sometimes also a hang or other weird behavior, to keep things interesting.)

Internally you have trees all over the place, arrays that get reallocated as the need arises, and so on. These operations need to keep out of conflict also, and they're frequent: any operation you did, aside from basic math, would likely end up triggering at least one lock if not several.
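
To give a rough feel for how quickly those locks add up, here is a toy model in Python that counts lock acquisitions for a single var assignment. The structures are invented for illustration (`StringTable`, `VarArray`, and `CountingLock` are hypothetical and far simpler than anything in a real engine); the only claim is that each shared structure touched needs its own guard.

```python
import threading


class CountingLock:
    """A mutex that counts how many times it has been acquired."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, *exc):
        self._lock.release()
        return False


class StringTable:
    """Hypothetical shared string table (a dict stands in for a tree)."""

    def __init__(self):
        self.lock = CountingLock()
        self.ids = {}

    def intern(self, text):
        with self.lock:                # shared structure: must be guarded
            return self.ids.setdefault(text, len(self.ids))


class VarArray:
    """Hypothetical per-object var storage that is reallocated as it grows."""

    def __init__(self):
        self.lock = CountingLock()
        self.slots = []

    def set(self, index, value):
        with self.lock:                # reallocation must not race with readers
            if index >= len(self.slots):
                self.slots = self.slots + [None] * (index + 1 - len(self.slots))
            self.slots[index] = value


strings = StringTable()
obj_vars = VarArray()

# One line of game code, `D.name = "blah"`, in this toy model:
name_slot = strings.intern("name")     # look up the var name
value_id = strings.intern("blah")      # create or find the string value
obj_vars.set(name_slot, value_id)      # store it, growing the array if needed

locks_taken = strings.lock.acquisitions + obj_vars.lock.acquisitions
```

Even in this stripped-down model a single assignment takes three locks, and that is before refcounting or any of the other bookkeeping a real engine does.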

Jun 7 2018, 6:16 pm

Bmc777

I understand that, I was attempting to address your concern that you stated here:

"As far as running actual game code, like procs and such, multithreading is utterly and completely impossible. There's simply no such thing as a safe way for the code to handle things like object var accesses, string creation, etc. without massively degrading performance with lock checks."

I may have misinterpreted what you meant by game code, though. I assumed that by game code you were talking about the front end that we DM programmers see.

I can't accurately go deeper into a back-end threadsafe implementation because I don't have much detailed information to work with. I'm into this sort of stuff, so if that sort of info is available to Patreon subscribers or BYOND subscribers I'd donate in a heartbeat.

I am guessing the issues you're mentioning, with trees all over the place and arrays that get reallocated and whatnot, arise because these operations are subject to low-level instruction reordering and/or because volatile variables are not currently supported in any way.

This time I'll start as low-level as possible to make sure I'm not misinterpreting again, sorry about that.

Since BYOND doesn't seem to behave well with any OS other than Windows, I assume part of the reason is that the VM-to-CPU interaction is written in a non-generic way and is constrained to running on x86-based ISAs, so anything I say from here on is under that assumption.

x86-based ISAs all have support for memory barriers, also called memory fences; "fence" is shorter, so I'll go with that. The compiler should add these fences where needed at compile time.

The x86 memory fences are LFENCE, SFENCE, and MFENCE. These are used to achieve the cross-thread memory syncing necessary for multithreading. I can outline the exact situations to use each of these, but that's another long post and I just wanna make sure I'm talking about the same thing you are before going into greater detail.

Jun 7 2018, 9:14 pm
Lummox JR	I don't know much about memory fences, but I've worked with mutexes a bit. Problem is, mutexes inherently slow down a program, if only a little bit--and to run multiple procs concurrently, BYOND would need them everywhere internally.

Jun 12 2018, 8:28 pm

Somepotato

Could do something similar to web workers: require you to use a messaging/channels datum to communicate with the main thread.
Datums could be passed back and forth if they have a flag set (so you'd have one mutex per datum with the flag set, possibly simply a subtype of /synchronized).

Alternatively, monitor locks are a possibility, although you'd still need to work out the issues with execution states and whatnot. You could raise a runtime error on attempts to use a variable in multiple threads, and use interlocks and whatnot.

Fences have the potential to slow things down, and I think you could get away with not using them.

I think the first option would be easier to implement and more performant, but the latter (in any implementation) would be easier to use.

Jun 13 2018, 2:49 pm

Optimumtact

The simplest use I can think of for a lot of what we want threads for is just background I/O reads/writes, which can be done with a simple messaging system and explicit background threads that can't interact with the normal world and globals.

This would let us offload a bunch of I/O work to another thread and hopefully free up more CPU time for just pure game processing.

Longer term, being able to access limited parts of the world (i.e. certain vars on turfs to calculate atmos, etc.) with locks would be nice.

I.e., if we could declare synchronized vars that are available to background threads and have locking semantics, then we could, for example, declare our gas mixes to be synchronized and then do atmos (one of our heaviest game subsystems) in another thread.