Lummox JR wrote:
Multi-core processing won't be feasible except for internal operations that can run in parallel. That basically means stuff like icon operations, which we now offload to the client as much as possible. To parallelize that kind of thing though we'd still need to basically have a setup where the processing could split up but the main thread waited until all the workers were ready. Anything like allocations, deallocations, and interpreting code still has to be done single-threaded.

There's probably a better place to discuss this, but I'm curious to hear why this is the case. Interpreting code would seem like an ideal candidate for parallel processing - BYOND already has the idea of separate threads of execution, they're just not processed that way. If two players are moving in separate parts of the world, their calls to Move (and whatever other procs are called) are reading and modifying completely independent sets of variables. You can do them in parallel with no consequence.

Falacy wrote:
You seem to somehow be going under the assumption that processing in these other games isn't tied to the rendering frame rate? I can't speak for Minecraft's, but I know that for Unity that is the case. Each frame, the main Update function(s) you have set up are processed, the GUI is reprocessed, and all cameras are re-rendered, etc.

The sentence I wrote had two parts, both are important:

"The reason BYOND's framerate is limited is because the client's refresh rate is tied to the server's ticks and the server is responsible for processing events for all clients."

In BYOND, the server handles everything. In other games, the client program often has more responsibilities and can more fully respond to input. In these other games, each client's screen refresh rate is bound to the rate that it processes other things (ex: movement) at. In BYOND, each client's refresh rate is tied to the rate at which the server processes all events.

When you distribute the processing across all clients you'll get better performance. This isn't a "fail" on BYOND's part, this is the reason why BYOND games are easy to develop.
Forum_account wrote:
In BYOND, the server handles everything. In other games, the client program often has more responsibilities and can more fully respond to input. In these other games, each client's screen refresh rate is bound to the rate that it processes other things
Again, that isn't the case with Unity

When you distribute the processing across all clients you'll get better performance. This isn't a "fail" on BYOND's part, this is the reason why BYOND games are easy to develop.
You won't necessarily get better performance, or rather smoother gameplay. And though it does make it easier to develop BYOND games because the server is automatically forced to handle everything, they could just as easily automatically offload processing onto clients - like they now do with icon operations.
Falacy wrote:
they could just as easily automatically offload processing onto clients - like they now do with icon operations.

Icon operations are easily offloaded to the client because:

1. The operation is static and clearly defined.
2. The server doesn't need the result.

You tell a client to rotate an icon and it can do that. There's no part of the process that can be customized by the DM developer. Also, the processing ends on the client - no result or other information needs to be sent back to the server.

Movement (among other things) would be harder to offload because neither of these points holds. The movement rules would be customized by the DM developer and position updates would need to be sent to the server. This is much harder to handle with a generic game client (especially if you want to keep game development as simple as it is).

If you're making a game from scratch (or at least the multiplayer aspects from scratch) it makes sense to handle more things on the client. You're already making the game client and there's no reason to go for a generic solution. For BYOND, the generic solution does make sense. The way to improve performance and still have a generic solution is through parallel processing.

Falacy wrote:
Again, that isn't the case with Unity

You said it was:

"but I know that for Unity that is the case. Each frame, the main Update function(s) you have set up are processed, the GUI is reprocessed, and all cameras are re-rendered, etc."

The point I was making is that the BYOND client's refresh rate is dependent upon how fast the server does all of the processing for all clients. It sounds like Unity does not use this design, and that the client's refresh rate is dependent upon how fast it can do its own processing.
Forgive me if I expect a 3.6ghz computer to be able to handle 20+ mobs at 30 fps.

I can play Crysis 2 on this comp, but 20+ players at 30fps with pixel movement? I must be insane.
Bravo1 wrote:
Forgive me if I expect a 3.6ghz computer to be able to handle 20+ mobs at 30 fps.

It can. I'm not sure where the problem is. You're speaking as though it's not possible, but I haven't seen an example that shows it's not possible. If you want to complain about limits, let's have some concrete examples. My latest test had 100 mobs (constantly moving) running at 40 fps and 40-50% CPU usage on a 2.0 ghz processor.

I can play Crysis 2 on this comp, but 20+ players at 30fps with pixel movement? I must be insane.

1. Your GPU is doing most of the work.

2. BYOND does a lot of stuff to simplify game development and this comes with a performance tradeoff. I'm sure it took weeks for the Crysis developers to make something that resembled a game. You can make a simple game with BYOND in a few minutes. You can't have game development be this simple and that powerful.

I'm not saying that BYOND's ease-performance tradeoff is ideal, I'm just saying it's a bit naive to be surprised by it.
1. Forum_account wrote:
Having games run at 10 fps has spoiled you! No matter what we're talking about (pixel movement, AI, pathing, attacks, special effects, etc.), you'll find that you can get away with a lot when the game runs at 10 fps. When you want the game to run at a decent framerate, performance becomes more and more of an issue.

You were the one who told me not to expect good performance at 30fps and 20+ players with this system.

2. My GPU doesn't calculate the game's physics, guidelines, control input, math, positioning, etc. It only processes and displays the results of what I see in the game, in much the same way that the server handles the calculations and Dream Seeker displays the results while also processing icon procs.

Yes, the brunt is being handled by the GPU, but the in-game physics are much more advanced than pixel_movement with 20+ players, yet it still runs smoothly, and both are being handled by the CPU.

3. You can have easy development and viable performance in the same field, it depends on the engine. Currently, Byond isn't that amazing of an engine, it needs to go through a lot of reform and updates.

The problem is: instead of working on updating byond to become a better development tool, they're arguing semantics.

The only reasons you shouldn't include a feature in a system are A: redundancy, or B: incapability.

Byond can capably support a built in pixel based movement system.

Byond supporting a pixel based system would not be redundant, as the only current pixel based systems are soft-coded, and, as a result, are not viable as a permanent solution to the issue.


Let's say I want to get a bigger hard drive, from 160gb to 200gb. You're telling me I shouldn't because you have a 20gb flash drive that I could leave plugged in to my comp. That's fine and dandy, but transfer speeds and reliability with a 200gb drive would be much higher, even though I'd have to take the time to transfer over all the data on my current hard drive.

An upgrade is always better than an add-on, whether or not it's a simpler task.
Bravo1 wrote:
Forgive me if I expect a 3.6ghz computer to be able to handle 20+ mobs at 30 fps.

I can play Crysis 2 on this comp, but 20+ players at 30fps with pixel movement? I must be insane.

You're comparing apples and oranges. Crysis 2 is pretty CPU-intensive but more than that it's GPU-intensive. Most of the work is not being done by your main processor. Even to the extent it is, Crysis is surely multithreaded and designed to take advantage of multiple cores. When you get right down to it, all the CPU has to do is handle AI, collisions, and so on, and possibly a physics engine.

BYOND servers run under a completely different architecture. The engine is single-threaded and this is for many reasons--some of it simplifies issues that would hugely complicate the engine, and other parts simplify the DM language.

You're falling into the "man on the moon" fallacy, which is that people like to say "If we can put a man on the moon, why can't we ____?" The answer is, simply, putting a man on the moon was a straightforward engineering problem with straightforward solutions, and the only thing holding it back was the need to throw gobs and gobs of money at it. Other major challenges are rarely so simple.

On the other hand, if gobs and gobs of money were thrown at us we'd probably take a stab at multithreading.

Forum_account wrote:
Lummox JR wrote:
Multi-core processing won't be feasible except for internal operations that can run in parallel. That basically means stuff like icon operations, which we now offload to the client as much as possible. To parallelize that kind of thing though we'd still need to basically have a setup where the processing could split up but the main thread waited until all the workers were ready. Anything like allocations, deallocations, and interpreting code still has to be done single-threaded.

There's probably a better place to discuss this, but I'm curious to hear why this is the case. Interpreting code would seem like an ideal candidate for parallel processing - BYOND already has the idea of separate threads of execution, they're just not processed that way. If two players are moving in separate parts of the world, their calls to Move (and whatever other procs are called) are reading and modifying completely independent sets of variables. You can do them in parallel with no consequence.

Here's the rub: How do you tell the two procs can run without disturbing each other?

I've given this a lot of thought and I just don't see a feasible way around the single-threading. Quite aside from the fact that it'd require some significant changes in the engine to handle a multithreaded model at all, the problem of objects interfering with each other isn't soluble without author-controlled mutexes and semaphores.

As a simple example: Say I tried to automate this so I could just run procs willy-nilly and if one accessed a mob or something it would just lock it down. Proc A is working with mob X, and proc B is working with mob Y. Say X.bestfriend==Y. If proc A needs to make a change to X.bestfriend.group, it needs to lock down Y too. Obviously whichever proc got to Y first should get a lock on it, but this can result in deadlock. The only way to avoid deadlock is if proc A, which ran just barely first, got the lock on X and Y together, which it could do by looking at all of X's vars. Obviously though it would also have to look at all of Y's vars too, and so on. This could be limited only to the vars used in the proc, conceivably, but this automated mutex could get ugly really fast. Calls to other procs have to be handled too.

But not everything can even be handled that way, because you also need to account for locate(), which can pick up objects by reference. You need to account for vars[], which would be mutex-proof. get_step() has the same issue, at least if it's used in a loop. (In fact pretty much any var access in a loop is problematic.) Any proc using things like that would need a full lock. Even then I'm not sure deadlock is 100% avoidable. Doing any kind of locking, even very simple locking, can cause unexpected problems.

The only real way to make this work would be to add some complications to the language, and to the code underlying it, to allow code to run in parallel at the user's discretion and let them set up their own locks. The engine however would need a lot of work under the hood to make this possible, because right now if an object is deleted it can just be deleted outright, but now it would have to be deleted while no other parallel procs were running. This impacts not just refcount deletions but also the del() proc, because the latter actually has to search for references and null them out; in its place we would need some means of checking if a reference was still valid pending proper deletion. Also under the hood we would need to treat any object allocations and deallocations as critical sections.
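
To make the X/Y deadlock scenario above concrete, here is a minimal sketch of the usual workaround, which is to always take locks in one fixed order. It is written in Python rather than DM because DM has no threads or locks; the Mob class, its order field, and set_friend_group() are hypothetical names invented for illustration, not anything in BYOND. Note that it only works because the proc knows up front which two objects it will touch, which is exactly what locate(), vars[], and loops make impossible in the general case.

```python
import threading


class Mob:
    """Hypothetical stand-in for a DM mob; each one carries its own lock."""
    _next_order = 0

    def __init__(self, name):
        self.name = name
        self.group = None
        self.bestfriend = None
        self.lock = threading.Lock()
        # Fixed rank used to decide which lock is taken first.
        self.order = Mob._next_order
        Mob._next_order += 1


def set_friend_group(mob, group):
    """Change mob.bestfriend.group while holding both locks.

    Locks are always taken in ascending `order`, so two threads that need
    the same pair can never each hold one lock while waiting for the other.
    Assumes mob and mob.bestfriend are distinct objects.
    """
    friend = mob.bestfriend
    first, second = sorted((mob, friend), key=lambda m: m.order)
    with first.lock:
        with second.lock:
            friend.group = group


def run_demo():
    x, y = Mob("X"), Mob("Y")
    x.bestfriend, y.bestfriend = y, x
    proc_a = threading.Thread(target=set_friend_group, args=(x, "red"))
    proc_b = threading.Thread(target=set_friend_group, args=(y, "blue"))
    proc_a.start()
    proc_b.start()
    proc_a.join(timeout=5)
    proc_b.join(timeout=5)
    stuck = proc_a.is_alive() or proc_b.is_alive()
    return x, y, stuck
```

If each thread instead locked its own mob first and the friend second, proc A could hold X while proc B holds Y, and neither would ever finish.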
Bravo1 wrote:
You were the one who told me not to expect good performance at 30fps and 20+ players with this system.

I didn't say it like that. The example in comment #156 was about achieving 30% CPU usage with 80 moving mobs at 40 fps. These are arbitrary values, they're not intended to represent the threshold between good and bad performance. Failing to meet this performance goal just means that CPU usage will exceed 30%.

No matter what (whether pixel movement is built-in or user-made) there will be a fairly low limit on the number of moving objects you can support, at least compared to the expectations commercial games might set for you (remember, we haven't been discussing network performance issues). I'm not sure what you're trying to achieve by pointing out that BYOND is slow.

Also, I expect that Crysis supports GPU-based physics.
Lummox JR wrote:
Here's the rub: How do you tell the two procs can run without disturbing each other?
I've given this a lot of thought and I just don't see a feasible way around the single-threading. Quite aside from the fact that it'd require some significant changes in the engine to handle a multithreaded model at all, the problem of objects interfering with each other isn't soluble without author-controlled mutexes and semaphores.

I'm not sure how well this would work, but here's my guess:

You split tasks up and run them in parallel (each event from a client, "player one called the North command", would be a separate thread). Each separate thread is reading and modifying vars, but the changes aren't permanent - you have them written to scratch space. When the processing is done, you look to see which threads have accessed the same vars. If one thread wrote to a var and another thread read it, you might have a problem (because if the events occurred in a different order you may have gotten a different result).

When you detect a problem like this you simply disregard the work done by the second thread. This is why changes were written to scratch space. If a thread reads vars that nobody else wrote to, there's no conflict and the changes are committed.

The worst case is that you split up the work, do it in parallel, then discover that everything conflicts and you have to do things sequentially. You could have a master thread whose work wouldn't be thrown away, so you wouldn't have to do everything over again. Because the work is distributed you'll find this out more quickly. Any time that was wasted was wasted by CPU cores that would have otherwise been idle.

The idea is that sometimes the overhead will outweigh the benefit, but more often that won't be the case. If you have a bunch of enemies running simple AI loops and moving around, unless two enemies try to move to the same tile there won't be any conflicts (separate threads can both read the same var, you only get a problem when there's some writing going on) and the work can all be done in parallel.

I wouldn't envy the person who tries to implement this, but it does seem like there's some potential to get something that is effective and transparent to the developer.
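
The conflict check described above can be sketched as follows. This is a rough illustration in Python, not anything BYOND does: each task is reduced to the set of vars it read and the set it wrote, and split_by_conflict() is a hypothetical helper name. The rule used here is a conservative assumption: a task is thrown away if it touched a var that an earlier accepted task wrote, or wrote a var that an earlier accepted task read.

```python
def split_by_conflict(tasks):
    """Decide which speculative tasks can be committed.

    tasks: list of (name, reads, writes) tuples in event order, where
    reads and writes are sets of var names.
    Returns (committed, redo): names whose scratch work can be kept, and
    names that must be rerun sequentially.
    """
    committed, redo = [], []
    seen_reads, seen_writes = set(), set()
    for name, reads, writes in tasks:
        touched = reads | writes
        if (touched & seen_writes) or (writes & seen_reads):
            # The result may depend on ordering, so discard this work.
            redo.append(name)
        else:
            committed.append(name)
            seen_reads |= reads
            seen_writes |= writes
    return committed, redo


tasks = [
    ("mob1.Move", {"mob1.loc"}, {"mob1.loc"}),
    ("mob2.Move", {"mob2.loc"}, {"mob2.loc"}),
    # mob3 looks at mob1's location, which mob1.Move just changed.
    ("mob3.Move", {"mob3.loc", "mob1.loc"}, {"mob3.loc"}),
]
committed, redo = split_by_conflict(tasks)
```

The first task in the list is never rejected, so it plays the role of the master thread whose work is always kept.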
The scratch idea is interesting, but I don't really see how it could be done. Everything would have to be modified into a more virtualized form where all reading and writing took place at a higher level. This is actually a much bigger change than the one path I consider remotely viable: making major changes to allocations and other such internals, and adding language features to let users control threading. And by remotely I do mean quite remote; there's a lot that gets done via globals and being able to separate that out into threaded behavior would be a feat in and of itself.
Lummox JR wrote:
The scratch idea is interesting, but I don't really see how it could be done. Everything would have to be modified into a more virtualized form where all reading and writing took place at a higher level.

When the interpreter needs to read a variable, it checks the scratch space first. If it's there, it takes that value. If it's not there, it loads it from the global set of values. When the interpreter writes to a variable it writes to its scratch space.

This'll take up more memory because you have to store all of these scratch spaces before the changes are all committed. You'd also need to make the necessary adjustments just to get things working this way. It shouldn't be that monumental of a change, but I wouldn't be surprised if the .dmb interpreter is rarely touched and is quite a mess.
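
The read and write rules above amount to a small overlay on top of the shared values. Here is a minimal sketch in Python, with a plain dict standing in for the interpreter's global set of values; the ScratchSpace class and its method names are assumptions made for illustration and say nothing about how the .dmb interpreter is actually structured.

```python
class ScratchSpace:
    """Per-thread overlay: writes stay local until commit() is called."""

    def __init__(self, global_vars):
        self._globals = global_vars  # shared dict, read but not modified until commit
        self._scratch = {}           # this thread's private writes
        self.reads = set()           # var names read, kept for conflict checks
        self.writes = set()          # var names written, kept for conflict checks

    def read(self, name):
        self.reads.add(name)
        # Check the scratch space first, then fall back to the global value.
        if name in self._scratch:
            return self._scratch[name]
        return self._globals[name]

    def write(self, name, value):
        self.writes.add(name)
        self._scratch[name] = value

    def discard(self):
        """Throw the speculative work away after a conflict."""
        self._scratch.clear()

    def commit(self):
        """Make the writes permanent once no conflict was found."""
        self._globals.update(self._scratch)
        self._scratch.clear()
```

The extra memory mentioned above is the _scratch dict plus the two bookkeeping sets, held once per thread until the commit or discard.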

making major changes to allocations and other such internals, and adding language features to let users control threading.

This just lets DM developers create a lot of situations that aren't fun to debug. An internal solution could be transparent to the DM developer and robustly execute independent code in parallel so there's no chance for race conditions.

My biggest concern is that it'd be hard to tell what code segments are independent, or that you'd end up having very few independent segments.

To check for dependencies between segments you need to look at what vars each thread read from and wrote to. This is fairly straightforward, but with a lot of vars it might prove to be time-consuming.

The way I described it, if you have two threads of execution (called A and B), and A writes to a global var and B reads that same global var, the outcome of B may depend on the order. The value of the global var at the time that B reads it depends on whether or not A happened first. If you assume all of these situations are dependent code segments and must be run sequentially, this code would allow for no movement to be processed in parallel:

var
    global_var = 0

mob
    Move()
        . = ..()
        global_var += 1


Because every thread that runs Move() will be reading and writing the same global variable.
You could also think of processing as a production line.

Let's say there are two workers on the production line. Two items come into the line and one of the workers grabs the first item and begins to process it; the second worker does the same to the second item. Let's say the second worker finishes his work first.

The issue is that the second worker might place his work ahead of the first worker even though the first worker's result should go first.

You can then introduce a sequencer to make sure that #2 places his work behind #1's by putting it into a standby area. Worker 1 will pass off his work when he finishes, then he will pass over the work in the standby area and prepare for the next batch of items.

You'd basically take processes in parallel and sort the results into a queue in RAM for the main thread to read while the side threads do all of the actual work. This converts fine into single-threading because it just means that the main thread processes all the work itself.
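
The sequencer idea can be sketched with a thread pool, shown here in Python since DM has no equivalent. The process() function and the (delay, label) item format are made up for the example. Workers may finish in any order, but the main thread collects results in the order the items arrived, so a fast second worker simply waits in the standby area.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def process(item):
    """Stand-in for real work: sleep for `delay` seconds, then return the label."""
    delay, label = item
    time.sleep(delay)
    return label


def run_in_order(items, workers=2):
    """Process items in parallel but hand results back in arrival order."""
    committed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process, item) for item in items]
        # Iterating in submission order is the sequencer: a finished future
        # waits here until every earlier item has been collected.
        for future in futures:
            committed.append(future.result())
    return committed


# The second item finishes first but is still placed second.
results = run_in_order([(0.2, "first"), (0.0, "second")])
```

With workers=1 the same code degrades to plain sequential processing. This only preserves the order of the results; it assumes the items do not depend on each other's work.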

It's akin to pre-processing icons and saving them in a file so that the CPU doesn't have to handle them on the fly. The code will be processed by the second core and the result loaded into memory. When the first core attempts to handle the process, it checks the RAM and returns the loaded result, then wipes the loaded process from memory, which in most cases is much faster than handling the process itself.

You could also do a check for multiprocessors by running a process which would break if running in parallel. If it returns normally, then it switches to a single-threaded mode; if it breaks, to a multiprocessing mode.
Bravo1 wrote:
The issue is that the second worker might place his work ahead of the first worker even though the first worker's result should go first.

Not only that, but the work done by one worker might depend on the work being done by another worker. It actually changes the work that's done.

var
    global_var = 1

// thread A
global_var = 0

// thread B
if(global_var)
    world << "test"


You can't pre-process thread B and sort the results out later because the result of B depends on whether or not A was executed first.
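
The order dependence is easy to see by running the two pieces in both orders. This is a small Python transcription of the snippet above, with a dict standing in for the global var and a list standing in for output to world; run() and its order argument are hypothetical names for the illustration.

```python
def thread_a(state):
    state["global_var"] = 0


def thread_b(state, output):
    if state["global_var"]:
        output.append("test")


def run(order):
    """Run A and B sequentially in the given order, e.g. "AB" or "BA"."""
    state = {"global_var": 1}
    output = []
    for step in order:
        if step == "A":
            thread_a(state)
        else:
            thread_b(state, output)
    return output


a_first = run("AB")  # A clears the var, so B outputs nothing
b_first = run("BA")  # B sees the original value and outputs "test"
```

Pre-computing B's result before A has run gives the "BA" answer, which is wrong whenever A was supposed to happen first.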
How about separating these comments into their respective request threads? I've followed four or five different conversations in these comments.
Just out of curiosity, to the people that matter (Lummox and Tom), is there actually any chance that this will ever be implemented regardless of any supposed benefits? *

* I MADE THIS MULTICOLORED, BOLD, AND ITALICIZED JUST TO MAKE SURE YOU'D SEE IT.
We'll investigate pixel movement in tandem with more native big-icon handling, since the issues overlap (eg, collisions and location handling).

The question for me is whether we want to keep everything backwards-compatible, maintaining both tile & pixel systems, or whether it'd be better to just spin this off into a separate system. I'm leaning towards the latter because it's a lot easier (and we're already taking this approach with the Flash project by making those games a limited, specialized subset of BYOND). Really, the roguelike verb/tile model is pretty obsolete for the types of games people are making these days.

As far as a system to generally parallelize BYOND code.. no, that ain't gonna happen. We can and should offload certain existing operations (such as file transfer) to other threads, though. That's a long overdue request.
See: Software Transactional Memory.

I personally plan on doing much of the heavyweight processing of my own project (so far including map generation and probably eventually including some pathfinding stuff) in a separate library. Map generation is already multithreaded and pathfinding can also be threaded out in a straightforward manner.
Tom wrote:
As far as a system to generally parallelize BYOND code.. no, that ain't gonna happen.

Wuss! Yes it took me a month to come up with that :-)

I figured this was the most relevant place to ask this question: If you host a BYOND game with Dream Daemon and connect with Dream Seeker on the same computer, they're separate processes and they'll make use of multi-core CPUs. If you run the game through Dream Seeker and host, is there just one process (dreamseeker.exe) that is both the host and your client (that may not make use of multi-core CPUs)?

I ask because games with higher framerates are more demanding on the CPU for both the client and server (especially if the client isn't using hardware rendering). If everything is single-threaded and one process acts as both the server and the host's local client, then, if it doesn't make use of multiple cores, the higher client CPU usage (due to the increased framerate) will slow down the server.
Yes, the dual DS/DD is single-threaded and as such less efficient on a multi-core machine. In practice, I'm not sure how much difference it makes for a single-player game (you'd have to test to see), but I definitely wouldn't recommend hosting multi-player games from DS because the client usage could interfere with the server.

That said, multithreading this special case of the client-server should be do-able, since the code is essentially two separate pieces already.