Last weekend I was reluctant to release 512.1449 right away because of the sweeping changes under the hood, so it was going through some private testing. That turned out to be a good thing because a few items cropped up right away that I was able to fix before release.

However, as I think anyone could have predicted, there were still a few issues to deal with. Most of the problems that testing caught were server-side, and it turned out that the client and Linux compiler had a couple of problems that jumped out in release mode. This led to a few more releases, culminating in the Linux-only 512.1451 build (Windows didn't need to update past 1450), which is where we're at now. I did a little more work in this refactoring area for the next release, updating the server-side color matrices like I mentioned and making some changes to color matrix handling in the MapIcon struct.

With things holding steady for the moment, I decided to start in on the compiler profiling I'd been talking about for a while. I promised this to the SS13 devs some time ago and hadn't gotten around to it yet because 1) there were too many other things on my plate at the time, and 2) profiling sucks. Profiling tends to produce inconsistent reports based on the instrumentation and how the system happens to be feeling at any given time, and in newer compilers it also produces massive files that take up a lot of space and take forever to analyze--which among other things, meant I couldn't simply profile an entire compilation process from start to finish, but had to start and stop the profiling to catch specific time slices of interest.

Well, it turned out that profiling did highlight an optimization target, and it was a big one. It uncovered that a lot of time was being spent in an O(N) loop in a function used to convert info about a var (its name, some definition info, etc.) into a variable ID. In other words, this was a lookup, and it was trying to find out if the info already had an ID or needed a new one by going through an entire list one at a time. In small projects these O(N) loops are no big deal, but in something the size of SS13 it gets bad. Remember, the loop itself is O(N), but as N rises the code also contains on the order of N references to vars, so the compiler is effectively seeing O(N²) performance. My solution to that loop was to replace it with the CBag template class that we use for appearances and many other structures, which allows a binary search on a self-balancing tree.
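To give a rough idea of the pattern, here's an illustrative sketch; this is not our actual code, and the struct and function names are made up. std::map is just standing in for CBag's self-balancing tree.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the compiler's var info; not actual BYOND code.
struct VarInfo {
    std::string name;   // most of the entropy is in the name
    int scope_id;       // stand-in for "some definition info"
};

// Before: a linear scan over every known var for each lookup.
// With ~N vars referenced ~N times, compilation does ~O(N^2) work.
int findOrAddLinear(std::vector<VarInfo>& vars, const VarInfo& v) {
    for (size_t i = 0; i < vars.size(); ++i)
        if (vars[i].name == v.name && vars[i].scope_id == v.scope_id)
            return (int)i;            // found: reuse the existing ID
    vars.push_back(v);
    return (int)vars.size() - 1;      // not found: assign a new ID
}

// After: an ordered-tree lookup, O(log N) per query, O(N log N) overall.
struct VarKey {
    std::string name;
    int scope_id;
    bool operator<(const VarKey& o) const {
        if (int c = name.compare(o.name)) return c < 0;
        return scope_id < o.scope_id;
    }
};

int findOrAddTree(std::map<VarKey, int>& ids, const VarKey& k) {
    std::map<VarKey, int>::iterator it = ids.find(k);
    if (it != ids.end()) return it->second;   // existing ID
    int id = (int)ids.size();
    ids.insert(std::make_pair(k, id));        // new ID
    return id;
}
```

Either way the find-or-add drops from O(N) per lookup to O(log N), which is where the big win comes from.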

After dealing with that I tried some more tests and broke into compilation with the debugger here and there. This highlighted two more O(N) loops: one for adding ID arrays, which are a very big deal, and one for unique var paths (an internal compiler concept). These, and the var IDs above, each numbered in the tens of thousands, and I believe the number of ID arrays ended up somewhere around 50K. So I replaced these two loops with CBag handlers as well.

The results I got for those efforts were little short of miraculous. I didn't get a baseline time for compiling SS13 before, but it took something like 3 or 4 minutes at least on my machine. The new compilation (in release mode) takes 33 seconds; sometimes 32. I was saying initially that I got about a 300%+ improvement just by eyeballing it, but based on my recollection of how long compilation used to take, it's probably more accurate to say I got about 500% or more; going from 3-4 minutes (180-240 seconds) down to 33 is roughly a 5.5-7x speedup. Obviously the improvement will be smaller for smaller projects, but it's still a huge deal.

I also added some extra output to the compiler: total elapsed time in seconds, and a notification whenever it includes a file. I tried to add something fancy to estimate % of completion, but... nope, that did not work. Anyway the good stuff will be coming in 512.1452, but ain't no way am I taking chances with the Friday curse.

If you haven't gotten to work yet on your killer Halloween game, what are you waiting for? This is pumpkin season! With the filters in 512 you can have some fun with ghostly effects, like maybe a nice eerie mist or a blurry figure in the corner.

Thanks as always to our Members, donors, and Patrons for their support!

That's a huge performance boost. The next version will definitely be better for large projects. Great work.

Amazing.
Lummox JR wrote:
My solution to that loop was to replace it with the CBag template class that we use for appearances and many other structures, which allows a binary search on a self-balancing tree.

Out of curiosity, was there a consideration for a hash map instead? O(n log n) is better than O(n²), but with a hash map you'd have O(1) amortized lookup complexity, giving O(n) complexity overall.

I'm not arguing that you took the wrong approach, but I am interested if you had a specific rationale that led to one vs the other.

Either way, sounds like a major improvement for large projects; going from O(n²) to O(n log n) scales substantially better.
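For concreteness, the kind of thing I mean (purely illustrative; I obviously haven't seen the real structures, so the names here are made up):

```cpp
#include <string>
#include <unordered_map>

// Key the ID table on the var's qualified name (or name + scope).
// Amortized O(1) per find-or-add, so ~O(n) across all n references.
int findOrAddId(std::unordered_map<std::string, int>& ids,
                const std::string& qualifiedName) {
    auto it = ids.find(qualifiedName);
    if (it != ids.end()) return it->second;   // already has an ID
    int id = (int)ids.size();
    ids.emplace(qualifiedName, id);           // assign the next ID
    return id;
}
```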
I had a few reasons for preferring the CBag approach.

First: CBag is more "mature"; the Bag class it's based on has been around in the code for a long time. DMHash is a very new class and it lacks some of the bells and whistles I ultimately want to give it. For instance DMHash has nothing built in to allow for load rebalancing, which is somewhat important if you're going to be hashing as many items as are used here.

Running a hash on the entire structure also seemed like a big time investment compared to log₂(N) comparisons for some of these structures, since the var definition struct in particular carries most of its entropy in the name field, which makes for fast comparisons.
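To illustrate what I mean (hypothetical struct, not our real one): because names diverge within the first few characters, each comparison in the tree walk bails out almost immediately, so a lookup really does cost about log₂(N) cheap operations.

```cpp
#include <string>

// Hypothetical var-definition record; field names are made up.
struct VarDef {
    std::string name;   // high-entropy field: names diverge early
    int scope_id;
    int flags;
};

// Tree comparator: check the name first. Most comparisons terminate
// within a few characters, so each of the ~log2(N) probes down the
// self-balancing tree is nearly free.
bool lessThan(const VarDef& a, const VarDef& b) {
    if (int c = a.name.compare(b.name)) return c < 0;
    return a.scope_id < b.scope_id;   // tie-break on cheaper fields
}
```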
Hmm, I think the hashing function would have to be really slow to outweigh the time complexity in large projects. (In smaller projects, the difference would be negligible anyway.) I can’t speak for what the var definition structure looks like, but I imagine the whole thing needn’t be hashed to determine its key value; perhaps the hash could be simply the hash value of “name” string and some scope ID or pointer?

Out of curiosity, are we using C++11?
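For instance, something along these lines (again hypothetical; this assumes each var definition can reach a scope ID or pointer, which I'm only guessing at):

```cpp
#include <functional>
#include <string>

// Hash only the name plus a scope identifier instead of the whole
// record. Illustrative only; not based on the actual var definition.
size_t varKeyHash(const std::string& name, const void* scope) {
    size_t h = std::hash<std::string>()(name);
    // Mix in the scope so same-named vars in different scopes
    // land in different buckets.
    h ^= std::hash<const void*>()(scope) + 0x9e3779b9 + (h << 6) + (h >> 2);
    return h;
}
```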
It’s tough to say without seeing benchmarks advocating for them¹, but I personally see little value in CBag or DMHash (besides CBag already being dispersed through legacy code) when there’s:

std::map<TKey, TValue> - implemented as an ordered, self-balancing tree. I would expect the implementation to be as robust and performant as CBag (if not more so).

std::unordered_map<TKey, TValue> - implemented as a hash table. I would expect the implementation to be as robust and performant as DMHash (if not more so).

I’m aware of arguments relating to portability and implementation details of the STL, but to my understanding that argument has been moot for many years now. I would be surprised (and impressed!) if the custom implementations were faster than the standard containers.

¹That would be a fun Patreon writeup to read, and one I'd be willing to become a patron for: benchmarking operations using BYOND's various custom containers vs. the same operations replaced with STL containers.

Edit: It’s funny: after writing this, I clicked into the Patreon page for the first time and immediately found a non-Patron-only post discussing the disuse of the STL.

That being said, the post alludes to you wanting to know “what goes on under the hood” for performance reasons. I still maintain my belief that the STL implementation is probably faster. Care to give it a spin? I’ll commit to a five-year BYOND membership and subscribe at the “Guru” patron level for a writeup with some code (i.e., the custom container implementations) and benchmarks. ;)
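To show the flavor of the benchmark I have in mind, here's a minimal harness for the standard containers only; CBag/DMHash would be slotted into the same template, and all the numbers would of course depend on the real workload:

```cpp
#include <chrono>
#include <cstdio>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>

// Time a find-or-add workload: many lookups over a smaller key set,
// mimicking vars being referenced repeatedly during compilation.
template <class Table>
long long timeFindOrAdd(Table& t, int n) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) {
        std::string key = "var_" + std::to_string(i % (n / 4));
        if (t.find(key) == t.end())
            t.insert(std::make_pair(key, (int)t.size()));
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        end - start).count();
}

int main() {
    const int N = 500000;  // vaguely SS13-scale reference counts
    std::map<std::string, int> tree;
    std::unordered_map<std::string, int> hash;
    std::printf("std::map:           %lld ms\n", timeFindOrAdd(tree, N));
    std::printf("std::unordered_map: %lld ms\n", timeFindOrAdd(hash, N));
    return 0;
}
```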
Sounds like a big improvement! Glad we have SS13 so that performance updates like these can happen.
In response to Hiead
I've mentioned in earlier posts and on Patreon that we don't use STL; there are reasons for that going way back, plus I want absolute fine control over the code so I know what I'm dealing with.

C++11 is not in play. While the Windows compiler supports a lot of it, we're a few versions back on gcc and frankly it isn't worth the headache to upgrade and figure out why umpteen things stop working right.

As for giving STL a spin, that's an absolute no-go. Even if I commit to refactoring to use it at some point, that's a huge change under the hood. It's definitely not something I could just swap in and out to benchmark.
One thing is that classes are designed to add a lot of modularity and flexibility. Since CBag is a class, unlike its C counterparts, I think extending or even replacing it (for the purpose of testing STL, for example) could be just that, plus a tiny refactor on top at most.

I don't like STL, but your reasons why you definitely can't use it are intriguing.

I sure hope you don't use CBag by accessing its members with raw pointers outside of the class's own implementation.
In response to VampyCrampy
VampyCrampy wrote:
One thing is that classes are designed to add a lot of modularity and flexibility. Since CBag is a class, unlike its C counterparts, I think extending or even replacing it (for the purpose of testing STL, for example) could be just that, plus a tiny refactor on top at most.

CBag is used in enough places that it'd be a huge pain to change it, not to mention some types actually rely on it to assign IDs. But the places that would be most amenable to replacement with STL would also benefit the least from it, IMO.

The CBag template class is actually a template-based rewrite of the older Bag and BagOf<T> classes (BagOf was just a template wrapper around Bag), but the main reason I put it in was because Bag uses virtual functions pretty heavily and I saw no reason to accept the performance hit. There's absolutely nowhere that Bag is used that we rely on polymorphism.
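Roughly speaking, the difference looks like this (a hypothetical sketch, not our actual classes):

```cpp
// Old arrangement: items derive from a base class with virtual
// functions, so every comparison goes through a vtable dispatch.
struct BagItem {
    virtual ~BagItem() {}
    virtual int Compare(const BagItem& other) const = 0;
};

// New arrangement: the item type is a template parameter, so the
// comparison is a direct (and inlinable) call with no polymorphism.
template <class T>
class CBagSketch {
public:
    static bool less(const T& a, const T& b) {
        return a.Compare(b) < 0;   // resolved at compile time
    }
};

struct VarItem {
    int id;
    int Compare(const VarItem& other) const { return id - other.id; }
};
// CBagSketch<VarItem>::less(a, b) makes no virtual calls at all.
```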
In response to Lummox JR
Lummox JR wrote:
CBag is used in enough places that it'd be a huge pain to change it, not to mention some types actually rely on it to assign IDs. But the places that would be most amenable to replacement with STL would also benefit the least from it, IMO.

I don't think STL would be much of an improvement, if any, unless you're doing something very wrong. But fortunately for us, your CBag is a template class, so its definition must all be in one place and easy to share with us!

The CBag template class is actually a template-based rewrite of the older Bag and BagOf<T> classes (BagOf was just a template wrapper around Bag), but the main reason I put it in was because Bag uses virtual functions pretty heavily and I saw no reason to accept the performance hit. There's absolutely nowhere that Bag is used that we rely on polymorphism.

Whoever designed BagOf<T> may have had a good idea of how compilers conduct devirtualization. If they did, and did so properly, your CBag template rewrite could have been no improvement.
In response to VampyCrampy
BagOf<T> had a member that was just a regular Bag class, and the items in the bag (type T) always derived from BagItem, which had the virtual functions, so it wasn't the best arrangement. Considering when the class was first written, I doubt compiler optimization was very good at the time, and I doubt later compilers were converting the virtual functions to hard calls either.
In response to Lummox JR
So now Bag and BagOf don't use virtual calls, BagItem does, so CBag must implement a CBagItem of sorts, and for some reason you can't plug in class definitions and replace those for the purpose of a superior implementation. Okay then.

Well, best of luck to you on that, and also with getting newer versions of GCC to work without giving yourself a headache.

All the community can do is speculate about what the source code is like. Your can'ts and won'ts violate my entire fundamental understanding of programming, and without more disclosure, many of us will never understand you.