ID:2311276
 
Applies to: Dream Daemon
Status: Open

Issue hasn't been assigned a status value.
So the value struct is 5 bytes of actual data (a 1-byte type id plus a 4-byte union), but because of memory alignment concerns, the compiler pads it out to 8 bytes.

Since this struct is used for everything, this basically means 3 of every 8 bytes spent on values is padding, or in other terms, close to 40% of the memory DD spends on values is air.

Now! There is a way to disable this behavior in the compiler. I mentioned this to lummox over PM to ask if he had tried it; he said he hadn't.

#pragma pack(push, 1) // exact fit - no padding
struct value
{
    char type;          // 1-byte type tag ("typeid" is a reserved word in C++, so renamed here)
    union { ... } data; // 4-byte payload
};
#pragma pack(pop) // return to old pack value
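
For anyone who wants to see the effect for themselves, here's a minimal standalone sketch that compiles on its own (the field names and union contents are placeholders, not BYOND's actual internals):

#include <cstdio>

struct value_padded
{
    char type;                      // 1-byte tag
    union { float f; int i; } data; // 4-byte payload
};                                  // padded to 8 bytes on most compilers

#pragma pack(push, 1) // exact fit - no padding
struct value_packed
{
    char type;
    union { float f; int i; } data;
};                                  // exactly 5 bytes
#pragma pack(pop)

int main()
{
    std::printf("padded: %u bytes\n", (unsigned)sizeof(value_padded)); // typically 8
    std::printf("packed: %u bytes\n", (unsigned)sizeof(value_packed)); // 5
}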


This has the chance to bring close to a 40% reduction in memory usage! But at a cost. I haven't been able to get any actual benchmarks or data out of the theorycrafters on Google, but the theory is that packed values would cost more to access, as they would more often be misaligned relative to the word size the CPU uses to access data.

So what I am hoping for is for lummox to push out a version of the 512 beta that has pack(1) enabled, so people can run benchmarks on it to see whether the overhead/memory tradeoff is worth it or not.

Because it's literally a two-line change*, the hope is that this doesn't represent a high time investment to test what might turn out to be a bad change.

(And that's not to mention the benchmarks I've seen suggesting packing might actually increase performance, since more values fit in the CPU cache.)

* The claim of a "two-line change" does not take into account any backwards-compatibility code needed if savefiles or network data serialize values directly without sanitation.
+1

We're sort of desperate for memory at the moment in our project, and there's no harm in trying this.
:+1:, this will be super useful.
So I thought I'd run some benchmarks.

https://gist.github.com/MrStonedOne/cb16172ec8ebce4764e351bfd5d44a4c

Basically, we create 500,000 random "dm_list" structs, allocated dynamically, each holding a random number of "value"s, somewhere between 0 and 500.

The lists are not in contiguous memory, but each list's values are.

The counter var is just to ensure the compiler doesn't try to get clever and optimize away any of the accesses.
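
For reference, the core of the benchmark looks roughly like this; it's a simplified reconstruction of what the gist does (the field names, timing code, and exact random distribution are approximations, so see the gist for the real thing):

#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <vector>

#ifdef packitbaby
#pragma pack(push, 1)
#endif
struct value
{
    char type;                      // placeholder field names
    union { float f; int i; } data;
};
#ifdef packitbaby
#pragma pack(pop)
#endif

struct dm_list
{
    value* contents; // each list's values are contiguous
    int length;
};

int main()
{
    std::printf("value size: %u\n", (unsigned)sizeof(value));
    std::printf("dmlist size: %u\n", (unsigned)sizeof(dm_list));

    std::srand((unsigned)std::time(0));

    const int list_count = 500000;
    std::vector<dm_list*> lists(list_count); // the lists themselves are not contiguous

    // Generate 500,000 lists, each holding 0-500 values.
    for (int i = 0; i < list_count; ++i)
    {
        dm_list* l = new dm_list;
        l->length = std::rand() % 501;
        l->contents = new value[l->length];
        for (int j = 0; j < l->length; ++j)
            l->contents[j].data.i = std::rand();
        lists[i] = l;
    }

    // Access 5,000,000 random lists; the counter keeps the compiler from
    // optimizing the reads away.
    volatile long long counter = 0;
    for (int i = 0; i < 5000000; ++i)
    {
        dm_list* l = lists[std::rand() % list_count];
        for (int j = 0; j < l->length; ++j)
            counter += l->contents[j].data.i;
    }

    std::printf("counter: %lld\n", (long long)counter);
    return 0;
}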

c:\Users\kyle\Documents\GitHub\packtest>cl.exe /EHsc /Ox /Qpar main.cpp
...
c:\Users\kyle\Documents\GitHub\packtest>Main.exe
Hello world
value size: 8
dmlist size: 8

Generating array
It took 7.586 seconds to generate 500000 random lists

Accessing array
It took 48.89 seconds to access 5000000 random lists


c:\Users\kyle\Documents\GitHub\packtest>cl.exe /EHsc /Ox /Qpar /Dpackitbaby main.cpp
...
c:\Users\kyle\Documents\GitHub\packtest>Main.exe
Hello world
value size: 5
dmlist size: 8
Using packed format

Generating array
It took 7.92 seconds to generate 500000 random lists

Accessing array
It took 51.196 seconds to access 5000000 random lists


So that seems to be a minor, and likely insignificant, amount of overhead in a synthetic benchmark (but then again, I have no idea how much of that difference is noise from rand() overhead).

The memory usage between those two tests was 1.02 GB and 0.657 GB, respectively. Some of that could be from the randomness of the list sizes, but I saw mostly the same results when I was testing with a smaller number of lists while developing the benchmark.
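
(As a rough sanity check: 500,000 lists averaging ~250 values apiece is about 125 million values, which works out to roughly 1.0 GB at 8 bytes each and roughly 0.63 GB at 5 bytes each, so both measurements line up with the expected padded and packed sizes.)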
IMO the speed overhead, and the potential uncertainty this could introduce in other parts of the code, both rule this out. Aligned structures are well-behaved structures.

And the memory savings likely wouldn't even be as significant as the ones in this demo. Value isn't the only structure used on the server by any stretch; it mainly appears in certain places like lists, which are a minority of the memory use in any program.
Lists are by far the largest use of memory on /tg/.

I know this because every time we hit the edge of memory, the list of lists goes away and all list accesses fail, and because lists tend to be the largest entry by far in memory stats.

Also, Value is (I would assume) used to store all numerical values, like all the floats we store for things like lighting datums, atmos, armor values, etc.

But if you know of other structs that have a 40% or so memory overhead from compiler padding, feel free to pack those as well with #pragma pack.
Because alignment matters a lot, I don't think packing Value is doable. An alignment issue was behind a problem you reported at one point with certain map objects not updating properly (in response to pixel offset changes, IIRC).

If lists are the biggest memory hogs in /tg, though, that's interesting in itself because even at 8 bytes instead of 5, Value is a very small struct. Have you investigated what objects are using the most and biggest lists, and whether those lists would be replaceable with other things (or whether they could be let go of after some initial use)?
I think it'd be unadvisable to pack the value struct for performance reasons. But x64 builds might be worth looking into to allow for more memory playroom.
I think it'd be unadvisable to pack the value struct for performance reasons.

Regardless of everything else, I can't just let this argument stand unopposed.

I've seen a lot of theorycrafting on this subject across Google, but I haven't seen a single benchmark that proves packing from 8 -> 5 actually leads to a noticeable performance loss.

The only benchmarks I've seen (or done myself) either show an improvement in performance from packing, or a very minor loss in performance. So we just can't know one way or the other.

What we also don't know is how much Value access even contributes to the overhead of the var or list accesses that use Value (or other Value-using things). As we saw with the datum var access change, moving from O(n) to O(log n) made for very little real-world change, because the actual access/lookup contributes very little to the overall overhead of datum var accesses.

This isn't likely to live or die on performance; I knew it would live or die on whether or not BYOND serializes Values in a way where misalignment can happen if a packing server talks to a non-packing client or vice versa.
If Values only existed in lists, I think you'd have a better case for packing. But they're used all over the place.

It's my understanding that the architecture really frowns on 4-byte values being accessed out of their proper alignment. I strongly suspect that any attempt to pack the struct would result in a lot of really weird, difficult-to-find bugs.