ID:1907136
 
Applies to:DM Language
Status: Open

Issue hasn't been assigned a status value.
see: title.


Or, failing that, a compile option to disable it, for games where a O(n log n) operation for even the most simple of string modifications is inappropriate.
Completely impossible. The string tree is a necessity built into the language, because strings are immutable and shared.

Adding something to rebalance the tree as needed: much more plausible. It should be fairly good about catching extreme cases of imbalance though.

The bigger question here is: what string ops are you doing, and how? There may be ways to drastically simplify. I've also wanted, from time to time, to look into a string datum much like /icon that could avoid committing to the string tree until it was necessary.
I've also wanted, from time to time, to look into a string datum much like /icon that could avoid committing to the string tree until it was necessary.

Wow, I was just talking about something like that.

[08:59:56] MrStonedOne I KNOW HOW WE CAN SPEED UP STRINGS! Move all strings to lists of ascii numbers
[09:01:08] MrStonedOne just make a string datum
[09:01:30] MrStonedOne and implement everything in the cstrings library in byond to work with it.

I got stuck on how i could reassemble it without each += in the loop being O(n log n) with respects to the string tree, for outputting.

Easiest is if you can use inline expressions, which compile to text(), because that basically does everything internally before the string tree lookup.
What do you mean?
If you need to style text for instance: "[style][text]" or something similar.
string = "[string][new_value]"


As opposed to

string += new_value


(I think, I could be totally wrong)
Can confirm, inline string building has been testing incredibly fast. My JSON library that I've been working on has been unrolling loops of string addition in favor of a single operation per child type.
Still, if i had a var/list/nums that contained ascii numbers from text2ascii(), and i wanted to convert that back, it would require a call to ascii2text(), that would trigger a string tree lookup for the single character string, then it would have to concatenate with the current string, that would trigger a string tree lookup, then it would move on to the next character.

Turning that big O to O(m(n log n)2) where m is the length of the string, and n is the size of the string tree.

The most I could do is cut that down by doing it a chunk at a time.

It's times like this that I'm reminded that user friendliness ALWAYS comes with a cost.

because strings are immutable and shared.

I never understood the push in programming to have strings be immutable. Modifications to a string within a proc should absolutely be private and mutable. At that point you treat it like it really is, internally, an array of 1byte ascii numbers. and dynamically handle expansions (or rather, let the c++ string class's system (or cstring's system) for doing that handle it)

You can handle the transition for proc args (and compile time strings) by just copying it to the proc's internal space if and when it tries to change the string.
The memory requirements for handling strings any other way would be nasty.
The CPU requirements of handling strings this way are also bloody insane. The DM string type could be implemented as a simple copy-on-write refcounted string (with modify-in-place fast paths for when the refcount is 1).

The requirement on single-core performance is what makes server operators weep, not the memory requirements (unless you're running lots of small servers).

E: Although, this may not help current games terribly much, considering string operations are currently avoided at all costs.
Generally, ya, What tobba said.

CPU performance is generally much much much more of a concern then memory performance.

You aren't gonna be able to justify avoiding excess memory usage until threading in the language becomes a thing. Because right now, if you were to run a poll, I can guarantee the majority of developers are going to say that cpu becomes a bottleneck LONG before memory does.

To make my point: (this vm has access to 2 cores, so 50% = maxed out 1 core/thread)


So 86% of single thread's max usage, vs 25% of the max amount of memory a 32bit process can use.

Another note, the massively majority of same strings are in duplicates of object with shared data (like name, or ss13's desc variable)

Something tells me the only benefit you really get in memory from the string tree is in initial(). Something also tells that could be easy to specialize away in any new refcounted copy on write system.
BYOND is also 32-bit, so memory can't simply be ignored, either. Strings can grow monstrous with little effort. However in principle, at least in a limited way, I agree with you. It's likely that in most cases, strings don't need to be "normalized" by finding and giving them a canonical ID. I can however think of exceptions: the vars list and call(), for example, could not function without normalized strings. (Also the internal routine for text2path.) The icon_state var and some other appearance vars might need normalization as well to avoid appearances from blowing up.

This concept intrigues me enough to look into it further, though. If a non-canonical string existed as a concept, a lot of things might be simplified. (Mind you, inline expressions will still always perform better, as would a string datum, because more can be optimized under the hood.) Likely I would have to replace a great many calls with a new call for this to work right.
Instead of normalization you could search by the hash of the string in most of these cases (and then strcmp() or a simple pointer comparison if a suitable candidate is found).
Also just in case you're not aware: every hashing implementation BYOND currently has is horrifically broken. You'd want to implement something like xxHash, but unlike what Dan did, actually verify the results.
In response to Tobba
Tobba wrote:
Also just in case you're not aware: every hashing implementation BYOND currently has is horrifically broken. You'd want to implement something like xxHash, but unlike what Dan did, actually verify the results.

I can confirm this; The garbled junk that the built-in function uses is not even capable of being utilized for a login system. At least, not with salted hash's! I eventually gave up after trying to implement my own account creation/encrypted login system. Didn't want to bother with storing peoples passwords and just can't accept doing it plain text, haha (no way I can figure out AES for byond, just not my kind of capability).
In response to AERProductions
AERProductions wrote:
Tobba wrote:
Also just in case you're not aware: every hashing implementation BYOND currently has is horrifically broken. You'd want to implement something like xxHash, but unlike what Dan did, actually verify the results.

I can confirm this; The garbled junk that the built-in function uses is not even capable of being utilized for a login system. At least, not with salted hash's! I eventually gave up after trying to implement my own account creation/encrypted login system. Didn't want to bother with storing peoples passwords and just can't accept doing it plain text, haha (no way I can figure out AES for byond, just not my kind of capability).

I'm more referring to internally. For example, the CRC32 function BYOND uses is incorrect (one shift is the wrong way around, the table is incorrect and is truncated to 16 bits for whatever reason).
In response to Tobba
Tobba wrote:
I'm more referring to internally. For example, the CRC32 function BYOND uses is incorrect (one shift is the wrong way around, the table is incorrect and is truncated to 16 bits for whatever reason).

I was under the impression that BYOND's CRC32 just used a different polynomial (0xAF, rather than the relatively standard 0x04C11DB7)-- it's that way with savefiles and RSCs, at least.
In response to Audeuro
Audeuro wrote:
I was under the impression that BYOND's CRC32 just used a different polynomial (0xAF, rather than the relatively standard 0x04C11DB7)-- it's that way with savefiles and RSCs, at least.


It's possible it's CRC32 with an exceptionally bad choice of polynomial, but it looks broken in am extremely BYOND way (everything being truncated into shorts).
Well I still have that MD5 hashing login system (uses Mysql, so you will need a Mysql server to test with) if anyone wanted to check it out. I'd have to scrounge it up and package it but perhaps it would offer some insight?
As for thread topic, it is really interesting and I want to see Lummox's results! :)
Page: 1 2