This never hurts speed, it only helps speed. This has a slight, but completely acceptable, impact on memory.

This is not true. It doesn't only help speed; there's the chance it doesn't affect speed at all.

In my previous comment I mentioned that BYOND often handles concatenations efficiently. You are talking about the average case, but the average case is not just (best case + worst case) / 2. You have to consider what cases are possible and what the likelihood is of each one occurring.

Because most cases are handled efficiently there will often be no benefit. There will always be increased memory usage. Even without a workaround you'd have a hard time justifying this change.
Forum_account wrote:
It doesn't only help speed; there's the chance it doesn't affect speed at all.

You are missing the point again. The point is that it does not hurt speed. There you are just nitpicking about grammar.

You are talking about the average case, but the average case is not just (best case + worst case) / 2.

I never said it was. In fact, I specifically said the average case memory usage for this improvement would be greater than 1.5 times the current memory usage for text strings.

You have to consider what cases are possible and what the likelihood is of each one occurring.

Obviously, based on all of my real arguments, I have done exactly this. I think you need to consider what cases are possible and what the likelihood is of each one occurring. It is in favor of my suggestion.

Because most cases are handled efficiently there will often be no benefit. There will always be increased memory usage. Even without a workaround you'd have a hard time justifying this change.

No, actually, doing it exactly as I showed and not optimizing any further than that would not be hard to justify. What I have said already justifies it. On the other hand, you would need to try hard to justify leaving things the way they are.

And, as I said, I was merely giving a simple example, and there are further optimizations you can do on top of that (some of them very trivial) to make the outcome even better. Left alone, it is already better than the current system. Optimized, it could be strictly better than the current system.

There is not really any argument here. Any competent developer can see the superiority, and it is not a question of if it should be done, rather it is a question of when. The only reason to not do it at all is if you assign a low enough priority to the change that it is on the list but just never gets done. THAT would be a good argument for not doing it, the fact that there might be more important things to do instead.

But, other Byond updates aside and time-management aside, if it is just a question as to whether it is a good idea to optimize or not, there really is no question as to whether it is a good idea.
I am not nitpicking; you are ignoring the cases where your "optimization" would increase memory usage and offer no performance gain. For example:

var/a = "aaaaaa" + "aaaaaa"

You'd allocate extra memory in case you append to a again, but you're not performing this concatenation any faster. If you don't append to a again (which the compiler can't reliably determine), you're using more memory but the performance is the same.
Forum_account wrote:
you are ignoring the cases...

I am ignoring nothing. I specifically address these things. My comparisons take into account all possible cases. I specifically address everything including this.

Either you are not reading what I am writing, you are not understanding it, or you are trolling. If the first, please either read it or stop arguing. If the second, you might not even realize the shortfall, but regardless I don't have the time to spell it out much more clearly. If the third, don't.

You'd allocate extra memory in case you append to a again

BINGO!

but you're not performing this concatenation any faster.

You are in some situations, you aren't in others.

If you don't append to a again (which the compiler can't reliably determine)

Actually, it can determine it. I have written code analysis for other languages that does exactly this. Byond is nothing special where this is concerned.

you're using more memory but the performance is the same.

Wrong answer, and I just started to type out why, then I realized that I'm basically saying the same thing I've said before, so I highlighted it and hit DELETE. I'm not saying it again.

I think part of your problem might stem from having a programming paradigm that includes "OH NO, it uses a little bit more memory. We have to conserve as much as we can." That outlook was valid a couple of decades ago, but if you have a computer from this decade the real outlook that you SHOULD have is "OH NO, there's a potential for crappy speed there. We have to conserve as much time as we can."
Actually, it can determine it. I have written code analysis for other languages which do exactly this. Byond is nothing special where this is concerned.

You can determine if the program might append to a again, but at the time of performing the concatenation you cannot always be sure. For example:

var/a = "aaaa" + "aaaaa"
if(prob(50)) a += "aa"


If you could determine with absolute certainty that another concatenation will occur then you could solve the halting problem.
I don't understand why you keep arguing about this since, in the beginning, I was originally supporting your comments. You changed your tune a while back and have since been arguing against your initial stance that the current system is slow. But regardless...

Forum_account wrote:
You can determine if the program might append to a again, but at the time of performing the concatenation you cannot always be sure. For example: (example here) If you could determine with absolute certainty...

There is no need to be always sure or have absolute certainty. If you want to start going down this path, I will go there. It is not the same argument as we were having before, but it is still directly on topic.

If you want to do an analysis on this at compile-time, for this specific instance the initial concatenation of two text literals could easily be optimized out by the compiler and it could see them as a single text literal. Then, once it gets to the if(prob(50))a+="aa", there are a few different ways you could handle that.

The analysis could attempt to determine at compile time which path seems more likely (with a default guess if it is impossible to tell), and it could reserve the extra space ahead of time if it thinks the a+="aa" path is more likely or just enough space if it thinks that path is not as likely.

HotSpot does something similar to this, and it assumes that, if the likelihood of one path vs another cannot be determined, then the programmer has made an effort to try and make the condition fall through more often than not. This is a good assumption in general since it cannot hurt anything and can only help (though often many programmers do not make such good calls).

And, if you were bothering to optimize at all in the first place in this fashion, there are a ton of other things you could do in combination with this. For this specific example, the last concatenation only increases size by 20%, so you could just always assume it will happen, and even if it does not then you've barely lost anything. If it went the other way, adding 9 a's to 2, such that it would have increased the size by 500%, you could assume that it will not happen, or you could add this decision to a list and defer it for later, then you could come back later with more information to make a better estimate. You could even make such decisions at runtime: is the game currently using a lot of resources and does this computer have a lot of resources to spare?

All of these are excellent optimization techniques, and some of them are trivial. If you are so worried about memory consumption, then decrease the factor that it keeps around for extra buffering; I suggested a factor of 2, but you could use a factor of 1.5 if that made you feel better. That, combined with just a few of the techniques that I listed here (which are just the tip of the iceberg), could cause the average case for the improved system to be practically the same as the average case for the old (current) system, and the worst case would be 1.5 times more memory use for strings. Or lower, if it pleases you, which would use less memory and still allow for a big speed boost; use 1.25 worst case if you want, and it would still have sped up your 5000-string example by a lot.

Do all the computationally expensive optimizations that you can at compile time, then do a few at runtime. The performance hit for all of my suggestions above could be made to be insignificant.
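To make the memory-for-speed trade concrete, here is a small Python sketch (Python rather than DM, and purely illustrative of the amortized-growth argument above; the function names are hypothetical) that counts how many bytes get copied when appending n one-character pieces to a buffer that either reallocates to the exact size every time or over-allocates by a growth factor.

```python
# Illustrative sketch, not BYOND internals: compare total bytes copied for
# exact-fit reallocation vs. geometric over-allocation with a growth factor.

def copies_exact_fit(n):
    """Reallocate to exactly the needed size on every append: O(n^2) copying."""
    size, copied = 0, 0
    for _ in range(n):
        copied += size  # the whole old buffer is copied into the new one
        size += 1
    return copied

def copies_with_factor(n, factor=2.0):
    """Over-allocate by `factor` when full: amortized O(n) copying."""
    size, capacity, copied = 0, 0, 0
    for _ in range(n):
        if size + 1 > capacity:
            copied += size  # copy old contents into the larger buffer
            capacity = max(1, int(capacity * factor) + 1)
        size += 1
    return copied

if __name__ == "__main__":
    n = 5000
    print("exact fit:  ", copies_exact_fit(n), "bytes copied")
    print("factor 2.0: ", copies_with_factor(n, 2.0), "bytes copied")
    print("factor 1.5: ", copies_with_factor(n, 1.5), "bytes copied")
```

A smaller factor such as 1.5 wastes less slack memory but triggers more reallocations, which is exactly the knob being argued about here.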
Several walls of text ago you claimed that your "optimization" was only beneficial and had no drawbacks. My last few comments have been trying to show that your method is likely to allocate extra memory and not see any benefits when no further concatenations are performed.

Initially I had believed a feature to improve performance when dealing with strings was necessary. I wasn't aware until Lummox mentioned it that BYOND does some optimizations already. By leveraging this, the feature isn't really necessary (though the list.Join() syntax would still be preferred). The fact that you can write as much as you have about this suggests that it's not worth the trouble to implement considering a simple and fairly elegant workaround exists.
Forum_account wrote:
The fact that you can write as much as you have about this suggests that it's not worth the trouble...

If that is true, you just dissed at least a quarter of your content on Byond.
An example of exactly that

Of course, I don't believe that is true, since I actually agree with some of what you said in that example. But that's where your above-mentioned logic leads.

I want so badly to explain why the rest of what you have there is inaccurate, as I cannot stand leaving the last statements be so false, but it would be a repeat once again of what I have already said. So I will just say see this page, and it will be my rebuttal for all further arguments. And yes, that is not a mistake, that is a link to this very page itself.

And I am now done here. This feature had all the discussion it needed in the first 14 replies, and the replies since then have been mostly BS, so it's a waste of space. The only further discussion that could possibly help here is if Lummox JR wants to reply again and discuss the options available for this feature. Otherwise, it should have its status set to on-hold and be done with until someday if/when there aren't more important things to do.
Loduwijk wrote:
Forum_account wrote:
The fact that you can write as much as you have about this suggests that it's not worth the trouble...

If that is true, you just dissed at least a quarter of your content on Byond.
An example of exactly that

I'm not sure what you mean there. Maybe I wasn't clear. To rephrase what you quoted of me: If it takes this much explaining to suggest that this change would be beneficial to BYOND, it's not as simple as you think and it probably isn't worth the staff's time to implement.

I want so badly to explain...

I hope that one day I will have provided you with enough practice that you can explain a single idea in less than 500 words =)

Edit: In the future, you might want to consider these three things:
1. There are three things here: you, your idea, and your explanation of that idea. Don't treat a criticism of your explanation as a criticism of you or the idea.
2. A discussion is a two-way exchange of ideas. If you're inclined to say "oh, huff. I quit!" then perhaps you entered this discussion for the wrong reason. Focus on exchanging information clearly, not on correcting people.
3. Brevity helps.
Loduwijk wrote:
Lummox JR wrote:
Allocating extra space with each string is doable, although it's wasteful in most cases.

No, it's not.

Yes, it really is. The only way extra space in a string is not wasteful is when you know you're doing further concatenation and you know you can give up that space when you're through. That's not remotely true of the way BYOND handles (or can handle) strings. It is true of the internal string builder class we use, but that's used in cases where the purpose is known.

I think you're working from a concept that BYOND strings are stored or manipulated in a way other than they currently are. I'll see if I can clarify.

BYOND does already have a string builder that allocates extra space like you say. It's used in all kinds of internal situations where we know we'll be concatenating.

String objects are stored internally as immutable strings, and are reference-counted, so you don't use 40 times as much memory if you have 40 cases of "abcd".

When a concatenation such as a += b is done in soft code, the action taken is identical to saying a = a + b, because BYOND still has to do its reference counting. The strings a and b are simply concatenated together because that's more efficient than using a string builder, and then the result is checked against the current table of immutable string objects to see if one exists that can have its reference count increased; if not, a new one is added. The old string a loses a reference in this case, and may be deleted.

The sequence in which references are juggled in soft code means that by necessity we are not using a string builder for these situations. It simply isn't possible, unless the operation creates a new object that internally references a builder. However, that internal reference would have to be changed back to a hard string on assignment, and it would have to be changed everywhere it was used. It isn't pretty; it's much less straightforward than the way /icon can handle a similar operation, because strings are assigned in many more places.

So basically, what I'm saying and what I think Forum_account is getting at, is that your reasoning doesn't apply to this kind of structure. The optimization you're arguing for basically already exists--the problem is that applying it to work with soft code is non-trivial.
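For readers following along, here is a rough Python sketch of the storage scheme described above: immutable strings interned in a reference-counted table. The class and method names are hypothetical; this is a toy model, not BYOND's actual C internals.

```python
# Toy model (hypothetical names): immutable, interned, reference-counted
# strings, so 40 uses of "abcd" share one table entry. a += b behaves like
# a = a + b: build the result, intern it, release the old string.

class StringTable:
    def __init__(self):
        self.refs = {}  # interned string -> reference count

    def acquire(self, s):
        """Intern s (one table lookup) and bump its reference count."""
        self.refs[s] = self.refs.get(s, 0) + 1
        return s

    def release(self, s):
        """Drop a reference; remove the entry when nobody uses it."""
        self.refs[s] -= 1
        if self.refs[s] == 0:
            del self.refs[s]

    def concat(self, a, b):
        """Concatenate, look the result up in the table, take a reference
        on it, and release the old string a (which may then be deleted)."""
        result = self.acquire(a + b)
        self.release(a)
        return result

if __name__ == "__main__":
    t = StringTable()
    a = t.acquire("abcd")
    b = t.acquire("abcd")  # second reference, no new entry
    print(t.refs)          # {'abcd': 2}
    a = t.concat(a, "e")
    print(t.refs)          # {'abcd': 1, 'abcde': 1}
```

Note that every concatenation ends with a table lookup on the full result string, which is why a builder that defers interning until the end is the real win.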
So basically, what I'm saying and what I think Forum_account is getting at, is that your reasoning doesn't apply to this kind of structure. The optimization you're arguing for basically already exists--the problem is that applying it to work with soft code is non-trivial.

I think Loduwijk would say that you can apply the optimization in all cases. Sometimes you'll see an improvement in performance, sometimes you won't (and you'll just use extra memory for nothing) but the memory-for-speed trade is still worth it.

In case it was lost in the comments, I think the best option is to give the developer a way to use your internal string builder. A list.Join proc would use the string builder to concatenate all elements in the list. Developers can use it when performance is a concern and you aren't messing with how strings are handled in all other cases.

Aside from being more efficient, it also gives you an easy way to output lists. When you write "world << a" and a is a list, it's not very useful to see "/list". More often you'd want to see a's contents, so you can write: world << a.Join(",")
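As a rough illustration of why a builder-backed Join beats repeated concatenation of immutable strings, here is a Python sketch (hypothetical helpers, not the proposed DM API) that counts how many characters each strategy copies.

```python
# Illustrative sketch: a single-pass join copies each piece exactly once,
# while repeated `result += piece` on immutable strings re-copies the
# growing result on every step.

def join_with_builder(pieces, sep=""):
    """Single-pass join; returns (result, chars copied)."""
    out = []
    copied = 0
    for k, p in enumerate(pieces):
        if k:
            out.append(sep)
            copied += len(sep)
        out.append(p)
        copied += len(p)
    return "".join(out), copied

def join_by_concat(pieces, sep=""):
    """Repeated concatenation; counts chars copied, including re-copying
    the old result each time (what immutable strings force)."""
    result, copied = "", 0
    for k, p in enumerate(pieces):
        chunk = (sep if k else "") + p
        copied += len(result) + len(chunk)  # old result + new chunk recopied
        result = result + chunk
    return result, copied

if __name__ == "__main__":
    pieces = ["item%d" % k for k in range(1000)]
    s1, c1 = join_with_builder(pieces, ",")
    s2, c2 = join_by_concat(pieces, ",")
    print(c1, "chars copied with a builder vs", c2, "with repeated +=")
```

The builder's copy count equals the length of the final string, while repeated concatenation is quadratic in the number of pieces; a list.Join() would also intern only one final string instead of one per step.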
I like the idea of list.Join(Separator)
A list.Join() proc actually would be a practical solution. It works well with BYOND's immutable string format. Users requiring more elaborate joining could write simple workarounds. For instance I often use a proc like this:

proc/ListToText(list/L, conj="and")
    if(!L || !L.len) return ""
    if(L.len == 1) return "[L[1]]"
    if(L.len == 2) return "[L[1]] [conj] [L[2]]"
    . = ""
    for(var/i=1, i<L.len, ++i) . += "[L[i]], "
    . += "[conj] [L[L.len]]"


With a Join() proc, it could be rewritten as:

proc/ListToText(list/L, conj="and")
    if(!L || !L.len) return ""
    if(L.len == 1) return "[L[1]]"
    if(L.len == 2) return "[L[1]] [conj] [L[2]]"
    L = L.Copy()
    var/last = L[L.len--]
    . = "[L.Join()] [conj] [last]"


The only issue is that DM's automatic prefixing of "the" to proper nouns would go away, so maybe some special mode would be desired for that. But essentially that would be cleaner than trying to offer direct access to the string concatenator.
I've been running into this same issue with string concatenation for some user interfaces using HTML which collate a lot of data. It's funny, I was going to suggest Lummox's exact idea of a list concatenation, but then searched through the archives and found this thread.

Is it still feasible to do this, and would it improve performance at all if a list joining operation like this existed? Because I can think of a number of objects that it could immediately improve for us.
In response to Clusterfack
Yes, it should be feasible to create a list.Join() proc.
Give the Join proc an optional argument to determine what to merge the list with.
Here's the thing, everybody arguing over byond's string implementation.

Lummox JR
1) Memory must be allocated for the result string.
2) The two strings are copied to memory.
3) The result is added to the string table.
4) The old strings, if no longer used, may be removed from the string table.

It's not step 1 or 2 that causes the lag, it's steps 3 and 4. Each new string has to be indexed on the string table then given a string id. This is how byond stores strings, so that each string is only stored once, and referenced in different places.

This means that for each string concatenation you do, byond has to search the string tree to see if that string already exists in memory.


There is an easy workaround, using bitmasks on the length to make this concatenation only trigger a few string tree lookups.


// **Code licensed under GPL, from /tg/station's helpers/type2type.dm**
// Concatenates a list of strings into a single string. A separator may optionally be provided.
/proc/list2text(list/ls, sep)
    if(ls.len <= 1) // Early-out code for empty or singleton lists.
        return ls.len ? ls[1] : ""

    var/l = ls.len // Made local for sanic speed.
    var/i = 0 // Incremented every time a list index is accessed.

    if(sep != null)
        // Macros expand to long argument lists like so: sep, ls[++i], sep, ls[++i], sep, ls[++i], etc...
        #define S1 sep, ls[++i]
        #define S4 S1, S1, S1, S1
        #define S16 S4, S4, S4, S4
        #define S64 S16, S16, S16, S16

        . = "[ls[++i]]" // Make sure the initial element is converted to text.

        // Having the small concatenations come before the large ones boosted speed by an average of at least 5%.
        if(l-1 & 0x01) // 'i' will always be 1 here.
            . = text("[][][]", ., S1) // Append 1 element if the remaining elements are not a multiple of 2.
        if(l-i & 0x02)
            . = text("[][][][][]", ., S1, S1) // Append 2 elements if the remaining elements are not a multiple of 4.
        if(l-i & 0x04)
            . = text("[][][][][][][][][]", ., S4) // And so on....
        if(l-i & 0x08)
            . = text("[][][][][][][][][][][][][][][][][]", ., S4, S4)
        if(l-i & 0x10)
            . = text("[][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]", ., S16)
        if(l-i & 0x20)
            . = text("[][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]\
                [][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]", ., S16, S16)
        if(l-i & 0x40)
            . = text("[][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]\
                [][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]\
                [][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]\
                [][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]", ., S64)
        while(l > i) // Chomp through the rest of the list, 128 elements at a time.
            . = text("[][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]\
                [][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]\
                [][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]\
                [][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]\
                [][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]\
                [][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]\
                [][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]\
                [][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]", ., S64, S64)

        #undef S64
        #undef S16
        #undef S4
        #undef S1

    else
        // Macros expand to long argument lists like so: ls[++i], ls[++i], ls[++i], etc...
        #define S1 ls[++i]
        #define S4 S1, S1, S1, S1
        #define S16 S4, S4, S4, S4
        #define S64 S16, S16, S16, S16

        . = "[ls[++i]]" // Make sure the initial element is converted to text.

        if(l-1 & 0x01) // 'i' will always be 1 here.
            . += S1 // Append 1 element if the remaining elements are not a multiple of 2.
        if(l-i & 0x02)
            . = text("[][][]", ., S1, S1) // Append 2 elements if the remaining elements are not a multiple of 4.
        if(l-i & 0x04)
            . = text("[][][][][]", ., S4) // And so on...
        if(l-i & 0x08)
            . = text("[][][][][][][][][]", ., S4, S4)
        if(l-i & 0x10)
            . = text("[][][][][][][][][][][][][][][][][]", ., S16)
        if(l-i & 0x20)
            . = text("[][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]", ., S16, S16)
        if(l-i & 0x40)
            . = text("[][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]\
                [][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]", ., S64)
        while(l > i) // Chomp through the rest of the list, 128 elements at a time.
            . = text("[][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]\
                [][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]\
                [][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]\
                [][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][][]", ., S64, S64)

        #undef S64
        #undef S16
        #undef S4
        #undef S1
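The batching idea in the proc above can be stated independently of DM: after the first element, the remaining n-1 elements are consumed in batches of 1, 2, 4, ... 64 according to the set bits of the remaining count, then in batches of 128 until the list is exhausted. The following Python sketch (a hypothetical helper, not part of the DM snippet) computes which batch sizes get used for a list of length n.

```python
# Sketch of the bitmask batching used above: for a list of length n, the
# n-1 elements after the first are appended in O(log n) batches plus
# ceil(remainder/128) batches of 128, instead of n-1 single concatenations.

def batch_sizes(n):
    """Return the batch sizes used to append the n-1 elements after the first."""
    remaining = n - 1
    batches = []
    for size in (1, 2, 4, 8, 16, 32, 64):
        if remaining & size:
            batches.append(size)
            remaining -= size
    while remaining > 0:  # chomp through the rest, 128 at a time
        batches.append(128)
        remaining -= 128
    return batches

if __name__ == "__main__":
    for n in (1, 2, 10, 100, 1000):
        b = batch_sizes(n)
        print(n, "->", b, "(", len(b), "concatenations instead of", n - 1, ")")
```

Each batch is one text() call and therefore one string-table lookup, which is where the savings over element-by-element += comes from.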