Text

by Forum_account
Some handy text procs.
ID:510182
 
Version 2 (posted 03-14-2012)
  • Fixed a bug with the split() proc.
  • Changed how the replace() proc works, increasing its speed by 50%.
  • Added the Replace() and Split() procs, which are case-sensitive versions of replace() and split().
  • Added the prefix() and ending() procs, which determine whether one string starts or ends with another. Prefix() and Ending() are the case-sensitive versions.
  • Did some benchmarking to compare performance against the Deadron.TextHandling library. The benchmarking code is included as demo\benchmark.dm.
Here are the results of the comparison (50,000 calls each):

Replace:
Forum_account.Text: 5.700 seconds, 114 microseconds per call
Deadron.TextHandling (dd_replacetext): 9.520 seconds, 190 microseconds per call

Split:
Forum_account.Text: 3.150 seconds, 63 microseconds per call
Deadron.TextHandling (dd_text2list): 4.176 seconds, 83 microseconds per call

Concat:
Forum_account.Text: 1.894 seconds, 38 microseconds per call
Deadron.TextHandling (dd_list2text): 5.537 seconds, 110 microseconds per call

The prefix() and ending() procs are almost identical to the dd_hasprefix and dd_hassuffix procs. Their performance is about the same; the procs are so simple that the implementation can't make much of a difference. The only real difference is that the dd_hassuffix proc is incorrect:

    dd_hassuffix(text, suffix)
        var/start = length(text) - length(suffix)
        if (start) return findtext(text, suffix, start)

It should be findtext(text, suffix, start + 1); otherwise, this happens:

    if(dd_hassuffix("fails", "fail"))
        world << "oops!"

The value of start will be length("fails") - length("fail"), which is 1. String indexes start at 1, so this checks whether "fail" is found anywhere in "fails" (it is). The search should instead begin at index 2, so that it checks whether "fail" is found in "ails" (it isn't). The library even has an automatic built-in test. Oops indeed!
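To make the off-by-one concrete, here is a small sketch in Python rather than DM, so it can run anywhere. The findtext() helper below is only an approximation of DM's findtext(): it models the 1-based, case-insensitive search for positive start positions and nothing else. The two hassuffix functions are hypothetical stand-ins for the quoted proc and the corrected version.

```python
def findtext(haystack, needle, start=1):
    """Approximates DM's findtext() for start >= 1: returns the 1-based index of the
    first case-insensitive match at or after `start`, or 0 if there is none."""
    assert start >= 1, "this sketch only models positive start positions"
    # str.find() is 0-based and returns -1 on failure, so +1 gives 1-based or 0.
    return haystack.lower().find(needle.lower(), start - 1) + 1


def hassuffix_quoted(text, suffix):
    """Same search as the quoted dd_hassuffix(): it begins at index `start`."""
    start = len(text) - len(suffix)
    if start > 0:  # the quoted code tests `if (start)`; other cases are not modeled
        return findtext(text, suffix, start) != 0
    return False


def hassuffix_fixed(text, suffix):
    """Begins at start + 1, so exactly len(suffix) characters remain to search."""
    start = len(text) - len(suffix)
    if start < 0:
        return False  # the suffix is longer than the text
    return findtext(text, suffix, start + 1) != 0


print(hassuffix_quoted("fails", "fail"))  # True: the false positive from the post
print(hassuffix_fixed("fails", "fail"))   # False
print(hassuffix_fixed("fails", "ails"))   # True
```

Starting at start + 1 leaves a window exactly as long as the suffix, so a match can only occur at the very end of the string.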

Interesting-looking library. A few thoughts:

1) The naming convention would benefit from some kind of library-specific prefix, in my opinion. The proc names you use are somewhat generic.

2) In int(), instead of using an associative list and copytext(), text2ascii() and a couple of if statements might work just as well. The only thing you'd give up is the associative-list lookup, but I doubt that's actually a problem; those instructions should execute pretty quickly, and you might even avoid having to convert to uppercase. This would also let you handle bases up to 36. In addition, I'd suggest that debug mode also ASSERT() that the base is an integer.
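A rough Python sketch of this suggestion, assuming the proc simply parses an unsigned digit string. parse_int is a hypothetical name, ord() stands in for DM's text2ascii(), and returning None for an invalid digit is a choice made here, not the library's documented behavior.

```python
def parse_int(text, base=10):
    """Parse an unsigned digit string by range-checking character codes instead of
    looking each character up in an associative list. Handles bases 2-36 and both
    letter cases without converting the string to uppercase first."""
    # Mirrors the suggestion to ASSERT() that the base is an integer.
    assert isinstance(base, int) and 2 <= base <= 36, "base must be an integer from 2 to 36"
    value = 0
    for ch in text:
        code = ord(ch)              # stand-in for text2ascii()
        if 48 <= code <= 57:        # '0'-'9'
            digit = code - 48
        elif 65 <= code <= 90:      # 'A'-'Z' -> 10-35
            digit = code - 55
        elif 97 <= code <= 122:     # 'a'-'z' -> 10-35
            digit = code - 87
        else:
            return None             # not a digit in any base
        if digit >= base:
            return None             # digit too large for this base
        value = value * base + digit
    return value


print(parse_int("1A", 16))  # 26
```

Lowercase and uppercase letters each get their own range check, which is how the uppercase conversion can be skipped.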

3) Based on my experience with BYOND's internals, the concat() method seems really ingenious. I'm not sure, though, that the cases above 10 items are of much benefit, since the vast majority of concat() calls involve 10 or fewer items and big replacement operations tend to be infrequent. I suspect that past 10 items, you'd gain a lot of simplicity and lose little speed just by recursing into halves. If not 10, then maybe 20 or 40. At any rate, this would be a better way to handle the 321+ cases, which currently use tail recursion.
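The "recurse into halves" idea, sketched in Python. The library's concat() source isn't shown here, so the small-case loop below is only a stand-in for its hand-written cases; the names concat and threshold are illustrative.

```python
def concat(items, threshold=10):
    """Join a list of strings. Up to `threshold` items are joined directly; longer
    lists are split into two halves that are joined recursively, so the recursion
    depth grows only logarithmically with the list length."""
    assert threshold >= 1, "a threshold below 1 would never reach the base case"
    n = len(items)
    if n <= threshold:
        # Stand-in for the library's hand-written small cases.
        result = ""
        for s in items:
            result += s
        return result
    mid = n // 2
    return concat(items[:mid], threshold) + concat(items[mid:], threshold)


print(concat(["a", "b", "c"]))  # abc
```

Only the small cases need special handling; everything above the threshold falls out of the same two-line split.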

In response to Lummox JR
Lummox JR wrote:
1) The naming convention would benefit from some kind of library-specific prefix, in my opinion. The proc names you use are somewhat generic.

These names are fairly generic, but I don't think people are likely to create global procs with the same names. I can imagine people using these names for mob procs, but not as often for global procs. Adding a prefix would avoid conflicts, but I'd rather not make the names longer, uglier, and less intuitive in every situation just for the few situations where they might cause problems.

If you have a global proc of your own with the same name, you can comment out the library's proc or rename it directly. If you have a member proc with the same name, you can refer to the library's procs as global.join(), global.split(), etc.

I was also hoping that providing multiple names for each proc would give you enough options. The library defines merge(), join(), and concat(), which all do the same thing. If your game uses the name "merge" for something else (e.g., merging armies together), you can use join() or concat() to refer to the text proc.

2)

I'm not too concerned about performance with this one, so I'll probably make these changes to support additional bases whether or not they improve performance. If it ends up being slow, I can make a fast version specifically for base 16. I can't imagine many people will need fast conversions from base 7 strings.

I suppose it also makes sense to have a proc that computes the inverse: turning integers into strings in a specified base.
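One possible shape for that inverse proc, sketched in Python. to_base is a hypothetical name, not a proc the library is known to define; it uses repeated division and the same 0-9, A-Z digit set that base-36 parsing needs.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base(number, base=10):
    """Convert an integer to a string in `base` (2-36) by repeated division.
    Negative numbers get a leading '-'."""
    assert isinstance(number, int), "number must be an integer"
    assert isinstance(base, int) and 2 <= base <= 36, "base must be an integer from 2 to 36"
    if number == 0:
        return "0"
    sign = "-" if number < 0 else ""
    number = abs(number)
    digits = []
    while number:
        number, remainder = divmod(number, base)
        digits.append(DIGITS[remainder])  # least significant digit first
    return sign + "".join(reversed(digits))


print(to_base(255, 16))  # FF
```

Digits come out least significant first, so they are reversed before joining.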

3)

I have another version of the proc that uses recursion for all cases above 10 items (just like the 321+ case does). The problem is that the Total CPU numbers don't make sense when I profile it. I have a verb that calls the new version of concat (called concat2) 10,000 times. The Total CPU time of the verb is less than the CPU time of the concat2 proc, by a significant amount, too (about 0.6 seconds).

In response to Forum_account
                         Profile results (total time)
    Proc Name               Self CPU  Total CPU  Real Time    Calls
    ----------------------  --------  ---------  ---------  -------
    /mob/verb/test_concat1     0.420     20.291     20.291        3
    /proc/concat              19.871     19.884     19.929    30000
    /mob/verb/test_concat2     0.555     12.115     12.116        3
    /proc/concat2             11.560     21.135     21.249   120000

I guess the profiler doesn't count recursion properly. When concat2 calls itself, the time spent in the recursive call seems to be counted twice: once for the recursive call itself and again as time the parent call spends waiting for it to return. That's the best I can figure, but based on the time the test_concat2 verb takes, concat2 looks to be faster.
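The double-counting explanation can be checked with a toy model. This Python sketch does not reproduce BYOND's profiler, whose internals aren't described here; it assumes a simulated cost of 1 unit per call and a hypothetical concat2 that splits anything over 10 items into halves. It records two totals per proc: the naive sum of every call's inclusive time, and the inclusive time of outermost frames only.

```python
from collections import defaultdict


class ToyProfiler:
    """Simulated-clock profiler. total_naive sums every call's inclusive time,
    so nested recursive frames are counted again; total_outermost counts only
    frames with no other frame of the same proc beneath them on the stack."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.self_time = defaultdict(int)
        self.total_naive = defaultdict(int)
        self.total_outermost = defaultdict(int)
        self.depth = defaultdict(int)  # frames of each proc currently on the stack

    def run(self, name, self_cost, children):
        """Simulate one call of `name` that spends `self_cost` units itself, then
        makes the child calls in `children`, each a (name, cost, children) tuple.
        Returns the call's inclusive time."""
        self.calls[name] += 1
        self.self_time[name] += self_cost
        self.depth[name] += 1
        inclusive = self_cost + sum(self.run(*child) for child in children)
        self.depth[name] -= 1
        self.total_naive[name] += inclusive
        if self.depth[name] == 0:  # outermost frame of this proc
            self.total_outermost[name] += inclusive
        return inclusive


def concat2_calls(n, threshold=10):
    """Call tree of a hypothetical concat2 that joins up to `threshold` items
    directly and otherwise recurses into two halves (1 unit of cost per call)."""
    if n <= threshold:
        return ("concat2", 1, [])
    half = n // 2
    return ("concat2", 1, [concat2_calls(half, threshold), concat2_calls(n - half, threshold)])


prof = ToyProfiler()
prof.run("test_concat2", 1, [concat2_calls(40) for _ in range(3)])
for name in ("test_concat2", "concat2"):
    print(name, prof.calls[name], prof.self_time[name],
          prof.total_naive[name], prof.total_outermost[name])
# test_concat2 1 1 22 22
# concat2 21 21 51 21
```

In the model, concat2's naive total (51) exceeds the total of the verb that makes every concat2 call (22), matching the shape of the profile above. Counting only outermost frames gives 21, just under the verb's 22.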

I posted an update that includes the change you suggested for int(), some changes to concat(), and some new procs. The timing of concat() when I profile it doesn't seem to be correct, but the new version appears to run faster.