ID:2026759
 
Resolved
By popular demand, splittext() has been changed so that the delimiter is an exact match, rather than a set of characters to match. The delimiter is always case-sensitive.
Applies to: DM Language
BYOND Version: 510.1321
Operating System: Windows 7 Home Premium 64-bit
Web Browser: Chrome 47.0.2526.111
Status: Resolved (510.1322)

Descriptive Problem Summary:
splittext() appears to be incorrectly ignoring the second character of a two character delimiter, instead assuming the first character is the only condition, resulting in a bad list.

Edit: This is entirely a guess I dunno what the fuck is really going on.
Edit 2: Actually it appears to occur any time a character in the delimiter is used further on, regardless of if it only matches the delimiter partially. This is thus only applicable when using multi-character delimiters.
Edit 3: Heck the docs actually make it sound like this might be INTENDED behavior (referring to the delimiter parameter as "delimiters" and such). If this is the case: ahaha no this is not how this proc should work, 100% absolutely and truly not.

Code Snippet (if applicable) to Reproduce Problem:
/client/verb/test()
	var/text = "foo@=bar@baz"
	var/list/thing = splittext(text, "@=")
	src << list2params(thing)


Expected Results:
foo&bar%40baz

Actual Results:
foo&bar&baz
Both chars are delimiters, so the results you're seeing are correct.
Lummox JR resolved issue (Not a bug)
But the final @ is not being seen as a delimiter, it's being returned as a literal @ ...that doesn't seem correct?
No, it's not, Saiyan.
And the intended behavior should be to use the entire string as the delimiter; otherwise it's just a set of separators, not a delimiter.
If people want the current behavior they can use regex, but no language implements splitting of strings by having a string of single character separators. This should not be the intended behavior IMO.
Yeah this is very objectively not sane behavior for this proc. In other languages this is not how this sort of thing works.
Oh, I misread actual results vs expected results.
I feel the need to chip in that this is incredibly dumb. I'd understand if C couldn't make the distinction between char[] and char*, but since we have built-in byond constructs that represent this differently, why not use them? If someone handed me a splittext proc, I'd expect it to split:
- On an exact, full match if I use a string as a delimiter
- Same for regexes
- And only exhibit this behaviour if a list of strings is passed instead.
There are basically two ways to look at splitting text: One is with a delimiter that could be a multi-char string, and the other is with a set of possible delimiters. Lots of tokenizers use the second way, and it's also the easiest to implement.

There's no reason I couldn't change the implementation at this stage if the general consensus calls for it, I suppose, but how many people truly want to split by a multi-char string?

And of course whichever way the implementation goes, regular expressions can always handle the other way.
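
To illustrate the point about regular expressions covering either interpretation, here is a minimal sketch. It assumes splittext() accepts a regex datum as its delimiter, as the 510 reference describes; the verb name test_regex_split is made up for this example.

```dm
/client/verb/test_regex_split()
	var/text = "foo@=bar@baz"
	// Whole-string match: the pattern "@=" only matches the two characters together.
	var/list/exact = splittext(text, regex("@="))
	// Set-of-characters match: "\[" is DM's escape for a literal "[" inside a string,
	// so the pattern passed to regex() is the character class [@=].
	var/list/any_char = splittext(text, regex("\[@=]"))
	src << list2params(exact)
	src << list2params(any_char)
```

Whichever way plain-text delimiters end up being handled, the regex form states the intent explicitly.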
Make a vote? I dunno.
In response to Lummox JR
Lummox JR wrote:
There's no reason I couldn't change the implementation at this stage if the general consensus calls for it, I suppose, but how many people truly want to split by a multi-char string?

I do... If I wanted to split 2 different characters I would just call the proc again. It's also the exact behavior I would expect. I don't see the first approach used very much.

Consider this:
var/message = "One Two|@|Three Four|@|Five Six|@|Seven Eight"

var/list/L = splittext(message, "|@|") // expected: list("One Two", "Three Four", "Five Six", "Seven Eight")
var/list/L2

world << "And..."

for (var/word in L)
	L2 = splittext(word, " ") // expected on the first pass: list("One", "Two")

	world << L2[1]
	sleep(5)
	world << L2[2]

	sleep(15)


Not the best example I'm sure, but just by reading this code I would expect that the message is split by the full "|@|" and not by every character in the string "|@|".

The latter sounds weird (to me) and I'd only expect that behavior if I were specifying a list of strings, like so:

var/message = "One@Two=Three&Four,Five:Six;Seven!Eight"

var/list/L = splittext(message, list("@", "=", "&", ",", ":", ";", "!")) // expected: list("One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight")


This has the reader's expectation that the text will be split by each character provided. I'm sure it's not economical given the creation of a list and several strings, but as you said regular expressions would be a viable alternative.

EDIT: Originally I used "<br/>" instead of "|@|" as the delimiter. Unfortunately the forum replaces "<br/>" with an empty string if you use it in a "<DM>" tag and there's no alternative as "&lt;br/&gt;" isn't transformed.
In response to Lummox JR
JavaScript uses the 2nd approach, taking the whole string. I think this is particularly relevant as it's the secondary language used for the webclient.

The only language I've found so far that doesn't use the 2nd method is C++. So I understand now where you're coming from. But someone probably came up with the strtok function somewhere in the 80's/90's, so I don't think it's the right benchmark for this day and age.
I would definitely prefer having a set of possible delimiters. The current behavior isn't really what I would expect. Why should delimiters be limited to a single character? Multi-character delimiters are just too useful.

This is the kind of behavior that I would expect:
var/text = "foo@=bar@baz"

splittext(text, "@=") // returns list("foo", "bar@baz")

// Use any number of delimiter arguments:
splittext(text, "@=", "@") // returns list("foo", "bar", "baz")

// or just pass a list as an alternative:
splittext(text, list("@=", "@")) // also returns list("foo", "bar", "baz")


This would make splittext() both strict and flexible.
I haven't toyed with it but how would you accomplish splitting strings with multi-char strings at the moment?
In response to Somepotato
Somepotato wrote:
no language implements splitting of strings by having a string of single character separators.

False. Erlang does.
In response to Rotem12
Rotem12 wrote:
I haven't toyed with it but how would you accomplish splitting strings with multi-char strings at the moment?

Regular expressions.
lummox: how about splitting by each string if the delimiter is a list, and the whole string otherwise.

Then devs have the ability to do both.
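
A rough sketch of what that list-or-string rule could look like in soft code follows. split_by() is a hypothetical helper, not a BYOND built-in, and preferring the longest delimiter on a tie is an assumption of this sketch rather than anything settled in the thread.

```dm
// Hypothetical helper (not a BYOND built-in): split_by() is an invented name.
// A text delimiter is matched as a whole string; a list is treated as a set
// of alternative delimiter strings. Matching is case-sensitive.
/proc/split_by(text, delimiters)
	if(!istype(delimiters, /list))
		delimiters = list(delimiters)
	var/list/result = list()
	var/start = 1
	while(1)
		// Find the earliest match among all delimiters; on a tie, prefer the longest.
		var/best_pos = 0
		var/best_len = 0
		for(var/d in delimiters)
			if(!length(d))
				continue // skip empty delimiters so the loop always advances
			var/pos = findtextEx(text, d, start)
			if(pos && (!best_pos || pos < best_pos || (pos == best_pos && length(d) > best_len)))
				best_pos = pos
				best_len = length(d)
		if(!best_pos)
			// No more delimiters: keep the remainder and stop.
			result += copytext(text, start)
			break
		result += copytext(text, start, best_pos)
		start = best_pos + best_len
	return result
```

With this sketch, split_by("foo@=bar@baz", "@=") should give "foo" and "bar@baz", while split_by("foo@=bar@baz", list("@=", "@")) should give "foo", "bar" and "baz".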
In response to MrStonedOne
MrStonedOne wrote:
lummox: how about splitting by each string if the delimiter is a list, and the whole string otherwise.

Then devs have the ability to do both.

^ This, very much.
That is a good solution. Agreed.
It's either that, or an arg.