splittext() not correctly splitting on delimiter

BYOND Forums

ID:2026759

Jan 29 2016, 3:25 pm (Edited on Jan 29 2016, 4:44 pm)

Wirewraith

Resolved

By popular demand, splittext() has been changed so that the delimiter is an exact match, rather than a set of characters to match. The delimiter is always case-sensitive.

Applies to:	DM Language
BYOND Version:	510.1321
Operating System:	Windows 7 Home Premium 64-bit
Web Browser:	Chrome 47.0.2526.111

Status:	Resolved (510.1322)

Descriptive Problem Summary:
splittext() appears to ignore the second character of a two-character delimiter and treat the first character as the only condition, resulting in a bad list.

Edit: This is entirely a guess; I dunno what the fuck is really going on.
Edit 2: Actually, it appears to occur any time a character in the delimiter shows up later in the text, regardless of whether it matches the delimiter only partially. So this only applies when using multi-character delimiters.
Edit 3: Heck, the docs actually make it sound like this might be INTENDED behavior (they refer to the delimiter parameter as "delimiters" and such). If that is the case: ahaha no, this is not how this proc should work, 100% absolutely and truly not.

Code Snippet (if applicable) to Reproduce Problem:

/client/verb/test()
    var/text = "foo@=bar@baz"
    var/list/thing = splittext(text, "@=")
    src << list2params(thing)

Expected Results:
foo&bar%40baz

Actual Results:
foo&bar&baz
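
For readers unfamiliar with list2params(): it joins the list items with & and URL-encodes each one, which is why a surviving @ shows up as %40 in the expected output. A rough Python analogue, assuming plain percent-encoding of each item (the helper name join_params is hypothetical and is not a BYOND proc):

```python
from urllib.parse import quote

def join_params(items):
    # Hypothetical stand-in for list2params(): percent-encode each item, join with "&".
    return "&".join(quote(item, safe="") for item in items)

# Delimiter treated as the exact string "@=": the later lone "@" stays in the item.
expected = join_params(["foo", "bar@baz"])
# "@" and "=" each treated as a delimiter: the later "@" splits the text again.
actual = join_params(["foo", "bar", "baz"])

print(expected)
print(actual)
```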

Jan 29 2016, 6:07 pm
Lummox JR	Both chars are delimiters, so the results you're seeing are correct.

Jan 29 2016, 6:07 pm
Lummox JR	Lummox JR resolved issue (Not a bug)

Jan 29 2016, 6:14 pm
Super Saiyan X	But the final @ is not being seen as a delimiter, it's being returned as a literal @ ...that doesn't seem correct?

Jan 29 2016, 6:32 pm
Somepotato	No, it's not, Saiyan. And the intended behavior should be to use the entire string as the delimiter; otherwise it's just a separator, not a delimiter. If people want the current behavior they can use regex, but no language implements string splitting by treating a string as a set of single-character separators. This should not be the intended behavior, IMO.

Jan 29 2016, 6:33 pm
Wirewraith	Yeah this is very objectively not sane behavior for this proc. In other languages this is not how this sort of thing works.

Jan 29 2016, 6:38 pm
Super Saiyan X	Oh, I misread actual results vs expected results.

Jan 29 2016, 9:48 pm

Marquesas

I feel the need to chip in that this is incredibly dumb. I'd understand if C couldn't make the distinction between char[] and char*, but since we have built-in BYOND constructs that represent this differently, why not use them? If someone handed me a splittext proc, I'd expect it to split:
- On an exact, full match if I use a string as a delimiter
- Same for regexes
- And only exhibit this behaviour if a list of strings is passed instead.

Jan 29 2016, 9:57 pm

Lummox JR

There are basically two ways to look at splitting text: One is with a delimiter that could be a multi-char string, and the other is with a set of possible delimiters. Lots of tokenizers use the second way, and it's also the easiest to implement.

There's no reason I couldn't change the implementation at this stage if the general consensus calls for it, I suppose, but how many people truly want to split by a multi-char string?

And of course whichever way the implementation goes, regular expressions can always handle the other way.
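
The two interpretations can be compared side by side in a short Python sketch. This is an illustration only, not BYOND's implementation: the function names are made up for the example, and the character-set version assumes that runs of adjacent delimiter characters collapse into a single split, which is what the output in the original report suggests.

```python
import re

def split_exact(text, delimiter):
    # Way one: the whole delimiter string must match.
    return text.split(delimiter)

def split_charset(text, delimiters):
    # Way two: any single character in `delimiters` ends a token.
    # Assumption: adjacent delimiter characters collapse into one split.
    pattern = "[" + re.escape(delimiters) + "]+"
    return re.split(pattern, text)

text = "foo@=bar@baz"
print(split_exact(text, "@="))    # ['foo', 'bar@baz']
print(split_charset(text, "@="))  # ['foo', 'bar', 'baz']
```

As the post says, a regular expression covers whichever behavior the built-in does not provide; here the character class is the regular-expression form of the second way.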

Jan 29 2016, 10:50 pm
Wirewraith	Make a vote? I dunno.

Jan 29 2016, 11:39 pm (Edited on Jan 29 2016, 11:54 pm)

In response to Lummox JR

NullQuery

Lummox JR wrote:

There's no reason I couldn't change the implementation at this stage if the general consensus calls for it, I suppose, but how many people truly want to split by a multi-char string?

I do... If I wanted to split on two different characters I would just call the proc again. It's also the exact behavior I would expect. I don't see the set-of-delimiters approach used very much.

Consider this:

var/message = "One Two|@|Three Four|@|Five Six|@|Seven Eight"

var/list/L = splittext(message, "|@|") // expected: list("One Two", "Three Four", "Five Six", "Seven Eight")
var/list/L2

world << "And..."

for (var/word in L)
    L2 = splittext(word, " ") // expected: list("One", "Two")

    world << L2[1]
    sleep(5)
    world << L2[2]

    sleep(15)

Not the best example I'm sure, but just by reading this code I would expect that the message is split by the full "|@|" and not by every character in the string "|@|".

The latter sounds weird (to me) and I'd only expect that behavior if I were specifying a list of strings, like so:

var/message = "One@Two=Three&Four,Five:Six;Seven!Eight"

var/list/L = splittext(message, list("@", "=", "&", ",", ":", ";", "!")) // expected: list("One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight")

This sets the reader's expectation that the text will be split on each string provided. I'm sure it's not economical given the creation of a list and several strings, but as you said, regular expressions would be a viable alternative.

EDIT: Originally I used "<br/>" instead of "|@|" as the delimiter. Unfortunately the forum replaces "<br/>" with an empty string if you use it in a "<DM>" tag and there's no alternative as "<br/>" isn't transformed.

Jan 29 2016, 11:51 pm (Edited on Jan 30 2016, 12:00 am)

In response to Lummox JR

NullQuery

JavaScript takes the whole string as the delimiter. I think this is particularly relevant as it's the secondary language used for the webclient.

The only language I've found so far that doesn't split on the whole string is C++. So I understand now where you're coming from. But someone probably came up with the strtok function somewhere in the '80s or '90s, so I don't think it's the right benchmark for this day and age.

Jan 30 2016, 12:31 am (Edited on Jan 30 2016, 12:40 am)

Multiverse7

I would definitely prefer having a set of possible delimiters. The current behavior isn't really what I would expect. Why should delimiters be limited to a single character? Multi-character delimiters are just too useful.

This is the kind of behavior that I would expect:

var/text = "foo@=bar@baz"

splittext(text, "@=") // returns list("foo", "bar@baz")

// Use any number of delimiter arguments:
splittext(text, "@=", "@") // returns list("foo", "bar", "baz")

// or just pass a list as an alternative:
splittext(text, list("@=", "@")) // also returns list("foo", "bar", "baz")

This would make splittext() both strict and flexible.
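
The behavior described above can be sketched in Python as follows. This is a hedged illustration, not a proposed BYOND implementation: split_text is a hypothetical name, and trying longer delimiters first is an assumption made so that "@=" is preferred over "@" when both could match at the same position.

```python
import re

def split_text(text, delimiter):
    # Hypothetical helper: a plain string is one exact delimiter,
    # while a list or tuple is a set of alternative delimiters.
    if isinstance(delimiter, str):
        return text.split(delimiter)
    # Assumption: try longer delimiters first so "@=" wins over "@".
    ordered = sorted(delimiter, key=len, reverse=True)
    pattern = "|".join(re.escape(d) for d in ordered)
    return re.split(pattern, text)

text = "foo@=bar@baz"
print(split_text(text, "@="))         # ['foo', 'bar@baz']
print(split_text(text, ["@=", "@"]))  # ['foo', 'bar', 'baz']
```

Dispatching on the argument type keeps the single-string case strict while leaving the multi-delimiter case available without a separate proc.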

Jan 30 2016, 2:21 am
Rotem12	I haven't toyed with it but how would you accomplish splitting strings with multi-char strings at the moment?

Jan 30 2016, 3:12 am In response to Somepotato
GinjaNinja32	Somepotato wrote:

no language implements splitting of strings by having a string of single character separators.

False. Erlang does.

Jan 30 2016, 8:55 am In response to Rotem12
Lummox JR	Rotem12 wrote:

I haven't toyed with it but how would you accomplish splitting strings with multi-char strings at the moment?

Regular expressions.

Jan 30 2016, 11:44 am
MrStonedOne	lummox: how about splitting by each string if the delimiter is a list, and the whole string otherwise. Then devs have the ability to do both.

Jan 30 2016, 11:47 am In response to MrStonedOne
CrimsonVision	MrStonedOne wrote:

lummox: how about splitting by each string if the delimiter is a list, and the whole string otherwise. Then devs have the ability to do both.

^ This, very much.

Jan 30 2016, 11:56 am
Wirewraith	That is a good solution. Agreed.

Jan 30 2016, 12:06 pm
MrStonedOne	It's either that or an arg.
