ID:2338253
 
Code:
mob
	verb
		testregex()
			// "g": repeated Find() calls continue after the previous match; "m": ^ and $ match at line breaks
			var/regex/rgxtest = new/regex(@"^(\S+)\x20=\x20(list\x28(?:\x0A\x09+)?(?:.*\x2C\x0A\x09+)*.*\x29)(?=$)","gm")

			// Multi-line list elements are tab-indented, since the pattern requires \x09+ after each newline
			var/test = @{"test1 = list()
test2 = list("herp")
test3 = list("herp","derp")
test4 = list(410,203,9)
test5 = list(
	898260096,
	109128472)
test6 = list("a"=1,"b"=2,"c"=3,"d"=4)
test7 = list(
	"a"=1,
	"b"=2,
	"c"=3,
	"d"=4)
test8 = list(
	"a"=list(1,2,3),
	"b"=list(1,2,3),
	"c"=list(1,2,3),
	"d"=list(1,2,3))"}

			var/counter = 0
			while(rgxtest.Find(test))
				world.log << "match: [json_encode(rgxtest.group)]"
				++counter
			world.log << "[counter] matches!"


Problem description:

I can't seem to get multi-line lists to match properly. Using an online regex tester, I've confirmed that my pattern works and returns all 8 matches. However, DM only returns the 5 single-line lists out of my 8 test cases.

If you head on over to https://regex101.com/ and use the following settings, you can see that I'm getting all 8 matches successfully, which is just straight up maddening.

pattern:
(?:^)(\S+)(?:\x20=\x20)(list\x28(?:\x0A\x09+)?(?:.*\x2C\x0A\x09+)*.*\x29)(?=$)


flags:
gm


test string:
test1 = list()
test2 = list("herp")
test3 = list("herp","derp")
test4 = list(410,203,9)
test5 = list(
                898260096,
                109128472)
test6 = list("a"=1,"b"=2,"c"=3,"d"=4)
test7 = list(
                "a"=1,
                "b"=2,
                "c"=3,
                "d"=4)
test8 = list(
                "a"=list(1,2,3),
                "b"=list(1,2,3),
                "c"=list(1,2,3),
                "d"=list(1,2,3))


Is this a BYOND bug? Am I using the syntax wrong for BYOND's implementation?

By default, the . in BYOND's regular expressions does not match line breaks. Because there is no flag to change this behavior, use (.|\n) in your regular expression instead.
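The difference is easy to see in a small sketch using Python's re module, whose . also excludes newlines by default. Unlike BYOND, Python additionally has a re.DOTALL flag, shown here only for comparison. The sample string and the helper name first_match are illustrative.

```python
import re

SAMPLE = "test5 = list(\n\t898260096,\n\t109128472)"

# "." stops at the first newline, so this cannot reach the closing parenthesis.
DOT_ONLY = r"list\(.*\)"
# (?:.|\n) accepts either a non-newline character or a newline,
# so the lazy repetition can cross line breaks.
DOT_OR_NEWLINE = r"list\((?:.|\n)*?\)"


def first_match(pattern, text, flags=0):
    """Return the matched text, or None when the pattern does not match."""
    m = re.search(pattern, text, flags)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(first_match(DOT_ONLY, SAMPLE))  # None
    print(repr(first_match(DOT_OR_NEWLINE, SAMPLE)))
    # Python-only alternative: DOTALL lets "." match newlines too.
    print(repr(first_match(DOT_ONLY, SAMPLE, re.DOTALL)))
```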

You'll notice my pattern never relies on . to cross a line break; it matches the newlines explicitly with \x0A.

Alright, I just straight up blew it up and started over.

@"^(\S+)(?:\x20=\x20)(list\x28(?:.|\n)*?\x29)$"


That's the proper regex to handle any list, I'm pretty sure.
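As a rough check in a standard engine (Python's re, not BYOND's), the sketch below runs the new pattern on the same eight test cases, with multi-line elements indented by one tab. Because (?:.|\n)*? lazily stops at the first ) that ends a line, the pattern no longer depends on tab indentation. The helper name list_names is hypothetical.

```python
import re

# The replacement pattern from the thread. (?:.|\n)*? lazily consumes any
# characters, newlines included, until the first ")" that is followed by
# the end of a line.
NEW_PATTERN = r"^(\S+)(?:\x20=\x20)(list\x28(?:.|\n)*?\x29)$"

TEST = "\n".join([
    "test1 = list()",
    'test2 = list("herp")',
    'test3 = list("herp","derp")',
    "test4 = list(410,203,9)",
    "test5 = list(",
    "\t898260096,",
    "\t109128472)",
    'test6 = list("a"=1,"b"=2,"c"=3,"d"=4)',
    "test7 = list(",
    '\t"a"=1,',
    '\t"b"=2,',
    '\t"c"=3,',
    '\t"d"=4)',
    "test8 = list(",
    '\t"a"=list(1,2,3),',
    '\t"b"=list(1,2,3),',
    '\t"c"=list(1,2,3),',
    '\t"d"=list(1,2,3))',
])


def list_names(pattern, text):
    """Return the variable name of every match, using the equivalent of the "gm" flags."""
    return [m.group(1) for m in re.finditer(pattern, text, re.MULTILINE)]


if __name__ == "__main__":
    # Tab-indented elements: all eight names, test1 through test8.
    print(list_names(NEW_PATTERN, TEST))
    # Space-indented elements: still all eight, since no tab is required.
    print(list_names(NEW_PATTERN, TEST.replace("\t", "    ")))
```

One caveat of stopping at the first line-ending ): a nested list whose own closing parenthesis ends a line (for example an element list(1,2) with its comma moved to the next line) would cut the match short at that inner ).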

I don't much care for the extra performance cost of the (.|\n) alternation, but it does work.

I have NO clue where my old pattern was failing. I ultimately couldn't diagnose it, but the new pattern is at least functional.

I found it.

The + quantifier is broken when applied to non-capturing groups:

"((?:.+)+)"


The regex above should match basically any non-empty string. Unfortunately, in DM it never finds a match, because the + quantifier seems to break when applied to a non-capturing group.

Meanwhile:

"((.+)+)"


If I switch to a capture group, it suddenly starts working.

No other regex implementation I've tried behaves this way.

You can test it yourself on regex101.com
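To see the expected behavior in a standard engine, here is a minimal sketch using Python's re module (not BYOND) that runs both patterns on an arbitrary sample string. Both match; the only difference is that the capturing version also reports its inner group. The helper groups_for and the sample text are illustrative.

```python
import re

NON_CAPTURING = r"((?:.+)+)"  # the pattern reported as never matching in DM
CAPTURING = r"((.+)+)"        # the variant reported as working in DM


def groups_for(pattern, text):
    """Return the capture groups of the first match, or None if there is no match."""
    m = re.search(pattern, text)
    return m.groups() if m else None


if __name__ == "__main__":
    sample = "just about any string"
    print(groups_for(NON_CAPTURING, sample))  # ('just about any string',)
    print(groups_for(CAPTURING, sample))      # ('just about any string', 'just about any string')
```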
This is 100% confirmed to be a BYOND bug.