ID:2338289
 
Resolved
Regular expressions failed to handle non-capturing groups correctly in some cases.
BYOND Version:512.1403
Operating System:Windows 7 Ultimate 64-bit
Web Browser:Chrome 63.0.3239.84
Applies to:Dream Daemon
Status: Resolved (512.1404)

Descriptive Problem Summary:

The regex + modifier doesn't work properly with non-capturing groups that end in another modifier:

mob
    Login()
        ..()
        var/regex/rgx = new/regex("((?:.+)+")
        if(rgx.Find("something"))
            world << "Regex isn't broken"
        else
            world << "Regex is broken"


https://regex101.com/r/0aSWZZ/5/

As you can see, the above regex should match "something".

But when you run it in BYOND, you get the message indicating that the regex is broken.
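For comparison, here is a minimal sketch of the same check in Python's re module, used only as a stand-in reference engine (it is not BYOND's engine, and not necessarily the flavor behind the regex101 link). The pattern as it appears in the post has an unmatched opening parenthesis, so the sketch assumes the intended pattern is the balanced (?:.+)+; the names POSTED, BALANCED, compiles and find are made up for illustration.

```python
import re

# Assumption: the intended pattern is the balanced "(?:.+)+". The pattern as
# it appears in the post, "((?:.+)+", has an unmatched "(" and is kept here
# only for comparison.
POSTED = "((?:.+)+"
BALANCED = "(?:.+)+"


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def find(pattern, text):
    """Return the matched substring, or None when there is no match."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print("posted pattern compiles:", compiles(POSTED))
    print("balanced pattern compiles:", compiles(BALANCED))
    print("match on 'something':", find(BALANCED, "something"))
```

In this engine the balanced pattern matches all of "something", which is the behavior the report expects from BYOND as well.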
Running through a test of this, it looks like the problem is somewhere in the regex compilation process. I'll have to take some solid time to whack away at this because the regex code is dense, but I should be able to figure it out.
"the regex code is dense"

I believe you.

Regex in general is terrible.
Lummox JR resolved issue with message:
Regular expressions failed to handle non-capturing groups correctly in some cases.
I'm getting regex compilation failures now for all kinds of things that shouldn't be failing.

I'll try to narrow it down.

It looks like it's complaining of nested modifiers, so groups are still not compiling properly.
Never mind, it looks like the regex site I was using to test the function was more permissive with its characters than BYOND's regex is. It was interpreting characters 0x2B (+) and 0x2D (-) as literals while BYOND was interpreting them as part of a modifier.
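The literal-versus-modifier difference is easy to reproduce in any engine. Below is a rough sketch using Python's re module as a stand-in (not BYOND's engine, whose exact rules may differ); the helper names compiles and find are hypothetical. It shows + acting as a quantifier unless escaped, and - acting as a range operator inside a character class unless escaped.

```python
import re


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def find(pattern, text):
    """Return the matched substring, or None when there is no match."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Unescaped: "a+b" means one or more "a" followed by "b".
    print(find("a+b", "a+b"))
    # Escaped with a backslash, "+" is a literal plus sign.
    print(find(r"a\+b", "a+b"))
    # Python's engine also treats the hex escape \x2B as a literal "+".
    print(find(r"a\x2Bb", "a+b"))
    # A bare "+" has nothing to repeat, so it does not compile.
    print(compiles("+"))
    # Inside a character class, "-" forms a range unless it is escaped.
    print(re.findall(r"[a-c]", "b-"), re.findall(r"[a\-c]", "b-"))
    # re.escape() builds a pattern that matches the text literally.
    print(find(re.escape("1+1-2"), "x1+1-2y"))
```

When a pattern behaves differently between a test site and the target engine, escaping every character that is meant literally removes this kind of ambiguity.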
Yeah, this one's 100% not fixed.

I'm getting complete nonsense out of my expressions now, and I have no clue what the source of the problems is.

Gonna spend a few more hours hacking apart expressions into tiny little bits to find what the failure is. Again. 20 hours of my week have been spent chasing this bug. Fuck me.
Probably should have checked here first, but I have a simple reproduction case for the problems introduced in 1404 at http://www.byond.com/forum/?post=2338673
Rats. I did run some other tests, but it looks like I'm gonna need to set up some kind of more complete test suite here.
Want an insanely complicated regex to test?

Edit: sent you one I've been fiddling with this afternoon. Check your PMs. It can come pretty close to parsing DM, so it's fairly biggish.
I forget: does BYOND use a hand-rolled regex engine, a 3rd party library, or std::regex?
In response to Hiead
Hiead wrote:
I forget: does BYOND use a hand-rolled regex engine, a 3rd party library, or std::regex?

It's hand-rolled based on really old code that had super permissive licensing. I basically had to rewrite most of the internals from the ground up though.
In response to Lummox JR
Lummox JR wrote:
It's hand-rolled based on really old code that had super permissive licensing. I basically had to rewrite most of the internals from the ground up though.

Instead of hacking it around, trying to fix bugs and introducing new ones, is there a reason for not using any of the battle-tested modern libraries available?

Obviously there's the C++ standard library, but if you wanted to keep it C-like, Rust is a powerful, fast, proven systems programming language. The Rust regex library also exports C ABI-compatible bindings for use in pretty much any language anywhere. It's also extremely permissively licensed. https://github.com/rust-lang/regex/tree/master/regex-capi
In response to Hiead
I didn't find anything out there that had both friendly licensing and the features I wanted. The Rust regex library is lacking some things, like look-ahead and look-behind assertions.
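For anyone unfamiliar with the assertions mentioned here, this is a small sketch of what look-ahead and look-behind do, written against Python's re module purely as an illustration (BYOND's own syntax and behavior are not shown). The function names and sample strings are invented for the example.

```python
import re


def names_before_paren(source):
    """Look-ahead: find words that are immediately followed by "(".

    The "(" is checked but not consumed, so it is not part of the match.
    """
    return re.findall(r"\w+(?=\()", source)


def digits_after_dollar(source):
    """Look-behind: find digit runs that are immediately preceded by "$".

    The "$" is checked but not consumed, so it is not part of the match.
    """
    return re.findall(r"(?<=\$)\d+", source)


if __name__ == "__main__":
    print(names_before_paren("Login() calls ..() and Find(text)"))
    print(digits_after_dollar("cost $15, not 20"))
```

Both assertions test the surrounding text without including it in the match, which is hard to emulate in an engine that does not support them.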