ID: 2472413
 
Resolved
Case insensitivity was not properly handled in some complex regular expressions.
BYOND Version: 512
Operating System: Windows 10 Pro 64-bit
Web Browser: Chrome 74.0.3729.169
Applies to: Dream Daemon
Status: Resolved (512.1472)

Descriptive Problem Summary:
On my Space Station 13 server, we run a series of regexes to filter out bigoted speech.

Recently, I noticed that my most complicated regex, used to detect the n-word, does not match upper-case text. It only matches lowercase, which is the case the regex was written in. Another regex used for the same purpose does not have this problem and properly honors the case-insensitive flag.

Numbered Steps to Reproduce Problem:
1. Run the code below, with your bigoted expletives of choice.
2. Be sad.

Code Snippet (if applicable) to Reproduce Problem:
/proc/isnotpretty(var/text)
	var/list/pretty_filter_items = list(
		@"\b[nl]+[\W_]{0,4}[!i\/?1\\]+[\W_]{0,4}[qgb]+[\W_]{0,4}[qgb]?[\W_]{0,4}(?:[e3][\W_]{0,4}r|a)(?!ia|al)s*\b",
		"nigg+"
	)

	for(var/pattern in pretty_filter_items)
		var/regex/R = new(pattern, "ig")
		if(R.Find(text)) //If found
			return TRUE // Yes, it isn't pretty.
	return FALSE // No, it is pretty.

/proc/main()
	var/list/expletives = list() // Fill this yourself!
	for(var/word in expletives)
		world.log << isnotpretty(word)


Expected Results: For the more complicated regex to obey its "i" flag.

Actual Results: A lack of case insensitivity.

Does the problem occur:
Every time? Or how often? Every time.
On other computers? Yes. MoMMIv2 tried it and the same bug occurred.

When does the problem NOT occur? In the second, simple regex given.

Did the problem NOT occur in any earlier versions? If so, what was the last version that worked? (Visit http://www.byond.com/download/build to download old versions for testing.) This happened on my server running 512.1464 and was then confirmed by MoMMIv2 running 512.1454, so the bug goes back at least as far as 512.1454.

Workarounds:
It is possible to manually add the capital version of every letter in the regex, although that is rather tiresome.
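For anyone who would rather not do that expansion by hand, here is a rough sketch of how it could be automated. The proc name expand_case is made up for this example and is not a BYOND built-in. It assumes an ASCII pattern that uses only single-character escapes (like \b or \W) and no ranges such as [a-z] inside character classes, which holds for the regex above but not for regexes in general. I have not run it against the affected builds, so treat it as a starting point only.

```dm
// Hypothetical helper, not part of BYOND.
// Rewrites each letter so the pattern matches both cases without the "i" flag.
/proc/expand_case(pattern)
	var/out = ""
	var/in_class = FALSE // TRUE while inside a [...] character class
	var/len = length(pattern)
	var/i = 1
	while(i <= len)
		var/c = copytext(pattern, i, i + 1)
		if(c == "\\")
			// Copy the backslash and the escaped character untouched,
			// so that \W or \b keep their meaning.
			out += copytext(pattern, i, i + 2)
			i += 2
			continue
		if(c == "\[")
			in_class = TRUE
		else if(c == "]")
			in_class = FALSE
		var/lo = lowertext(c)
		var/up = uppertext(c)
		if(lo == up)
			out += c // not a letter: keep as-is
		else if(in_class)
			out += lo + up // already inside a class: list both cases
		else
			out += "\[" + lo + up + "]" // bare letter: wrap in its own class
		i++
	return out

/proc/expand_case_demo()
	world.log << expand_case(@"ab+\w") // [aA][bB]+\w
```

The expanded pattern would then be passed to new /regex with only the "g" flag, so the match no longer depends on the "i" flag being handled correctly.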
Here's a link to a Regex101 thing showing off this regex, to prove that it's *supposed* to be catching regardless of case:

(Contains offensive speech as examples of what this regex captures, cover your eyes children)
https://regex101.com/r/XxwMID/13
Thanks. This should be helpful for finding the problem. I'll test in 512 and also retest in 513's updated engine.
Okay, the link you provided isn't working for me. Can you just point me to a pastebin or something with the words I should filter against? Or a test project that includes the words? I don't know what I'm supposed to catch or not catch in this regex otherwise.
Yeah, I think I forgot the href parameter to my link. It should work now.
It's not just the physical link. I mean I can't use that site. Please post a pastebin.
In response to Lummox JR
Lummox JR wrote:
It's not just the physical link. I mean I can't use that site. Please post a pastebin.

Does regexr work for you? https://regexr.com/4fd3j

It's apparent that the OP wants to filter against a slew of variations of the anti-black racial slur, including if you have like 5 "i"s or some punctuation ("N.i."-). Though it also doesn't work with e.g. repeating "R" at the end of the word.

OP is saying that the regex isn't being treated as case-insensitive, e.g. "n.i."- works fine, but not "N.I."-
Also not working. Just a pastebin of examples, please.
Still waiting on an example I can work with. Heck, just throw together a test project or something even. That would be best.
https://file.house/eTxu.txt here, I just pulled their regex + test strings (match and non matching)
Thanks. I'll look into this.
Lummox JR resolved issue with message:
Case insensitivity was not properly handled in some complex regular expressions.
Interesting bug. It turned out not only was the problem also present in 513, it was worse there. One of the regular expressions in the test set that was supposed to match did not match in 512, but two didn't match in 513, and it was because of two different bugs present in both versions. Apparently a slight difference in behavior caused one of the bugs to manifest in 513 but not 512.