ID:2578488
 
Regular expressions struggled with some UTF-8 cases.
BYOND Version: 513.1517
Operating System: Linux
Web Browser: Chrome 83.0.4103.97
Applies to: Dream Daemon
Status: Resolved (513.1526)

This issue has been resolved.
Note: This was also tested on 513.1525 on Windows 10 Home 64-bit, but less extensively.

Descriptive Problem Summary:
Regex character ranges ([a-z], etc.) don't work correctly when a non-ASCII character is used as a bound of the range. If either end of the range is non-ASCII, the range matches a nearly-but-not-quite static set of characters at the beginning of the second Unicode block (Latin-1 Supplement) rather than the full range. (The exact set probably depends on which characters are used as bounds, but this holds for low-valued ones.) My guess is that the regex engine isn't looking at all of the bytes of each UTF-8 character.

Code Snippet (if applicable) to Reproduce Problem:
var/regex/test = regex(@"[¡-ÿ]", "g")
world.log << test.Replace("¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ", "")


Expected Results:
An empty string. The regex should match every character given.

Actual Results:
ÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
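
To make the split in the output easier to see, here is a small sketch in Python (not DM) that prints the code points and UTF-8 bytes at the boundary. It assumes the test string is the contiguous run U+00A1 through U+00FF and that the surviving output is the contiguous run U+00C3 through U+00FF. The helper name describe is made up for illustration and has nothing to do with BYOND.

```python
# Assumption: the test string is the contiguous run U+00A1..U+00FF and the
# surviving output is the contiguous run U+00C3..U+00FF.
test_input = "".join(chr(cp) for cp in range(0xA1, 0x100))
survivors = "".join(chr(cp) for cp in range(0xC3, 0x100))

# Characters the regex did remove: everything in the input that is not in the output.
removed = [ch for ch in test_input if ch not in survivors]


def describe(ch):
    """Return the code point and UTF-8 bytes of one character (hypothetical helper)."""
    return f"U+{ord(ch):04X} = {ch.encode('utf-8').hex(' ')}"


print("range bounds: ", describe("\u00a1"), "|", describe("\u00ff"))
print("removed count:", len(removed), "of", len(test_input))
print("first removed:", describe(removed[0]))
print("last removed: ", describe(removed[-1]))
print("first kept:   ", describe(survivors[0]))
```

Every character involved takes two bytes in UTF-8, so a matcher that compares individual bytes instead of decoded code points could plausibly produce a cut-off like this one. That is only a guess at the cause, not a confirmed diagnosis.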

Does the problem occur:
Every time? Or how often?
Every time.
In other games?
Presumably.
In other user accounts?
Doesn't require login.
On other computers?
Yes.

When does the problem NOT occur?
When both bounds of the range are ASCII.

Did the problem NOT occur in any earlier versions? If so, what was the last version that worked? (Visit http://www.byond.com/download/build to download old versions for testing.)
I would guess that this has been the case for all of 513, though I don't know for sure.

Workarounds:
None I'm aware of.
Nice find. I'll take a look. Any chance you can package it up into a test project for me? Just to be sure I don't accidentally run afoul of any character encoding issues from the forum itself.
In response to Lummox JR
Test project here: https://gofile.io/d/9YuK2d
Lummox JR resolved issue with message:
Regular expressions struggled with some UTF-8 cases.