ID:2138703
 
BYOND Version:511.1353
Operating System:Windows 7 Home Premium
Web Browser:Chrome 52.0.2743.116
Applies to:DM Language
Status: Open

Descriptive Problem Summary:
The \Z anchor in BYOND's regex behaves how \z would.
\z should match only at the very end of a string, so it never matches before a trailing newline, whereas \Z should match at the end of a string whether or not there is a trailing newline.
However, when the \Z anchor is used in BYOND it behaves like \z: it only matches when there is no trailing newline.
See: 'Strings Ending with a Line Break' section at http://www.regular-expressions.info/anchors.html and testbed https://regex101.com/r/cG5jN3/1

Numbered Steps to Reproduce Problem:
1. Run provided code
2. Add a newline to the end of testfile.txt
3. Rerun and compare result of outputs

Code Snippet (if applicable) to Reproduce Problem:
https://dl.dropboxusercontent.com/u/169349932/regextest.7z
var/output = file("output.txt")
var/file = file2text("testfile.txt")

/world/New()
	output << file
	any_3()
	any_3_n()
	any_3_n_n()
	any_3_Z()
	any_3_n_Z()

/proc/any_3()
	var/regex/r = new("(.)(.)(.)") // should return 'CAT'
	output << "-------"
	if(r.Find(file))
		output << "passed any_3 find()"
	else
		output << "failed any_3 find()"
	output << r.name
	output << r.match

/proc/any_3_n()
	var/regex/r = new("(.)(.)(.)\n") // should return 'CAT\n'
	output << "-------"
	if(r.Find(file))
		output << "passed any_3_n find()"
	else
		output << "failed any_3_n find()"
	output << r.name
	output << r.match

/proc/any_3_n_n()
	var/regex/r = new("(.)(.)(.)\n\n") // should return 'BAT\n\n'
	output << "-------"
	if(r.Find(file))
		output << "passed any_3_n_n find()"
	else
		output << "failed any_3_n_n find()"
	output << r.name
	output << r.match

/proc/any_3_Z()
	// should return 'COW'; fails when there's a newline at end of file
	// (\Z is meant to match the end of the string regardless of a trailing newline or not)
	var/regex/r = new("(.)(.)(.)\\Z")
	output << "-------"
	if(r.Find(file))
		output << "passed any_3_Z find()"
	else
		output << "failed any_3_Z find()"
	output << r.name
	output << r.match

/proc/any_3_n_Z()
	// should return 'COW\n' only when there's a newline at end of file
	// (this is meant to be behavior of \z)
	var/regex/r = new("(.)(.)(.)\n\\Z")
	output << "-------"
	if(r.Find(file))
		output << "passed any_3_n_Z find()"
	else
		output << "failed any_3_n_Z find()"
	output << r.name
	output << r.match


Expected Results:
\Z to match with or without a trailing newline.

Actual Results:
\Z only matches when there is no trailing newline.

Does the problem occur:
Every time? Or how often? Yes
In other games? Yes
In other user accounts? Yes
On other computers? Yes

When does the problem NOT occur?
N/A

Did the problem NOT occur in any earlier versions? If so, what was the last version that worked? (Visit http://www.byond.com/download/build to download old versions for testing.)
This has occurred since 510 afaik.

Workarounds:
Using \n\Z when you know there'll be a trailing newline in your text; obviously this then fails if there isn't.
I don't think this is a bug. The . token does not match newline characters. Your COW test would return OW\n if it did, not COW, because those are the last 3 chars before the end of the file. If you want to match against the end of text but ignore any trailing newlines, the correct regex is \n*\Z, not \Z.
On further review of that link, I'm marking this as a non-bug. If anything, this should be a feature request.
Lummox JR resolved issue (Not a bug)
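
For reference, the \n*\Z suggestion above could be applied roughly as follows. This is only a minimal sketch and not code from the report: the proc name last_three and the capture group are made up for illustration, and it assumes, as stated above, that the . token does not match newline characters.

```dm
// Hypothetical helper (not from the original report): returns the last three
// non-newline characters of the text, ignoring any run of trailing newlines.
/proc/last_three(text)
	// "(...)" captures three characters, "\n*" consumes any trailing newlines,
	// and "\Z" is BYOND's strict end-of-text anchor.
	var/regex/r = regex("(...)\\n*\\Z")
	if(r.Find(text))
		return r.group[1]
	return null

/world/New()
	..()
	// The same three characters should be found with or without the trailing newline.
	world.log << last_three("foo\nbar\n\nbaz")
	world.log << last_three("foo\nbar\n\nbaz\n")
```

The capture group is there because the full match (r.match) would include the consumed newlines, whereas r.group[1] holds only the three characters.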
In response to Lummox JR
Lummox JR wrote:
On further review of that link, I'm marking this as a non-bug. If anything, this should be a feature request.

The regex tester? I'm seeing "...\Z" matching "COW" in "CAT\nBAT\n\nCOW\n"; "...\z" does not, though, because of the trailing newline.

I'm also seeing "\z" on the BYOND side *never* matching, even when there isn't a newline:

/world/New()
	var/text = "foo\nbar\n\nbaz\n"
	for(var/R in list("...\\z", // 0; regex tester: no match
			"...\\Z", // 0; regex tester: "baz"
			"...\\n\\z", // 0; regex tester: "baz\n"
			"...\\n\\Z")) // 10 ie "baz\n"; regex tester: "baz\n"
		world.log << "[R]: [findtext(text, regex(R))]"


/world/New()
	var/text = "foo\nbar\n\nbaz"
	for(var/R in list("...\\z", // 0; regex tester: "baz"
			"...\\Z", // 10 ie "baz"; regex tester: "baz"
			"...\\n\\z", // 0; regex tester: no match
			"...\\n\\Z")) // 0; regex tester: no match
		world.log << "[R]: [findtext(text, regex(R))]"
That's because \z is not supported, and \Z is strictly end-of-text. This is documented. This report should be a feature request.
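
Given that, both behaviours shown by the regex tester can probably be approximated with BYOND's strict \Z alone. The sketch below is an assumption rather than documented behaviour: the variable names strict and lenient are made up for illustration, and \n?\Z is only a stand-in for the standard \Z, since it consumes the final newline instead of matching before it.

```dm
// Assumed equivalents (illustrative, not from the BYOND reference):
//   standard \z  ->  BYOND \Z      (very end of text only)
//   standard \Z  ->  BYOND \n?\Z   (end of text, or just before one final newline)
/world/New()
	..()
	var/list/texts = list("foo\nbar\n\nbaz", "foo\nbar\n\nbaz\n")
	for(var/text in texts)
		var/regex/strict = regex("...\\Z")
		var/regex/lenient = regex("...\\n?\\Z")
		// Find() returns the 1-based position of the match, or 0 if there is none.
		world.log << "strict: [strict.Find(text)], lenient: [lenient.Find(text)]"
```

The strict pattern should reproduce the results listed above (a match only when there is no trailing newline), while the lenient one is meant to find "baz" in both texts.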
I thought the standard for this shit is ^ and $?
In response to Monster860
Monster860 wrote:
I thought the standard for this shit is ^ and $?

Those have a different meaning in multiline mode.
Sorry for the late reply. As was mentioned, my contention is the distinction between the \Z and \z anchors. When I was first using regex in BYOND, it took me a while to find out that \Z doesn't behave as one would expect from the majority of other regex implementations.

However, I do agree it's a minor concern, and if \z is intentionally not supported then this is a feature request for it and, by extension, for support of \Z's standard behaviour.