text2ascii() returns the first character and not the first byte

ID:2775064

Mar 12 2022, 9:39 pm

Alexkar598

Resolved

Unicode handling has been improved across the board. In particular, the Unicode replacement character � is used now in situations where text2ascii(), copytext(), or splicetext() encounter bogus encoding.

BYOND Version:	514
Operating System:	Linux
Web Browser:	Firefox 98.0
Applies to:	Dream Daemon

Status:

Resolved (514.1582)

This issue has been resolved.

Descriptive Problem Summary:
text2ascii() returns the code point of the first character (the same value as text2ascii_char()) instead of the value of the first byte.

Numbered Steps to Reproduce Problem:
1. Call text2ascii() on a string that starts with a multi-byte Unicode character
2. The result is the full code point (the UTF-8 payload bits, without the leading marker bits that specify the number of bytes) rather than the first byte

Code Snippet (if applicable) to Reproduce Problem:

/proc/wat(input)
    world.log << input[1]
    world.log << text2ascii(input, 1)
    world.log << num2text(text2ascii(input, 1), 8, 2)
    world.log << num2text(text2ascii_char(input, 1), 8, 2)

/world/New()
    //The forum's formatting broke this literal; it is meant to be U+0FFFFF, the last code point of the Supplementary Private Use Area-A (SPUA-A) plane
    wat("&#1048575;")

Expected Results:
� //Invalid unicode character
243
11110011 //or 00000011
11111111111111111111

Actual Results:
󿿿
1048575
11111111111111111111
11111111111111111111

Does the problem occur:
Every time? Or how often? yes
In other games? n/a
In other user accounts? n/a
On other computers? yes

When does the problem NOT occur?
When the character at the given position is a single-byte (ASCII) character

Did the problem NOT occur in any earlier versions? If so, what was the last version that worked? (Visit http://www.byond.com/download/build to download old versions for testing.)
Unknown

Workarounds:
Use a DLL to manipulate bytes, or restrict all text to single-byte characters
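
For the narrow case of needing only the first byte, it may be possible to compute it from the code point that text2ascii_char() returns, following the standard UTF-8 layout. This is only a sketch: utf8_lead_byte() is a made-up helper name, not a BYOND built-in, and it assumes the argument is a valid code point (at most 0x10FFFF) and that the bitwise operators handle values of that size.

```dm
// Hypothetical helper, not part of BYOND.
// Returns the first (lead) byte of the UTF-8 encoding of a code point.
// Assumes codepoint is a valid Unicode code point (0 to 0x10FFFF).
/proc/utf8_lead_byte(codepoint)
    // 1-byte sequence: 0xxxxxxx
    if(codepoint < 0x80)
        return codepoint
    // 2-byte sequence: lead byte is 110xxxxx
    if(codepoint < 0x800)
        return 0xC0 | (codepoint >> 6)
    // 3-byte sequence: lead byte is 1110xxxx
    if(codepoint < 0x10000)
        return 0xE0 | (codepoint >> 12)
    // 4-byte sequence: lead byte is 11110xxx
    return 0xF0 | (codepoint >> 18)
```

With the test case above, utf8_lead_byte(text2ascii_char(input, 1)) should give 243 for U+0FFFFF, since 0xFFFFF >> 18 is 3 and 0xF0 | 3 is 0xF3. It does not help with reading the later bytes of a sequence or with invalid encodings, which still need something like the DLL approach.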

Mar 12 2022, 9:44 pm
Alexkar598	https://cdn.discordapp.com/attachments/725458598213451879/952441940325048320/testcase_2775064.zip

Mar 14 2022, 1:38 pm
Lummox JR	Lummox JR resolved issue with message: Unicode handling has been improved across the board. In particular, the Unicode replacement character � is used now in situations where text2ascii(), copytext(), or splicetext() encounter bogus encoding.