ID:2775064
 
BYOND Version: 514
Operating System: Linux
Web Browser: Firefox 98.0
Applies to: Dream Daemon
Status: Resolved (514.1582)

Descriptive Problem Summary:
When the text starts with a multi-byte Unicode character, text2ascii() returns the codepoint of that whole character (the same value text2ascii_char() returns) instead of the value of the first byte.

Numbered Steps to Reproduce Problem:
1. Call text2ascii() on text whose first character is a non-ASCII Unicode character.
2. Observe that it returns the full codepoint (with the UTF-8 length-marker bits at the start of the byte sequence stripped) rather than the first byte.

Code Snippet (if applicable) to Reproduce Problem:
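/proc/wat(input)
	world.log << input[1] // text at index 1
	world.log << text2ascii(input, 1) // expected: value of the first byte
	world.log << num2text(text2ascii(input, 1), 8, 2) // same value in binary, at least 8 digits
	world.log << num2text(text2ascii_char(input, 1), 8, 2) // codepoint of the first character, in binary

/world/New()
	// The forum turned the literal character into an HTML entity; paste the real character here.
	// It is U+0FFFFF, the last codepoint of plane 15, Supplementary Private Use Area-A (SPUA-A).
	wat("&#1048575;")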


Expected Results:
� // the first byte on its own is not valid UTF-8
243 // 0xF3, the UTF-8 lead byte
11110011 // or 00000011, with the length-marker bits stripped
11111111111111111111
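The expected values follow from how UTF-8 encodes U+0FFFFF as four bytes. The worked encoding below is standard UTF-8 arithmetic (as in RFC 3629), not something taken from BYOND's own implementation:

```latex
\begin{aligned}
\texttt{U+0FFFFF} &= 1048575 = \underbrace{011}_{3}\;\underbrace{111111}_{6}\;\underbrace{111111}_{6}\;\underbrace{111111}_{6} \quad (\text{21 bits}) \\
\text{4-byte pattern} &= \texttt{11110}xxx\;\;\texttt{10}xxxxxx\;\;\texttt{10}xxxxxx\;\;\texttt{10}xxxxxx \\
\text{encoded bytes} &= \texttt{11110011}\;\;\texttt{10111111}\;\;\texttt{10111111}\;\;\texttt{10111111} = \texttt{F3 BF BF BF} \\
\text{first byte} &= \texttt{11110011}_2 = \texttt{0xF3} = 243
\end{aligned}
```

The "00000011" alternative is just the three payload bits (011) of that lead byte once the 11110 length marker is removed, while text2ascii_char() reassembles all 21 payload bits into 1048575.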

Actual Results:
󿿿
1048575
11111111111111111111
11111111111111111111

Does the problem occur:
Every time? Or how often? yes
In other games? n/a
In other user accounts? n/a
On other computers? yes

When does the problem NOT occur?
When the character at the given position is plain ASCII (a single byte).

Did the problem NOT occur in any earlier versions? If so, what was the last version that worked? (Visit http://www.byond.com/download/build to download old versions for testing.)
Unknown

Workarounds:
Use a DLL to manipulate the raw bytes, or restrict the text to single-byte (ASCII) characters.
Lummox JR resolved issue with message:
Unicode handling has been improved across the board. In particular, the Unicode replacement character � is used now in situations where text2ascii(), copytext(), or splicetext() encounter bogus encoding.