ID:2502930
 
Resolved
The string tree was subject to occasional corruption in Linux in rare cases.
BYOND Version:512.1463
Operating System:Ubuntu 16.04.6 LTS
Web Browser:Chrome 76.0.3809.100
Applies to:Dream Daemon
Status: Resolved (513.1533)

This issue has been resolved.
Descriptive Problem Summary:
At /tg/, our automated code testing and review system will occasionally fail to complete with errors relating to a failed in check.

The code looks something like this:

/datum/asset/spritesheet/vending/register()
for (var/k in GLOB.vending_products)
var/atom/item = k
if (!ispath(item, /atom))
continue

var/icon_file = initial(item.icon)
var/icon_state = initial(item.icon_state)
var/icon/I

var/icon_states_list = icon_states(icon_file)
if(icon_state in icon_states_list) // <----- this fails
I = icon(icon_file, icon_state, SOUTH)
var/c = initial(item.color)
if (!isnull(c) && c != "#FFFFFF")
I.Blend(c, ICON_MULTIPLY)
else
//this debugging info indicates it should have passed.
stack_trace("[item] does not have a valid icon state, icon=[icon_file], icon_state=[json_encode(icon_state)], icon_states=[json_encode(icon_states_list)]")
I = icon('icons/turf/floors.dmi', "", SOUTH)
//snip

(This code generates browser resources for items in vending machines, its ran during automated code testing as part of a check to ensure everything initializes without generating runtime errors.)

It will randomly error out, failing the first if but print debugging info that indicates the first if should have passed:

/obj/item/lightreplacer does not have a valid icon state, icon=icons/obj/janitor.dmi, icon_state="lightreplacer0", icon_states={
"caution":null,
"cleaner":null,
"sprayer_large":null,
"bucket":null,
"mop":null,
"mopbucket":null,
"mopbucket_water":null,
"cone":null,
"lightreplacer1":null,
"lightreplacer0":null, <----- here it is!
"4no_raisins":null,
"candy":null,
"chips":null,
"popcorn":null,
"cheesie_honkers":null,
"sosjerky":null,
"syndi_cakes":null,
"waffles":null,
"plate":null,
"pistachios_pack":null,
"semki_pack":null,
"tray":null,
"trashbag":null,
"trashbag1":null,
"trashbag2":null,
"trashbag3":null,
"cart":null,
"cart_water":null,
"cart_garbage":null,
"cart_mop":null,
"cart_spray":null,
"cart_sign1":null,
"cart_sign2":null,
"cart_sign3":null,
"cart_sign4":null,
"cart_replacer":null,
"sodawater":null,
"tonic":null,
"cola":null,
"purple_can":null,
"ice_tea_can":null,
"energy_drink":null,
"thirteen_loko":null,
"space_mountain_wind":null,
"dr_gibb":null,
"starkist":null,
"space-up":null,
"lemon-lime":null,
"shamblers":null,
"air":null,
"laughter":null,
"mister":null,
"waterbackpack":null,
"advmop":null,
"bluetrashbag":null,
"bluetrashbag1":null,
"bluetrashbag2":null,
"bluetrashbag3":null,
"smmop":null,
"adv_smmop":null,
"rainbowbpack":null,
"rainbowmister":null,
"energybar":null,
"woodbucket":null
}


Here is a failed travis build log from 2 days ago: https://travis-ci.org/tgstation/tgstation/jobs/576051728

Here is the code as it was during that failed check: https://github.com/tgstation/tgstation/blob/ bda590ec9f953f481f163d08404e3046495afb62/code/modules/ client/asset_cache.dm#L686-L694

It happens maybe 1% of the time, and has been going on since at least July 2nd when a report was made on github about it. It always triggers on the same atom (the light replacer)

Potential string tree fuckary with icon_states()? General failure of in?
Just curious if wrapping the in usage inside of () like you used to have to would help?
almost certainly something related to http://www.byond.com/forum/post/2129715 - that's also an `item in list` failure involving icon_states().
Still happening, still causing random CI failures. I agree that it's probably the same issue that GinjaNinja linked.

/obj/item/lightreplacer does not have a valid icon state, icon=icons/obj/janitor.dmi, icon_state="lightreplacer0"([0x600fa20]), icon_states="caution"([0x600c00f]), "cleaner"([0x600d02d]), "sprayer_large"([0x600d02c]), "bucket"([0x6006218]), "mop"([0x600709c]), "mopbucket"([0x60151b0]), "mopbucket_water"([0x60151b2]), "cone"([0x600b71e]), "lightreplacer1"([0x6032e27]), "lightreplacer0"([0x6033016]), "4no_raisins"([0x600d73e]), "candy"([0x600ccaa]), "chips"([0x600d408]), "boritos"([0x600d518]), "popcorn"([0x6006c46]), "cheesie_honkers"([0x600d74c]), "sosjerky"([0x600d736]), "syndi_cakes"([0x600d74e]), "waffles"([0x600d571]), "plate"([0x600f82b]), "pistachios_pack"([0x600f82d]), "semki_pack"([0x600f82f]), "tray"([0x60070e6]), "trashbag"([0x600cc2d]), "trashbag1"([0x6033017]), "trashbag2"([0x6033018]), "trashbag3"([0x603303b]), "cart"([0x600fbcc]), "cart_water"([0x6015127]), "cart_garbage"([0x6015122]), "cart_mop"([0x6015123]), "cart_spray"([0x6015124]), "cart_sign1"([0x603305d]), "cart_sign2"([0x603305e]), "cart_sign3"([0x603305b]), "cart_sign4"([0x6032f8b]), "cart_replacer"([0x6015125]), "sodawater"([0x600d830]), "tonic"([0x600d82d]), "cola"([0x6004f26]), "purple_can"([0x600d842]), "ice_tea_can"([0x6032fb3]), "energy_drink"([0x600d845]), "monkey_energy"([0x600d847]), "thirteen_loko"([0x600d83f]), "space_mountain_wind"([0x600d83d]), "dr_gibb"([0x600d841]), "starkist"([0x600d83b]), "space-up"([0x600d838]), "lemon-lime"([0x600d833]), "shamblers"([0x600d844]), "air"([0x6000b7a]), "laughter"([0x60051bf]), "mister"([0x6000d55]), "waterbackpack"([0x60100bc]), "advmop"([0x600748e]), "bluetrashbag"([0x600cc35]), "bluetrashbag1"([0x603301b]), "bluetrashbag2"([0x6032e44]), "bluetrashbag3"([0x603301d]), "smmop"([0x6033004]), "adv_smmop"([0x6032fe7]), "rainbowbpack"([0x6033003]), "rainbowmister"([0x6032fea]), "energybar"([0x600d752]), "woodbucket"([0x600da88])

Note "lightreplacer0"([0x600fa20]) and "lightreplacer0"([0x6033016])

513.1503
Still happening on 513.1528


/obj/item/computer_hardware/hard_drive/portable does not have a valid icon state, icon=icons/obj/module.dmi, icon_state="datadisk6"([0x601289f]), icon_states="std_mod"([0x600f81e]), "blank_mod"([0x6037ea7]), "card_mod"([0x6037e32]), "power_mod"([0x600f6d7]), "id_mod"([0x600f7b6]), "abductor_mod"([0x600f75f]), "clock_mod"([0x6037e34]), "mainboard"([0x600f6d1]), "mcontroller"([0x600f820]), "cyborg_upgrade"([0x601190b]), "cyborg_upgrade1"([0x6011915]), "cyborg_upgrade2"([0x6037e35]), "cyborg_upgrade3"([0x6011921]), "cyborg_upgrade4"([0x6037e36]), "secmodschematic"([0x6037e37]), "airalarm_electronics"([0x6008598]), "door_electronics"([0x600f6d3]), "cyborg_upgrade5"([0x601193a]), "selfrepair_off"([0x6011940]), "selfrepair_on"([0x6037e39]), "card_mini"([0x6012858]), "cpu"([0x600004d]), "cpu_adv"([0x6037d68]), "cpu_super"([0x601287f]), "cpuboard"([0x6012875]), "cpuboard_adv"([0x6037e09]), "cpuboard_super"([0x601287c]), "ram"([0x6008e8f]), "net_wired"([0x60128bc]), "radio"([0x600055c]), "radio_mini"([0x60128a6]), "radio_micro"([0x6037e0a]), "harddisk"([0x6012896]), "harddisk_mini"([0x6012882]), "harddisk_micro"([0x6037e0b]), "cell_con"([0x601285e]), "cell_con_micro"([0x6037e0c]), "cell"([0x6005b95]), "cell_mini"([0x6011f74]), "cell_micro"([0x6011f7b]), "charger_pda"([0x6037e0d]), "charger_lambda"([0x60128d5]), "charger_APC"([0x60128cf]), "charger_wire"([0x60128d2]), "ssd_large"([0x6037de6]), "ssd"([0x6037e56]), "ssd_mini"([0x6012899]), "ssd_micro"([0x60086d5]), "cddrive"([0x6037e57]), "flopdrive"([0x6037e58]), "printer"([0x60091f2]), "printer_mini"([0x60128c8]), "cart_connector"([0x6037e59]), "bluespacearray"([0x600f7b3]), "prizevendor"([0x6037e5a]), "depositbox"([0x6037df3]), "beaker_holder"([0x6037e5e]), "servo"([0x60128d9]), "nucleardisk"([0x600de8f]), "datadisk0"([0x600de82]), "datadisk1"([0x600dea5]), "datadisk2"([0x60327c9]), "datadisk3"([0x6010e73]), "datadisk4"([0x6032b19]), "datadisk5"([0x60128a1]), "datadisk6"([0x6037e5f]), "datadisk_gene"([0x600de88]), "ash_plating"([0x6011937]), "datadisk_hydro"([0x6037e60]), "boris"([0x6008838]), "boris_recharging"([0x6037e61]), "holodisk"([0x600865c]), "cargodisk"([0x600dea2]), "rndmajordisk"([0x600deba]), "command"([0x60001ab]), "generic"([0x6002fb6]), "security"([0x60015d1]), "science"([0x60015c7]), "service"([0x60015d7]), "medical"([0x600151f]), "engineering"([0x6001451]), "supply"([0x6001617])
Without a way to reproduce this, the only hope of fixing it is to randomly stumble into it. Personally I lean toward the belief that this is a weird out-of-memory situation.
This seems to be a Linux-specific bug where the string-tree can become out of order. I believe this is
because the lookup/insert functions for the tree take the result of `strcmp` and cast it to a signed char.

This works fine on Windows because `strcmp` only ever returns -1, 0, or 1 there. On other systems the values will be anywhere between -254 and 254 (although even larger values are legal.)

The issue is especially noticeable when strings containing interpolated values have the same prefix as another string. Because having a literal such as `value_[i]` in your code inserts `value_\xFF\x01` into the string-tree, any comparisons to normal strings that reach the `0xFF` byte while iterating the tree are very likely to return a value too large to store inside a signed char.

I've got some code that'll fill your string-tree up with unsorted values, but it won't produce the reported symptom of two variables with the same string data but different refs because, while possible, it's just tricky to do reliably.

Reproduction Code: https://gist.github.com/willox/ 788577aecafe32e871ce21d74fb3c420

I'm sorry the code doesn't exactly prove itself, hopefully the explanation makes the issue obvious enough that you won't even need it.
The signed char theory is interesting. I'll see if that turns anything up.

[edit]
Actually there's a huge problem with your theory. I'll look into it anyway, but I don't think the return type can be a serious problem because the bug would be pervasive, to the point that the string tree wouldn't work at all.
My first instinct after realising this would mean so many string comparisons were wrong was also that it'd have clearly broken everything, but the pervasive nature of the problem is that `strcmp` remains fairly deterministic. The problem only comes when _other_ strings are being looked up in the string table which end up comparing to these strings with large bytes in them. (From their point of view, the string-table is out of order, from the original strings' point of view it is not.)

If it comes down to it and my code isn't enough to reproduce it for you, try something like using num2text on 10,000 values starting from 1 in hexadecimal (after my other reproduction code). I've found that this will cause at least one node on the tree to be 1,000 elements deep so the whole thing will be re-balanced - it's an easy way to muddle stuff up.

Let me know if you run into further problems because I've put almost a week's worth of nights into this and I'm pretty confident in my findings. I can provide you core dumps, code to analyse said dumps, more reproduction code, whatever.
In response to Willox
Willox wrote:
The problem only comes when _other_ strings are being looked up in the string table which end up comparing to these strings with large bytes in them. (From their point of view, the string-table is out of order, from the original strings' point of view it is not.)

I don't quite understand what you mean by that.
I can only guess at what you mean when you say there is a huge problem with my theory, so I'm not sure what to tell you.

What I mean is that:
1) `(char) strcmp("\xFF", "\x01") == -106`
2) `(char) strcmp("\x01" "\xFF") == 106`


Here we have established that \xFF is lower than \x01. (Obviously wrong, but let's stick to it.)

But `(char) strcmp("a", "b") == -1`.

Ok. a is lower than b, this is correct.

You can see these results at the following link. The compiler version isn't too important as it's all fairly normal, but I've used a wrapper to stop a certain optimization GCC can perform with strcmp and string literals. https://cpp.godbolt.org/z/n9E871

--

I consider what I've said to be true, but also to be annoying as hell to read, so I'll leave it at that and continue with the disassembly I believe to be the problem: https://i.imgur.com/GiDGqfQ.png


Inside the function that I'm going to name "find_or_create_stringtree_entry", `strcmp` is called. It's done using a normal CALL instruction, and as the function is of a normal cdecl calling convetion, the 32-bit return value will be in the EAX register.
After that, we see the following instructions:
```
TEST AL, AL
JZ
```
This is all good, except AL represents the bottom 8-bits of the EAX register. It's already clear that the return value of `strcmp` is being treated as an 8-bit value at this point.

Next we get the code checking if we're 1000 (or 0x3e8) nodes deep in the string-tree, I'll skip over that.

Now we're at this set of instructions:
```
TEST AL, AL
JLE <some code that goes into the string's `left` (this is what I have named it) branch`
```
AL, again, is only the bottom 8-bits of EAX. JLE is also doing a signed comparison. It's a signed 8-bit value.

I'll stop there, there's no need to go over the other bits. Have my explanations helped? We'd probably get further if you told me what you thought was wrong with my theory because otherwise I feel the need to just re-state my previous assertions but in more detail.
In response to Willox
Where I'm confused is you suggested the result is accurate at one level but not another. That doesn't really make sense to me. It's almost like you're suggesting the function is non-transitive. But as far as I can tell it's completely transitive. As long as that's the case, the string tree logic should either always hold or often fail.

[edit]
To clarify a bit more, traversing the string tree is done in ProtoStrSearch() which uses strcmp() at each level of the tree. There's no exception, so no matter what order the function returns should be fine as long as it's consistent. Unless you're suggesting a tree rebalancing could be a problem, since that might potentially be sorting differently (but I'd have to check).

[edit 2]
Nope, this can't be it. There is no string comparison done in the rebalance routine. All that does is fill a flat array with string IDs based on the existing sort order, then returns that to a tree structure. So there's no chance of a rebalance causing the tree to be out of order.
Preface: Common strcmp implementations return the difference between the first two differing bytes.

The result becomes inaccurate only once the differing bytes between two strings are greater than a value that can fit within a signed 8-bit value.

Your string-tree code - which expects stuff to be in order - fucks up once stuff isn't in order. Here's a tiny example of three strings that disagree with each other: https://cpp.godbolt.org/z/Td8MK4

The first three comparisons do not cast to a signed char, and they all out put sane results.

The second three do cast to a signed char. Notice how the middle one is giving us a positive value. This is because the difference between 0x41 and 0xC3 is too large to fit within a signed 8-bit value and the truncation (or whatever) ends up losing that important information.

The result of this is that:
0x41 is treated as smaller than 0xBF (good)
0x41 is treated as LARGER than 0xC3 (BAD BAD BAD)
0xBf is treated as smaller than 0xC3 (good)

When trying to construct a sorted binary tree of strings this would be a problem.
I'm not an expert on BYOND's string tree, but a truncated comparison is NOT sufficiently consistent mathematically for binary trees in general.

good logic:
"\x10" vs. "\xF0" is -224, so "\x10" is to the LEFT of "\xF0"
"\x10" vs. "\x80" is -112, so "\x10" is to the LEFT of "\x80"
"\x80" vs. "\xF0" is -112, so "\x80" is to the LEFT of "\xF0"

order: "\x10" < "\x80" < "\xF0".

bad logic:
"\x10" vs. "\xF0" is 32, so "\xF0" is to the LEFT of "\x10"
"\x10" vs. "\x80" is -112, so "\x10" is to the LEFT of "\x80"
"\x80" vs. "\xF0" is -112, so "\x80" is to the LEFT of "\xF0"

order: "\xF0" < "\x10" < "\x80" < "\xF0" < "\x10" < "x80" < "\xF0" in a loop forever; there is no consistent ordering. Where they appear in the tree is based on luck (BAD!).

A binary tree cannot function with a "looping" comparison.
Please ignore my mention of the rebalance routine. I think it's just been inlined by the compiler so I did my best to skip over it.
In response to SpaceManiac
Hrmmm. So intransitivity is on the table in some cases? I wonder if it'd be possible to generate a test case for that with a simple set of strings, then. It should be a matter of just using ascii2text() a few times to create strings that didn't exist at compile-time, although in theory the tree should get just as messed up with existing strings on world load.

In any event, I'll change the underlying code for ProtoStrCompSigned() to return an int instead of s1c, and make sure the returned value is used as an int without truncation as well.
I'm able to easily reproduce the issue in the original post of this thread so once you've made that change I can let you know if it is fixed.
Please give this a test in BYOND 513.1533 and let me know if it's fixed.
Yup it seems to be fixed. Thanks.
Lummox JR resolved issue with message:
The string tree was subject to occasional corruption in Linux in rare cases.