ID:2131741
 
BYOND Version: 510.1346
Operating System: Linux
Web Browser: Chrome 52.0.2743.85
Applies to: Dream Daemon
Status: Open

Descriptive Problem Summary:
Occasionally the BS12 server starts freezing for unknown reasons; on investigation, the profiler lists several text-manipulation-heavy procs as being unusually high in CPU usage.

Opening the 'View Variables' window (which lists all vars on a specific object or datum, with links to change their values) causes a major freeze, and the proc that does most of the text manipulation for that window jumps up the profiler (see /proc/make_view_variables_var_list in the profile results below; that's 16.5 self CPU for a single call). In a 'normal' round, this window doesn't have a noticeable effect on the server.

Profiler results: https://gn32.uk/f/profile.txt
(hear_say, updateinfolinks, interact, create_text_tag, attack_hand, GetInteractWindow, hear_radio, add_fibers, and likely others further down all do significant text manipulation, and are all higher than I'd expect from a 'normal' round)

Numbered Steps to Reproduce Problem:
Unknown

Code Snippet (if applicable) to Reproduce Problem:
Unknown. The BS12 source may reproduce it, but it may require 30+ players playing a full round.

Expected Results: DD would not occasionally begin lagging and cause text-heavy procs to freeze the server.

Actual Results: It did the above.

Does the problem occur:
Every time? Or how often? Not every time, not hugely often
In other games? Unknown
In other user accounts? Unknown
On other computers? Unknown

When does the problem NOT occur? Unknown

Did the problem NOT occur in any earlier versions? If so, what was the last version that worked? Unknown

Workarounds: None known
For comparison, after DD was killed and restarted, VVing a virtually identical mob put /proc/make_view_variables_var_list at 0.010 self CPU, about a thousandth of what it took in the laggy round.

edit: This is from approximately the same amount of time into a 'normal' round.
The text-heavy procs are nowhere near as high as they were in the laggy round:
- hear_say, the top proc in the laggy round, is not even near the top of the normal one:
- - 77.4 over 2648 calls (avg 0.0292) in the laggy round.
- - 0.63 over 1543 calls (avg 0.000408) in the normal one.
The non-text-heavy procs are in roughly the same place:
- handle_regular_hud_updates:
- - 49 over 14400 calls (avg 0.00340) in the laggy round.
- - 44 over 13000 calls (avg 0.00338) in the normal one.
One thing I would do is have something periodically send whatever the signal is for printing memory stats. They will go to the runtime log, or to stdout if you don't redirect world.log; then you can see what's going on there.

My suspicion would be that this is caused by something happening to the string tree, and memory stats do print info on how many entries are in it.

Seeing a timeline of the string count might expose something odd happening.
MSO is on point here; that's the first thing I'd check.

The string tree will occasionally rebalance itself--normally it doesn't--but for this to happen it has to reach a fairly insane level of depth (5000). However, after the rebuild, the times for text handling ought to see a dramatic drop.
It's ALMOST CERTAINLY the classic string building problem.

At the start of your view-variables function, make a list, add all the softcoded strings you are going to use to populate your menu, and then at the end use Join() to bring it all together into one string, avoiding the overhead of repeated concatenation.
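As a rough illustration of that pattern (a minimal sketch with hypothetical proc and variable names, not the actual BS12 view-variables code), the idea in DM looks something like this:

// Build each row once, collect the rows in a list, and concatenate a single
// time at the end with Join() instead of growing one string with += in a loop.
/proc/build_var_list_html(datum/D)
	var/list/rows = list()
	for(var/name in D.vars)
		rows += "<li>[name] = [D.vars[name]]</li>"
	// one concatenation for the whole menu
	return "<ul>[rows.Join("")]</ul>"

With repeated += concatenation, each append copies the whole string built so far and interns another entry in the string tree, so the cost grows roughly quadratically with the number of rows; collecting the pieces and joining once keeps it close to linear.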
Cluster, while you might be correct--some of their interfaces do use string building--the fact that string operations hit some snapping point and start taking 100x longer is a bug.

edit: For example, the hear_say proc only does 2 string manipulations in its average use case, and 6 in the worst/rarest case, but is taking 20ms to run each call? That doesn't sound right at all.
In response to Clusterfack
Clusterfack wrote:
It's ALMOST CERTAINLY the classic string building problem.

At the start of your view-variables function, make a list, add all the softcoded strings you are going to use to populate your menu, and then at the end use Join() to bring it all together into one string, avoiding the overhead of repeated concatenation.

We already do.
Do you actually know what object you VVed to cause this?
It happens with VV on any object while the server is being laggy; the time comparison above is specifically from VVing my own observer mob.
The big-picture questions here are what the size of the string tree and the total memory usage looked like, and also what else on the system may have been competing for memory at the time.

Text manipulation will also use memory allocation to create the new string, and if something has slowed down all memory allocation, it might show up most prominently there. Allocations are done in all kinds of other places, but many of those places also do some recycling. If memory allocation suddenly got bad, so would text handling--and that kind of thing could be indicative of the system virtually swapping around memory to and from a file.

If, however, memory is looking good and it's strictly text handling, we're looking at a tree malformation. Now, the tree won't rebalance on its own until it gets to a depth of 5000--because I found out that trying more clever methods of rebalancing led to all kinds of headaches--so maybe it's ending up in a state where it's shy of that, maybe a lot shy, but still terribly underperforming as a result.

(It's worth noting that the string tree used to be a lot worse. When nodes were deleted from it, it snipped them out in a very naive way that wasn't at all sensitive to depth.)
Mloc did check the memory, since I suspected that might have been an issue; IIRC DD was at about 400MB usage, so nowhere near a limit; it may have been some sort of fragmentation issue though, I guess?

The machine itself should have plenty of RAM; AFAIAA, the only high-memory task running on it is BYOND.

Is the string tree cleared between rounds, or does it stick around until DD itself is closed and restarted?
The string tree is cleared on a reboot.
Have Mloc set something up to call kill -SIGUSR2 on the process every now and then to make it print memory stats to stdout/world.log, then show them here.

The key is to maybe see if the string tree is getting insanely large or something.
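For reference, a minimal sketch of one way to automate that from inside DM. Assumptions (not confirmed anywhere in this thread): the server is on Linux, Dream Daemon is running in trusted mode so shell() is permitted, and the command launched by shell() runs as a direct child of the Dream Daemon process, so $PPID resolves to its PID; if that last part doesn't hold, substitute the PID by hand or via pidof.

/proc/memory_stats_loop(interval = 3000)	// sleep() units are 1/10 s, so 3000 = every 5 minutes
	set waitfor = 0
	while(1)
		// SIGUSR2 makes Dream Daemon dump memory stats, including the number
		// of entries in the string tree, to the runtime log / stdout.
		shell("kill -SIGUSR2 $PPID")
		sleep(interval)

Calling this once from world/New() would leave a periodic record of the string count in the log, giving the timeline MSO mentioned above to line up against when the lag starts.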
I've not managed to get memory stats yet; however, the problem just corrected itself without a restart, not even a soft restart. One round, less than half an hour from laggy to not laggy.
In response to GinjaNinja32
That's a data point in favor of my theory that the string tree got badly out of balance, but not so badly that it triggered a rebuild of the tree.

With many of the tree's rebuilding woes behind it, I wonder if I should bring that high limit down. A depth of 1000 isn't really acceptable either.
I just had this happen because Windows was, for some reason, stuffing parts of BYOND's memory into the page file even though there was plenty of free memory, with at least some part of the string tree being paged out.

It could be the same cause on Linux if something else started leaking or using lots of memory and swap had to be used.
Now that's truly weird. I can't think of any way to guard against something like that happening, though. If the memory availability is fine--and more to the point, DD is nowhere near its own internal memory limits for a 32-bit program--then that shouldn't happen.
Windows can be stupid sometimes about how it pages.

Luckily for the server op, disabling the page file is a solution if there is enough memory.
Bump. Bay just hit this again.

Profile:


Note the procs near the top are again the string-handling ones: text tags, ghost follow links, say(), do_communicate, etc.
Do you even use red-black trees for strings? The tree would probably balance itself if red-black trees were used.