ID: 1500611

BYOND Version: 504
Operating System: Linux
Web Browser: Chrome 32.0.1700.107
Applies to: DM Language
Status: Open

Issue hasn't been assigned a status value.
Descriptive Problem Summary:

When we boot the game up, the CPU is a steady 30-50% or so with 50 people. Over time the CPU increases, but the profiler results don't change all that much, even though the CPU can reach peaks of 200% after a day of the server being live.

It usually takes about 24 hours before we have to reboot to restore the CPU. I don't think there's any bad code in Eternia's code-base triggering this, because the profiler comparisons remain fairly consistent regardless of CPU.

Numbered Steps to Reproduce Problem:

1.) Boot up an Eternia server. Take a 1 hour profile of the CPU.

2.) Wait 24 hours.

3.) Take a 1 hour profile of the CPU. It will be similar to the other profile, but the server's CPU usage will be much higher (consistently past 120%).


Does the problem occur:
Every time? Or how often? This happens every time we boot the game up. We have to reboot roughly every 24 hours, if not more often, to keep performance acceptable.
In other games? Unsure.
In other user accounts? Yes.
On other computers? Yes.

When does the problem NOT occur?

Cannot be sure. I believe this may have been an issue for a while, but we only recently noticed the drastic difference in CPU after optimizing Eternia.

Did the problem NOT occur in any earlier versions? If so, what was the last version that worked? (Visit http://www.byond.com/download/build to download old versions for testing.)

Unsure.

Workarounds:

N/A
Is the amount of RAM being used by the game steadily increasing over time?
I have one theory that may be viable, and it's something you might be able to collect data for using the profiler. (I'm curious about the RAM usage also, though.)

There are certain things that could build up over time that would cause particular operations to run longer, such as the string tree getting badly out of balance. We had code in place at one point to rebalance it automatically, but it didn't work out well. (As you may recall this was back when many games had those mysterious crashes on Linux, because something was getting broken and that something was usually the string tree. The auto-rebalance was causing games to crash that had previously just hung, only it tended to trigger much sooner.)

The only indicator I can think of that you might rely on is that procs that do no string manipulation whatsoever--not even so much as changing a var that holds a string--would be completely unaffected. Balance issues would tend to impact proc profile times whenever:

1) Any string manipulation is done, such as adding strings together, using text macros, or bracketed expressions like "your [obj]", etc.
2) A string that was unique is no longer in use, like when the var it was assigned to has a new value

In either of those cases, an unbalanced string tree would take longer to process.

This theory isn't necessarily consistent with much higher RAM usage; with moderately higher usage, it could go either way.
I'm currently saving the profiler log every hour and then clearing it, to see if there's a rise in string-manipulation procs.

Not sure how to log the RAM, so any help there would be appreciated. I'm using a Linux VPS and have root access via WinSCP.

Edit: current RAM: http://puu.sh/71C3U.png I'm going to check again a while later and post the numbers.
You can use 'top' for a constant update or 'ps aux | grep DreamDaemon' to get a snapshot of RAM usage in Linux.
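If you want a running log rather than spot checks, a minimal loop along these lines would do it (just a sketch: it assumes the process shows up in ps as DreamDaemon, and the 10-minute interval and log file name are arbitrary):

#!/bin/bash
# Append a timestamped resident-memory reading for DreamDaemon every 10 minutes.
# "ps -C" selects by process name; "-o rss=" prints RSS in KB with no header line.
while true; do
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(ps -C DreamDaemon -o rss=) KB" >> dreamdaemon_ram.log
    sleep 600
done

Run it under screen or with nohup so it keeps logging after you disconnect from the VPS.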
Okay, we had a big increase in RAM usage: http://puu.sh/71ViL.png

Four hours after the above screenshot: http://puu.sh/723dW.png
I've reported the same issue in the developer help section: my RAM usage increases over time, maxes out my 2 GB of RAM, and then starts using slow pagefile memory.
(This is SS13 we're talking about, by the way.)

Our game reboots every hour or so, but the RAM stays just as high and continues to rise until I eventually need to restart the whole process.

Running 503.1224; it was in the beta too (which is why I switched back).
In response to Laser50
Can you narrow down where this started to become an issue?

Also, are you using animation for anything, or no?
In response to Lummox JR
Lummox JR wrote:
Can you narrow down where this started to become an issue?

Also, are you using animation for anything, or no?

I'm not *too* familiar with the whole codebase, but from a quick search I don't think we do.

I wouldn't know when this started to become an issue, honestly. I know it was quite some time ago. I'd say at 501/502 if I had to take a guess.
In response to Laser50
500 introduced animations; 501 introduced blend_mode and the first threading. Since this happens in 503.1224 also, where threading was disabled, I think we can rule that out. Likewise we can rule out blend_mode, which is relatively minor. So I'd tend to believe the issue started in 500 or earlier, if it was in 501. But I'll need more info.

You didn't mention if you were using animation.
I checked, and I don't use animation.

We do use blend_mode, however. But the proc it's used in seems related to icon updating.

Wouldn't there be a way to check where RAM is being allocated? I've tried SIGUSR2, but the results didn't add up to even 100 MB when the server was using at least 500 MB.

Doing a check now...
I'm assuming SIGUSR2 reports the memory values in bytes. Adding it all up, I end up at approx. 53 MB, while the server is using around 500 MB right now.
(And yes, I may have calculated that wrong.)

server mem usage:
Prototypes:
obj: 1085936 (6585) -- 1.03563 MB
mob: 1088352 (151) -- 1.03793 MB
proc: 8605844 (15449) -- 8.20717 MB
str: 4150340 (75893) -- 3.95807 MB
appearance: 6114260 (10020) -- 5.83101 MB
id array: 8633996 (29576) -- 8.23402 MB
map: 1472864 (255,255,6) -- 1.40463 MB
objects:
mobs: 71260 (54) -- 0.0679588 MB
objs: 14186408 (55727) -- 13.5292 MB
datums: 5456480 (55727) -- 5.2037 MB
lists: 9390128 (269564) -- 8.95512 MB

For any other info: the server's running Ubuntu 12.04.4 LTS, 64-bit.
I'm sorry I don't have much more info. I was hoping the issue was on my side, but the code I wrote doesn't, or shouldn't, do this.
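In case anyone wants to double-check the math, saving the SIGUSR2 dump to a file and summing the byte column with awk works (memdump.txt is just a placeholder name, and this only counts the categories the dump actually lists):

# Sum the byte value on every "name: bytes (count) -- X MB" line and print the total in MB.
awk '/--/ { total += $2 } END { printf "total: %.2f MB\n", total / 1048576 }' memdump.txt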
In response to Laser50
Indeed, if it was your code you should be seeing much higher memory usage for one of those items, which you're not. (That's not to say your code doesn't do something in particular that triggers the leak, but it wouldn't be the fault of the code.) Quite a bit of memory usage is not covered by this, handling some of the internals, so something else has to be rising. But if it's not animation, then that would also tend to rule out 500 as the place this changed. Nevertheless it would be helpful if you could get any test info from running 500 without blend_mode to see if maybe that shows any differences. Knowing where the problem appeared would go a long way toward narrowing down what's wrong.
The only proc that uses blend_mode is something I was able to trigger quite easily: I had 5 or so players, plus myself, trigger it a whole bunch of times over a few minutes, and there was no change in RAM usage throughout.

Server rebooting/startup doesn't seem to change anything on the RAM either; it's still idling at around 527 MB.

Edit: I'll keep a close eye on htop and see if I can figure out what happens in-game when the RAM goes up.
In response to Laser50
Laser50 wrote:
The only proc that uses blend_mode is something I was able to trigger quite easily: I had 5 or so players, plus myself, trigger it a whole bunch of times over a few minutes, and there was no change in RAM usage throughout.

There's no way blend_mode would have an impact on server RAM; I've already ruled that out. I meant that if you disabled blend_mode, you could recompile in an earlier version so the game could run under 500. Unless you're using atom transforms, which are another 500 feature; then I'm not sure I could do much.

There were some /matrix leaks in the past, but those have been dealt with and I haven't seen any new reports of them.
The entire code was changed to use atom transforms when version 500 hit. So I'm not gonna be able to go back any further than that.

Would there be a way to find out what else is using RAM, perhaps even as part of a debug build? I'm not sure what else I could think of.
In response to Writing A New One
Writing A New One wrote:
3 hours: http://pastebin.com/yj3CgmKJ

24 hours: http://pastebin.com/DBduUSCu

And did you check the proc that was on top?
In response to Writing A New One
Those pastebins seem remarkably similar, all things considered. If you look at the procs that are called most often (and therefore should be less susceptible to variations in timing), the results are too close to call. I think this rules out string tree imbalance, as that would be way more obvious. There does seem to be a bit higher time all around, which could be partly explained by things like fragmentation and using/recycling a lot more objects during that time; but I think the actual underlying cause is tied to the RAM usage.

This suggests the RAM usage is a major factor here. Again, fragmentation is possible, but I don't see it being an issue on that level. There are some internals that could tie into things that use CPU, like map ticks, without influencing proc profiling very much.

It would be instructive to know if the CPU problem is very much lower on 505, when hosting for a bunch of users who are also on 505. However I think that's new enough that it's unlikely you'd get an adequate test audience for that.
In response to Laser50
Laser50 wrote:
Wouldn't there be a way to check where RAM is being allocated? I've tried SIGUSR2, but the results didn't add up to even 100 MB when the server was using at least 500 MB.

I'm assuming SIGUSR2 reports the memory values in bytes. Adding it all up, I end up at approx. 53 MB, while the server is using around 500 MB right now.

How'd you do that? I tried kill -12 pid and kill -SIGUSR2 pid. I'd like to give my input on some games that have an outrageous buildup of RAM, so much that I have to ask them to please shut the game down and bring it back up so it's all freed.
SIGUSR2 will often work, but never if the server is stuck.

I actually took the method from your guide.
Simply use one of these:

SIGSEGV (print a backtrace and abort)
SIGBUS (print a backtrace and abort)

if the server is stuck. Or just SIGUSR2 if it works fine.
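In practice it's just something like this (assuming a single DreamDaemon process, so pidof finds the right PID):

# Dump the "server mem usage" report if the server is still responsive.
kill -SIGUSR2 "$(pidof DreamDaemon)"

# Force a backtrace and abort if it's hung.
kill -SIGSEGV "$(pidof DreamDaemon)"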

Considering you're hosting for others, it's probably a better idea to create a full restart system: call a command that kills the whole game on shutdown, and then reboot it with an automated script. This should help with RAM buildup and even give you a slight performance boost in some areas.

My friend used to do it this way, though I honestly have no clue how he set it up anymore.
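The general idea is just a shell loop that relaunches the server whenever the process exits, so every shutdown frees all of its memory before it comes back up. A minimal sketch (game.dmb, the port, and the log file name are placeholders for your own setup):

#!/bin/bash
# Relaunch DreamDaemon every time it exits; pair this with an in-game shutdown
# command (or a scheduled kill) to get a fresh process each cycle.
while true; do
    DreamDaemon game.dmb 5000 -trusted
    echo "$(date) DreamDaemon exited, restarting in 5 seconds" >> restart.log
    sleep 5
done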