ID:2026658
 
BYOND Version:509
Operating System:Windows 10 Pro 64-bit
Web Browser:Firefox 41.0
Applies to:Dream Daemon
Status: Open

Issue hasn't been assigned a status value.
Descriptive Problem Summary:
Performance degradation with lists containing a large number of items, whether adding to, removing from, or just looping through them.
Numbered Steps to Reproduce Problem:
Make a large list and profile a loop through it; then split that list into chunks and profile loops over the chunks.
Code Snippet (if applicable) to Reproduce Problem:
world
	loop_checks = 0

var/global/list/one = list()
var/global/list/two = list()
var/global/list/three = list()
var/global/list/four = list()
var/global/list/five = list()
var/global/list/six = list()
var/global/list/seven = list()
var/global/list/eight = list()
var/global/list/nine = list()
var/global/list/ten = list()
var/global/list/eleven = list()


var/global/list/listone = list()
var/global/list/listtwo = list()
var/global/list/listthree = list()
var/global/list/listfour = list()
var/global/list/listfive = list()
var/global/list/listsix = list()
var/global/list/listseven = list()
var/global/list/listeight = list()
var/global/list/listnine = list()
var/global/list/listten = list()
var/global/list/listeleven = list()

mob/verb/test_list()
	var/start = world.timeofday
	//world << "starting at [start]"
	var/n = 1
	for(var/i = 1 to 2600000)
		listone += i
		if(n != 10 && !(i % 260000))
			//world << "Hit milestone [i] in [((world.timeofday - start)/10)] second\s"
			n++
		switch(n)
			if(1)
				listtwo += i
			if(2)
				listthree += i
			if(3)
				listfour += i
			if(4)
				listfive += i
			if(5)
				listsix += i
			if(6)
				listseven += i
			if(7)
				listeight += i
			if(8)
				listnine += i
			if(9)
				listten += i
			if(10)
				listeleven += i
	var/startone = world.timeofday
	//world << "starting processing of list one at [startone]; total of [((world.timeofday - start)/10)] second\s have passed"
	process_list_one()
	//world << "finished list one in [((world.timeofday - startone)/10)] second\s; total of [((world.timeofday - start)/10)] second\s have passed"
	startone = world.timeofday
	//world << "starting processing of list two at [startone]; total of [((world.timeofday - start)/10)] second\s have passed"
	process_list_two()
	//world << "finished list two in [((world.timeofday - startone)/10)] second\s; total of [((world.timeofday - start)/10)] second\s have passed"

/proc/process_list_one()
	for(var/ii in listone)
		one += ii

/proc/process_list_two()
	var/starttime = world.timeofday
	var/i
	var/substart = world.timeofday
	for(i in listtwo)
		two += i
	//world << "finished sub list one in [((world.timeofday - substart)/10)] second\s; total of [((world.timeofday - starttime)/10)] second\s have passed"
	substart = world.timeofday
	for(i in listthree)
		three += i
	//world << "finished sub list two in [((world.timeofday - substart)/10)] second\s; total of [((world.timeofday - starttime)/10)] second\s have passed"
	substart = world.timeofday
	for(i in listfour)
		four += i
	//world << "finished sub list three in [((world.timeofday - substart)/10)] second\s; total of [((world.timeofday - starttime)/10)] second\s have passed"
	substart = world.timeofday
	for(i in listfive)
		five += i
	//world << "finished sub list four in [((world.timeofday - substart)/10)] second\s; total of [((world.timeofday - starttime)/10)] second\s have passed"
	substart = world.timeofday
	for(i in listsix)
		six += i
	//world << "finished sub list five in [((world.timeofday - substart)/10)] second\s; total of [((world.timeofday - starttime)/10)] second\s have passed"
	substart = world.timeofday
	for(i in listseven)
		seven += i
	//world << "finished sub list six in [((world.timeofday - substart)/10)] second\s; total of [((world.timeofday - starttime)/10)] second\s have passed"
	substart = world.timeofday
	for(i in listeight)
		eight += i
	//world << "finished sub list seven in [((world.timeofday - substart)/10)] second\s; total of [((world.timeofday - starttime)/10)] second\s have passed"
	substart = world.timeofday
	for(i in listnine)
		nine += i
	//world << "finished sub list eight in [((world.timeofday - substart)/10)] second\s; total of [((world.timeofday - starttime)/10)] second\s have passed"
	substart = world.timeofday
	for(i in listten)
		ten += i
	//world << "finished sub list nine in [((world.timeofday - substart)/10)] second\s; total of [((world.timeofday - starttime)/10)] second\s have passed"
	substart = world.timeofday
	for(i in listeleven)
		eleven += i
	//world << "finished sub list ten in [((world.timeofday - substart)/10)] second\s; total of [((world.timeofday - starttime)/10)] second\s have passed"


Expected Results:
Similar performance for both approaches.
Actual Results:
                       Profile results (total time)
Proc Name               Self CPU  Total CPU  Real Time  Calls
----------------------  --------  ---------  ---------  -----
/mob/verb/test_list       20.202     39.803     39.803      1
/proc/process_list_one    17.912     17.912     17.913      1
/proc/process_list_two     1.689      1.689      1.690      1

Does the problem occur:
Every time, with large lists.


When does the problem NOT occur?
When the list is split up into smaller chunks

Workarounds:
Make many smaller lists instead of one large one.

The reason I bothered testing this is that we want to increase our map size to 500x500x6, but looping through all the turfs triggers infinite-loop errors, resulting in poor performance and eventually a non-working proc.

edit: Processing only the list chunks in the example, i.e. commenting out the references to listone and process_list_one(), I get this:
                       Profile results (total time)
Proc Name               Self CPU  Total CPU  Real Time  Calls
----------------------  --------  ---------  ---------  -----
/mob/verb/test_list        2.164      3.713      3.713      1
/proc/process_list_two     1.549      1.549      1.549      1

I'm honestly not sure what I'm looking at here. The example is ultra-complicated and it's hard to tell the goal from the code. What are the sizes of all the lists involved? Even that's not clear from the code. The process routines are very apples-and-oranges in terms of what they're doing, so it's hard to see how any inferences can be drawn from this.
I just saw you edited this to add a removal test, but I can tell you that test is really messed up. You're looping through list items in forward order, and Remove() and -= both operate in reverse order, so you'll have worst-case performance that way and it's O(N^2).

If you're removing a bunch of items from a list while looping through it, it's best to do it by index when feasible; it's even better, if list order is not an issue, to swap those items to the end of the list and remove them all in one shot by decreasing the list length.
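In DM, the two approaches described above might look something like this (a sketch of the technique only, not code from the report; `should_remove()` is a hypothetical predicate):

```dm
// Index-based removal, iterating backward so removals don't
// shift the indices of items we haven't visited yet:
proc/remove_matching(list/L)
	for(var/i = L.len, i >= 1, i--)
		if(should_remove(L[i]))
			L.Cut(i, i + 1) // removes just the item at index i

// If list order doesn't matter: swap doomed items to the end,
// then drop the whole tail in one shot by shrinking len.
proc/remove_matching_unordered(list/L)
	var/last = L.len
	var/i = 1
	while(i <= last)
		if(should_remove(L[i]))
			L.Swap(i, last)
			last--
		else
			i++
	L.len = last // truncate all removed items at once
```

Both avoid the worst-case O(N^2) behavior of calling Remove() or -= per item while looping forward.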
Profile of list destroy:
                       Profile results (total time)
Proc Name               Self CPU  Total CPU  Real Time  Calls
----------------------  --------  ---------  ---------  -----
/mob/verb/test_list        2.158    331.502    333.978      1
/proc/destroy_list_two   327.728    327.753    328.800      1
/proc/process_list_two     1.569      1.587      2.557      1
/proc/announce             0.047      0.047      2.456     22


Lummox JR wrote:
I just saw you edited this to add a removal test, but I can tell you that test is really messed up. You're looping through list items in forward order, and Remove() and -= both operate in reverse order, so you'll have worst-case performance that way and it's O(N^2).

If you're removing a bunch of items from a list while looping through it, it's best to do it by index when feasible; it's even better, if list order is not an issue, to swap those items to the end of the list and remove them all in one shot by decreasing the list length.

Ah alright
In response to Lummox JR
Lummox JR wrote:
I'm honestly not sure what I'm looking at here. The example is ultra-complicated and it's hard to tell the goal from the code. What are the sizes of all the lists involved? Even that's not clear from the code. The process routines are very apples-and-oranges in terms of what they're doing, so it's hard to see how any inferences can be drawn from this.

listone is 2,600,000 items long.
listtwo through listeleven are 260,000 items each, for a total of 2,600,000.
I need to see that code vastly simplified, because right now it's really not clear at all what it's doing. I'm really not even sure this is appropriate for a bug report, as performance issues may not be bugs at all.
Other than coming up with better variable names, I don't know how I can simplify it; it's pretty simple already. I can comment the whole thing for clarification if that would help.

If so
http://ss13.moe/uploads/2016-01-29_14-59-32.txt profile for this code: http://ss13.moe/uploads/2016-01-29_14-59-46.txt
You're appending to a growing list each time, and 'one' grows much faster than the rest. Clear the lists at the top of your verb (or don't append to a list as part of your loop) and the problem disappears.
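For anyone reproducing this, that fix amounts to resetting the global lists at the top of the verb, e.g. (a sketch against the code in the report, not the full list of globals):

```dm
mob/verb/test_list()
	// Clear the accumulators first so a previous run's contents
	// don't inflate the next run's timings. Cut() with no
	// arguments empties a list; setting len = 0 does the same.
	one.Cut()
	listone.Cut()
	listtwo.Cut()
	// ... likewise for the remaining global lists ...
	var/start = world.timeofday
	// ... rest of the test unchanged
```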
In response to GinjaNinja32
GinjaNinja32 wrote:
You're appending to a growing list each time, and 'one' grows much faster than the rest. Clear the lists at the top of your verb (or don't append to a list as part of your loop) and the problem disappears.

The point of this is to show that working with one large list performs very differently from working with several smaller lists, even at the same total item count.
My guess is lists are linked lists, ie insertion at the end is slow. That'd mean inserting N items at the end of a long list costs significantly more than inserting one item at the end of N short ones (it's approximately O(m*n**2) vs O(m*n), where n is the number of items and lists, and m is the length of the shorter lists)

edit: Additionally, if you do as I suggest, the *split* lists perform slightly worse.

double edit: I'm breaking my own arguments here.

triple edit: Profile results when clearing lists between runs: http://nyx.gn32.uk/i/2016-01-29_22:39:19.png
In response to GinjaNinja32
GinjaNinja32 wrote:
My guess is lists are linked lists, ie insertion at the end is slow. That'd mean inserting N items at the end of a long list costs significantly more than inserting one item at the end of N short ones (it's approximately O(m*n**2) vs O(m*n), where n is the number of items and lists, and m is the length of the shorter lists)

edit: Additionally, if you do as I suggest, the *split* lists perform slightly worse.

double edit: I'm breaking my own arguments here.

triple edit: Profile results when clearing lists between runs: http://nyx.gn32.uk/i/2016-01-29_22:39:19.png
Care to post the code?

http://gn32.uk/f/lists.dme

edit: Not sure whether I misread my own code or what, but this clears the initial lists, not the final ones. Unsure what that means for this bug report.
Lummox, let me clear this up for you:

First, ignore the deletion part, it's a red herring.

Ignore the part about adding to the list.

The key part is /proc/process_list_one and /proc/process_list_two.

He is showing that if you stuff 2,600,000 items into a list and loop through it read-only, it takes 17.9 seconds, but if you split the same items across 10 lists of 260,000 each and loop through all 10, it takes 1.6 seconds TOTAL.

Really those numbers should be near-identical; if anything, looping through the 10 lists of 260,000 items should cost slightly more than looping through 1 list of 2,600,000 items.

Meaning the per-iteration overhead of looping through a list scales with the list's size.

In table format:

Total time to loop through 2,600,000 items in 1 list of 2,600,000:       17.912
Total time to loop through 2,600,000 items in 10 lists of 260,000 each:   1.689
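Stripped down, the comparison in that table is roughly the following (a simplified sketch, not the original test code):

```dm
// One read-only pass over a single large list.
proc/loop_one_big()
	var/list/big = new
	for(var/i = 1 to 2600000)
		big += i
	var/count = 0
	for(var/x in big) // read-only loop
		count++
	return count

// The same total item count, split across ten small lists.
proc/loop_ten_small()
	var/count = 0
	for(var/j = 1 to 10)
		var/list/small = new
		for(var/i = 1 to 260000)
			small += i
		for(var/x in small) // read-only loop
			count++
	return count
```

Profiling these two procs should show near-identical read times if loop cost is linear in the total number of items.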

Yeah, what MSO said, though it also shows that adding to a larger list gets increasingly expensive, as seen here: http://ss13.moe/uploads/2016-01-29_14-59-32.txt
I'm trying to rule adding out, as that has to expand the list, and some downgrade in performance for doing that is expected.

Reading isn't affected by that.

I confirmed on my machine; I used my slower laptop to make the difference easier to see:


Time to loop through 260,000 items in a list, incrementing a proc var for each item: 300 seconds

Time to loop through 26,000 items in a list, incrementing a proc var: 3 seconds

Total time to loop through 10 lists of 26,000 items each, incrementing a proc var: 30 seconds.

Note: I have the page file disabled, so any possibility that the large list is getting paged to virtual ram can be dismissed.
In response to MrStonedOne
MrStonedOne wrote:
Time to loop through 260,000 items in a list, incrementing a proc var for each item: 300 seconds

Time to loop through 26,000 items in a list, incrementing a proc var: 3 seconds

Total time to loop through 10 lists of 26,000 items each, incrementing a proc var: 30 seconds.

Do you have cleaner example code that shows this?
No, I used his code.

I'll see about cleaning it up a bit, but the way it works is necessary: using a list of lists gives the same performance as one large list, for some reason.
Are DM lists implemented as vectors/arrays or linked-lists under the hood? I've seen conflicting reports over the years (or maybe it's been changed over time).

If vectors, then the profile results for adding new items would make some sense, as a single large DM list would have to reallocate and copy over significantly more data when it reached capacity. I'm not sure what to make of the accessing rates MSO posted, though.

If linked lists, does for(... in list) keep track of the last node it accessed, or only the last index*? If the latter, does that mean it has to re-traverse the linked list from the head each iteration to find the node corresponding to the current index?


*it has always surprised me how well it handles altering the list while iterating through it
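To make that question concrete: if indexed access had to re-traverse a linked list from the head each time, the first loop below would be O(N^2) overall, while the second could stay O(N) with a cached cursor. This is purely speculative about the engine's internals:

```dm
// Indexed reads: O(N^2) total if L[i] re-traverses from the head.
proc/indexed_read(list/L)
	var/count = 0
	for(var/i = 1 to L.len)
		if(L[i])
			count++
	return count

// for(x in L) could track its position (or snapshot the list),
// staying O(N); that would also explain its tolerance for
// mid-iteration edits.
proc/cursor_read(list/L)
	var/count = 0
	for(var/x in L)
		if(x)
			count++
	return count
```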
*it has always surprised me how well it handles altering the list while iterating through it

I'm pretty sure it creates a copy of the list in memory to iterate through, so modifications to the actual list don't affect the iteration.