ID:151655
 
I'm having some trouble optimizing a bit of code. I'm attempting to loop through a set of tiles, and figure out which tiles are blocked and which aren't. Tiles that don't block the movement of the loop are swapped into their own area with all those tiles that can be reached from one another trough pathing.

I'm trying to do this over a significantly large area, and as such, need some advice on how to do this effectively.

The method I'm using is pretty brute-force. Can anyone point me to a better method, or point me to a paper or something on the subject?

This is what I've got so far:

proc
divide(var/turf/start)
var/list/edges = list(start)
var/list/contents = list()
var/list/walls = list()
while(edges.len>0)
var/turf/t = edges[1]
if(!t.density)
var/d = NORTH
for(var/count=0;count<=3;count++)
var/turf/a = get_step(t,d)
if(!edges.Find(a)&&!contents.Find(a))
if(a.density)
walls += a
else
edges += a
d = turn(d,90)
contents += t
else
walls += t
edges -= t
var/area/a = new/area()
a.contents.Add(contents)
a.walls = walls


This method freezes pretty badly when diving big areas, and isn't really meant to profile an entire map. on a 1000x1000 tile scene, it takes over 15 minutes to manage all of this.

However, on a smaller area (50x50), it performs miraculously well, dividing an area in somewhere under a tenth of a second.

Anybody know of a better way to do this? I've got tons of ideas, but I'm pretty well stumped when it comes to optimizing them.
The method is O(n), where n is the number of tiles you're looping through. For an x by y map, it's O(x*y). That's pretty good, so I'm not sure if there's any major algorithmic improvements that can be made.

I'm not sure what the backing data structure for a raw /list in DM is - associative lists are done with a binary tree of some sort, so the find(), add(), and remove() operations are probably all O(log n).

What you're looking for here is some sort of set data structure - a hashset, for example. find(), add(), remove() in O(1) if it's done properly. Not sure if there are any DM implementations.

Any particular reason you're using list.Find() instead of the in operator? Not sure what the relative speeds are, just asking.
In response to Jp (#1)
Find has been shown to be anywhere between twice and four times as fast as the in operator.

Relevant links ahead: [link]
In response to Ter13 (#2)
Ter13 wrote:
Find has been shown to be anywhere between twice and four times as fast as the in operator.

Relevant links ahead: [link]

AJX never published the methodology on which he based his tests, and while Foomer anecdotally claimed to see a 4x speed difference, I can find no good evidence to back that up. Actually the code for both operations is very similar, so I think anything with a 2x speedup or more is highly improbable. Under average circumstances I expect the difference would actually be negligible. Anyway I think the way that "in" is handled is outdated and could be changed at some point, which would render the difference between routines completely moot.

My information suggests that the 2-4x figure is wildly inaccurate, but we would need reliable data to determine the actual speed difference between routines under different circumstances.

Lummox JR
In response to Lummox JR (#3)
Lummox JR wrote:
AJX never published the methodology on which he based his tests, and while Foomer anecdotally claimed to see a 4x speed difference, I can find no good evidence to back that up.

I decided to form a small test of my own.

Hypothesis: The 'in' operator is up to 4x slower than Find().

Method: The code may not be the optimal performance comparison, but it seems to me to be controlled enough to isolate the single variable in question here. It's intended to be run a single Test_ verb per instance (so between executions, the differences in small-to-large lists could become apparent).
#define DEBUG

var/list
elem_10
elem_100
elem_1000
elem_10000

proc
/*
List generation:
*/

rand_string() // 6-12 letter string
. = ""
for(var/i in 1 to rand(6, 12))
. += ascii2text(rand(65, 122))

generate_lists()
if(!elem_10)
world << "Generating lists..."
sleep(0)
elem_10 = new
for(var/i in 1 to 10)
elem_10 += rand_string()

elem_100 = elem_10.Copy()
for(var/i in 1 to 90)
elem_100 += rand_string()

elem_1000 = elem_100.Copy()
for(var/i in 1 to 900)
elem_1000 += rand_string()

elem_10000 = elem_1000.Copy()
for(var/i in 1 to 9000)
elem_10000 += rand_string()

world << "Generation complete"
sleep(0)

/*
Execute Find() or 'in' on all list elements
*/


find_all(list/L)
for(var/elem in L)
. = L.Find(elem)

in_all(list/L)
for(var/elem in L)
. = (elem in L)

/*
Testing verbs:
*/


mob/verb/Test_10()
generate_lists()

for(var/i in 1 to 1000) // otherwise both profile at 0.000
src << "Testing 'in'"
sleep(0)
in_all(elem_10)
src << "Finished"

src << "Testing 'Find()'"
sleep(0)
find_all(elem_10)
src << "Finished"

mob/verb/Test_100()
generate_lists()

for(var/i in 1 to 1000)
src << "Testing 'in'"
sleep(0)
in_all(elem_100)
src << "Finished"

src << "Testing 'Find()'"
sleep(0)
find_all(elem_100)
src << "Finished"

mob/verb/Test_1000()
generate_lists()

src << "Testing 'in'"
sleep(1)
in_all(elem_1000)
src << "Finished"

src << "Testing 'Find()'"
sleep(1)
find_all(elem_1000)
src << "Finished"

mob/verb/Test_10000()
generate_lists()

src << "Testing 'in'"
sleep(1)
in_all(elem_10000)
src << "Finished"

src << "Testing 'Find()'"
sleep(1)
find_all(elem_10000)
src << "Finished"

mob/verb/list_dump() // to see our "random" strings and prove a fair amount of uniqueness
. = "<pre>"
for(var/elem in elem_10000)
. += elem + "<br>"
usr << browse(. + "</pre>")

Notably, the Test_10() and Test_100() verbs will take a little longer to execute because they perform the tests a thousand times (and inherently perform many outputs). Otherwise, the results just came up at 0.000 due to the small data set.

Data:

Test_10():
                      Profile results (total time)
Proc Name                Self CPU    Total CPU    Real Time        Calls
--------------------    ---------    ---------    ---------    ---------
/mob/verb/Test_10           0.072        0.476        7.217            1
/proc/generate_lists        0.028        0.381        0.401            1
/proc/rand_string           0.353        0.353        0.358        10000
/proc/in_all                0.011        0.012        0.012         1000
/proc/find_all              0.012        0.012        0.012         1000


Test_100():
                      Profile results (total time)
Proc Name                Self CPU    Total CPU    Real Time        Calls
--------------------    ---------    ---------    ---------    ---------
/mob/verb/Test_100          0.083        0.843        2.898            1
/proc/generate_lists        0.026        0.385        0.405            1
/proc/rand_string           0.359        0.359        0.365        10000
/proc/in_all                0.244        0.244        0.244         1000
/proc/find_all              0.131        0.131        0.131         1000


Test_1000():
                      Profile results (total time)
Proc Name                Self CPU    Total CPU    Real Time        Calls
--------------------    ---------    ---------    ---------    ---------
/mob/verb/Test_1000         0.000        0.404        0.436            1
/proc/generate_lists        0.031        0.378        0.388            1
/proc/rand_string           0.347        0.348        0.354        10000
/proc/in_all                0.021        0.021        0.021            1
/proc/find_all              0.005        0.005        0.005            1


Test_10000():
                      Profile results (total time)
Proc Name                Self CPU    Total CPU    Real Time        Calls
--------------------    ---------    ---------    ---------    ---------
/mob/verb/Test_10000        0.001        2.866        2.941            1
/proc/in_all                2.030        2.030        2.030            1
/proc/find_all              0.455        0.455        0.455            1
/proc/generate_lists        0.028        0.380        0.406            1
/proc/rand_string           0.352        0.354        0.357        10000


Analysis:

At the small list size of 10 elements, in_all() and find_all() appear to perform equally well with matching CPU scores.

With 100 elements, in_all() appears to be roughly twice as slow as find_all() on average.

Using a list of 1000 elements, in_all() appears to perform 4x slower than find_all().

Finally, at 10,000 elements, in_all() rings in at ~4.5x slower than find_all().

Conclusion: The 'in' operator seems to be slower than Find(), especially in large data sets. This test does not, however, demonstrate the specific performance advantages of individual locations in the list; that is, which performs better for elements near the beginning or near the end of the list---or directly in the middle---cannot be concluded from this.
In response to Kuraudo (#4)
While this is more methodology than AJX showed, I'm not sure it shows a proper test of garden-variety usage. The way "in" and Find() are being tested, it's testing exhaustively rather than doing some of the more common operations done within a real game. And it's worth pointing out that for many cases most lists will be shorter rather than longer, so this test probably only shows the extreme behavior. But it does bear Foomer out for those cases.

If it's feasible to drop the method of loop indexing that the "in" test uses and use the same method as in Find(), it seems that a moderate speed improvement can be eked out here--perhaps a big speed improvement for complex games.

Lummox JR
In response to Lummox JR (#5)
I agree that the test is not representative of the most common use; an exhaustive approach seemed the most viable in comparing the average search speed of each method.

I wanted to chime in with regard to [link] and [link]---specifically:
The need to check for type safety before using Find() though would likely completely negate any potential savings from it.

And:
To compare it equally with Find(), you'd have to include a type check in the process, which is going to be inevitably slower since with 'in' it's automatically done internally. So if in a given scenario you'd want to include a list type-check, 'in' would actually be the faster method to use.

I edited my snippet such that find_all() was written as such:
    find_all(list/L)
for(var/elem in L)
if(istype(L)) // To accomodate this need
. = L.Find(elem)


It only had a negligible impact on the results:

Test_10():
                      Profile results (total time)
Proc Name                Self CPU    Total CPU    Real Time        Calls
--------------------    ---------    ---------    ---------    ---------
/mob/verb/Test_10           0.121        0.533        0.557            1
/proc/generate_lists        0.033        0.381        0.401            1
/proc/rand_string           0.348        0.349        0.354        10000
/proc/in_all                0.016        0.016        0.017         1000
/proc/find_all              0.015        0.015        0.015         1000


Test_100():
                      Profile results (total time)
Proc Name                Self CPU    Total CPU    Real Time        Calls
--------------------    ---------    ---------    ---------    ---------
/mob/verb/Test_100          0.044        0.837        8.159            1
/proc/generate_lists        0.026        0.380        0.400            1
/proc/rand_string           0.354        0.356        0.360        10000
/proc/in_all                0.261        0.261        0.261         1000
/proc/find_all              0.152        0.152        0.153         1000


Test_1000():
                      Profile results (total time)
Proc Name                Self CPU    Total CPU    Real Time        Calls
--------------------    ---------    ---------    ---------    ---------
/mob/verb/Test_1000         0.000        0.403        0.502            1
/proc/generate_lists        0.026        0.378        0.397            1
/proc/rand_string           0.352        0.354        0.355        10000
/proc/in_all                0.020        0.020        0.020            1
/proc/find_all              0.005        0.005        0.005            1


Test_10000():
                      Profile results (total time)
Proc Name                Self CPU    Total CPU    Real Time        Calls
--------------------    ---------    ---------    ---------    ---------
/mob/verb/Test_10000        0.000        2.873        2.908            1
/proc/in_all                2.034        2.034        2.034            1
/proc/find_all              0.462        0.462        0.462            1
/proc/generate_lists        0.032        0.377        0.389            1
/proc/rand_string           0.345        0.345        0.351        10000


Even with the istype() check for type-safety, the results remain nearly unchanged from the previous tests.
proc
divide2(var/turf/start)
var/list/status[world.maxx][world.maxy]
var/list/edges = list(start)
status[start.x][start.y] = 1
var/list/contents = list()
var/list/walls = list()
while(edges.len>0)
var/turf/t = edges[1]
var/d = NORTH
for(var/count=0;count<=3;count++)
var/turf/a = get_step(t,d)
if(a)
if(!status[a.x][a.y])
if(a.density)
walls += a
else
edges += a
status[a.x][a.y] = 1
d <<= 1
contents += t
edges.Cut(1,2)
var/area/a = new/area()
a.contents.Add(contents)
a.walls = walls


Does 1000x1000 in a bit over ten seconds. Note that it does result in a relatively hefty memory usage of about 25MB at that size.
In response to Jp (#1)
Jp wrote:
The method is O(n), where n is the number of tiles you're looping through.

Wrong. The inner-most loop is that of the Find() proc, which is going to be executing 8n times. The search function itself is likely log(n), so the function is O(nlog(n)). That is, of course, an overly simplistic analysis, as the search time increases per iteration (rather, the size of the data set that must be searched starts at 1 and increases towards n), but right now I'm already doing homework on this crap and I don't feel like doing more. Regardless: it ain't linear.
In response to Kuraudo (#6)
And that does it for me. I'm still sticking with Find except in very particular circumstances.
In response to Garthor (#7)
So it's the find that's the bottleneck?
In response to Ter13 (#10)
Ter13 wrote:
So it's the find that's the bottleneck?

That's a silly way of saying things. Find() is taking time, as is looping through everything, as is everything else. By substituting Find() with a more efficient method of searching - namely a table lookup - the runtime is reduced. Find() itself, however, was no more or less a bottleneck than looping through every turf. It was simply the easiest to improve upon.
In response to Garthor (#7)
Is it too late to submit my entry to the mix?
proc
divide3(var/turf/start)
set background = 1
var/list
edges
contents = new
walls

if(start.density)
walls = list(start)

else
edges = list(start)
walls = new

var/status[world.maxx][world.maxy]
status[start.x][start.y] = 1

do
var/turf
t = edges[1]
n = get_step(t, NORTH)
s = get_step(t, SOUTH)
e = get_step(t, EAST)
w = get_step(t, WEST)

if(n && !status[n.x][n.y])
if(n.density)
walls += n
else
edges += n
status[n.x][n.y] = 1
if(w && !status[w.x][w.y])
if(w.density)
walls += w
else
edges += w
status[w.x][w.y] = 1
if(s && !status[s.x][s.y])
if(s.density)
walls += s
else
edges += s
status[s.x][s.y] = 1
if(e && !status[e.x][e.y])
if(e.density)
walls += e
else
edges += e
status[e.x][e.y] = 1

contents += t
edges.Cut(1, 2)
while(edges.len)
var/area/a = new
a.contents.Add(contents)
a.walls = walls

It is a little bit longer; it's based on the original snippet by Ter13 with the same table lookup Garthor used, as well some other niceties, such as an unrolled loop and a bit of refactoring. For example, the original snippet checks if t (the first element in edges) is dense. However, there is only one instance where this is ever dense, and that is if start is dense! So by moving this test out of the loop and to the beginning of the proc, the test can be done once instead of, say, 1000x1000 times.

That said, I wanted to point out that Garthor's doesn't quite do the same thing as the original (I tried to make mine perform the exact same function; if it doesn't, I made an error). Garthor's differs greatly if the start turf is dense because it removes this test entirely, whereas Ter13 has it in the while() loop and I have it at the beginning of the proc. That said, if you never call divide2() on a dense turf I don't believe you'll notice a difference, and if you're following that logic then you can also remove the test from my snippet.

I'm actually jealous of Garthor, who said his snippet performs in about 10 seconds. With a blank map (achieved by setting world/maxx and world/maxy directly to 1000), and calling via divide2(locate(world.maxx/2, world.maxy/2, 1)), his comes in at around 32 seconds for me! Mine comes in about 2.5 seconds faster than his. I imagine that could translate into something decent on your evil processors!

#define DEBUG

world
maxx = 1000
maxy = 1000

area/var/list/walls

mob/verb/Test_2()
usr << "divide2()...\..."
divide2(locate(world.maxx/2, world.maxy/2, 1))
usr << "Complete"

mob/verb/Test_3()
usr << "divide3()...\..."
divide3(locate(world.maxx/2, world.maxy/2, 1))
usr << "Complete"

                    Profile results (total time)
Proc Name            Self CPU    Total CPU    Real Time        Calls
----------------    ---------    ---------    ---------    ---------
/mob/verb/Test_2        0.028       30.437       32.151            1
/proc/divide2          30.409       30.409       32.126            1
/mob/verb/Test_3        0.025       28.105       29.711            1
/proc/divide3          28.080       28.080       29.686            1
Like Garthor, I would also start with the data structure you use for the list of dense/nondense objects. No matter how you do it, you always have to loop at least through either a) all turf objects on the map or b) all nondense turf objects connected (either directly or via other nondense turf objects) to the initial one and all dense turf objects adjacent to them.

The only way you might be able to improve upon looping through everything is to design your own data structure that keeps track of the density information of all turf objects. A bitmap would be sufficient; a maxx/8 by maxy bunch of bytes, use 1 bit per object to represent whether or not it's dense. Then, every time you want to loop through all the object densities, this might be faster than using get_step(). Though, if you only "divide the map," as you're doing, once over the course of an instance of the world, setting this up might not be a benefit at all. Even if it were, though, it would still likely be a small benefit compared to refactoring the data structure that is actually a list of what's dense/nondense.

So, the big question is "What does Byond use behind the scenes to represent a list?" Someone else in this thread mentioned a binary tree. If it is just a straight binary tree with no optimizations, then you could make a much better data structure yourself, as binary trees are very ineficient. The "fact" that they are log(n) speed for insert and search is not completely accurate. They are log(n) best-case, but they are n worst case, and if you insert into them in an ordered fashion, then you will encounter the worst case, and it will be no better than a fancy linked list.

Look up some data structures that guarentee O(log(n)) worst case; there are several of them that aren't too difficult to implement. That is what you should do in situations like this, where you just need to optimize something that throws tons of data around.
In response to Loduwijk (#13)
Loduwijk wrote:
So, the big question is "What does Byond use behind the scenes to represent a list?" Someone else in this thread mentioned a binary tree. If it is just a straight binary tree with no optimizations, then you could make a much better data structure yourself, as binary trees are very ineficient. The "fact" that they are log(n) speed for insert and search is not completely accurate. They are log(n) best-case, but they are n worst case, and if you insert into them in an ordered fashion, then you will encounter the worst case, and it will be no better than a fancy linked list.

A binary search tree is O(log(n)) assuming it's balanced. An O(n) case is possible if it is just reduced to a linear list, but a properly-constructed data type is able to rebalance the tree when you add elements to it, maintaining logarithmic search time. It would be like claiming a hash table has O(n) search time because you could have a table where there's a collision for every element.

Anyway, nobody claimed lists were binary search trees, simply that a binary search is likely the best-case for searching them. Practically, that's not likely, as it would require it to be ordered, and so a search algorithm would necessarily be O(n).
In response to Garthor (#14)
Garthor wrote:
A binary search tree is O(log(n)) assuming it's balanced. An O(n) case is possible if it is just reduced to a linear list, but a properly-constructed data type is able to rebalance the tree when you add elements to it, maintaining logarithmic search time.

That was my point, and plain old "normal" binary trees become unbalanced easily. That is why I was suggesting looking into a data structure that guarentees log(n).

It would be like claiming a hash table has O(n) search time because you could have a table where there's a collision for every element.

Not really. It's easy for a tree to become unbalanced if you don't implement some way to balance it, and I have seen tutorials online that cover trees leave out the balancing part since binary trees are extremely fast and simple to construct if you leave out the rebalancing act.

Hash tables, on the other hand, are unlikely to suffer the number of collisions you mention unless you have a crappy hash function. To do a hash table properly, you have to have a hash function; to do a tree properly, you don't have to make it rebalance, but you should, and that's what my point was supposed to be.

As a side note, the above assumes the hash table is properly sized for the need, of course, otherwise it can end up being another fancy list, except this time an unsorted one.

Anyway, nobody claimed lists were binary search trees,

My bad, I misread the following quote by Jp. "I'm not sure what the backing data structure for a raw /list in DM is - associative lists are done with a binary tree of some sort, so the find(), add(), and remove() operations are probably all O(log n)"

simply that a binary search is likely the best-case for searching them. Practically, that's not likely, as it would require it to be ordered, and so a search algorithm would necessarily be O(n).

Binary searches, whether they be in trees or lists/arrays/whatever requires that the data be ordered to begin with, and doesn't require anything to be O(n). I think I'm misunderstanding your last statement there.
In response to Loduwijk (#15)
Loduwijk wrote:
Anyway, nobody claimed lists were binary search trees,

My bad, I misread the following quote by Jp. "I'm not sure what the backing data structure for a raw /list in DM is - associative lists are done with a binary tree of some sort, so the find(), add(), and remove() operations are probably all O(log n)"

Just to clarify, internally lists are simple arrays, but lists with associated values use a binary tree for lookups, which I believe is a self-balancing red-black tree.

The Find() and "in" tests all use the array, not the binary tree, so the only difference between them would have to do with the overhead from the function calls involved. Add() and Remove() are about the same, though they do perform some checks on the associative list.

Associative assignment is somewhat of a mix. If the "key" is found in the binary tree, the association is simply a straightforward matter of searching the binary tree and making a replacement. If it is not found in the binary tree, the array has to be searched for a match. If it's found in the array, the association is added to the binary tree; if not, the key element is added to the end of the array and the association is added to the tree.

You can get some of the speed benefits of advanced data structures by using lists judiciously, like for instance if you know how to keep a binary tree in balance you can simply have a packed list where index i*2 is the left branch and i*2+1 is the right branch. In SotS II, my pathfinding algorithm uses this concept to implement a simple heap.

Lummox JR
In response to Lummox JR (#16)
Lummox JR wrote:
Associative assignment is somewhat of a mix. If the "key" is found in the binary tree, the association is simply a straightforward matter of searching the binary tree and making a replacement. If it is not found in the binary tree, the array has to be searched for a match. If it's found in the array, the association is added to the binary tree; if not, the key element is added to the end of the array and the association is added to the tree.

Let me see if I understand you correctly. A list object is backed by both a straightforward array and an r/b tree (possibly implemented in a second array?). Any time you add anything to a /list instance, it is added to the end of the array, and, if it has an associated value, it is also added to the tree that maps associations. Either way, all keys and non-associated elements are in the array, thereby duplicating all map keys.

Is this correct? If so, is the array part kept around just to preserve insertion order? Also, do you know how array size is handled? Does it make a moderately sized array and create new arrays when old arrays are filled?

You can get some of the speed benefits of advanced data structures by using lists judiciously, like for instance if you know how to keep a binary tree in balance you can simply have a packed list where index i*2 is the left branch and i*2+1 is the right branch.

Do you mean us, on the Byond-user side, sizing the list appropriately and manually inserting elements at specific indices, using it as a tree? If so, are you implying that Byond will use the internal tree for all of this, or are you suggesting also doing search/deletion/traversal manually on our side? If the latter, I think that is what Garthor and I were getting at; if the former, I must be misunderstanding your explanation of how the list is working behind the scenes.

Either way, it is good to know that associations utilize a red-black tree. Why don't you have the tree keep track of non-associated data as well to increase speed for all searches, regardless of whether the data is mapped or not? I have more that I would say on the topic, but the more I type the more my statements rest upon my understanding of your post here and on my assumptions based on it. If I am understanding you correctly, though, it sounds like it could still use some improvement. I'm not complaining though, as I now know that /list at least works better than I had previously though.
I already posted my optimized version of your algorithm, but I wanted to suggest that you could likely modify a flood-fill algorithm to accomplish exactly what you're doing and achieve better results. The modifications would likely specifically be:
  • Instead of checking for differing color values, you would be checking if the tile is dense or not.
  • You would be storing non-dense tiles into a list instead of coloring them.
  • You would store the dense tiles (the borders) into a list instead of ignoring them.

I would think a modified non-recursive, stack-based flood fill algorithm should provide a decent speed boost. The scanline-based flood fills seem particulary effective.
In response to Lummox JR (#16)
Lummox JR wrote:
Just to clarify, internally lists are simple arrays, but lists with associated values use a binary tree for lookups, which I believe is a self-balancing red-black tree.

Does this mean that this sort of code:
var/list/somelist = list()
for(var/turf/t) somelist["\ref[t]"] = t
var/turf/someturf = get_a_turf_ref_somehow()
if(somelist["\ref[t]"])
//Do something


will be of lower algorithmic complexity than this sort of code?

var/list/somelist = list()
for(var/turf/t) somelist["\ref[t]"] = t
var/turf/someturf = get_a_turf_ref_somehow()
if(someturf in somelist)
//Do something


If so, that's an obvious speedup for anything that's checking to see if an object is in a list - like pathfinding, or flood fill algorithms, or what-have-you.
Page: 1 2