ID:151655   Oct 11 2009, 10:43 pm

I'm having some trouble optimizing a bit of code. I'm attempting to loop through a set of tiles and figure out which are blocked and which aren't. Tiles that don't block movement are swapped into their own area, together with all the tiles that can be reached from one another through pathing. I'm trying to do this over a significantly large area, so I need some advice on how to do it effectively. The method I'm using is pretty brute-force. Can anyone point me to a better method, or to a paper or something on the subject? This is what I've got so far:

proc
	divide(var/turf/start)
		var/list/edges = list(start)
		var/list/contents = list()
		var/list/walls = list()
		while(edges.len > 0)
			var/turf/t = edges[1]
			if(!t.density)
				var/d = NORTH
				for(var/count = 0; count <= 3; count++)
					var/turf/a = get_step(t, d)
					if(!edges.Find(a) && !contents.Find(a))
						if(a.density)
							walls += a
						else
							edges += a
					d = turn(d, 90)
				contents += t
			else
				walls += t
			edges -= t
		var/area/a = new/area()
		a.contents.Add(contents)
		a.walls = walls

This method freezes pretty badly when dividing big areas, and isn't really meant to profile an entire map. On a 1000x1000 tile map, it takes over 15 minutes to manage all of this. On a smaller area (50x50), however, it performs miraculously well, dividing the area in somewhere under a tenth of a second. Anybody know of a better way to do this? I've got tons of ideas, but I'm pretty well stumped when it comes to optimizing them.
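For readers outside DM, the approach above is a breadth-first flood fill. A rough Python sketch of the same idea (function and variable names are my own, not from the post; `grid` maps coordinates to a dense/blocked flag):

```python
from collections import deque

def divide(grid, start):
    """Flood-fill from `start`, returning (open_tiles, walls).

    `grid` is a dict mapping (x, y) -> True if the tile is dense (blocked).
    Open tiles reachable from `start` form one region; dense tiles touching
    that region are collected as its walls.
    """
    open_tiles, walls = set(), set()
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if grid.get((x, y), True):      # dense tile: it's a wall, don't expand
            walls.add((x, y))
            continue
        open_tiles.add((x, y))
        # Enqueue the four cardinal neighbors that are on the map and unseen
        for nx, ny in ((x, y + 1), (x + 1, y), (x, y - 1), (x - 1, y)):
            if (nx, ny) in grid and (nx, ny) not in seen:
                seen.add((nx, ny))
                queue.append((nx, ny))
    return open_tiles, walls
```

Unlike the DM snippet, this uses a hash set for the `seen` check rather than scanning the pending and finished lists, which is where the later replies in this thread end up as well.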
 <-> Oct 12 2009, 12:52 am The method is O(n), where n is the number of tiles you're looping through. For an x by y map, it's O(x*y). That's pretty good, so I'm not sure if there's any major algorithmic improvements that can be made. I'm not sure what the backing data structure for a raw /list in DM is - associative lists are done with a binary tree of some sort, so the find(), add(), and remove() operations are probably all O(log n). What you're looking for here is some sort of set data structure - a hashset, for example. find(), add(), remove() in O(1) if it's done properly. Not sure if there are any DM implementations. Any particular reason you're using list.Find() instead of the in operator? Not sure what the relative speeds are, just asking.
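To illustrate Jp's hashset point in a language that has one built in (Python here, purely for illustration): a set membership test hashes straight to the entry, while a list membership test scans elements one by one.

```python
import timeit

items = list(range(10_000))
as_list = items
as_set = set(items)

probe = 9_999   # worst case for the linear scan: last element
list_time = timeit.timeit(lambda: probe in as_list, number=1_000)
set_time = timeit.timeit(lambda: probe in as_set, number=1_000)

# The set lookup is O(1) on average; the list scan is O(n).
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

The gap widens with list size, which is exactly the behavior the benchmarks later in this thread measure for DM's Find() and 'in'.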
 <-> Oct 12 2009, 2:06 am In response to Jp Find has been shown to be anywhere between twice and four times as fast as the in operator. Relevant links ahead: [link]
 <-> Oct 12 2009, 5:24 am In response to Ter13 Ter13 wrote: Find has been shown to be anywhere between twice and four times as fast as the in operator. Relevant links ahead: [link] AJX never published the methodology on which he based his tests, and while Foomer anecdotally claimed to see a 4x speed difference, I can find no good evidence to back that up. Actually the code for both operations is very similar, so I think anything with a 2x speedup or more is highly improbable. Under average circumstances I expect the difference would actually be negligible. Anyway I think the way that "in" is handled is outdated and could be changed at some point, which would render the difference between routines completely moot. My information suggests that the 2-4x figure is wildly inaccurate, but we would need reliable data to determine the actual speed difference between routines under different circumstances. Lummox JR
<-> Oct 12 2009, 7:38 am In response to Lummox JR

Lummox JR wrote: AJX never published the methodology on which he based his tests, and while Foomer anecdotally claimed to see a 4x speed difference, I can find no good evidence to back that up.

I decided to run a small test of my own.

Hypothesis: The 'in' operator is up to 4x slower than Find().

Method: The code may not be the optimal performance comparison, but it seems controlled enough to isolate the single variable in question. It's intended to be run one Test_ verb per instance (so that, between executions, the differences between small and large lists become apparent).

#define DEBUG

var/list
	elem_10
	elem_100
	elem_1000
	elem_10000

proc
	/* List generation: */
	rand_string()	// 6-12 letter string
		. = ""
		for(var/i in 1 to rand(6, 12))
			. += ascii2text(rand(65, 122))

	generate_lists()
		if(!elem_10)
			world << "Generating lists..."
			sleep(0)
			elem_10 = new
			for(var/i in 1 to 10)
				elem_10 += rand_string()
			elem_100 = elem_10.Copy()
			for(var/i in 1 to 90)
				elem_100 += rand_string()
			elem_1000 = elem_100.Copy()
			for(var/i in 1 to 900)
				elem_1000 += rand_string()
			elem_10000 = elem_1000.Copy()
			for(var/i in 1 to 9000)
				elem_10000 += rand_string()
			world << "Generation complete"
			sleep(0)

	/* Execute Find() or 'in' on all list elements: */
	find_all(list/L)
		for(var/elem in L)
			. = L.Find(elem)

	in_all(list/L)
		for(var/elem in L)
			. = (elem in L)

/* Testing verbs: */
mob/verb/Test_10()
	generate_lists()
	for(var/i in 1 to 1000)	// otherwise both profile at 0.000
		src << "Testing 'in'"
		sleep(0)
		in_all(elem_10)
		src << "Finished"
		src << "Testing 'Find()'"
		sleep(0)
		find_all(elem_10)
		src << "Finished"

mob/verb/Test_100()
	generate_lists()
	for(var/i in 1 to 1000)
		src << "Testing 'in'"
		sleep(0)
		in_all(elem_100)
		src << "Finished"
		src << "Testing 'Find()'"
		sleep(0)
		find_all(elem_100)
		src << "Finished"

mob/verb/Test_1000()
	generate_lists()
	src << "Testing 'in'"
	sleep(1)
	in_all(elem_1000)
	src << "Finished"
	src << "Testing 'Find()'"
	sleep(1)
	find_all(elem_1000)
	src << "Finished"

mob/verb/Test_10000()
	generate_lists()
	src << "Testing 'in'"
	sleep(1)
	in_all(elem_10000)
	src << "Finished"
	src << "Testing 'Find()'"
	sleep(1)
	find_all(elem_10000)
	src << "Finished"

mob/verb/list_dump()	// to see our "random" strings and prove a fair amount of uniqueness
	. = "<html>"
	for(var/elem in elem_10000)
		. += elem + "<br>"
	usr << browse(. + "</html>")

Notably, the Test_10() and Test_100() verbs take a little longer to execute because they perform the tests a thousand times (and inherently perform many outputs); otherwise, the results come up as 0.000 due to the small data sets.

Data:

Test_10():

Profile results (total time)
Proc Name                 Self CPU  Total CPU  Real Time      Calls
--------------------     ---------  ---------  ---------  ---------
/mob/verb/Test_10            0.072      0.476      7.217          1
/proc/generate_lists         0.028      0.381      0.401          1
/proc/rand_string            0.353      0.353      0.358      10000
/proc/in_all                 0.011      0.012      0.012       1000
/proc/find_all               0.012      0.012      0.012       1000

Test_100():

Profile results (total time)
Proc Name                 Self CPU  Total CPU  Real Time      Calls
--------------------     ---------  ---------  ---------  ---------
/mob/verb/Test_100           0.083      0.843      2.898          1
/proc/generate_lists         0.026      0.385      0.405          1
/proc/rand_string            0.359      0.359      0.365      10000
/proc/in_all                 0.244      0.244      0.244       1000
/proc/find_all               0.131      0.131      0.131       1000

Test_1000():

Profile results (total time)
Proc Name                 Self CPU  Total CPU  Real Time      Calls
--------------------     ---------  ---------  ---------  ---------
/mob/verb/Test_1000          0.000      0.404      0.436          1
/proc/generate_lists         0.031      0.378      0.388          1
/proc/rand_string            0.347      0.348      0.354      10000
/proc/in_all                 0.021      0.021      0.021          1
/proc/find_all               0.005      0.005      0.005          1

Test_10000():

Profile results (total time)
Proc Name                 Self CPU  Total CPU  Real Time      Calls
--------------------     ---------  ---------  ---------  ---------
/mob/verb/Test_10000         0.001      2.866      2.941          1
/proc/in_all                 2.030      2.030      2.030          1
/proc/find_all               0.455      0.455      0.455          1
/proc/generate_lists         0.028      0.380      0.406          1
/proc/rand_string            0.352      0.354      0.357      10000

Analysis: At the small list size of 10 elements, in_all() and find_all() perform equally well, with matching CPU scores. With 100 elements, in_all() is roughly twice as slow as find_all() on average. With 1000 elements, in_all() performs about 4x slower than find_all().
Finally, at 10,000 elements, in_all() comes in at roughly 4.5x slower than find_all().

Conclusion: The 'in' operator seems to be slower than Find(), especially on large data sets. This test does not, however, show how either method performs for elements at particular positions in the list; whether one is faster for elements near the beginning, middle, or end cannot be concluded from it.
<-> Oct 12 2009, 7:48 am In response to Kuraudo

While this is more methodology than AJX showed, I'm not sure it's a proper test of garden-variety usage. The way "in" and Find() are being tested is exhaustive, rather than reflecting the more common operations done within a real game. It's also worth pointing out that in many cases most lists will be short rather than long, so this test probably only shows the extreme behavior. But it does bear Foomer out for those cases. If it's feasible to drop the method of loop indexing that the "in" test uses and use the same method as Find(), it seems a moderate speed improvement could be eked out here, and perhaps a big improvement for complex games.

Lummox JR
<-> Oct 12 2009, 8:10 am In response to Lummox JR

I agree that the test is not representative of the most common use; an exhaustive approach seemed the most viable way to compare the average search speed of each method. I also wanted to chime in with regard to [link] and [link], specifically:

The need to check for type safety before using Find() though would likely completely negate any potential savings from it.

And:

To compare it equally with Find(), you'd have to include a type check in the process, which is going to be inevitably slower since with 'in' it's automatically done internally. So if in a given scenario you'd want to include a list type-check, 'in' would actually be the faster method to use.

I edited my snippet so that find_all() was written as:

	find_all(list/L)
		for(var/elem in L)
			if(istype(L))	// to accommodate this need
				. = L.Find(elem)

It had only a negligible impact on the results:

Test_10():

Profile results (total time)
Proc Name                 Self CPU  Total CPU  Real Time      Calls
--------------------     ---------  ---------  ---------  ---------
/mob/verb/Test_10            0.121      0.533      0.557          1
/proc/generate_lists         0.033      0.381      0.401          1
/proc/rand_string            0.348      0.349      0.354      10000
/proc/in_all                 0.016      0.016      0.017       1000
/proc/find_all               0.015      0.015      0.015       1000

Test_100():

Profile results (total time)
Proc Name                 Self CPU  Total CPU  Real Time      Calls
--------------------     ---------  ---------  ---------  ---------
/mob/verb/Test_100           0.044      0.837      8.159          1
/proc/generate_lists         0.026      0.380      0.400          1
/proc/rand_string            0.354      0.356      0.360      10000
/proc/in_all                 0.261      0.261      0.261       1000
/proc/find_all               0.152      0.152      0.153       1000

Test_1000():

Profile results (total time)
Proc Name                 Self CPU  Total CPU  Real Time      Calls
--------------------     ---------  ---------  ---------  ---------
/mob/verb/Test_1000          0.000      0.403      0.502          1
/proc/generate_lists         0.026      0.378      0.397          1
/proc/rand_string            0.352      0.354      0.355      10000
/proc/in_all                 0.020      0.020      0.020          1
/proc/find_all               0.005      0.005      0.005          1

Test_10000():

Profile results (total time)
Proc Name                 Self CPU  Total CPU  Real Time      Calls
--------------------     ---------  ---------  ---------  ---------
/mob/verb/Test_10000         0.000      2.873      2.908          1
/proc/in_all                 2.034      2.034      2.034          1
/proc/find_all               0.462      0.462      0.462          1
/proc/generate_lists         0.032      0.377      0.389          1
/proc/rand_string            0.345      0.345      0.351      10000

Even with the istype() check for type safety, the results remain nearly unchanged from the previous tests.
<-> Oct 12 2009, 2:30 pm

proc
	divide2(var/turf/start)
		var/list/status[world.maxx][world.maxy]
		var/list/edges = list(start)
		status[start.x][start.y] = 1
		var/list/contents = list()
		var/list/walls = list()
		while(edges.len > 0)
			var/turf/t = edges[1]
			var/d = NORTH
			for(var/count = 0; count <= 3; count++)
				var/turf/a = get_step(t, d)
				if(a)
					if(!status[a.x][a.y])
						if(a.density)
							walls += a
						else
							edges += a
						status[a.x][a.y] = 1
				d <<= 1
			contents += t
			edges.Cut(1, 2)
		var/area/a = new/area()
		a.contents.Add(contents)
		a.walls = walls

Does 1000x1000 in a bit over ten seconds. Note that it does result in a relatively hefty memory usage of about 25MB at that size.
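The key change in this version is replacing the `edges.Find(a)` / `contents.Find(a)` linear scans with a constant-time lookup in a width-by-height status table. The same trick sketched in Python (names are my own; `dense` is an assumed predicate for blocked tiles, and like divide2 this assumes the start tile is not dense):

```python
from collections import deque

def divide2(width, height, dense, start):
    """Flood fill using an O(1) visited table instead of list searches."""
    status = [[False] * height for _ in range(width)]
    sx, sy = start
    status[sx][sy] = True
    open_tiles, walls = [], []
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        open_tiles.append((x, y))
        for nx, ny in ((x, y + 1), (x, y - 1), (x + 1, y), (x - 1, y)):
            if 0 <= nx < width and 0 <= ny < height and not status[nx][ny]:
                status[nx][ny] = True    # mark before enqueueing, so each
                if dense(nx, ny):        # tile is examined exactly once
                    walls.append((nx, ny))
                else:
                    queue.append((nx, ny))
    return open_tiles, walls
```

Each tile is now touched a constant number of times, which is what brings the whole thing down to genuinely linear time at the cost of the status table's memory.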
<-> Oct 12 2009, 2:45 pm In response to Jp

Jp wrote: The method is O(n), where n is the number of tiles you're looping through.

Wrong. The inner-most loop is that of the Find() proc, which is going to be executed up to 8n times. The search itself is likely log(n), so the function is O(n log n). That is, of course, an overly simplistic analysis, as the search time increases per iteration (the size of the data set that must be searched starts at 1 and grows towards n), but right now I'm already doing homework on this crap and I don't feel like doing more. Regardless: it ain't linear.
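One way to see the superlinear growth Garthor describes is to count operations rather than wall time. An illustrative Python sketch, modeling Find() as a linear scan over the visited list (the worst case; a tree lookup would give n log n instead, but either way the total is not linear):

```python
def scan_count(n):
    """Simulate a flood fill where each of n tiles does one exhaustive
    linear search over the visited list; returns total comparisons."""
    comparisons = 0
    visited = []
    for i in range(n):
        comparisons += len(visited)   # cost of one linear Find()
        visited.append(i)
    return comparisons

# Total work is n*(n-1)/2: doubling n roughly quadruples the comparisons.
print(scan_count(1000), scan_count(2000))
```

This matches the original poster's observation that a 50x50 area is nearly instant while a 1000x1000 map takes minutes: the map is 400x bigger, but the work grows far faster than 400x.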
 <-> Oct 12 2009, 2:55 pm In response to Kuraudo And that does it for me. I'm still sticking with Find except in very particular circumstances.
 <-> Oct 12 2009, 2:56 pm In response to Garthor So it's the find that's the bottleneck?
 <-> Oct 12 2009, 4:27 pm In response to Ter13 Ter13 wrote: So it's the find that's the bottleneck? That's a silly way of saying things. Find() is taking time, as is looping through everything, as is everything else. By substituting Find() with a more efficient method of searching - namely a table lookup - the runtime is reduced. Find() itself, however, was no more or less a bottleneck than looping through every turf. It was simply the easiest to improve upon.
<-> Oct 12 2009, 5:26 pm (Edited on Oct 12 2009, 5:37 pm) In response to Garthor

Is it too late to submit my entry to the mix?

proc
	divide3(var/turf/start)
		set background = 1
		var/list
			edges
			contents = new
			walls
		if(start.density)
			walls = list(start)
		else
			edges = list(start)
			walls = new
		var/status[world.maxx][world.maxy]
		status[start.x][start.y] = 1
		if(edges)	// if start was dense there's nothing to flood
			do
				var/turf
					t = edges[1]
					n = get_step(t, NORTH)
					s = get_step(t, SOUTH)
					e = get_step(t, EAST)
					w = get_step(t, WEST)
				if(n && !status[n.x][n.y])
					if(n.density)
						walls += n
					else
						edges += n
					status[n.x][n.y] = 1
				if(w && !status[w.x][w.y])
					if(w.density)
						walls += w
					else
						edges += w
					status[w.x][w.y] = 1
				if(s && !status[s.x][s.y])
					if(s.density)
						walls += s
					else
						edges += s
					status[s.x][s.y] = 1
				if(e && !status[e.x][e.y])
					if(e.density)
						walls += e
					else
						edges += e
					status[e.x][e.y] = 1
				contents += t
				edges.Cut(1, 2)
			while(edges.len)
		var/area/a = new
		a.contents.Add(contents)
		a.walls = walls

It is a little bit longer; it's based on the original snippet by Ter13 with the same table lookup Garthor used, as well as some other niceties, such as an unrolled loop and a bit of refactoring. For example, the original snippet checks whether t (the first element in edges) is dense. However, there is only one case in which it can ever be dense, and that is when start itself is dense! By moving this test out of the loop and to the beginning of the proc, it is done once instead of, say, 1000x1000 times. That said, I wanted to point out that Garthor's doesn't quite do the same thing as the original (I tried to make mine perform the exact same function; if it doesn't, I made an error). Garthor's differs greatly when the start turf is dense because it removes this test entirely, whereas Ter13 has it in the while() loop and I have it at the beginning of the proc. That said, if you never call divide2() on a dense turf, I don't believe you'll notice a difference, and if you're following that logic then you can also remove the test from my snippet.
I'm actually jealous of Garthor, who said his snippet performs in about 10 seconds. With a blank map (achieved by setting world/maxx and world/maxy directly to 1000), calling via divide2(locate(world.maxx/2, world.maxy/2, 1)), his comes in at around 32 seconds for me! Mine comes in about 2.5 seconds faster than his. I imagine that could translate into something decent on your evil processors!

#define DEBUG

world
	maxx = 1000
	maxy = 1000

area/var/list/walls

mob/verb/Test_2()
	usr << "divide2()...\..."
	divide2(locate(world.maxx/2, world.maxy/2, 1))
	usr << "Complete"

mob/verb/Test_3()
	usr << "divide3()...\..."
	divide3(locate(world.maxx/2, world.maxy/2, 1))
	usr << "Complete"

Profile results (total time)
Proc Name          Self CPU  Total CPU  Real Time      Calls
----------------  ---------  ---------  ---------  ---------
/mob/verb/Test_2      0.028     30.437     32.151          1
/proc/divide2        30.409     30.409     32.126          1
/mob/verb/Test_3      0.025     28.105     29.711          1
/proc/divide3        28.080     28.080     29.686          1
<-> Oct 13 2009, 8:10 am

Like Garthor, I would also start with the data structure you use for the list of dense/nondense objects. No matter how you do it, you always have to loop at least through either a) all turf objects on the map or b) all nondense turf objects connected (directly or via other nondense turf objects) to the initial one, plus all dense turf objects adjacent to them.

The only way you might improve upon looping through everything is to design your own data structure that keeps track of the density of all turf objects. A bitmap would be sufficient: a maxx/8 by maxy bunch of bytes, using 1 bit per object to represent whether or not it's dense. Then, every time you want to loop through all the object densities, this might be faster than using get_step(). Though, if you only "divide the map," as you're doing, once over the course of an instance of the world, setting this up might not be a benefit at all. Even if it were, it would still likely be a small benefit compared to refactoring the data structure that actually holds what's dense/nondense.

So, the big question is "What does BYOND use behind the scenes to represent a list?" Someone else in this thread mentioned a binary tree. If it is just a straight binary tree with no optimizations, then you could make a much better data structure yourself, as plain binary trees are very inefficient. The "fact" that they are log(n) for insert and search is not completely accurate: they are log(n) best case, but n worst case, and if you insert into them in an ordered fashion you will hit that worst case, making the tree no better than a fancy linked list. Look up data structures that guarantee O(log(n)) worst case; there are several that aren't too difficult to implement. That is what you should do in situations like this, where you just need to optimize something that throws tons of data around.
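The bitmap idea can be sketched concretely. A minimal Python version (class and method names are my own): one bit per tile, so each row takes ceil(maxx/8) bytes, and a 1000x1000 map fits in about 125 KB instead of the 25 MB status table mentioned above.

```python
class DensityBitmap:
    """One bit per tile: 1 = dense. Rows are ceil(width/8) bytes each."""

    def __init__(self, width, height):
        self.width = width
        self.row_bytes = (width + 7) // 8
        self.bits = bytearray(self.row_bytes * height)

    def _index(self, x, y):
        # Byte holding tile (x, y), and the mask for its bit within that byte
        return y * self.row_bytes + (x >> 3), 1 << (x & 7)

    def set_dense(self, x, y):
        byte, mask = self._index(x, y)
        self.bits[byte] |= mask

    def is_dense(self, x, y):
        byte, mask = self._index(x, y)
        return bool(self.bits[byte] & mask)

bm = DensityBitmap(1000, 1000)
bm.set_dense(123, 456)
```

As the post says, whether the setup cost pays off depends on how often the map is re-divided; the structure itself is cheap to query once built.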
 <-> Oct 13 2009, 9:34 am In response to Loduwijk Loduwijk wrote: So, the big question is "What does Byond use behind the scenes to represent a list?" Someone else in this thread mentioned a binary tree. If it is just a straight binary tree with no optimizations, then you could make a much better data structure yourself, as binary trees are very ineficient. The "fact" that they are log(n) speed for insert and search is not completely accurate. They are log(n) best-case, but they are n worst case, and if you insert into them in an ordered fashion, then you will encounter the worst case, and it will be no better than a fancy linked list. A binary search tree is O(log(n)) assuming it's balanced. An O(n) case is possible if it is just reduced to a linear list, but a properly-constructed data type is able to rebalance the tree when you add elements to it, maintaining logarithmic search time. It would be like claiming a hash table has O(n) search time because you could have a table where there's a collision for every element. Anyway, nobody claimed lists were binary search trees, simply that a binary search is likely the best-case for searching them. Practically, that's not likely, as it would require it to be ordered, and so a search algorithm would necessarily be O(n).
<-> Oct 13 2009, 10:04 am In response to Garthor

Garthor wrote: A binary search tree is O(log(n)) assuming it's balanced. An O(n) case is possible if it is just reduced to a linear list, but a properly-constructed data type is able to rebalance the tree when you add elements to it, maintaining logarithmic search time.

That was my point, and plain old "normal" binary trees become unbalanced easily. That is why I was suggesting looking into a data structure that guarantees log(n).

It would be like claiming a hash table has O(n) search time because you could have a table where there's a collision for every element.

Not really. It's easy for a tree to become unbalanced if you don't implement some way to balance it, and I have seen online tutorials that cover trees but leave out the balancing, since binary trees are extremely fast and simple to construct if you skip the rebalancing act. Hash tables, on the other hand, are unlikely to suffer that many collisions unless you have a crappy hash function. To do a hash table properly, you have to have a hash function; to do a tree properly, you don't have to make it rebalance, but you should, and that was my point. As a side note, the above assumes the hash table is properly sized for the need; otherwise it can end up being another fancy list, except this time an unsorted one.

Anyway, nobody claimed lists were binary search trees,

My bad, I misread the following quote by Jp: "I'm not sure what the backing data structure for a raw /list in DM is - associative lists are done with a binary tree of some sort, so the find(), add(), and remove() operations are probably all O(log n)"

simply that a binary search is likely the best-case for searching them. Practically, that's not likely, as it would require it to be ordered, and so a search algorithm would necessarily be O(n).

Binary searches, whether in trees or in lists/arrays/whatever, require that the data be ordered to begin with, and don't require anything to be O(n). I think I'm misunderstanding your last statement there.
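The point about ordered insertion degrading a naive (non-rebalancing) binary search tree can be demonstrated directly. An illustrative Python sketch (all names mine):

```python
import random

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Naive, non-rebalancing BST insert; returns the new node's depth."""
    node, depth = root, 1
    while True:
        depth += 1
        side = 'left' if key < node.key else 'right'
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(key))
            return depth
        node = child

def max_depth(keys):
    """Insert keys in order; return the deepest node produced."""
    root = Node(keys[0])
    deepest = 1
    for k in keys[1:]:
        deepest = max(deepest, insert(root, k))
    return deepest

random.seed(1)
ordered = list(range(256))
shuffled = random.sample(ordered, 256)
print(max_depth(ordered), max_depth(shuffled))
```

Sorted inserts produce a chain 256 deep (the "fancy linked list"); randomly ordered inserts of the same keys stay near log(n) depth. Self-balancing structures like red-black or AVL trees guarantee the logarithmic case regardless of insertion order.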
<-> Oct 13 2009, 8:19 pm In response to Lummox JR

Lummox JR wrote: Just to clarify, internally lists are simple arrays, but lists with associated values use a binary tree for lookups, which I believe is a self-balancing red-black tree.

Does this mean that this sort of code:

var/list/somelist = list()
for(var/turf/t)
	somelist["\ref[t]"] = t
var/turf/someturf = get_a_turf_ref_somehow()
if(somelist["\ref[someturf]"]) // Do something

will be of lower algorithmic complexity than this sort of code?

var/list/somelist = list()
for(var/turf/t)
	somelist["\ref[t]"] = t
var/turf/someturf = get_a_turf_ref_somehow()
if(someturf in somelist) // Do something

If so, that's an obvious speedup for anything that checks whether an object is in a list - like pathfinding, or flood fill algorithms, or what-have-you.
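For what it's worth, here's the same comparison sketched in Python, using `id()` as a stand-in for DM's `\ref` and a dict for the associative list (all names mine; in DM the keyed lookup would take the red-black-tree path Lummox JR describes, so O(log n) rather than a dict's O(1), but either way it beats the O(n) scan):

```python
class Turf:
    def __init__(self, x, y):
        self.x, self.y = x, y

all_turfs = [Turf(x, y) for x in range(100) for y in range(100)]
someturf = all_turfs[5000]

# Associative lookup: key the collection on a unique reference,
# analogous to somelist["\ref[t]"] = t in the DM snippet.
by_ref = {id(t): t for t in all_turfs}

found_keyed = by_ref.get(id(someturf)) is someturf   # keyed lookup
found_scan = someturf in all_turfs                   # linear scan, same answer
```

Both checks agree; only the cost differs, which is why keyed membership tests are such an easy win for pathfinding and flood-fill bookkeeping.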