ID:121905
 
I'm sure I've posted about DM's /savefile before, but I figured I'd go ahead and write another post about a feature of DM that so many people overlook when using savefiles.

Let's look at this snippet of code:

var/savefile/F = new
F.cd = ckey
F["Hi"] << "There"


What is this doing? The first line creates a new savefile (with no explicit name, so a temporary file). The second line navigates within the savefile to the `ckey` directory. The third line writes the string "There" to the "Hi" directory in the `ckey` directory of the savefile. Now let's say "Hi" was our variable, and now we want to store an "MP" variable.

F["MP"] << 6


This is how one generally goes about storing separate variables: one directory is dedicated to each variable. This is, by default, how Read()/Write() do it. Why? I can only imagine it's because it's incredibly flexible. If you add or delete a var from the object, none of your other vars are affected.

That's good, from a developer's standpoint. They don't want to have to do that much work to ensure that stuff doesn't break, but what about from a server's standpoint? What are the implications of having a one-to-one directory-to-var relationship in your saving?

Bloat. Without going into too much detail, every directory you create adds another 14 + length(directory name) bytes to your savefile. This may not seem like a lot when you only have a few vars, but what if you have, say, 10 vars? That's an additional 140 bytes plus the lengths of all of the names. 140 bytes? That's not too bad. What if you want to allow each key to have up to 3 characters (seems reasonable)? That's a good 140 * 3 = 420 bytes, again before counting the names. That's not too bad, right?
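To put numbers on that, here is a rough sketch of the arithmetic. EstimateDirOverhead() is a hypothetical helper, not a built-in proc, and it simply applies the 14 + length(directory name) figure quoted above; it does not inspect a real savefile.

```dm
// Hypothetical helper: sums the per-directory overhead described above
// (14 bytes plus the length of each directory name).
proc/EstimateDirOverhead(list/names)
    var/total = 0
    for(var/dirname in names)
        total += 14 + length(dirname)
    return total

// Example: three directories named "HP", "MP" and "ATK"
// cost (14 + 2) + (14 + 2) + (14 + 3) = 49 bytes of overhead.
proc/EstimateExample()
    world.log << EstimateDirOverhead(list("HP", "MP", "ATK"))
```

Multiply the result by the number of characters per key to get a feel for how the overhead grows across a whole savefile.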

Well, let's switch back to the developer's standpoint for a second. Upon processing one header, DS is given a parent ID for the node, which it has to look up before it can even read the data. This is organized in a tree structure, so generally this won't take that long and shouldn't ever be considered a bottleneck. If it is, you might want to revamp your design.

So what's the point of all of this? Well, there are ways to squeeze every last bit of performance and size out of savefiles that you can. This is done by using them in one of the ways that they were actually intended to be used. Let's look at the following snippet:

var/savefile/F = new
F << "Hi"
F << "There"


What would this savefile look like if we used ExportText() on it?

. = "Hi","There"
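If you want to reproduce that dump yourself, something along these lines should work. DumpExample() is a made-up proc name for illustration; it uses the savefile's ExportText() proc and sends the result to world.log. The exact text you get back may differ slightly from the line shown above.

```dm
// Hypothetical proc: writes two values to the root directory of a
// temporary savefile and logs the text form of the whole file.
proc/DumpExample()
    var/savefile/F = new          // no name given, so a temporary file
    F << "Hi"
    F << "There"
    world.log << F.ExportText("/") // export starting at the root directory
```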


Notice that we have stored two strings in one directory ("."). This is because when you open the savefile or change the directory using 'cd', a buffer is created to write the current directory's data, starting at the beginning of the directory. This means that if we were to change directories and then come back to this one and start writing again, the new data would overwrite what was already there. Let's look at the entirety of the snippet from above:

var/savefile/F = new
F.cd = ckey
F["Hi"] << "There"
F["MP"] << 6


The second line creates the new buffer for the `ckey` directory. From there, the third line essentially creates a new buffer for the "Hi" directory within, writes "There" to it, then stores it in "Hi", overwriting any data that was there. The same goes for the "MP" line with the number '6'.

But how can we use this knowledge of DS/DD creating buffers for the current directory and appending data to our advantage? Well, we can avoid creating new directories for every single variable and instead store them sequentially in the savefile. The pitfall of this is that if you make any changes to your saveable vars, you need to add that to your code manually and allow for backwards compatibility, usually by having the first entry be a version number for the format, so you can do your reads based on the version number.

I wrote up a small datum to assist in easing into the buffered IO mentality:

BufferedIO
    var
        filename = ""
        savefile/save = null
    New(fname)
        filename = fname
        if(filename) Open()
    // Reads the next value from the current directory's buffer.
    Read()
        if(!save || save.eof) return null
        var/ret = null
        save >> ret
        return ret
    // Appends a value to the current directory's buffer.
    Write(data)
        if(!save || istype(data, /savefile)) return 0
        save << data
        return 1
    proc
        // Clears the current directory's buffer.
        Clear()
            if(!save) return 0
            save.eof = -1
            return 1
        EOF()
            return (save && save.eof)
        Open(fname = "")
            if(fname) filename = fname
            if(!filename) return 0
            if(save) Close()
            save = new /savefile(filename)
            return 1
        Close()
            if(!save) return 0
            del save
            return 1
        cd(dir)
            if(!save) return 0
            save.cd = dir
            return 1
        File()
            return save


Let's say you want to store HP, MP, and ATK vars:

mob/var
    HP = 40
    MP = 30
    ATK = 20

mob/proc/Save()
    var/BufferedIO/IO = new("player.sav")
    IO.cd("playername")
    IO.Write(0) // Version number
    IO.Write(HP)
    IO.Write(MP)
    IO.Write(ATK)


What does this look like in the savefile?

playername = 0,40,30,20


Now, how would you read this?

mob/proc/Load()
    var/BufferedIO/IO = new("player.sav")
    IO.cd("playername")
    var/version = IO.Read()
    HP = IO.Read()
    MP = IO.Read()
    ATK = IO.Read()


Say you delete the MP var. Remove IO.Write(MP) from Save() and increment the version number. Then, in Load(), check if(version < 1) and call IO.Read() once to skip over the MP value that older savefiles still contain, like so:

mob/var
    HP = 40
    // MP = 30
    ATK = 20

mob/proc/Save()
    var/BufferedIO/IO = new("player.sav")
    IO.cd("playername")
    IO.Write(1) // Version number
    IO.Write(HP)
    // IO.Write(MP)
    IO.Write(ATK)

mob/proc/Load()
    var/BufferedIO/IO = new("player.sav")
    IO.cd("playername")
    var/version = IO.Read()
    HP = IO.Read()
    if(version < 1) IO.Read() // Old format: discard the stored MP value
    // MP = IO.Read()
    ATK = IO.Read()


Most people won't find this a very useful feature, and more won't ever use it, but if you're really interested in squeezing every last bit out of the /savefile format, I'd give it a shot.
I wouldn't want to save things in a way that I had to remember the exact order. I'd rather stick another 400 bytes onto every savefile than have to track down bugs due to the order of loading not matching the order of saving.

It seems like there should be a way to conserve space but still keep a way to access things as key-value pairs. Every time you write a value you also specify a name. The datum keeps a list of the names and also saves that. When you read variables back in, it uses the list of names to reconstruct the key-value pairs. It'd obviously use more space than what you're doing here but it would still avoid much of that 14 byte overhead for each directory.
Most people wouldn't, and that's alright. It's only a solution for those that are comfortable doing so and want to squeeze every last bit of performance out of savefiles. Personally, I'm comfortable doing it because that's how I've always had to do it outside of BYOND.

That would be reasonable and it would avoid the problems mentioned. Plus, you could also set it up to read in a loop (using EOF()) and not have to worry about whether you actually got all of the data or not.
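A rough sketch of that idea, built on the BufferedIO datum from the original post, might look like the following. WritePairs() and ReadPairs() are hypothetical names, and storing each name right before its value is just one possible layout (the suggestion above keeps the names in a separate list); treat it as a starting point rather than a finished design.

```dm
// Hypothetical helper: writes each name followed by its value into the
// current directory, so the read order no longer has to be remembered.
proc/WritePairs(BufferedIO/IO, list/pairs)
    for(var/field in pairs)
        IO.Write(field)        // the name
        IO.Write(pairs[field]) // the value stored under that name

// Hypothetical helper: reads name/value pairs back until the buffer ends.
proc/ReadPairs(BufferedIO/IO)
    var/list/pairs = list()
    if(!IO.File()) return pairs // no open savefile, nothing to read
    while(!IO.EOF())
        var/field = IO.Read()
        pairs[field] = IO.Read()
    return pairs
```

Save() could then call WritePairs(IO, list("HP" = HP, "ATK" = ATK)), and Load() could look values up by name in the list that ReadPairs(IO) returns. The names cost a few bytes each, but they all live in a single directory.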
Do you have some actual data about the read/write performance?
Amsel wrote:
Do you have some actual data about the read/write performance?

Negative. I imagine it's mostly negligible; the main place you'll see an effect is in file size when you have a LOT of data that you're storing. The internal buffering and file I/O itself is going to be fast enough that you won't notice any sort of difference there.