(See the best response by Ter13.)

Problem description:

So, I have a massive game with roughly 20 z-levels at 500x500 each. Our save and load times are extremely slow, and we are having item duplication issues (specifically original map objects, but sometimes player-made objects if the server crashes). Currently our map has 19,724 turfs and 34,745 objects saved; this is 30 days after a player wipe.

Are there any known techniques for keeping saves efficient on large game maps?
Don't save things in one giant pool. It forces a large-scale save operation any time you need to save, and if that operation fails you'll end up with corrupt data.

Narrow down the pool as much as possible, even as far as saving things tile-by-tile, which lets you save only when a tile changes and save only that tile. You'll have 19,724 small files, but you'll have far less chance of large-scale corruption and failure, and you'll be able to save things right away instead of saving on a timer or at world destruction.
Could you give me an example on how you would save by tile?
We currently convert the data we need to save to text and then save it as a list.
How are you saving the smaller blocks? As part of an area, as a unique text string, or something else? What file type are you sending to the savefile command?

Sorry for the message blast, I keep thinking of questions
You could just keep a list of stuff that needs to be saved per-turf, then save that list using BYOND's generic savefile stuff using the coordinates of the turf as a file name. Then just save the turf when said list updates.

Keep in mind that you shouldn't keep the list initialized on every turf, or you'll run out of memory really quickly. Only initialize it when it has data.

If the objects don't contain any unique data like their own contents list or dynamic icon data you could serialize the list into something that's not dumping an entire object into the file, but that's a more complicated question.

As for loading, you'd just loop over the files and load the lists and objects within them at either the game's start or as players need to see the turfs.
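Here's a rough sketch of that per-turf idea, in case it helps. Everything named here (save_list, MarkForSave, UnmarkForSave, SaveTile, LoadTile, and the "tiles/" directory) is made up for illustration, and it assumes the objects are fine to dump whole into a savefile:

```dm
// Hypothetical sketch: one small savefile per turf, named after its coordinates.
turf
	var/tmp/list/save_list // stays null until this turf has something to save

	proc/MarkForSave(obj/O)
		if(!save_list)
			save_list = list() // only initialize when there is data
		save_list |= O
		SaveTile()

	proc/UnmarkForSave(obj/O)
		if(!save_list) return
		save_list -= O
		if(!save_list.len)
			save_list = null // drop the empty list to free memory
		SaveTile()

	proc/SaveTile()
		var/path = "tiles/[x]_[y]_[z].sav"
		if(!save_list)
			fdel(path) // nothing left on this tile, so remove its file
			return
		var/savefile/F = new(path)
		F["objects"] << save_list

	proc/LoadTile()
		var/path = "tiles/[x]_[y]_[z].sav"
		if(!fexists(path)) return
		var/savefile/F = new(path)
		F["objects"] >> save_list
		for(var/obj/O in save_list)
			O.loc = src // put the loaded objects back on this turf
```

You'd call MarkForSave() when a player drops or builds something on the turf and UnmarkForSave() when it gets picked up or destroyed, then call LoadTile() at startup or when a player first gets near the turf.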
You probably should not be saving turfs. Objects, you might be able to get away with, but you should absolutely not be saving turfs. A format is really difficult to work out for you without knowing what kind of data your turfs actually carry.

What you should be saving instead at an absolute minimum are tile ids that correspond to each unique tile type. Numeric IDs are probably best.

I go over this approach in the following thread and offer comparisons with an alternative system that was... admittedly not good. 1450705?page=2#comment8726752

Now, as for saving players and objects in the world, players should not be saved with the world. This is part of what causes duplication. Give each player a unique id and save them separately from the world on login/logout.

The other risk of duplication comes from players doing things like picking something up and then the world crashing before that object's zone is saved. This is much harder to avoid, but do not give in to the urge to completely prevent it by making your chunks save too often. Better to have a little duplicate data than to have the world bogged down or players losing data.

To optimize saving, divide your world into chunks. These chunks should track changes. Only trigger a save on chunks that have unsaved changes. Simply increment (++) the chunk's change counter variable to mark it changed. You need to add this behavior for any objects being destroyed or spawned, any objects being moved, and any turfs being changed.
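A minimal sketch of that change tracking might look like the following. All of the names (CHUNK_SIZE, /chunk, chunks, GetChunk, MarkTurfChanged, /obj/savable) are hypothetical, and the chunk size is just a placeholder:

```dm
// Hypothetical sketch of per-chunk change counters.
#define CHUNK_SIZE 64

chunk
	var/changes = 0 // unsaved changes since the last save; 0 means clean

var/list/chunks = list() // "cx,cy,z" -> /chunk

proc/GetChunk(turf/T)
	if(!T) return null
	var/key = "[round((T.x - 1) / CHUNK_SIZE)],[round((T.y - 1) / CHUNK_SIZE)],[T.z]"
	var/chunk/C = chunks[key]
	if(!C)
		C = new
		chunks[key] = C
	return C

proc/MarkTurfChanged(atom/A)
	if(!isturf(A)) return // only locations on the map belong to a chunk
	var/chunk/C = GetChunk(A)
	++C.changes

obj/savable
	New()
		..()
		MarkTurfChanged(loc) // spawned

	Del()
		MarkTurfChanged(loc) // destroyed
		..()

	Move()
		var/atom/old_loc = loc
		. = ..()
		if(. && loc != old_loc)
			MarkTurfChanged(old_loc) // left this chunk (or tile)
			MarkTurfChanged(loc) // arrived in this one
```

Anything that swaps out a turf would call MarkTurfChanged() on that turf as well.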

Containers can be saved separately from chunks to minimize the amount of data being saved. If you have on-map containers where players move items to/from, consider tying those to a separate saving system. Abstract the container's inventory from the container object itself, so that the container can physically be moved without needing to save or load all of its content. Load container contents only when necessary, such as when destroyed by an enemy, or when accessed the first time. Storing containers in a sort of faction/owner-specific file would be a good way to handle this in a sandbox game, that way you only need to store faction/unique id information to load the container contents on demand.
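As a rough illustration of keeping the inventory separate from the container object, something like this could work. The names (container_id, items, LoadItems, SaveItems, and the "factions/" file layout) are all assumptions for the example:

```dm
// Hypothetical sketch: container inventory lives in the owner's faction file,
// not in the container object or the chunk it sits in.
obj/container
	var/faction = 0 // faction unique id
	var/container_id = 0 // unique id of this container within the faction file
	var/tmp/list/items // null until first access; tmp so it is never saved with the object

	proc/LoadItems()
		if(items) return // already loaded
		var/savefile/F = new("factions/[faction].sav")
		F["containers/[container_id]"] >> items
		if(!items) items = list()

	proc/SaveItems()
		if(!items) return // never loaded, so nothing can have changed
		var/savefile/F = new("factions/[faction].sav")
		F["containers/[container_id]"] << items

	verb/Open()
		set src in oview(1)
		LoadItems() // contents are only read the first time someone opens it
		usr << "[src] holds [items.len] item\s."
```

Because the chunk only has to store the container's type, faction, and container_id, moving the container around never touches its contents.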

Only save chunks when they are no longer in use. This means that once a chunk has zero players in it (or a certain amount of time since the last save has occurred, like 2+ hours), force that chunk to save if the chunk has marked changes. Then set the change counter to 0 to unmark it.

In general, saving your world in smaller chunks will slow down your saving, but it will give you the ability to perform a rolling global save that can be used to distribute the CPU load over a longer period of time.
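The rolling save could be sketched roughly like this. The names (MAX_UNSAVED_TIME, players, last_save, TrySave, RollingSave) are hypothetical, Save() is a stub for whatever your chunk format ends up being, and the sleep values are placeholders to tune:

```dm
// Hypothetical sketch: save idle or long-unsaved chunks a few at a time.
#define MAX_UNSAVED_TIME 72000 // 2 hours, in world.time units of 1/10 second

chunk
	var/changes = 0 // unsaved change counter
	var/players = 0 // players currently inside this chunk
	var/last_save = 0 // world.time of the last save

	proc/Save()
		return // write this chunk's turfs and objects here

	proc/TrySave()
		if(!changes) return 0 // nothing to save
		if(players > 0 && (world.time - last_save) < MAX_UNSAVED_TIME)
			return 0 // still in use and saved recently enough
		Save()
		changes = 0 // unmark the chunk
		last_save = world.time
		return 1

var/list/chunks = list() // key -> /chunk

proc/RollingSave()
	set background = 1
	while(1)
		for(var/key in chunks)
			var/chunk/C = chunks[key]
			if(C.TrySave())
				sleep(1) // spread the work across ticks
		sleep(10)

world/New()
	..()
	spawn() RollingSave()
```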

I noticed that, for my purposes, the sweet spot for raw speed was 256x256 tile chunks. Bigger than that and you run into big performance hits; smaller than that and you wind up losing out on the softcode side a bit.
Working on the same project. We would need to keep track of each building's owner, HP, and current upgrade level.

Game is sandbox and would often have large player made structures that need to be saved and recalled accurately including passwords on doors, which we treat as objects.
Best response
That shouldn't be that big of a deal then.

Create a unique ID for each faction. Make it numeric. Buildings assigned to a faction should treat all members of that faction as an owner. You can grab the faction info from the separate faction savefile using that unique ID.

Don't store current upgrade level. Treat each upgrade level as a separate tile_id to begin with.

Don't store current HP. Store current damage.

Then drop all of these variables to the base /turf type:

turf
	var
		tile_id = 0 //unique one-up serialization for each unique tile type.
		faction = 0 //stores faction unique_id
		damage = 0 //how much damage the item has taken

When serializing each tile in the chunk, you can simply iterate over the entire chunk, serialize the three values into a single buffer, and then call it good:

//F is the savefile.
//x,y,z is the lower-left corner of the chunk
//w,h is the width/height of the chunk
//block() includes both corners, so subtract 1 to get exactly w by h turfs
var/list/chunk = block(locate(x,y,z),locate(x+w-1,y+h-1,z))
var/len = chunk.len
var/turf/t
F.cd = "tiledata" //write into the chunk's tile buffer
for(var/count in 1 to len) //loop over the block
	t = chunk[count]
	F << t.tile_id //serialize the important properties only
	F << t.faction
	F << t.damage
F.cd = ".." //back to the parent directory

When loading the chunk:

//same F, x,y,z and w,h as when saving
var/list/chunk = block(locate(x,y,z),locate(x+w-1,y+h-1,z))
var/len = chunk.len
var/turf/t
var/id, fac, dmg, idtype
F.cd = "tiledata"
for(var/count in 1 to len) //loop over the block
	t = chunk[count]
	F >> id //deserialize the important properties
	F >> fac
	F >> dmg
	idtype = tile_ids["[id]"] //look up the new turf type.
	t = new idtype(t) //replace the turf at this location
	t.faction = fac
	t.damage = dmg
F.cd = ".."

To create a global dictionary of typeids:

proc/__init_tileids()
	var/id, list/l = list()
	for(var/v in typesof(/turf))
		id = initial(v:tile_id) //compile-time tile_id of this turf type
		l["[id]"] = v
	return l

var/list/tile_ids = __init_tileids()

Saving objects is a bit tougher. For the sake of speed, you are going to want to keep a list of all savable objects whose home location is in a particular chunk. When that item is removed from that chunk, you want to remove that object from the chunk's contents list. Note, you do not want to store each item in every chunk it may straddle if it's bigger than one tile. This WILL cause duplication. Only save it in the chunk the movable atom's loc variable would indicate. You will also want to save the LOCAL x,y values and any relevant step offsets of those objects. The local x,y coords are relative to the bottom-left corner of the chunk that's being saved.
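To make the local-coordinate idea concrete, here is a hedged sketch of a chunk that owns its savable object list. The /chunk datum, its x/y/z corner vars, AddSavable, RemoveSavable, SaveObjects, LoadObjects, and the "objdata" buffer name are all assumptions, and it only stores type, position, and step offsets:

```dm
// Hypothetical sketch: each chunk keeps a list of the savable objects it owns.
chunk
	var
		x = 1 // lower-left turf of this chunk
		y = 1
		z = 1
		list/savables // objects whose loc is inside this chunk; null when empty

	proc/AddSavable(obj/O)
		if(!savables) savables = list()
		savables |= O

	proc/RemoveSavable(obj/O)
		if(!savables) return
		savables -= O
		if(!savables.len) savables = null

	proc/SaveObjects(savefile/F)
		F.cd = "objdata"
		F << (savables ? savables.len : 0) // count first, so loading knows when to stop
		if(savables)
			for(var/obj/O in savables)
				F << O.type
				F << (O.x - x) // local coords, relative to the chunk's corner
				F << (O.y - y)
				F << O.step_x
				F << O.step_y
		F.cd = ".."

	proc/LoadObjects(savefile/F)
		var/count, path, lx, ly, sx, sy
		F.cd = "objdata"
		F >> count
		for(var/i in 1 to count)
			F >> path
			F >> lx
			F >> ly
			F >> sx
			F >> sy
			var/obj/O = new path(locate(x + lx, y + ly, z))
			O.step_x = sx
			O.step_y = sy
			AddSavable(O)
		F.cd = ".."
```

An object moving between chunks would call RemoveSavable() on the old chunk and AddSavable() on the new one, based only on its loc, so it never ends up in two lists.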

I would not recommend looping over the contents of every chunk, or running a bounds() call over the entire chunk to grab all atoms quickly. I would strongly recommend maintaining the list in memory and tracking object relocation/movement/construction somehow consistently in your code so that the chunk savable object list is always up to date.