
Dream Tutor: Do You Hear What I Hear?

by Lummox JR

BYOND's sound features advanced quite a lot with the upgrade to FModEx, but what does that mean to you? It means better quality sound, more channels to mix, but more importantly it means there are techniques you can use to add realism that were never available before. The greatest of these are new Mod music formats, 3D sound, and reverberation.

Location, Location, Location

3D sounds are a powerful tool for making games immersive, if used well. The first thing you need to figure out when using them is position. The sound datum has vars named x, y, and z. They determine the location of the sound relative to the listener, when the sound is played. These don't necessarily have any relationship to the same vars that belong to each atom in the world; it's your job to give them meaning. A value of 0 in each var (the default) means the sound is centered. Positive values of x put the sound further to the right, negative to the left. Positive y sounds as if it comes from above the listener. Positive z is straight ahead.

How you set those vars depends largely on the type of game you make. If the game is side-view, then just using relative x and y based on the position of the atoms involved probably makes sense. Anything above you on the screen should sound as if it's actually above you.

mysound.x = source.x - listener.x
mysound.y = source.y - listener.y

These values are assumed to be in meters, but you can play with that a bit. It doesn't matter much until we get to echoes. Until then you can change the scale you're working with by changing the falloff var. A higher value than 1 means you're working with a smaller scale. If falloff=100, then x=1 is effectively 1 cm to the right.

Figure 1a: A simple overhead map. Like many overhead maps, it mixes in some side-view icons, or icons seen from an angle, to feel more immersive. The result is an impression that the player is closer to ground level, but still looking down on it.
Figure 1b: Side view of an overhead map soundspace. In the mind's eye, the player is looking down at an angle onto the map. A sound directly to the north will have both y (upward) and z (forward) components.

Most BYOND games are based on an overhead map view. For a sound north of the listener, however, positioning the sound above doesn't make much sense. When you use 3D sound you have to think about not just how the player perceives the world, but how they see their place in it. North may be up on the screen, and they may even think of it as up, but true up would be a sound directly overhead. That doesn't say "Look north" to our brains; it says "Look out for birds." If faced with choosing a more 3D perspective than merely what they see on the monitor, a player is more inclined to interpret north as forward.

Yet straight forward isn't the right choice either. Using the z axis is a step in the right direction, but it doesn't fully give us a sense of direction corresponding with an overhead map view. Think about it: If you hear a sound behind you, will you automatically think it's to the south? Probably not. While forward/back is more intuitive than above/below, we're missing something. What we need to do is combine them.

When a player looks at an overhead map, they're picturing a familiar floor or ground surface which implies north/south = forward/back, but they're also seeing that surface on a screen oriented up and down. In our imagination it's horizontal; actual perception is vertical. What our mind's eye sees, and what our mind's ear must hear, is in the middle. BYOND games are basically all 3rd person perspectives, so in the mind it's as if we're looking down and forward onto the map at the same time. It's time to break out the math.

mysound.x = source.x - listener.x
mysound.y = (source.y - listener.y) / sqrt(2)
mysound.z = mysound.y

We've just played a trick on human perception, and by doing so we bind the player further into the game. How? The eyes see vertical, the imagination sees horizontal, and with sound we can bridge that gap by providing a realistic perspective—something pretty close to what we actually picture. Sounds that are to the north will sound like they come from both above and ahead of us, while sounds to the south are both behind and below. I find a 45° angle works well for this sort of thing, in which case y and z are equal as you see. They're divided by the square root of 2 because of the triangle formed between the y axis, the z axis, and actual distance. If that sentence is gibberish to you, just smile and nod and copy and paste.
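If you'd rather not repeat those three lines everywhere, one way to package them is a small helper proc. This is only a sketch: the name play_overhead and its arguments are made up for illustration, while sound() and the x, y, and z vars are the built-in ones discussed above.

```dm
// Hypothetical helper: play a 3D sound on an overhead map,
// tilting north/south 45 degrees between "up" and "ahead".
proc/play_overhead(mob/listener, atom/source, file)
  var/sound/S = sound(file)
  var/north = source.y - listener.y   // tiles north of the listener
  S.x = source.x - listener.x         // east/west maps straight to right/left
  S.y = north / sqrt(2)               // part of "north" points upward...
  S.z = S.y                           // ...and an equal part points ahead
  // S.y*S.y + S.z*S.z equals north*north, so the distance is unchanged
  listener << S
```

A call might look like play_overhead(usr, some_door, 'creak.ogg'), where some_door and the file name are placeholders for whatever your game uses.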

This takes care of everything but verticality. If some objects are higher than others, we need to figure out how to represent that in sound. Let's say every atom has been given a var gz ("game z") to represent its altitude. Should sounds directly overhead be heard directly overhead, or since we're more or less rotating the soundscape, should we split up/down into y and z parts like we did with north/south? If we try it the first way, the math is simple enough:

mysound.x = source.x - listener.x
mysound.z = (source.y - listener.y) / sqrt(2)
mysound.y = mysound.z + source.gz - listener.gz

The downside of this skewed-axis approach is that it screws up the perceived distance of a sound. It will sound either closer or farther away than it really is in the game world. A potentially better method is to pretend we've rotated the whole world. Just as we've changed north to point in both the +y and +z directions, up would point +y and -z. That is, a sound straight overhead would sound as if it was not just above, but behind us. That's not too weird if you think about it, and just might provide the right realism.

mysound.x = source.x - listener.x
var/sy = source.y - listener.y
var/sz = source.gz - listener.gz
mysound.y = (sy + sz) * 0.707106781187
mysound.z = (sy - sz) * 0.707106781187

Now I've muddied the waters a bit, so it's time to explain that. Multiplying by 0.707106781187 is approximately the same as dividing by sqrt(2); it's just pre-calculated to save time since we'll use the number more than once here. The sy var is how far north the sound is from the listener, while sz is how far above. Both are calculated in advance, because they get reused. The upshot of all this is that we've basically rotated the world.
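Here is the same rotation wrapped up as a reusable proc, as a rough sketch. The names position_rotated and INV_SQRT2 are invented for this example, and gz is the hypothetical altitude var assumed in the text, not anything built in.

```dm
// 1/sqrt(2), pre-calculated
#define INV_SQRT2 0.707106781187

// hypothetical altitude var, as assumed in the text
atom/var/gz = 0

// Hypothetical helper: fill in S.x, S.y, S.z using the rotated-world scheme.
proc/position_rotated(sound/S, atom/source, atom/listener)
  var/sy = source.y - listener.y     // how far north
  var/sz = source.gz - listener.gz   // how far above
  S.x = source.x - listener.x
  S.y = (sy + sz) * INV_SQRT2
  S.z = (sy - sz) * INV_SQRT2
  return S
```

Because this is a pure rotation, distances survive intact: ((sy+sz)² + (sy-sz)²) / 2 works out to sy² + sz². A sound one unit straight overhead (sy=0, sz=1) lands at roughly y=0.707, z=-0.707, which is above and a bit behind but still exactly one unit away.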

Not all perspectives are side view or overhead, though. One other common format is board games. Here, although it doesn't hurt to use the above formulas and play to a horizontal mindset, we can get away with just using above/below because our minds don't map chessboards onto physical directions. A board is abstract; it doesn't have to be thought of as being in front of you on a table. In a scheme like this, the listener's perspective is the center of the screen, and you might try adding a good amount of z axis to make it sound as if moves are happening on the screen itself. If we're working with a 12×12 map, the center of the screen is at (x,y) coordinates of (6.5,6.5). (To find that, add the lower left corner (1,1) to the upper right (12,12), then divide by 2 to find the center. If the map size is odd, like 13×13, its center is at (7,7).) Let's assume each icon is about half an inch across, and the monitor is maybe 2.5 feet (30 inches) ahead; enough of this metric tyranny.

Figure 2: Soundscape of a game board.

// assume a 12×12 map
mysound.falloff = 100/2.54  // (1 m)*(100 cm/1 m)*(1 in./2.54 cm)
mysound.x = (source.x - 6.5) * 0.5
mysound.y = (source.y - 6.5) * 0.5
mysound.z = 30

The effect of this will be quite subtle. Look how much bigger the z axis is than x or y. All the sounds will be perceived as coming from more or less ahead of you. On an 8×8 chessboard, the farthest out in any direction is 3.5 tiles, or 1.75 units on the x or y axis. That's about 6.68° from side to side across the board, 9.43° from corner to corner. Any way you slice it that's not a lot of arc. With luck, though, the pieces will seem to come from vaguely the position they appear on the monitor. (Incidentally, if you use client.dir to turn the board around for another player, you'll have to use the negative values of x and y for them, too.)
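The board setup, including the flip for a turned-around board, might be sketched like this. The proc name position_on_board and its arguments are hypothetical. It assumes the half-inch tiles and 30-inch viewing distance from above, and it only handles a board turned a full 180° with client.dir set to SOUTH; EAST or WEST would need x and y swapped as well.

```dm
// Hypothetical helper for a square board with `size` tiles per side.
proc/position_on_board(sound/S, atom/source, client/C, size = 12)
  var/center = (size + 1) / 2                 // 6.5 for 12x12, 7 for 13x13
  var/flip = (C && C.dir == SOUTH) ? -1 : 1   // board turned around for this player
  S.falloff = 100 / 2.54                      // work in inches
  S.x = (source.x - center) * 0.5 * flip
  S.y = (source.y - center) * 0.5 * flip
  S.z = 30                                    // the monitor is 30 inches ahead
  return S
```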

But enough of all this perspective. It's time to think about making that sound more real. Direction is one thing, but it still doesn't sound like you're in a real space. Let's change that.

On Further Reflection

To give your sounds any kind of body, you'll want to put them in an environment. Real spaces have obstacles like walls, hills, ceilings, even the ground. For this we can use the environment var. The environment you specify will be used for all sounds a user hears. This can change from room to room, or you may prefer to keep it the same. BYOND has preset environments you can use, and for fine control you can set your own values.

The /sound datum's environment var is set to -1 by default, which means no change. Let's say we're working with a game set in a castle. This would seem to call for preset 5, a stone room.

sound
  // default all sounds to a stone room
  environment = 5
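If you want the environment to change from room to room instead, one possible approach is to store a preset on each area and copy it onto sounds as they are played. Treat this as a sketch under assumptions: the sound_env var, the /area/castle type, and the play_in_area proc are all invented for illustration.

```dm
area
  // hypothetical var: a preset number or a custom list; -1 means no change
  var/sound_env = -1

area/castle
  sound_env = 5   // stone room

// Hypothetical helper: play a sound using the environment of the listener's area.
proc/play_in_area(mob/listener, file)
  var/sound/S = sound(file)
  var/turf/T = listener.loc
  if(istype(T))
    var/area/A = T.loc          // a turf's loc is its area
    S.environment = A.sound_env
  listener << S
```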

Now one consideration here is that you might not want all sounds to be considered 3D. Music, for example, shouldn't sound as if it's in the room; it should be in the background. You might also want to use sounds for your game interface, like a little blip when you choose an item in your inventory. For those sounds, all you need to do is set environment back to -1, and don't give them any 3D coordinates. In that case BYOND will assume the sound is not meant to seem three-dimensional. You can take a shortcut by creating a special datum for these cases.

music
  parent_type = /sound
  environment = -1
  repeat = 1
  volume = 50   // 50% volume

  New(file)
    src.file = file

Well that was easy. Presets are pretty simple to use: Just set environment to a number and that's it. Suppose, though, you want an environment slightly different than any of the ones provided. Then it's time to use a list. The environment var can take a list which is 23 items long. Each item can be null if you want to stay with the default value, or a number that sets that parameter. (In versions up through 3.5 beta 5 however, null doesn't work, so use the actual default setting unless the server is definitely a later version.) These are all documented in the Dream Maker help file and the DM reference, so there's no need to go into them in detail. Instead, let's play with the values.

For a target environment, let's try to make a gorge. This suggests we should start with the mountain preset's values and adjust them. Instead of delving into FMOD's documentation to find those, I created a little utility to save time. That tells us that the mountain environment (preset 17) is equivalent to the following:

environment = list(100, 0.27, -1000, -2500, 0, 1.49, 0.21, 0, -2780, 0.3,
                   -1434, 0.1, 0.25, 1, 0.25, 0, 5, 5000, 250, 0, 27, 100, 31)

There's not a lot of echo going on there, probably because of the 9th and 11th parameters: -2780 for reflections and -1434 for late reverb. If we make both of those positive values, like 1000 and 500 respectively, echoes appear. But it doesn't sound very canyony yet. Perhaps we need to make the environment size smaller, dropping from 100 to 20. That too is a definite improvement. For the final touches, playing around with reflection and reverb delay (the 10th and 12th parameters), I found that 0.2 and 0.02 were a better fit than 0.3 and 0.1. Now this is sounding like a gorge.

sound/gorge
  environment = list(20, 0.27, -1000, -2500, 0, 1.49, 0.21, 0, 1000, 0.2,
                     500, 0.02, 0.25, 1, 0.25, 0, 5, 5000, 250, 0, 27, 100, 31)

To get a feel for how that sounds, try playing a sound a little off center. It will echo in lower frequencies on the other side, simulating the walls of the gorge. You hear mostly low frequencies in the echo because of the 4th parameter, RoomHF, which is the room's echo effect at high frequencies. In other words, we're dropping the volume of high frequencies considerably with every echo, so mostly the lower ones get through.
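The same adjustments can be written as changes to individual list positions, which makes it easier to see what was touched. A minimal sketch, with make_gorge_env as a made-up proc name; DM lists count from 1, so the indexes match the parameter numbers used above.

```dm
// Hypothetical helper: build the gorge environment from the mountain values.
proc/make_gorge_env()
  // values of the mountain preset (17), as listed earlier
  var/list/env = list(100, 0.27, -1000, -2500, 0, 1.49, 0.21, 0, -2780, 0.3,
                      -1434, 0.1, 0.25, 1, 0.25, 0, 5, 5000, 250, 0, 27, 100, 31)
  env[1]  = 20     // environment size: 100 -> 20
  env[9]  = 1000   // reflections: -2780 -> 1000
  env[10] = 0.2    // reflections delay: 0.3 -> 0.2
  env[11] = 500    // late reverb: -1434 -> 500
  env[12] = 0.02   // reverb delay: 0.1 -> 0.02
  return env
```

Setting mysound.environment = make_gorge_env() then gives the same list as the sound/gorge definition.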

Interior Design

Just by using environments, you can improve realism considerably, but an environment treats everything as if it's in one big room. What if it's a little more complicated than that? Real sounds aren't just in different positions in a room; they're behind obstacles, coming in from other rooms, or carrying from upstairs. Since the same environment covers all 3D sounds heard by the player, how would we handle those differences?

The echo var is our solution. It can be set to a list, 18 items long, with parameters that define how individual sounds differ from the regular environment. Any sound with the echo var set will be treated as 3D, so if you have this available by default, you need to set it to null when playing music.

Figure 3: Different ways sound may be heard or blocked.

Going over the individual parameters here will take too long. Instead, let's discuss some of the concepts you can use. There are 5 major types of sound interactions you can change here that deal with the layout of the room and the sound's position:

  • Direct effect: The level of the sound in a direct path to the listener
  • Room effect: Sound level relative to room reverberations (environment)
  • Obstruction: Obstacles partially block direct path but allow room reflections
  • Occlusion: Walls partially block both direct path and reflections
  • Exclusion: Walls with openings block room reflections

These effects are not mutually exclusive, or on/off. You can use a little of each. The amount you use depends on the layout of the rooms. Consider a haunted house. If a sound comes from the next room, an occlusion level of -2000 might be appropriate. If one of the rooms has an open door leading into a hallway, you might consider altering the occlusion a bit. One of the settings for occlusion is the room effect ratio, normally 1.5, which says how much of the main control affects room reflections. With one door open this might be 0.75, and with two doors open you might try 0.375 so some room reflections get picked up.

// use default settings except:
// Occlusion: -2000
// OcclusionRoomRatio: 0.375 (25% of the normal 1.5)
mysound.echo = list(0, 0, 0, 0, 0, 0, -2000, 0.25, 0.375, 1,
                    0, 1, 0, 0, 0, 0, 1, 7)

Now the sound seems to come from the next room, with both hallway doors open. If the sound comes from further down the hallway, like 2 rooms down, try an occlusion level of -4000 instead. If the sound is in the hallway and you have an open door, try -2000 occlusion with a room ratio of 0.375 and a direct ratio (normally 1.0) of 0.75. The rules are simple:

  • Find the shortest path from sound to listener.
  • For each room separating the sound from the listener, add occlusion level -2000. Multiply room ratio by 0.5 if an open doorway links them.
  • If an open door directly links the two rooms, multiply the room ratio by 0.5 again, and direct level by 0.75.

The same logic can be used for sounds on different floors. For any change of floors, try an occlusion value of -4000. Make it -1500 if the sound happens on the floor of the level above (like a falling vase). So if you're on the ground floor and someone two stories up drops a brick, use occlusion level -5500. Again you would modify the room effect ratio if stairs intervened. If you're in the hallway and there's a stairwell, you'd want to cut the room ratio down to let more echoes through.

The most expedient way to use this concept is to simply find the shortest path from one room to another via doors or stairs. As you cross closed doors, stairs, and floors, keep a running tally and then use that to determine how to modify the echo settings.
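As a rough sketch of that running tally, the proc below turns the counts into an echo list using the rules of thumb from this section. The name make_echo and every argument name are invented here; how you count rooms, doors, and floors along the path is up to your game.

```dm
// Hypothetical helper: build an 18-item echo list from a path tally.
//   rooms       - rooms separating the sound from the listener
//   open_doors  - open doorways along the shortest path
//   direct_door - 1 if an open door directly links the two rooms
//   floors      - full floor changes in between (not counting on_floor)
//   on_floor    - 1 if the sound happens on the floor of a level above
proc/make_echo(rooms = 0, open_doors = 0, direct_door = 0, floors = 0, on_floor = 0)
  var/occlusion = -2000 * rooms - 4000 * floors
  if(on_floor) occlusion -= 1500
  var/room_ratio = 1.5    // normal OcclusionRoomRatio
  var/direct_ratio = 1    // normal OcclusionDirectRatio
  for(var/i = 1 to open_doors)
    room_ratio *= 0.5     // each open doorway lets more reflections through
  if(direct_door)
    room_ratio *= 0.5
    direct_ratio = 0.75
  return list(0, 0, 0, 0, 0, 0, occlusion, 0.25, room_ratio, direct_ratio,
              0, 1, 0, 0, 0, 0, 1, 7)
```

With these rules, make_echo(rooms = 1, open_doors = 2) reproduces the next-room list shown earlier (occlusion -2000, room ratio 0.375), and make_echo(floors = 1, on_floor = 1) gives the -5500 of the dropped brick two stories up.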

Now Playing

Music deserves a mention. BYOND's music capabilities used to be limited to MIDI, but that has changed. You can now use most of the Mod formats like MOD, XM, S3M, IT, etc. FModEx even supports a format called OXM, which is XM with its musical samples compressed with Ogg-Vorbis. (A converter from XM to OXM is available at fmod.org.)

Because of this plethora of file formats, you can now safely ditch MIDI and come up with something that doesn't sound like crap. If I never hear a town song in an RPG that uses the MIDI "voice" instruments, it'll be too soon. One good resource for Mod files is The Mod Archive. Explore there long enough and you should be able to find good ambient pieces for RPGs, techno/industrial/metal for action games, and more. Depending on the game, other genres like blues or medieval/Celtic may commend themselves. Drum'n'bass songs also make good background pieces.

When playing background music, I have two recommendations. First, specify a channel for music, like channel 1000. Second, don't play it at full volume. If you go with volume=50 (50%), the music will play quietly in the background instead of taking over the game. With some music or game styles, even lower or perhaps slightly higher volume settings may be better.

Some Mod music files have instructions that can loop the song not from the beginning, but from another point where the loop sounds smoother. If you set repeat to 1, BYOND will use those instructions if they're present, so a song can play a nice intro but still loop for the rest of the time.

When playing music, we need to be sure environment and echo are not used. If you've given either of those a default value for all sounds, then for music you'll need to change that.

music
  parent_type = /sound
  environment = -1
  echo = null   // no per-sound echo settings, so music is not treated as 3D
  repeat = 1
  volume = 50
  channel = 1000

Now playing music is as simple as:

// M is whichever mob should hear the music
M << new/music('darkforest.oxm')

Once More From the Top

Now you should have all the tools you need to handle 3D sound and music well in your game. Don't forget that with the new sound features, you can reduce your sound effects from big .wav files to smaller .ogg files compressed with Ogg-Vorbis. Freeware converters are out there to make .ogg files for you.

The big thing to remember is that you want the result to be immersive. Good sound and good music should draw a player into the game, not take them out of it. In this respect, sound is every bit as important as icons, map layout, or even intuitive controls. BYOND now has the potential to give you sound quality as good as any icons or interface you can make; it's up to you to use it most effectively.


Appendix A: A Little More Perspective

One type of game which is possible to produce in BYOND is isometric. Computer games actually use a skewed isometric projection instead of true isometric. We can do the same, or we can rotate the whole thing kind of like we did earlier with overhead maps. The forward axis is tilted 30° upward. If we use a skewed perspective, the vertical axis is still straight up.

// gx,gy are position on the internal map
// gz is altitude on that map
mysound.x = (source.gx - listener.gx + source.gy - listener.gy) * 0.5
var/sy = (source.gy - listener.gy - source.gx + listener.gx) * 0.25
mysound.y = source.gz - listener.gz + sy
mysound.z = sy * sqrt(3)

Here, sound has been transformed so that anything "forward" on the map will have the same amount of "upness" and "aheadness" as it appears to the player. It will be slightly more ahead than above. Anything truly vertical on the map, however, is still heard as straight up and down. If you want to avoid the effects a skewed axis system will have on sound distances, try this:

// gx,gy are position on the internal map
// gz is altitude on that map
mysound.x = (source.gx - listener.gx + source.gy - listener.gy) * 0.5
var/sy = (source.gy - listener.gy - source.gx + listener.gx) * 0.25
var/sz = (source.gz - listener.gz) * 0.5
// pre-calculate sqrt(3)
mysound.y = sy + sz * 1.73205080757
mysound.z = sy * 1.73205080757 - sz

Another type of perspective available is 1st person. There's not much of that in BYOND for obvious reasons, but picture a dungeon crawl where you get a single screen of view at a time—nothing fancy. Here, we're being given a horizontal viewpoint, but forward/back and left/right don't correspond to BYOND's directions. For this we have to use vector math.

I'll calculate two vectors: N, which is represented by nx and ny, and D, for which we have dx and dy. N will be the direction the listener faces, while D will be the direction of the sound. We also need a way to turn a vector 90° to the right, for which I'll use this formula:

N' = <ny,-nx>

To find how much of D is in front of us (for mysound.z), we need to find the dot product. The dot product of two vectors D and N is equal to length(D) × length(N) × cos(θ), where θ is the angle between them. Since N is of length 1 (as long as the listener faces one of the four cardinal directions), the dot product D.N tells us just how far "forward" D is. To find mysound.x, we'll repeat the process with D.N', where N' is just N turned 90° to the right. That will say how much "rightness" D has.

// find the normal vector: the direction we're facing
var/nx = (listener.dir & 12) ? ((listener.dir & EAST) ? 1 : -1) : 0
var/ny = (listener.dir & 3) ? ((listener.dir & NORTH) ? 1 : -1) : 0
// find vector from listener to sound
var/dx = source.x - listener.x
var/dy = source.y - listener.y
// now translate
mysound.x = ny * dx - nx * dy        // D.N'
mysound.y = source.gz - listener.gz
mysound.z = nx * dx + ny * dy        // D.N
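For reuse, the translation could be wrapped in a proc along these lines. This is a sketch, not the only way to do it: position_first_person is a made-up name, the height argument stands in for source.gz - listener.gz so the snippet doesn't depend on a gz var, and the rescaling of diagonal facings is my own addition to keep N at length 1.

```dm
// Hypothetical helper: position a sound for a first-person view.
// height is the altitude difference (e.g. source.gz - listener.gz).
proc/position_first_person(sound/S, atom/source, atom/listener, height = 0)
  // N: the direction the listener faces
  var/nx = (listener.dir & (EAST|WEST)) ? ((listener.dir & EAST) ? 1 : -1) : 0
  var/ny = (listener.dir & (NORTH|SOUTH)) ? ((listener.dir & NORTH) ? 1 : -1) : 0
  // a diagonal facing gives N a length of sqrt(2); scale it back to 1
  if(nx && ny)
    nx *= 0.707106781187
    ny *= 0.707106781187
  // D: the vector from listener to sound
  var/dx = source.x - listener.x
  var/dy = source.y - listener.y
  S.x = ny * dx - nx * dy   // D.N', how far to the right
  S.y = height
  S.z = nx * dx + ny * dy   // D.N, how far ahead
  return S
```

As a quick check by hand: a listener facing EAST has N = <1,0>. A sound three tiles to the north has D = <0,3>, so x = 0×0 - 1×3 = -3 and z = 1×0 + 0×3 = 0. That puts the sound three units to the left and neither ahead nor behind, which is where north should be when you face east.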

Appendix B: Sound Playground

To help find the right environment or echo settings, I developed a demo called Sound Playground. With it you can experiment with how your 3D sounds will come out, and so find just the right settings for your game.

To use Sound Playground, just download it and compile. Then run the demo. All settings are controlled through the browser.

For each environment setting, you can either pull a value from the currently selected preset, or you can fill in a value of your own. Press any one of the Play buttons to hear your changes.

I wasn't aware of any of this before! Too bad my sound is wacked.
Are there any system requirements for environments? I can't hear any differences between them. I may not have the best ears, but I should be able to sense a difference between Generic, Underwater, and Psychotic :p

Excellent article, by the way. Now I understand why you divide the y by sqrt(2).
Hm, this would be great for games like Murder Mansion :O Then again, not a lot of people use the gun in there >_>

Edit: For those that still don't get the reason why square root of 2 is used, look back at Figure 1b. Assume 'y' and 'z' to have a value of 1 and use Pythagorean Theorem to find the value of the distance between the two.
You lost me by the end of the first section >_<. I'll come back and try reading it later.