Test

by Deadron
A test framework that makes it very easy to test your game, using Extreme Programming principles.
ID:35998
 
This BYONDscape Classic is a real mind-expander. A failing test is progress?! Read on and find out how. -- Gughunter


There are few things in life that come along and make a big difference to how you think and work every day. In my twenty years of programming, I've encountered exactly two:

  • Object-Oriented Programming
  • The Extreme Programming approach to testing

These are the only items that have turned my programming world upside down. Since anyone who has done any BYOND programming has already been exposed to the first, I will concentrate on the second, which is something I encountered in just the last year. I'll also introduce my new Deadron.Test library (http://www.byond.com/hub/Deadron/Test), the result of several attempts to get it right. I believe this library will make a big difference to anyone who tries it out while following the practices discussed in this article.

Everything discussed here is inspired by Extreme Programming. You can read an overview of the methodology at ExtremeProgramming.org (http://www.extremeprogramming.org/) and a detailed discussion of this approach to testing at this Wiki site (http://c2.com/cgi/wiki?TestDrivenDevelopment). The methodology covers many important aspects of development, but for the moment I'll focus on testing.

The rules of testing

Okay, I won't drag things out. Here are the testing rules I've found most important, and the rest of the article will explain them in detail.
  1. Test first, code second.
  2. Change your code to make it testable.
  3. Make the tests short, fast, and automated.
  4. Run the tests constantly.

Test first, code second

I realize this makes no sense at all. How can you possibly test something you haven't created yet? The question comes up because of the bad testing habits most of us have accumulated, which can be summarized as:

Bad testing cycle

  1. Write some code.
  2. Run the game and try out some stuff.
  3. Write some more code.

That's testing, right?

Wrong. That's just a quick way to try to make sure you got the code basically right before you jump to the next feature. If you create a teleport verb, try it out a couple of times, then move on to implementing phasers, what happens? Well, that day, or the next, or the next month or year, your teleporting code stops working. Maybe you notice, maybe you don't. Eventually a player discovers the problem, and if it helps them, they use it as an exploit; if it's a useless bug, they scream at you to fix it. And now, days, weeks, or months after you broke it somehow, you have to go figure out what went wrong. So what do you do? That's right, you write some code, run the game and try some stuff, and when it looks right you release a new version.

You've now created the feature, tested it, released it, broken it, fixed it, tested it again, and released it again. But guess what, you haven't made yourself one whit less likely to break it in the future. You won't even know if you have broken it again unless you or a player happens to try it out. Your only hope in this scenario is that before each release you run the game and try out so many things that you find any problems. Of course, there are hundreds of things to try, and you'll probably forget some, and even if you don't it might take a long time to try everything.

If you don't have automated tests, just how long can a major bug slip by? It can go for years, while having a massive impact on your game. Look at EverQuest. For a couple of years their pet dual-wielding code was broken. They didn't write tests, so they didn't know it. Yet it impacted the entire game dramatically. All the mob encounters were manually tuned based on how easy it was for players to beat the monsters. Since the player's dual-wielding pets were putting out the wrong amount of damage, the encounters were all tuned based on the broken code. When the designers finally realized the bug (after the players insisted it must exist), it was too late to fix. Yes, too late to fix. Hundreds of encounters were already hand-tuned based on the existing code, and since they were manually tweaked, they couldn't be redone without months of effort. They had to live with a broken game.

Most of us live our programming lives just like the EverQuest guys, but it doesn't have to be that way. Their problems would have been avoided if they'd simply spent an extra half hour up front to create a few combat damage tests, then written the combat code.

Test first, code second is the heart of the Extreme Programming methodology. It is critical because if you write the test first, then you are guaranteed to have a test. If you write the test first, then you will make sure to structure your code so it is testable (more on that in the next section). But, like a procedural programmer who is faced with an object-oriented language for the first time, it just seems impossible. How can you actually do it?

Here's how. First, understand that the first few times this is going to be like pulling teeth while simultaneously experiencing malaria: that is, it's not going to be fun. Make yourself go through this. Once you get into it, not only will it become easy and natural, you won't believe you ever tried coding without it. The first thing you need is a way to make your tests run, and to know if they've failed. This step is easier than it seems, yet it trips up many people. The key is you don't need a fancy testing framework, you just need a way to run your test functions. It only takes a few lines of code to set up. I've created the Deadron.Test library (discussed below) after much trial-and-error looking for the simplest approach I could find; you can use that directly or as inspiration for how to set up your own system.

Once you have a way to call your test functions, think about the next thing you need to code. Let's say it's that teleport verb. The first thing to do, before you even think about anything else, is create your teleport_test() function, set it to fail, and run the tests. That's right, you run the test before you even have a test. The first step to success is failure. What could possibly be the purpose of this? Simple: it's to make sure your test is actually being called. It's extremely easy to write a test and forget to call it, lulling yourself into thinking everything is hunky dory. If you run your tests at a point when the new one must fail, you'll know you've set things up correctly. I've gotten to the point where if I run a test and it succeeds the first time, I get very nervous.
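As a rough sketch of that first step, a deliberately failing placeholder might look like the following. It assumes the Deadron.Test conventions described later in this article (tests are verbs in the /obj/test class, and die() reports a failure). The #include line is an assumption about how the library is pulled into a project, and the message text is purely illustrative.

```dm
// Assumption: the library is included with the usual BYOND library syntax.
#include <deadron/test>

obj/test/verb/teleport_test()
    // No real checks yet: fail on purpose to prove the test gets called.
    die("teleport_test() has not been written yet.")
```

If running the tests does not report this failure, the new test is not being called, and that is the thing to fix before writing any real checks.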

Now you have a failing test. That's progress! Finally it's time to think about the functionality you are testing. That's where the next couple of sections come in.

Change your code to make it testable

It's easy if you just need to test whether a function is returning the correct output given a certain input, but how do you test something that involves moving a mob around the map? Or how do you test something that involves reading in files that the player specifies and checking their contents?

This is why you are creating your test first. Because creating tests first changes how you structure your code. It's a Heisenberg principle of testing: You can't test without changing what you test. The kinds of tests we're putting together here are traditionally called unit tests, but this is the big difference between the Extreme Programming approach and the past: In the old days, unit tests were not only written after the code, but they were written by someone else. This severely limited their usefulness, because they didn't help change the code to make it more testable. Getting the point here? In fact, you'll find if you go back to some of your pre-existing code that it's almost guaranteed to be untestable. If you didn't write it to conform to tests, then it's not likely to be testable. A simple principle, but it took decades for anyone to figure out. Along these lines, I recommend you not bother trying to write tests for your old code. Next time you need to touch something in there, rewrite it from scratch (after writing your tests, of course); you'll save a lot of time and hassle.

Back to that function that lets the player specify a file that you read in and parse. Difficult to test, right? How are you going to automatically test something that involves a dialogue window as part of it? And are you going to have to have a test data file sitting around just for this test? The answer is that you skip that stuff. You just test the guts. That typically means that you change your code so that your text parsing is separated out from specifying the file and reading in the file. That means you aren't testing absolutely everything...that other stuff will have to be caught with the old-fashioned "run the program and try it" approach. Here is how I typically handle this situation for a verb that lets you read in a file of commands that the game parses and executes:

mob/verb/choose_commands_file(inputfile as file)
    var/list/command_list = parse_file(inputfile)

    // Add code to do stuff with commands...
    return

proc/parse_file(inputfile)
    var/text = file2text(inputfile)
    var/list/command_list = parse_text(text)
    return command_list

proc/parse_text(text)
    var/list/command_list = list()
    // Text parsing code here...
    return command_list

proc/parse_text_test()
    // It's left to you to make sure this test is getting called.
    var/text = "jump; shoot; move left"
    var/list/command_list = parse_text(text)

    // ASSERT is a DM macro that will crash the proc if the expression is false.
    ASSERT(command_list[1] == "jump")

    // Or you can use the CRASH macro to print your own message:
    if (command_list.len != 3)
        CRASH("parse_text() returned wrong number of commands.")

Notice how the test manages to skip the UI and file aspects of the problem. It does this because I wrote the test first, so I was able to think "how can I structure this so I can test the guts and not worry about the rest?" The answer was to put the guts in a parse_text() function. The player verb calls parse_file(), the parse_file() proc gets the text from the file, then calls parse_text(). Since parse_text() receives text and returns a list, it's easy to create an automated test for without needing to use an external file. If I hadn't written the test first, I might have put all the code in one or two functions, making it impossible to test in an automated fashion, or without the extra overhead of external files.
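For illustration only, here is one way the guts of parse_text() might be filled in so that the test has something to run against. It assumes commands are separated by a semicolon followed by a single space, as in the test string, and it relies on DM's built-in splittext(), which is only available in newer BYOND versions. The real parsing code may well look different.

```dm
proc/parse_text(text)
    // Assumption: commands are separated by exactly "; ".
    var/list/command_list = list()
    if (!text)
        // Nothing to parse: return an empty list rather than null.
        return command_list
    command_list = splittext(text, "; ")
    return command_list
```

Because the proc takes plain text and returns a list, the test can feed it a string directly and never has to touch a file or a dialogue window.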

This sample skips some of the challenges of setting up a test framework. That's what Deadron.Test does for you. Here is what that same test looks like using the library:

obj/test/verb/parse_text_test()
    // When using Deadron.Test, the tests must be verbs in the /obj/test class.
    var/text = "jump; shoot; move left"
    var/list/command_list = parse_text(text)

    // The die() proc is provided by the library.
    var/first_command = command_list[1]
    if (first_command != "jump")
        die("parse_text() returned wrong first command: [first_command]")

    if (command_list.len != 3)
        die("parse_text() returned wrong number of commands.")

If you are using the library, it takes care of running all the tests for you (once you call dd_run_tests()), and it handles responding to a test failure. All you have to do is write tests like that one.

Testing a teleport verb is a bit more complicated. Most likely, the simplest way to handle it is to create a test mob and put it on the map. You may need to make sure you have a map level with no dense items to mess you up...if so, putting that in place is an example of changing your code to make things testable! Or, even better, you could simply add the map level dynamically as part of the test function. Once you've placed the mob, you call the teleport verb, then check where the mob landed to make sure it worked. When done, you delete the test mob. The demo code for the Deadron.Test library contains a full example of testing a teleport verb. It's one of the more complicated tests I've ever created, and it's all of twenty lines of code. Which leads to our next section...
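Before moving on, here is a minimal sketch of the teleport test just described. It is not the library's demo code. The teleport() verb and its arguments are hypothetical, the #include line is an assumption about how the library is pulled in, and the test assumes z-level 1 of the map is at least 5x5.

```dm
// Assumption: the library is included with the usual BYOND library syntax.
#include <deadron/test>

// Hypothetical teleport verb, included only so the sketch is complete.
mob/verb/teleport(tx as num, ty as num, tz as num)
    var/turf/destination = locate(tx, ty, tz)
    if (destination)
        loc = destination

obj/test/verb/teleport_test()
    // Assumption: z-level 1 is at least 5x5.
    var/mob/test_mob = new /mob(locate(1, 1, 1))
    test_mob.teleport(5, 5, 1)

    // Record where the mob ended up, then clean up before reporting.
    var/turf/landed = test_mob.loc
    del(test_mob)

    if (!landed || landed.x != 5 || landed.y != 5 || landed.z != 1)
        die("teleport() left the mob in the wrong place.")
```

Deleting the mob before the check means the cleanup happens whether the test passes or fails, and die() only has to be called in one place.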

Make the tests short, fast, and automated

Your tests must run quickly, otherwise you won't run them. They must be short, otherwise you'll get tired of writing them. They must be automated, otherwise they'll be too much hassle to run.

To facilitate all this, only test what needs to be tested. What needs to be tested is anything that is likely to break in some way. If there is a way it's likely to break, you should test for that; if it's not likely to break in a certain way, don't bother testing for that. You aren't testing for the heck of it, you are testing to make sure it works, and so you know if it breaks in the future. This means you make judgment calls, and sometimes you'll be wrong. That's okay, each time you are wrong, it will give you more sense of what to do in the future.

The average test proc is from five to ten lines long. That's all. You usually don't need anything more. Some of them will only be three lines long.

Your test system should be set up so that if a test fails, everything stops. It must not keep going after a failure, because once something fails you should immediately fix it. You don't want to get overwhelmed by 20 broken things, or lulled into ignoring a bunch of warning messages you see all the time. If a test fails, you must fix it immediately, so just focus on one thing at a time.

Once you kick off the tests, all the tests should be run, and they should run without human intervention.

Run the tests constantly

In most companies, tests are run once or twice for a release cycle, or monthly, or maybe weekly. But they should be run multiple times a day. Every few minutes, even.

Yes, every test in your game should be run every time you recompile. This way you will know instantly if something breaks, and you can fix it right away. If you have enough test coverage, you'll quickly discover if you accidentally broke something in related code.

My own habit these days is to set the tests to run like so:

world
    New()
        spawn()
            dd_run_tests()
        return ..()

This way, each time I start up the game, all my tests run. Before you release the game, you can just comment out those lines (or deselect your test file so it's not compiled in) and you are ready to go. Or you can just ship it with the tests enabled. They should be fast enough that it won't matter, and things should be working so players shouldn't see an error...right?
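One alternative to commenting lines in and out is to guard the call with a preprocessor flag. This is only a sketch: RUN_TESTS is a made-up name, not something the library defines, and the #include line is an assumption about how the library is pulled in.

```dm
// Assumption: the library is included with the usual BYOND library syntax.
#include <deadron/test>

// Hypothetical flag: comment out this one line to build without running tests.
#define RUN_TESTS

world
    New()
#ifdef RUN_TESTS
        spawn()
            dd_run_tests()
#endif
        return ..()
```

With the flag defined, the tests still run on every startup. A release build only needs the single #define line removed.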

Think of it like this: Your job as a programmer is to make the tests pass. Sure there are other parts, but that's the key. If the tests pass, your game is probably working. Much more probably than if you had no tests, to be sure.

Deadron.Test: A testing library for BYOND

As simple as the testing process is, it took me a few tries to get the infrastructure right. My first try at this testing approach was for Bwicki, and it required creating subclasses and remembering to add your function to the list to be called, and checking the return results of each function...way too much overhead.

With the Deadron.Test library, I think I've finally gotten it to be as simple as it can be. Here is all you need to know to use it:

  1. Run the tests by calling dd_run_tests().
  2. Make each test a verb in the /obj/test class.
  3. When a test fails, call the die() proc with a message.

That's it. Try it out, and let me know how it goes for you. The library contains complete sample code showing a simple test and a complex test.

If you take on the approach suggested in this article and stick with it, I guarantee it will turn your programming world upside down.

Why does that Test library use goto?
" // Using a label here so we only have to delete the mob and call die() in one place.", presumably. =P

Er, nice article though!
For the record, this is an example of valid goto use. His other alternative was to do all of the clean-up and the die() over and over again, which is very messy. The goto way, in this case, doesn't kill off the flow, and the code is still readable.
Well I must say that was an interesting article.

I often end up coding in testing "output" myself in the code I'm directly working on.
Checking what values are returned and if everything is working correctly.
Though as soon as the code is working I take out these test lines.
I now realize that by writing testing procs and keeping them there to be enabled or disabled whenever we want is a very good idea.

One thing I'm curious about is if I should split up my code by quite a bit, using more procs to do the same thing?
If I don't do this then it's not really possible to test if the procs are returning the correct value.

And next to that, the basic idea is to test at runtime without any players right?
I mean the tests are supposed to see if everything works before the game is even played, right?

Well I'm going to see if I can get a grasp of this. I love the idea and definitely think it's a good thing for me to add to my programming skills.
Audeuro wrote:
For the record, this is an example of valid goto use. His other alternative was to do all of the clean-up and the die() over and over again, which is very messy. The goto way, in this case, doesn't kill off the flow, and the code is still readable.

Your mom!
Wow, I've always thought of myself being a weirdo, since I compile the damn code after I add/fix a few things to just test it out.
One thing I'm curious about is if I should split up my code by quite a bit, using more procs to do the same thing?

As a rule, you should try and do something in only one place.
As a rule, you should try and do something in only one place.

But if by "more procs to do the same thing" he means taking a single two-page-long proc and breaking it down into calls to constituent procs, that's probably the right approach to take.

E.g., if a proc has a big section where it's calculating a shell's trajectory, and there's a bunch of unrelated stuff before and after it, it makes sense to break out the calculation into its own CalculateShellTrajectory() proc. That makes the parent proc a lot more readable.

Very true!
Oh, and this approach really works well. I took it with a card game I was writing, and it helped me catch a bug myself, as opposed to letting my player-base catch the bug.
Last Robot Standing couldn't exist without having taken this approach...I documented some of that on my blog, but the interactions between the powers in that game are such that there is simply no way to manually test every permutation every time we change or add something, and the tests have kept the entire game from getting ruined.

Even so, plenty of bugs came up that weren't caught by tests, so this approach doesn't insulate you from bugs...it just makes them less likely, and usually easier to find when they do come up. Plus it provides a good model for hunting down bugs (that is, write a test that displays the bug, then debug until the test works...)
Sorry, didn't understand this article at all. [I'm too unskilled >.<] Can you explain the testing a little simpler?
Can you explain the testing a little simpler?

Unfortunately this article is the best I can do on the subject...