How It Works

[NOTE: This is slightly outdated -- the demo uses a scale of 1 to 3 instead of 1 to 11 now -- but the interesting part, the method of recognizing gestures, is still essentially the same.]

All gestures -- both the gesture templates the programmer defines and the gestures the user enters -- are stored as lists of line segments. When you define a gesture template, it's made up of points on an 11x11 grid, much like the default BYOND map view. The lower-left corner is (1, 1). So you could define the letter V as a segment from (1, 11) to (6, 1), and a second segment from (6, 1) to (11, 11).
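
For example, in a hypothetical DM representation where a gesture is just a list of list(x, y) points, with each consecutive pair of points forming one segment, that V could be written like this (the library's actual template format may differ; this is only a sketch):

    // The letter V, drawn left-to-right: down from the top-left corner to
    // the bottom-middle, then back up to the top-right corner.
    var/list/letter_v = list(list(1, 11), list(6, 1), list(11, 11))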

Note that this example defines a left-to-right V; the gesture analyzer is sensitive to the direction of movement, so you could define a second gesture as the series of points (11, 11) to (6, 1) to (1, 11) and you'd have a right-to-left V. In fact, if you want the gesture analyzer to recognize V regardless of whether it's drawn left-to-right or right-to-left, you have to define it both ways! (Fortunately, this library provides a function for automatically creating a reversed copy of a gesture template, so you won't have to do double the work.)
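
For illustration, here's roughly what such a reversing helper does, using the same list-of-points sketch as above (the proc name here is made up; check the library's reference for the real one):

    // Build a copy of a gesture with its points in the opposite order,
    // turning a left-to-right template into a right-to-left one.
    proc/reversed_copy(list/points)
        var/list/result = list()
        for(var/i = points.len, i >= 1, i--)
            result += list(points[i])
        return result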

If the gesture has been entered by a user, it is rescaled to fit in an 11x11 grid, to match the size of the gesture templates. If the user enters a gesture that's taller than it is wide, the rescaled gesture will have a height of 11, and it will be pushed as far to the left on the grid as it can go; if the gesture is wider than it is tall, it'll be 11 units wide, and will rest as far toward the bottom of the grid as possible. Thus, when you define tall or wide gesture templates, the points should range from 1 to 11 in one dimension, and from 1 to some number less than 11 in the other.
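
A minimal sketch of that rescaling step, again assuming the list-of-points representation from the earlier examples (the library's own code may handle rounding and degenerate input differently):

    // Scale a user-drawn gesture so its longer dimension spans 1 to 11,
    // anchored to the lower-left corner of the grid.
    proc/rescale_to_grid(list/points)
        if(!points || !points.len) return points
        var/list/first = points[1]
        var/min_x = first[1]
        var/max_x = first[1]
        var/min_y = first[2]
        var/max_y = first[2]
        for(var/i = 1, i <= points.len, i++)
            var/list/p = points[i]
            min_x = min(min_x, p[1])
            max_x = max(max_x, p[1])
            min_y = min(min_y, p[2])
            max_y = max(max_y, p[2])
        // The larger of width/height gets stretched (or shrunk) to 10 units,
        // so the gesture runs from 1 to 11 in that dimension.
        var/span = max(max_x - min_x, max_y - min_y)
        if(!span) return points  // a single point; nothing to scale
        var/list/result = list()
        for(var/j = 1, j <= points.len, j++)
            var/list/q = points[j]
            var/x = 1 + (q[1] - min_x) * 10 / span
            var/y = 1 + (q[2] - min_y) * 10 / span
            result += list(list(x, y))
        return result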

Once the line segments are defined (and properly scaled if they're user input), it's necessary to calculate the length of each segment, as well as the total of all segment lengths. (In other words: if the gesture were a piece of yarn and you stretched it out straight, how long would it be?) This allows us to interpolate the coordinates of a point anywhere along the whole sequence, even if it isn't an endpoint of a segment. The points at 0% and 100% are easy, of course -- they're the first point of the first segment, and the last point of the last segment. But figuring the length of the whole series lets us pick any percentage of the total length and calculate the point that corresponds to it.
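
Here's a sketch of that interpolation, using the same representation as before (the proc names are illustrative, not the library's actual API):

    // Total length of all segments: the "stretched-out piece of yarn".
    proc/gesture_length(list/points)
        var/total = 0
        for(var/i = 2, i <= points.len, i++)
            var/list/a = points[i - 1]
            var/list/b = points[i]
            total += sqrt((b[1] - a[1]) ** 2 + (b[2] - a[2]) ** 2)
        return total

    // Walk fraction (0 to 1) of the way along the gesture and return the
    // interpolated list(x, y) point there.
    proc/point_at_fraction(list/points, fraction)
        var/target = fraction * gesture_length(points)
        var/walked = 0
        for(var/i = 2, i <= points.len, i++)
            var/list/a = points[i - 1]
            var/list/b = points[i]
            var/seg = sqrt((b[1] - a[1]) ** 2 + (b[2] - a[2]) ** 2)
            if(seg > 0 && walked + seg >= target)
                var/t = (target - walked) / seg
                return list(a[1] + t * (b[1] - a[1]), a[2] + t * (b[2] - a[2]))
            walked += seg
        return points[points.len]  // fraction was 1, or rounding nudged us past the end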

Once we can do that, finding the best match is pretty easy! When a user enters a gesture, we compare it to every stored gesture template. We calculate the coordinates of the point at 0% of the user's gesture, then 5%, then 10%, and so on up to 100%. We also calculate the corresponding points in the gesture template. For each pair of points, we calculate the distance between the user's point and the template's point -- in other words, between the "real" and the "ideal" point. We total this distance up for all 21 pairs of points in our sample... and we repeat this for every stored gesture template. The template that scores the lowest (i.e., has the lowest total distance from the user's input) is the winner!
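
And here's a sketch of that comparison loop, reusing point_at_fraction() from above and assuming the templates are kept in an associative list of name = points (again, just a guess at the storage, not the library's real API):

    // Compare a user-drawn gesture against every stored template and return
    // the name of the closest one.
    proc/best_match(list/user_points, list/templates)
        var/best_name
        var/best_score = -1
        for(var/name in templates)
            var/list/template_points = templates[name]
            var/score = 0
            for(var/i = 0, i <= 20, i++)
                // Distance between the "real" point and the "ideal" point at
                // 0%, 5%, 10%, ... 100% of the way along each gesture.
                var/list/u = point_at_fraction(user_points, i / 20)
                var/list/t = point_at_fraction(template_points, i / 20)
                score += sqrt((u[1] - t[1]) ** 2 + (u[2] - t[2]) ** 2)
            if(best_score < 0 || score < best_score)
                best_score = score
                best_name = name
        return best_name

So, calling something like best_match(rescale_to_grid(user_points), templates) would hand back the name of whichever template scored lowest against the user's input.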