Hey there. I posted a few months ago about this but figured it'd be best to make a new thread.
I'm trying to write a script that'll look through an HTML log, find tags that aren't closed, and close them. I've never really used BYOND's text manipulation procs, so I honestly don't even know what I'm doing.
I'm not concerned with actually putting the tags in order. I just want to make sure that for every opening b tag on a line there's a closing b tag as well, and the same for other tags.

-Create a list to use as a "stack", which allows us to track which tags are active, and in which order they were opened.
-Create several variables to track positions and boundaries within the line you're working on.
-Get the text and grab your first line. If you're storing all of the lines as one string, you'll have to search for "\n" with findtext() to identify the line boundaries, and store the position where the current line ends.
-Use findtext() to find the first "<" within the line, and then check if the next character is a "/" or not (I like to use text2ascii() for this kind of thing, to avoid a string copy). If there is no slash, then it's an opening tag. Otherwise it's a closing tag. Store which it is.
-Either way, you need to get the tag name. Write a proc (something like parseGetTag(string,start,end)) that returns the tag name. Basically, it should search from start (which should be the opening bracket "<") to end (end of line limit) to find the next ">". Search between the found "<" and ">" for any white-space characters (space, tab). The tag name ("b", "font", "strong") will run from the character after the opening bracket "<" (or after the "/" for a closing tag) to the first white-space, or to the closing bracket (">") if no white-space was found.
-For an opening tag, add it to the end of the list we're using as a "stack".
-For a closing tag, check if the tag name is in the list. If it is, loop from the END of the list towards the front until you find it. As you do so, add closing tags for all of the tags that you hit BEFORE it, and insert them before the existing closing tag you found (removing the tags from the list as you add them to the line). Then remove the matched tag from the list as well. If the closing tag's name isn't in the list at all, you can ignore it, encode it, or cut it out.
-When you reach the end of the line, add closing tags for any tags still in the list (from the END to the front, removing them as you go). Then move on to the next line.
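The steps above can be sketched roughly as follows. This is written in Python rather than DM purely so it is easy to run and check; str.find and slicing stand in for findtext() and copytext(). The names parse_get_tag, close_tags_in_line and fix_log are made up for illustration, and the sketch assumes every opening tag needs a closing tag (so void tags like <br> are not handled) and that stray closing tags are simply cut out.

```python
def parse_get_tag(line, start, end):
    """Return (name, close_pos) for the tag whose "<" is at index start.

    close_pos is the index of the next ">", or -1 if there is none
    before end. A leading "/" is skipped, so "</b>" yields "b".
    """
    close_pos = line.find(">", start, end)
    if close_pos == -1:
        return "", -1
    name_start = start + 1
    if line.startswith("/", name_start):
        name_start += 1
    name_end = name_start
    # The name runs up to the first space or tab, or up to ">".
    while name_end < close_pos and line[name_end] not in " \t":
        name_end += 1
    return line[name_start:name_end].lower(), close_pos


def close_tags_in_line(line):
    """Return the line with every opened tag closed (assumption: per line)."""
    stack = []  # names of open tags, oldest first
    out = []    # pieces of the repaired line
    pos = 0
    end = len(line)
    while True:
        lt = line.find("<", pos, end)
        if lt == -1:
            break
        name, gt = parse_get_tag(line, lt, end)
        if gt == -1:
            break  # no ">" left: treat the rest as plain text
        out.append(line[pos:lt])
        tag_text = line[lt:gt + 1]
        if not line.startswith("/", lt + 1):
            # Opening tag: push it onto the stack.
            stack.append(name)
            out.append(tag_text)
        elif name in stack:
            # Closing tag: first close everything opened after its match.
            while stack[-1] != name:
                out.append("</" + stack.pop() + ">")
            stack.pop()
            out.append(tag_text)
        # Otherwise it is a stray closing tag; this sketch cuts it out.
        pos = gt + 1
    out.append(line[pos:])
    # End of line: close whatever is still open, newest first.
    while stack:
        out.append("</" + stack.pop() + ">")
    return "".join(out)


def fix_log(text):
    """Apply the per-line repair to a whole log stored as one string."""
    return "\n".join(close_tags_in_line(l) for l in text.split("\n"))
```

For example, close_tags_in_line("<b><i>hi</b>") gives "<b><i>hi</i></b>", and a line with no tags comes back unchanged.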
That's the gist of it. Once you get the algorithm down it's not too bad, but the real trick is how you handle poor formatting and special characters. I'm not sure if you can count on non-HTML text (such as the emoticon >_>) to be encoded.
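One possible way to deal with unencoded text is a pre-pass that encodes any bracket that doesn't look like part of a tag. The sketch below is only a heuristic, again in Python for easy testing, and encode_stray_brackets is a made-up name. It assumes a real tag starts with "<" or "</" followed directly by a letter and has no other "<" before its ">". It will still misread things like "x<y>z" as a tag, and it doesn't cope with ">" inside attribute values.

```python
def encode_stray_brackets(line):
    """Encode "<" and ">" that do not belong to a plausible tag."""
    out = []
    pos = 0
    while pos < len(line):
        ch = line[pos]
        if ch == "<":
            gt = line.find(">", pos)
            # Skip a leading "/" so closing tags are recognised too.
            name_at = pos + 2 if line.startswith("/", pos + 1) else pos + 1
            looks_like_tag = (
                gt != -1
                and name_at < gt
                and line[name_at].isalpha()
                and "<" not in line[pos + 1:gt]
            )
            if looks_like_tag:
                # Copy the whole tag through untouched.
                out.append(line[pos:gt + 1])
                pos = gt + 1
                continue
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        else:
            out.append(ch)
        pos += 1
    return "".join(out)
```

Running this on each line before the tag-closing pass means the emoticons are already turned into entities by the time you go looking for "<".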