As some of you may know, I've been fiddling with the idea of writing an open-source lexer and parser for DM for a while. My current goal is to be able to extract all of the object tree information, without necessarily being able to get all of the code as such (so object definitions, variable definitions, and procedure definitions, but not the bodies of procedures).

I recently decided to have another crack at it, armed with a new approach to handling block-structure-by-indentation granted to me by judicious reading of the Python grammar, as well as more formal education in grammar. I'm nowhere near finished yet - I'm not even sure if what I've got at the moment 'works' in any real sense, because I've not yet started dumping out the parse tree. But it does parse a test DM file without crashing out, so some sort of parse tree is being generated.
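For anyone who hasn't seen how the Python grammar handles block-structure-by-indentation: the tokenizer keeps a stack of indentation widths and emits INDENT and DEDENT tokens, so the grammar proper never has to look at whitespace. The following is only a rough Python sketch of that idea, not the flex code from this project. The function name indent_tokens and the token names are made up for illustration, and it assumes each tab or space counts as one column, which is a simplification.

```python
def indent_tokens(source):
    """Yield (kind, text) pairs, turning leading whitespace into INDENT/DEDENT.

    Hypothetical sketch: every leading tab or space counts as one column.
    """
    stack = [0]  # indentation widths of the blocks that are currently open
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines do not change the block structure
        width = len(line) - len(line.lstrip(" \t"))
        if width > stack[-1]:
            # Deeper than the enclosing block: open a new block.
            stack.append(width)
            yield ("INDENT", "")
        while width < stack[-1]:
            # Shallower: close blocks until we are back at a known level.
            stack.pop()
            yield ("DEDENT", "")
        if width != stack[-1]:
            raise IndentationError("dedent does not match any outer level")
        yield ("LINE", stripped)
    # Close whatever blocks are still open at end of input.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", "")


if __name__ == "__main__":
    sample = "mob\n\tvar/health\n\tproc/attack()\nobj\n"
    for token in indent_tokens(sample):
        print(token)
```

With tokens like these, a bison rule for a block can be written as INDENT, a list of statements, then DEDENT, much as a brace-delimited language would use { and }.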

I've been using flex and bison to do the lexing and parsing - what I've got right now is available here. I'll probably create a Google Code page or something for it at some point so I can get some sort of repository going.

Once again: this is seriously early development stuff. At the moment, it doesn't actually do anything while parsing - it just tries to verify that the input matches the grammar it's got for DM. It doesn't actually work out the object tree. Identifiers with escapes in them (like I was talking about in the last post) aren't handled, and neither is ? as an identifier. The ternary operator (? as an operator) isn't handled, nor are the post- versions of the ++ and -- operators (that is, ++a works, but a++ doesn't). Finally, there's a shift/reduce conflict caused by an ambiguity between two bits of code: a/b written on a single line, and a on one line followed by /b on the next.

That's because the parser doesn't know about newlines, and both options look the same without them (they look like 'identifier slash identifier', specifically). Dream Maker handles this difference, so it is doable. Presumably I need to add newlines as a token.
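To make the newline-as-a-token idea concrete, here is a small Python sketch of a lexer that keeps newlines instead of discarding them with the rest of the whitespace. It is only an illustration of the idea, not the project's flex rules: the token names, the regular expression, and the lex function are all assumptions made up for this example, and it only knows about identifiers, slashes, and whitespace.

```python
import re

# Hypothetical token set: just enough to show the a/b versus a\n/b case.
TOKEN_RE = re.compile(
    r"(?P<IDENT>[A-Za-z_]\w*)"
    r"|(?P<SLASH>/)"
    r"|(?P<NEWLINE>\n)"
    r"|(?P<SKIP>[ \t]+)"
)


def lex(text):
    """Return the list of token kinds in text, keeping NEWLINE as a token."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError("unexpected character at offset %d" % pos)
        if match.lastgroup != "SKIP":
            # Spaces and tabs are dropped; newlines are passed to the parser.
            tokens.append(match.lastgroup)
        pos = match.end()
    return tokens


if __name__ == "__main__":
    print(lex("a/b"))    # one path on a single line
    print(lex("a\n/b"))  # two separate lines
```

Once NEWLINE reaches the parser, the two inputs produce different token sequences, so the grammar can tell a single path apart from two consecutive lines.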

I don't expect anyone to find the code in its current state useful - it's mostly out there in case someone else is fiddling with parsing or lexing of DM.

If you do somehow manage to find it useful, feel free to go ahead and use the code for whatever you want. I place it in the public domain.

EDIT: The a/b versus a\n/b ambiguity has been dealt with via some trickery with the lexer, and the ? ternary operator and the ++/-- pre- and post-increment/decrement operators have been reinstated (and fixed). The only things still missing that I can think of right now are procedure calls in expressions, the call()() function, the . and : path search operators, and complicated procedure/verb parameters (i.e., the 'var as type in list' form).

EDIT: Proc calls in expressions, and complicated procedure/verb parameters added. Still missing path search operators and the call()() function.
Completely off-topic but I thought I'd bring your attention to

You will see why once you click the link.
Metamorphman wrote:
Completely off-topic but I thought I'd bring your attention to

You will see why once you click the link.

That was certainly odd.