BYOND Forums - Off Topic - DM is somewhat unique - lexing/parsing issues with block structure



ID:54102

Feb 12 2009, 11:33 pm (Edited on Feb 13 2009, 12:43 am)

Keywords: bad, dm, flex, lexing, parsing

I've been continuing on with my attempts to write a parser for DM code, and there's this one block I keep hitting again and again - determining block structure.

I can handle this case:

That is, if it's entirely indented with tabs/spaces/whatever.

I can handle this case:

a
    b
    c/d
    e/f/g
    h

That is, indentation with mixed tabs and slashes, with nothing under the slash'd lines. But this?

Can't figure out how to do it. Everything I've tried so far can't parse 'g' correctly (Parses it as under 'e', rather than under 'f') or can't parse 'h' properly (Parses it as under 'e', not under 'a').

That construct is legal in DM, I'm certain, but I can't for the life of me figure it out. And because DM is the only language I know of (And I've done some searches about it) that uses indentation to indicate block structure but allows that use of slashes, I don't have any examples to crib off of.

I'm using flex and bison - I'm getting the lexer to figure out the block structure (So technically it's not a parsing issue, but whatever) and then passing definitions up to bison with the block level set in a global variable. Any ideas on how to handle it? This is the most correct I've got at the moment:

%{
        int depth = 0;
%}

%%

[\t ]                           depth++;
\n                              depth = 0;
\/                              depth++;

EDIT: Wait, no, I came up with a solution:

%{
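        #include <vector>

        /* Indentation state remembered for one previously seen line. */
        struct depth_t {
                int depth;      /* total nesting depth: whitespace plus slashes */
                int tabdepth;   /* nesting depth from leading whitespace only */
        };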

        int depth = 0;
        int tabdepth = 0;
        depth_t tempdepth;
        std::vector<depth_t> stack;
%}

%%

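[\t ]*                          {
                                        depth+=yyleng; tabdepth+=yyleng;
                                        /* the stack is empty until the first newline has been seen */
                                        if(!stack.empty()) {
                                                tempdepth = stack.back();
                                                if(tempdepth.tabdepth < tabdepth) {
                                                        /* indented further than the remembered line: nest under its full depth */
                                                        depth = tempdepth.depth + tabdepth - tempdepth.tabdepth;
                                                }
                                                else {
                                                        stack.pop_back();
                                                }
                                        }
                                }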

\n                              {
                                        tempdepth.tabdepth = tabdepth;
                                        tempdepth.depth = depth;
                                        stack.push_back(tempdepth);
                                        tabdepth = depth = 0;
                                }

\/                              depth++;

That appears to work, although I can't help but feel that I'm overcomplicating issues... And of course, it won't work with braces. That's easy, though.
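For comparison, here is a minimal sketch of the same bookkeeping as plain C++ outside of flex. It is not the lexer above, just one way the idea might be simplified: only the leading whitespace decides which block a line sits in, and slashes simply extend the path of that line. The function name resolve_paths is made up for illustration, and the sketch assumes every tab or space counts as one column, that paths have no leading slash, and that comments and trailing whitespace are already gone.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the full path defined by each non-blank line.
// Assumption: one whitespace character == one indentation column.
std::vector<std::string> resolve_paths(const std::vector<std::string>& lines) {
    // Each entry is (indent of a still-open line, full path that line defined).
    std::vector<std::pair<std::size_t, std::string>> open;
    std::vector<std::string> result;

    for (const std::string& line : lines) {
        std::size_t indent = line.find_first_not_of(" \t");
        if (indent == std::string::npos) continue;  // blank line, nothing to define

        // Close every open line that is indented as far as, or further than, this one.
        while (!open.empty() && open.back().first >= indent) open.pop_back();

        // Whatever is left on top is the parent; slashes on this line just extend it.
        std::string parent = open.empty() ? std::string() : open.back().second;
        std::string full = parent + "/" + line.substr(indent);

        open.emplace_back(indent, full);
        result.push_back(full);
    }
    return result;
}
```

Fed the lines a, then b, c/d and e/f one tab in, g two tabs in, and h one tab in, this should put g under /a/e/f and h under /a. The point of the design is that the stack only ever compares whitespace widths, so a slash'd line needs no special case: anything indented under e/f is attached to the whole path that line defined.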

EDITEDIT: That's a vector of depth_ts, of course. For some reason the <depth_t> got cut out, even though I've got <pre> tags around it. Is that a bug?

EDITEDITEDIT: More issues! You wouldn't believe it, but it took me this long to realise that / is also used for the division operator. That means block depth is going to have to be determined at the parser level. Crap. I think it'll mostly convert, but I'm not sure how I'll have the grammar distinguish between, for example:

a/b //Defines /a and /a/b

world/New()
    var
        a
        b
    a/b //Does nothing

without making the parsing much more complicated than I had originally envisioned. Ah well, such is life. I suppose I can just ignore everything with a tab-depth greater than a function... and I can tell it's a function because it has () at the end of it.
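A rough sketch of that "ignore everything under a function" idea, as plain C++ rather than flex/bison. The name strip_proc_bodies is made up for illustration, and the sketch assumes comments have already been stripped, that one whitespace character is one indentation column, and that any line ending in ')' is a function header, which is a heuristic rather than a real grammar rule.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: keeps only the lines that are not inside a function body.
// Assumption: a line whose last non-blank character is ')' starts a function.
std::vector<std::string> strip_proc_bodies(const std::vector<std::string>& lines) {
    std::vector<std::string> kept;
    bool in_body = false;           // true while skipping a function body
    std::size_t header_indent = 0;  // indent of the function header being skipped under

    for (const std::string& line : lines) {
        std::size_t indent = line.find_first_not_of(" \t");
        if (indent == std::string::npos) continue;  // blank line

        if (in_body) {
            if (indent > header_indent) continue;   // still inside the body, ignore it
            in_body = false;                        // back out at or above the header's level
        }

        kept.push_back(line);

        // indent != npos guarantees there is a last non-blank character.
        std::size_t last = line.find_last_not_of(" \t");
        if (line[last] == ')') {
            in_body = true;
            header_indent = indent;
        }
    }
    return kept;
}
```

With the example above, the a/b at the top level is kept as a definition, world/New() is kept as a header, and the indented a/b inside it is dropped before the path logic ever sees it, so the division case never has to be told apart from a path at this stage.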

Feb 13 2009, 3:05 am
Metamorphman	I believe in your case it actually IS meant to be parsed under f >.>

Feb 13 2009, 1:37 pm
Jp	Which one? I'm pretty sure I've got the correct idea of where each of those should be placed. Anyway, I've got a functioning lexer-parser combo that can do depth properly; next is to have it actually do something. Oh, and to lex quotes and comments properly.