ID:2319365
 
Applies to: Dream Maker
Status: Open

When compiling, the compiler has to convert the text representation of a path into something it can work with.


/mob/living/carbon/human/proc/dostuff()


involves doing lookup(mob) -> lookup(living) -> lookup(carbon) -> lookup(human) -> lookup(proc) -> lookup(dostuff)


(That's a simplified version; the compiler likely has shortcuts for keywords like proc.)

The issue is that it has to repeat these lookups for every proc in a file, which is rather suboptimal and likely represents a significant part of compile time.

The idea I had was to have it store the last lookup operation and check that first before doing another.

As an even better version, you could store the entire resolved path as a list, and then check that, so you can reuse common nodes. (Say, if the next proc is a /mob/living/carbon/proc, it could reuse the /mob/living/carbon part from the cache.)
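A minimal sketch of that prefix-reuse idea, written in DM purely for illustration (the compiler itself obviously isn't written in DM, and lookup() here is a hypothetical stand-in for whatever node search it actually performs):

var/list/cached_segments = list()   // text segments of the last resolved path
var/list/cached_nodes = list()      // the node resolved at each depth

proc/resolve(list/segments)
    // keep the cached prefix that still matches the new path
    var/shared = 0
    while(shared < min(segments.len, cached_segments.len) && segments[shared + 1] == cached_segments[shared + 1])
        shared++
    // throw away the stale tail of the cache
    cached_segments.Cut(shared + 1)
    cached_nodes.Cut(shared + 1)
    // resolve only the remaining segments, caching each node as we go
    for(var/i in shared + 1 to segments.len)
        var/parent = cached_nodes.len ? cached_nodes[cached_nodes.len] : null
        cached_nodes += lookup(parent, segments[i])
        cached_segments += segments[i]
    return cached_nodes.len ? cached_nodes[cached_nodes.len] : null

Resolving /mob/living/carbon/human/proc/dostuff and then /mob/living/carbon/proc/otherstuff this way would only need two fresh lookups for the second path instead of five.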
Why not use shorter paths?
It's a valid feature request to speed up node parsing, but there are some things that ought to be said about the impetus behind the request, so that people following it might better understand the nuance of the issue you guys are experiencing.

Just want to throw out that DM's recommended syntax is to avoid absolute pathing except where explicitly required.

You guys' code standards are and have been creating a scenario where code is not only harder to read, it's much more costly to parse.

https://tgstation13.org/wiki/Understanding_SS13_code

mob/living/carbon // / separation is fine when you don't need to expand the node multiple times
    human
        var
            somevar
            someothervar
            list/somelist
        proc
            someproc()
            someotherproc()


I'm aware that your stated reason is searchability, so you can quickly find overrides versus the declaration. This is what comments are for. Creating a standardized, searchable comment format to mark declarations and overrides is a far better solution than slowing down the compiler: comments are discarded early in the compilation process and, generally speaking, aren't going to affect compilation performance nearly as badly as hundreds of thousands of unnecessary path navigation tokens, even if those are optimized.

mob/living/carbon // / separation is fine when you don't need to expand the node multiple times
    human
        var
            //@/mob/living/carbon/human/var/somevar
            somevar

            //@/mob/living/carbon/human/var/someothervar
            someothervar

            //@/mob/living/carbon/human/var/somelist
            list/somelist
        proc
            //@/mob/living/carbon/human/proc/someproc()
            someproc()

            //@/mob/living/carbon/human/proc/someotherproc()
            someotherproc()


Unfortunately, a lot of history with this problem predates even your involvement with SS13. This structure originated because the initial release of SS13's codebase was machine-decompiled into something very similar to the format you guys use today. A lot of people not familiar with DM's language paradigms (and indeed hostile to them) jumped aboard the goon codebase and assumed that this was just how DM was written.

The legacy of this way of writing code is actually particularly damaging, because people assume that DM is an inflexible, shitty language with really bizarre, unreadable syntax. I've personally re-taught how DM works to a number of relatively inexperienced SS13 coders, and they've later gone on to say that they were wrong about how shitty DM's syntax is. (Though they were no more impressed in general with other areas of the engine.)

In general, absolute pathing should ONLY be used for the purposes of grouping related code:

mob
    var
        corpse = /obj/corpse

    proc
        Die()
            new corpse(loc, src)
            Relocate(null)

    /obj/corpse
        New(atom/loc, mob/owner)
            appearance = owner.appearance
            icon_state = "corpse"
            ..()

    goblin
        icon = 'goblin.dmi'
        corpse = /obj/corpse/goblin

        /obj/corpse/goblin
            New(atom/loc, mob/owner)
                ..()
                new/obj/item/gold(src, rand(25, 50))
                if(prob(75))
                    new/obj/item/goblin_ear(src, rand(1, 2))
                if(prob(25))
                    new/obj/item/goblin_herb(src)
                if(prob(1))
                    new/obj/item/goblin_missive(src)


The above isn't a great example of why and when you should use absolute pathing, as a more abstract solution for handling parameterized loot would probably be preferable to this explicit one, but the example of the syntax is valid.

In DM, thanks to absolute pathing, tabs aren't just for direct inheritance. They can be used to signal to the reader that an object path is indirectly related to its parent path.


Overuse of absolute paths for the sake of searchability is indeed a situation your codebase is stuck with, but it was a bad standard to adopt and teach, and the more people are aware of the potential pitfalls of adopting that style, the better. The lack of discussion of alternatives to the paradigm in SS13 community resources hurts hundreds of individual contributors to boot, as the current paradigm you guys use is actively breeding incorrect assumptions and wasting their time with longer compilations than are strictly necessary.
The general ideology behind our standards is that code should be naturally self-commenting, and that comments, more often than not, are a sign of an issue in the code, not a solution.

This is (part of) why we require absolute pathing.

And while you say it's harder to read, we would (and do) argue otherwise. I can be linked a file on GitHub with a line-number seek built into the URL, look at the code, and know exactly what the context is.

And while most files will only have one main context, other files will have many, and under relative pathing it becomes hard, when scrolling fast and skimming through files, to know what context you are in, having to scroll back up until you find the root node. Trying to strictly limit files to only contain one context is also a bad idea (an idea so bad Java actually enforces it, so that should tell you how bad it is =P).

So, the solutions are to either add a comment to state the context of a proc, or let the proc's definition naturally define the context.


When given a choice between adding a comment, or writing the code in such a way that you can infer the same information, /tg/ will always choose the latter.


It's why we banned magic numbers, requiring they be put behind defines (and why we have so many fucking defines). It makes the code self-commenting by giving a meaning behind the number.
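For anyone unfamiliar, a toy example of that rule (names hypothetical):

#define SHOCK_DAMAGE 20 // the name carries the meaning the bare number lacks

/mob/living
    var/health = 100

    proc/shock()
        health -= SHOCK_DAMAGE // reads as intent, not as a mystery 20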
Throwing away the type information on every proc/var and putting it in comments because BYOND's parser is poorly put together is stupid.
IMO it's not a bad ideology overall. I just think it's misguided, given that it is forcing you to adopt a programming style that the machine itself is screaming is wrong.

The reason that spatially organized contexts exist is not for the machine's benefit, though. It is for yours. High-level programming languages are explicitly designed to be read and understood by the programmer. Comments are simply one tool for communicating meaning and a useful (and common) way of making code more searchable.

Your code should ideally be grouped in such a way that the context is inferred by its placement.

Java went too far with forced limited context. Sometimes polymorphism isn't the best indicator of relatedness of code.

However, claiming that code that doesn't need documentation is better than code that does is spurious at best. Documentation exists to make finding your way through a massive project easier. Sometimes you don't have the brainpower to work out how; you just need to know why/what something does at a glance.

And honestly, absolute pathing every single line for readability's sake is an indefensibly ugly practice that excessively bloats compilation times.

Defending it in the face of a better solution is just absurd, when those comments I mentioned in the above post directly inform you of the scope without bloating compilation times, and can be set to a different color in the editor, allowing developers to instinctively skip over them at a glance if they don't need to look at them.

It's just a better solution, hands down, in every single meaningful way for both the machine and the human.


And it isn't the parser being poorly put together that's the root of the problem. It's that you are jamming hundreds of thousands of unneeded tokens into the machine and then asking why your compilation is slow. Even if it is optimized, it is going to be slower than the recommended syntax in the whitepapers.
Well, maybe I'm just spoiled by languages whose parsers don't suddenly choke to death because I put the full type path in the proc definition.

MSO's argument is not a position I hold; comments are absolutely useful with care, and often with abandon as well.

I just find it annoying that BYOND's parser would even care at all how I define procs.

Side note: we don't absolutely path every single thing, just the procs and class definitions.
I think one of the big disconnects between /tg/ and Ter is that /tg/ has had to prioritize the ability to understand a 500-line file without having to know or read through the other 400,000 lines (not even exaggerating that number).

With 424 developers over the years, new ones coming in every week, and a requirement that code actually be reviewed by maintainers (since we get a lot of first-time devs at /tg/), it changes how we view things like this. In the last 30 days we've had ~500 pull requests touching ~85 thousand lines of code by ~75 people.

Keeping shit standardized means it has to break at compile time for typos, not at runtime.

Adding comments for every proc wouldn't be feasible, and keeping them to a standard format wouldn't be feasible either. Not to mention what happens when we can't find something because of a typo in the comment.
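To make the compile-time point concrete (reusing the hypothetical names from the examples above, and assuming someproc() is declared on /mob/living/carbon/human): with absolute pathing, a typo in the path of an override is a compile error, while the same typo in a searchable comment compiles fine and just silently breaks searching.

// absolute pathing: "humann" creates a fresh type with no such proc, so this fails to compile
/mob/living/carbon/humann/someproc()
    ..()

// comment format: the typo lives in a comment, so the compiler never sees it
mob/living/carbon/human
    proc
        //@/mob/living/carbon/humann/proc/someproc()
        someproc()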
No, I get where you guys are coming from.

It's just that you guys are stuck in a position of post hoc reasoning.

You guys all walked into a project that was this way, so this way it must be, and this way justifies itself.

It's just that it doesn't, and the reasoning being done to support it is fallacious. That said, if the compiler can be sped up, I'm all for it.
It justifies itself for us for the reasons we have already laid out about searching and that's all we care about.

Our reasoning is not fallacious just because you disagree with it, especially when your way involves coming up with a custom comment format and then enforcing that across our contributors.

It's reasonable to request Lummox to improve the parser to work better in a case widely supported by other languages.
TBH I still don't understand at all why you don't just switch over to the non-absolute commented form, even if just a file at a time. There's no reason the change has to be made all at once.

Ultimately, more tokens equals longer compilation, and there's no way around that for the parsing portion at all. The current format you're using is not parser-friendly. Skipping over comments, on the other hand, is quick.

As for the optimization you described, I put a question mark on its feasibility. This would presumably happen during the "generation" step of the compiler, post-parsing but before code output, where nodes are looked up and translated into an object form. Some research is required to examine the issue. I may have an inkling as to how it might be done, but right now that's all it is. It's an intriguing enough problem that I can definitely look into it.

But again, Ter is 100% correct and the ultimate solution to your problem is to start refactoring the code, gradually, towards a new format. As refactors go it's a very easy one to take on. That will not only save you the current generation time that the proposed optimization would take care of, but also the additional parsing time. The more code gets converted into the friendlier format, the better off compilation times will be regardless of whether this proposal sees the light of day.
the ultimate solution to your problem is to start refactoring the code, gradually, towards a new format.

Can I offer an alternative solution?

A handful of guys writing a parser to auto-convert large chunks of your code and patch issues caused by the conversion process, a few handles of whiskey, some pizzas, and this feature request:

http://www.byond.com/forum/?post=2319377


The above would solve all problems. Searchability would be achieved by the object tree actually generating all nodes properly, and the conversion of your absolute pathing foo is better achieved by spending a week to write a program to do the grunt work for you as opposed to ruining a programmer's life forever by subjecting them to the repetitive task until they commit suicide.
We don't switch because we prefer the other form for the reasons we have suggested, and we're not interested in changing it.

I guess we'll just write yet another precompiler step to unpath everything to work around BYOND when our compile times become unbearable.
My plan atm is to look at injecting an intercept into the compile process so we can edit what code BYOND sees without having to write a second version to disk. Since everything is in byondcore.dll and dm.exe seems pretty simple, it should be pretty easy.


From there we can fix a few things. One, we can change the perceived file name to fix the bug where runtime errors only print the filename without the folder it's in, so you don't know where runtimes are in files with the same name. We can reduce compile time by converting to relative pathing inline, and very likely reduce compile time again by rolling our own macro and define dereferencing, since I doubt the way BYOND is doing it is performant (as well as fix the bug where you can't use a define in a #error statement, so we can't tell people what version of BYOND they need to get).

And it isn't the parser being poorly put together that's the root of the problem. It's that you are jamming hundreds of thousands of unneeded tokens into the machine and then asking why your compilation is slow. Even if it is optimized, it is going to be slower than the recommended syntax in the whitepapers.

A few extra tokens per path shouldn't be of consequence; the lexer is definitely built improperly. If, hypothetically, it was of consequence, a smart way to cache the tokens generated from each file would fix that. My BYOND replacement project does just that, and there's no "context switching" cost.

My problem with you, Ter, is the delusions and widespread lies you propagate by defending Lummox's work, and how you erroneously argue that it's acceptable. I'm not the only one that's pissed over it. Anyone that has tried to level with Lummox, or with an adamant supporter of his, is fed up with the lies and excuses used to justify how things are over how they should be.



Quite frankly, I don't see how MrStonedOne has tolerated you. Most of his suggestions, rather than having been accepted, have been met with either stupid consolations or an "it's not our fault, it's yours". The latter, in my eyes, is extremely offensive.

You have made it so easy for me to persuade the leaders of those keeping this awful business alive to defect and take their products elsewhere. Thank you.

Since this message is probably "shadow muted", I'm just going to say I went ahead and sent MrStonedOne a request on Discord, and in time, he'll see the light of things, I'm sure. I've contacted many other branches of SS13 as well, and they're on board to switch. Your lie is over, Lummox. I will make sure of it.

Ultimately, more tokens equals longer compilation, and there's no way around that for the parsing portion at all. The current format you're using is not parser-friendly. Skipping over comments, on the other hand, is quick.

^Like This
You will no longer lie at others' expense. Enjoy the beginning of your forced cessation, you fat, inept con artist. The next time you hear from me, you will be exposed and your income will bleed dry.
In response to Quentina

My problem with you, Shane, is that I don't have one anymore. I do my best to not think about you at all. It's old, and it's sad. Seek professional help. I'm sorry that this is the totality of your character, and that your experience interacting with every community you've ever been a part of has been poisoned by your own hand. Seek professional help. You are not a well person.
I mean, to be clear, Ter, you do offer a solution.

I don't like that:
A) its correctness isn't enforced by the compiler, which means we have to manually enforce it
B) it requires us to use proc names twice

Having the full path on every proc/class is one of the few things that I like about byond precisely because it makes using grep to search the codebase so intuitive
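For example (hypothetical path and directory), finding a declaration is a single command:

grep -rn "/mob/living/carbon/human/proc/someproc" code/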

I'd rather Lummox investigate whether there are any simple tweaks to be made to the parser to reduce the overhead than refactor our entire codebase yet again.
I just did some testing on my end. I generated a project with 7K unique datum subtypes and 52 unique procs per root type.

I generated one in relative pathing format, and one in absolute pathing format.

There was almost no difference in compilation time between the two. We're talking a difference of at most 4%.

The only difference between the two approaches is file size. I'm baffled by these results, but @MSO/Optimumtact, it looks like you guys don't have to waste your time trying to convert SS13.

My apologies for making assumptions and then only now testing them. I'm glad I did it now though, as this is the first time we've really acknowledged this point in public as more than just hearsay. Really sad to know I was wrong, but at the same time, happy to know the truth of the matter.

My tests indicate that object tree generation is actually the part of compilation that takes 60% of the total compilation time, so optimization would actually be best targeted there rather than at the compilation phase itself.

Though, this could be totally wrong, given the particulars of how I generated my project.
I did a smattering of testing on this on Friday, and from what I've seen so far it appears that the potential place to do this optimization that I had in mind might be moot. That is, it appears that after parsing and before object generation (not the object tree), the nodes have already been properly collapsed.

Which means the lookup would have to be happening in the parsing stage, but probably at a point in that stage where it's more of a pain than it's worth to intervene.
If it's happening that early in the process, then it's not likely to be a factor in compile times.


90% of all compile time on /tg/ is after "loading interface.dmf" but before "loading mapname.dme" (even though they are higher up on the include graph).

I don't know, but I'd assume that this would be after parsing?

In fact, one major help would be more compile-time profiling; I'll put up an FR.