What would be the simplest way to allow certain tags?

BYOND Forums


ID:1441475 Dec 9 2013, 8:50 pm (See the best response by Ss4toby.)
Xirre

It's in the title. I have an idea on how to do it, but it's a really long one that I don't want to write, considering that I have approx 1000 more lines of code to do before a release is made. I was going to findtext each < and > and check if it is in the approved tags list.

Dec 9 2013, 9:46 pm

Nadrew

Pretty much what you said: if you only want to allow certain tags, you'll have to parse through the text to detect those tags and then go from there, replacing the stuff you don't want with the stuff you want.

I usually do it backwards: I'll html_encode() the string, then go through the allowed tags and replace the encoded version of each tag with the non-encoded version. It's a whole lot easier to include tags you want than to exclude the ones you don't want -- I imagine the ones you don't want outweigh the ones you do by a bit.
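A minimal sketch of that encode-then-restore idea, not Nadrew's actual code. The names allowed_tags, replace_all() and allow_tags() are made up for illustration, and the whitelist assumes bare tags with no attributes. It relies only on the built-ins html_encode(), findtext(), copytext() and length().

```dm
// Hypothetical whitelist: bare tags only, no attributes.
var/list/allowed_tags = list("<b>", "</b>", "<i>", "</i>", "<u>", "</u>")

// Hypothetical helper: replace every occurrence of needle in haystack.
// (BYOND 510 and later ship a built-in replacetext() that does the same job.)
proc/replace_all(haystack, needle, replacement)
    var/pos = findtext(haystack, needle)
    while(pos)
        haystack = copytext(haystack, 1, pos) + replacement + copytext(haystack, pos + length(needle))
        // Resume after the inserted text so a replacement is never re-scanned.
        pos = findtext(haystack, needle, pos + length(replacement))
    return haystack

proc/allow_tags(msg)
    // Step 1: neutralise every tag in the input.
    msg = html_encode(msg)
    // Step 2: turn only the whitelisted tags back into real HTML.
    for(var/allowed in allowed_tags)
        msg = replace_all(msg, html_encode(allowed), allowed)
    return msg
```

Calling allow_tags() on text that mixes bold tags with a script tag should leave the bold tags live and the script tag encoded. Since findtext() is case-insensitive, an upper-case version of an allowed tag is restored in the lower-case form from the list. Tags with attributes (links, for example) would need extra handling that this sketch does not attempt.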

Dec 10 2013, 6:56 am

Xirre

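mob/proc/removeTags(var/tx, var/list/tags)
    // tags holds bare tag contents without brackets, e.g. list("b", "/b", "i", "/i")
    // No sleep() here: one tick per character would take minutes on long text
    var/i = 1
    while(i <= length(tx))
        if(copytext(tx, i, i+1) == "<")
            var/close_pos = findtext(tx, ">", i+1)
            if(!close_pos)
                // Unterminated tag: drop it and everything after it
                tx = copytext(tx, 1, i)
                break
            var/check_tx = copytext(tx, i+1, close_pos)
            if(!(check_tx in tags))
                // Cut out the whole tag, then re-check the same position
                tx = copytext(tx, 1, i) + copytext(tx, close_pos+1)
                continue
        i++
    return tx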

I did that last night. My way versus your way: which one would be more beneficial, from your perspective, for text that has a limit of 5000 characters? I'm trying to condense code and processing time as much as possible.

Dec 10 2013, 7:18 pm

Best response

Ss4toby

Dang you Xirre. You never message me when it comes to these types of questions. The simple ones :p..

The best method I know of is to use findtext(). That way you only examine the <'s in the string and stop once there are none left. So for a 500-character text containing two <'s, the loop only has to handle two matches instead of stepping through all 500 characters.

Example of mentioned method:

var/allowedTags=list("<a","<b","<i","</i","</b","</a")

mob/verb/ReplaceTags(t as text)
    var/lSpot=0
    for(var/i=1 to length(t))//This is not necessary, but I like to use it as a fail-safe to bound the loop
        var/spot=findtext(t,"<",lSpot+1)
        if(!spot||spot<lSpot)
            break
        lSpot=spot
        //This will allow arguments for tags
        var/space=findtext(t," ",spot)
        var/greaterThan=findtext(t,">",spot)
        var/use=greaterThan
        if(space<greaterThan&&space>spot)
            use=space
        var/tag=copytext(t,spot,use)
        if(tag in allowedTags)
            continue
        if(spot==1)
            t="&#60;"+copytext(t,2,length(t)+1)
        else
            t=copytext(t,1,spot)+"&#60;"+copytext(t,spot+1,length(t)+1)
    src<<browse(t,"window=Output")

As for whether it would be best to manipulate the string directly or to use html_encode(), I am not entirely sure. Either way has its potential usefulness. However, as Nadrew said, by running everything through html_encode() and then letting only the allowed tags back through, you would essentially cut back on CPU usage, because you are likely to disallow more than you allow.