ID:1441475
 
(See the best response by Ss4toby.)
It's in the title.

I have an idea on how to do it, but it's a really long one that I don't want to write considering the fact that I have approx 1000 more lines of code to do before a release is made.

I was going to findtext each < and > and check if it is in the approved tags list.
Pretty much what you said. If you only want to allow certain tags, you'll have to parse through the text to detect those tags, then replace the stuff you don't want with the stuff you do.

I usually do it backwards, I'll html_encode() the string, then go through allowed tags and replace the encoded version of the html with the non-encoded version. It's a whole lot easier to include tags you want than to exclude the ones you don't want -- I imagine the ones you don't want outweigh the ones you do by a bit.
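A minimal sketch of that encode-first approach, for illustration only. The proc name allowTags and its arguments are hypothetical, and it assumes a BYOND version that has the built-in replacetext() (note that replacetext() is case-insensitive; replacetextEx() is the case-sensitive variant). It also assumes the allowed tags are passed as complete tags such as "<b>" and "</b>", so tags carrying attributes would need extra handling.

```dm
// Hypothetical helper: encode everything, then restore only the approved tags.
proc/allowTags(tx, list/tags)
	// Neutralise every tag in the input first.
	tx = html_encode(tx)
	for(var/tag in tags)
		// Encoding the allowed tag the same way gives the exact text to look for,
		// so there is no need to hard-code the entity names.
		tx = replacetext(tx, html_encode(tag), tag)
	return tx

mob/verb/TestAllowTags(t as text)
	// Only plain bold and italic tags survive; everything else stays encoded.
	src << browse(allowTags(t, list("<b>", "</b>", "<i>", "</i>")), "window=Output")
```

The loop here runs once per allowed tag rather than once per character of the input, which is why this tends to be the shorter route when the allowed list is small.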
mob/proc/removeTags(var/tx, var/list/tags)
	var/i = 1
	while(i <= length(tx))
		// Is the character at position i a "<"?
		if(findtext(tx, "<", i, i+1))
			var/end = findtext(tx, ">", i+1)
			if(!end) // unclosed "<": nothing left to strip
				break
			// Tag name without the angle brackets, e.g. "b" or "/b"
			var/check_tx = copytext(tx, i+1, end)
			if(!(check_tx in tags))
				// Cut out the whole tag, then re-check the same position
				tx = copytext(tx, 1, i) + copytext(tx, end+1)
				continue
		i++
	return tx


I did that last night. My way versus your way: which one would be more beneficial, from your perspective, for text with a limit of 5000 characters? I'm trying to condense the code and cut processing time as much as possible.
Best response
Dang you Xirre. You never message me when it comes to these types of questions. The simple ones :p..

The best method I know of is to use findtext(). That way you only examine the <'s in the string and stop once there are none left. For example, in a 500-character text that contains two <'s, the loop would only run twice.

Example of mentioned method:
var/allowedTags=list("<a","<b","<i","</i","</b","</a")

mob/verb/ReplaceTags(t as text)
	var/lSpot=0
	for(var/i=1 to length(t))//This is not necessary, but I like to use it as a fail-safe
		var/spot=findtext(t,"<",lSpot+1)
		if(!spot||spot<lSpot)
			break
		lSpot=spot
		//This will allow arguments for tags
		var/space=findtext(t," ",spot)
		var/greaterThan=findtext(t,">",spot)
		var/use=greaterThan
		if(space<greaterThan&&space>spot)
			use=space
		var/tag=copytext(t,spot,use)
		if(tag in allowedTags)
			continue
		if(spot==1)
			t="&#60;"+copytext(t,2,length(t)+1)
		else
			t=copytext(t,1,spot)+"&#60;"+copytext(t,spot+1,length(t)+1)
	src<<browse(t,"window=Output")


As for whether it would be best to manipulate the string directly or to use html_encode(), I am not entirely sure. Either way has its potential usefulness. However, as Nadrew said, by running everything through html_encode() and then restoring only the allowed tags, you would essentially cut back on CPU usage, because you are likely to disallow more tags than you allow.