ID:167677
 
I'm finalizing my character creation system, and I need help writing a proc that cleans up text strings according to settings passed in as arguments, e.g. TextProcess(string, caps=0, singlespacing=1, punctuation=1, html=1, numbers=1)


Here's specifically what I want it to do, assuming each option in this list is set to 1:

caps- lowertext all caps.
singlespacing- For each group of spaces in the string, strip the group down to 1 space.
punctuation- Strip all punctuation characters.
html- Strip all HTML tags.
numbers- Strip all numeral characters.


I would like, if possible, a pre-written proc for this, or an explanation of how to do it.
caps- lowertext all caps.

Easy, use the lowertext() proc.

singlespacing- For each group of spaces in the string, strip the group down to 1 space.

The easy method would be to just keep looping until all instances of "  " are found (and replaced with " ").
Ie;
while(findtext(string, "  "))
    //Locate the "  "
    //Copy the text before and after the "  "
    //Set string to startText + " " + endText
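
The pseudocode above could be filled in roughly like this. It is only a sketch: collapse_spaces and its txt argument are made-up names for illustration, while findtext() and copytext() are the built-in procs.

```dm
// Hypothetical helper: collapses every run of spaces down to a single space.
proc/collapse_spaces(txt)
    // Position of the first pair of adjacent spaces, or 0 if there is none.
    var/pos = findtext(txt, "  ")
    while(pos)
        // Keep everything before the pair, put back one space,
        // then append everything after the pair.
        txt = copytext(txt, 1, pos) + " " + copytext(txt, pos + 2)
        pos = findtext(txt, "  ")
    return txt
```

Each pass removes one space from a run, so a run of three spaces takes two passes before the loop stops.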



punctuation- Strip all punctuation characters.

Ok, this would be pretty much the same thing but with two loops and a list of 'illegal' characters.
Ie;
var/list/L = list(".",",","!")
for(var/illegalChar in L)
    while(findtext(string, illegalChar))
        //Cut illegalChar out of the string.
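
One possible way to fill in the two-loop idea above, as a sketch only. strip_characters, txt and illegal are hypothetical names; the list of characters is whatever the caller decides to pass in.

```dm
// Hypothetical helper: removes every occurrence of each entry in `illegal`.
proc/strip_characters(txt, list/illegal)
    for(var/ch in illegal)
        var/pos = findtext(txt, ch)
        while(pos)
            // Join the text before the match to the text after it.
            txt = copytext(txt, 1, pos) + copytext(txt, pos + length(ch))
            pos = findtext(txt, ch)
    return txt
```

For example, strip_characters("Hi, there!", list(".", ",", "!")) should give back "Hi there". Note that findtext() ignores case, which does not matter for punctuation but would for letters.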



html- Strip all HTML tags.

This is a little more complicated. I believe Wizkidd wrote up a library for this specific purpose, so I recommend hunting that down.
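
If a library is not an option, a very naive version can be sketched with the same findtext()/copytext() approach. This is not the library mentioned above, just an assumption-laden illustration: strip_tags is a hypothetical name, and it treats anything between "<" and the next ">" as a tag, which will misfire on text like "1 < 2 > 0" or on tags whose attributes contain ">".

```dm
// Hypothetical helper: removes anything that looks like <...>.
proc/strip_tags(txt)
    var/start = findtext(txt, "<")
    while(start)
        // Look for the closing bracket at or after the opening one.
        var/stop = findtext(txt, ">", start)
        if(!stop)
            break // unmatched "<": leave the rest of the string alone
        txt = copytext(txt, 1, start) + copytext(txt, stop + 1)
        start = findtext(txt, "<")
    return txt
```

With this sketch, strip_tags("<b>Bob</b>") should come back as "Bob".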


numbers- Strip all numeral characters.

Same as the illegal characters loop, but with numbers. Just remember, 1 and "1" are not the same thing.
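
As an alternative to the find-and-cut loop, here is a sketch that rebuilds the string one character at a time and skips the digits. strip_digits is a hypothetical name, and the code assumes plain ASCII text. It also shows the point about 1 versus "1": the character "1" has the code 49, not the number 1.

```dm
// Hypothetical helper: returns txt without the characters "0" through "9".
proc/strip_digits(txt)
    var/result = ""
    for(var/i = 1 to length(txt))
        var/code = text2ascii(txt, i)
        // ASCII 48 through 57 are the digit characters "0" to "9".
        if(code < 48 || code > 57)
            result += ascii2text(code)
    return result
```

For example, strip_digits("R2D2 42") should return "RD " (the space is kept).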
In response to DarkView
The easy method would be to just keep looping until all instances of "  " are found (and replaced with " ").
Ie;

No idea how to do that.

Ok, this would be pretty much the same thing but with two loops and a list of 'illegal' characters.
Ie;

It would be very, very, very hard to make a list of all punctuation characters, need another solution.



I don't know how to find something in a string and cut it out from the string, and I think findtext() stored in a variable would only give a position.
I've never done much in terms of text handling.
In response to Artemio
Artemio wrote:
The easy method would be to just keep looping until all instances of "  " are found (and replaced with " ").
Ie;

No idea how to do that.

He showed you how; he gave an example right there. The same example works for replacing text or, with a one-line modification, for simply removing things (punctuation) altogether.
//This strips text out
var
    n; prefix; suffix
do
    n = findtext(base_string, "!")
    if(!n) break
    prefix = copytext(base_string, 1, n)
    suffix = copytext(base_string, n+1, 0)
    base_string = prefix + suffix
while(n)

To replace one thing with another, you could have just changed the second to last line there to
base_string = prefix + "@" + suffix


It would be very, very, very hard to make a list of all punctuation characters, need another solution.

That is understandable. After all, there are a lot of odd characters people can make with certain button combinations. I'll suggest another route: keep track of what is allowed, and then take out anything that's not on that list.
var/list/valid_characters[0]
world/New()
    var/i
    valid_characters.Add(32) //space
    for(i = 65 to 90) //A-Z
        valid_characters.Add(i)
    for(i = 97 to 122) //a-z
        valid_characters.Add(i)
proc/strip_invalid_characters(base_string)
    var
        n = 0; prefix = ""; suffix = ""
        i = 1; length = length(base_string)
    while(i <= length)
        n = text2ascii(base_string, i)
        if(!(n in valid_characters))
            //Cut the character at position i; the next one slides into its place, so i stays put.
            prefix = copytext(base_string, 1, i)
            suffix = copytext(base_string, i+1, 0)
            base_string = prefix + suffix
            length -= 1
            continue
        i += 1
    return base_string

Something along those lines.


I don't know how to find something in a string and cut it out from the string

copytext

, and I think findtext() stored in a variable would only give a position.

Which you could then use to tell where to splice.
In response to Loduwijk
Both of those examples cause infinite loops depending on the size of the string.
In response to Artemio
Maybe lengthy loops, but not infinite. Either way, however, I thought this was to process names. If someone has a name long enough to choke this out then you have a problem with more than just your algorithm.
In response to Loduwijk
Actually, no. Even my key takes over 5 minutes to process.
In response to Artemio
Oops, I did make a mistake.
if(n in valid_characters)
    i += 1
    continue

I forgot the i+=1 line in the original.
[edit]
Actually, no need for redundancy. I just moved i+=1 to after the n=text2ascii(base_string, i) line. See my original post where I edited the example.
In response to Loduwijk
Ehh, now it doesn't do anything to the string but return it the way it was sent in.
In response to Artemio
If you're using it on your key again, that'd be because there's nothing to change. Either way, it was just an example; if I did make some other mistakes which I am somehow overlooking, try to work them out.
In response to Loduwijk
I'm using it on a text string, so it's probably a mistake on your part. I hardly understand it, but I can try to work it out I guess.
In response to Artemio
Alright, I did make a few mistakes. I just went through and corrected them, and it works fine now.

Not often that I make blunders like that, and more than one in the same place. Sorry for any confusion it may have caused.
[edit]
Oh, and of course I changed it in the original post to reflect that.
In response to Loduwijk
Thanks, works perfectly now. One last question: what numbers represent these punctuation symbols- apostrophe, period, dash?

Edit: Using KeyState, I think I found the numbers. I added them to the valid_characters list and it still strips them. The numbers I used- 222,190,189.
In response to Artemio
http://www.lookuptables.com/

KeyState uses key codes, which are often but not always the same as ASCII codes.
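
For the three symbols asked about, the ASCII codes are 39 (apostrophe), 46 (period) and 45 (hyphen/dash). 222, 190 and 189 are keyboard key codes, which is why adding them to the list made no difference. Rather than looking numbers up, text2ascii() can supply them. A sketch, where allow_basic_punctuation is a hypothetical name and valid_characters mirrors the global list from the earlier example:

```dm
// Stand-in for the global whitelist used by strip_invalid_characters().
var/list/valid_characters = list()

// Hypothetical helper: whitelists apostrophe, period and hyphen.
proc/allow_basic_punctuation()
    valid_characters.Add(text2ascii("'")) // 39
    valid_characters.Add(text2ascii(".")) // 46
    valid_characters.Add(text2ascii("-")) // 45
```

Calling this once (for instance from world/New(), after the letters are added) should let those three characters through the filter.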