(Note: The function examples are in PHP. I haven't tested all of them, so if one doesn't work, it's probably something small that I missed.)

When crawling a site whose links use relative paths, you may need to cut off the end of your current URL so you can prepend what's left to the next link. Here's how you'd cut off the end (note that it doesn't work with URLs that use / as attribute delimiters, which is luckily rare):

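// The / inside the character class must be escaped too, because / is the pattern delimiter.
$base = preg_replace('/(?<=\/)[^\/]*$/', '', $input);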
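
To show how that fits into a crawler, here is a minimal sketch that strips the last path segment and prepends the result to a relative link. The helper name baseOf and the example URLs are made up for illustration, and # is used as the delimiter so the slashes need no escaping:

```php
<?php
// Hypothetical helper: drop everything after the last slash of a URL.
function baseOf(string $url): string
{
    // (?<=/) requires a slash just before the match; [^/]*$ is the final segment.
    return preg_replace('#(?<=/)[^/]*$#', '', $url);
}

$current = 'http://example.com/docs/guide/page.html'; // page being crawled
$link    = 'images/logo.png';                         // relative link found on it

echo baseOf($current) . $link, "\n";
// http://example.com/docs/guide/images/logo.png
```

One thing to watch: a URL with no path at all, such as http://example.com, loses its host name, so a real crawler would probably want to check for that case first.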

Match a tag and its content. (Note that this does not count nested tags, so the match ends at the first matching end tag and it can't be applied in all situations. However, it is more powerful than most tag-matching patterns you'll find on the net, in that it doesn't stop at an end tag that doesn't match the opening tag.):
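
A minimal sketch of a pattern with this behavior, assuming the tag name is captured in group 1 and back-referenced in the closing tag. The names $html and $m and the sample markup are illustrative only:

```php
<?php
// Illustrative input; $html and $m are names chosen for this sketch.
$html = '<div class="a"><span>hi</span> there</div><p>x</p>';

// (\w+) captures the tag name; \1 in the closing tag must match that same name,
// so the lazy .*? runs past </span> and stops at the first </div>.
// i = case-insensitive, s = let . match newlines as well.
if (preg_match('/<(\w+)[^>]*>.*?<\/\1>/is', $html, $m)) {
    echo $m[0], "\n"; // <div class="a"><span>hi</span> there</div>
    echo $m[1], "\n"; // div
}
```

Because nothing counts nesting, an input like <div><div>a</div>b</div> only matches up to the first </div>, which is the limitation described above.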


Match a tag and its content only if the tag has a certain attribute (here, class="myClass"):

/<(\w+)[^>]*\s+class\s*=\s*('|")?myClass(?(2)\2|)[^>]*>.*?<\/\1>/i

Match only the contents of the href="" in an anchor tag:
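
A minimal sketch of one way to do this, assuming double-quoted href values. It uses \K (discard everything matched so far) plus a lookahead so that only the URL ends up in the match; the sample markup and the names $html and $m are illustrative:

```php
<?php
$html = '<p>See <a class="ext" href="http://example.com/a.html">A</a> and <a href="/b">B</a>.</p>';

// <a\s          an anchor tag (the \s keeps <abbr> and friends out)
// [^>]*?\bhref  skip other attributes, lazily, up to href
// \K            forget what was matched so far, so the match starts at the URL
// [^"]*(?=")    the URL itself, followed by (but not including) the closing quote
preg_match_all('/<a\s[^>]*?\bhref\s*=\s*"\K[^"]*(?=")/i', $html, $m);

echo implode("\n", $m[0]), "\n";
// http://example.com/a.html
// /b
```

Single-quoted or unquoted href values are not handled here; they would need their own alternatives.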

I love RegEx; it's so freaking versatile. I use it mostly for parsing.

I notice your samples don't seem to make use of groups for capturing. Your last one appears to use lookaheads/behinds at least. Are you going to post later about capturing, and the difference between matches/groups/captures?

System.Text.RegularExpressions for the win!
lol, you might want to re-read these expressions:


/<(\w+)[^>]*\s+class\s*=\s*('|")?myClass(?(2)\2|)[^>]*>.*?<\/\1>/i

I capture once in the first one and twice in the second one. I couldn't use (?(x)) without capturing first :P
What I mean is, you aren't capturing anything useful for the programmer to make use of other than the tag name: the contents of the tag and the attribute value, for example. Also, I don't believe I've ever seen the (?(x) ) construct. How does that work?
Oh... Those examples were meant to match the whole thing, but good point.

I kind of avoided those because they require knowledge of the program using the RegEx instead of just the RegEx itself. I could probably whip up some examples showing that.
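
For PHP specifically, a minimal sketch of what capturing could look like: preg_match fills an array with the whole match plus each group, and named groups make the pieces easier to pick out. The group names (tag, class, content) and the sample markup are made up for this example, and it assumes a quoted class attribute:

```php
<?php
$html = '<div class="myClass">Hello <b>world</b></div>';

// Group 1 (tag): the tag name, reused as \1 in the closing tag.
// Group 2: the opening quote, reused as \2 to close the attribute value.
// Group 3 (class): the attribute value.  Group 4 (content): everything inside the tag.
$pattern = '/<(?<tag>\w+)[^>]*\sclass\s*=\s*(["\'])(?<class>[^"\']*)\2[^>]*>(?<content>.*?)<\/\1>/is';

if (preg_match($pattern, $html, $m)) {
    echo $m['tag'], "\n";     // div
    echo $m['class'], "\n";   // myClass
    echo $m['content'], "\n"; // Hello <b>world</b>
}
```

The same groups are also available by number ($m[1], $m[3], $m[4]), and $m[0] still holds the whole match.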

(?(x)yes|no) is a conditional: if capture group x took part in the match, the "yes" side is applied; if it didn't, the "no" side is applied. It's used in conjunction with the ? and ?? optional operators, since they allow a group to be skipped and left uncaptured.
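
Here is a small sketch of the conditional on its own, cut down to just the attribute part so the effect is easy to see. The helper name quotesBalanced is made up for the example:

```php
<?php
// Hypothetical helper: true when the class attribute's quotes are balanced.
function quotesBalanced(string $attr): bool
{
    // Group 1 optionally captures the opening quote.
    // (?(1)\1|) means: if group 1 captured something, require the same quote
    // again; otherwise match nothing.
    return preg_match('/^class\s*=\s*(\'|")?myClass(?(1)\1|)$/', $attr) === 1;
}

var_dump(quotesBalanced('class="myClass"')); // bool(true)
var_dump(quotesBalanced("class='myClass'")); // bool(true)
var_dump(quotesBalanced('class=myClass'));   // bool(true)
var_dump(quotesBalanced('class="myClass'));  // bool(false), closing quote missing
var_dump(quotesBalanced('class=myClass"'));  // bool(false), stray closing quote
```

Without the conditional you would need to spell out each quoting style as a separate alternative.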