RegEx of the day in Off Topic
(Note: Function examples are in PHP. I was too lazy to test a few of them, so if one doesn't work, it's probably something small that I missed.)
When crawling a site whose links use relative paths, you may need to cut off the end of your current URL so you can prepend what's left to the next link. Here's how you'd cut off the end (note that it doesn't work with URLs that use / as an attribute delimiter, which is luckily rare):
preg_replace('/(?<=\/)[^\/]*$/', '', $input);
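Here's a rough sketch of that in use. The helper name baseOf and both URLs are made up for illustration, and I've used # as the delimiter so the slashes don't need escaping:

```php
<?php
// Hypothetical helper: drop everything after the last slash of a URL.
// '#' is the delimiter here, so '/' needs no escaping inside the pattern.
function baseOf(string $url): string
{
    return preg_replace('#(?<=/)[^/]*$#', '', $url);
}

$current  = 'http://example.com/docs/page.html'; // made-up current URL
$relative = 'images/logo.png';                   // made-up relative link

echo baseOf($current) . $relative, "\n";
// http://example.com/docs/images/logo.png
```

One thing to watch: a bare host with no path, such as http://example.com, gets cut down to http:// because the last slash is the one in the scheme, so you'd probably want to special-case that.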
Match a tag and its content. (Note that this does not count nested tags, so the match ends at the first closing tag with the same name, which means it can't be applied in all situations. However, it is more powerful than most tag-matching patterns you'll find on the net in that it doesn't stop at a closing tag that doesn't match the opening tag.):
/<\s*([\w_][\w_-]*)[^>]*>.*?<\/\1>/i
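A quick sketch of how that behaves, including the nesting limitation. The function name firstTag and the sample HTML are made up, and the s modifier is my own addition (an assumption, not part of the pattern above) so that . also matches newlines inside the tag:

```php
<?php
// Hypothetical helper: return the first tag-plus-content match, or null.
// Same pattern as in the post, but with '#' as delimiter and an added
// 's' modifier (assumption) so '.' can cross line breaks.
function firstTag(string $html): ?array
{
    $pattern = '#<\s*([\w_][\w_-]*)[^>]*>.*?</\1>#is';
    return preg_match($pattern, $html, $m) === 1 ? $m : null;
}

// The </span> is skipped because it does not match the opening <div>.
$m = firstTag('<div id="a"><span>hi</span> there</div><p>next</p>');
echo $m[0], "\n"; // <div id="a"><span>hi</span> there</div>
echo $m[1], "\n"; // div

// Nested tags of the same name are not counted: the match stops early.
$n = firstTag('<div><div>x</div></div>');
echo $n[0], "\n"; // <div><div>x</div>
```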
Match a tag and its content only if the tag has a certain attribute (here, a class of myClass):
/<\s*([\w_][\w_-]*)[^>]*class\s*=\s*('|")?myClass(?(2)\2|)[^>]*>.*?<\/\1>/i
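A sketch of the attribute match in use. The pattern here assumes the tag-name group from the previous pattern comes first, so that \1 is the tag name and \2 is the optional quote; the function name firstWithClass and the sample HTML are made up, and the s modifier is again my own addition:

```php
<?php
// Hypothetical helper: first tag whose class attribute is exactly myClass.
// Group 1 = tag name, group 2 = optional opening quote. The conditional
// (?(2)\2|) requires the same closing quote only if one was opened.
function firstWithClass(string $html): ?array
{
    $pattern = '#<\s*([\w_][\w_-]*)[^>]*class\s*=\s*(\'|")?myClass(?(2)\2|)[^>]*>.*?</\1>#is';
    return preg_match($pattern, $html, $m) === 1 ? $m : null;
}

$m = firstWithClass('<p class="other">no</p><div class="myClass">yes</div>');
echo $m[0], "\n"; // <div class="myClass">yes</div>
echo $m[1], "\n"; // div
```

Because of the i modifier the class name is matched case-insensitively as well, and a tag with several classes (class="foo myClass") won't match, so treat this as a starting point rather than a general solution.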
Match only the contents of the href="" in an anchor tag:
/(?<=href=")[^"]+(?=")/i
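For what it's worth, here's a small sketch of pulling every href out of a page with that pattern, next to a capturing-group version that should give the same values. The function names and the sample HTML are made up for illustration:

```php
<?php
// Lookaround version: the whole match ($matches[0]) is the URL itself.
function hrefsLookaround(string $html): array
{
    preg_match_all('/(?<=href=")[^"]+(?=")/i', $html, $matches);
    return $matches[0];
}

// Capturing-group version: the URL is in group 1 ($matches[1]).
function hrefsGroup(string $html): array
{
    preg_match_all('/href="([^"]+)"/i', $html, $matches);
    return $matches[1];
}

$html = '<a href="/about.html">About</a> <a HREF="http://example.com/">Home</a>';

print_r(hrefsLookaround($html)); // /about.html and http://example.com/
print_r(hrefsGroup($html));      // the same two values
```

Both versions only handle double-quoted values with no whitespace around the equals sign, and neither checks that the href actually sits inside an anchor tag.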
|
I notice your samples don't seem to make use of groups for capturing. Your last one appears to use lookaheads/behinds at least. Are you going to post later about capturing, and the difference between matches/groups/captures?
System.Text.RegularExpressions for the win!