In this article I’ll cover two regular expressions that often come in handy when writing validation logic for a Web page. Most programmers need to validate a URL or an email address at some point, and the example expressions below are commonly used versions of each.
I’m not sure who originally created these expressions, so I can’t credit the authors, but they have helped me often. The bottom line is that I like to start from the example expressions found in an article on NetTuts+/regular-expressions.info and modify them to meet my needs.
Since I’m mentioning the regular-expressions.info site, I definitely suggest browsing over to it if you are looking for further reading about regular expressions, including sample code and reference material.
Also, have a look at a free online regular expression tool by Derek Slager. You can use it to test your regular expressions if you don’t have a program on your computer to do so. I have spent quite some time looking for a good online regular expression tester, but every one I have found is either too expensive or full of advertising, so this tool is definitely worth checking out.
Regular Expression for Validating a Web site
For starters, you’ll want to set your expression to ignore case, or capital letters in a URL will cause it to fail. Alternatively, convert the URL to lower case before applying the expression.
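Both options can be sketched in Python. The pattern here is a tiny stand-in that only checks the scheme, not the full URL expression, and the names `SCHEME` and `url` are made up for illustration.

```python
import re

# Stand-in pattern for illustration only: a lowercase-only scheme prefix.
SCHEME = r"^https?://"

url = "HTTP://WWW.Google.COM"

# Option 1: tell the engine to ignore case.
with_flag = re.match(SCHEME, url, re.IGNORECASE) is not None

# Option 2: convert the input to lower case before matching.
with_lower = re.match(SCHEME, url.lower()) is not None

# With neither option, the uppercase scheme fails to match.
without_either = re.match(SCHEME, url) is not None
```

The flag is usually the gentler choice, since it leaves the original URL untouched for later use.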
To review the structure of the expression: the ^ symbol matches the start of the line and the $ symbol matches the end of the line. Together they make sure that extra text around a URL will not be accepted as part of a valid match.
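The effect of the two anchors is easy to see with a small experiment. This is a rough sketch in Python; the patterns are simplified stand-ins rather than the full URL expression, and the variable names are hypothetical.

```python
import re

# Illustrative patterns only; neither is the full URL expression.
unanchored = re.compile(r"[a-z]+\.com")
anchored = re.compile(r"^[a-z]+\.com$")

text = "visit google.com today"

# Without anchors, a valid-looking fragment inside other text still matches.
found_inside = unanchored.search(text) is not None

# With ^ and $, the whole input has to fit the pattern.
rejected = anchored.search(text) is None
accepted = anchored.search("google.com") is not None
```

One Python-specific detail: `$` also matches just before a single trailing newline, so `re.fullmatch` may be the safer call when input can end with a line break.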
The expression logically breaks address matches into four groups. Let’s take http://www.google.com/mysearch/android.html as an example; here is how the regular expression matches this URL:
- http://
- www.google
- .com
- /mysearch/android.html
- The first group matches the http:// portion of the Web site address. Since some addresses use https, the s? makes the s valid but optional. The ? at the end of this group makes the whole group optional, so a URL that leaves it out is still valid.
- The second group then matches the domain. In the case of our example this is www.google
- The third group then matches the domain suffix. The expression looks for the . and then allows the suffix as a series of at least two letters, but no more than six.
- The fourth group matches any remaining parts of the URL containing forward slashes, word characters, periods, and dashes.
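The four groups described above can be sketched in Python as follows. The pattern is reconstructed from that description, so treat it as an assumption rather than the exact expression; `URL_PATTERN` and `split_url` are hypothetical names.

```python
import re

# Reconstructed from the four-group description; an assumption, not
# necessarily the exact expression discussed in the article.
URL_PATTERN = re.compile(
    r"^(https?://)?"     # group 1: optional http:// or https://
    r"([\da-z.-]+)"      # group 2: domain, e.g. www.google
    r"(\.[a-z]{2,6})"    # group 3: a dot plus a suffix of 2 to 6 letters
    r"([/\w.-]*)$",      # group 4: slashes, word characters, periods, dashes
    re.IGNORECASE,       # ignore case, as recommended earlier
)


def split_url(url):
    """Return the four groups as a tuple, or None if the URL does not match."""
    match = URL_PATTERN.match(url)
    return match.groups() if match else None


print(split_url("http://www.google.com/mysearch/android.html"))
# → ('http://', 'www.google', '.com', '/mysearch/android.html')
```

A URL without the leading http:// still matches, with `None` reported for the first group, while something like `http://google` is rejected because it has no suffix.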
Regular Expression for an email address
This expression checks for valid email address syntax, such as email@example.com.
As with the URL validation expression above, this email expression uses the ^ symbol to match the start of a line and the $ symbol to match the end.
It’s worth noting that this expression is not case-sensitive and will match email addresses written in mixed case, because character classes such as [a-zA-Z0-9._-] list both lowercase and uppercase letters. You don’t need to tell your regular expression engine to ignore case.
- The first part of this expression ([a-zA-Z0-9._-]+@) captures the name for the email address. This can include alphanumeric characters, periods, underscores, and dashes.
- The second part of this expression then captures the domain component of the email address. In the example above it is example. The permitted domain characters include alphanumeric characters, periods, and dashes.
- The third part of the expression captures the domain suffix. This suffix may contain only letters and must be between two and four characters long.
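Putting the three parts together gives something like the following Python sketch. Only the first part is quoted in the text; the second and third parts are reconstructed from the description, so they are assumptions, and `EMAIL_PATTERN` and `split_email` are hypothetical names.

```python
import re

# Part 1 is quoted in the text; parts 2 and 3 are reconstructed from the
# description and are assumptions. No ignore-case flag is needed because
# the character classes list both lowercase and uppercase letters.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9._-]+@)"   # part 1: name, followed by the @ sign
    r"([a-zA-Z0-9.-]+)"      # part 2: domain
    r"(\.[a-zA-Z]{2,4})$"    # part 3: a dot plus a suffix of 2 to 4 letters
)


def split_email(address):
    """Return the three parts as a tuple, or None if the address does not match."""
    match = EMAIL_PATTERN.match(address)
    return match.groups() if match else None


print(split_email("email@example.com"))
# → ('email@', 'example', '.com')
```

A mixed-case address such as `Email@Example.COM` passes as well, while an address with a suffix longer than four letters, for example `email@example.museum`, is rejected by the length limit.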
These expressions are definitely very useful and come up time and time again in the tasks of any developer. I hope the descriptions of how they function are of use in deciphering the details of each of these two expressions. Please feel free to add a comment with your favorite expressions for matching an email or Web address.