Regular Expressions

From CSE330 Wiki
Revision as of 18:56, 23 March 2013 by Shane (talk | contribs)

Is this string a phone number? Is it an e-mail address? Is it a URL? How do I parse the time of day out of this date string? How do I find all words in the string that contain the letter "a"?

The tool we use to answer these questions is the regular expression, a powerful notation for matching patterns in strings.

This guide serves as an introduction to regular expressions. There is enough material on the topic to fill an entire course, so we will only scratch the surface: the goal is to get you up and running so that you can start using regular expressions in your projects.

Grouping

One use of regular expressions is extracting information of a known format from a string. This is where groups come into play.

Are you ready to see your first regular expression? Here we go:

This regular expression finds all words of the form "_are", where _ is an alphanumeric character, and captures that first letter in a group. Let's go through the parts of this regular expression.

  • \b means "word boundary". If we didn't have the \b's, then this regular expression would also match words like "daycare" or "apparel" or "arest".
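The expression being described is not shown in the text above. As a minimal sketch, assuming it has the shape \b(\w)are\b (a reconstruction from the description, not necessarily the original), the following Python snippet shows what the word boundaries and the group do. The sample sentence and variable names are made up for illustration. Note that \w also matches the underscore, not only letters and digits.

```python
import re

# Assumed pattern: word boundary, one captured word character, "are", word boundary.
pattern = re.compile(r"\b(\w)are\b")

# Hypothetical sample text.
text = "We care about bare facts, but daycare and apparel are different."

# With one capturing group, findall returns the group's contents for each match.
first_letters = pattern.findall(text)
print(first_letters)  # ['c', 'b']

# Without the \b anchors, the pattern also matches inside longer words.
unanchored = re.compile(r"(\w)are")
inside_words = unanchored.findall(text)
print(inside_words)  # ['c', 'b', 'c', 'p']
```

The extra 'c' and 'p' in the second result come from "daycare" and "apparel", which is exactly what the word boundaries prevent.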

Testing Regular Expressions

[\w\-\+\.]+@([\w\-]+(?:\.[\w]{2,4})+)

A good tool for testing regular expressions is Debuggex: http://www.debuggex.com/
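Besides an online tester, a pattern can be checked with a short script that runs it against inputs whose outcome you already know. The sketch below does this in Python for the e-mail pattern in this section, anchored with ^ and $ so that the whole string must match. The sample addresses and the names EMAIL, samples, and results are illustrative assumptions.

```python
import re

# The e-mail pattern from this section, anchored so the whole string must match.
EMAIL = re.compile(r"^[\w\-\+\.]+@([\w\-]+(?:\.[\w]{2,4})+)$")

# Hypothetical sample inputs paired with whether a match is expected.
samples = {
    "student@example.com": True,
    "first.last+tag@example.co.uk": True,
    "no-at-sign.example.com": False,
    "user@localhost": False,
    # Rejected because each dotted part after the first label is limited to 2-4 characters.
    "user@example.museum": False,
}

results = {}
for candidate, expected in samples.items():
    matched = EMAIL.match(candidate) is not None
    results[candidate] = matched
    status = "ok" if matched == expected else "UNEXPECTED"
    print(f"{candidate!r}: matched={matched} ({status})")
```

The last sample shows a limitation of this pattern: because of {2,4}, valid addresses with longer top-level domains are rejected.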

Using Regular Expressions in Programming Languages

You are learning three new programming languages in CSE 330: PHP (from Module 2), Python (from Module 4), and JavaScript (from Module 6). The sections below show how to use regular expressions in each of these three languages.

The example is a function that tests whether a string matches a regular expression representing an e-mail address, and if it does, the function returns the domain name from the e-mail.

PHP

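The magic function in PHP is preg_match($regex, $str, $matches), where $regex is the regular expression, $str is the string to test, and $matches is an extra argument that is filled with the array of matches: $matches[0] holds the text matched by the whole pattern, and $matches[1] holds the text captured by the first group. preg_match returns 1 if the pattern matches and 0 if it does not.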

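function domain_from_email($str){
	// Single quotes keep PHP from interpreting backslashes or $ in the pattern.
	$regex = '/^[\w\-\+\.]+@([\w\-]+(?:\.[\w]{2,4})+)$/';
	if(preg_match($regex, $str, $matches)){
		return $matches[1]; // group 1 is the domain after the @
	}
	return false; // not an e-mail address
}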

Python
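In Python, regular expressions live in the standard re module. The following is a minimal sketch of the same function as the PHP example, using re.compile and the match object's group method. The name EMAIL_RE is an assumption made for this example, and returning None where the PHP version returns false is a design choice, not a requirement.

```python
import re

# Same pattern as the PHP example; a raw string keeps the backslashes intact.
EMAIL_RE = re.compile(r"^[\w\-\+\.]+@([\w\-]+(?:\.[\w]{2,4})+)$")

def domain_from_email(s):
    """Return the domain of s if it looks like an e-mail address, else None."""
    match = EMAIL_RE.match(s)
    if match:
        return match.group(1)  # group 1 is the domain after the @
    return None

print(domain_from_email("student@example.com"))  # example.com
print(domain_from_email("not an e-mail"))        # None
```

Compiling the pattern once at module level avoids rebuilding it on every call. match.group(0) would return the whole matched address rather than just the domain.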

JavaScript