Difference between revisions of "Regular Expressions"

From CSE330 Wiki
m (Created page with 'Good tool: http://www.debuggex.com/')
 
Line 1: Line 1:
 +
Is this string a phone number?  Is it an e-mail address?  Is it a URL?  How do I parse the time of day out of this date string?  How do I find all words in the string that contain the letter "a"?
 +
 +
The tool we use to answer these questions is '''regular expressions'''.  Regular expressions are a powerhouse for string matching.
 +
 +
This guide serves as an introduction to regular expressions.  However, there is enough to discuss about them that regular expressions could be their ''own class''.  We will only be scratching the surface so that you can get up and running with regular expressions and start to use them in your projects.
 +
 +
== Grouping ==
 +
 +
One use of regular expressions is when you want to extract information of a known format out of a string.  This is when ''groups'' come into play.
 +
 +
Are you ready to see your ''first'' regular expression?  Here we go:
 +
 +
{{RegexExample
 +
|regex=\b(\w)are\b
 +
|str=The tortoise and the hare took on a dare to find the rare Aare.
 +
|method=regexpal
 +
}}
 +
 +
This regular expression finds all words of the form "_are", where _ is a single word character (a letter, digit, or underscore), and captures that first character in a group.  Let's go through the parts of this regular expression.
 +
 +
* '''\b''' means "word boundary".  Without the two '''\b''' anchors, this regular expression would also match inside longer words, such as the "care" in "daycare" or the "pare" in "apparel".
 +
 +
== Testing Regular Expressions ==
 +
 +
{{RegexExample
 +
|regex=[\w\-\+\.]+@([\w\-]+(?:\.[\w]{2,4})+)
 +
|str=me@example.com
 +
|method=debuggex
 +
}}
 +
 
Good tool: http://www.debuggex.com/
 
 +
 +
== Using Regular Expressions in Programming Languages ==
 +
 +
You are learning three new programming languages in CSE 330: [[PHP]] (from Module 2), [[Python]] (from Module 4), and [[JavaScript]] (from Module 6).  The sections below show how to use regular expressions in each of these three languages.
 +
 +
The example is a function that tests whether a string matches a regular expression representing an e-mail address, and if it does, the function returns the domain name from the e-mail.
 +
 +
=== PHP ===
 +
 +
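The key function in PHP is '''preg_match($regex, $str, $matches)''', where ''$regex'' is the regular expression, ''$str'' is the string to test, and ''$matches'' is an optional argument, passed by reference, that is filled with the match results: ''$matches[0]'' holds the full match and ''$matches[1]'' holds the first captured group.  The function returns 1 if the pattern matches, 0 if it does not, and false on error.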
 +
 +
<source lang="PHP">
 +
function domain_from_email($str){
 +
if(preg_match("/^[\w\-\+\.]+@([\w\-]+(?:\.[\w]{2,4})+)$/", $str, $matches)){
 +
return $matches[1];
 +
} else return false;
 +
}
 +
</source>
 +
 +
=== Python ===
 +
 +
=== JavaScript ===

Revision as of 18:56, 23 March 2013

Is this string a phone number? Is it an e-mail address? Is it a URL? How do I parse the time of day out of this date string? How do I find all words in the string that contain the letter "a"?

The tool we use to answer these questions is regular expressions. Regular expressions are a powerhouse for string matching.

This guide serves as an introduction to regular expressions. However, there is enough to discuss about them that regular expressions could be their own class. We will only be scratching the surface so that you can get up and running with regular expressions and start to use them in your projects.

Grouping

One use of regular expressions is when you want to extract information of a known format out of a string. This is when groups come into play.

Are you ready to see your first regular expression? Here we go:

\b(\w)are\b

This regular expression finds all words of the form "_are", where _ is a single word character (a letter, digit, or underscore), and captures that first character in a group. Let's go through the parts of this regular expression.

  • \b means "word boundary". Without the two \b anchors, this regular expression would also match inside longer words, such as the "care" in "daycare" or the "pare" in "apparel".
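
To see the capture group and the word boundaries in action, here is a minimal Python sketch. It assumes Python's built-in re module; the sample sentence is an illustrative input, and the variable names are only for this example.

```python
import re

sentence = "The tortoise and the hare took on a dare to find the rare Aare."

# With one capture group, findall returns the captured text of each match,
# here the first character of every four-character word ending in "are".
first_letters = re.findall(r"\b(\w)are\b", sentence)
print(first_letters)  # ['h', 'd', 'r', 'A']

# Without the word boundaries, the pattern also matches inside longer words.
inside_words = re.findall(r"(\w)are", "daycare apparel")
print(inside_words)  # ['c', 'p']

# With the boundaries restored, those longer words no longer match.
whole_words_only = re.findall(r"\b(\w)are\b", "daycare apparel")
print(whole_words_only)  # []
```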

Testing Regular Expressions

[\w\-\+\.]+@([\w\-]+(?:\.[\w]{2,4})+)

Good tool: http://www.debuggex.com/
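
Alongside a visual tool, a short script can check a pattern against a handful of inputs. The sketch below is one possible way to do that in Python; the sample strings and the helper name captured_domain are hypothetical and chosen only for illustration.

```python
import re

# The unanchored pattern from this section; group 1 is the domain.
pattern = re.compile(r"[\w\-\+\.]+@([\w\-]+(?:\.[\w]{2,4})+)")

def captured_domain(text):
    """Return group 1 of the first match in text, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None

# Hypothetical inputs paired with the domain we expect (None means no match).
cases = [
    ("me@example.com", "example.com"),
    ("user@example.co.uk", "example.co.uk"),
    ("no at sign here", None),
]

for text, expected in cases:
    actual = captured_domain(text)
    status = "OK" if actual == expected else "MISMATCH"
    print(f"{text!r} -> {actual!r} ({status})")
```

Each domain label after the first is limited to 2 to 4 characters by {2,4}, so it is worth adding inputs with longer labels to see how the pattern behaves on them.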

Using Regular Expressions in Programming Languages

You are learning three new programming languages in CSE 330: PHP (from Module 2), Python (from Module 4), and JavaScript (from Module 6). The sections below show how to use regular expressions in each of these three languages.

The example is a function that tests whether a string matches a regular expression representing an e-mail address, and if it does, the function returns the domain name from the e-mail.

PHP

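The key function in PHP is preg_match($regex, $str, $matches), where $regex is the regular expression, $str is the string to test, and $matches is an optional argument, passed by reference, that is filled with the match results: $matches[0] holds the full match and $matches[1] holds the first captured group. The function returns 1 if the pattern matches, 0 if it does not, and false on error.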

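function domain_from_email($str){
	// A single-quoted string prevents variable interpolation inside the pattern.
	if(preg_match('/^[\w\-\+\.]+@([\w\-]+(?:\.[\w]{2,4})+)$/', $str, $matches)){
		// $matches[1] holds the first capture group: the domain name.
		return $matches[1];
	}
	return false;
}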

Python
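
A minimal sketch of the same example in Python, assuming the built-in re module. The function name mirrors the PHP version; the constant name EMAIL_RE is an assumption made for this sketch. Note that Python's \w matches Unicode word characters by default, which is somewhat broader than PHP's default.

```python
import re

# Same pattern as the PHP example, anchored so the whole string must match.
EMAIL_RE = re.compile(r"^[\w\-\+\.]+@([\w\-]+(?:\.[\w]{2,4})+)$")

def domain_from_email(s):
    """Return the domain of an e-mail address, or False if s does not match."""
    match = EMAIL_RE.match(s)
    if match:
        # group(1) is the first capture group: the domain name.
        return match.group(1)
    return False

print(domain_from_email("me@example.com"))  # example.com
print(domain_from_email("not an e-mail"))   # False
```

Compiling the pattern once with re.compile avoids re-parsing it on every call, although re.match(pattern, s) with a plain string would work as well.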

JavaScript