Regular Expressions

Clemens v Vogelsang on Flickr

This Code Snippets section of the magazine explains parts of programming that are common to most or all languages. It's a great way to understand how languages work. Learning how to code should be much easier and less frightening if I explain a part of a programming language, then show you how it works in a couple of different languages.

Let’s begin by showing you regular expressions, a complex but useful tool programmers can use. Regular expressions are also called regex or regexp for short.

In software programming, you might have a large chunk of data to work through, pulling out names and discarding other data. Regular expressions are one way to carefully go through data, character by character, to find data that matches what you want to find.

What is a Regular Expression?

A regular expression is simply a sequence of letters, numbers, and punctuation, written in a specific order, that forms a pattern your programming language uses to find matches. Every programming language interprets regular expressions a little differently. For people who like puzzles, though, creating regular expressions is an extremely fun challenge. It’s like cracking a cryptographic code.

Imagine this chunk of nonsense text:

LoremipsumdolorsitametconsecteturadipiscingelitEtiamnisivelitfringillaquisquamaeleifend
accumsanquamNunchendreritquamaloremimperdietneciaculisleoscelerisqueUttellusenimfringil
laquisfringillaatvenenatisvellacusInseddignissimrisusCurabitelvisualivervulputatearcuet
metushendreritposuereMauristempusnonlacusatullamcorperFuscelacusnequescelerisquesedsagi
ttisacblanditsedfelisSuspendissemaurislacusconsectetureumassaaportaaccumsanrisusDuisfeu
giatintortorfeugiatimperdietNuncjustonisicondimentumegetnullainluctussempeelvisraliveur
naVestibulumegetgravidanislvitaelaoreetfelisProinconsequatsemperipsumetelementumeratfri
ngillaaDonecetlaoreetlacusCurabiturluctusatliberoeuvehiculaMaurisinnelvisquejustoVestib
ulumultricesmiatinterdumfringillaAliquameratvolutpatProinnecelementumrisusconsecteturad
ipiscingmassaPhasellusacmassalectusDonecquammassaportaidleositametullamcorperlobortisen
imQuisquevitaenisitemportristiquelectusvitaetempornibh

How would you find Elvis in this mess? And is he alive?

According to one online regular expression editor (see links below), the expression /elvis/ run against this chunk of text will find three instances of Elvis spelled in lower case letters.
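As a rough check of that count, here is a minimal sketch in Python, chosen only as a neutral example language (the article itself does not use it). The variable name excerpt is an assumption for illustration. It holds the three lines of the nonsense text that contain elvis, and re.findall is Python's standard function for listing every match.

```python
import re

# Three lines copied from the nonsense text above (the ones that contain "elvis").
excerpt = (
    "laquisfringillaatvenenatisvellacusInseddignissimrisusCurabitelvisualivervulputatearcuet"
    "giatintortorfeugiatimperdietNuncjustonisicondimentumegetnullainluctussempeelvisraliveur"
    "ngillaaDonecetlaoreetlacusCurabiturluctusatliberoeuvehiculaMaurisinnelvisquejustoVestib"
)

# Python takes the pattern as a plain string, so there are no slash delimiters.
matches = re.findall(r"elvis", excerpt)
print(len(matches))  # 3
```

The search is case sensitive by default, so Elvis with a capital E would not be counted.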

In .NET (the framework behind languages such as C#), you may see the expression written as \belvis\b instead. Notice the differences? The two forward slashes in /elvis/ are delimiters. They are not part of the pattern; they only mark where the pattern starts (opens) and ends (closes). .NET takes the pattern as an ordinary string of text, so it needs no delimiters at all.

What about the \b? It is not a delimiter, and the backslash has nothing to do with Windows file paths. In regular expressions the backslash is an escape character that gives the next character a special meaning, and \b means a word boundary: the position between a letter, digit, or underscore and anything else, such as a space, punctuation, or the start or end of the text. So \belvis\b finds elvis only when it stands alone as a word. In our nonsense text, every elvis is buried inside a longer run of letters, so \belvis\b finds nothing there, while plain elvis finds all three.
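To see what \b does in practice, here is a small sketch in Python, used only as a neutral example language. In Python's re module, as in most regular expression flavors, \b is a word-boundary anchor. The names word_elvis, buried, and spaced are made up for this illustration.

```python
import re

# \b matches the position between a word character and a non-word character.
word_elvis = re.compile(r"\belvis\b")

buried = "Curabitelvisualiver"       # elvis surrounded by other letters
spaced = "they say elvis is alive"   # elvis standing alone as a word

buried_hit = word_elvis.search(buried)  # no word boundary next to elvis here
spaced_hit = word_elvis.search(spaced)  # the standalone word is found

print(buried_hit is None)  # True
print(spaced_hit.group())  # elvis
```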

In the nonsense text example, there are two instances of elvis followed by a single character followed by alive. We can find them by adapting our regular expression to /elvis.alive/, and the same pattern, elvis.alive, works in .NET. The single period tells the programming language that elvis and alive are separated by exactly one character, which can be any character except a newline (a special character used to mark the end of one line of data and the start of a new line of data). The longer .NET expression \belvis\b.*\balive\b is meant for ordinary text with spaces, such as "elvis is alive": each \b requires a word boundary, and .* allows any number of characters between the two words.
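The difference between a single period and a period followed by an asterisk can be sketched in Python (again, only an example language; the variable names are assumptions). The first search runs against two fragments copied from the nonsense text, the second against an ordinary sentence invented for the demonstration.

```python
import re

# Two fragments copied from the nonsense text, separated by a space.
excerpt = "Curabitelvisualiver sempeelvisraliveur"

# "." stands for exactly one character (anything except a newline).
one_char = re.findall(r"elvis.alive", excerpt)
print(one_char)  # ['elvisualive', 'elvisralive']

# ".*" stands for any number of characters, including none.
sentence = "elvis is definitely alive"
too_far = re.search(r"elvis.alive", sentence)   # more than one character in between
any_gap = re.search(r"elvis.*alive", sentence)

print(too_far)          # None
print(any_gap.group())  # elvis is definitely alive
```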

How Are Regular Expressions Used?

There are more useful and serious reasons to use regular expressions beyond finding Elvis. For example, regular expressions are used to confirm a phone number is in the correct format and contains only digits. You and I know ABC-DEF-GHIJ is not a phone number, but a programming language must be told what a phone number looks like: in the North American format, two groups of three digits followed by a group of four digits.
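A minimal sketch of such a check in Python might look like the following. It assumes the three groups are separated by hyphens, as in the example above; the function name looks_like_phone and the sample number are made up for illustration, and real phone validation usually has to allow more formats than this.

```python
import re

# Three digits, a hyphen, three digits, a hyphen, four digits.
PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")

def looks_like_phone(text):
    # fullmatch requires the whole string to fit the pattern,
    # so extra characters before or after the number are rejected.
    return PHONE.fullmatch(text) is not None

print(looks_like_phone("555-867-5309"))  # True
print(looks_like_phone("ABC-DEF-GHIJ"))  # False
```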

Zip or postal codes are another common use for regular expressions. Another use is search and replace: for example, using patterns to find URLs, swear words, or other unwanted data in online comments and then remove them.
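Search and replace can be sketched the same way. This Python example assumes a deliberately simple idea of a URL (http:// or https:// followed by anything up to the next space), and the function name strip_urls and the placeholder text are invented for the illustration. Patterns used on real comment systems tend to be more careful.

```python
import re

# "https?" matches http or https; "\S+" matches a run of non-space characters.
URL = re.compile(r"https?://\S+")

def strip_urls(comment):
    # Replace every match of the pattern with a fixed placeholder.
    return URL.sub("[link removed]", comment)

cleaned = strip_urls("Great post! Visit http://example.com/deals now")
print(cleaned)  # Great post! Visit [link removed] now
```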

Perl and Regular Expressions

Perl allows you to use almost any punctuation character to mark the start and end of an expression, as long as you put the letter m (for match) in front of it. Remember how the forward slash marks the start and end of our /elvis.alive/ expression? In Perl, we could write m!elvis.alive! or m@elvis.alive@ instead. What would be the reason to do it this way? Imagine this expression:

drive:/\path\to\file.dat/

Notice the /\ bit? It looks like a tent, doesn’t it? Now imagine these alternatives:

drive:!\path\to\file.dat!

drive:{\path\to\file.dat}

These are much easier to read and understand. However, these examples also illustrate what frustrates some people about the Perl language. It’s an extremely powerful programming language because it is so flexible, but remembering the two or five or ten different ways to do the same thing can be exhausting.

Go and Regular Expressions

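In the Go language, a pattern is written as an ordinary string with no slash delimiters, for example regexp.MustCompile("elvis.alive"). Parentheses inside a pattern do not mark its start and end. They create a capture group, which lets you pull out just the part of the match that falls inside them. A looser pattern for finding elvis might be e([a-z]+)s: an e, then one or more lowercase letters, then an s. But this also would find and return words like elephants and egrets, because the pattern matches anything that begins with e, ends with s, and has any combination of lowercase letters in between.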
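The over-matching is easy to see by running the pattern. The sketch below uses Python rather than Go, only to keep one example language throughout; the pattern is simple enough that it should behave the same way in both. The sample text and variable names are made up. In each match, group 0 is the whole match and group 1 is the part captured by the parentheses.

```python
import re

# Parentheses form a capture group around the letters between e and s.
pattern = re.compile(r"e([a-z]+)s")
text = "elvis, elephants and egrets"

whole = [m.group(0) for m in pattern.finditer(text)]  # full matches
inner = [m.group(1) for m in pattern.finditer(text)]  # captured middles

print(whole)  # ['elvis', 'elephants', 'egrets']
print(inner)  # ['lvi', 'lephant', 'gret']
```

A tighter pattern, such as the literal elvis, avoids the unwanted matches at the cost of flexibility.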

Regular Expressions in the Wild

As you can imagine, few if any programmers know all the possible ways to create a regular expression. When people need to use a regular expression in their code, they might use a textbook or have a plugin in their editing software or search online. For example, to confirm an email address is in the proper format, there are many examples online in forums and articles. There also are standalone regular expression editors.

Some programmers love the challenge of creating patterns with regular expressions and become very good at creating them. Most programmers, however, know the basics and use a tool to build the expression they need only when they need it.

Learn More

Regular Expressions

https://en.wikipedia.org/wiki/Regular_expression
http://www.it.cornell.edu/security/depth/practices/data_discovery/tools/regexes.cfm
http://www.regular-expressions.info/tutorial.html
http://www.regular-expressions.info/examples.html

Online Regular Expression Editor

http://www.regexr.com/

Examples of Regular Expressions in Perl

http://affy.blogspot.com/p5be/ch10.htm

Examples of Regular Expressions in Go

https://gobyexample.com/regular-expressions

Examples of Regular Expressions in JavaScript

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions

Examples of Regular Expressions in .NET

http://www.codeproject.com/Articles/9099/The-Minute-Regex-Tutorial
http://technet.microsoft.com/en-us/library/gg440701.aspx

Examples of Regular Expressions in Java

http://www.vogella.com/tutorials/JavaRegularExpressions/article.html