Summary: in this tutorial, you will learn about Perl regular expressions, one of the most powerful features of the Perl programming language.
A regular expression is a pattern that provides a flexible and concise means of matching strings of text. A regular expression is also referred to as a regex or regexp.
A regular expression can be either simple or complex, depending on the pattern you want to match.
Basic matching
The following illustrates the basic syntax of regular expression matching:
string =~ regex;
The operator =~ is the binding operator. The whole expression returns a value to indicate whether the regular expression regex was able to match the string successfully.
Let’s take a look at an example.
First, we declare a string variable:
my $s = 'Perl regular expression is powerful';
Second, to find out whether the string $s contains the substring ul, you use the following regular expression:
$s =~ /ul/;
Putting it all together:
#!/usr/bin/perl
use warnings;
use strict;
my $s = 'Perl regular expression is powerful';
print "match found\n" if( $s =~ /ul/);
The output is:
match found
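Because the match expression returns a value, its result can also be stored in a variable and tested later. The following is a minimal sketch of that idea; the variable names $has_ul and $has_xyz are chosen only for illustration:

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $s = 'Perl regular expression is powerful';

# In scalar context, the match returns a true value on success
# and a false value on failure.
my $has_ul  = $s =~ /ul/;
my $has_xyz = $s =~ /xyz/;

print "ul:  ", ($has_ul  ? "found" : "not found"), "\n";
print "xyz: ", ($has_xyz ? "found" : "not found"), "\n";
```

Here the first line reports found and the second reports not found, because the string contains ul but not xyz.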
To check that a string does not match a given regular expression, you use the negated form of the binding operator (!~). The following example uses the negation to find all strings in an array that do not match the regular expression /er/:
#!/usr/bin/perl
use warnings;
use strict;
my @words= (
'Perl',
'regular expression',
'is',
'a very powerful',
'feature'
);
foreach(@words){
print("$_ \n") if($_ !~ /er/);
}
And the output is:
regular expression
is
feature
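As an alternative sketch, the same filtering can be written with Perl's built-in grep function, which collects the non-matching strings into a new array instead of printing them one by one. The array name @no_er is an assumption made for this illustration:

```perl
#!/usr/bin/perl
use warnings;
use strict;

my @words = ('Perl', 'regular expression', 'is', 'a very powerful', 'feature');

# grep keeps the elements for which the block is true;
# $_ is set to each element in turn.
my @no_er = grep { $_ !~ /er/ } @words;

print "$_\n" for @no_er;
```

This prints the same three strings as the loop above: regular expression, is, and feature.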
If you want to match a pattern that contains a forward slash (/) character, you have to escape it with a backslash (\) character. Alternatively, you can choose a different delimiter by preceding the regular expression with the letter m, which stands for match.
Let’s take a look at the following example:
#!/usr/bin/perl
use warnings;
use strict;
my @html = (
'<p>',
'html fragment',
'</p>',
'<br>',
'<span>This is a span</span>'
);
foreach(@html){
print("$_ \n") if($_ =~ m"/");
}
How it works:
- First, we declared an array of strings that contain HTML code.
- Second, we looped over the elements of the array and printed each element that contains at least one forward slash (/). Notice that we prefixed the regular expression with the letter m and used double quotes as its delimiters.
The following shows the output of the program:
</p>
<span>This is a span</span>
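To compare the two approaches side by side, here is a small sketch that matches the same pattern three ways: with escaped slashes, with braces as delimiters, and with hash signs as delimiters. The sample string in $path and the variable names are hypothetical:

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $path = '/usr/bin/perl';   # hypothetical sample string

# Default delimiters: every forward slash must be escaped.
my $escaped = $path =~ /\/usr\/bin/;

# Alternative delimiters after m: no escaping is needed.
my $braces = $path =~ m{/usr/bin};
my $hashes = $path =~ m#/usr/bin#;

print "all three forms match\n" if $escaped && $braces && $hashes;
```

All three expressions describe the same pattern; the alternative delimiters simply make patterns with many slashes easier to read.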
Matching case-insensitively
Let’s take a look at the following example:
#!/usr/bin/perl
use warnings;
use strict;
my $s = "Regular expression";
print "match" if $s =~ /Expression/;
You might expect the program to print “match”. However, it prints nothing, because the string $s does not contain the word Expression with an uppercase E; it contains expression with a lowercase e.
To instruct Perl to match a pattern case-insensitively, you add the modifier i after the closing delimiter, as in the following example:
#!/usr/bin/perl
use warnings;
use strict;
my $s = "Regular expression";
print "match\n" if $s =~ /Expression/i;
Now the program prints match, as expected.
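To see the effect of the i modifier more directly, the following sketch runs the same pattern with and without it against three spellings of one word and counts the matches. The sample data and the counter names are assumptions made for this illustration:

```perl
#!/usr/bin/perl
use warnings;
use strict;

my @inputs = ('expression', 'Expression', 'EXPRESSION');   # sample data

my ($sensitive, $insensitive) = (0, 0);
foreach my $text (@inputs) {
    $sensitive++   if $text =~ /Expression/;    # exact case only
    $insensitive++ if $text =~ /Expression/i;   # any mix of cases
}

print "without i: $sensitive of 3 match\n";
print "with i:    $insensitive of 3 match\n";
```

Without the modifier only the exact spelling Expression matches, so the first count is 1; with the modifier all three spellings match.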
Perl regular expression with quantifiers
In the previous examples, we created regular expressions by simply putting the characters we wanted to match between a pair of slashes. What if you want to match the same character several times in a row? You could quickly write something like:
/aaa/
How about 100 times or more? Fortunately, the regular expression engine provides quantifiers for building such patterns. For example, to match 100 consecutive a characters, you could write:
/a{100}/
The following table provides some useful quantifiers:
Quantifier | Meaning |
---|---|
A* | Zero or more A |
A+ | One or more A |
A? | Zero or one A (A is optional) |
A{10} | Exactly ten A |
A{1,5} | From one to five A |
A{2,} | Two or more A |
Let’s take a look at the following example:
#!/usr/bin/perl
use warnings;
use strict;
my @words = ("available", "avatar", "avalon");
foreach(@words){
print $_, "\n" if(/a*l+/);
}
The regular expression /a*l+/ means zero or more a followed by one or more l. Because a* can also match zero a characters, the pattern matches any string that contains at least one l. Therefore, the output is:
available
avalon
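The other quantifiers from the table can be tried in the same way. The following sketch places the quantified character between literal characters so that the effect of each quantifier is easy to see; the word list and the array names are made up for this illustration:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical sample words, some of them deliberately misspelled.
my @words = ('color', 'colour', 'colouur', 'gogle', 'google', 'gooogle');

my (@optional_u, @two_or_more_o, @one_or_two_o);
foreach my $w (@words) {
    push @optional_u,    $w if $w =~ /colou?r/;      # u is optional
    push @two_or_more_o, $w if $w =~ /go{2,}gle/;    # two or more o
    push @one_or_two_o,  $w if $w =~ /go{1,2}gle/;   # one or two o
}

print "colou?r:    @optional_u\n";
print "go{2,}gle:  @two_or_more_o\n";
print "go{1,2}gle: @one_or_two_o\n";
```

The first pattern accepts color and colour, the second accepts google and gooogle, and the third accepts gogle and google.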
By now, you have probably noticed that the regular expression engine treats some characters in a special way. These characters are called metacharacters. The following are the metacharacters in Perl regular expressions:
{}[]()^$.|*+?\
To match the literal version of those characters, you have to put a backslash (\) in front of them in the regular expression.
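As a small sketch of why escaping matters, the following program compares an unescaped pattern, in which the dot matches any character, with an escaped pattern that matches a literal dollar sign and a literal dot. The price strings and the array names are hypothetical:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Single quotes keep $5 from being interpolated as a variable.
my @prices = ('$5.00', '$5000', 'USD 5.00');

my (@unescaped, @escaped);
foreach my $p (@prices) {
    push @unescaped, $p if $p =~ /5.00/;       # . matches any character
    push @escaped,   $p if $p =~ /\$5\.00/;    # literal $ and literal .
}

print "unescaped: ", scalar(@unescaped), " match\n";
print "escaped:   @escaped\n";
```

The unescaped pattern matches all three strings, including $5000, while the escaped pattern matches only $5.00.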
In this tutorial, we have introduced some techniques for matching strings of text with Perl regular expressions, including basic matching, case-insensitive matching, and quantifiers.