
Perl Regular Expressions

Updated 2016-08-23 14:17:14

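Perl regular expressions: the match operator, the substitution operator and its modifiers, and the transliteration operator.

A regular expression is a string of characters that defines the pattern or patterns you are looking for. The syntax of regular expressions in Perl is very similar to what you will find in other programs that support regular expressions, such as sed, grep, and awk.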

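The basic way to apply a regular expression is to use the pattern binding operators =~ and !~. The first binds a string to a match, substitution, or transliteration and yields its result. The second does the same but negates the result, so it is true when the pattern does not match.

Match regular expression - m//
Substitute regular expression - s///
Transliterate regular expression - tr///

In each case the forward slashes act as delimiters for the regular expression (regex) that you are specifying. If you prefer another delimiter, you can use it in place of the forward slash.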

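The Match Operator

The match operator, m//, is used to match a string or statement against a regular expression. For example, to match the character sequence "foo" against the scalar $bar, you might use a statement like this: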

if ($bar =~ /foo/)

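The m// operator works in the same fashion as the q// operator series: you can use any pair of naturally matching characters as delimiters for the expression. For example, m{}, m(), and m<> are all valid.

If the delimiter is a forward slash you can omit the m prefix and write /PATTERN/, but for all other delimiters you must use the m prefix.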
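The delimiter rules above can be sketched as follows. This is a minimal illustration, not part of the original tutorial; the path string and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample data; the path is only an illustration.
my $path = '/usr/local/bin/perl';

# With the default slash delimiter, every slash in the pattern must be escaped.
my $with_slashes = ($path =~ /^\/usr\/local\//) ? 1 : 0;

# With braces as delimiters the same pattern needs no escaping,
# but the m prefix is now required.
my $with_braces = ($path =~ m{^/usr/local/}) ? 1 : 0;

# Substitution accepts alternate delimiters too.
(my $relocated = $path) =~ s{^/usr/local}{/opt};

print "$with_slashes $with_braces $relocated\n";
```

Alternate delimiters are mostly a readability choice; they pay off when the pattern itself contains slashes, as with file paths or URLs.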

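Note that the entire match expression, that is, the expression on the left of =~ or !~ together with the match operator, returns true (in scalar context) if the expression matches. Therefore the statement: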

$true = ($foo =~ m/foo/);

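will set $true to 1 if $foo matches the regex. If the match fails, $true is set to a false value (the empty string, which counts as 0 in numeric context).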
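A small sketch of the scalar-context return values, including the negated binding operator !~. The sample string and variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

my $foo = 'seafood';

my $matched     = ($foo =~ m/foo/);   # true: 'seafood' contains 'foo'
my $not_matched = ($foo !~ m/foo/);   # !~ negates the result, so this is false
my $missed      = ($foo =~ m/bar/);   # false: there is no 'bar' in the string

printf "matched=%d not_matched=%d missed='%s'\n",
    $matched, $not_matched ? 1 : 0, $missed;
```

The failed match is printed with %s to show that the false value is an empty string rather than a literal 0.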

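In list context, the match returns the contents of any grouped expressions. For example, to extract the hours, minutes, and seconds from a time string, we can use: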

my ($hours, $minutes, $seconds) = ($time =~ m/(\d+):(\d+):(\d+)/);

Match Operator Modifiers

The match operator supports its own set of modifiers. The /g modifier allows for global matching, and the /i modifier makes the match case insensitive. Here is the list of modifiers:

Modifier	Description
i	Makes the match case insensitive
m	Specifies that if the string has newline or carriage return characters, the ^ and $ operators will match against a newline boundary instead of a string boundary
o	Compiles the expression only once
s	Allows use of . to match a newline character
x	Allows you to use white space in the expression for clarity
g	Globally finds all matches
cg	Keeps the current search position when a global match fails, so the search can continue from there
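The effect of several modifiers from the table can be seen in a short sketch. The sample text and variable names are hypothetical and chosen only to make each modifier's behaviour visible.

```perl
use strict;
use warnings;

my $text = "Cat\ncat\nCATNIP";

# /i ignores case; /g in list context returns every match.
my @all = $text =~ /cat/gi;

# /m lets ^ match at the start of each line, not only the start of the string.
my @line_starts = $text =~ /^(\w)/gm;

# /s lets . match a newline, so the pattern can span two lines.
my $with_s    = ($text =~ /Cat.cat/s) ? 1 : 0;
my $without_s = ($text =~ /Cat.cat/)  ? 1 : 0;

# /x ignores whitespace and comments inside the pattern.
my ($suffix) = $text =~ /
    cat      # the literal prefix, in any case because of i
    (\w+)    # capture the rest of the word
/xi;

print "@all | @line_starts | $with_s $without_s | $suffix\n";
```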

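Matching Only Once

There is also a simpler version of the match operator, the m?PATTERN? operator. It is basically identical to the m// operator except that it matches only once between calls to reset. Older versions of Perl also accepted the bare ?PATTERN? form without the m prefix; that form was deprecated and is a syntax error as of Perl 5.22.

For example, you can use this to get the first and last matching elements within a list: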

#!/usr/bin/perl

@list = qw/food foosball subeo footnote terfoot canic footbridge/;

foreach (@list)
{
   # m?...? matches only once until reset is called
   $first = $1 if m?(foo.*)?;
   # /.../ matches every time, so the last matching element wins
   $last = $1 if /(foo.*)/;
}
print "First: $first, Last: $last\n";

This will produce the following result:
First: food, Last: footbridge

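The Substitution Operator

The substitution operator, s///, is really just an extension of the match operator that allows you to replace the matched text with some new text. The basic form of the operator is: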

s/PATTERN/REPLACEMENT/;

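PATTERN is the regular expression for the text that we are looking for. REPLACEMENT specifies the text that replaces what was found; it is interpolated like a double-quoted string, so it can refer to captured groups.

For example, we can replace the first occurrence of "dog" with "cat" using the statement below. Adding the /g modifier would replace every occurrence.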

$string =~ s/dog/cat/;

Another example:

#!/usr/bin/perl

$string = 'The cat sat on the mat';
$string =~ s/cat/dog/;

print "Final Result is $string\n";

This will produce the following result:

Final Result is The dog sat on the mat
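To make the difference between a single and a global substitution concrete, here is a minimal sketch. The sample strings are hypothetical; the return value of s/// (the number of substitutions made) is standard Perl behaviour.

```perl
use strict;
use warnings;

my $first_only = 'dog eat dog';
my $every      = 'dog eat dog';

# Without /g only the first match is replaced.
my $count_first = ($first_only =~ s/dog/cat/);

# With /g every match is replaced; s/// returns the number of substitutions.
my $count_all = ($every =~ s/dog/cat/g);

print "$first_only ($count_first), $every ($count_all)\n";
```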

Substitution Operator Modifiers

Here is the list of modifiers used with the substitution operator:

Modifier	Description
i	Makes the match case insensitive
m	Specifies that if the string has newline or carriage return characters, the ^ and $ operators will match against a newline boundary instead of a string boundary
o	Compiles the expression only once
s	Allows use of . to match a newline character
x	Allows you to use white space in the expression for clarity
g	Replaces all occurrences of the found expression with the replacement text
e	Evaluates the replacement as if it were a Perl statement, and uses its return value as the replacement text
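The /e modifier is the least obvious entry in the table, so a short sketch may help. The sample strings and variable names are hypothetical.

```perl
use strict;
use warnings;

my $prices = 'apple 3, pear 5';

# /e evaluates the replacement as Perl code; /g applies it to every number.
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;

# /i makes the pattern case insensitive; /g replaces every occurrence.
(my $renamed = 'the cat, The Cat') =~ s/cat/dog/gi;

print "$doubled\n$renamed\n";
```

Copying the string first with (my $copy = $original) =~ s/.../.../ leaves the original value untouched, which is often what you want in examples like this.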

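Transliteration

Transliteration is similar, but not identical, to substitution. Unlike substitution, transliteration (or translation) does not use regular expressions to search for the values to replace. The transliteration operators are: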

tr/SEARCHLIST/REPLACEMENTLIST/cds
y/SEARCHLIST/REPLACEMENTLIST/cds

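Transliteration replaces every occurrence of a character in SEARCHLIST with the corresponding character in REPLACEMENTLIST. For example, using the "The cat sat on the mat" string we have been using in this chapter: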

#!/usr/bin/perl

$string = 'The cat sat on the mat';
$string =~ tr/a/o/;

print "$string\n";

This will produce the following result:

The cot sot on the mot

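Standard Perl ranges can also be used, allowing you to specify ranges of characters either by letter or by numerical value. To change the case of a string of ASCII letters, you might use the following syntax in place of the uc function.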

$string =~ tr/a-z/A-Z/;
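A brief sketch comparing the range form with uc, and showing that tr also returns the number of characters it matched. The variable names are hypothetical.

```perl
use strict;
use warnings;

my $string = 'The cat sat on the mat';

# Upper-case a copy with a range, then compare with the built-in uc.
(my $upper = $string) =~ tr/a-z/A-Z/;
my $same_as_uc = ($upper eq uc $string) ? 1 : 0;

# With an empty REPLACEMENTLIST the string is left unchanged and
# tr returns the number of matching characters, here the count of 'a'.
my $a_count = ($string =~ tr/a//);

print "$upper $same_as_uc $a_count\n";
```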

Transliteration Operator Modifiers

The following modifiers can be used with the transliteration operator:

Modifier 	Description
c 	Complement SEARCHLIST.
d 	Delete found but unreplaced characters.
s 	Squash duplicate replaced characters.

The /d modifier deletes the characters that match SEARCHLIST but have no corresponding entry in REPLACEMENTLIST. For example:

#!/usr/bin/perl 

$string = 'the cat sat on the mat.';
$string =~ tr/a-z/b/d;

print "$string\n";

This will produce the following result (note the leading space):
 b b   b.

The last modifier, /s, squashes runs of identical replaced characters into a single character, so:

#!/usr/bin/perl

$string = 'food';
$string =~ tr/a-z/a-z/s;

print $string;

This will produce the following result:
fod
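The /c modifier from the table is not demonstrated above, so here is a minimal sketch of it, alone and combined with /d. The variable names are hypothetical.

```perl
use strict;
use warnings;

my $string = 'the cat sat on the mat.';

# /c complements SEARCHLIST: every character that is NOT a lowercase
# letter is replaced by '_'.
(my $masked = $string) =~ tr/a-z/_/c;

# /c combined with /d deletes everything outside a-z.
(my $letters = $string) =~ tr/a-z//cd;

print "$masked\n$letters\n";
```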

More Complex Regular Expressions

You are not limited to matching fixed strings. In fact, you can match almost anything you can think of by using more complex regular expressions. Here is a quick cheat sheet:

Character      Description
.              any single character except a newline
\s             a whitespace character (space, tab, newline)
\S             a non-whitespace character
\d             a digit (0-9)
\D             a non-digit
\w             a word character (a-z, A-Z, 0-9, _)
\W             a non-word character
[aeiou]        matches a single character in the given set
[^aeiou]       matches a single character outside the given set
(foo|bar|baz)  matches any of the alternatives specified

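Quantifiers specify how many of the previous thing you want to match, where "thing" means a literal character, one of the metacharacters listed above, or a group of characters or metacharacters in parentheses.

Character      Description
*              zero or more of the previous thing
+              one or more of the previous thing
?              zero or one of the previous thing
{3}            matches exactly 3 of the previous thing
{3,6}          matches between 3 and 6 of the previous thing
{3,}           matches 3 or more of the previous thing

The ^ metacharacter matches the beginning of the string and the $ metacharacter matches the end of the string.
Here are some brief examples: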

# nothing in the string (start and end are adjacent)
/^$/   

# three digits, each followed by a whitespace
# character (e.g. "3 4 5 ")
/(\d\s){3}/  

# one or more pairs of "a" followed by any character, so every
# odd-numbered character of the match is "a" (e.g. "abacadaf")
/(a.)+/  

# string starts with one or more digits
/^\d+/

# string that ends with one or more digits
/\d+$/
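The patterns above can be tried against sample input with a small sketch like the following. The test strings are hypothetical, and the last two entries add a {3,6} quantifier check that is not among the examples above.

```perl
use strict;
use warnings;

# Each entry pairs a pattern with a hypothetical test string.
my @checks = (
    [ qr/^$/,         ''          ],   # empty string
    [ qr/(\d\s){3}/,  '3 4 5 '    ],   # three digit-plus-space pairs
    [ qr/(a.)+/,      'abacadaf'  ],   # repeated "a" plus any character
    [ qr/^\d+/,       '42 apples' ],   # starts with digits
    [ qr/\d+$/,       'room 101'  ],   # ends with digits
    [ qr/^\d{3,6}$/,  '12345'     ],   # between 3 and 6 digits
    [ qr/^\d{3,6}$/,  '12'        ],   # too short, so no match
);

my @results;
for my $check (@checks) {
    my ($pattern, $input) = @$check;
    push @results, ($input =~ $pattern) ? 1 : 0;
}

print "@results\n";
```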

Let us look at another example:

#!/usr/bin/perl

$string = "Cats go Catatonic\nWhen given Catnip";
($start) = ($string =~ /\A(.*?) /);
@lines = $string =~ /^(.*?) /gm;
print "First word: $start\n","Line starts: @lines\n";


This will produce the following result:
First word: Cats
Line starts: Cats When

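Matching Boundaries

The \b assertion matches at any word boundary, as defined by the difference between the \w class and the \W class. Because \w includes the characters of a word and \W the opposite, this normally means the start or end of a word. The \B assertion matches any position that is not a word boundary. For example:

/\bcat\b/ # Matches 'the cat sat' but not 'catatonic'
/\Bcat\B/ # Matches 'verification' but not 'the cat on the mat'
/\bcat\B/ # Matches 'catatonic' but not 'polecat'
/\Bcat\b/ # Matches 'polecat' but not 'catatonic'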
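A short sketch that runs the four combinations of \b and \B around "cat" over one word list. The list and variable names are hypothetical.

```perl
use strict;
use warnings;

my @words = ('the cat sat', 'catatonic', 'polecat', 'verification');

my @whole  = grep { /\bcat\b/ } @words;   # 'cat' as a whole word
my @inside = grep { /\Bcat\B/ } @words;   # 'cat' strictly inside a word
my @prefix = grep { /\bcat\B/ } @words;   # a word that starts with 'cat' and continues
my @suffix = grep { /\Bcat\b/ } @words;   # a word that ends with 'cat'

print "@whole | @inside | @prefix | @suffix\n";
```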

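Selecting Alternatives

The | character works like the standard or bitwise OR in Perl. It specifies alternative matches within a regular expression or group. For example, to match "cat" or "dog" in an expression, you might use this: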

if ($string =~ /cat|dog/)

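You can group individual elements of an expression together to support complex matches. Searching for two people's names could be achieved with two separate tests, like this: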

if (($string =~ /Martin Brown/) ||
   ($string =~ /Sharon Brown/))

This could be written as follows

if ($string =~ /(Martin|Sharon) Brown/)
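The parentheses in the combined pattern matter because | has low precedence. The sketch below, using a hypothetical list of names, shows what changes when they are left out.

```perl
use strict;
use warnings;

my @people = ('Martin Brown', 'Sharon Brown', 'Martin Smith');

# Grouped: the surname applies to both first names.
my @grouped = grep { /^(Martin|Sharon) Brown$/ } @people;

# Ungrouped: the alternatives are "Martin" and "Sharon Brown",
# so 'Martin Smith' matches as well.
my @ungrouped = grep { /Martin|Sharon Brown/ } @people;

print scalar(@grouped), " ", scalar(@ungrouped), "\n";
```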

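Grouping Matching

From a regular-expression point of view, the following two expressions match the same strings, although the former is perhaps slightly clearer.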

$string =~ /(\S+)\s+(\S+)/;

and 

$string =~ /\S+\s+\S+/;

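However, the benefit of grouping is that it allows us to extract parts of the match. The groups are returned as a list, in the order in which they appear in the pattern. For example, in the following fragment we pull the hours, minutes, and seconds out of a string.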

my ($hours, $minutes, $seconds) = ($time =~ m/(\d+):(\d+):(\d+)/);

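As well as this direct method, matched groups are also available in the special $x variables, where x is the number of the group within the regular expression. We could therefore rewrite the preceding example as follows: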

$time =~ m/(\d+):(\d+):(\d+)/;
my ($hours, $minutes, $seconds) = ($1, $2, $3);
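One caveat with $1, $2, and $3 is that they keep their values from an earlier successful match when the current match fails, so it is safer to test the match before reading them. A minimal sketch, where parse_time is a hypothetical helper and not part of the original example:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (hours, minutes, seconds) or an empty list.
sub parse_time {
    my ($time) = @_;
    if ($time =~ m/^(\d+):(\d+):(\d+)$/) {
        # The capture variables are only read after a successful match.
        return ($1, $2, $3);
    }
    return;
}

my @ok  = parse_time('12:31:02');
my @bad = parse_time('noon');

print "@ok / ", scalar(@bad), "\n";
```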

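When groups are used in substitution expressions, the $x syntax can be used in the replacement text. Thus, we could reformat a date string like this: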

#!/usr/bin/perl

$date = '03/26/1999';
$date =~ s#(\d+)/(\d+)/(\d+)#$3/$1/$2#;

print "$date";

This will produce the following result:
1999/03/26

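Using the \G Assertion

The \G assertion allows you to continue searching from the point where the last match occurred.

For example, in the following code we use \G so that we can search to the correct position and then extract some information, without having to create a more complex, single regular expression: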

#!/usr/bin/perl

$string = "The time is: 12:31:02 on 4/12/00";

$string =~ /:\s+/g;
($time) = ($string =~ /\G(\d+:\d+:\d+)/);
$string =~ /.+\s+/g;
($date) = ($string =~ m{\G(\d+/\d+/\d+)});

print "Time: $time, Date: $date\n";

This will produce the following result:
Time: 12:31:02, Date: 4/12/00

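The \G assertion is actually just the metasymbol equivalent of the pos function, so between regular expression calls you can continue to use pos, and even modify the value of pos (and therefore \G) by using pos as an lvalue subroutine.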
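Reading pos and assigning to it can be sketched as follows. The string is the one from the example above; using index to find the date is an assumption made for illustration, not the original author's approach.

```perl
use strict;
use warnings;

my $string = "The time is: 12:31:02 on 4/12/00";

# A scalar-context /g match leaves pos() just after the matched text.
$string =~ /is:\s+/g;
my $after_prefix = pos($string);

# Assigning to pos() moves the point where \G anchors the next match.
pos($string) = index($string, '4/');
my ($date) = $string =~ m{\G(\d+/\d+/\d+)}g;

print "$after_prefix $date\n";
```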

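Regular Expression Variables

Regular expression variables include $+, which contains whatever the last grouping matched; $&, which contains the entire matched string; $`, which contains everything before the matched string; and $', which contains everything after the matched string.

The following code demonstrates the result: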

#!/usr/bin/perl

$string = "The food is in the salad bar";
$string =~ m/foo/;
print "Before: $`\n";
print "Matched: $&\n";
print "After: $'\n";
This code prints the following when executed:
Before: The
Matched: foo
After: d is in the salad bar
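The $+ variable is described above but not shown in the example, so here is a minimal sketch that reads all four variables after one match. The input line and variable names are hypothetical.

```perl
use strict;
use warnings;

my $line = 'see Revision: 42 today';
my ($before, $matched, $after, $last_group);

# Only the second alternative matches, so $1 is undefined and $2 is '42';
# $+ gives the value of the highest-numbered group that took part in the match.
if ($line =~ /Version: (\d+)|Revision: (\d+)/) {
    # Copy the special variables right away; the next successful
    # match overwrites them.
    ($before, $matched, $after, $last_group) = ($`, $&, $', $+);
}

print "[$before] [$matched] [$after] [$last_group]\n";
```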
