Chapter 15. Regular Expressions

This translation of the chapter is for Raku study and research only; please support the e-book or print edition.

Regular expressions (or regexes) are patterns that describe a possible set of matching texts. They are a little language of their own, and many characters have a special meaning inside patterns. They may look cryptic at first, but after you learn them you have quite a bit of power.

Forget what you’ve seen about patterns in other languages. The Raku pattern syntax started over. It’s less compact but also more powerful. In some cases it acts a bit differently.

This chapter shows simple patterns that match particular characters or sets of characters. It’s just the start. In Chapter 16 you’ll see fancier patterns and the side effects of matching. In Chapter 17 you’ll take it all to the next level.




The Match Operator

A pattern describes a set of text values. The simple pattern abc describes all the values that have an a next to a b next to a c. The trick then is to decide if a particular value is in the set of matching values. There are no half or partial matches; it matches or it doesn’t.

A pattern inside m/.../ immediately applies itself to the value in $_. If the pattern matches somewhere in the Str, the match operator returns something that evaluates to True in a condition:

$_ = 'Hamadryas';
if m/Hama/ { put 'It matched!'; }
else       { put 'It missed!';  }

That’s a bit verbose. The conditional operator takes care of that:


put m/Hama/ ?? 'It matched!' !! 'It missed!';

You don’t have to match against $_. You can use the smart match to apply it to a different value. That’s the target:


my $genus = 'Hamadryas';
put $genus ~~ m/Hama/ ?? 'It matched!' !! 'It missed!';

That target could be anything, including an Array or Hash. These match a single item:


$genus                ~~ m/Hama/;
@animals[0]           ~~ m/Hama/;
%butterfly<Hamadryas> ~~ m/perlicus/;

But you can also match against multiple items. The object on the left side of the smart match decides how the pattern applies to the object. This matches if any of the elements in @animals matches:


if @animals ~~ m/Hama/ {
    put "Matches at least one animal";
}

This is the same as matching against a Junction:


if any(@animals) ~~ m/Hama/ {
    put "Matches at least one animal";
}

The match operator is commonly used in the condition inside a .grep:


my @hama-animals = @animals.grep: /Hama/;
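
As a minimal sketch of the forms above, assuming a small @animals list made up for illustration (the genus names are sample data, not from the text), the same pattern can test one element or filter a whole list:

```raku
# Sample data, made up for this sketch.
my @animals = <Hamadryas Bhutanitis Hamanumida>;

# Smart match one element at a time; .so turns the result into a Bool.
my $first-matches  = (@animals[0] ~~ m/Hama/).so;
my $second-matches = (@animals[1] ~~ m/Hama/).so;

# .grep keeps only the elements that the pattern matches.
my @hama-animals = @animals.grep: /Hama/;

put "First element matches: $first-matches";
put "Second element matches: $second-matches";
put "Selected: @hama-animals[]";
```

The .so calls are only there so the results print as True or False; in a condition you can use the match result directly.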

Match Operator Syntax

The match operator can use alternate delimiters, similar to the quoting mechanism:

m{Hama}
m!Hama!



Whitespace inside the match operator doesn’t matter. It’s not part of the pattern (until you say so, as you’ll see later). All of these are the same, including the last example with vertical whitespace:


m/ Hama /
m{ Hama }
m! Hama !
m/
    Hama
/

You can put spaces between alphabetic characters, but you’ll probably get a warning because Raku wants you to put those together:


m/ Ha ma /

If you want a literal space inside the match operator you can escape it (along with other things you’ll see later):


m/ Ha\ ma /

Quoting whitespace makes it literal too (the space around the quoted whitespace is still insignificant), or you can quote it all together:


m/ Ha ' ' ma /
m/ 'Ha ma' /

You need to quote or escape any character that’s not alphabetic or a number, even if those characters aren’t “special.” The other unquoted characters may be metacharacters that have special meaning in the pattern language.


Successful Matches

If the match operator succeeds it returns a Match object, which is always a True value. If you put that object it shows you the part of the Str that matched. The say calls .gist and the output is a bit different:


$_ = 'Hamadryas';
my $match = m/Hama/;
put $match; # Hama
say $match; # ｢Hama｣

The output of say gets interesting as the patterns get more complicated. That makes it useful for the regex chapters, and you’ll see more of that here compared to the rest of the book.

If the match does not succeed it returns Nil, which is always False:



$_ = 'Hamadryas';
put (m/Hami/).^name;    # Nil (assigned to a plain $ variable it would show Any)

It’s usually a good idea to check the result before you do anything with it:


if my $match = m/Hama/ { # matched
    say $match;
}

You don’t need the $match variable though. The result of the last match shows up in the special variable $/, which you’ll see more of later:


if m/Hama/ { # matched
    say $/;
}
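
One way to package the check-then-use habit is a small subroutine. This is only a sketch: matched-text is a hypothetical helper named for this example, and the pattern is arbitrary.

```raku
# A hypothetical helper, named only for this sketch.
sub matched-text ( Str $target ) {
    if $target ~~ m/dryas/ {
        return ~$/;     # the Match in $/ stringifies to the matched text
    }
    return 'no match';  # a failed match leaves nothing to inspect
}

put matched-text('Hamadryas');
put matched-text('Papilio');
```

Checking the result first means you never try to use the pieces of a match that didn't happen.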

Defining a Pattern

Useful patterns can get quite long and unwieldy. Use rx// to define a pattern (a Regex) for later use. This pattern is not immediately applied to any target. This allows you to define a pattern somewhere that doesn’t distract from what you are doing:


my $genus = 'Hamadryas';
my $pattern = rx/ Hama /; # something much more complicated
$genus ~~ $pattern;

and reuse the pattern wherever you need it:


for lines() -> $line {
    put $line if $line ~~ $pattern;
}

It’s possible to combine saved patterns into a larger one. This allows you to decompose complicated patterns into smaller, more tractable ones that you can reuse later (which you’ll do extensively in Chapter 17):


my $genus = 'Hamadryas';

my $hama  = rx/Hama/;
my $dryas = rx/dryas/;
my $match = $genus ~~ m/$hama$dryas/;

say $match;

Rather than storing a pattern in a variable, declare a lexical pattern with regex. This looks like a subroutine because it has a Block, but it’s not code inside; it’s a pattern and uses that slang:


my regex hama { Hama }

Use this in a pattern by surrounding it with angle brackets:


my $genus = 'Hamadryas';
put $genus ~~ m/<hama>/ ?? 'It matched!' !! 'It missed!';

You can define multiple named regexes and use them together:


my regex hama  { Hama }
my regex dryas { dryas }

$_ = 'Hamadryas';
say m/<hama><dryas>/;

Each named regex becomes a submatch. You can see the structure when you output it with say. It shows the overall result and the results of the subpatterns too:


｢Hamadryas｣
 hama => ｢Hama｣
 dryas => ｢dryas｣

Treat the Match object like a Hash (although it isn’t) to get the parts that matched the named regexes. The name of the regex is the “key”:


$_ = 'Hamadryas';
my $result =  m/<hama><dryas>/;

if $result {
    put "First: $result<hama>";
    put "Second: $result<dryas>";
}

Predefined Patterns

Table 15-1 shows several of the predefined patterns that are ready for you to use. You can define your patterns in a library and export them just like you could with subroutines:


# Hama.pm6
my regex hama is export { Hama }

Load the module and those named regexes are available to your patterns:


use lib <.>;
use Hama;

$_ = 'Hamadryas';
say m/ <hama> /;

Table 15-1. Predefined patterns

Predefined pattern   What it matches
<alnum>              Alphabetic and digit characters
<alpha>              Alphabetic characters
<ascii>              Any ASCII character
<blank>              Horizontal whitespace
<cntrl>              Control characters
<digit>              Decimal digits
<graph>              <alnum> + <punct>
<ident>              A valid identifier character
<lower>              Lowercase characters
<print>              <graph> + <space>, but without <cntrl>
<punct>              Punctuation and symbols beyond ASCII
<space>              Whitespace
<upper>              Uppercase characters
<|wb>                Word boundary (an assertion rather than a character)
<word>               <alnum> + Unicode marks + connectors, like ‘_’ (extra)
<ws>                 Whitespace (required between word characters, optional otherwise)
<ww>                 Within a word (an assertion rather than a character)
<xdigit>             Hexadecimal digits [0-9A-Fa-f]

EXERCISE 15.1 Create a program that uses a regular expression to output all of the matching lines from the files you specify on the command line.


Matching Nonliteral Characters

You don’t have to literally type a character to match it. You might have an easier time specifying its code point or name. You can use the same \x[CODEPOINT] or \c[NAME] that you saw in double-quoted Strs in Chapter 4.

If you specify a name it must be all uppercase.

You could match the initial capital H by name, even though you have to type a literal H in the name:




my $pattern = rx/
     \c[LATIN CAPITAL LETTER H] ama
     /;
$_ = "Hamadryas";

put $pattern ?? 'Matched!' !! 'Missed!';

You can do the same thing with the code point. If you specify a code point use the hexadecimal number (with either case):


my $pattern = rx/
     \x[48] ama
     /;
$_ = "Hamadryas";

put $pattern ?? 'Matched!' !! 'Missed!';

This makes more sense if you want to match a character that’s either hard to type or hard to read. If the Str has the 🐱 character (U+1F431 CAT FACE), you might not be able to distinguish that from 😸 (U+1F638 GRINNING CAT FACE WITH SMILING EYES) without looking very closely. Instead of letting another programmer mistake your intent, you can use the name to save some eyestrain:

my $pattern = rx/
     \c[CAT FACE]  # or \x[1F431]
     /;
$_ = "This is a catface: 🐱";
put $pattern ?? 'Matched!' !! 'Missed!';
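
To see that the three spellings are interchangeable, here is a short sketch that tries each one against the same target. The @patterns and $target names are just for this example.

```raku
my $target = 'Hamadryas';

my @patterns =
    rx/ Hama /,                            # literal characters
    rx/ \x[48] ama /,                      # H by code point
    rx/ \c[LATIN CAPITAL LETTER H] ama /;  # H by name

# A named block parameter avoids clashing with $_, which ~~ sets itself.
for @patterns -> $pattern {
    put ($target ~~ $pattern) ?? 'Matched!' !! 'Missed!';
}
```

Each pattern describes the same four characters, so the choice is about what is easiest to read in your source.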

Matching Any Character

Patterns have metacharacters that match something other than their literal selves. Some of these are listed in Table 15-2 (and most you won’t see in this chapter). The . matches any character (including a newline). This pattern matches any target that has at least one character:


m/ . /

To match a Str with an a and a c separated by a character, put the dot between them in the pattern. This skips the lines that don’t match that pattern:


for lines() {
    next unless m/a.c/;
    .put;
}
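
A quick way to check what the dot accepts is to filter a handful of strings. This sketch assumes the sample strings in @candidates, which are made up for the example:

```raku
# Sample strings, made up for this sketch.
my @candidates = 'abc', 'a-c', "a\nc", 'ac', 'abbc';

# The dot needs exactly one character between the a and the c,
# and a newline counts as a character.
my @matched = @candidates.grep: / a . c /;

put @matched.elems;   # 3
```

The 'ac' and 'abbc' strings miss because they have zero and two characters between the a and the c.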


Some characters have special meaning in patterns. The colon introduces an adverb and the # starts a comment. To match those as literal characters you need to escape them. A backslash will do:


my $pattern = rx/ \# \: Hama \. /

This means to match a literal backslash, you need to escape that too:


my $pattern = rx/ \# \: Hama \\ /

You can do the same thing with the other pattern metacharacters. To match a literal dot, escape it:


my $pattern = rx/ \. /

The backslash only escapes the character that comes immediately after it. You can’t escape a literal space character, and you can’t escape a character that isn’t special. Table 15-2 shows what you need to escape, even though I haven’t shown you most of those features yet.


Table 15-2. Pattern metacharacters

Metacharacter        Why it’s special
#                    Starts a comment
\                    Escapes the next character or a shortcut
.                    Matches any character
:                    Starts an adverb, or prevents backtracking
( and )              Starts a capture
< and >              Used to create higher-level thingys
[, ], and '          Used for grouping
+, |, &, -, and ^    Set operations
?, *, +, and %       Quantifiers
|                    Alternation
^ and $              Anchors
$                    Starts a variable or named capture
=                    Assigns to named captures

Characters inside quotes are always their literal selves:


my $pattern = rx/ '#:Hama' \\ /

You can’t use the single quotes to escape the backslash since a single backslash will still try to escape the character that comes after it.



You have a tougher time if you want to match literal spaces. You can’t escape a space with \ because unspace isn’t allowed in a pattern. Instead, put quotes around the literal space:


my $pattern = rx/ Hamadryas ' ' laodamia /;

Or put the entire sequence in quotes:


my $pattern = rx/ 'Hamadryas laodamia' /;

Those single quotes can quickly obscure what belongs where; it can be helpful to spread the pattern across lines and note what you are trying to do:


my $pattern = rx/
    Hamadryas    # genus
    ' '            # literal space
    laodamia     # species
    /;

You can make whitespace significant with the :s adverb:


my $pattern = rx:s/ Hamadryas laodamia /;

my $pattern = rx/ :s Hamadryas laodamia /;

The :s is the short form of :sigspace:


my $pattern = rx:sigspace/ Hamadryas laodamia /;

my $pattern = rx/ :sigspace Hamadryas laodamia /;

Notice that this will match Hamadryas laodamia, even though the pattern has whitespace at the beginning and end. The :s turns the whitespace in the pattern into a subrule <.ws>:


$_ = 'Hamadryas laodamia';
my $pattern = rx/ Hamadryas <.ws> laodamia /;
if m/$pattern/ {
    say $/;  # ｢Hamadryas laodamia｣
}

You can combine adverbs, but they each get their own colon. Order does not matter. This pattern has significant whitespace and is case insensitive:


my $pattern = rx:s:i/ Hamadryas Laodamia /;
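
The quoted space and the :s adverb are not quite the same thing. As a rough sketch, assuming the three sample targets below, the quoted form wants exactly one space while :s accepts any run of whitespace between the words:

```raku
my $quoted   = rx/ Hamadryas ' ' laodamia /;   # exactly one literal space
my $sigspace = rx:s/ Hamadryas laodamia /;     # pattern whitespace acts as <.ws>

# Sample targets: one space, three spaces, no space.
my @targets = 'Hamadryas laodamia', 'Hamadryas   laodamia', 'Hamadryaslaodamia';

for @targets -> $target {
    my $q = ($target ~~ $quoted).so;
    my $s = ($target ~~ $sigspace).so;
    put "[$target] quoted: $q, sigspace: $s";
}
```

Neither pattern matches the run-together name, because <.ws> requires some whitespace between two word characters.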

Matching Types of Characters

So far, you’ve matched literal characters. You typed out the characters you wanted, and escaped them in some cases. There are some sets of characters that are so common they get shortcuts. These start with a backslash followed by a letter that connotes the set of characters. Table 15-3 shows the list of shortcuts.

If you want to match any digit, you can use \d. This matches anything that is a digit, not just the Arabic digits:



/ \d /

Each of these shortcuts comes with a complement. \D matches any nondigit.


Table 15-3. Character class shortcuts

Shortcut   Characters that match
\d         Digits (Unicode property N)
\D         Anything that isn’t a digit
\w         Word characters: letters, digits, or underscores
\W         Anything that isn’t a word character
\s         Any kind of whitespace
\S         Anything that isn’t whitespace
\h         Horizontal whitespace
\H         Anything that isn’t horizontal whitespace
\v         Vertical whitespace
\V         Anything that isn’t vertical whitespace
\t         A tab character (specifically, only U+0009)
\T         Anything that isn’t a tab character
\n         A newline or carriage return/newline pair
\N         Anything that isn’t a newline
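
To see that \d reaches beyond 0 through 9, try it on a digit from another script. This sketch uses DEVANAGARI DIGIT THREE as an arbitrary example and compares \d with an explicit ASCII range:

```raku
# A digit from another script, chosen arbitrarily for this sketch.
my $devanagari-three = "\c[DEVANAGARI DIGIT THREE]";

put ($devanagari-three ~~ / \d /).so;         # True: it is a digit
put ($devanagari-three ~~ / <[0..9]> /).so;   # False: not one of 0 to 9
put ('3' ~~ / \D /).so;                       # False: 3 is a digit
```

If you only want the ASCII digits, spell out the range instead of using the shortcut.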

EXERCISE 15.2 Write a program that outputs only those lines of input that contain three decimal digits in a row. You wrote most of this program in the previous exercise.



The Unicode Character Database (UCD) defines the code points and their names and assigns them one or more properties. Each character knows many things about itself, and you can use some of that information to match them. Place the name of the Unicode property in <:...>. That colon must come right after the opening angle bracket. If you wanted to match something that is a letter, you could use the property Letter:


/ <:Letter> /

Instead of matching a property, you can match characters that don’t have that particular property. Put a ! in front of the property name to negate it. This matches characters that aren’t the title-case letters:


/ <:!TitlecaseLetter> /

Each property has a long form, like Letter, and a short form, in this case L. There are other properties, such as Uppercase_Letter and Lu, or Number and N:


/ <:L> /
/ <:N> /

You can match the characters that belong to certain Unicode blocks or scripts:


<:Block('Basic Latin')>

Even though you can abbreviate these property names I’ll use the longer names in this book. See the documentation for the other properties.



One property might not be enough to describe what you want to match. To build fancier ones, combine them with character class set operators. These aren’t the same operators you saw in Chapter 14; they’re special to character classes.

The + creates the union of the two properties. Any character that has either property will match:



/ <:Letter + :Number> /
/ <:Open_Punctuation + :Close_Punctuation> /

Subtract one property from another with -. Any character with the first property that doesn’t have the second property will match this. The following example matches all the identifier characters (in the UCD sense, not the Raku sense). There are the characters that can start an identifier and those that can be in the other positions:


/ <:ID_Continue - :Number> /

You can shorten this to match any character that doesn’t have a particular property. It looks like you leave off the first part of the subtraction; the - comes right after the opening angle bracket. That implies you’re subtracting from all characters. This matches all the characters that don’t have the Letter property:

/ <-:Letter> /
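
Here is a small sketch of the property forms side by side. The $sample string is made up, and .comb (which keeps the parts that match) is used only to show which characters each pattern picks out:

```raku
my $sample = 'ab12!?';   # two letters, two numbers, two punctuation marks

my @letters-or-numbers = $sample.comb: / <:Letter + :Number> /;
my @not-letters        = $sample.comb: / <-:Letter> /;
my @letters-only       = $sample.comb: / <:Letter> /;

put @letters-or-numbers.join;   # ab12
put @not-letters.join;          # 12!?
put @letters-only.join;         # ab
```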

EXERCISE 15.3 Write a program to count all of the characters that match either the Letter or Number properties. What percentage of the code points between 1 and 0xFFFD are either letters or numbers? The .chr method may be handy here.

User-Defined Character Classes

You can define your own character classes. Put the characters that you want to match inside <[...]>. These aren’t the same square brackets that you saw earlier for grouping; these are inside the angle brackets. This character class matches either a, b, or 3:


/ <[ab3]> /

As with everything else so far, this matches one character and that one character can be any of the characters in the character class. This character class matches either case at a single position:


/ <[Hh]> ama /    # also / [ :i h ] ama /

You could specify the hexadecimal value of the code point. The whitespace is insignificant:


/ <[ \x[48] \x[68] ]> ama /

The character name versions work too:


/ <[ \c[LATIN CAPITAL LETTER H] \c[LATIN SMALL LETTER H] ]> ama /

You can make a long list of characters:


/ <[abcdefghijklmnopqrstuvwxyz]> / # from a to z

Inside the character class the # is just a #. If you try to put a comment in there all of the characters in your message become part of the character class:


/ <[
    \x[48] # uppercase
    \x[68] # lowercase
    ]> /

You’ll probably get warnings about repeated characters if you try to do that.



But that’s too much work. You can use .. to specify a range of characters. The literal characters work as well as the hexadecimal values and the names. Notice you don’t quote the literal characters in these ranges:


/ <[a..z]> /
/ <[ \x[61] .. \x[7a] ]> /

The range doesn’t have to be the only thing in the square brackets:


/ <[a..z 123456789]> /

You could have two ranges:


/ <[a..z 1..9]> /


Sometimes it’s easier to specify the characters that can’t match. You can create a negated character class by adding a - between the opening angle bracket and the opening square bracket. This example matches any character that is not a, b, or 3:


/ <-[ab3]> /

Space inside a character class is also insignificant:


/ <-[ a b 3 ]> /

You can use a negated character class of one character. Quotes inside the character class are literal characters because Raku knows you aren’t quoting:


/ <-[ ' ]>  /   # not a quote character

This one matches any character that is not a newline:


/ <-[ \n ]> /   # not a newline

The predefined character class shortcuts can be part of your character class:


/ <-[ \d \s ]> /   # neither a digit nor whitespace

Like the Unicode properties, you can combine sets of characters:


/ <[abc] + [xyz]> /    # but, also <[abcxyz]>

/ <[a..z] - [ijk]> /   # easier than two ranges
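
As a hedged illustration of these character classes, the following sketch runs three of them over one made-up string and keeps the characters that match:

```raku
my $word = 'hijack 42';   # sample text for this sketch

my @kept               = $word.comb: / <[a..z] - [ijk]> /;   # range minus i, j, k
my @not-digit-or-space = $word.comb: / <-[ \d \s ]> /;       # negated shortcuts
my @either             = $word.comb: / <[hk] + [24]> /;      # union of two sets

put @kept.join;                 # hac
put @not-digit-or-space.join;   # hijack
put @either.join;               # hk42
```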

EXERCISE 15.4 Create a program to output all the input lines. Skip any line that contains a letter unless it’s a vowel. Also skip any lines that are blank (that is, only have whitespace).


Matching Adverbs

You can change how the match operator works by applying adverbs, just like you changed how Q worked in Chapter 4. There are several, but you’ll only see the most commonly used here.


Matching Either Case

So far a character in your pattern matches exactly the same character in the target. An H only matches an uppercase H and not any other sort of H:


my $pattern = rx/ Hama /;
put 'Hamadryas' ~~ $pattern;  # Matches

Change your pattern by one character. Instead of an uppercase H, use a lowercase one:


my $pattern = rx/ hama /;
put 'Hamadryas' ~~ $pattern;  # Misses because h is not H

The pattern is case sensitive, so this doesn’t match. But you can make it case insensitive with an adverb. The :i adverb makes the literal alphabetic characters match either case. You can put the adverb right after the rx or the m:

my $pattern = rx:i/ hama /;
put 'Hamadryas' ~~ $pattern;  # Matches, :i outside

This is the reason you can’t use the colon as the delimiter!

When you use an adverb on the outside of the pattern, that adverb applies to the entire pattern. You can also put the adverb on the inside of the pattern:



my $pattern = rx/ :i hama /;
put 'Hamadryas' ~~ $pattern;  # Matches, :i inside

Isn’t that interesting? Now you start to see why whitespace isn’t counted as part of the pattern. There’s much more going on besides literal matching of characters.

The adverb applies from the point of its insertion to the end of the pattern. In this case it applies to the entire pattern because the :i is at the beginning. Put that adverb later in the pattern, and it applies from there to the rest of the pattern. Here the ha matches only lowercase because the adverb shows up later. The rest of the pattern after the :i is case insensitive:

my $pattern = rx/ ha :i ma /; # final ma case insensitive

You can group parts of patterns with square brackets. This example groups the am but doesn’t do much else because there’s nothing else special going on:


my $pattern = rx/ h [ am ] a /;

An adverb inside a group applies only to that group:


my $pattern = rx/ h [ :i am ] a /;

The rules are the same: the adverb applies from the point of its insertion to the end of the group:


my $pattern = rx/ h [ a :i m ] a /; # matches haMa or hama
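
You can check the scope of the adverb by trying a few spellings. This sketch uses four made-up candidates; only the m sits after the :i inside the group, so only the m may change case:

```raku
my $pattern = rx/ h [ a :i m ] a /;

# Candidates that differ only in which letter is uppercase.
my @matching = <hama haMa hAma hamA>.grep: -> $candidate { $candidate ~~ $pattern };

put @matching;   # hama haMa
```

The last a is outside the group, so the adverb no longer applies to it.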

At this point, you’re probably going to start mixing up what’s going on. There’s another reason whitespace doesn’t matter—you can add comments to your pattern:


my $pattern = rx/
    h
    [       # group this next part
        :i   # case insensitive to end of group
        am
    ]       # end of group
    a
    /;

Everything from the # character to the end of the line is a comment. You can use embedded comments too:


my $pattern = rx/
    :i #`( case insensitive ) Hama
    /;

These aren’t particularly good comments because you’re annotating what the syntax already denotes. As a matter of good practice, you should comment what you are trying to match rather than what the syntax does. However, the world isn’t going to end if you leave a reminder for yourself of what a new concept does.

EXERCISE 15.5 Write a program that outputs only the lines of input that contain the text ei. You’ll probably want to save this program to build on in later exercises.



Ignoring Marks

The :ignoremark adverb changes the pattern so that accents and other marks don’t matter. The marks can be there or not. It works if the marks are in the target or the pattern:


$_ = 'húdié';   # 蝴蝶 (butterfly)
put m/ hudie /            ?? 'Matched' !! 'Missed';  # Missed
put m:ignoremark/ hudie / ?? 'Matched' !! 'Missed';  # Matched

$_ = 'hudie';
put m:ignoremark/ húdié / ?? 'Matched' !! 'Missed';  # Matched

It even works if both the target and the pattern have different marks in the same positions:


$_ = 'hüdiê';
put m:ignoremark/ húdié / ?? 'Matched' !! 'Missed';  # Matched

Some adverbs can show up inside the pattern. They apply to the parts of the pattern that come after them:


$_ = 'hüdiê';
put m/ :ignoremark hudie / ?? 'Matched' !! 'Unmatched';  # Matched

Global Matches

A pattern might be able to match several times in the same text. The :global adverb gets all of the nonoverlapping Matches. It returns a List:


$_ = 'Hamadryas perlicus';
my $matches = m:global/ . s /;
say $matches;   # (｢as｣ ｢us｣)

No matches gets you an empty List:


$_ = 'Hamadryas perlicus';
my $matches = m:global/ six /;
say $matches;   # ()

The match operator can find overlapping matches too. Use :overlap to return a potentially longer list. The ｢uta｣ and ｢ani｣ here both match the same a:

$_ = 'Bhutanitis thaidina';

my $global = m:global/ <[aeiou]> <-[aeiou]> <[aeiou]> /;
say $global;  # (｢uta｣ ｢iti｣ ｢idi｣)

my $overlap = m:overlap/ <[aeiou]> <-[aeiou]> <[aeiou]> /;
say $overlap; # (｢uta｣ ｢ani｣ ｢iti｣ ｢idi｣ ｢ina｣)

Things That Use Patterns

There are many features that you haven’t been able to use so far because you hadn’t seen regexes yet. Now you’ve seen regexes, so you can see these things. There are a couple of Str methods that work with a pattern to transform values. This section is a taste of the features you’ll use most often.

The .words and .comb methods break up text. The .split method is the general case of that. It takes a pattern to decide how to break up the text. Whatever it matches are the parts that disappear. You could break up a line on tabs, for instance:



my @words = $line.split: / \t /;

.grep can use the match operator to select things. If the match operator succeeds it returns something that’s True, and that element is part of the result:


my @words-with-e = @words.grep: /:i e/;

Or, to put it all together:


my @words-with-e = $line.split( / \t / ).grep( /:i e/ );

.split can specify multiple possible separators. Not all of them need be matches. This breaks up a line on a literal comma or whitespace:


my @words-with-e = $line
    .split( [ ',', / \s / ] )
    .grep( /:i e/ );

.comb does a job similar to .split, but it breaks up the text by keeping the parts that matched. This keeps all the nonoverlapping groups of three digits and discards everything else:


my @digits = $line.comb: /\d\d\d/;

With no argument .comb uses the pattern of the single . to match any character. This breaks up a Str into its characters without discarding anything:


my @characters = $line.comb: /./;


The .subst method works with a pattern to substitute the matched text with other text:


my $line = "This is PERL 6";
put $line.subst: /PERL/, 'Perl';  # This is Perl 6

This one makes the substitution for the first match:


my $line = "PERL PERL PERL";
put $line.subst: /PERL/, 'Perl';  # Perl PERL PERL

Use the :g adverb to make all possible substitutions:


my $line = "PERL PERL PERL";
put $line.subst: /PERL/, 'Perl', :g;  # Perl Perl Perl

Each of these returns the modified Str and leaves the original alone. Use .subst-mutate to change the original value:


my $line = "PERL PERL PERL";
$line.subst-mutate: /PERL/, 'Perl', :g;
put $line;  # Perl Perl Perl
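
To compare the three methods on the same input, here is a short sketch. The $line value is made up for the example; .split drops what it matches, .comb keeps what it matches, and .subst returns a changed copy:

```raku
# A made-up, tab-separated line for this sketch.
my $line = "PERL\t123456\tPERL 7";

my @fields  = $line.split: / \t /;                   # the tabs disappear
my @triples = $line.comb: / \d\d\d /;                # only groups of three digits survive
my $fixed   = $line.subst: /PERL/, 'Perl', :g;       # new Str, original untouched

put @fields.elems;        # 3
put @triples.join(',');   # 123,456
put $fixed;
```

The lone 7 is not part of any group of three digits, so .comb discards it.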

These will be much more useful with the regex features you’ll see in the next chapter.

EXERCISE 15.6 Using .split, output the third column of a tab-delimited file. The butterfly census file you made at the end of Chapter 9 would do nicely here.


You haven’t seen the full power of regexes in this chapter since it was mostly about the mechanism of applying the patterns to text. That’s not a big deal—the patterns can be much more sophisticated, but the mechanisms are the same. In the next chapter you’ll see most of the fancier features you’ll regularly use.
