第四章. 字符串

Strings

声明

本章翻译仅用于 Raku 学习和研究, 请支持电子版或纸质版

Strings represent the text data in your program as Str objects. Raku’s facility with text data and its manipulation is one of its major attractions. This chapter focuses on the many ways that you can create Strs; for any job you have there’s likely a feature that makes that easy for you. Along with that you’ll see a bit about inspecting, extracting, and comparing text in preparation for loftier goals coming up.

字符串将程序中的文本数据表示为Str对象。 Raku 的文本数据和它的文本操作天赋是其主要吸引力之一。本章重点介绍可以创建字符串的多种方法;对于你的任何工作,可能有一个功能,使你的工作变得容易。除此之外,你还会看到有关检查,提取和比较文本的内容,以便为即将出现的更高目标做准备。

Literal Quoting

You can type literal text directly into your program. What you type is what the text is, and the compiler does not interpret it as anything other than exactly what you typed. You can surround literal text with half-width corner brackets, 「 and 」:

你可以直接在程序中键入字面文本。你键入的内容是文本的内容,编译器会将其解释为你输入的内容。你可以使用半角括号来包围字面文本:

「Literal string」

This is your first encounter with a paired delimiter. These characters mark the beginning and end of the Str. There’s an opening character and a closing character that surround your text.

Any character that you use is interpreted as exactly what it is, with no special processing:

这是你第一次遇到配对分隔符。这些字符标记字符串的开头和结尾。文本周围有一个开口字符和一个闭合字符。

你使用的任何字符都被解释为它的确切含义,没有特殊处理:

「Literal '" string with \ and {} and /」

You can’t use only one of the delimiter characters in the Str. These won’t work:

你不能只使用字符串中的一个分隔符。这些不起作用:

「 Unpaired 「 Delimiters 」
「 Unpaired 」 Delimiters 」

However, if you pair delimiters in the text the compiler will figure out if they are balanced—the opening delimiter comes first and a closing delimiter pairs with it:

但是,如果在文本中对分隔符进行配对,编译器将确定它们是否是平衡的 - 开口分隔符首先出现,并且闭合分隔符与它配对:

「 Del「i」miters 」

NOTE

The Raku language is a collection of sublanguages, or slangs. Once inside a particular slang the compiler parses your source code by that slang’s rules. The quoting language is one of those slangs.

If your literal text has corner brackets in it you can use a generalized quoting mechanism. These start with a Q (or q) and can get as limiting or as permissive as you like, as you’ll see in this chapter.

After the Q you can select almost any character to be the delimiter. It can’t be a character valid in a variable name, because that would make it look like a name instead of a delimiter. The paired characters are common; the opening character has to be on the left and its closing partner has to be on the right. Perhaps you want to use square brackets instead of corner brackets. Now the 」 isn’t special because it’s not a delimiter:

注意

Raku 语言是一个子语言或方言的集合。一旦进入特定的方言,编译器就会根据该方言的规则解析你的源代码。引用语言是其中一个方言。

如果你的字面文本中包含角括号,则可以使用通用引用机制。这些以 Q(或 q )开头,可以像你想的那样得到限制或许可,正如你将在本章中看到的那样。

Q 之后,你可以选择几乎任何字符作为分隔符。它不能是变量名中有效的字符,因为这会使它看起来像名称而不是分隔符。配对字符很常见;开口字符必须位于左侧,其闭合字符必须位于右侧。也许你想使用方括号而不是角括号。现在,这并不特别,因为它不是分隔符:

Q[Unpaired 」 Delimiters]

Most of the paired characters act the same:

大多数配对字符的行为相同:

Q{Unpaired 」 Delimiters}
Q<Unpaired 」 Delimiters>
Q<<Unpaired 」 Delimiters>>
Q«Works»

There’s one exception. You can’t have an open parenthesis right after the Q because that makes it look like a subroutine call (but it’s not):

有一个例外。在 Q 之后你不能有开口圆括号,因为这使它看起来像一个子程序调用(但它不是):

Q(Does not compile)

You don’t have to use paired characters. You can use the same character for the opening and closing delimiter:

你不必使用配对字符。你可以对开口和闭合分隔符使用相同的字符:

Q/hello/

You can store a Str in a variable or output it immediately:

你可以将字符串存储在变量中或立即输出:

my $greeting = Q/Hello World!/;
put Q/Hello World!/;

And you can call methods on your Str just like you could do with numbers:

你可以在你的[字符串](https://docs.raku.org/type/Str.html)上调用方法,就像你对数字一样:

Q/Hello World!/.^name;  # Str
Q/Hello World!/.put;

Escaped Strings

One step up from literal Strs are escaped strings. The single tick acts as the delimiter for these Strs. These are often called single-quoted strings:

从字面[字符串](https://docs.raku.org/type/Str.html)向上一步是转义字符串。单个记号作为这些字符串的分隔符。这些通常称为单引号字符串:

% raku
> 'Hamadryas perlicus'
Hamadryas perlicus

If you want to have the single tick as a character in the Str you can escape it with a backslash. That tells the quoting slang that the next character isn’t the delimiter but belongs as literal text:

如果你想让单个记号作为字符串中的一个字符,你可以用反斜杠来转义它。这告诉引用方言的下一个字符不是分隔符但属于字面文本:

% raku
> 'The escaped \' stays in the string'
The escaped ' stays in the string

Since the \ is the escape character, you can escape it to get a literal backslash:

由于 \ 是转义字符,你可以转义它以获得字面反斜杠:

% raku
> 'Escape the \\ backslash'
Escape the \ backslash

A DOS path can be quite annoying to type, but escaped and literal Strs take care of that:

DOS 路径可能非常烦人,但是转义和字面字符串负责:

% raku
> 'C:\\Documents and Settings\\Annoying\\Path'
C:\Documents and Settings\Annoying\Path
> Q/C:\Documents and Settings\Annoying\Path/
C:\Documents and Settings\Annoying\Path

If you want to use a different delimiter for an escaped string you use the lowercase q followed by the delimiter that you want (following the same rules as for the literal quoting delimiters):

如果要对转义字符串使用不同的分隔符,请使用小写 q 后跟所需的分隔符(遵循与字面引用分隔符相同的规则):

q{Unpaired ' Delimiters}
q<Unpaired ' Delimiters>
q<<Unpaired ' Delimiters>>
q«Works»
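
As a side-by-side comparison of the two quoting levels described so far, here is a minimal sketch. The variable names and the path are made up for illustration; the point is only that Q keeps backslashes as typed while q treats a doubled backslash as one.

```raku
# Literal quoting: every backslash is an ordinary character.
my $literal = Q[C:\temp\new];

# Escaped quoting: each \\ collapses to a single backslash.
my $escaped = q[C:\\temp\\new];

put $literal;                 # C:\temp\new
put $escaped;                 # C:\temp\new
put $literal eq $escaped;     # True
```

Both forms produce the same Str object; the choice only affects how much you have to type.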

Adverbs for Quoting

Adverbs modify how something works and are a big part of Raku. You’ll see more of these in Chapter 9, but you’ll get a taste for them in this chapter. Adverbs start with a colon followed by letters or numbers.

All of the quoting methods you’ll see in this chapter are modifications of basic literal quoting. You use adverbs to adjust the quoting behavior.

The :q adverb modifies Q to become an escaping quote. There must be some whitespace after the adverb, but it’s optional after the Q:

副词会修改某些东西的工作方式,并且是 Raku 的重要组成部分。你将在第9章中看到更多这些内容,但在本章中你将会对它们有所了解。副词以冒号开头,后跟字母或数字。

你将在本章中看到的所有引用方法都是对基本字面引用的修改。你使用副词来调整引用行为。

:q 副词修改 Q 成为转义引用。在副词之后必须有一些空格,但在 Q 之后它是可选的:

% raku
> Q:q 'This quote \' escapes \\'
This quote ' escapes \
> Q :q 'This quote \' escapes \\'
This quote ' escapes \

This form doesn’t specifically escape the single tick; it escapes the backslash and the delimiter characters. A backslash that doesn’t precede a delimiter or another backslash is interpreted as a literal backslash:

这种形式并没有特别转义单个记号;它转义了反斜杠和分隔符字符。不在分隔符或另一个反斜杠之前的反斜杠被解释为字面反斜杠:

% raku
> Q :q  「This quote \' escapes」
This quote \' escapes
> Q :q  「This quote \「 escapes」
This quote 「 escapes
> Q :q  「This quote \「\」 escapes」
This quote 「」 escapes

The :single adverb is a longer version of :q and might help you remember what you want:

:single 副词是 :q 的较长版本,可能会帮助你记住你想要的内容:

% raku
> Q :single 'This quote \' escapes'
This quote ' escapes

Most of the time you aren’t going to work this hard. The common uses of quoting have default delimiters so you don’t even see the Q. Even though many Strs would be more correctly represented with strict literal quoting, most people tend to use the single ticks simply because it’s easier to type. No matter which quoting method you use you get the same type of object.

大多数时候你不打算这么努力。引用的常见用法具有默认分隔符,因此你甚至不会看到 Q.即使使用严格的字面引用更准确地表示许多字符串,大多数人倾向于使用单个记号,因为它更容易键入。无论使用哪种引用方法,都可以获得相同类型的对象。

String Operators and Methods

Use the concatenation operator, ~, to combine Strs. Some people call this “string addition.” The output shows the two Strs as one with nothing else between them:

使用连接运算符 ~ 来组合字符串。有些人将此称为“字符串添加。”输出显示两个字符串合为一个,它们之间没有其他内容:

my $name = 'Hamadryas' ~ 'perlicus';
put $name;      # Hamadryasperlicus

You could add a space yourself by putting it in one of the Strs, but you can also concatenate more than two Strs at a time:

你可以在两个字符串之间添加一个空格,但你也可以一次连接两个以上的字符串

put 'Hamadryas ' ~ 'perlicus';
put 'Hamadryas' ~ ' ' ~ 'perlicus';

The join routine glues together Strs with the first Str you give it:

join 例程将字符串与你给它的第一个字符串粘在一起:

my $butterfly-name = join ' ', 'Hamadryas', 'perlicus';

You can make larger Strs by repeating a Str. The x is the Str replication operator. It repeats the Str the number of times you specify. This is handy for making a text-based divider or ruler for your output:

你可以通过重复字符串来制作更大的字符串。x 是字符串复制运算符。它会重复字符串指定的次数。这对于为输出创建基于文本的分隔符或标尺很方便:

put '-' x 70;
put '.123456789' x 7;

The .chars method tells you how many characters are in the Str:

.chars 方法告诉你字符串中有多少个字符:

put 'Hamadryas'.chars;  # 9
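
The routines above combine naturally. The following is a small sketch, with illustrative variable names, that joins two words, then uses .chars and x to draw a rule exactly as wide as the joined text:

```raku
my $title = join ' ', 'Hamadryas', 'perlicus';   # Hamadryas perlicus
my $rule  = '=' x $title.chars;                  # one '=' per character

put $title;
put $rule;
put 'Width: ' ~ $title.chars;                    # Width: 18
```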

Any Str with at least one character is True as a Boolean, including the Str of the single character 0:

任何具有至少一个字符的字符串作为布尔值都是 True,包括只含单个字符 0 的字符串:

put ?'Hamadryas';       # True
put ?'0';               # True

The empty string has no characters. It consists only of the opening delimiter and the closing delimiter. It’s False as a Boolean:

空字符串没有字符。它仅包含开口分隔符和闭合分隔符。它作为布尔值是假的:

put ''.chars;           # 0
put ?'';                # False

Be careful that when you test a Str you test the right thing. A Str type object is also False, but .DEFINITE can tell them apart:

小心,当你测试一个字符串你测试正确的东西。 字符串类型对象也是 False,但 .DEFINITE 可以将它们区分开:

put ''.DEFINITE;        # True
put Str.DEFINITE;       # False

This is handy in a conditional expression where you don’t care what the Str is (empty, '0', or anything else) as long as it’s not a type object:

这在条件表达式中很方便,只要它不是类型对象,你不关心字符串是什么(空,'0' 或其他任何东西):

given $string {
    when .DEFINITE {
        put .chars ?? 'Has characters' !! 'Is empty';
        }
    default { put 'Type object' }
    }

The .lc method changes all the characters in a Str to lowercase, and .uc changes them to uppercase:

.lc 方法将字符串中的所有字符更改为小写,.uc 将它们更改为大写:

put 'HaMAdRyAs'.lc;     # hamadryas
put 'perlicus'.uc;      # PERLICUS

The .tclc method uses title case, lowercasing everything then capitalizing the first character of the Str:

.tclc 方法使用标题大小写,小写所有内容然后大写字符串的第一个字符:

put 'hamadryas PERLICUS'.tclc;  # Hamadryas perlicus

EXERCISE 4.1 Write a program to report the number of characters in the text you enter.

EXERCISE 4.2 Modify the previous exercise to continually prompt for text and report the number of characters in your answers until you provide an empty answer.

练习4.1 编写一个程序来报告你输入的文本中的字符数。

练习4.2 修改上一个练习以不断提示文本并报告答案中的字符数,直到你提供空答案。

Looking Inside Strings

You can also inspect a Str to find out things about it. The .contains method returns a Boolean value indicating whether it finds one Str—the substring—inside the target Str:

你也可以检查一下字符串来找出它的相关信息。 .contains 方法返回一个布尔值,指示它是否找到一个字符串-子字符串-在目标字符串内:

% raku
> 'Hamadryas perlicus'.contains( 'perl' )
True
> 'Hamadryas perlicus'.contains( 'Perl' )
False

Instead of parentheses you can put a colon followed by the substring to search for:

你可以使用冒号后跟子字符串来代替圆括号来搜索:

% raku
> 'Hamadryas perlicus'.contains: 'perl'
True
> 'Hamadryas perlicus'.contains: 'Perl'
False

The .starts-with and .ends-with methods do the same thing as .contains but require the substring to appear at a particular location:

.starts-with 和 .ends-with 方法与 .contains 的作用相同,但要求子字符串出现在特定位置:

> 'Hamadryas perlicus'.starts-with: 'Hama'
True
> 'Hamadryas perlicus'.starts-with: 'hama'
False
> 'Hamadryas perlicus'.ends-with: 'us'
True

These methods are case sensitive. The case of each character in the substring must match the case in the target Str. If it’s uppercase in the substring it must be uppercase in the target. If you want case insensitivity you can use .fc to make a “caseless” Str. This “case folding” method is especially designed for comparisons:

这些方法区分大小写。子字符串中每个字符的大小写必须与目标字符串中的大小写匹配。如果它在子字符串中是大写的,则它在目标中必须为大写。如果你想要不区分大小写,你可以使用 .fc 来制作一个“无大小写”的字符串。这种“大小写折叠”方法专门用于比较:

> 'Hamadryas perlicus'.fc.starts-with: 'hama'
True

.fc also knows about equivalent characters such as the ss and the sharp ß. The method doesn’t change the text; it evaluates to a new Str based on a long list of rules about equivalence defined by Unicode. You should case fold both the target and substrings if you want to allow these sorts of variations:

.fc 也知道相等的字符,如 ss 和 sharp ß。该方法不会改变文本; 它基于由 Unicode 定义的关于等价的一长串规则列表来计算新的字符串。如果要允许这些变化,你应该折叠目标字符串和子字符串:

> 'Reichwaldstrasse'.contains: 'straße'
False
> 'Reichwaldstrasse'.fc.contains: 'straße'
False
> 'Reichwaldstrasse'.contains: 'straße'.fc
True
> 'Reichwaldstrasse'.fc.contains: 'straße'.fc
True

.substr extracts a substring by its starting position and length inside the Str. The counting starts with zero at the first character:

.substr 通过字符串中的起始位置和长度提取子字符串。计数从第一个字符的零开始:

put 'Hamadryas perlicus'.substr: 10, 4;     # perl

The .index method tells you where it finds a substring inside the larger Str (still counting from zero), or returns Nil if it can’t find the substring:

.index 方法告诉你它在较大的字符串内部找到一个子字符串(仍然从零开始计数),或者如果它找不到子字符串则返回 Nil

my $i = 'Hamadryas perlicus'.index: 'p';
put $i ?? 'Found at ' ~ $i !! 'Not in string'; # Found at 10

Use both of them together to figure out where to start:

同时使用它俩来确定从哪里开始:

my $s = 'Hamadryas perlicus';
put do given $s.index: 'p' {
    when Nil { 'Not found' }
    when Int { $s.substr: $_, 4 }
    }
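
One detail worth illustrating: position 0 is a valid result from .index, but 0 is False as a Boolean, so a plain truth test would report a match at the very start as missing. The sketch below tests definedness instead, which is an assumption about how you might prefer to write the check rather than something the text prescribes. The search characters are arbitrary.

```raku
my $s = 'Hamadryas perlicus';

for 'H', 'p', 'z' -> $needle {
    my $pos = $s.index: $needle;
    # a missing substring leaves $pos undefined; position 0 is still defined
    put $pos.defined
        ?? $needle ~ ' found at ' ~ $pos
        !! $needle ~ ' not in string';
}
```

This prints "H found at 0", "p found at 10", and "z not in string".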

EXERCISE 4.3 Repeatedly prompt for text and report if it contains the substring “Hamad”. Stop prompting if the answer has no characters (an empty answer). Can you make this work regardless of casing?

练习4.3 如果包含子字符串 “Hamad”,则重复提示文本和报告。如果答案没有字符,则停止提示(空答案)。如果没有大小写,你能做到这一点吗?

Normal Form Grapheme

Raku is Unicode all the way down. It works on graphemes, which most of us think of as “characters” in the everyday sense. These are the full expression of some idea, such as e, é, or 🦋. It expects your source code to be UTF-8 encoded and outputs UTF-8 text. All of these work, although they each represent a different language:

Raku 一直是支持 Unicode 的。它适用于字素,我们大多数人都认为它是日常意义上的“字符”。这些是一些想法的完整表达,例如e,é,或 🦋。它希望你的源代码是 UTF-8 编码并输出 UTF-8 文本。所有这些都有效,虽然它们各自代表不同的语言:

'көпөлөк'
'तितली'
'蝴蝶'
'Con bướm'
'tauriņš'
'πεταλούδα'
'भंबीरा'
'פרפר'

You can use emojis too:

你也可以使用表情符号:

my $string = '🦋';
put $string;

One of the Raku “characters” might be made up of two or more entries in the Universal Character Database (UCD). Raku refers to entries in the UCD as codes and to their composition as a “character.” It’s not the best terminology. In this book, character means grapheme and code point refers to an entry in the UCD.

Why does any of that matter? The .chars method tells you the length of the Str in graphemes. Consider the Hebrew word for “caterpillar.” It has 11 graphemes but 14 code points:

其中一个 Raku “字符”可能由通用字符数据库(UCD)中的两个或多个条目组成。 Raku 将 UCD 中的条目称为代码,将其组成称为“字符”。这不是最好的术语。在本书中,字符表示字素,而代码点表示 UCD 中的条目。

为什么这有关系? .chars 方法告诉你字素中字符串的长度。考虑希伯来语中的“caterpillar”一词。它有 11 个字素,但有 14 个代码点:

% raku
> 'קאַטערפּיללאַר'.chars
11
> 'קאַטערפּיללאַר'.codes
14

Why the different counts? There are graphemes such as אַ that are more than one code point (in that case, the two code points are the Hebrew Aleph and patah diacritical mark). Most of the time you won’t care about this. If you do, you can get a list of the code points with .ords:

为什么是不同的计数?像 אַ 这样的字素不止一个代码点(在这种情况下,两个代码点是希伯来语Aleph和patah变音符号)。大多数时候你不会关心这个。如果这样做,你可以用 .ords 获得的代码点列表:

> 'קאַטערפּיללאַר'.ords
(1511 1488 1463 1496 1506 1512 1508 1468 1497 1500
1500 1488 1463 1512)
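
The same difference shows up with a much smaller example. This sketch builds a single grapheme from two code points, a Latin x followed by a combining acute accent, chosen here because Unicode has no single precomposed code point for that pair. It uses the \x[] escape that is described later in this chapter.

```raku
my $combined = "x\x[0301]";   # LATIN SMALL LETTER X + COMBINING ACUTE ACCENT

put $combined.chars;          # 1
put $combined.codes;          # 2
put $combined.ords;           # 120 769
```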

String Comparisons

Str objects know if they are relatively greater than, less than, or the same as another Str. Raku uses lexicographic comparison to go through the Strs character by character.

The numeric comparison operators are symbols, but the Str comparisons use operators made up of letters. The eq operator tests if the Strs are exactly equal. Case matters. Every character at each position in the Str must be exactly the same in each Str:

字符串 对象知道它们是否比另一个字符串相对大,小于或相同。 Raku 使用字典比较来逐字逐句地浏览字符串

数字比较运算符是符号,但字符串使用由字母组成的运算符。 eq 运算符测试字符串是否完全相等。大小写敏感。 字符串中每个位置的每个字符在每个字符串中必须完全相同:

% raku
> 'Hamadryas' eq 'hamadryas'
False
> 'Hamadryas' eq 'Hamadryas'
True

The gt operator evaluates to True if the first Str is strictly lexicographically greater than the second (ge allows it to be greater than or equal to the second Str). This is not a dictionary comparison, so case matters. The lowercase letters come after the uppercase ones and so are “greater”:

如果第一个字符串严格按字典顺序排列大于第二个(ge 允许它大于或等于第二个字符串),则gt运算符的计算结果为 True。这不是字典比较,因此大小写敏感。小写字母位于大写字母之后,因此“更大”:

% raku
> 'Hama' gt 'hama'
False
> 'hama' gt 'Hama'
True

The uppercase letters come before the lowercase ones, so any Str that starts with a lowercase letter is greater than any Str that starts with an uppercase letter:

大写字母位于小写字母之前,因此任何以小写字母开头的字符串都大于以大写字母开头的任何字符串

% raku
> 'alpha' gt 'Omega'
True
> 'α' gt 'Ω'
True

You can get some weird results if you compare numbers as Strs. The character 2 is greater than the character 1, so any Str starting with 2 is greater than any Str starting with 1:

如果将数字作为字符串进行比较,你可能会得到一些奇怪的结果。字符2大于字符1,因此从2开始的任何字符串都大于从1开始的任何字符串

% raku
> '2' gt '10'
True
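
To make the contrast concrete, here is a brief sketch comparing the same values with the Str operator and with the numeric operator. The last line relies on the automatic Str-to-number conversion covered later in this chapter.

```raku
put '2' gt '10';    # True:  compared as text, character by character
put  2  >   10;     # False: compared as numbers
put '2' >  '10';    # False: > converts both Strs to numbers first
```

The operator, not the operands, decides which kind of comparison you get.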

The lt operator evaluates to True if the first Str is lexicographically less than the second (le allows it to be less than or equal to the second Str):

如果第一个字符串在字典上小于第二个(le 允许它小于或等于第二个字符串),则 lt 运算符求值为 True

% raku
> 'Perl 5' lt 'Raku'
True

If you don’t care about their case you can lowercase both sides with .lc:

如果你不关心他们的大小写你可以使用 .lc 小写双方:

% raku
> 'Hamadryas'.lc eq 'hamadryas'.lc
True

This wouldn’t work for the Reichwaldstrasse example you saw previously. If you wanted to allow for equivalent representations you’d use .fc:

这对你之前看到的 Reichwaldstrasse 例子不起作用。如果你想允许等效表示,请使用 .fc

% raku
> 'Reichwaldstrasse'.lc eq 'Reichwaldstraße'.lc
False
> 'Reichwaldstrasse'.fc eq 'Reichwaldstraße'.fc
True

As with numbers, you can chain the comparisons:

与数字一样,你可以链接比较:

% raku
> 'aardvark' lt 'butterfly' lt 'zebra'
True

Prompting for Input

You’ve already used prompt for simple things. When you call it your program reads a single line and chops off the newline that you typed. A small modification of the program shows you what sort of type you get back:

你已经将 prompt 用于简单的提示了。当你调用它时,你的程序读取一行并切掉你键入的换行符。对程序进行一些小修改即可显示你获得的类型:

my $answer = prompt( 'What\'s your favorite animal? ' );
put '$answer is type ', $answer.^name;
put 'You chose ', $answer;

When you answer the question you get a Str:

当你回答这个问题时,你会得到一个字符串

% raku prompt.p6
What's your favorite animal? Fox
$answer is type Str
You chose Fox

When you don’t type anything other than a Return the answer is still a Str, but it’s an empty Str:

当你没有输入除了换行符之外的任何东西时,答案仍然是一个字符串,但它是一个空字符串

% raku prompt.p6
What's your favorite animal?
$answer is type Str
You chose

If you end input with Control-D without typing anything, not even a Return, prompt returns an Any type object instead of a Str. Notice that the line showing the type appears on the same line as the prompt text because you never typed a Return. There’s also a warning about that Any value, and finally your last line of output:

你使用 Control-D 结束输入,这与不输入任何内容相同。在这种情况下,它返回一个Any 类型的对象。请注意,显示该类型的行与提示文本显示在同一行 - 你从未键入Return。还有关于Any 值的警告,最后是你的最后一行输出:

% raku prompt.p6
What's your favorite animal? $answer is type Any
Use of uninitialized value $answer of type Any in string context.
You chose

To guard against this problem you can test $answer. The Any type object is always False. So is the empty Str:

为了防止这个问题,你可以测试 $answer。Any 类型对象始终为 False。空的字符串也是如此:

my $answer = prompt( 'What\'s your favorite animal? ' );
put do
    if $answer { 'You chose ' ~ $answer }
    else       { 'You didn\'t choose anything.' }

prompt takes whatever you type, including whitespace. If you put some spaces at the beginning and end that’s what shows up in the Str:

prompt接受你输入的任何内容,包括空格。如果你在开头和结尾放置一些空格,那就是字符串中出现的空格:

% raku prompt.p6
What's your favorite animal?                 Butterfly
You chose                 Butterfly

You can see this better if you put in something to surround the answer portion of the output, such as <> in this example:

如果你在输出的答案部分放置一些东西,你可以更好地看到这一点,例如本例中的 <>

my $answer = prompt( 'What\'s your favorite animal? ' );
put do
   if $answer { 'You chose <' ~ $answer ~ '>' }
   else       { 'You didn\'t choose anything' }

Now you can easily see the extra space in $answer:

现在,你可以轻松地在 $answer 中看到额外的空格:

% raku prompt.p6
What's your favorite animal?                 Butterfly
You chose <                Butterfly            >

The .trim method takes off the surrounding whitespace and gives you back the result:

.trim 方法去掉周围的空格并返回结果:

my $answer = prompt( 'What\'s your favorite animal? ' ).trim;

If you apply it to $answer by itself it doesn’t work:

如果你将它自己应用于 $answer 那么它不起作用:

$answer.trim;

You need to assign the result to $answer to get the updated value:

你需要将结果赋值给 $answer 以获取更新后的值:

$answer = $answer.trim;

That requires you to type $answer twice. However, you know about binary assignment so you can shorten that to use the variable name once:

这要求你输入两次 $answer。但是,你知道二进制赋值,因此你可以缩短它以使用变量名称一次:

$answer .= trim;

If you don’t want to remove the whitespace from both sides you can use either .trim-leading or .trim-trailing for the side that you want.

如果你不想从两侧删除空格,可以使用 .trim-leading.trim-trailing 作为所需的一侧。

Number to String Conversions

You can easily convert numbers to Strs with the .Str method. They may not look like what you started with. These look like number values but they are actually Str objects where the digits you see are characters:

你可以使用 .Str 方法轻松地将数字转换为字符串。它们可能看起来不像开始那样。这些看起来像数字值但它们实际上是字符串对象,其中你看到的数字是字符:

% raku
> 4.Str
4
> <4/5>.Str
0.8
> (13+7i).Str
13+7i

The unary prefix version of ~ does the same thing:

~ 的一元前缀版本做同样的事情:

% raku
> ~4
4
> ~<4/5>
0.8
> ~(13+7i)
13+7i

If you use a number in a Str operation it automatically converts it to its Str form:

如果在字符串操作中使用数字,它会自动将其转换为字符串形式:

% raku
> 'Hamadryas ' ~ <4/5>
Hamadryas 0.8
> 'Hamadryas ' ~ 5.5
Hamadryas 5.5

String to Number Conversions

Going from Strs to numbers is slightly more complicated. If the Str looks like a number you can convert it to some sort of number with the unary prefix version of +. It converts the Str to the number of the narrowest form, which you can check with .^name:

字符串到数字稍微复杂一些。如果字符串看起来像一个数字,你可以使用一元前缀版本+将其转换为某种数字。它将字符串转换为最窄形式的数字,你可以查看 ^name

% raku
> +'137'
137
> (+'137').^name
Int
> +'1/2'
0.5
> (+'1/2').^name
Rat

This only works for decimal digits. You can have the decimal digits 0 to 9 and a possible decimal point followed by more decimal digits. An underscore is allowed with the same rules as for literal numbers. The conversion ignores surrounding whitespace:

这仅适用于十进制数字。你可以使用小数位数0到9以及可能的小数点后跟更多的十进制数字。允许使用与字面数相同的规则的下划线。转换忽略了周围的空格:

% raku
> +' 1234 '
1234
> +' 1_234 '
1234
> +' 12.34 '
12.34

Anything else, such as two decimal points, causes an error:

其他任何内容,例如两个小数点,都会导致错误:

> +'12.34.56'
Cannot convert string to number: trailing characters after number

When you perform numerical operations on a Str it’s automatically converted to a number:

当你对字符串执行数值运算时,它会自动转换为数字:

% raku
> '2' + 3
5
> '2' + '4'
6
> '2' ** '8'
256

EXERCISE 4.4 Write a program that prompts for two numbers then outputs their sum, difference, product, and quotient. What happens if you enter something that’s not a number? (You don’t need to handle any errors.)

In the previous exercise you should have been able to create a conversion error even though you didn’t have the tools to handle it. If you want to check if a Str can convert to a number you can use the val routine. That gives you an object that does the Numeric role if it can convert the Str. Use the smart match operator to check that it worked:

练习4.4 写一个程序,提示输入两个数字,然后输出它们的和,差,乘积和商。如果你输入的不是数字,会发生什么? (你不需要处理任何错误。)

在上一个练习中,即使你没有处理它的工具,你也应该能够创建转换错误。如果要检查字符串是否可以转换为数字,可以使用 val 例程。如果它可以转换字符串,那么它将为你提供一个执行 Numeric 角色的对象。使用智能匹配运算符检查它是否有效:

my $some-value = prompt( 'Enter any value: ' );
my $candidate = val( $some-value );

put $candidate, ' ', do
    if $candidate ~~ Numeric { ' is numeric' }
    else                     { ' is not numeric' }
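
To see how val behaves without typing answers interactively, the following sketch runs the same smart match over a few fixed sample Strs. The sample values are arbitrary and chosen only to cover one integer, one rational, one malformed number, and one word.

```raku
for '137', '1/2', '12.34.56', 'butterfly' -> $text {
    my $candidate = val( $text );
    put $text ~ ( $candidate ~~ Numeric ?? ' is numeric' !! ' is not numeric' );
}
```

The first two Strs are reported as numeric and the last two are not.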

This seems complicated now because you haven’t read about interpolated Strs yet. It will be much clearer by the end of this chapter.

EXERCISE 4.5 Update the previous exercise to handle nonnumeric values that would cause a conversion error. If one of the values isn’t numeric, output a message saying so.

Sometimes your text is numeric but not decimal. The .parse-base method can convert it for you. It takes a Str that looks like a nondecimal number and turns it into a number:

现在这看起来很复杂,因为你还没有读过有关插值的字符串。到本章结尾将会更清楚。

练习4.5 更新上一个练习以处理可能导致转换错误的非数字值。如果其中一个值不是数字,则输出一条说明的消息。

有时你的文本是数字但不是小数。 .parse-base 方法可以为你转换它。它需要一个看起来像非十进制数字的字符串并将其转换为数字:

my $octal  = '0755'.parse-base: 8;     # 493
my $number = 'IG88'.parse-base: 36;    # 860840

This is the same thing the colon form was doing in Chapter 3:

这与第3章中的冒号对形式所做的相同:

:8<0755>
:36<IG88>
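
A short sketch can confirm that the two forms agree. It also shows the .base method on Int, which is not covered in this chapter and is included here only as a way to go in the opposite direction.

```raku
put '0755'.parse-base( 8 ) == :8<0755>;   # True
put 'FF'.parse-base( 16 );                # 255
put 493.base( 8 );                        # 755
```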

Interpolated Strings

You’ve taken a long path through this chapter to get to the quoting mechanism that you’re likely to use the most. An interpolated string replaces special sequences within the Str with other characters. These Strs will also make easier some of the code you’ve already seen.

Interpolated Strs use the double quote, ", as the default delimiter and are sometimes called double-quoted strings. You need to escape the " if you want one in the Str, and you can escape the \:

你已经通过本章走了很长的路,以了解你可能最常使用的引用机制。插值字符串用其他字符替换字符串中的特殊序列。这些字符串也会使你已经看过的一些代码变得更容易。

插值字符串使用双引号 " 作为默认分隔符,有时也称为双引号字符串。如果你想在字符串中使用双引号你需要转义 ",你也可以转义 \

% raku
> "Hamadryas perlicus"
Hamadryas perlicus
> "The escaped \" stays in the string"
The escaped " stays in the string
> "Escape the \\ backslash"
Escape the \ backslash

The backslash also starts other special interpolating sequences. A \t represents a tab character. A \n represents a newline:

反斜杠也会启动其他特殊插值序列。 \t 表示制表符。 \n 表示换行符:

put "First line\nSecond line\nThird line";

If you want a character that’s not easy to type you can put its code number (a hexadecimal value) after \x or inside \x[]. Don’t use the 0x prefix; the \x already assumes that:

如果你想要一个不容易输入的字符,你可以在 \x 之后或在 \x[] 之内输入它的代码号(十六进制值)。不要使用 0x 前缀; \x 已经假定:

put "The snowman is \x[2603]";

Several comma-separated code numbers inside \x[] turn into multiple characters:

\x[] 内的几个以逗号分隔的代码编号变为多个字符:

put "\x[1F98B, 2665, 1F33B]";  # 🦋♥🌻

If you know the name of the character you can put that inside \c[]. You don’t quote these names and the names are case insensitive:

如果你知道字符的名称,可以将其放在 \c[] 中。你不引起这些名称,并且名称不区分大小写:

put "\c[BUTTERFLY, BLACK HEART, TACO]";  # 🦋🖤🌮

Those are nice, but it’s much more handy to interpolate variables. When a double-quoted Str recognizes a sigiled variable name it replaces the variable with its value:

这些很好,但插入变量更方便。当双引号字符串识别出一个带符号的变量名时,它用它的值替换变量:

my $name = 'Hamadryas perlicus';
put "The best butterfly is $name";

The quoting slang looks for the longest possible variable name (and not the longest name actually defined). If the text after the variable name looks like it could be a variable name that’s the variable it looks for:

引用方言查找可能的最长变量名称(而不是实际定义的最长名称)。如果变量名后面的文本看起来像是一个变量名,那就是它所寻找的变量:

my $name = 'Hamadryas perlicus';
put "The best butterfly is $name-just saying!";

This is a compile-time error:

这是一个编译时错误:

Variable '$name-just' is not declared

If you need to separate the variable name from the rest of the text in the double-quoted Str you can surround the entire variable in braces:

如果你需要将变量名与双引号字符串中的其余文本分开,则可以在花括号中包围整个变量:

my $name = 'Hamadryas perlicus';
put "The best butterfly is {$name}-just saying!";

Escape a literal $ where it might look like a sigil that starts a variable name:

转义一个字面 $,它可能看起来像一个启动变量名称的sigil:

put "I used the variable \$name";

Now here’s the powerful part. You can put any code you like inside the braces. The quoting slang will evaluate the code and replace the braces with the last evaluated expression:

现在这是强大的部分。你可以把任何你喜欢的代码放在花括号内。引用方言将计算代码并用最后计算的表达式替换花括号:

put "The sum of two and two is { 2 + 2 }";
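
Since any expression can go inside the braces, method calls and operators from earlier in the chapter work there too. A few illustrative lines, with made-up text around the interpolated parts:

```raku
my $name = 'Hamadryas perlicus';

put "Uppercased: { $name.uc }";         # Uppercased: HAMADRYAS PERLICUS
put "{ '-' x 3 } $name { '-' x 3 }";    # --- Hamadryas perlicus ---
put "Half of 19 is { 19 / 2 }";         # Half of 19 is 9.5
```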

This means that the previous programs in this chapter are much easier to type than they first appear. You can construct the Str inside the delimiters rather than using a series of separate Strs:

这意味着本章中的先前程序比首次出现时更容易键入。你可以在分隔符内构造字符串而不是使用一系列单独的字符串

my $answer = prompt( 'What\'s your favorite animal? ' );
put "\$answer is type {$answer.^name}";
put "You chose $answer";

Like with the previous Strs, you can choose a different delimiter for interpolated Strs. Use qq (double q for double quoting) in front of the delimiter:

与之前的字符串一样,你可以为插值的字符串选择不同的分隔符。在分隔符前面使用 qq (两个 q 表示双引号):

put qq/\$answer is type {$answer.^name}/;

The \n is interpolated as a newline and the \t becomes a tab:

\n 插值为换行符,\t 变为制表符:

put qq/\$answer is:\n\t$answer/;

This Str has two lines and the second one is indented:

这个字符串有两行,第二行是缩进的:

$answer is:
    Hamadryas perlicus

qq// is the same as Q with the :qq or :double adverb:

qq// 与带有 :qq 或 :double 副词的 Q 相同:

put Q :qq /\$answer is type {$answer.^name}/;
put Q :double /\$answer is type {$answer.^name}/;

If you want to interpolate only part of a Str you can use \qq[] for that part:

如果只想插入字符串的一部分,可以使用 \qq[] 作为该部分:

my $genus = 'Hamadryas';
put '$genus is \qq[$genus]';

Going the other way, you can turn off interpolation for part of a Str by making that part act like a single-quoted Str with \q[]:

换句话说,你可以通过使该部分与 \q[] 的单引号字符串一起使用来关闭部分字符串的插值:

put "\q[$genus] is $genus";

Table 4-1 shows many other special sequences available inside a double-quoted context.

表4-1显示了很多双引号上下文中其它可用的特殊序列。

| Escape sequence | Description |
| --- | --- |
| \a | The ASCII bell character |
| \b | Backspace |
| \r | Carriage return |
| \n | Newline |
| \t | Tab |
| \f | Form feed |
| \c[NAME] | Character by name |
| \q[…] | Single quote the part inside the brackets |
| \qq[…] | Double quote the part inside the brackets |
| \x[ABCD] | Character by code number in hex |

EXERCISE 4.6 Modify your character-counting program to show the Str as well as the number of characters it counts. For example, 'Hamadryas' has 9 characters. You should be able to output a single interpolated Str.

练习4.6 修改你的字符计数程序,以显示字符串以及它所计算的字符数。例如,‘Hamadryas’ 有9个字符。你应该能够输出单个插值的字符串

Here Docs

For multiline quoting you could use the quoting you’ve seen so far, but every character between those delimiters matters. This often results in ugly outdenting:

对于多行引用,你可以使用到目前为止看到的引用,但这些分隔符之间的每个字符都很重要。这通常导致丑陋的外观:

my $multi-line = '
    Hamadryas perlicus: 19
    Vanessa atalanta: 17
    Nymphalis antiopa: 0
    ';

Interpolating \n doesn’t make it any prettier:

插入换行符 \n 不会使它更漂亮:

my $multi-line = "Hamadryas perlicus: 19\n...";

A here doc is a special way of quoting a multiline text. Specify a delimiter with the :heredoc adverb. The Str ends when the slang finds that same Str on a line by itself:

here doc 是一种引用多行文本的特殊方式。使用 :heredoc 副词指定分隔符。当该方言在一条线上找到相同的字符串时,字符串结束:

my $multi-line = q :heredoc/END/;
    Hamadryas perlicus: 19
    Vanessa atalanta: 17
    Nymphalis antiopa: 0
    END

put $multi-line;

This also strips the same indentation it finds before the closing delimiter. The output ends up with no indentation even though it had it in the literal code:

这也剥离了它在结束分隔符之前找到的相同缩进。输出最终没有缩进,即使它在字面代码中有缩进:

Hamadryas perlicus: 19
Vanessa atalanta: 17
Nymphalis antiopa: 0

The :to adverb does the same thing as :heredoc:

:to 副词与 :heredoc 副词的作用相同:

my $multi-line = q :to<HERE>;
    Hamadryas perlicus: 19
    Vanessa atalanta: 17
    Nymphalis antiopa: 0
    HERE

This works with the other quoting forms too:

这与其他引用形式一起使用也有效:

put Q :to/END/;
    These aren't special: $ \
    END

put qq :to/END/;
    The genus is $genus
    END
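
Putting the pieces together, here is a minimal sketch of an interpolating here doc. The variable names and values are invented for the example. It shows that both variables and braced code are interpolated, and that the indentation in front of the terminator is removed from every line.

```raku
my $genus = 'Hamadryas';
my $count = 19;

my $report = qq:to/END/;
    Genus: $genus
    Count: { $count + 1 }
    END

print $report;    # the here doc already ends with a newline
```

The resulting Str has two lines, "Genus: Hamadryas" and "Count: 20", with no leading spaces.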

Shell Strings

Shell strings are the same sort of quoting that you’ve seen so far, but they don’t construct a Str to store in your program. They create an external command to run in the shell. A shell string captures the command’s output and gives it to you. Chapter 19 covers this, but here’s something to get you started.

qx uses the same rules as escaped Strs. The hostname command works on both Unix and Windows systems:

Shell 字符串与你到目前为止看到的引用相同,但它们不构造要存储在程序中的字符串。它们创建一个外部命令以在 shell 中运行。 shell 字符串捕获命令的输出并将其提供给你。第19章介绍了这一点,但这里有一些东西可以帮助你入门。

qx 使用与转义的字符串相同的规则。 hostname 命令适用于 Unix 和 Windows 系统:

my $uname = qx/hostname/;
put "The hostname is $uname";
put "The hostname is { qx/hostname/ }"; # quoting inside quoting

In this output there’s a blank line between the lines because it includes the newline in the normal command output:

在此输出中,行之间有一个空行,因为它包含正常命令输出中的换行符:

The hostname is hamadryas.local

The hostname is hamadryas.local

Use .chomp to fix that. If there’s a newline on the end of the text it removes it (although put adds its own):

使用 .chomp 来解决这个问题。如果文本末尾有换行符,则删除它(尽管 put 添加了自己的换行符):

my $uname = qx/hostname/.chomp;
put "The hostname is $uname";
put "The hostname is { qx/hostname/.chomp }";
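
You can try .chomp without running any command. This sketch uses a hardcoded Str (the hypothetical $raw) in place of real command output:

```raku
my $raw = "hamadryas.local\n";   # stands in for captured command output

put "[{ $raw }]";          # the closing bracket lands on the next line
put "[{ $raw.chomp }]";    # [hamadryas.local]

# .chomp returns a new Str; a Str with no trailing newline comes back unchanged
put 'no newline'.chomp eq 'no newline';   # True
```

Since .chomp returns a new Str, $raw itself still has its newline afterward.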

print doesn’t add a newline for you, so you don’t need to remove the one from the command output:

print不会为你添加换行符,因此你无需从命令输出中删除该换行符:

print "The hostname is { qx/hostname/ }";

qx and qqx are shortcuts for single and double quoting Strs with the :x or :exec adverbs:

qx 和 qqx 是带有 :x 或 :exec 副词的单引号和双引号字符串的快捷方式:

print Q :q      :x    /hostname/;
print Q :q      :exec /hostname/;
print Q :single :exec /hostname/;

Shell Safety

In the previous examples, the shell looks through its PATH environment variable to find the hostname command and executes the first one that it finds. Since people can set their PATH (or something can set it for them), you might not get the command you expect. If you use an absolute path you don’t have this problem. Literal quoting is handy to avoid inadvertent escaping:

在前面的示例中,shell 查看其 PATH 环境变量以查找 hostname 命令并执行它找到的第一个命令。由于人们可以设置他们的 PATH(或者某些东西可以为他们设置),你可能无法得到你期望的命令。如果使用绝对路径,则不会出现此问题。字面引用可以避免无意中的转义:

put Q :x '/bin/hostname';
put Q :x 'C:\Windows\System32\hostname.exe';
NOTE 注意

I won’t cover secure programming techniques here, but I do write more about these problems in Mastering Perl. Although that’s a Perl 5 book, the risks to your program are the same.

Although you have not seen hashes yet (Chapter 9), you could change the environment for your program. If you set PATH to the empty Str your program won’t be able to search for any programs:

我不会在这里介绍安全编程技术,但我在 Mastering Perl 中写了更多关于这些问题的内容。虽然这是一本 Perl 5书,但你的程序风险是一样的。

虽然你还没有看到哈希(第9章),但你可以更改程序的环境。如果将 PATH 设置为空字符串,则程序将无法搜索任何程序:

%*ENV<PATH> = '';
print Q :x 'hostname';       # does not find this
print Q :x '/bin/hostname';  # this works

If that’s too restrictive you can set the PATH to exactly the directories that you consider safe:

如果限制太多,你可以将 PATH 设置为你认为安全的目录:

%*ENV<PATH> = '/bin:/sbin';
print Q :x 'hostname';       # found only if it lives in /bin or /sbin
print Q :x '/bin/hostname';  # this works

There’s also a double-quoted form of shell Strs:

还有一个双引号形式的 shell 字符串:

my $new-date-string = '...';
my $output = qqx/date $new-date-string/;

What’s in that $new-date-string? If it descends from user data, external configuration, or something else that you don’t control, you might be in for a surprise. That could be malicious or merely accidental, so be careful:

那个 $new-date-string 中有什么?如果它来自用户数据,外部配置或你无法控制的其他内容,你可能会感到惊讶。这可能是恶意的或仅仅是偶然的,所以要小心:

my $new-date-string = '; /bin/rm -rf';
my $output = qqx/date $new-date-string/;
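
This section doesn't show a fix, but one common way to keep a shell from interpreting such a value is to skip the shell altogether. The sketch below assumes a Unix-like system with echo in the PATH and uses Raku's run routine, which takes the program and each argument as separate values. echo stands in for date so that nothing on the system changes:

```raku
my $untrusted = '; /bin/rm -rf';

# run takes the program name and each argument separately; no shell parses them.
# echo stands in for date here so the example has no side effects.
my $proc   = run 'echo', 'date', $untrusted, :out;
my $output = $proc.out.slurp(:close);

put $output.chomp;   # date ; /bin/rm -rf
```

The semicolon arrives at the program as ordinary text instead of ending one shell command and starting another.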

EXERCISE 4.7 Write a program to capture the output of hostname. Make it work on both Windows and Unix systems. $*DISTRO.is-win is True if you are on Windows and False otherwise.

练习4.7 编写一个程序来捕获 hostname 的输出。使其适用于 Windows 和 Unix 系统。如果你在 Windows 上,$*DISTRO.is-win 为 True,否则为 False。

Fancier Quoting

You can combine adverbs in generalized quoting to use just the features that you need. Suppose that you want to interpolate only things in braces but nothing else. You can use the :c adverb:

你可以在通用引用中组合副词,以仅使用所需的功能。假设你只想在花括号中插入内容而不是其他内容。你可以使用 :c 副词:

% raku
> Q :c "The \r and \n stay, but 2 + 2 = { 2 + 2 }"
The \r and \n stay, but 2 + 2 = 4

To get only variable interpolation use the :s adverb. No other processing happens:

要只获得变量插值,请使用 :s 副词。没有其他处理发生:

% raku
> my $name = 'Hamadryas'
Hamadryas
> Q :s "\r \n { 2 + 2 } $name"
\r \n { 2 + 2 } Hamadryas

You can combine adverbs to get any mix of features that you like. Cluster the adverbs or space them out. They work the same either way:

你可以组合副词来获得你喜欢的任何功能组合。聚集副词或将它们分开。他们的工作方式相同:

% raku
> Q :s:c "\r \n { 2 + 2 } $name"
\r \n 4 Hamadryas
> Q :s:c:b "\r \n { 2 + 2 } $name"

 4 Hamadryas
> Q :s :c :b "\r \n { 2 + 2 } $name"

 4 Hamadryas

The :qq adverb is actually the combination of :s :a :h :f :c :b. This interpolates all of the variables, the stuff in braces, and all backslash sequences. If you don’t want to interpolate everything, you can turn off an adverb. This might be easier than specifying several adverbs just to leave one out. Put a ! in front of the adverb that you want to disable. :!c turns off brace interpolation:

:qq 副词实际上是 :s :a :h :f :c :b 的组合。这会插入所有变量、花括号中的内容以及所有反斜杠序列。如果你不想插入所有内容,可以关闭副词。这可能比指定几个更简单,只留下一个。放一个 ! 在需要禁用的副词前面。 :!c 关闭花括号插值:

qq :!c /No { 2+2 } interpolation/;
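
The two approaches can be compared in a short sketch: start from qq and subtract a feature, or start from Q and add the ones you want. The variable names here are made up for the example:

```raku
my $name = 'Hamadryas';

# Start from everything and switch off closures
my $no-closures = qq :!c /$name has { 2 + 2 }/;
put $no-closures;    # Hamadryas has { 2 + 2 }

# Start from nothing and switch on scalars and backslash sequences
my $picked = Q :s :b "$name\thas { 2 + 2 }";
put $picked;         # the \t becomes a real tab; the braces stay literal
```

Either way the braces are left alone. Which form reads better depends on whether you want most features or only a few.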

Selected quoting forms and adverbs are summarized in Table 4-2 and Table 4-3.

表4-2和表4-3总结了选定的引用形式和副词。

Table 4-2. Selected quoting forms

Short name | Long name | Description
「…」 | Literal | Default delimiter, corner brackets
Q '…' | Literal | Generalized quoting with alternate delimiter
Q[…] | Literal | Generalized quoting with paired delimiter
'…' | Escaped | Default delimiter, single quote
q{…} | Escaped | Use alternate paired delimiter
Q:q […] | Escaped | Generalized quoting with :q adverb
"…" | Interpolated | Default delimiter, double quote
qq[…] | Interpolated | Use alternate paired delimiter
Q:qq '…' | Interpolated | Generalized quoting with :qq adverb
Q:c '…{ }…' | Interpolated | Generalized quoting only interpolating closures
Q:to(HERE) | Literal | Here doc
q:to(HERE) | Escaped | Here doc
qq:to(HERE) | Interpolated | Here doc

Table 4-3. Selected quoting adverbs

Short name | Long name | Description
:x | :exec | Execute shell command and return results
:q | :single | Interpolate \\, \qq[…], and an escaped delimiter
:qq | :double | Interpolate with :s, :a, :h, :f, :c, :b
:s | :scalar | Interpolate $ variables
:a | :array | Interpolate @ variables
:h | :hash | Interpolate % variables
:f | :function | Interpolate & calls
:c | :closure | Interpolate code in {…}
:b | :backslash | Interpolate \n, \t, and others
:to | :heredoc | Parse result as here doc terminator
:v | :val | Convert to allomorph if possible

Summary

The quoting slang offers several ways to represent and combine text, so you can get exactly what you need in an easy fashion. Once you have the text, you have many options for looking inside the Str to find or extract parts of it. This is still early in the book, though. You’ll see more features along the way and then really have fun in Chapter 15.

引用方言提供了几种表示和组合文本的方法,因此你可以轻松地获得所需的内容。获得文本后,你有许多方式可以在字符串内部查找或提取部分内容。不过,现在还只是本书的开头部分。在后面的章节里你会看到更多功能,然后在第15章真正享受乐趣。
