Inform can get at the contents of text in a variety of ways. The lowest level is the character: a character is a letter, digit, punctuation symbol, space or other letter-form. (We use the term "character" rather than "letter" because otherwise we would have to call "5" a letter, and so on.) Characters are numbered upwards from 1: character number 1, to repeat that, starts the text. We can get the Nth character with:
character number (number) in (text) ⇒ text
This phrase produces the Nth character from the text, counting from 1. Characters include letters, digits, punctuation symbols, spaces or other letter-forms. Example:
character number 8 in "numberless projects of social reform"
produces "e". If the index is less than 1 or more than the length of the text, the result is an empty text, "".
The maximum character number varies with the current length of the text, and can be evaluated as:
number of characters in (text) ⇒ number
This phrase produces the number of characters in the text. Characters include letters, digits, punctuation symbols, spaces or other letter-forms. Examples:
number of characters in "War and Peace"
number of characters in ""
produce 13 and 0 respectively.
We can also use the adjective "empty":
if the description of the location is empty, ...
The empty text, "", is the only one with 0 characters.
We can also extract the contents by word, again numbered from 1. Thus:
word number (number) in (text) ⇒ text
This phrase produces the Nth word from the text, counting from 1. Words for this purpose are what's left after breaking the text up at punctuation or spacing (spaces, line breaks, paragraph breaks) and then removing that punctuation or spacing. Example:
word number 3 in "ice-hot, don't you think?"
produces "don't". If the index is less than 1 or more than the number of words in the text, the result is an empty text, "".
number of words in (text) ⇒ number
This phrase produces the number of words in the text. Words for this purpose are what's left after breaking the text up at punctuation or spacing (spaces, line breaks, paragraph breaks) and then removing that punctuation or spacing. Example:
number of words in "ice-hot, don't you think?"
produces 5.
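For readers who know Python, the splitting rule can be sketched with the standard `re` module. This is an illustration under assumptions, not Inform's own implementation: it takes the word-dividing punctuation to be exactly the characters listed for "\p" in the section on regular expressions (so the apostrophe is not among them), and the helpers `words` and `word_number` are hypothetical names.

```python
import re

# Assumed word-dividing punctuation: the "\p" list from the regular
# expressions section. The apostrophe is deliberately absent, so a
# contraction such as "don't" stays in one piece.
PUNCTUATION = '.,!?-/":;()[]{}'

# A word is any maximal run of characters that is neither spacing
# nor one of the punctuation characters above.
WORD = re.compile(r"[^\s" + re.escape(PUNCTUATION) + r"]+")


def words(text: str) -> list[str]:
    """Split text into words, discarding spacing and punctuation."""
    return WORD.findall(text)


def word_number(n: int, text: str) -> str:
    """Return the nth word, counting from 1, or "" if out of range."""
    ws = words(text)
    return ws[n - 1] if 1 <= n <= len(ws) else ""


sample = "ice-hot, don't you think?"
print(words(sample))           # ['ice', 'hot', "don't", 'you', 'think']
print(word_number(3, sample))  # don't
print(len(words(sample)))      # 5
```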
Note that the contraction apostrophe in "don't" doesn't count as punctuation. Because this is not always quite what we want, Inform offers two variations:
punctuated word number (number) in (text) ⇒ text
This phrase produces the Nth word from the text, counting from 1. Words for this purpose are what's left after breaking the text up at punctuation or spacing (spaces, line breaks, paragraph breaks) and then removing the spacing, but leaving the punctuation as independent words. Example:
punctuated word number 2 in "ice-hot, don't you think?"
produces "-". The punctuated words here are "ice", "-", "hot", ",", "don't", "you", "think", "?". If two or more punctuation marks are adjacent, they are counted as different words, except for runs of dashes or periods: thus ",," has two punctuated words, but "--" and "..." have only one each. If the index is less than 1 or more than the number of punctuated words in the text, the result is an empty text, "".
number of punctuated words in (text) ⇒ number
This phrase produces the number of punctuated words in the text. Words for this purpose are what's left after breaking the text up at punctuation or spacing (spaces, line breaks, paragraph breaks) and then removing the spacing, but leaving the punctuation as independent words. Example:
number of punctuated words in "ice-hot, don't you think?"
produces 8; see if you can find them all.
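A similar Python sketch shows the punctuated version, where each punctuation mark becomes a word of its own except that runs of dashes or of periods stay together. Again this is only an analogue: the punctuation set is assumed to be the "\p" list, and `punctuated_words` is a hypothetical name.

```python
import re

# Alternatives are tried left to right at each position; spacing matches
# none of them, so findall simply skips over it.
PUNCTUATED_WORD = re.compile(
    r"-+"                          # a run of dashes is one word
    r"|\.+"                        # a run of periods is one word
    r'|[,!?/":;()\[\]{}]'          # any other punctuation mark, singly
    r'|[^\s.,!?\-/":;()\[\]{}]+'   # an ordinary word
)


def punctuated_words(text: str) -> list[str]:
    """Split text into words, keeping punctuation as separate words."""
    return PUNCTUATED_WORD.findall(text)


print(punctuated_words("ice-hot, don't you think?"))
# ['ice', '-', 'hot', ',', "don't", 'you', 'think', '?']
print(len(punctuated_words(",,")), len(punctuated_words("--")), len(punctuated_words("...")))
# 2 1 1
```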
unpunctuated word number (number) in (text) ⇒ text
This phrase produces the Nth word from the text, counting from 1. Words for this purpose are what's left after breaking the text up at spacing (spaces, line breaks, paragraph breaks) but including all punctuation as if it were part of the spelling of the words it joins to. Example:
unpunctuated word number 1 in "ice-hot, don't you think?"
produces "ice-hot,". The unpunctuated words in "ice-hot, don't you think?" are "ice-hot,", "don't", "you", "think?". If the index is less than 1 or more than the number of unpunctuated words in the text, the result is an empty text, "".
number of unpunctuated words in (text) ⇒ number
This phrase produces the number of unpunctuated words in the text. Words for this purpose are what's left after breaking the text up at spacing (spaces, line breaks, paragraph breaks) but including all punctuation as if it were part of the spelling of the words it joins to. Example:
number of unpunctuated words in "ice-hot, don't you think?"
produces just 4.
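In Python terms, unpunctuated words are close to what `str.split()` with no argument produces, since it splits at any run of whitespace and leaves punctuation attached. A tiny sketch, with a hypothetical function name:

```python
def unpunctuated_words(text: str) -> list[str]:
    # Split at spacing only (spaces, tabs, line breaks); punctuation
    # stays attached to the word it touches.
    return text.split()


print(unpunctuated_words("ice-hot, don't you think?"))
# ['ice-hot,', "don't", 'you', 'think?']
```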
Finally, on a larger scale still, we also have:
line number (number) in (text) ⇒ text
This phrase produces the Nth line from the text, counting from 1. Unless explicit use is made of line breaks, lines and paragraphs will be the same. A "line" here does not mean a line as it appears on screen, because we have no way of knowing what size screen the player might have.
number of lines in (text) ⇒ number
This phrase produces the number of lines in the text. Unless explicit use is made of line breaks, lines and paragraphs will be the same. A "line" here does not mean a line as it appears on screen, because we have no way of knowing what size screen the player might have. Example: the number of lines in
"Sensational news just in![paragraph break]The Martians have invaded Miranda.[line break](One of the moons of Uranus, that is.)"
is 3.
paragraph number (number) in (text) ⇒ text
This phrase produces the Nth paragraph from the text, counting from 1.
number of paragraphs in (text) ⇒ number
This phrase produces the number of paragraphs in the text. Example: the number of paragraphs in
"Sensational news just in![paragraph break]The Martians have invaded Miranda.[line break](One of the moons of Uranus, that is.)"
is 2.
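A small Python sketch of the difference, assuming (for illustration only) that "[paragraph break]" is stored as a blank line and "[line break]" as a single newline; `lines` and `paragraphs` are hypothetical helpers, not Inform phrases.

```python
import re

# Assumed representation: "[paragraph break]" -> "\n\n", "[line break]" -> "\n".
news = (
    "Sensational news just in!\n\n"
    "The Martians have invaded Miranda.\n"
    "(One of the moons of Uranus, that is.)"
)


def lines(text: str) -> list[str]:
    # Any run of newlines ends a line, so a paragraph break ends one too.
    return [ln for ln in re.split(r"\n+", text) if ln]


def paragraphs(text: str) -> list[str]:
    # Only a blank line (two or more newlines) ends a paragraph.
    return [p for p in re.split(r"\n{2,}", text) if p]


print(len(lines(news)), len(paragraphs(news)))  # 3 2
```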
(Attempting to make large enough texts to have a serious paragraph count is slightly risky if there is not much memory to play with, as on the Z-machine. But the facilities do exist.)
In most European languages the same letters can appear in two forms: as capitals, like "X", mainly used to mark a name or the start of a sentence; or in their ordinary less prominent form, like "x". These forms are called upper and lower case because, historically, typesetters kept lead castings of letters in two wooden cases, one above the other on the workbench. Lower case letters were in the lower box closer to hand, being more often needed.
Human languages are complicated. Not every lower case letter has an upper case partner: ordinal markers in Hispanic languages don't, for instance, and the German "ß" is never used in upper case. Sometimes two different lower case letters have the same upper case form: "ς" and "σ", two versions of the Greek sigma, both capitalise to "Σ". Inform follows the international Unicode standard in coping with all this.
We can test whether text is in either case like so:
if (text) is in lower case:
This condition is true if every character in the text is a lower case letter. Examples: this is true for "wax", but false for "wax seal" or "eZ mOnEy".
if (text) is in upper case:
This condition is true if every character in the text is in upper case. Examples: this is true for "BEESWAX", but false for "ROOM 101".
We can change the casing of text using:
(text) in lower case ⇒ text
This phrase produces a new version of the given text, but with all upper case letters reduced to lower case. Example: "a ticket to Tromsø via Østfold" becomes
"a ticket to tromsø via østfold"
(text) in upper case ⇒ text
This phrase produces a new version of the given text, but with all lower case letters raised to upper case. Example: "a ticket to Tromsø via Østfold" becomes
"A TICKET TO TROMSØ VIA ØSTFOLD"
(text) in title case ⇒ text
This phrase produces a new version of the given text, but with casing of words changed to title casing: this capitalises the first letter of each word, and lowers the rest. Example: "a ticket to Tromsø via Østfold" becomes
"A Ticket To Tromsø Via Østfold"
(text) in sentence case ⇒ text
This phrase produces a new version of the given text, but with casing of words changed to sentence casing: this capitalises the first letter of each sentence and reduces the rest to lower case. Example: "a ticket to Tromsø via Østfold" becomes
"A ticket to tromsø via østfold"
Accents are preserved in case changes. So (if we are using Glulx and have Unicode available) title case can turn Aristophanes' discomfortingly lower-case lines
ἐξ οὗ γὰρ ἡμᾶς προὔδοσαν μιλήσιοι,
οὐκ εἶδον οὐδ᾽ ὄλισβον ὀκτωδάκτυλον,
ὃς ἦν ἂν ἡμῖν σκυτίνη ’πικουρία.
by raising them proudly up like so:
Ἐξ Οὗ Γὰρ Ἡμᾶς Προὔδοσαν Μιλήσιοι,
Οὐκ Εἶδον Οὐδ᾽ Ὄλισβον Ὀκτωδάκτυλον,
Ὃς Ἦν Ἂν Ἡμῖν Σκυτίνη ’Πικουρία.
Title and sentence casing can only be approximate if done by computer. Inform looks at the letters, but is blind to the words and sentences they make up. (Note the way sentence casing did not realise "Tromsø" and "Østfold" were proper nouns.) If asked to put the name "MCKAY" into title casing, Inform will opt for "Mckay", not recognising this as the Scottish patronymic surname "McKay". Given "baym dnieper", the title of David Bergelson's great Yiddish novel of 1932, it will opt for "Baym Dnieper": but properly speaking Yiddish does not have upper case lettering at all, though nowadays it is sometimes printed as if it did. And conventions are very variable about which words should be capitalised in titles: English publishers mostly agree that connectives, articles and prepositions should be in lower case, but in France almost anything goes, with Académie Française rules giving way to avant-garde book design. In short, we cannot rely on Inform's title casing to produce a result which a human reader will always think perfect.
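Readers who know Python may find it useful that the built-in string methods behave the same way on the examples in this section, including the blindness to names like McKay. Python has no sentence-case method, so `sentence_case` below is a hypothetical helper with a deliberately naive idea of where sentences begin; this is a comparison sketch, not Inform's algorithm.

```python
def sentence_case(text: str) -> str:
    """Lower-case everything, then capitalise the first letter or digit of
    each sentence. Naive assumption: a sentence begins at the start of the
    text or after ".", "!" or "?"."""
    out = []
    capitalise_next = True
    for ch in text.lower():
        if capitalise_next and ch.isalnum():
            out.append(ch.upper())
            capitalise_next = False
        else:
            out.append(ch)
            if ch in ".!?":
                capitalise_next = True
    return "".join(out)


ticket = "a ticket to Tromsø via Østfold"
print(ticket.lower())          # a ticket to tromsø via østfold
print(ticket.upper())          # A TICKET TO TROMSØ VIA ØSTFOLD
print(ticket.title())          # A Ticket To Tromsø Via Østfold
print(sentence_case(ticket))   # A ticket to tromsø via østfold

# Both forms of the Greek sigma share one capital.
print("ς".upper(), "σ".upper())   # Σ Σ

# Mechanical title casing looks only at letters, not at names.
print("MCKAY".title())         # Mckay
```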
This discussion has all been about how Inform prints, not about how it reads commands from the keyboard, because the latter is done case-insensitively. The virtual machines for which Inform creates programs normally flatten all command input to lower case, and in any case Understand comparison ignores casing. Thus
Understand "mckay" as the Highland Piper.
means that "examine McKay", "examine MCKAY", "examine mckay", and so forth are all equivalent. The text of the player's command probably doesn't preserve the original casing typed in any event.
One more caution, though it will affect hardly anyone. For projects using the Z-machine, only a restricted character set is available in texts: for more, we must use Glulx. A mad anomaly of ZSCII, the Z-machine character set, is that it contains the lower case letter "ÿ" but not its upper case form "Ÿ", so that
"ÿ" in upper case
produces "Ÿ" in Glulx but "ÿ" in the Z-machine. This will come as a blow to Queensrÿche fans, but in all other respects any result on the Z-machine should agree with its counterpart on Glulx.
Examples
To arrange that the location information normally given on the left-hand side of the status line appears in block capitals.
Using case changes on any text produced by a "to say…" phrase.
When playing around with text, we tend to get into longer and trickier wrangles of matching - we find that we want to look not for simple text like "gold", but for "gold" used only as a separate word, or for a date in YYYY-MM-DD format, or for a seemingly endless range of other possibilities. What we need is not just for Inform to provide a highly flexible matching program, but also a good notation in which to describe what we want.
Fortunately, such a notation already exists. This is the "regular expression" notation, named for a 1950s mathematical model by the logician Stephen Kleene, applied to computing in the late 60s by Ken Thompson, borrowed almost at once by the early Unix tools of the 70s, and developed further by Henry Spencer in the 80s and Philip Hazel in the 90s. The glue holding the Internet together - the Apache web-server, the scripting languages Perl and Python, and so forth - makes indispensable use of regular expressions.
As might be expected from the previous section, we simply have to describe the FIND text as "regular expression" rather than "text" and then the same facilities are available:
if (text) matches the regular expression (text):
This condition is true if any contiguous part of the text can be matched against the given regular expression. Examples:
if "taramasalata" matches the regular expression "a.*l", ...
is true, since this looks for a part of "taramasalata" which begins with "a", continues with any number of characters, and finishes with "l"; so it matches "aramasal". (Not "asal", because it makes the leftmost match it can.) The option "case insensitively" causes lower and upper case letters to be treated as equivalent.
if (text) exactly matches the regular expression (text):
This condition is true if the whole text (starting from the beginning and finishing at the end) can be matched against the given regular expression. The option "case insensitively" causes lower and upper case letters to be treated as equivalent.
And once again:
number of times (text) matches the regular expression (text) ⇒ number
This produces the number of times that contiguous pieces of the text can be matched against the regular expression, without allowing them to overlap.
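For comparison, Python's `re` module draws the same distinctions: `re.search` looks for a match anywhere, rather like "matches", while `re.fullmatch` needs the whole text, rather like "exactly matches", and `re.findall` collects non-overlapping matches. This is an analogy, not a description of how Inform works internally.

```python
import re

dish = "taramasalata"

# "matches": some contiguous part of the text matches, like re.search.
found = re.search(r"a.*l", dish)
print(found.group(0))                          # aramasal

# "exactly matches": the whole text must match, like re.fullmatch.
print(re.fullmatch(r"a.*l", dish))             # None
print(bool(re.fullmatch(r"t.*a", dish)))       # True

# The "case insensitively" option corresponds roughly to re.IGNORECASE.
print(bool(re.search(r"TARA", dish, re.IGNORECASE)))  # True

# Counting non-overlapping matches: "aaaaa" holds only two "aa"s.
print(len(re.findall(r"aa", "aaaaa")))         # 2
```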
Since a regular expression can match quite a variety of possibilities (for instance "b\w+t" could match "boast", "boat", "bonnet" and so on), it's sometimes useful to find what the match actually was:
text matching regular expression ⇒ text
This phrase is only meaningful immediately after a successful match of a regular expression against text, and it produces the text which matched. Example:
if "taramasalata" matches the regular expression "m.*l":
say "[text matching regular expression].";
says "masal."
Perhaps fairly, perhaps not, regular expressions have a reputation for being inscrutable. The basic idea is that although letters, digits and spaces mean just what they look like, punctuation characters are commands with sometimes dramatic effects. Thus:
if WHATEVER matches the regular expression "fish", ...
if WHATEVER matches the regular expression "f.*h", ...
behave very differently. The first is just like matching the text "fish", but the second matches on any sequence of characters starting with an "f" and ending with an "h". This is not at all obvious at first sight: reading regular expressions is a skill which must be learned, like reading a musical score. A really complex regular expression can look like a soup of punctuation and even an expert will blink for a few minutes before telling you what it does - but a beginner can pick up the basics very quickly. Newcomers might like to try out and become comfortable with the features a few at a time, reading down the following list.
1. Golden rule. Don't try to remember all the characters with weird effects. Instead, if you actually mean any symbol other than a letter, digit or space to be taken literally, place a backslash "\" in front of it. For instance, matching the regular expression
"\*A\* of the Galactic Patrol"
is the same as matching the text "*A* of the Galactic Patrol", because the asterisks are robbed of their normal powers. This includes backslash itself: "\\" means a literal backslash. (Don't backslash letters or digits - that turns out to have a meaning all its own, but anyway, there is never any need.)
2. Alternatives. The vertical stroke "|" - not a letter I or L, nor the digit 1 - divides alternatives. Thus
"the fish|fowl|crawling thing"
is the same as saying match "the fish", or "fowl", or "crawling thing".
3. Dividing with brackets. Round brackets "(" and ")" group parts of the expression together.
"the (fish|fowl|crawling thing) in question"
is the same as saying match "the fish in question", or "the fowl in question", or "the crawling thing in question". Note that the "|" ranges outwards only as far as the group it is in.
4. Any character. The period "." means any single character. So
"a...z"
matches on any sequence of five characters so long as the first is "a" and the last is "z".
5. Character alternatives. The angle brackets "<" and ">" are a more concise way of specifying alternatives for a single character. Thus
"b<aeiou>b"
matches on "bab", "beb", "bib", "bob" or "bub", but not "baob" or "beeb" - any single character within the angle brackets is accepted. Beginning the range with "^" means "any single character so long as it is not one of these": thus
"b<^aeiou>b"
matches on "blb" but not "bab", "beb", etc., nor on "blob" or "bb". Because long runs like this can be a little tiresome, we are also allowed to use "-" to indicate whole ranges. Thus
"b<a-z>b"
matches a "b", then any lower case English letter, then another "b".
In traditional regular expression language, square brackets rather than angle brackets are used for character ranges. In fact Inform does understand this notation if there are actual square brackets "[" and "]" in the pattern text, but in practice this would be tiresome to achieve, since Inform uses those to achieve text substitutions. So Inform allows "b<a-z>b" rather than making us type something like
"b[bracket]a-z[close bracket]b"
to create the text "b[a-z]b".
6. Popular character ranges. The range "<0-9>", matching any decimal digit, is needed so often that it has an abbreviation: "\d". Thus
"\d\d\d\d-\d\d-\d\d"
matches, say, "2006-12-03". Similarly, "\s" means "any spacing character" - a space, tab or line break. "\p" is a punctuation character, in the same sense used for word division in the previous section: it actually matches any of
. , ! ? - / " : ; ( ) [ ] { }
"\w" means "any character appearing in a word", and Inform defines it as anything not matching "\s" or "\p".
"\l" and "\u" match lower and upper case letters, respectively. These are much stronger than "<a-z>" and "<A-Z>", since they use the complete definition in the Unicode 4.0.0 standard, so that letter-forms from all languages are catered for: for example "δ" matches "\l" and "Δ" matches "\u".
The reverse of these is achieved by capitalising the letter. So "\D" means "anything not a digit", "\P" means "anything not punctuation", "\W" means "anything not a word character", "\L" means "anything not a lower case letter" and so on.
7. Positional restrictions. The notation "^" does not match anything, as such, but instead requires that we be positioned at the start of the text. Thus
"^fish"
matches only "fish" at the start of the text, not occurring anywhere later on. Similarly, "$" requires that the position be the end of the text. So
"fish$"
matches only if the last four characters are "fish". Matching "^fish$" is the same thing as what Inform calls exactly matching "fish".
Another useful notation is "\b", which matches a word boundary: that is, it matches no actual text, but requires the position to be a junction between a word character and a non-word character (a "\w" and a "\W") or vice versa. Thus
"\bfish\b"
matches "fish" in "some fish" and also "some fish, please!", but not in "shellfish". (The regular expression "\w*fish\b" catches all words ending in "fish", as we will see below.) As usual, the capitalised version "\B" negates this, and means "not at a word boundary".
8. Line break and tab. The notations "\n" and "\t" are used for a line break ("n" for "new line") and tab, respectively. Tabs normally do not occur in Inform strings, but can do when reading from files. It makes no sense to reverse these, so "\N" and "\T" produce errors.
9. Repetition. Placing a number in braces "{" and "}" after something says that it should be repeated that many times. Thus
"ax{25}"
matches only on "axxxxxxxxxxxxxxxxxxxxxxxxx". More usefully, perhaps, we can specify a range of the number of repetitions:
"ax{2,6}"
matches only on "axx", "axxx", "axxxx", "axxxxx", "axxxxxx". And we can leave the top end open: "ax{2,}" means "a" followed by at least two "x"s.
Note that the braces attach only to the most recent thing - so "ax{2}" means "a" followed by two of "x" - but, as always, we can use grouping brackets to change that. So "(ax){2,}" matches "axax", "axaxax", "axaxaxax",…
(It's probably best not to use Inform to try to match the human genome against "<acgt>{3000000000}", but one of the most important practical uses of regular expression matching in science is in treating DNA as a string of nucleotides represented by the letters "a", "c", "g", "t", and looking for patterns.)
10. Popular repetitions. Three cases are so often needed that they have standard short forms:
"{0,1}", which means 0 or 1 repetition of something - in other words, doesn't so much repeat it as make it optional - is written "?". Thus "ax?y" matches only on "ay" or "axy".
"{0,}", which means 0 or more repetitions - in other words, any number at all - is written "*". Thus "ax*y" matches on "ay", "axy", "axxy", "axxxy", … and the omnivorous ".*" - which means "anything, any number of times" - matches absolutely every text. (Perhaps unexpectedly, replacing ".*" in a text with "X" will produce "XX", not "X", because the ".*" first matches the text, then matches the empty gap at the end. To match the entire text just once, try "^.*$".)
"{1,}", which means 1 or more repetitions, is written "+". So "\d+" matches any run of digits, for instance.
11. Greedy vs lazy. Once we allow things to repeat an unknown number of times, we run into an ambiguity. Sure, "\d+" matches the text "16339b". But does it look only as far as the "1", then reason that it now has one or more digits in a row, and stop? Or does it run onward devouring digits until it can do so no longer, so matching the "16339" part? These two strategies are called "lazy" and "greedy" respectively.
Do we care? Well, the strategy used makes no difference to whether there is a match, but it does affect what part of the text is matched, and the number of matches there are. Unless we mark for it, all repetitions are greedy. Usually this is good, but it means that, for instance,
"-.+-"
applied to "-alpha- -beta- -gamma-" will match the whole text, because ".+" picks up all of "alpha- -beta- -gamma". To get around this, we can mark any of the repetition operators as lazy by adding a question mark "?". Thus:
"-.+?-"
applied to "-alpha- -beta- -gamma-" matches three times, producing "-alpha-" then "-beta-" then "-gamma-".
A logical but sometimes confusing consequence is that a doubled question mark "??" means "repeat 0 or 1 times, but prefer 0 matches to 1 if both are possibilities": whereas a single question mark "?", being greedy, means "repeat 0 or 1 times, but prefer 1 match to 0 if both are possibilities".
12. Numbered groups. We have already seen that round brackets are useful to clump together parts of the regular expression - to choose within them, or repeat them. In fact, Inform numbers these from 1 upwards as they are used from left to right, and we can subsequently refer back to their contents with the notation "\1", "\2", … After a successful match, we can find the results of these subexpressions with:
text matching subexpression (number) ⇒ text
This phrase is only meaningful immediately after a successful match of a regular expression against text, and it produces the text matched by the given bracketed group. The number must be from 1 to 9, and must correspond to one of the bracketed groups in the expression just matched. Example: after
if "taramasalata" matches the regular expression "a(r.*l)a(.)":
the "text matching regular expression" is "aramasalat", the "text matching subexpression 1" is "ramasal", and "text matching subexpression 2" is "t".
For instance:
"(\w)\w*\1"
matches any run of two or more word-characters, subject to the restriction that the last one has to be the same as the first - so it matches "xerox" but not "alphabet". When Inform matches this against "xerox", first it matches the initial "x" against the group "(\w)". It then matches "\w*" ("any number of word-characters") against "ero", so that the "*" runs up to 3 repetitions. It then matches "\1" against the final "x", because "\1" requires it to match against whatever last matched in sub-expression 1 - which was an "x".
Numbered groups allow wicked tricks in matching, it's true, but really come into their own when it comes to replacing - as we shall see.
13. Switching case sensitivity on and off. The special notations "(?i)" and "(?-i)" switch sensitivity to upper vs. lower case off and on, mid-expression. Thus "a(?i)bcd(?-i)e" matches "abcde", "aBcDe", etc., but not "Abcde" or "abcdE".
14. Groups with special meanings. This is the last of the special syntaxes: but it's a doozy. A round-bracketed group can be marked to behave in a special way by following the open bracket by a symbol with a special meaning. Groups like this have no number and are not counted as part of \1, \2, and so forth - they are intended not to gather up material but to have some effect of their own.
"(?# ...)"
Is a comment, that is, causes the group to do nothing: it always matches, and consumes no text.
"(?= ...)"
Is a lookahead: it is a form of positional requirement, like "\b" or "^", but one which requires that the text ahead of us matches whatever is in the brackets. (It doesn't consume that text - only checks to see that it's there.) For instance, "\w+(?=;)" matches a word followed by a semicolon, but does not match the semicolon itself.
"(?! ...)"
Is the same but negated: it requires that the text ahead of us does not match the material given. For instance, "a+(?!z)" matches any run of "a"s not followed by a "z".
"(?<= ...)" and "(?<! ...)"
Are the same but looking behind (hence the "<"), not forward. These are restricted to cases where Inform can determine that the material to be matched has a definite known width. For instance, "(?<!shell)fish" matches any "fish" not occurring in "shellfish".
"(> ...)"
Is a possessive, that is, causes the material to be matched and, once matched, never lets go. No matter what subsequently turns out to be convenient, it will never change its match. For instance, "\d+8" matches against "768" because Inform realises that "\d+" cannot be allowed to eat the "8" if there is to be a match, and stops it. But "(>\d+)8" does not match against "768" because now the "\d+", which initially eats "768", is possessive and refuses to give up the "8" once taken.
"(?(1)...)" and "(?(1)...|...)"
Are conditionals. These require us to match the material given if \1 has successfully matched already; in the second version, the material after the "|" must be matched if \1 has not successfully matched yet. And the same for 2, 3, …, 9, of course.
Finally, conditionals can also use lookaheads or lookbehinds as their conditions. So for instance:
"(?(?=\d)\d\d\d\d|AY-\d\d\d\d)"
means if you start with a digit, match four digits; otherwise match "AY-" followed by four digits. There are easier ways to do this, of course, but the really juicy uses of conditionals are only borderline legible and make poor examples - perhaps this is telling us something.
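For readers who already know Python's `re` module, here are rough equivalents of several patterns from the list above. This is a comparison sketch, not a claim that the two engines agree everywhere: Python writes character alternatives as "[...]" rather than "<...>", has no "\p", "\l" or "\u", defines "\w" differently, scopes inline case flags to a group such as "(?i:...)", lets a conditional test only a group number or name (so the lookahead conditional just above has no direct equivalent), and supports atomic groups "(?>...)", its counterpart to "(> ...)", only from Python 3.11. The ".*" result assumes Python 3.7 or later.

```python
import re

# 5. Character alternatives: "b<aeiou>b" and "b<^aeiou>b".
candidates = ["bab", "bub", "baob", "beeb", "blb", "bb"]
vowels = [w for w in candidates if re.fullmatch(r"b[aeiou]b", w)]
others = [w for w in candidates if re.fullmatch(r"b[^aeiou]b", w)]
print(vowels, others)            # ['bab', 'bub'] ['blb']

# 7. Positions: "\bfish\b" needs a word boundary at each end.
boundary = [s for s in ["some fish, please!", "shellfish"]
            if re.search(r"\bfish\b", s)]
print(boundary)                  # ['some fish, please!']

# 9-10. Repetition: "ax{2,6}", and the ".*" surprise.
counts = [n for n in range(8) if re.fullmatch(r"ax{2,6}", "a" + "x" * n)]
print(counts)                    # [2, 3, 4, 5, 6]
print(re.sub(r".*", "X", "abc"), re.sub(r"^.*$", "X", "abc"))  # XX X

# 11. Greedy versus lazy.
dashes = "-alpha- -beta- -gamma-"
greedy = re.findall(r"-.+-", dashes)
lazy = re.findall(r"-.+?-", dashes)
print(greedy)                    # ['-alpha- -beta- -gamma-']
print(lazy)                      # ['-alpha-', '-beta-', '-gamma-']

# 12. Numbered groups and a back-reference.
m = re.search(r"a(r.*l)a(.)", "taramasalata")
print(m.group(0), m.group(1), m.group(2))          # aramasalat ramasal t
print(bool(re.fullmatch(r"(\w)\w*\1", "xerox")))   # True

# 13. Inform's "a(?i)bcd(?-i)e" switches case sensitivity mid-pattern;
# Python scopes the flag to a group instead.
mixed_case = re.compile(r"a(?i:bcd)e")
print([s for s in ["abcde", "aBcDe", "Abcde", "abcdE"] if mixed_case.fullmatch(s)])
# ['abcde', 'aBcDe']

# 14. Lookahead, and fixed-width negative lookbehind.
print(re.findall(r"\w+(?=;)", "eggs; ham, spam;"))        # ['eggs', 'spam']
print(bool(re.search(r"(?<!shell)fish", "shellfish")))    # False

# 14. A conditional on group 1: the closing bracket is required only
# if an opening bracket was matched.
bracketed = re.compile(r"(\()?\d+(?(1)\))")
print([s for s in ["(42)", "42", "(42", "42)"] if bracketed.fullmatch(s)])
# ['(42)', '42']
```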
Examples
Creating a beta-testing command that matches any line starting with punctuation.
Some footnotes on Inform's regular expressions, and how they compare to those of other programming languages.
Substitutions are most often used just for printing, like so:
say "The clock reads [time of day].";
But they can also produce text which can be stored up or used in other ways. For example, defining
To decide what text is (T - text) doubled:
decide on "[T][T]".
makes
let the Gerard Kenny reference be "NewYork" doubled;
set this temporary variable to "NewYorkNewYork".
There is, however, a subtlety here. A text with a substitution in it, like:
"The clock reads [time of day]."
is always waiting to be substituted, that is, to become something like:
"The clock reads 11:12 AM."
If all we do with text is to print it, there's nothing to worry about. But if we're storing it up, especially for multiple turns, there are ambiguities. For example, suppose we're changing the look of the black status line bar at the top of the text window:
now the left hand status line is "[time of day]";
Just copying "[time of day]" to the "left hand status line" variable doesn't make it substitute - which is just as well, or the top of the screen would perpetually show "9:00 AM".
On the other hand, looking back at the phrase example:
To decide what text is (T - text) doubled:
decide on "[T][T]".
"[T][T]" is substituted immediately it's formed. That's also a good thing, because "T" loses its meaning the moment the phrase finishes, which would make "[T][T]" meaningless anywhere else.
What's going on here is this: Inform substitutes text immediately if it contains references to a temporary value such as "T", and otherwise only if it needs to access the contents. This is why "[time of day]" isn't substituted until we need to print it out (or, say, access the third character): "time of day" is a value which always exists, not a temporary one.
Another case where that might be important is if we want to set a text to an elaborated version of itself. For example, suppose there is a variable (not a temporary one) called "the accumulated tally", and consider this:
now the accumulated tally is "[the accumulated tally]X";
The intention of the writer here was to add an "X" each time this happens. But the result is a hang, because what it actually means is that the accumulated tally can only be printed if the accumulated tally is printed first… an infinite regress. The safe way to do this would be:
now the accumulated tally is the substituted form of "[the accumulated tally]X";
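A loose analogy in Python may help, though it is not how Inform stores text: an unsubstituted text behaves like a function evaluated afresh each time it is read, and "the substituted form of" like a string snapshot taken once. The names `world`, `left_hand_status_line` and `Tally` are hypothetical.

```python
# Analogy only: "[time of day]" is looked up whenever the text is read.
world = {"time of day": "9:00 AM"}


def left_hand_status_line(w: dict) -> str:
    # Re-evaluated on every read, like an unsubstituted text.
    return f"Clock reads: {w['time of day']}"


frozen = left_hand_status_line(world)   # like "the substituted form of ..."
world["time of day"] = "9:01 AM"
live = left_hand_status_line(world)
print(live)     # Clock reads: 9:01 AM
print(frozen)   # Clock reads: 9:00 AM


class Tally:
    """A text defined lazily in terms of itself."""

    @property
    def value(self) -> str:
        # Reading the tally requires reading the tally first...
        return self.value + "X"


try:
    Tally().value
    regress = False
except RecursionError:
    regress = True
print(regress)  # True

# Substituting first, then appending, builds on the current value instead.
tally = ""
for _ in range(3):
    tally = tally + "X"
print(tally)    # XXX
```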
Using the adjectives "substituted" and "unsubstituted", it's always possible to test whether a given text is in either state, should this ever be useful. For example,
now the left hand status line is "[time of day]";
if the left hand status line is unsubstituted, say "Yes!";
will say "Yes!": the left hand status line is like a bomb waiting to go off. Speaking of which:
The player is holding a temporal bomb.
When play begins:
now the left hand status line is "Clock reads: [time of day]".
After dropping the temporal bomb:
now the left hand status line is the substituted form of the left hand status line;
say "Time itself is now broken. Well done."
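This is making use of:
substituted form of (text) ⇒ text
This phrase produces a copy of the text with all of its substitutions carried out at once, so that it no longer changes when the values it refers to change.
Note that there's no analogous phrase for "unsubstituted form of…", because once text has been substituted, there's no way to go back.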
Examples
Allowing the player to enter a name to be used for the player character during the game.
The sorcerer's mirror can, when held up high, form an impression of its surroundings which it then preserves.
Creating a class of matches that burn for a time and then go out, with elegant reporting when several matches go out at once.
Suppose V is a text which varies - perhaps a property of something, or a variable defined everywhere, or a temporary "let"-named value. How do we change its contents? The easiest way is simply to assign text to it. Thus:
let V be "It is now [the time of day in words].";
And, for instance,
now V is "[V]!";
adds an exclamation mark at the end of V.
Often, though, it is more useful (and a little faster) to modify V by changing its characters, words and so on. Thus:
replace character number (number) in (text) with (text)
This phrase acts on the named text by placing the given text in place of the Nth character, counting from 1. Example:
let V be "mope";
replace character number 3 in V with "lecul";
say V;
says "molecule".
replace word number (number) in (text) with (text)
This phrase acts on the named text by placing the given text in place of the Nth word, counting from 1, and dividing words at spacing or punctuation. Example:
let V be "Does the well run dry?";
replace word number 3 in V with "jogger";
say V;
says "Does the jogger run dry?".
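The two replacements above can be mimicked in Python by slicing, as a sketch. The helper names are hypothetical, word positions assume the same punctuation list as "\p", and letting an out-of-range number leave the text unchanged is an assumption of this sketch rather than documented Inform behaviour.

```python
import re

# Words are runs of characters that are neither spacing nor "\p" punctuation.
WORD_SPAN = re.compile(r'[^\s.,!?\-/":;()\[\]{}]+')


def replace_character_number(n: int, text: str, new: str) -> str:
    # Characters count from 1.
    if not 1 <= n <= len(text):
        return text
    return text[: n - 1] + new + text[n:]


def replace_word_number(n: int, text: str, new: str) -> str:
    # Find where each word starts and ends, then splice in the new text.
    spans = [m.span() for m in WORD_SPAN.finditer(text)]
    if not 1 <= n <= len(spans):
        return text
    start, end = spans[n - 1]
    return text[:start] + new + text[end:]


print(replace_character_number(3, "mope", "lecul"))                # molecule
print(replace_word_number(3, "Does the well run dry?", "jogger"))  # Does the jogger run dry?
```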
replace punctuated word number (number) in (text) with (text)
This phrase acts on the named text by placing the given text in place of the Nth word, counting from 1, and dividing words at spacing, counting punctuation runs as words in their own right. Example:
let V be "Frankly, yes, I agree.";
replace punctuated word number 2 in V with ":";
say V;
says "Frankly: yes, I agree.".
replace unpunctuated word number (number) in (text) with (text)
This phrase acts on the named text by putting the given text in place of the Nth word, counting from 1, and dividing words at spacing, treating punctuation as part of a word just as if it were lettering. Example:
let V be "Frankly, yes, I agree.";
replace unpunctuated word number 2 in V with "of course";
say V;
says "Frankly, of course I agree.".
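The three phrases differ only in how they divide the text into words. The Python sketch below models the three divisions with regular expressions so they can be compared side by side. It is an approximation under stated assumptions: `replace_nth`, `WORD`, `PUNCTUATED`, `UNPUNCTUATED` and the punctuation set `PUNCT` are all hypothetical, and the manual does not list exactly which characters Inform counts as punctuation. The apostrophe is left out of that set so that "don't" stays a single word, matching the "ice-hot, don't you think?" example earlier in the chapter.

```python
import re

# ASSUMPTION: an illustrative punctuation set. The manual does not list the
# characters Inform treats as punctuation. The apostrophe is left out so that
# "don't" stays one word.
PUNCT = r".,!?;:\-\"()\[\]/"

# Three ways of dividing text into words, one per phrase:
# "word": both spacing and punctuation divide, and punctuation is discarded.
WORD = re.compile(rf"[^\s{PUNCT}]+")
# "punctuated word": spacing divides, and each run of punctuation is a word too.
PUNCTUATED = re.compile(rf"[^\s{PUNCT}]+|[{PUNCT}]+")
# "unpunctuated word": only spacing divides; punctuation sticks to its word.
UNPUNCTUATED = re.compile(r"\S+")


def replace_nth(pattern: re.Pattern, text: str, n: int, new: str) -> str:
    """Put `new` in place of the Nth word found by `pattern`, counting from 1.

    ASSUMPTION: if there is no Nth word, the text is returned unchanged.
    """
    for count, match in enumerate(pattern.finditer(text), start=1):
        if count == n:
            return text[:match.start()] + new + text[match.end():]
    return text


print(replace_nth(WORD, "Does the well run dry?", 3, "jogger"))
# Does the jogger run dry?
print(replace_nth(PUNCTUATED, "Frankly, yes, I agree.", 2, ":"))
# Frankly: yes, I agree.
print(replace_nth(UNPUNCTUATED, "Frankly, yes, I agree.", 2, "of course"))
# Frankly, of course I agree.
```

Only the word-finding pattern changes between the three; the splicing step is shared, which is why the same "Frankly, yes, I agree." gives such different results.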
replace line number (number) in (text) with (text)
This phrase acts on the named text by putting the given text in place of the Nth line, counting from 1. Lines are divided by paragraph or line breaks.
replace paragraph number (number) in (text) with (text)
This phrase acts on the named text by putting the given text in place of the Nth paragraph, counting from 1.
Last, but not least, we can replace text wherever it occurs:
replace the text (text) in (text) with (text)
This phrase acts on the named text by searching and replacing, as many non-overlapping times as possible. Example:
replace the text "a" in V with "z"
changes every lower-case "a" to "z": the same thing done with the "case insensitively" option would change each "a" or "A" to "z".
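For comparison, here is a rough Python model of plain and case-insensitive replacement. `replace_text` and its `case_insensitively` flag are hypothetical names echoing the Inform wording; Python's `str.replace` and `re.sub` both scan from left to right and replace non-overlapping occurrences, which is the behaviour described above.

```python
import re


def replace_text(find: str, text: str, new: str, case_insensitively: bool = False) -> str:
    """Replace every non-overlapping occurrence of `find`, scanning left to right."""
    if not case_insensitively:
        return text.replace(find, new)
    # re.escape makes `find` match literally; the lambda stops re.sub from
    # interpreting backslashes in `new`.
    return re.sub(re.escape(find), lambda _m: new, text, flags=re.IGNORECASE)


print(replace_text("a", "Alpaca", "z"))                           # Alpzcz
print(replace_text("a", "Alpaca", "z", case_insensitively=True))  # zlpzcz
print(replace_text("aa", "aaa", "b"))                             # ba
```

The last line shows what "non-overlapping" means: once the first two letters of "aaa" have been matched, they cannot be reused, so only one replacement is made.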
All very well for letters, but it can be unfortunate to try
replace the text "Bob" in V with "Robert"
if V happens to contain, say, "The Olympic Bobsleigh Team": it would become "The Olympic Robertsleigh Team". What we want, of course, is for Bob to become Robert only when it's a whole word. We can get that with:
replace the word (text) in (text) with (text)
This phrase acts on the named text by searching and replacing, as many non-overlapping times as possible, where the search text must occur as a whole word. Example:
replace the word "Bob" in V with "Robert"
changes "Bob got on the Bobsleigh" to "Robert got on the Bobsleigh".
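Whole-word matching can be sketched in Python with lookarounds that refuse a match when a word character sits immediately before or after it. This is a model under assumptions, not Inform's implementation: `replace_word` is a hypothetical name, and the punctuation set `PUNCT` is illustrative because the manual does not spell one out.

```python
import re

# ASSUMPTION: illustrative punctuation set; the manual does not list one.
PUNCT = r".,!?;:\-\"()\[\]/"


def replace_word(find: str, text: str, new: str) -> str:
    """Replace `find` only where it stands as a whole word.

    The lookarounds reject a match that is immediately preceded or followed
    by a character that could continue a word (anything other than spacing
    or punctuation). The start and end of the text count as boundaries.
    """
    pattern = rf"(?<![^\s{PUNCT}]){re.escape(find)}(?![^\s{PUNCT}])"
    return re.sub(pattern, lambda _m: new, text)


print(replace_word("Bob", "Bob got on the Bobsleigh", "Robert"))
# Robert got on the Bobsleigh
print(replace_word("Bob", "The Olympic Bobsleigh Team", "Robert"))
# The Olympic Bobsleigh Team
```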
replace the punctuated word (text) in (text) with (text)
This phrase acts on the named text by searching and replacing, as many non-overlapping times as possible, where the search text must occur as a whole word or run of punctuation.
But these are all just special cases of the grand-daddy of all replacement phrases:
replace the regular expression (text) in (text) with (text)
This phrase acts on the named text by matching the regular expression and replacing anything which fits it, as many non-overlapping times as possible. Example:
replace the regular expression "\d+" in V with "..."
changes "The Battle of Waterloo, 1815, rivalled Trafalgar, 1805" to "The Battle of Waterloo, ..., rivalled Trafalgar, ...". The "case insensitively" option causes lower and upper case letters to be treated as the same letter.
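The same search-and-replace can be tried out with Python's `re` module, used here only as a stand-in. Its syntax agrees with Inform's for a simple pattern like `\d+`, but the two regular-expression dialects should not be assumed identical, so treat this as a sketch rather than a faithful model. Python's `re.IGNORECASE` flag plays roughly the part of "case insensitively".

```python
import re

text = "The Battle of Waterloo, 1815, rivalled Trafalgar, 1805"

# Every run of one or more digits is replaced, non-overlapping, left to right.
print(re.sub(r"\d+", "...", text))
# The Battle of Waterloo, ..., rivalled Trafalgar, ...

# A case-insensitive search: "trafalgar" still matches "Trafalgar".
print(re.sub(r"trafalgar", "Nile", text, flags=re.IGNORECASE))
# The Battle of Waterloo, 1815, rivalled Nile, 1805
```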
When replacing a regular expression, the replacement text also has a few special meanings (though, thankfully, many fewer than for the expression itself). Once again "\n" and "\t" can be used for line break and tab characters, and "\\" must be used for an actual backslash. But, very usefully, "\1" to "\9" expand as the contents of groups numbered 1 to 9, and "\0" to the exact text matched. So:
replace the regular expression "\d+" in V with "roughly \0"
adds the word "roughly" in front of any run of digits in V, because \0 becomes in turn whichever run of digits matched. And
replace the regular expression "(\w+) (.*)" in V with "\2, \1"
performs the transformation "Frank Booth" to "Booth, Frank".
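Python's `re.sub` supports the same kind of group references, with one spelling difference worth knowing when translating patterns between the two: the whole match, which Inform writes as "\0", is written `\g<0>` in Python, because a bare `\0` in a Python replacement string is read as an octal escape. This is a comparison sketch only; Python is not part of Inform.

```python
import re

dates = "The Battle of Waterloo, 1815, rivalled Trafalgar, 1805"

# Inform's "\0" (the exact text matched) becomes \g<0> in Python.
print(re.sub(r"\d+", r"roughly \g<0>", dates))
# The Battle of Waterloo, roughly 1815, rivalled Trafalgar, roughly 1805

# Numbered groups \1 and \2 are written the same way in both.
print(re.sub(r"(\w+) (.*)", r"\2, \1", "Frank Booth"))
# Booth, Frank
```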
Finally, prefixing the group number with "l" or "u" forces the text it represents into lower or upper case, respectively. For instance:
replace the regular expression "\b(\w)(\w*)" in X with "\u1\l2";
changes the casing of X to "title casing", where each individual word is capitalised. (This is a little slow on large texts, since so many matches and replacements are made: it's more efficient to use the official phrases for changing case.)
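Python replacement strings have no equivalent of the "\u" and "\l" prefixes, so a Python version of the same title-casing trick passes a function to `re.sub` instead, building each replacement from the two groups. A minimal sketch; `title_case` is a hypothetical name.

```python
import re


def title_case(text: str) -> str:
    """Capitalise the first letter of each word and lower-case the rest."""
    # Inform analogue: replace the regular expression "\b(\w)(\w*)" in X with "\u1\l2"
    return re.sub(r"\b(\w)(\w*)", lambda m: m.group(1).upper() + m.group(2).lower(), text)


print(title_case("the OLYMPIC bobsleigh team"))  # The Olympic Bobsleigh Team
```

As with Inform, a dedicated case-changing operation is usually the better tool in practice; in Python the nearest built-in is `str.title()`.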
Examples
Filtering the names of rooms printed while in darkness.
A dog the player can name and un-name at will.
A pig Latin filter for the player's commands.
Letting the player guess types for an unidentifiable fish.
Making Inform understand ASK JOSH TO TAKE INVENTORY as JOSH, TAKE INVENTORY. This requires us to use a regular expression on the player's command, replacing some of the content.
Determining that the command the player typed is invalid, editing it, and re-examining it to see whether it now reads correctly.