Text contents
Block Format.
The short block for a text is two words long: the first word selects which form of storage will be used to represent the content, and the second word is a reference to that content. This reference is an I6 String or Routine in all cases except one, when it's a pointer to a long block containing a null-terminated array of characters, like a C string.
Clearly we need PACKED_TEXT_STORAGE and UNPACKED_TEXT_STORAGE to distinguish between the two basic methods of text storage, roughly equivalent to the pre-2013 kinds "text" and "indexed text". But why do we need four?
CONSTANT_PACKED_TEXT_STORAGE is easy to explain: the BlkValue routines normally detect constants using metadata in their long blocks, but of course that won't work for values which haven't got any long blocks. We use this instead. We don't need a CONSTANT_UNPACKED_TEXT_STORAGE because I7 never compiles constant text in unpacked form.
The surprising one is CONSTANT_PERISHABLE_TEXT_STORAGE. This is a constant created by the I7 compiler which is marked as being tricky because its value is a text substitution containing references to local variables. Unlike other text substitutions, this can't meaningfully be stored away to be expanded later: it must be expanded into unpacked text before it perishes.
Constant CONSTANT_PACKED_TEXT_STORAGE = BLK_BVBITMAP_TEXT + BLK_BVBITMAP_CONSTANT + 1;
Constant CONSTANT_PERISHABLE_TEXT_STORAGE = BLK_BVBITMAP_TEXT + BLK_BVBITMAP_CONSTANT + 2;
Constant PACKED_TEXT_STORAGE = BLK_BVBITMAP_TEXT + 3;
Constant UNPACKED_TEXT_STORAGE = BLK_BVBITMAP_TEXT + BLK_BVBITMAP_LONGBLOCK + 4;
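To see how the first word of the short block carries both the storage form and its properties, here is a small Python model rather than template code. The numeric values given to the BLK_BVBITMAP_* flags are assumptions made for the example (the real ones are set in BlockValues.i6t), and the helper describe is hypothetical; only the way the four constants are composed comes from the definitions above.

```python
# Assumed flag values, for illustration only: the real ones live in BlockValues.i6t.
BLK_BVBITMAP_LONGBLOCK = 0x10
BLK_BVBITMAP_TEXT = 0x20
BLK_BVBITMAP_CONSTANT = 0x40

# Composed exactly as in the I6 definitions above.
CONSTANT_PACKED_TEXT_STORAGE = BLK_BVBITMAP_TEXT + BLK_BVBITMAP_CONSTANT + 1
CONSTANT_PERISHABLE_TEXT_STORAGE = BLK_BVBITMAP_TEXT + BLK_BVBITMAP_CONSTANT + 2
PACKED_TEXT_STORAGE = BLK_BVBITMAP_TEXT + 3
UNPACKED_TEXT_STORAGE = BLK_BVBITMAP_TEXT + BLK_BVBITMAP_LONGBLOCK + 4


def describe(storage):
    """Hypothetical helper: (is_constant, has_long_block) for a storage word."""
    return (bool(storage & BLK_BVBITMAP_CONSTANT),
            bool(storage & BLK_BVBITMAP_LONGBLOCK))
```

Under these assumed values, only UNPACKED_TEXT_STORAGE reports a long block, and only the two CONSTANT_ forms report being constants.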
Extent Of Long Block.
When there's a long block, we need enough of the entries to store the number of characters, plus one for the null terminator.
[ TEXT_TY_Extent arg1 x;
    x = BlkValueSeekZeroEntry(arg1);
    if (x < 0) return -1;
    return x+1;
];
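The same calculation can be modelled in Python, treating the long block as a list of character codes. The function name text_extent is hypothetical, and list.index stands in for BlkValueSeekZeroEntry on the assumption that the latter returns the position of the first zero entry or a negative value if there is none.

```python
def text_extent(entries):
    """Entries needed: every character up to and including the null terminator."""
    try:
        zero_at = entries.index(0)  # stands in for BlkValueSeekZeroEntry
    except ValueError:
        return -1                   # no terminator present
    return zero_at + 1
```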
Character Set.
On the Z-machine, we use the 8-bit ZSCII character set, stored in bytes; on Glulx, we use the opening 16-bit subset of Unicode (which, though only a subset, covers almost all letter forms used on Earth), stored in half-words.
The Z-machine does have very partial Unicode support, but not in a way that can help us here. It is capable of printing a wide range of Unicode characters, and on a good interpreter with a good font (such as Zoom for Mac OS X, using the Lucida Grande font) can produce many thousands of glyphs. But it is not capable of printing those characters into memory rather than the screen, an essential technique for texts: it can only write each character to a single byte, and it does so in ZSCII. That forces our hand when it comes to choosing the indexed-text character set.
#IFDEF TARGET_ZCODE;
Constant TEXT_TY_Storage_Flags = BLK_FLAG_MULTIPLE;
Constant ZSCII_Tables;
#IFNOT;
Constant TEXT_TY_Storage_Flags = BLK_FLAG_MULTIPLE + BLK_FLAG_16_BIT;
Constant Large_Unicode_Tables;
#ENDIF;

{-segment:UnicodeData.i6t}
{-segment:Char.i6t}
KOV Support.
See the BlockValues.i6t segment for the specification of the following routines. Because no block values are ever stored in a text, they can freely be bitwise copied or forgotten, which is why we need do nothing special to copy or destroy a text.
[ TEXT_TY_Support task arg1 arg2 arg3;
    switch(task) {
        CREATE_KOVS: return TEXT_TY_Create(arg2);
        CAST_KOVS: TEXT_TY_Cast(arg1, arg2, arg3);
        MAKEMUTABLE_KOVS: return TEXT_TY_Mutable(arg1);
        COPYQUICK_KOVS: rtrue;
        COPYSB_KOVS: TEXT_TY_CopySB(arg1, arg2);
        KINDDATA_KOVS: return 0;
        EXTENT_KOVS: return TEXT_TY_Extent(arg1);
        COMPARE_KOVS: return TEXT_TY_Compare(arg1, arg2);
        READ_FILE_KOVS: if (arg3 == -1) rtrue;
            return TEXT_TY_ReadFile(arg1, arg2, arg3);
        WRITE_FILE_KOVS: return TEXT_TY_WriteFile(arg1);
        HASH_KOVS: return TEXT_TY_Hash(arg1);
        DEBUG_KOVS: TEXT_TY_Debug(arg1);
    }
    ! We choose not to respond to: DESTROY_KOVS, COPYKIND_KOVS, COPY_KOVS
    rfalse;
];
Debugging.
This shows the various forms a text's short block can take:
[ TEXT_TY_Debug txt;
    switch (txt-->0) {
        CONSTANT_PACKED_TEXT_STORAGE: print " = cp~", (PrintI6Text) txt-->1, "~";
        CONSTANT_PERISHABLE_TEXT_STORAGE: print " = cp~", (PrintI6Text) txt-->1, "~";
        PACKED_TEXT_STORAGE: print " = p~", (PrintI6Text) txt-->1, "~";
        UNPACKED_TEXT_STORAGE: print " = ~", (TEXT_TY_Say) txt, "~";
        default: print " broken?";
    }
];
Creation.
A newly created text is a two-word short block with no long block, like this:
Array ThisIsAText --> PACKED_TEXT_STORAGE EMPTY_TEXT_PACKED;
[ TEXT_TY_Create short_block x;
    return BlkValueCreateSB2(short_block, PACKED_TEXT_STORAGE, EMPTY_TEXT_PACKED);
];
Copy Short Block.
When a short block for a constant is copied, the new copy isn't a constant any more.
[ TEXT_TY_CopySB to_bv from_bv;
    BlkValueCopySB2(to_bv, from_bv);
    ! A copy of a constant is an ordinary packed text, not another constant
    if (to_bv-->0 & BLK_BVBITMAP_CONSTANTMASK) to_bv-->0 = PACKED_TEXT_STORAGE;
];
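A Python model of the rule may make it concrete. The numeric values below are assumptions chosen for the example (the real flag values are defined in BlockValues.i6t), the short block is modelled as a two-element list, and copy_short_block is a hypothetical name.

```python
CONSTANT_FLAG = 0x40                 # assumed value of the "constant" bit
PACKED_TEXT_STORAGE = 0x23           # assumed: text bit + 3
CONSTANT_PACKED_TEXT_STORAGE = 0x63  # assumed: text bit + constant bit + 3 would differ
CONSTANT_PACKED_TEXT_STORAGE = 0x61  # assumed: text bit + constant bit + 1


def copy_short_block(from_sb):
    """Copy a [storage, content] pair; a copy of a constant is no longer constant."""
    to_sb = list(from_sb)
    if to_sb[0] & CONSTANT_FLAG:
        to_sb[0] = PACKED_TEXT_STORAGE
    return to_sb
```

The content word is shared unchanged: only the storage word of the copy is demoted.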
Transmutation.
What happens if a text is stored in packed form, but we need to access or change its individual characters? The answer is that we have to "transmute" it into long block form. Sometimes this is a permanent change, but often it's only temporary, and will soon be followed by an un-transmutation.
[ TEXT_TY_Transmute txt;
    TEXT_TY_Temporarily_Transmute(txt);
];

[ TEXT_TY_Temporarily_Transmute txt x;
    if ((txt) && (txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0)) {
        x = txt-->1; ! The old value was a packed string

        txt-->0 = UNPACKED_TEXT_STORAGE;
        txt-->1 = FlexAllocate(32, TEXT_TY, TEXT_TY_Storage_Flags);
        if (x ~= EMPTY_TEXT_PACKED) TEXT_TY_CastPrimitive(txt, false, x);

        return x;
    }
    return 0;
];

[ TEXT_TY_Untransmute txt pk cp x;
    if ((pk) && (txt-->0 == UNPACKED_TEXT_STORAGE)) {
        x = txt-->1; ! The old value was an unpacked string
        FlexFree(x);
        txt-->0 = cp;
        txt-->1 = pk;
    }
    return txt;
];
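The calling pattern used throughout this segment (save the storage word, transmute, work on the characters, un-transmute) can be modelled in Python as below. This is a sketch under stated assumptions, not the template's own code: the short block is a two-element list, a packed text is a Python string, the storage words are stand-in strings, and None plays the role of the 0 returned when nothing needed transmuting.

```python
PACKED, UNPACKED = "packed", "unpacked"  # stand-ins for the storage words


def temporarily_transmute(sb):
    """Expand a packed text in place; return the old packed content, or None."""
    if sb[0] == UNPACKED:
        return None
    packed = sb[1]
    sb[0] = UNPACKED
    sb[1] = [ord(c) for c in packed] + [0]  # null-terminated character array
    return packed


def untransmute(sb, packed, storage):
    """Restore the short block if temporarily_transmute changed it."""
    if packed is not None and sb[0] == UNPACKED:
        sb[0], sb[1] = storage, packed
    return sb


# The pattern: remember the storage word, transmute, read, then restore.
sb = [PACKED, "horn"]
saved_storage = sb[0]
saved_packed = temporarily_transmute(sb)
length = sb[1].index(0)
untransmute(sb, saved_packed, saved_storage)
```

Because the storage word is saved separately, a constant text goes back to being a constant afterwards rather than being demoted to an ordinary packed text.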
Mutability.
That neatly handles the question of how to make a text mutable. (Note that constants are never created in unpacked form.)
[ TEXT_TY_Mutable txt;
    if (txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) {
        TEXT_TY_Transmute(txt);
        return 0;
    }
    return 2;
];
Casting.
In general computing, "casting" is the process of translating data in one type into semantically equivalent data in another: the only interesting cast here is that a snippet can be turned into a text.
[ TEXT_TY_Cast to_txt from_kind from_value;
    if (from_kind == TEXT_TY) {
        BlkValueCopy(to_txt, from_value);
    } else if (from_kind == SNIPPET_TY) {
        TEXT_TY_Transmute(to_txt);
        TEXT_TY_CastPrimitive(to_txt, true, from_value);
    } else BlkValueError("impossible cast to text");
];

[ SNIPPET_TY_to_TEXT_TY to_txt snippet;
    return BlkValueCast(to_txt, SNIPPET_TY, snippet);
];
Data Conversion.
We use a single routine to handle two kinds of format translation: a packed I6 string into an unpacked text, or a snippet into an unpacked text.
In each case, what we do is simply to print out the value we have, but with the output stream set to memory rather than the screen. That gives us the character by character version, neatly laid out in an array, and all we have to do is to copy it into the text and add a null termination byte.
What complicates things is that the two virtual machines handle printing to memory quite differently, and that the original text has unpredictable length. We are going to try printing it into the array TEXT_TY_Buffers, but what if the text is too big? Disastrously, the Z-machine simply writes on in memory, corrupting all subsequent arrays and almost certainly causing the story file to crash soon after. There is nothing we can do to predict or avoid this, or to repair the damage: this is why the Inform documentation warns users to be wary of using text with large strings in the Z-machine, and advises the use of Glulx instead. Glulx does handle overruns safely, and indeed allows us to dynamically allocate memory as necessary so that we can always avoid overruns entirely.
In either case, though, it's useful to have TEXT_TY_BufferSize, the size of the temporary buffer, large enough that it will never be overrun in ordinary use. This is controllable with the use option "maximum indexed text length".
#ifndef TEXT_TY_BufferSize;
Constant TEXT_TY_BufferSize = 512;
#endif;
Constant TEXT_TY_NoBuffers = 2;

#ifdef TARGET_ZCODE;
Array TEXT_TY_Buffers -> TEXT_TY_BufferSize*TEXT_TY_NoBuffers;
#ifnot;
Array TEXT_TY_Buffers --> (TEXT_TY_BufferSize+2)*TEXT_TY_NoBuffers;
#endif;

Global RawBufferAddress = TEXT_TY_Buffers;
Global RawBufferSize = TEXT_TY_BufferSize;

Global TEXT_TY_CastPrimitiveNesting = 0;
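Since a conversion can itself trigger another conversion, each level of nesting gets its own slot in TEXT_TY_Buffers. The Python sketch below models how the slot offset is derived from the nesting depth, following the two versions of TEXT_TY_CastPrimitive given below; the function name buffer_offset and its None result are hypothetical, and wordsize=4 is the Glulx word size.

```python
TEXT_TY_BufferSize = 512
TEXT_TY_NoBuffers = 2


def buffer_offset(nesting, glulx=False, wordsize=4):
    """Byte offset into TEXT_TY_Buffers of the slot for this nesting depth.

    nesting is the depth before the new conversion starts (0 for the first).
    Returns None where the Glulx version would allocate memory dynamically.
    """
    if glulx:
        slot = (TEXT_TY_BufferSize + 2) * wordsize  # two spare words for stream results
    else:
        slot = TEXT_TY_BufferSize
    if nesting >= TEXT_TY_NoBuffers:
        if glulx:
            return None
        raise RuntimeError("ran out with too many simultaneous text conversions")
    return nesting * slot
```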
Z Version.
The two versions of this routine, one for each virtual machine, are in all important respects the same, but there are enough fiddly differences that it's clearer to give two definitions, so:
#ifdef TARGET_ZCODE;
[ TEXT_TY_CastPrimitive to_txt from_snippet from_value len news buffer;
    if (to_txt == 0) BlkValueError("no destination for cast");
    SuspendRTP();
    buffer = RawBufferAddress + TEXT_TY_CastPrimitiveNesting*TEXT_TY_BufferSize;
    TEXT_TY_CastPrimitiveNesting++;
    if (TEXT_TY_CastPrimitiveNesting > TEXT_TY_NoBuffers)
        FlexError("ran out with too many simultaneous text conversions");

    @push say__p; @push say__pc;
    ClearParagraphing(6);
    @output_stream 3 buffer;
    if (from_value) {
        if (from_snippet) print (PrintSnippet) from_value;
        else print (PrintI6Text) from_value;
    }
    @output_stream -3;
    @pull say__pc; @pull say__p;
    ResumeRTP();

    ! The first word of the buffer holds the number of characters printed
    len = buffer-->0;
    if (len > RawBufferSize-1) len = RawBufferSize-1;
    buffer->(len+2) = 0;

    TEXT_TY_CastPrimitiveNesting--;
    BlkValueMassCopyFromArray(to_txt, buffer+2, 1, len+1);
];
#ifnot;
[ TEXT_TY_CastPrimitive to_txt from_snippet from_value
    len i stream saved_stream news buffer buffer_size memory_to_free results;

    if (to_txt == 0) BlkValueError("no destination for cast");

    buffer_size = (TEXT_TY_BufferSize + 2)*WORDSIZE;

    RawBufferSize = TEXT_TY_BufferSize;
    buffer = RawBufferAddress + TEXT_TY_CastPrimitiveNesting*buffer_size;
    TEXT_TY_CastPrimitiveNesting++;
    if (TEXT_TY_CastPrimitiveNesting > TEXT_TY_NoBuffers) {
        buffer = VM_AllocateMemory(buffer_size); memory_to_free = buffer;
        if (buffer == 0)
            FlexError("ran out with too many simultaneous text conversions");
    }

    if (unicode_gestalt_ok) {
        SuspendRTP();
        .RetryWithLargerBuffer;
        saved_stream = glk_stream_get_current();
        stream = glk_stream_open_memory_uni(buffer, RawBufferSize, filemode_Write, 0);
        glk_stream_set_current(stream);

        @push say__p; @push say__pc;
        ClearParagraphing(7);
        if (from_snippet) print (PrintSnippet) from_value;
        else print (PrintI6Text) from_value;
        @pull say__pc; @pull say__p;

        results = buffer + buffer_size - 2*WORDSIZE;
        glk_stream_close(stream, results);
        if (saved_stream) glk_stream_set_current(saved_stream);
        ResumeRTP();

        len = results-->1;
        if (len > RawBufferSize-1) {
            ! The text did not fit: allocate a larger buffer and print it again
            news = RawBufferSize;
            while (news < len) news=news*2;
            i = VM_AllocateMemory(news*WORDSIZE);
            if (i ~= 0) {
                if (memory_to_free) VM_FreeMemory(memory_to_free);
                memory_to_free = i;
                buffer = i;
                RawBufferSize = news;
                buffer_size = (RawBufferSize + 2)*WORDSIZE;
                jump RetryWithLargerBuffer;
            }
            ! Memory allocation refused: all we can do is to truncate the text
            len = RawBufferSize-1;
        }
        buffer-->(len) = 0;

        TEXT_TY_CastPrimitiveNesting--;
        BlkValueMassCopyFromArray(to_txt, buffer, 4, len+1);
    } else {
        RunTimeProblem(RTP_NOGLULXUNICODE);
    }
    if (memory_to_free) VM_FreeMemory(memory_to_free);
];
#endif;
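The retry strategy in the Glulx version amounts to doubling the buffer until the reported length fits. A minimal Python model of just that size calculation follows; grown_buffer_size is a hypothetical name, and the sketch says nothing about the memory allocation itself.

```python
def grown_buffer_size(current, needed):
    """Double the buffer size (in characters) until it is at least `needed`."""
    news = current
    while news < needed:
        news = news * 2
    return news
```

Starting from the default of 512 characters, a text of 600 characters leads to a retry with 1024, and one of 3000 characters to a retry with 4096.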
Comparison.
This is more or less strcmp, the traditional C library routine for comparing strings, but it does pose a few interesting questions. The answers are:
(a) Two different unexpanded texts with substitutions are never equal, so "[X]" and "[Y]" aren't equal as texts even if X and Y are equal. (b) Otherwise we test the current value of the text as expanded, so "[X]" and "17" can be equal as texts if X is 17.
[ TEXT_TY_Compare left_txt right_txt rv;
    @push say__comp;
    say__comp = true;
    rv = TEXT_TY_Compare_Inner(left_txt, right_txt);
    @pull say__comp;
    return rv;
];

[ TEXT_TY_Compare_Inner left_txt right_txt
    pos ch1 ch2 capacity_left capacity_right fl fr cl cr cpl cpr;
    if (left_txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) fl = true;
    if (right_txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) fr = true;

    if (fl && fr) {
        ! Two packed texts of the same sort are compared by reference only
        if ((left_txt-->1 ofclass String) && (right_txt-->1 ofclass String))
            return left_txt-->1 - right_txt-->1;
        if ((left_txt-->1 ofclass Routine) && (right_txt-->1 ofclass Routine))
            return left_txt-->1 - right_txt-->1;
        cpl = left_txt-->0; cl = TEXT_TY_Temporarily_Transmute(left_txt);
        cpr = right_txt-->0; cr = TEXT_TY_Temporarily_Transmute(right_txt);
    } else if (fl) {
        cpl = left_txt-->0; cl = TEXT_TY_Temporarily_Transmute(left_txt);
    } else if (fr) {
        cpr = right_txt-->0; cr = TEXT_TY_Temporarily_Transmute(right_txt);
    }
    if ((cl) || (cr)) {
        pos = TEXT_TY_Compare(left_txt, right_txt);
        TEXT_TY_Untransmute(left_txt, cl, cpl);
        TEXT_TY_Untransmute(right_txt, cr, cpr);
        return pos;
    }
    capacity_left = BlkValueLBCapacity(left_txt);
    capacity_right = BlkValueLBCapacity(right_txt);
    for (pos=0:(pos<capacity_left) && (pos<capacity_right):pos++) {
        ch1 = BlkValueRead(left_txt, pos);
        ch2 = BlkValueRead(right_txt, pos);
        if (ch1 ~= ch2) return ch1-ch2;
        if (ch1 == 0) return 0;
    }
    if (pos == capacity_left) return -1;
    return 1;
];

[ TEXT_TY_Distinguish left_txt right_txt;
    if (TEXT_TY_Compare(left_txt, right_txt) == 0) rfalse;
    rtrue;
];
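Once both texts are in unpacked form, the final loop of TEXT_TY_Compare_Inner is the strcmp-like part. It can be modelled in Python on two lists of character codes, where the length of each list stands for the capacity of its long block; compare_unpacked is a hypothetical name.

```python
def compare_unpacked(left, right):
    """strcmp-style comparison of two zero-terminated lists of character codes."""
    for ch1, ch2 in zip(left, right):
        if ch1 != ch2:
            return ch1 - ch2
        if ch1 == 0:
            return 0
    # One capacity ran out before a terminator or a difference was found.
    return -1 if len(left) <= len(right) else 1
```

Entries after the first zero are never examined, so texts with different capacities but the same characters still compare as equal.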
Hashing.
This calculates a hash value for the string, using Bernstein's algorithm.
[ TEXT_TY_Hash txt rv len i p cp;
    cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt);
    rv = 0;
    len = BlkValueLBCapacity(txt);
    for (i=0: i<len: i++)
        rv = rv * 33 + BlkValueRead(txt, i);
    TEXT_TY_Untransmute(txt, p, cp);
    return rv;
];
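A Python model of the same multiply-by-33 recurrence is given below. Two assumptions are made loudly: I6 arithmetic silently wraps at the virtual machine's word size, which the sketch imitates with an unsigned modulus (the VM's own values are signed), and text_hash is a hypothetical name. As in the I6 routine, the loop covers every entry passed in, not only those before the terminator.

```python
def text_hash(entries, word_bits=32):
    """Bernstein-style hash over a list of character codes, wrapping at word size."""
    rv = 0
    for ch in entries:
        rv = (rv * 33 + ch) % (1 << word_bits)
    return rv
```

For the entries 65, 66, 67, 0 this gives 2409990 with 32-bit words and 50694 with 16-bit words.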
Printing.
Unicode is not the native character set on Glulx: it came along as a late addition to Glulx's specification. The deal is that we have to explicitly tell the Glk interface layer to perform certain operations in a Unicode way; if we simply perform print (char) ch; then the character ch will be printed in ZSCII rather than Unicode.
[ TEXT_TY_Say txt ch i dsize;
    if (txt==0) rfalse;
    if (txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) return PrintI6Text(txt-->1);
    dsize = BlkValueLBCapacity(txt);
    for (i=0: i<dsize: i++) {
        ch = BlkValueRead(txt, i);
        if (ch == 0) break;
        #ifdef TARGET_ZCODE;
        print (char) ch;
        #ifnot; ! TARGET_ZCODE
        @streamunichar ch;
        #endif;
    }
    if (i == 0) rfalse;
    rtrue;
];
Capitalised printing.
It turns out to be useful to have a variation on this:
[ TEXT_TY_Say_Capitalised txt mod rc;
    mod = BlkValueCreate(TEXT_TY);
    TEXT_TY_SubstitutedForm(mod, txt);
    if (TEXT_TY_CharacterLength(mod) > 0) {
        BlkValueWrite(mod, 0, CharToCase(BlkValueRead(mod, 0), 1));
        TEXT_TY_Say(mod);
        rc = true;
        say__p = 1;
    }
    BlkValueFree(mod);
    return rc;
];
Serialisation.
Here we print a serialised form of a text which can later be used to reconstruct the original text. The printing is apparently to the screen, but in fact always takes place when the output stream is a file.
The format chosen is a letter "S" for string, then a comma-separated list of decimal character codes, ending with the null terminator, and followed by a semicolon: thus S65,66,67,0; is the serialised form of the text "ABC".
[ TEXT_TY_WriteFile txt len pos ch p cp;
    cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt);
    len = BlkValueLBCapacity(txt);
    print "S";
    for (pos=0: pos<=len: pos++) {
        if (pos == len) ch = 0; else ch = BlkValueRead(txt, pos);
        if (ch == 0) {
            print "0;"; break;
        } else {
            print ch, ",";
        }
    }
    TEXT_TY_Untransmute(txt, p, cp);
];
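The format is easy to reproduce outside I6. The Python sketch below builds the same serialised form from an ordinary string; serialise is a hypothetical name, and the sketch assumes the string contains no null characters.

```python
def serialise(text):
    """Serialised form: 'S', decimal character codes with commas, then '0;'."""
    return "S" + "".join(f"{ord(c)}," for c in text) + "0;"
```

So serialise("ABC") gives S65,66,67,0; and the empty text is written as S0;.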
Unserialisation.
If that's the word: the reverse process, in which we read a stream of characters from a file and reconstruct the text which gave rise to them.
[ TEXT_TY_ReadFile txt auxf ch i v dg pos tsize p;
    TEXT_TY_Transmute(txt);
    tsize = BlkValueLBCapacity(txt);
    while (ch ~= 32 or 9 or 10 or 13 or 0 or -1) {
        ch = FileIO_GetC(auxf);
        if (ch == ',' or ';') {
            ! End of a number: grow the text if need be, then store the character
            if (pos+1 >= tsize) {
                if (BlkValueSetLBCapacity(txt, 2*pos) == false) break;
                tsize = BlkValueLBCapacity(txt);
            }
            BlkValueWrite(txt, pos++, v);
            v = 0;
            if (ch == ';') break;
        } else {
            dg = ch - '0';
            v = v*10 + dg;
        }
    }
    BlkValueWrite(txt, pos, 0);
    return txt;
];
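A simplified Python model of the parsing loop is sketched here. It assumes, as the I6 routine appears to, that the leading "S" has already been consumed by the caller, so the input begins with the first decimal digit; unlike the I6 loop it simply stops at any character that is not a digit, comma or semicolon. The name unserialise is hypothetical.

```python
def unserialise(stream):
    """Parse '65,66,67,0;' (the part after the leading 'S') into character codes."""
    codes = []
    v = 0
    for ch in stream:
        if ch == "," or ch == ";":
            codes.append(v)        # a complete number has been read
            v = 0
            if ch == ";":
                break
        elif "0" <= ch <= "9":
            v = v * 10 + (ord(ch) - ord("0"))
        else:
            break                  # whitespace or anything unexpected ends the read
    return codes
```

Applied to 65,66,67,0; this yields the codes 65, 66, 67, 0, which is the text "ABC" with its null terminator.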
[ TEXT_TY_SubstitutedForm to txt;
    if (txt) {
        BlkValueCopy(to, txt);
        TEXT_TY_Transmute(to);
    }
    return to;
];

[ TEXT_TY_IsSubstituted txt;
    if ((txt) &&
        (txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) &&
        (txt-->1 ofclass Routine)) rfalse;
    rtrue;
];
Perishability.
As noted above, a perishable constant is one which must be expanded before the values it refers to vanish from existence.
[ TEXT_TY_ExpandIfPerishable to from;
    if ((from) && (from-->0 == CONSTANT_PERISHABLE_TEXT_STORAGE))
        return TEXT_TY_SubstitutedForm(to, from);
    return from;
];
Recognition-only-GPR.
An I6 general parsing routine to look at words from the position marker wn in the player's command to see if they match the contents of the text txt, returning either GPR_PREPOSITION or GPR_FAIL according to whether a match could be made. This is used when an object's name is set to include one of its properties, and the property in question is a text: "A flowerpot is a kind of thing. A flowerpot has a text called pattern. Understand the pattern property as describing a flowerpot." When the player types EXAMINE STRIPED FLOWERPOT, and there is a flowerpot in scope, the following routine is called to test whether its pattern property (a text) matches any words at the position STRIPED FLOWERPOT. Assuming a pot does indeed have the pattern "striped", the routine advances wn by 1 and returns GPR_PREPOSITION to indicate a match.
This kind of GPR is called a "recognition-only-GPR", because it only recognises an existing value: it doesn't parse a new one.
[ TEXT_TY_ROGPR txt p cp r;
    if (txt == 0) return GPR_FAIL;
    cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt);
    r = TEXT_TY_ROGPRI(txt);
    TEXT_TY_Untransmute(txt, p, cp);
    return r;
];
[ TEXT_TY_ROGPRI txt
    pos len wa wl wpos bdm ch own;
    bdm = true; own = wn;
    len = BlkValueLBCapacity(txt);
    for (pos=0: pos<=len: pos++) {
        if (pos == len) ch = 0; else ch = BlkValueRead(txt, pos);
        if (ch == 32 or 9 or 10 or 0) {
            if (bdm) continue;
            bdm = true;
            if (wpos ~= wl) return GPR_FAIL;
            if (ch == 0) break;
        } else {
            if (bdm) {
                bdm = false;
                if (NextWordStopped() == -1) return GPR_FAIL;
                wa = WordAddress(wn-1);
                wl = WordLength(wn-1);
                wpos = 0;
            }
            if (wa->wpos ~= ch or TEXT_TY_RevCase(ch)) return GPR_FAIL;
            wpos++;
        }
    }
    if (wn == own) return GPR_FAIL; ! Progress must be made to avoid looping
    return GPR_PREPOSITION;
];
Blobs.
That completes the compulsory services required for this KOV to function: from here on, the remaining routines provide definitions of text-related phrases in the Standard Rules.
What are the basic operations of text-handling? Clearly we want to be able to search, and replace, but that is left for the segment RegExp.i6t to handle. More basically we would like to be able to read and write characters from the text. But texts in I7 tend to be of natural language, rather than containing arbitrary material – that's indeed why we call them texts rather than strings. This means they are likely to be punctuated sequences of words, divided up perhaps into sentences and even paragraphs.
So we provide facilities which regard a text as being an array of "blobs", where a "blob" is a unit of text. The user can choose whether to see it as an array of characters, or words (of three different sorts: see the Inform documentation for details), or paragraphs, or lines.
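Constant CHR_BLOB = 1; ! Construe as characters
Constant WORD_BLOB = 2; ! Construe as words
Constant PWORD_BLOB = 3; ! Construe as punctuated words
Constant UWORD_BLOB = 4; ! Construe as unpunctuated words
Constant PARA_BLOB = 5; ! Construe as paragraphs
Constant LINE_BLOB = 6; ! Construe as lines

Constant REGEXP_BLOB = 7; ! Not a blob type as such: see the text above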
Blob Access.
The following routine runs a small finite-state-machine to count the number of blobs in a text, using any of the above blob types (except REGEXP_BLOB, which is used for other purposes). If the optional arguments ctxt and wanted are supplied, it also copies the text of blob number wanted (counting upwards from 1 at the start of the text) into the text ctxt. If the further optional argument rtxt is supplied, then ctxt is instead written with the original text txt as it would read if the blob in question were replaced with the text in rtxt.
Constant WS_BRM = 1;
Constant SKIPPED_BRM = 2;
Constant ACCEPTED_BRM = 3;
Constant ACCEPTEDP_BRM = 4;
Constant ACCEPTEDN_BRM = 5;
Constant ACCEPTEDPN_BRM = 6;

[ TEXT_TY_BlobAccess txt blobtype ctxt wanted rtxt
    p1 p2 cp1 cp2 r;
    if (txt==0) return 0;
    if (blobtype == CHR_BLOB) return TEXT_TY_CharacterLength(txt);
    cp1 = txt-->0; p1 = TEXT_TY_Temporarily_Transmute(txt);
    cp2 = rtxt-->0; p2 = TEXT_TY_Temporarily_Transmute(rtxt);
    TEXT_TY_Transmute(ctxt);
    r = TEXT_TY_BlobAccessI(txt, blobtype, ctxt, wanted, rtxt);
    TEXT_TY_Untransmute(txt, p1, cp1);
    TEXT_TY_Untransmute(rtxt, p2, cp2);
    return r;
];
[ TEXT_TY_BlobAccessI txt blobtype ctxt wanted rtxt
    brm oldbrm ch i dsize csize blobcount gp cl j;
    dsize = BlkValueLBCapacity(txt);
    if (ctxt) csize = BlkValueLBCapacity(ctxt);
    else if (rtxt) "*** rtxt without ctxt ***";
    brm = WS_BRM;
    for (i=0:i<dsize:i++) {
        ch = BlkValueRead(txt, i);
        if (ch == 0) break;
        oldbrm = brm;
        if (ch == 10 or 13 or 32 or 9) {
            if (oldbrm ~= WS_BRM) {
                ! Look ahead through the white space, counting line breaks in gp
                gp = 0;
                for (j=i:j<dsize:j++) {
                    ch = BlkValueRead(txt, j);
                    if (ch == 0) { brm = WS_BRM; break; }
                    if (ch == 10 or 13) { gp++; continue; }
                    if (ch ~= 32 or 9) break;
                }
                ch = BlkValueRead(txt, i);
                if (j == dsize) brm = WS_BRM;
                switch (blobtype) {
                    PARA_BLOB: if (gp >= 2) brm = WS_BRM;
                    LINE_BLOB: if (gp >= 1) brm = WS_BRM;
                    default: brm = WS_BRM;
                }
            }
        } else {
            ! Here gp records whether ch is a punctuation mark
            gp = false;
            if ((blobtype == WORD_BLOB or PWORD_BLOB or UWORD_BLOB) &&
                (ch == '.' or ',' or '!' or '?'
                    or '-' or '/' or '"' or ':' or ';'
                    or '(' or ')' or '[' or ']' or '{' or '}'))
                gp = true;
            switch (oldbrm) {
                WS_BRM:
                    brm = ACCEPTED_BRM;
                    if (blobtype == WORD_BLOB) {
                        if (gp) brm = SKIPPED_BRM;
                    }
                    if (blobtype == PWORD_BLOB) {
                        if (gp) brm = ACCEPTEDP_BRM;
                    }
                SKIPPED_BRM:
                    if (blobtype == WORD_BLOB) {
                        if (gp == false) brm = ACCEPTED_BRM;
                    }
                ACCEPTED_BRM:
                    if (blobtype == WORD_BLOB) {
                        if (gp) brm = SKIPPED_BRM;
                    }
                    if (blobtype == PWORD_BLOB) {
                        if (gp) brm = ACCEPTEDP_BRM;
                    }
                ACCEPTEDP_BRM:
                    if (blobtype == PWORD_BLOB) {
                        if (gp == false) brm = ACCEPTED_BRM;
                        else {
                            if ((ch == BlkValueRead(txt, i-1)) &&
                                (ch == '-' or '.')) blobcount--;
                            blobcount++;
                        }
                    }
                ACCEPTEDN_BRM:
                    if (blobtype == WORD_BLOB) {
                        if (gp) brm = SKIPPED_BRM;
                    }
                    if (blobtype == PWORD_BLOB) {
                        if (gp) brm = ACCEPTEDP_BRM;
                    }
                ACCEPTEDPN_BRM:
                    if (blobtype == PWORD_BLOB) {
                        if (gp == false) brm = ACCEPTED_BRM;
                        else {
                            if ((ch == BlkValueRead(txt, i-1)) &&
                                (ch == '-' or '.')) blobcount--;
                            blobcount++;
                        }
                    }
            }
        }
        if (brm == ACCEPTED_BRM or ACCEPTEDP_BRM) {
            if (oldbrm ~= brm) blobcount++;
            if ((ctxt) && (blobcount == wanted)) {
                if (rtxt) {
                    BlkValueWrite(ctxt, cl, 0);
                    TEXT_TY_Concatenate(ctxt, rtxt, CHR_BLOB);
                    csize = BlkValueLBCapacity(ctxt);
                    cl = TEXT_TY_CharacterLength(ctxt);
                    if (brm == ACCEPTED_BRM) brm = ACCEPTEDN_BRM;
                    if (brm == ACCEPTEDP_BRM) brm = ACCEPTEDPN_BRM;
                } else {
                    if (cl+1 >= csize) {
                        if (BlkValueSetLBCapacity(ctxt, 2*cl) == false) break;
                        csize = BlkValueLBCapacity(ctxt);
                    }
                    BlkValueWrite(ctxt, cl++, ch);
                }
            } else {
                if (rtxt) {
                    if (cl+1 >= csize) {
                        if (BlkValueSetLBCapacity(ctxt, 2*cl) == false) break;
                        csize = BlkValueLBCapacity(ctxt);
                    }
                    BlkValueWrite(ctxt, cl++, ch);
                }
            }
        } else {
            if ((rtxt) && (brm ~= ACCEPTEDN_BRM or ACCEPTEDPN_BRM)) {
                if (cl+1 >= csize) {
                    if (BlkValueSetLBCapacity(ctxt, 2*cl) == false) break;
                    csize = BlkValueLBCapacity(ctxt);
                }
                BlkValueWrite(ctxt, cl++, ch);
            }
        }
    }
    if (ctxt) BlkValueWrite(ctxt, cl++, 0);
    return blobcount;
];
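For the WORD_BLOB case only, the effect of the state machine can be summarised as: a word is a maximal run of characters that are neither white space nor punctuation. The Python sketch below models just that case and none of the others (punctuated words, lines, paragraphs, or replacement). The name word_blobs is hypothetical, and the punctuation set is our reading of the characters tested in the routine above, so treat it as an assumption.

```python
WHITESPACE = " \t\n\r"
PUNCTUATION = '.,!?-/":;()[]{}'  # assumed to match the set tested in the I6 routine


def word_blobs(text):
    """Split text into WORD_BLOB-style words: runs free of white space and punctuation."""
    words = []
    current = ""
    for ch in text:
        if ch in WHITESPACE or ch in PUNCTUATION:
            if current:
                words.append(current)
            current = ""
        else:
            current += ch
    if current:
        words.append(current)
    return words
```

The number of blobs is then len(word_blobs(text)), and blob number wanted, counting from 1, is word_blobs(text)[wanted - 1].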
Get Blob.
The front end which uses the above routine to read a blob. (Note that, for efficiency's sake, we read characters more directly.)
[ TEXT_TY_GetBlob ctxt txt wanted blobtype;
    if (txt==0) return;
    if (blobtype == CHR_BLOB) return TEXT_TY_GetCharacter(ctxt, txt, wanted);
    TEXT_TY_BlobAccess(txt, blobtype, ctxt, wanted);
    return ctxt;
];
Replace Blob.
The front end which uses the above routine to replace a blob. (Once again, characters are handled directly to avoid incurring all that overhead.)
[ TEXT_TY_ReplaceBlob blobtype txt wanted rtxt ctxt ilen rlen i p cp;
    TEXT_TY_Transmute(txt);
    cp = rtxt-->0; p = TEXT_TY_Temporarily_Transmute(rtxt);
    if (blobtype == CHR_BLOB) {
        ilen = TEXT_TY_CharacterLength(txt);
        rlen = TEXT_TY_CharacterLength(rtxt);
        wanted--;
        if ((wanted >= 0) && (wanted<ilen)) {
            if (rlen == 1) {
                BlkValueWrite(txt, wanted, BlkValueRead(rtxt, 0));
            } else {
                ctxt = BlkValueCreate(TEXT_TY);
                TEXT_TY_Transmute(ctxt);
                if (BlkValueSetLBCapacity(ctxt, ilen+rlen+1)) {
                    for (i=0:i<wanted:i++)
                        BlkValueWrite(ctxt, i, BlkValueRead(txt, i));
                    for (i=0:i<rlen:i++)
                        BlkValueWrite(ctxt, wanted+i, BlkValueRead(rtxt, i));
                    for (i=wanted+1:i<ilen:i++)
                        BlkValueWrite(ctxt, rlen+i-1, BlkValueRead(txt, i));
                    BlkValueWrite(ctxt, rlen+ilen, 0);
                    BlkValueCopy(txt, ctxt);
                }
                BlkValueFree(ctxt);
            }
        }
    } else {
        ctxt = BlkValueCreate(TEXT_TY);
        TEXT_TY_BlobAccess(txt, blobtype, ctxt, wanted, rtxt);
        BlkValueCopy(txt, ctxt);
        BlkValueFree(ctxt);
    }
    TEXT_TY_Untransmute(rtxt, p, cp);
];
Replace Text.
This is the general routine which searches for any instance of ftxt, as a blob, in txt, and replaces it with the text rtxt. It works on any of the above blob-types, but two cases are special: first, if the blob-type is CHR_BLOB, then it can do more than search and replace for any instance of a single character: it can search and replace any instance of a substring, so that ftxt is not required to be only a single character. Second, if the blob-type is the special value REGEXP_BLOB then ftxt is interpreted as a regular expression rather than something literal to find: see RegExp.i6t for what happens next.
[ TEXT_TY_ReplaceText blobtype txt ftxt rtxt
    r p1 p2 cp1 cp2;
    TEXT_TY_Transmute(txt);
    cp1 = ftxt-->0; p1 = TEXT_TY_Temporarily_Transmute(ftxt);
    cp2 = rtxt-->0; p2 = TEXT_TY_Temporarily_Transmute(rtxt);
    r = TEXT_TY_ReplaceTextI(blobtype, txt, ftxt, rtxt);
    TEXT_TY_Untransmute(ftxt, p1, cp1);
    TEXT_TY_Untransmute(rtxt, p2, cp2);
    return r;
];

[ TEXT_TY_ReplaceTextI blobtype txt ftxt rtxt
    ctxt csize ilen flen i cl mpos ch chm whitespace punctuation;
    if (blobtype == REGEXP_BLOB or CHR_BLOB)
        return TEXT_TY_Replace_RE(blobtype, txt, ftxt, rtxt);

    ilen = TEXT_TY_CharacterLength(txt);
    flen = TEXT_TY_CharacterLength(ftxt);
    ctxt = BlkValueCreate(TEXT_TY);
    TEXT_TY_Transmute(ctxt);
    csize = BlkValueLBCapacity(ctxt);
    mpos = 0;

    whitespace = true; punctuation = false;
    for (i=0:i<=ilen:i++) {
        ch = BlkValueRead(txt, i);
        .MoreMatching;
        chm = BlkValueRead(ftxt, mpos++);
        if (mpos == 1) {
            switch (blobtype) {
                WORD_BLOB:
                    if ((whitespace == false) && (punctuation == false)) chm = -1;
            }
        }
        whitespace = false;
        if (ch == 10 or 13 or 32 or 9) whitespace = true;
        punctuation = false;
        if (ch == '.' or ',' or '!' or '?'
            or '-' or '/' or '"' or ':' or ';'
            or '(' or ')' or '[' or ']' or '{' or '}') {
            if (blobtype == WORD_BLOB) chm = -1;
            punctuation = true;
        }
        if (ch == chm) {
            if (mpos == flen) {
                if (i == ilen) chm = 0;
                else chm = BlkValueRead(txt, i+1);
                if ((blobtype == CHR_BLOB) ||
                    (chm == 0 or 10 or 13 or 32 or 9) ||
                    (chm == '.' or ',' or '!' or '?'
                        or '-' or '/' or '"' or ':' or ';'
                        or '(' or ')' or '[' or ']' or '{' or '}')) {
                    mpos = 0;
                    cl = cl - (flen-1);
                    BlkValueWrite(ctxt, cl, 0);
                    TEXT_TY_Concatenate(ctxt, rtxt, CHR_BLOB);
                    csize = BlkValueLBCapacity(ctxt);
                    cl = TEXT_TY_CharacterLength(ctxt);
                    continue;
                }
            }
        } else {
            mpos = 0;
        }
        if (cl+1 >= csize) {
            if (BlkValueSetLBCapacity(ctxt, 2*cl) == false) break;
            csize = BlkValueLBCapacity(ctxt);
        }
        BlkValueWrite(ctxt, cl++, ch);
    }
    BlkValueCopy(txt, ctxt);
    BlkValueFree(ctxt);
];
Character Length.
When accessing at the character-by-character level, things are much easier and we needn't go through any finite state machine palaver.
[ TEXT_TY_CharacterLength txt ch i dsize p cp r;
    if (txt==0) return 0;
    cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt);
    dsize = BlkValueLBCapacity(txt); r = dsize;
    for (i=0:i<dsize:i++) {
        ch = BlkValueRead(txt, i);
        if (ch == 0) { r = i; break; }
    }
    TEXT_TY_Untransmute(txt, p, cp);
    return r;
];

[ TEXT_TY_Empty txt;
    if (txt==0) rtrue;
    if (txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) {
        if (txt-->1 == EMPTY_TEXT_PACKED) rtrue;
        rfalse;
    }
    if (TEXT_TY_CharacterLength(txt) == 0) rtrue;
    rfalse;
];
Get Character.
Characters in a text are numbered upwards from 1 by the users of this routine: which is why we subtract 1 when reading the array in the block-value, which counts from 0.
[ TEXT_TY_GetCharacter ctxt txt i ch p cp;
    if (txt==0) return 0;
    cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt);
    TEXT_TY_Transmute(ctxt);
    if ((i<=0) || (i>TEXT_TY_CharacterLength(txt))) ch = 0;
    else ch = BlkValueRead(txt, i-1);
    BlkValueWrite(ctxt, 0, ch);
    BlkValueWrite(ctxt, 1, 0);
    TEXT_TY_Untransmute(txt, p, cp);
    return ctxt;
];
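The indexing convention, including what happens out of range, can be modelled in a few lines of Python. Here get_character is a hypothetical name, and the empty string stands in for the text whose first character is the null terminator.

```python
def get_character(text, i):
    """Character number i of text, counting from 1; empty if i is out of range."""
    if i <= 0 or i > len(text):
        return ""
    return text[i - 1]
```

So character 1 of "horn" is "h", character 4 is "n", and characters 0 and 5 are both empty.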
Casing.
In many programming languages, characters are a distinct data type from strings, but not in I7. To I7, a character is simply a text which happens to have length 1 – this has its inefficiencies, but is conceptually easy for the user.
TEXT_TY_CharactersOfCase(txt, case) determines whether all the characters in txt are letters of the given casing: 0 for lower case, 1 for upper case. In the case of ZSCII, this is done correctly handling all of the European accented letters; in the case of Unicode, it follows the Unicode standard.
Note that there is no requirement for txt to be only a single character long.
[ TEXT_TY_CharactersOfCase txt case i ch len p cp r;
    if (txt==0) return 0;
    cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt);
    len = TEXT_TY_CharacterLength(txt);
    r = true;
    for (i=0:i<len:i++) {
        ch = BlkValueRead(txt, i);
        if ((ch) && (CharIsOfCase(ch, case) == false)) { r = false; break; }
    }
    TEXT_TY_Untransmute(txt, p, cp);
    return r;
];
Change Case.
We set ctxt to the text in txt, except that all the letters are converted to the case given (0 for lower, 1 for upper). The definition of what is a "letter", what case it has and what the other-case form is are as specified in the ZSCII and Unicode standards.
[ TEXT_TY_CharactersToCase ctxt txt case i ch len bnd pk cp;
    if (txt==0) return 0;
    cp = txt-->0; pk = TEXT_TY_Temporarily_Transmute(txt);
    TEXT_TY_Transmute(ctxt);
    len = TEXT_TY_CharacterLength(txt);
    if (BlkValueSetLBCapacity(ctxt, len+1)) {
        bnd = 1;
        for (i=0:i<len:i++) {
            ch = BlkValueRead(txt, i);
            if (case < 2) {
                BlkValueWrite(ctxt, i, CharToCase(ch, case));
            } else {
                BlkValueWrite(ctxt, i, CharToCase(ch, bnd));
                if (case == 2) {
                    bnd = 0;
                    if (ch == 0 or 10 or 13 or 32 or 9
                        or '.' or ',' or '!' or '?'
                        or '-' or '/' or '"' or ':' or ';'
                        or '(' or ')' or '[' or ']' or '{' or '}') bnd = 1;
                }
                if (case == 3) {
                    if (ch ~= 0 or 10 or 13 or 32 or 9) {
                        if (bnd == 1) bnd = 0;
                        else {
                            if (ch == '.' or '!' or '?') bnd = 1;
                        }
                    }
                }
            }
        }
        BlkValueWrite(ctxt, len, 0);
    }
    TEXT_TY_Untransmute(txt, pk, cp);
    return ctxt;
];
Concatenation.
To concatenate two texts is to place one after the other: thus "green" concatenated with "horn" makes "greenhorn". In this routine, from_txt would be "horn", and is added at the end of to_txt, which is returned in its expanded state.
When the blob type is REGEXP_BLOB, the routine is used not for simple concatenation but to handle the concatenations occurring when a regular expression search-and-replace is going on: see RegExp.i6t.
[ TEXT_TY_Concatenate to_txt from_txt blobtype ref_txt
    p cp r;
    if (to_txt==0) rfalse;
    if (from_txt==0) return to_txt;
    TEXT_TY_Transmute(to_txt);
    cp = from_txt-->0; p = TEXT_TY_Temporarily_Transmute(from_txt);
    r = TEXT_TY_ConcatenateI(to_txt, from_txt, blobtype, ref_txt);
    TEXT_TY_Untransmute(from_txt, p, cp);
    return r;
];

[ TEXT_TY_ConcatenateI to_txt from_txt blobtype ref_txt
    pos len ch i tosize x y case;
    switch(blobtype) {
        CHR_BLOB, 0:
            pos = TEXT_TY_CharacterLength(to_txt);
            len = TEXT_TY_CharacterLength(from_txt);
            if (BlkValueSetLBCapacity(to_txt, pos+len+1) == false) return to_txt;
            for (i=0:i<len:i++) {
                ch = BlkValueRead(from_txt, i);
                BlkValueWrite(to_txt, i+pos, ch);
            }
            BlkValueWrite(to_txt, len+pos, 0);
            return to_txt;
        REGEXP_BLOB:
            return TEXT_TY_RE_Concatenate(to_txt, from_txt, blobtype, ref_txt);
    }
    print "*** TEXT_TY_Concatenate used on impossible blob type ***^";
    rfalse;
];
Setting the Player's Command.
In effect, the text typed most recently by the player is a sort of text already, though it isn't in text format, and doesn't live on the heap.
[ SetPlayersCommand from_txt i len at p cp;
    cp = from_txt-->0; p = TEXT_TY_Temporarily_Transmute(from_txt);
    len = TEXT_TY_CharacterLength(from_txt);
    if (len > 118) len = 118;
    #ifdef TARGET_ZCODE;
    buffer->1 = len; at = 2;
    #ifnot;
    buffer-->0 = len; at = 4;
    #endif;
    for (i=0:i<len:i++) buffer->(i+at) = CharToCase(BlkValueRead(from_txt, i), 0);
    for (:at+i<120:i++) buffer->(at+i) = ' ';
    VM_Tokenise(buffer, parse);
    players_command = 100 + WordCount(); ! The snippet variable "player's command"
    TEXT_TY_Untransmute(from_txt, p, cp);
];