I6 Template Layer


Text.i6t

Text contents

Block Format.

The short block for a text is two words long: the first word selects which form of storage will be used to represent the content, and the second word is a reference to that content. This reference is an I6 String or Routine in all cases except one, when it's a pointer to a long block containing a null-terminated array of characters, like a C string.

Clearly we need PACKED_TEXT_STORAGE and UNPACKED_TEXT_STORAGE to distinguish between the two basic methods of text storage, roughly equivalent to the pre-2013 kinds "text" and "indexed text". But why do we need four?

CONSTANT_PACKED_TEXT_STORAGE is easy to explain: the BlkValue routines normally detect constants using metadata in their long blocks, but of course that won't work for values which haven't got any long blocks. We use this instead. We don't need a CONSTANT_UNPACKED_TEXT_STORAGE because I7 never compiles constant text in unpacked form.

The surprising one is CONSTANT_PERISHABLE_TEXT_STORAGE. This is a constant created by the I7 compiler which is marked as being tricky because its value is a text substitution containing references to local variables. Unlike other text substitutions, this can't meaningfully be stored away to be expanded later: it must be expanded into unpacked text before it perishes.

Constant CONSTANT_PACKED_TEXT_STORAGE = BLK_BVBITMAP_TEXT + BLK_BVBITMAP_CONSTANT + 1;
Constant CONSTANT_PERISHABLE_TEXT_STORAGE = BLK_BVBITMAP_TEXT + BLK_BVBITMAP_CONSTANT + 2;
Constant PACKED_TEXT_STORAGE = BLK_BVBITMAP_TEXT + 3;
Constant UNPACKED_TEXT_STORAGE = BLK_BVBITMAP_TEXT + BLK_BVBITMAP_LONGBLOCK + 4;

Extent Of Long Block.

When there's a long block, we need enough entries in it to store all of the characters, plus one more for the null terminator.

[ TEXT_TY_Extent arg1 x;
    x = BlkValueSeekZeroEntry(arg1);
    if (x < 0) return -1; ! should not happen, of course
    return x+1;
];
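As a rough illustration only, the extent calculation can be modelled in Python on a plain list of character codes. The names seek_zero_entry and text_extent are hypothetical stand-ins for BlkValueSeekZeroEntry and TEXT_TY_Extent, and the behaviour of the former (index of the first zero entry, or -1) is an assumption read off the code above.

```python
def seek_zero_entry(chars):
    """Assumed behaviour: index of the first zero entry, or -1 if none."""
    for i, ch in enumerate(chars):
        if ch == 0:
            return i
    return -1

def text_extent(chars):
    """Entries needed: the characters, plus one for the null terminator."""
    x = seek_zero_entry(chars)
    if x < 0:
        return -1  # should not happen for a well-formed text
    return x + 1
```

So a long block holding "ABC" followed by spare zero entries still has an extent of 4.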

Character Set.

On the Z-machine, we use the 8-bit ZSCII character set, stored in bytes; on Glulx, we use the opening 16-bit subset of Unicode (which, though only a subset, covers almost all letter forms used on Earth), stored in half-words.

The Z-machine does have very partial Unicode support, but not in a way that can help us here. It is capable of printing a wide range of Unicode characters, and on a good interpreter with a good font (such as Zoom for Mac OS X, using the Lucida Grande font) can produce many thousands of glyphs. But it is not capable of printing those characters into memory rather than the screen, an essential technique for texts: it can only write each character to a single byte, and it does so in ZSCII. That forces our hand when it comes to choosing the indexed-text character set.

#IFDEF TARGET_ZCODE;
Constant TEXT_TY_Storage_Flags = BLK_FLAG_MULTIPLE;
Constant ZSCII_Tables;
#IFNOT;
Constant TEXT_TY_Storage_Flags = BLK_FLAG_MULTIPLE + BLK_FLAG_16_BIT;
Constant Large_Unicode_Tables;
#ENDIF;

{-segment:UnicodeData.i6t}
{-segment:Char.i6t}

KOV Support.

See the BlockValues.i6t segment for the specification of the following routines. Because no block values are ever stored inside a text, the contents of a text can freely be bitwise copied or forgotten, which is why we need do nothing special to copy or destroy a text.

[ TEXT_TY_Support task arg1 arg2 arg3;
    switch(task) {
        CREATE_KOVS: return TEXT_TY_Create(arg2);
        CAST_KOVS: TEXT_TY_Cast(arg1, arg2, arg3);
        MAKEMUTABLE_KOVS: return TEXT_TY_Mutable(arg1);
        COPYQUICK_KOVS: rtrue;
        COPYSB_KOVS: TEXT_TY_CopySB(arg1, arg2);
        KINDDATA_KOVS: return 0;
        EXTENT_KOVS: return TEXT_TY_Extent(arg1);
        COMPARE_KOVS: return TEXT_TY_Compare(arg1, arg2);
        READ_FILE_KOVS: if (arg3 == -1) rtrue;
                          return TEXT_TY_ReadFile(arg1, arg2, arg3);
        WRITE_FILE_KOVS: return TEXT_TY_WriteFile(arg1);
        HASH_KOVS: return TEXT_TY_Hash(arg1);
        DEBUG_KOVS: TEXT_TY_Debug(arg1);
    }
    ! We choose not to respond to: DESTROY_KOVS, COPYKIND_KOVS, COPY_KOVS
    rfalse;
];

Debugging.

This shows the various forms a text's short block can take:

[ TEXT_TY_Debug txt;
    switch (txt-->0) {
        CONSTANT_PACKED_TEXT_STORAGE: print " = cp~", (PrintI6Text) txt-->1, "~";
        CONSTANT_PERISHABLE_TEXT_STORAGE: print " = cp~", (PrintI6Text) txt-->1, "~";
        PACKED_TEXT_STORAGE: print " = p~", (PrintI6Text) txt-->1, "~";
        UNPACKED_TEXT_STORAGE: print " = ~", (TEXT_TY_Say) txt, "~";
        default: print " broken?";
    }
];

Creation.

A newly created text is a two-word short block with no long block, like this:

Array ThisIsAText --> PACKED_TEXT_STORAGE EMPTY_TEXT_PACKED;

[ TEXT_TY_Create short_block x;
    return BlkValueCreateSB2(short_block, PACKED_TEXT_STORAGE, EMPTY_TEXT_PACKED);
];

Copy Short Block.

When a short block for a constant is copied, the new copy isn't a constant any more.

[ TEXT_TY_CopySB to_bv from_bv;
    BlkValueCopySB2(to_bv, from_bv);
    if (to_bv-->0 & BLK_BVBITMAP_CONSTANTMASK) to_bv-->0 = PACKED_TEXT_STORAGE;
];
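The effect of the copy can be sketched in Python with a short block modelled as a two-element list. The numeric bit values below are hypothetical (the real BLK_BVBITMAP_ constants are defined in BlockValues.i6t), and the mask is assumed to select only the constant bit.

```python
# Hypothetical bit values, chosen only so that the flags do not collide
# with the small numbers added to them.
BLK_BVBITMAP_TEXT = 0x10
BLK_BVBITMAP_CONSTANT = 0x20
BLK_BVBITMAP_CONSTANTMASK = BLK_BVBITMAP_CONSTANT  # assumption

CONSTANT_PACKED_TEXT_STORAGE = BLK_BVBITMAP_TEXT + BLK_BVBITMAP_CONSTANT + 1
PACKED_TEXT_STORAGE = BLK_BVBITMAP_TEXT + 3

def copy_short_block(to_bv, from_bv):
    """Copy both words, then demote a constant copy to ordinary packed text."""
    to_bv[0], to_bv[1] = from_bv[0], from_bv[1]
    if to_bv[0] & BLK_BVBITMAP_CONSTANTMASK:
        to_bv[0] = PACKED_TEXT_STORAGE
```

The source block keeps its constant marking; only the copy loses it.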

Transmutation.

What happens if a text is stored in packed form, but we need to access or change its individual characters? The answer is that we have to "transmute" it into long block form. Sometimes this is a permanent change, but often it's only temporary, and will soon be followed by an un-transmutation.

[ TEXT_TY_Transmute txt;
    TEXT_TY_Temporarily_Transmute(txt);
];

[ TEXT_TY_Temporarily_Transmute txt x;
    if ((txt) && (txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0)) {
        x = txt-->1; ! The old value was a packed string

        txt-->0 = UNPACKED_TEXT_STORAGE;
        txt-->1 = FlexAllocate(32, TEXT_TY, TEXT_TY_Storage_Flags);
        if (x ~= EMPTY_TEXT_PACKED) TEXT_TY_CastPrimitive(txt, false, x);

        return x;
    }
    return 0;
];

[ TEXT_TY_Untransmute txt pk cp x;
    if ((pk) && (txt-->0 == UNPACKED_TEXT_STORAGE)) {
        x = txt-->1; ! The old value was an unpacked string
        FlexFree(x);
        txt-->0 = cp;
        txt-->1 = pk; ! The value earlier returned by TEXT_TY_Temporarily_Transmute
    }
    return txt;
];
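The calling pattern used throughout the rest of this segment (remember the old storage word, transmute, work on the characters, untransmute) can be modelled in Python. This is a sketch under simplifying assumptions: the storage words are plain strings, a packed text is a Python string, and None plays the part of the 0 returned when nothing needed doing.

```python
PACKED, UNPACKED = "packed", "unpacked"  # stand-ins for the storage words

def temporarily_transmute(txt):
    """Unpack txt in place; return the old packed value, or None if already unpacked."""
    if txt[0] != UNPACKED:
        old = txt[1]
        txt[0] = UNPACKED
        txt[1] = [ord(c) for c in old] + [0]  # null-terminated character array
        return old
    return None

def untransmute(txt, packed, storage):
    """Restore the packed form, but only if we were the ones who unpacked it."""
    if packed is not None and txt[0] == UNPACKED:
        txt[0] = storage
        txt[1] = packed
    return txt
```

A caller would write cp = txt[0]; p = temporarily_transmute(txt); then read characters from txt[1]; then untransmute(txt, p, cp).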

Mutability.

That neatly handles the question of how to make a text mutable. (Note that constants are never created in unpacked form.)

[ TEXT_TY_Mutable txt;
    if (txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) {
        TEXT_TY_Transmute(txt);
        return 0;
    }
    return 2; ! Tell BlockValue there's a long block pointer
];

Casting.

In general computing, "casting" is the process of translating data in one type into semantically equivalent data in another: the only interesting cast here is that a snippet can be turned into a text.

[ TEXT_TY_Cast to_txt from_kind from_value;
    if (from_kind == TEXT_TY) {
        BlkValueCopy(to_txt, from_value);
    } else if (from_kind == SNIPPET_TY) {
        TEXT_TY_Transmute(to_txt);
        TEXT_TY_CastPrimitive(to_txt, true, from_value);
    } else BlkValueError("impossible cast to text");
];

[ SNIPPET_TY_to_TEXT_TY to_txt snippet;
    return BlkValueCast(to_txt, SNIPPET_TY, snippet);
];

Data Conversion.

We use a single routine to handle two kinds of format translation: a packed I6 string into an unpacked text, or a snippet into an unpacked text.

In each case, what we do is simply to print out the value we have, but with the output stream set to memory rather than the screen. That gives us the character by character version, neatly laid out in an array, and all we have to do is to copy it into the text and add a null termination byte.

What complicates things is that the two virtual machines handle printing to memory quite differently, and that the original text has unpredictable length. We are going to try printing it into the array TEXT_TY_Buffers, but what if the text is too big? Disastrously, the Z-machine simply writes on in memory, corrupting all subsequent arrays and almost certainly causing the story file to crash soon after. There is nothing we can do to predict or avoid this, or to repair the damage: this is why the Inform documentation warns users to be wary of using text with large strings in the Z-machine, and advises the use of Glulx instead. Glulx does handle overruns safely, and indeed allows us to dynamically allocate memory as necessary so that we can always avoid overruns entirely.

In either case, though, it's useful to have TEXT_TY_BufferSize, the size of the temporary buffer, large enough that it will never be overrun in ordinary use. This is controllable with the use option "maximum indexed text length".

#ifndef TEXT_TY_BufferSize;
Constant TEXT_TY_BufferSize = 512;
#endif;
Constant TEXT_TY_NoBuffers = 2;

#ifdef TARGET_ZCODE;
Array TEXT_TY_Buffers -> TEXT_TY_BufferSize*TEXT_TY_NoBuffers; ! Where characters are bytes
#ifnot;
Array TEXT_TY_Buffers --> (TEXT_TY_BufferSize+2)*TEXT_TY_NoBuffers; ! Where characters are words
#endif;

Global RawBufferAddress = TEXT_TY_Buffers;
Global RawBufferSize = TEXT_TY_BufferSize;

Global TEXT_TY_CastPrimitiveNesting = 0;

Z Version.

The two versions of this routine, one for each virtual machine, are in all important respects the same, but there are enough fiddly differences that it's clearer to give two definitions, so:

#ifdef TARGET_ZCODE;
[ TEXT_TY_CastPrimitive to_txt from_snippet from_value len news buffer;
    if (to_txt == 0) BlkValueError("no destination for cast");
    SuspendRTP();
    buffer = RawBufferAddress + TEXT_TY_CastPrimitiveNesting*TEXT_TY_BufferSize;
    TEXT_TY_CastPrimitiveNesting++;
    if (TEXT_TY_CastPrimitiveNesting > TEXT_TY_NoBuffers)
        FlexError("ran out with too many simultaneous text conversions");

    @push say__p; @push say__pc;
    ClearParagraphing(6);
    @output_stream 3 buffer;
    if (from_value) {
        if (from_snippet) print (PrintSnippet) from_value;
        else print (PrintI6Text) from_value;
    }
    @output_stream -3;
    @pull say__pc; @pull say__p;
    ResumeRTP();

    len = buffer-->0;
    if (len > RawBufferSize-1) len = RawBufferSize-1;
    buffer->(len+2) = 0;

    TEXT_TY_CastPrimitiveNesting--;
    BlkValueMassCopyFromArray(to_txt, buffer+2, 1, len+1);
];

Glulx Version.

#ifnot; ! TARGET_ZCODE
[ TEXT_TY_CastPrimitive to_txt from_snippet from_value
    len i stream saved_stream news buffer buffer_size memory_to_free results;

    if (to_txt == 0) BlkValueError("no destination for cast");

    buffer_size = (TEXT_TY_BufferSize + 2)*WORDSIZE;

    RawBufferSize = TEXT_TY_BufferSize;
    buffer = RawBufferAddress + TEXT_TY_CastPrimitiveNesting*buffer_size;
    TEXT_TY_CastPrimitiveNesting++;
    if (TEXT_TY_CastPrimitiveNesting > TEXT_TY_NoBuffers) {
        buffer = VM_AllocateMemory(buffer_size); memory_to_free = buffer;
        if (buffer == 0)
            FlexError("ran out with too many simultaneous text conversions");
    }

    if (unicode_gestalt_ok) {
        SuspendRTP();
        .RetryWithLargerBuffer;
        saved_stream = glk_stream_get_current();
        stream = glk_stream_open_memory_uni(buffer, RawBufferSize, filemode_Write, 0);
        glk_stream_set_current(stream);

        @push say__p; @push say__pc;
        ClearParagraphing(7);
        if (from_snippet) print (PrintSnippet) from_value;
        else print (PrintI6Text) from_value;
        @pull say__pc; @pull say__p;

        results = buffer + buffer_size - 2*WORDSIZE;
        glk_stream_close(stream, results);
        if (saved_stream) glk_stream_set_current(saved_stream);
        ResumeRTP();

        len = results-->1;
        if (len > RawBufferSize-1) {
            ! Glulx had to truncate text output because the buffer ran out:
            ! len is the number of characters which it tried to print
            news = RawBufferSize;
            while (news < len) news=news*2;
            i = VM_AllocateMemory(news*WORDSIZE);
            if (i ~= 0) {
                if (memory_to_free) VM_FreeMemory(memory_to_free);
                memory_to_free = i;
                buffer = i;
                RawBufferSize = news;
                buffer_size = (RawBufferSize + 2)*WORDSIZE;
                jump RetryWithLargerBuffer;
            }
            ! Memory allocation refused: all we can do is to truncate the text
            len = RawBufferSize-1;
        }
        buffer-->(len) = 0;

        TEXT_TY_CastPrimitiveNesting--;
        BlkValueMassCopyFromArray(to_txt, buffer, 4, len+1);
    } else {
        RunTimeProblem(RTP_NOGLULXUNICODE);
    }
    if (memory_to_free) VM_FreeMemory(memory_to_free);
];
#endif;
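The sizing arithmetic in the Glulx retry step is easy to lose among the stream handling, so here is a small Python model of just that part. The function names are hypothetical; the doubling rule and the fallback truncation are read directly from the routine above.

```python
def next_buffer_size(current_size, attempted_len):
    """Double the buffer size until it is no smaller than the attempted length."""
    news = current_size
    while news < attempted_len:
        news *= 2
    return news

def truncated_length(buffer_size, attempted_len):
    """Fallback when no larger buffer is available: leave room for the terminator."""
    return min(attempted_len, buffer_size - 1)
```

With the default buffer of 512 characters, an attempt to print 1300 characters leads to a retry with a 2048-character buffer, or to truncation at 511 characters if the allocation is refused.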

Comparison.

This is more or less strcmp, the traditional C library routine for comparing strings, but it does pose a few interesting questions. The answers are:

(a) Two different unexpanded texts with substitutions are never equal, so "[X]" and "[Y]" aren't equal as texts even if X and Y are equal.

(b) Otherwise we test the current value of the text as expanded, so "[X]" and "17" can be equal as texts if X is 17.

[ TEXT_TY_Compare left_txt right_txt rv;
    @push say__comp;
    say__comp = true;
    rv = TEXT_TY_Compare_Inner(left_txt, right_txt);
    @pull say__comp;
    return rv;
];

[ TEXT_TY_Compare_Inner left_txt right_txt
    pos ch1 ch2 capacity_left capacity_right fl fr cl cr cpl cpr;
    if (left_txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) fl = true;
    if (right_txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) fr = true;

    if (fl && fr) {
        if ((left_txt-->1 ofclass String) && (right_txt-->1 ofclass String))
            return left_txt-->1 - right_txt-->1;
        if ((left_txt-->1 ofclass Routine) && (right_txt-->1 ofclass Routine))
            return left_txt-->1 - right_txt-->1;
        cpl = left_txt-->0; cl = TEXT_TY_Temporarily_Transmute(left_txt);
        cpr = right_txt-->0; cr = TEXT_TY_Temporarily_Transmute(right_txt);
    } else if (fl) {
        cpl = left_txt-->0; cl = TEXT_TY_Temporarily_Transmute(left_txt);
    } else if (fr) {
        cpr = right_txt-->0; cr = TEXT_TY_Temporarily_Transmute(right_txt);
    }
    if ((cl) || (cr)) {
        pos = TEXT_TY_Compare(left_txt, right_txt);
        TEXT_TY_Untransmute(left_txt, cl, cpl);
        TEXT_TY_Untransmute(right_txt, cr, cpr);
        return pos;
    }
    capacity_left = BlkValueLBCapacity(left_txt);
    capacity_right = BlkValueLBCapacity(right_txt);
    for (pos=0:(pos<capacity_left) && (pos<capacity_right):pos++) {
        ch1 = BlkValueRead(left_txt, pos);
        ch2 = BlkValueRead(right_txt, pos);
        if (ch1 ~= ch2) return ch1-ch2;
        if (ch1 == 0) return 0;
    }
    if (pos == capacity_left) return -1;
    return 1;
];

[ TEXT_TY_Distinguish left_txt right_txt;
    if (TEXT_TY_Compare(left_txt, right_txt) == 0) rfalse;
    rtrue;
];
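Once both texts are in unpacked form, the comparison reduces to the final loop of the inner routine. A minimal Python model of that loop, with each long block represented as a list whose length is its capacity (the function name is hypothetical):

```python
def compare_unpacked(left, right):
    """strcmp-like comparison of two null-terminated character arrays."""
    pos = 0
    while pos < len(left) and pos < len(right):
        ch1, ch2 = left[pos], right[pos]
        if ch1 != ch2:
            return ch1 - ch2
        if ch1 == 0:
            return 0  # both terminated at the same place: equal
        pos += 1
    if pos == len(left):
        return -1
    return 1
```

Note that spare capacity after the terminator plays no part: [65, 66, 0] and [65, 66, 0, 0, 0] compare as equal.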

Hashing.

This calculates a hash value for the string, using Bernstein's algorithm.

[ TEXT_TY_Hash txt rv len i p cp;
    cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt);
    rv = 0;
    len = BlkValueLBCapacity(txt);
    for (i=0: i<len: i++)
        rv = rv * 33 + BlkValueRead(txt, i);
    TEXT_TY_Untransmute(txt, p, cp);
    return rv;
];
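For readers who want to check values by hand, the multiply-by-33 recurrence can be modelled in Python as below. The masking to a fixed word size is an assumption standing in for the virtual machine's word arithmetic, and it treats the result as unsigned, which the I6 routine does not.

```python
def text_hash(chars, word_bits=32):
    """Bernstein-style hash: rv = rv*33 + ch over every entry supplied."""
    mask = (1 << word_bits) - 1  # assumed wraparound at the VM word size
    rv = 0
    for ch in chars:
        rv = (rv * 33 + ch) & mask
    return rv
```

For the characters of "ABC" this gives (65*33 + 66)*33 + 67 = 73030.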

Printing.

Unicode is not the native character set on Glulx: it came along as a late addition to Glulx's specification. The deal is that we have to explicitly tell the Glk interface layer to perform certain operations in a Unicode way; if we simply perform print (char) ch; then the character ch will be printed in ZSCII rather than Unicode.

[ TEXT_TY_Say txt ch i dsize;
    if (txt==0) rfalse;
    if (txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) return PrintI6Text(txt-->1);
    dsize = BlkValueLBCapacity(txt);
    for (i=0: i<dsize: i++) {
        ch = BlkValueRead(txt, i);
        if (ch == 0) break;
        #ifdef TARGET_ZCODE;
        print (char) ch;
        #ifnot; ! TARGET_ZCODE
        @streamunichar ch;
        #endif;
    }
    if (i == 0) rfalse;
    rtrue;
];

Capitalised printing.

It turns out to be useful to have a variation on this:

[ TEXT_TY_Say_Capitalised txt mod rc;
    mod = BlkValueCreate(TEXT_TY);
    TEXT_TY_SubstitutedForm(mod, txt);
    if (TEXT_TY_CharacterLength(mod) > 0) {
        BlkValueWrite(mod, 0, CharToCase(BlkValueRead(mod, 0), 1));
        TEXT_TY_Say(mod);
        rc = true;
        say__p = 1;
    }
    BlkValueFree(mod);
    return rc;
];

Serialisation.

Here we print a serialised form of a text which can later be used to reconstruct the original text. The printing is apparently to the screen, but in fact always takes place when the output stream is a file.

The format chosen is a letter "S" for string, then a comma-separated list of decimal character codes, ending with the null terminator, and followed by a semicolon: thus S65,66,67,0; is the serialised form of the text "ABC".

[ TEXT_TY_WriteFile txt len pos ch p cp;
    cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt);
    len = BlkValueLBCapacity(txt);
    print "S";
    for (pos=0: pos<=len: pos++) {
        if (pos == len) ch = 0; else ch = BlkValueRead(txt, pos);
        if (ch == 0) {
            print "0;"; break;
        } else {
            print ch, ",";
        }
    }
    TEXT_TY_Untransmute(txt, p, cp);
];
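The format is simple enough to model in a few lines of Python. This sketch returns the serialised form as a string rather than printing it, and takes the long block as a list of character codes; the function name is hypothetical.

```python
def serialise(chars):
    """Return "S", then decimal codes with commas, ending with "0;"."""
    out = ["S"]
    for pos in range(len(chars) + 1):
        # Running off the end of the capacity counts as reaching the terminator.
        ch = 0 if pos == len(chars) else chars[pos]
        if ch == 0:
            out.append("0;")
            break
        out.append(f"{ch},")
    return "".join(out)
```

Applied to the character codes of "ABC" this yields S65,66,67,0; whether or not the list carries its own terminator.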

Unserialisation.

If that's the word: the reverse process, in which we read a stream of characters from a file and reconstruct the text which gave rise to them.

[ TEXT_TY_ReadFile txt auxf ch i v dg pos tsize p;
    TEXT_TY_Transmute(txt);
    tsize = BlkValueLBCapacity(txt);
    while (ch ~= 32 or 9 or 10 or 13 or 0 or -1) {
        ch = FileIO_GetC(auxf);
        if (ch == ',' or ';') {
            if (pos+1 >= tsize) {
                if (BlkValueSetLBCapacity(txt, 2*pos) == false) break;
                tsize = BlkValueLBCapacity(txt);
            }
            BlkValueWrite(txt, pos++, v);
            v = 0;
            if (ch == ';') break;
        } else {
            dg = ch - '0';
            v = v*10 + dg;
        }
    }
    BlkValueWrite(txt, pos, 0);
    return txt;
];
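A Python sketch of the digit-accumulating loop follows. It assumes that the leading "S" has already been consumed by whatever dispatched to this routine (an assumption, since that code is not shown here), and it stops at whitespace slightly earlier than the I6 loop does, which does not change the result.

```python
def unserialise(stream):
    """Rebuild character codes from text such as "65,66,67,0;"."""
    chars, v = [], 0
    for c in stream:
        if c in " \t\n\r":
            break
        if c in ",;":
            chars.append(v)  # a separator completes the current number
            v = 0
            if c == ";":
                break
        else:
            v = v * 10 + (ord(c) - ord("0"))
    chars.append(0)  # the routine always writes a final terminator
    return chars
```

Because the serialised form already ends with a 0 entry, the reconstructed array carries two zeros at the end, which is harmless for a null-terminated text.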

Substitution.

[ TEXT_TY_SubstitutedForm to txt;
    if (txt) {
        BlkValueCopy(to, txt);
        TEXT_TY_Transmute(to);
    }
    return to;
];

[ TEXT_TY_IsSubstituted txt;
    if ((txt) &&
        (txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) &&
        (txt-->1 ofclass Routine)) rfalse;
    rtrue;
];

Perishability.

As noted above, a perishable constant is one which must be expanded before the values it refers to vanish from existence.

[ TEXT_TY_ExpandIfPerishable to from;
    if ((from) && (from-->0 == CONSTANT_PERISHABLE_TEXT_STORAGE))
        return TEXT_TY_SubstitutedForm(to, from);
    return from;
];

Recognition-only-GPR.

An I6 general parsing routine to look at words from the position marker wn in the player's command to see if they match the contents of the text txt, returning either GPR_PREPOSITION or GPR_FAIL according to whether a match could be made. This is used when an object's name is set to include one of its properties, and the property in question is a text: "A flowerpot is a kind of thing. A flowerpot has a text called pattern. Understand the pattern property as describing a flowerpot." When the player types EXAMINE STRIPED FLOWERPOT, and there is a flowerpot in scope, the following routine is called to test whether its pattern property (a text) matches any words at the position STRIPED FLOWERPOT. Assuming a pot does indeed have the pattern "striped", the routine advances wn by 1 and returns GPR_PREPOSITION to indicate a match.

This kind of GPR is called a "recognition-only-GPR", because it only recognises an existing value: it doesn't parse a new one.

[ TEXT_TY_ROGPR txt p cp r;
    if (txt == 0) return GPR_FAIL;
    cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt);
    r = TEXT_TY_ROGPRI(txt);
    TEXT_TY_Untransmute(txt, p, cp);
    return r;
];
[ TEXT_TY_ROGPRI txt
    pos len wa wl wpos bdm ch own;
    bdm = true; own = wn;
    len = BlkValueLBCapacity(txt);
    for (pos=0: pos<=len: pos++) {
        if (pos == len) ch = 0; else ch = BlkValueRead(txt, pos);
        if (ch == 32 or 9 or 10 or 0) {
            if (bdm) continue;
            bdm = true;
            if (wpos ~= wl) return GPR_FAIL;
            if (ch == 0) break;
        } else {
            if (bdm) {
                bdm = false;
                if (NextWordStopped() == -1) return GPR_FAIL;
                wa = WordAddress(wn-1);
                wl = WordLength(wn-1);
                wpos = 0;
            }
            if (wa->wpos ~= ch or TEXT_TY_RevCase(ch)) return GPR_FAIL;
            wpos++;
        }
    }
    if (wn == own) return GPR_FAIL; ! Progress must be made to avoid looping
    return GPR_PREPOSITION;
];
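The matching logic can be approximated in Python as follows. This is a simplified sketch, not a transcription: the command is a list of words indexed from 0 (the I6 parser counts from 1), the result codes are stand-in strings, Python's split() treats a few more characters as whitespace than the routine does, and the word position is reported only on success.

```python
GPR_FAIL, GPR_PREPOSITION = "GPR_FAIL", "GPR_PREPOSITION"  # stand-in values

def match_text(text, command_words, wn):
    """Match each word of text against the command, starting at index wn."""
    own = wn
    for word in text.split():
        if wn >= len(command_words):
            return GPR_FAIL, None  # ran out of command words
        if command_words[wn].lower() != word.lower():
            return GPR_FAIL, None  # characters differ, ignoring case
        wn += 1
    if wn == own:
        return GPR_FAIL, None      # progress must be made
    return GPR_PREPOSITION, wn
```

With the flowerpot example, match_text("striped", ["examine", "striped", "flowerpot"], 1) succeeds and moves the position on by one word.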

Blobs.

That completes the compulsory services required for this KOV to function: from here on, the remaining routines provide definitions of text-related phrases in the Standard Rules.

What are the basic operations of text-handling? Clearly we want to be able to search, and replace, but that is left for the segment RegExp.i6t to handle. More basically we would like to be able to read and write characters from the text. But texts in I7 tend to be of natural language, rather than containing arbitrary material – that's indeed why we call them texts rather than strings. This means they are likely to be punctuated sequences of words, divided up perhaps into sentences and even paragraphs.

So we provide facilities which regard a text as being an array of "blobs", where a "blob" is a unit of text. The user can choose whether to see it as an array of characters, or words (of three different sorts: see the Inform documentation for details), or paragraphs, or lines.

Constant CHR_BLOB = 1; ! Construe as an array of characters
Constant WORD_BLOB = 2; ! Of words
Constant PWORD_BLOB = 3; ! Of punctuated words
Constant UWORD_BLOB = 4; ! Of unpunctuated words
Constant PARA_BLOB = 5; ! Of paragraphs
Constant LINE_BLOB = 6; ! Of lines

Constant REGEXP_BLOB = 7; ! Not a blob type as such, but needed as a distinct value

Blob Access.

The following routine runs a small finite-state-machine to count the number of blobs in a text, using any of the above blob types (except REGEXP_BLOB, which is used for other purposes). If the optional arguments ctxt and wanted are supplied, it also copies the text of blob number wanted (counting upwards from 1 at the start of the text) into the text ctxt. If the further optional argument rtxt is supplied, then ctxt is instead written with the original text txt as it would read if the blob in question were replaced with the text in rtxt.

Constant WS_BRM = 1;
Constant SKIPPED_BRM = 2;
Constant ACCEPTED_BRM = 3;
Constant ACCEPTEDP_BRM = 4;
Constant ACCEPTEDN_BRM = 5;
Constant ACCEPTEDPN_BRM = 6;

[ TEXT_TY_BlobAccess txt blobtype ctxt wanted rtxt
    p1 p2 cp1 cp2 r;
    if (txt==0) return 0;
    if (blobtype == CHR_BLOB) return TEXT_TY_CharacterLength(txt);
    cp1 = txt-->0; p1 = TEXT_TY_Temporarily_Transmute(txt);
    cp2 = rtxt-->0; p2 = TEXT_TY_Temporarily_Transmute(rtxt);
    TEXT_TY_Transmute(ctxt);
    r = TEXT_TY_BlobAccessI(txt, blobtype, ctxt, wanted, rtxt);
    TEXT_TY_Untransmute(txt, p1, cp1);
    TEXT_TY_Untransmute(rtxt, p2, cp2);
    return r;
];
[ TEXT_TY_BlobAccessI txt blobtype ctxt wanted rtxt
    brm oldbrm ch i dsize csize blobcount gp cl j;
    dsize = BlkValueLBCapacity(txt);
    if (ctxt) csize = BlkValueLBCapacity(ctxt);
    else if (rtxt) "*** rtxt without ctxt ***";
    brm = WS_BRM;
    for (i=0:i<dsize:i++) {
        ch = BlkValueRead(txt, i);
        if (ch == 0) break;
        oldbrm = brm;
        if (ch == 10 or 13 or 32 or 9) {
            if (oldbrm ~= WS_BRM) {
                gp = 0;
                for (j=i:j<dsize:j++) {
                    ch = BlkValueRead(txt, j);
                    if (ch == 0) { brm = WS_BRM; break; }
                    if (ch == 10 or 13) { gp++; continue; }
                    if (ch ~= 32 or 9) break;
                }
                ch = BlkValueRead(txt, i);
                if (j == dsize) brm = WS_BRM;
                switch (blobtype) {
                    PARA_BLOB: if (gp >= 2) brm = WS_BRM;
                    LINE_BLOB: if (gp >= 1) brm = WS_BRM;
                    default: brm = WS_BRM;
                }
            }
        } else {
            gp = false;
            if ((blobtype == WORD_BLOB or PWORD_BLOB or UWORD_BLOB) &&
                (ch == '.' or ',' or '!' or '?'
                        or '-' or '/' or '"' or ':' or ';'
                        or '(' or ')' or '[' or ']' or '{' or '}'))
                gp = true;
            switch (oldbrm) {
                WS_BRM:
                    brm = ACCEPTED_BRM;
                    if (blobtype == WORD_BLOB) {
                        if (gp) brm = SKIPPED_BRM;
                    }
                    if (blobtype == PWORD_BLOB) {
                        if (gp) brm = ACCEPTEDP_BRM;
                    }
                SKIPPED_BRM:
                    if (blobtype == WORD_BLOB) {
                        if (gp == false) brm = ACCEPTED_BRM;
                    }
                ACCEPTED_BRM:
                    if (blobtype == WORD_BLOB) {
                        if (gp) brm = SKIPPED_BRM;
                    }
                    if (blobtype == PWORD_BLOB) {
                        if (gp) brm = ACCEPTEDP_BRM;
                    }
                ACCEPTEDP_BRM:
                    if (blobtype == PWORD_BLOB) {
                        if (gp == false) brm = ACCEPTED_BRM;
                        else {
                            if ((ch == BlkValueRead(txt, i-1)) &&
                                (ch == '-' or '.')) blobcount--;
                            blobcount++;
                        }
                    }
                ACCEPTEDN_BRM:
                    if (blobtype == WORD_BLOB) {
                        if (gp) brm = SKIPPED_BRM;
                    }
                    if (blobtype == PWORD_BLOB) {
                        if (gp) brm = ACCEPTEDP_BRM;
                    }
                ACCEPTEDPN_BRM:
                    if (blobtype == PWORD_BLOB) {
                        if (gp == false) brm = ACCEPTED_BRM;
                        else {
                            if ((ch == BlkValueRead(txt, i-1)) &&
                                (ch == '-' or '.')) blobcount--;
                            blobcount++;
                        }
                    }
            }
        }
        if (brm == ACCEPTED_BRM or ACCEPTEDP_BRM) {
            if (oldbrm ~= brm) blobcount++;
            if ((ctxt) && (blobcount == wanted)) {
                if (rtxt) {
                    BlkValueWrite(ctxt, cl, 0);
                    TEXT_TY_Concatenate(ctxt, rtxt, CHR_BLOB);
                    csize = BlkValueLBCapacity(ctxt);
                    cl = TEXT_TY_CharacterLength(ctxt);
                    if (brm == ACCEPTED_BRM) brm = ACCEPTEDN_BRM;
                    if (brm == ACCEPTEDP_BRM) brm = ACCEPTEDPN_BRM;
                } else {
                    if (cl+1 >= csize) {
                        if (BlkValueSetLBCapacity(ctxt, 2*cl) == false) break;
                        csize = BlkValueLBCapacity(ctxt);
                    }
                    BlkValueWrite(ctxt, cl++, ch);
                }
            } else {
                if (rtxt) {
                    if (cl+1 >= csize) {
                        if (BlkValueSetLBCapacity(ctxt, 2*cl) == false) break;
                        csize = BlkValueLBCapacity(ctxt);
                    }
                    BlkValueWrite(ctxt, cl++, ch);
                }
            }
        } else {
            if ((rtxt) && (brm ~= ACCEPTEDN_BRM or ACCEPTEDPN_BRM)) {
                if (cl+1 >= csize) {
                    if (BlkValueSetLBCapacity(ctxt, 2*cl) == false) break;
                    csize = BlkValueLBCapacity(ctxt);
                }
                BlkValueWrite(ctxt, cl++, ch);
            }
        }
    }
    if (ctxt) BlkValueWrite(ctxt, cl++, 0);
    return blobcount;
];
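To see the shape of the state machine without the copying and replacing, here is a Python reduction to the simplest counting case. As far as the code above can be read, for UWORD_BLOB only two states are ever reached (whitespace and accepted), so counting amounts to counting whitespace-separated runs; that reading, and the function name, are assumptions of this sketch.

```python
WS_BRM, ACCEPTED_BRM = 1, 3  # the two states reachable in this reduced case

def count_unpunctuated_words(chars):
    """Count blobs in a null-terminated character array, UWORD_BLOB style."""
    brm, blobcount = WS_BRM, 0
    for ch in chars:
        if ch == 0:
            break
        oldbrm = brm
        if ch in (10, 13, 32, 9):
            brm = WS_BRM
        else:
            brm = ACCEPTED_BRM
        # A new blob starts whenever we enter the accepted state.
        if brm == ACCEPTED_BRM and oldbrm != brm:
            blobcount += 1
    return blobcount
```

Punctuation is not a separator here, so "one, two  three!" counts as three unpunctuated words.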

Get Blob.

The front end which uses the above routine to read a blob. (Note that, for efficiency's sake, we read characters more directly.)

[ TEXT_TY_GetBlob ctxt txt wanted blobtype;
    if (txt==0) return;
    if (blobtype == CHR_BLOB) return TEXT_TY_GetCharacter(ctxt, txt, wanted);
    TEXT_TY_BlobAccess(txt, blobtype, ctxt, wanted);
    return ctxt;
];

Replace Blob.

The front end which uses the above routine to replace a blob. (Once again, characters are handled directly to avoid incurring all that overhead.)

[ TEXT_TY_ReplaceBlob blobtype txt wanted rtxt ctxt ilen rlen i p cp;
    TEXT_TY_Transmute(txt);
    cp = rtxt-->0; p = TEXT_TY_Temporarily_Transmute(rtxt);
    if (blobtype == CHR_BLOB) {
        ilen = TEXT_TY_CharacterLength(txt);
        rlen = TEXT_TY_CharacterLength(rtxt);
        wanted--;
        if ((wanted >= 0) && (wanted<ilen)) {
            if (rlen == 1) {
                BlkValueWrite(txt, wanted, BlkValueRead(rtxt, 0));
            } else {
                ctxt = BlkValueCreate(TEXT_TY);
                TEXT_TY_Transmute(ctxt);
                if (BlkValueSetLBCapacity(ctxt, ilen+rlen+1)) {
                    for (i=0:i<wanted:i++)
                        BlkValueWrite(ctxt, i, BlkValueRead(txt, i));
                    for (i=0:i<rlen:i++)
                        BlkValueWrite(ctxt, wanted+i, BlkValueRead(rtxt, i));
                    for (i=wanted+1:i<ilen:i++)
                        BlkValueWrite(ctxt, rlen+i-1, BlkValueRead(txt, i));
                    BlkValueWrite(ctxt, rlen+ilen, 0);
                    BlkValueCopy(txt, ctxt);
                }
                BlkValueFree(ctxt);
            }
        }
    } else {
        ctxt = BlkValueCreate(TEXT_TY);
        TEXT_TY_BlobAccess(txt, blobtype, ctxt, wanted, rtxt);
        BlkValueCopy(txt, ctxt);
        BlkValueFree(ctxt);
    }
    TEXT_TY_Untransmute(rtxt, p, cp);
];

Replace Text.

This is the general routine which searches for any instance of ftxt, as a blob, in txt, and replaces it with the text rtxt. It works on any of the above blob-types, but two cases are special: first, if the blob-type is CHR_BLOB, then it can do more than search and replace for any instance of a single character: it can search and replace any instance of a substring, so that ftxt is not required to be only a single character. Second, if the blob-type is the special value REGEXP_BLOB then ftxt is interpreted as a regular expression rather than something literal to find: see RegExp.i6t for what happens next.

[ TEXT_TY_ReplaceText blobtype txt ftxt rtxt
    r p1 p2 cp1 cp2;
    TEXT_TY_Transmute(txt);
    cp1 = ftxt-->0; p1 = TEXT_TY_Temporarily_Transmute(ftxt);
    cp2 = rtxt-->0; p2 = TEXT_TY_Temporarily_Transmute(rtxt);
    r = TEXT_TY_ReplaceTextI(blobtype, txt, ftxt, rtxt);
    TEXT_TY_Untransmute(ftxt, p1, cp1);
    TEXT_TY_Untransmute(rtxt, p2, cp2);
    return r;
];

[ TEXT_TY_ReplaceTextI blobtype txt ftxt rtxt
    ctxt csize ilen flen i cl mpos ch chm whitespace punctuation;
    if (blobtype == REGEXP_BLOB or CHR_BLOB)
        return TEXT_TY_Replace_RE(blobtype, txt, ftxt, rtxt);

    ilen = TEXT_TY_CharacterLength(txt);
    flen = TEXT_TY_CharacterLength(ftxt);
    ctxt = BlkValueCreate(TEXT_TY);
    TEXT_TY_Transmute(ctxt);
    csize = BlkValueLBCapacity(ctxt);
    mpos = 0;

    whitespace = true; punctuation = false;
    for (i=0:i<=ilen:i++) {
        ch = BlkValueRead(txt, i);
        .MoreMatching;
        chm = BlkValueRead(ftxt, mpos++);
        if (mpos == 1) {
            switch (blobtype) {
                WORD_BLOB:
                    if ((whitespace == false) && (punctuation == false)) chm = -1;
            }
        }
        whitespace = false;
        if (ch == 10 or 13 or 32 or 9) whitespace = true;
        punctuation = false;
        if (ch == '.' or ',' or '!' or '?'
            or '-' or '/' or '"' or ':' or ';'
            or '(' or ')' or '[' or ']' or '{' or '}') {
            if (blobtype == WORD_BLOB) chm = -1;
            punctuation = true;
        }
        if (ch == chm) {
            if (mpos == flen) {
                if (i == ilen) chm = 0;
                else chm = BlkValueRead(txt, i+1);
                if ((blobtype == CHR_BLOB) ||
                    (chm == 0 or 10 or 13 or 32 or 9) ||
                    (chm == '.' or ',' or '!' or '?'
                        or '-' or '/' or '"' or ':' or ';'
                        or '(' or ')' or '[' or ']' or '{' or '}')) {
                    mpos = 0;
                    cl = cl - (flen-1);
                    BlkValueWrite(ctxt, cl, 0);
                    TEXT_TY_Concatenate(ctxt, rtxt, CHR_BLOB);
                    csize = BlkValueLBCapacity(ctxt);
                    cl = TEXT_TY_CharacterLength(ctxt);
                    continue;
                }
            }
        } else {
            mpos = 0;
        }
        if (cl+1 >= csize) {
            if (BlkValueSetLBCapacity(ctxt, 2*cl) == false) break;
            csize = BlkValueLBCapacity(ctxt);
        }
        BlkValueWrite(ctxt, cl++, ch);
    }
    BlkValueCopy(txt, ctxt);
    BlkValueFree(ctxt);
];
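
The whole-word behaviour of the matching loop can be approximated in a few lines of Python. The sketch below is not a port of TEXT_TY_ReplaceTextI: it checks both word boundaries directly instead of tracking whitespace and punctuation flags character by character. The name replace_words is hypothetical, and the exact punctuation set (in particular the inclusion of '!' and '"') is an assumption of the sketch.

```python
WHITESPACE = set("\n\r \t")
PUNCTUATION = set(".,!?-/\":;()[]{}")   # assumed set, see note above


def is_boundary(ch):
    """True if ch separates words in this model."""
    return ch in WHITESPACE or ch in PUNCTUATION


def replace_words(txt, ftxt, rtxt):
    """Replace each occurrence of ftxt in txt that stands alone as a word."""
    if not ftxt or any(is_boundary(c) for c in ftxt):
        return txt                      # boundary characters never match
    out, i, n, m = [], 0, len(txt), len(ftxt)
    while i < n:
        starts_word = i == 0 or is_boundary(txt[i - 1])
        ends_word = i + m == n or (i + m < n and is_boundary(txt[i + m]))
        if starts_word and ends_word and txt.startswith(ftxt, i):
            out.append(rtxt)            # whole-word match: substitute
            i += m
        else:
            out.append(txt[i])          # otherwise copy one character
            i += 1
    return "".join(out)


print(replace_words("the cat sat, concat cat.", "cat", "dog"))
```

Note that "concat" is left alone, because the match there does not begin at a word boundary.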

Character Length.

When accessing at the character-by-character level, things are much easier and we needn't go through any finite state machine palaver.

[ TEXT_TY_CharacterLength txt ch i dsize p cp r;
    if (txt==0) return 0;
    cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt);
    dsize = BlkValueLBCapacity(txt); r = dsize;
    for (i=0:i<dsize:i++) {
        ch = BlkValueRead(txt, i);
        if (ch == 0) { r = i; break; }
    }
    TEXT_TY_Untransmute(txt, p, cp);
    return r;
];

[ TEXT_TY_Empty txt;
    if (txt==0) rtrue;
    if (txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) {
        if (txt-->1 == EMPTY_TEXT_PACKED) rtrue;
        rfalse;
    }
    if (TEXT_TY_CharacterLength(txt) == 0) rtrue;
    rfalse;
];
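
As an illustration of the scan in TEXT_TY_CharacterLength, here is a Python model in which a long block is represented as a list of character codes. The list representation and the function name are assumptions of the sketch, and transmutation is ignored.

```python
def character_length(block):
    """Return the index of the first 0 entry, or the block's capacity
    if no terminator is found (mirroring r = dsize in the routine)."""
    for i, ch in enumerate(block):
        if ch == 0:
            return i
    return len(block)


# "hi" stored in a block with capacity 4: two characters, then zeros.
print(character_length([ord("h"), ord("i"), 0, 0]))
```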

Get Character.

Users of this routine number the characters in a text upwards from 1, which is why we subtract 1 when reading the array in the block-value, which counts from 0. An index outside the text produces the empty text.

[ TEXT_TY_GetCharacter ctxt txt i ch p cp;
    if (txt==0) return 0;
    cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt);
    TEXT_TY_Transmute(ctxt);
    if ((i<=0) || (i>TEXT_TY_CharacterLength(txt))) ch = 0;
    else ch = BlkValueRead(txt, i-1);
    BlkValueWrite(ctxt, 0, ch);
    BlkValueWrite(ctxt, 1, 0);
    TEXT_TY_Untransmute(txt, p, cp);
    return ctxt;
];
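
The indexing convention of TEXT_TY_GetCharacter can be modelled in Python as follows. The function name is hypothetical, and a Python string stands in for the block-value.

```python
def get_character(txt, i):
    """Return the i-th character of txt, counting from 1, as a text of
    length 1; an out-of-range index gives the empty text."""
    if i <= 0 or i > len(txt):
        return ""
    return txt[i - 1]      # callers count from 1, the array from 0


print(get_character("horn", 1))
```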

Casing.

In many programming languages, characters are a distinct data type from strings, but not in I7. To I7, a character is simply a text which happens to have length 1 – this has its inefficiencies, but is conceptually easy for the user.

TEXT_TY_CharactersOfCase(txt, case) determines whether all the characters in txt are letters of the given casing: 0 for lower case, 1 for upper case. In the case of ZSCII, this is done correctly handling all of the European accented letters; in the case of Unicode, it follows the Unicode standard.

Note that there is no requirement for txt to be only a single character long.

[ TEXT_TY_CharactersOfCase txt case i ch len p cp r;
    if (txt==0) return 0;
    cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt);
    len = TEXT_TY_CharacterLength(txt);
    r = true;
    for (i=0:i<len:i++) {
        ch = BlkValueRead(txt, i);
        if ((ch) && (CharIsOfCase(ch, case) == false)) { r = false; break; }
    }
    TEXT_TY_Untransmute(txt, p, cp);
    return r;
];
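
A Python model of the same test is sketched below. It uses Python's own str.islower and str.isupper in place of CharIsOfCase, so its idea of what counts as a letter comes from Python's Unicode tables and may differ in detail from Inform's; the function name is hypothetical.

```python
def characters_of_case(txt, case):
    """True if every character of txt is a letter of the given case
    (0 for lower, 1 for upper). The empty text passes trivially."""
    for ch in txt:
        matches = ch.isupper() if case == 1 else ch.islower()
        if not matches:
            return False
    return True


print(characters_of_case("HELLO", 1))
```

As in the I6 routine, a single character of the wrong case, or one which is not a letter at all, is enough to make the result false.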

Change Case.

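We set ctxt to the text in txt, except that all the letters are converted to the case given (0 for lower, 1 for upper). The definition of what is a "letter", what case it has and what the other-case form is are as specified in the ZSCII and Unicode standards. The code also accepts two further values: with 2, the letter following each whitespace or punctuation boundary is made upper case and every other letter lower case (title case); with 3, only the first letter of the text and the first letter after each sentence-ending punctuation mark are made upper case (sentence case).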

[ TEXT_TY_CharactersToCase ctxt txt case i ch len bnd pk cp;
    if (txt==0) return 0;
    cp = txt-->0; pk = TEXT_TY_Temporarily_Transmute(txt);
    TEXT_TY_Transmute(ctxt);
    len = TEXT_TY_CharacterLength(txt);
    if (BlkValueSetLBCapacity(ctxt, len+1)) {
        bnd = 1;
        for (i=0:i<len:i++) {
            ch = BlkValueRead(txt, i);
            if (case < 2) {
                BlkValueWrite(ctxt, i, CharToCase(ch, case));
            } else {
                BlkValueWrite(ctxt, i, CharToCase(ch, bnd));
                if (case == 2) {
                    bnd = 0;
                    if (ch == 0 or 10 or 13 or 32 or 9
                        or '.' or ',' or '!' or '?'
                        or '-' or '/' or '"' or ':' or ';'
                        or '(' or ')' or '[' or ']' or '{' or '}') bnd = 1;
                }
                if (case == 3) {
                    if (ch ~= 0 or 10 or 13 or 32 or 9) {
                        if (bnd == 1) bnd = 0;
                        else {
                            if (ch == '.' or '!' or '?') bnd = 1;
                        }
                    }
                }
            }
        }
        BlkValueWrite(ctxt, len, 0);
    }
    TEXT_TY_Untransmute(txt, pk, cp);
    return ctxt;
];
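
The boundary flag bnd in TEXT_TY_CharactersToCase is easiest to follow by running it. The Python sketch below mirrors the loop on an ordinary string, with str.upper and str.lower standing in for CharToCase. The function names are hypothetical, and the punctuation sets (including '!' and '"') are assumptions of the sketch.

```python
WHITESPACE = set("\n\r \t")
PUNCTUATION = set(".,!?-/\":;()[]{}")   # assumed set
SENTENCE_END = set(".!?")               # assumed set


def to_case(ch, upper):
    """Stand-in for CharToCase: 1 (or True) for upper, 0 for lower."""
    return ch.upper() if upper else ch.lower()


def characters_to_case(txt, case):
    """case: 0 lower, 1 upper, 2 title case, 3 sentence case."""
    out = []
    bnd = 1                      # 1 means "the next letter is capitalised"
    for ch in txt:
        if case < 2:
            out.append(to_case(ch, case))
            continue
        out.append(to_case(ch, bnd))
        if case == 2:
            # Title case: capitalise after any whitespace or punctuation.
            bnd = 1 if (ch in WHITESPACE or ch in PUNCTUATION) else 0
        elif case == 3 and ch not in WHITESPACE:
            # Sentence case: whitespace never changes the flag.
            if bnd == 1:
                bnd = 0
            elif ch in SENTENCE_END:
                bnd = 1
    return "".join(out)


print(characters_to_case("the quick-brown fox", 2))
print(characters_to_case("hello WORLD. next one", 3))
```

In both of the mixed modes every character which is not capitalised is forced to lower case, which is why "WORLD" comes out as "world" in the second call.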

Concatenation.

To concatenate two texts is to place one after the other: thus "green" concatenated with "horn" makes "greenhorn". In this routine, from_txt would be "horn", and is added at the end of to_txt, which is returned in its expanded state.

When the blob type is REGEXP_BLOB, the routine is used not for simple concatenation but to handle the concatenations occurring when a regular expression search-and-replace is going on: see RegExp.i6t.

[ TEXT_TY_Concatenate to_txt from_txt blobtype ref_txt
    p cp r;
    if (to_txt==0) rfalse;
    if (from_txt==0) return to_txt;
    TEXT_TY_Transmute(to_txt);
    cp = from_txt-->0; p = TEXT_TY_Temporarily_Transmute(from_txt);
    r = TEXT_TY_ConcatenateI(to_txt, from_txt, blobtype, ref_txt);
    TEXT_TY_Untransmute(from_txt, p, cp);
    return r;
];

[ TEXT_TY_ConcatenateI to_txt from_txt blobtype ref_txt
    pos len ch i tosize x y case;
    switch(blobtype) {
        CHR_BLOB, 0:
            pos = TEXT_TY_CharacterLength(to_txt);
            len = TEXT_TY_CharacterLength(from_txt);
            if (BlkValueSetLBCapacity(to_txt, pos+len+1) == false) return to_txt;
            for (i=0:i<len:i++) {
                ch = BlkValueRead(from_txt, i);
                BlkValueWrite(to_txt, i+pos, ch);
            }
            BlkValueWrite(to_txt, len+pos, 0);
            return to_txt;
        REGEXP_BLOB:
            return TEXT_TY_RE_Concatenate(to_txt, from_txt, blobtype, ref_txt);
    }
    print "*** TEXT_TY_Concatenate used on impossible blob type ***^";
    rfalse;
];
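
The CHR_BLOB case of TEXT_TY_ConcatenateI can be modelled in Python with null-terminated lists of character codes standing in for long blocks. All the names below are hypothetical, and growing the list is only a stand-in for BlkValueSetLBCapacity, which in the real routine can fail.

```python
def encode(s):
    """Build a null-terminated block from a Python string."""
    return [ord(c) for c in s] + [0]


def character_length(block):
    """Index of the first 0 entry, or the capacity if there is none."""
    for i, ch in enumerate(block):
        if ch == 0:
            return i
    return len(block)


def decode(block):
    """Read a block back into a Python string."""
    return "".join(chr(c) for c in block[:character_length(block)])


def concatenate(to_block, from_block):
    """Append from_block's characters to to_block, in place."""
    pos = character_length(to_block)
    length = character_length(from_block)
    needed = pos + length + 1              # room for the terminator too
    if len(to_block) < needed:
        to_block.extend([0] * (needed - len(to_block)))
    for i in range(length):
        to_block[pos + i] = from_block[i]  # copy after the existing text
    to_block[pos + length] = 0             # write the new terminator
    return to_block


print(decode(concatenate(encode("green"), encode("horn"))))
```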

Setting the Player's Command.

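In effect, the text typed most recently by the player is a sort of text already, though it isn't in text format, and doesn't live on the heap. This routine goes the other way: it copies at most 118 characters of from_txt into the command buffer, converted to lower case, pads the remainder with spaces, and tokenises the result again.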

[ SetPlayersCommand from_txt i len at p cp;
    cp = from_txt-->0; p = TEXT_TY_Temporarily_Transmute(from_txt);
    len = TEXT_TY_CharacterLength(from_txt);
    if (len > 118) len = 118;
    #ifdef TARGET_ZCODE;
    buffer->1 = len; at = 2;
    #ifnot;
    buffer-->0 = len; at = 4;
    #endif;
    for (i=0:i<len:i++) buffer->(i+at) = CharToCase(BlkValueRead(from_txt, i), 0);
    for (:at+i<120:i++) buffer->(at+i) = ' ';
    VM_Tokenise(buffer, parse);
    players_command = 100 + WordCount(); ! The snippet variable "player's command"
    TEXT_TY_Untransmute(from_txt, p, cp);
];
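
The buffer layout written by SetPlayersCommand can be sketched in Python. This model makes several assumptions which are not in the source: the input is plain ASCII, the buffer is modelled as exactly 120 bytes, and the four-byte length word used when not compiling for Z-code is stored big-endian. The function name is hypothetical, and tokenising is left out.

```python
def fill_command_buffer(text, zcode=True):
    """Return a 120-byte model of the command buffer after the copy."""
    data = text[:118].lower().encode("ascii", "replace")  # cap at 118
    length = len(data)
    buf = bytearray(120)
    if zcode:
        buf[1] = length                  # buffer->1 = len; text from byte 2
        at = 2
    else:
        buf[0:4] = length.to_bytes(4, "big")  # buffer-->0 = len
        at = 4
    buf[at:at + length] = data           # lower-cased characters
    for k in range(at + length, 120):    # pad the remainder with spaces
        buf[k] = ord(" ")
    return buf


buf = fill_command_buffer("Take LAMP")
print(buf[1], bytes(buf[2:11]))
```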