Text-morphing Mode

 

Random text generation as specified by TextDNA files is not the only function of your JanusNode robot. It also has an entirely orthogonal set of tools for text-morphing: that is, for altering pre-existent texts (including but not limited to texts produced by your JanusNode) in random but controlled ways. To access these functions, set the Mode Switch to 'Morph'. The 'TextDNA:' pop-up menu will disappear- because text-morphing does not use the TextDNA files- and the contents of the 'Subject:' pop-up menu will change to reflect the many currently-available text-morphing tools.


There are three main types of text-morphing tools: a variety of methods of Markov chaining, which allow you to build statistical models of texts; a set of mappings, which allow you to make systematic substitutions in a target text; and a variety of random methods, which simply mix up the target text in random ways. There is also a way of adding new functions of any kind. We will first consider the Markov chaining methods.

1) Markov Chaining

The idea behind Markov chaining is very simple: for every element of a text, compute the likelihood that any element is followed by any other element, then reconstruct the text in a way which reflects the real probabilities, by stepping through the probability table. For example, in the string 'ababca' the likelihood that 'b' will follow 'a' is 100% while the likelihood that 'a' will follow 'b' is just 50%, since 'b' is followed exactly once by 'a' and is also followed once by the letter 'c'. One probabilistic reconstruction of this string might be 'bcabcaba', since this string has (roughly) the same statistical structure as the original one.
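The probability table for the 'ababca' example can be sketched in a few lines. This is an illustration in Python, not JanusNode's own Hypertalk code, and the function name 'transition_table' is made up for the example.

```python
from collections import Counter, defaultdict


def transition_table(elements):
    """Map each element to the probability of each element that follows it."""
    counts = defaultdict(Counter)
    # Count every adjacent (current, following) pair in the sequence.
    for current, following in zip(elements, elements[1:]):
        counts[current][following] += 1
    table = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        table[current] = {nxt: n / total for nxt, n in followers.items()}
    return table


table = transition_table("ababca")
print(table["a"])  # {'b': 1.0}
print(table["b"])  # {'a': 0.5, 'c': 0.5}
```

The same function works on a list of words instead of a string of letters, since it only needs an ordered sequence.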


Any ordered sequence of elements can be Markov chained. Your JanusNode will Markov chain any text, including its own output, although, for reasons which may be clear upon reflection, simply chaining JanusNode's output is usually not very interesting- or, at least, no more interesting (but much slower) than generating it from scratch. It is more interesting to chain 'real' texts, which you can paste into JanusNode, or read in from text files. You can also use probability tables that are stored in the 'MarkovTables' folder inside the 'JanusNode Resources' folder. JanusNode ships with a multitude of tables you can play with. You can easily make (and mix) your own, as described below.


There are, alas, a few technical limitations in Markov chaining with your JanusNode. One is that it is quite slow, as JanusNode needs to take two passes through the text to build the probability table, and JanusNode is not fast. (However, once the probability table is built and stored in the 'MarkovTables' folder, your JanusNode can use it without further computational effort.) The other limitation is on the size of the input texts: JanusNode is limited internally to tables which are less than 32K in size, although you can spread single texts across several chunks of that size. The Markov table of a text is sometimes larger than the text itself. When it runs out of internal storage space, JanusNode simply cuts off the text. This is transparent to the user, and you don't have to worry about it, unless it is extremely important for you to be sure that every word in your input text was Markov-chained.


Your JanusNode is able to provide a variety of Markov-chaining services. All are available in the same way, by choosing them from the subject/method menu and then clicking on the Janus icon.


'Make Markov file' will create a new Markov chain table (as a text file) from the text in the JanusNode window, and store it on disk for you. If you want to be able to use it, you must store it in the 'MarkovTables' folder inside the 'JanusNode Resources' folder. JanusNode uses the same tables for both single and pairwise Markov chaining, and it only looks in this folder for them. Although the tables are editable text files, it is a bad idea to alter them in any way unless you are quite sure you understand what you are doing, as it is very easy to introduce errors into these tables which will render them useless. For example, you can't just add or delete words as you like, as the table is full of dependencies.


'Write loosely' allows you to use one or more of the pre-calculated tables stored in the 'MarkovTables' folder, using single-word chaining. This means that any word pair is guaranteed to contain words that actually appeared together in that order in the original text. You can select as many tables as you like from the folder (use the command key to make a discontiguous selection). Your JanusNode will chain and randomly switch between tables as it produces a text. The resultant output will resemble all of the input files (i.e. the files from which the tables were made) to some degree- it is something very much like a statistical average of those files. The 'Write loosely' function is your best bet for generating original output, since the looseness of the algorithm makes it relatively unlikely that any of the lines in the output appeared exactly in that form in the input file.


'Write tightly' allows you to use one or more of the pre-calculated tables stored in the 'MarkovTables' folder using pair-wise chaining. This means that any word _triplet_ (not word pair, as above) is guaranteed to contain words that actually appeared together in that order in the original text. You can again select as many tables as you like from the folder (again, use the command key to make a discontiguous selection), in order to mingle different tables. The JanusNode will chain and randomly switch between tables as it produces a text. However, because of the nature of 'tight' Markov chaining (which requires a match on word pairs rather than on single words) JanusNode may have trouble finding a file to switch to. As a result, it may stay with the same file for a relatively long time. You will see it trying to switch and switching back, if you look in the information window that appears when it is working.


When chaining tightly, JanusNode may often end up reproducing fairly long sections of the input text verbatim. This is not a bug- it is the nature of pair-wise chaining.
So BEWARE: if your JanusNode produces something brilliant when it is Markov chaining by pairs, it may not be Janus speaking to you. It might just be straight plagiarism. What you find so brilliant may be a statistically-reconstructed but verbatim quote from the input files which were used to make the Markov tables. This is of course so even if you never saw the input file from which the table was constructed: the table holds all the information necessary to reconstruct the file with some degree of accuracy. You alone are responsible for distinguishing randomly-brilliant statements by Janus- to which you have been gifted the rights- from straight plagiarized brilliance, which belongs to the person who owns the copyright on the source document.


The current version of JanusNode can also Markov-chain texts in the input window on the fly. This is not recommended: it is better to save the table as an intermediate step. There is no advantage to doing it on the fly, except that you can skip the step of saving the Markov table to disk and then choosing it. The disadvantage is that if you do not save the table you can't re-use it unless you re-generate it from the original text.


'Chain loosely' will simply Markov chain any text in the JanusNode window, without saving the probability table to disk. The probability table will be built based on the odds of any single word following any other single word. This leads to a loose model of the original text: i.e. one which resembles that text quite distantly.


'Chain tightly' does the same thing, but it builds the probability table using pairs instead of single words. This gives more structure to the probability table: the output text will bear a closer resemblance to the input text, and may include sections of several words in length which are exact duplicates of sections of the input text.
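The difference between loose and tight chaining can be sketched as a single routine with an 'order' setting: 1 for single words, 2 for word pairs. This is a Python illustration under that assumption, not JanusNode's Hypertalk code; the names 'build_table' and 'chain' are invented for the example, and the sketch does not model the 32K table limit or switching between several tables.

```python
import random
from collections import defaultdict


def build_table(words, order):
    """Map each run of `order` consecutive words to the words seen after it."""
    table = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        table[key].append(words[i + order])
    return table


def chain(text, order=1, length=20, seed=None):
    """Generate at most `length` words; order=1 is loose, order=2 is tight."""
    rng = random.Random(seed)
    words = text.split()
    table = build_table(words, order)
    if not table:
        return ""
    output = list(rng.choice(list(table)))  # start from a randomly chosen key
    while len(output) < length:
        followers = table.get(tuple(output[-order:]))
        if not followers:  # dead end: this key only occurs at the very end
            break
        # A follower seen twice is listed twice, so a uniform choice
        # reproduces the observed probabilities.
        output.append(rng.choice(followers))
    return " ".join(output[:length])


sample = "the cat sat on the mat and the dog sat on the cat"
print(chain(sample, order=1, length=12, seed=1))
print(chain(sample, order=2, length=12, seed=1))
```

With order 1 every adjacent word pair in the output occurred in the input; with order 2 every word triplet did, which is why tight chaining tends to reproduce longer stretches of the source.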


'Chain letters' will chain a text in the JanusNode window by letter pairs instead of the word pairs used by 'Chain tightly'. The output will resemble the language of the original, but may not necessarily consist of real words. There are many interesting experiments to be conducted with Markov chaining by letter. Try exploring the difference it makes to use highly redundant texts (such as, for example, those generated by asking a JanusNode to produce lines using just one or a few rules), as compared to more randomized prose, or the difference it makes to use a long text versus a short text. Do not fail to chain by letter and then have JanusNode read it out- the results can be quite amusing.

2) Text Mapping

There are many computer programs whose purpose is to turn a text into a specific dialect. These programs are usually extremely simple: they simply make a set of pre-defined substitutions to the text. Your JanusNode has this ability, which can be completely configured by the user. The 'Text Mapping' command uses information stored in files inside the 'Mappings' folder (in the 'JanusNode Resources' folder) to make substitutions to the text in the JanusNode window.


The mappings have their own simple grammar. In general, each line must have at least two and at most three elements, separated by commas. The second element will be substituted for the first element with a probability equal to the third element, if there is a third element. (If there is not, the probability is assumed to be 100%.) Elements may be subword character strings, words, or multiword strings. The probabilities operate globally, not by item- so once a mapping 'passes' the probability test and is chosen to be applied, it will be applied to every item. For example, consider the following mapping:

you , thee, 20

This will be applied 20% of the time it is chosen (and every mapping will be chosen exactly once, in random order, when the tool is applied). When it is applied, it will replace the word 'you' with 'thee'. Note the space inserted after the word 'you'- this is to ensure that the mapping is only applied when the whole word is 'you'- so, for example, the word 'your' will not be changed to 'theer'. It would be so changed if there were no space after the first word. If you leave a space after the first element, there is no need to also leave one after the second element: JanusNode can figure this much out for itself. Your JanusNode also deals by itself with the complications of capitalization and punctuation of various kinds, so that it will recognize that 'You', 'you.', or 'you)' (and so on) should be replaced in the above example with 'Thee', 'thee.' and 'thee)'.


Along with such two- or three-element substitutions, there are two other allowable forms which may appear in a mapping: comments, and random exclamations. A comment is any line beginning with an '*': it will simply be ignored when encountered, allowing you to insert notes into your mappings. A random exclamation has the form 'random(X)' (optionally followed by a percent probability), where 'X' is some text. When JanusNode sees a line like this in a mapping file it will (if it passes the optional probability test) randomly insert the text contained between the parentheses after the end of a random number (one or more) of randomly-selected sentences in the text. You can use this to add a little spice to your dialects.
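The substitution grammar can be sketched roughly as follows. This is a Python illustration, not JanusNode's parser: the names 'parse_mapping' and 'apply_mapping' are invented, 'random(X)' lines are simply skipped rather than modelled, and the handling of capitals is an assumption based on the 'You'/'Thee' behaviour described above.

```python
import random
import re


def parse_mapping(lines):
    """Return (target, replacement, percent) tuples from mapping lines."""
    rules = []
    for line in lines:
        stripped = line.strip()
        # Skip blanks, '*' comments, and random(...) lines (not modelled here).
        if not stripped or stripped.startswith("*") or stripped.startswith("random("):
            continue
        parts = line.split(",")
        if len(parts) not in (2, 3):
            continue
        target = parts[0].lstrip()  # a trailing space is kept: it means "whole word"
        replacement = parts[1].strip()
        percent = float(parts[2]) if len(parts) == 3 else 100.0
        rules.append((target, replacement, percent))
    return rules


def apply_mapping(text, rules, rng=random):
    """Apply each rule once, in random order, subject to its probability."""
    rules = list(rules)
    rng.shuffle(rules)
    for target, replacement, percent in rules:
        if rng.random() * 100 >= percent:
            continue  # this rule failed its probability test
        if target.endswith(" "):
            # Whole-word match, so 'your' is left alone.
            pattern = re.compile(r"\b" + re.escape(target.strip()) + r"\b", re.IGNORECASE)
        else:
            pattern = re.compile(re.escape(target), re.IGNORECASE)

        def swap(match, replacement=replacement):
            # Carry an initial capital over to the replacement.
            if match.group(0)[0].isupper():
                return replacement[:1].upper() + replacement[1:]
            return replacement

        text = pattern.sub(swap, text)  # a rule that passes applies to every item
    return text


rules = parse_mapping(["* Archaic dialect", "you , thee"])
print(apply_mapping("You said your friend saw you.", rules))
# Thee said your friend saw thee.
```

Note that the probability test happens once per rule, not once per occurrence, matching the 'globally, not by item' behaviour described above.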


JanusNode comes with a variety of mapping files which should serve as further examples, and will hopefully make the idea clear.

3) Other text-morphing tools

The remaining text-degeneration features work only if there is text in the JanusNode window. All of them are methods for randomly mixing up elements of the text.


'Blur' will randomly replace a given percentage of the letters in the text with randomly chosen letters.


'Blur Vowels' will randomly replace a given percentage of the vowels in the text with randomly chosen vowels.


'Flip Pairs' will randomly swap a given number of letter pairs.


'Flip Vowels' will randomly swap a given number of vowel pairs.
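A rough Python sketch of the blurring and flipping tools follows. It is not JanusNode's code, and it rests on two guesses that the description leaves open: that replacement letters are drawn uniformly from a pool, and that 'flipping a pair' means swapping two adjacent letters. The function names are invented.

```python
import random
import string


def blur(text, percent, pool=string.ascii_lowercase, rng=random):
    """Replace roughly `percent`% of the characters found in `pool`
    with characters drawn at random from `pool`."""
    chars = list(text)
    for i, ch in enumerate(chars):
        if ch.lower() in pool and rng.random() * 100 < percent:
            chars[i] = rng.choice(pool)
    return "".join(chars)


def flip_pairs(text, count, rng=random):
    """Swap `count` randomly chosen pairs of adjacent letters."""
    chars = list(text)
    # Positions where a letter is directly followed by another letter.
    spots = [i for i in range(len(chars) - 1)
             if chars[i].isalpha() and chars[i + 1].isalpha()]
    if not spots:
        return text
    for _ in range(count):
        i = rng.choice(spots)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


print(blur("random but controlled", 25))
print(blur("random but controlled", 50, pool="aeiou"))  # vowels only
print(flip_pairs("random but controlled", 3))
```

Passing pool="aeiou" gives a stand-in for 'Blur Vowels'; spaces and punctuation are never touched by either function.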


'Reverse By Word' will reverse every word of the text, leaving that word in its current position in the text.


'Delete Every Other' will delete every second word of the text. If you are trying to construct a poem from a prose text, this can sometimes be a helpful step for loosening your associations.
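These two tools are simple enough to sketch directly. The Python below is only an illustration; in particular, it assumes that 'every second word' means the 2nd, 4th, 6th and so on, which the description does not spell out.

```python
def reverse_by_word(text):
    """Reverse the letters of each word, keeping the words in place."""
    return " ".join(word[::-1] for word in text.split(" "))


def delete_every_other(text):
    """Keep the 1st, 3rd, 5th, ... words and drop the rest (an assumption)."""
    return " ".join(text.split()[::2])


print(reverse_by_word("random but controlled"))       # modnar tub dellortnoc
print(delete_every_other("one two three four five"))  # one three five
```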


'eecummingsfy' will attempt to mimic the style of the great poet ee cummings, using the available text. Text which has been eecummingsfied tends to function 'more poetically' than text which has not been so treated. Like all the randomization tools, eecummingsfication works probabilistically, so treating the same text twice will not give precisely the same result.

EECummingsfication is user-configurable. It uses three files in the 'eecummings' folder, which is inside the 'JanusNode Resources' folder. You can add items to and delete items from these files to customize the way eecummingsfication functions. Eecummingsfication works by looking for subword elements which can be interestingly 'isolated' from their context. The file 'FrontCuts' contains strings that may possibly be isolated from the front if they appear in the text. (Since the tools apply by chance, there is no guarantee that any isolation will actually be made.) For example, if the word 'be' appears in the 'FrontCuts' file, then the word 'babe' might be split into 'ba', return, and 'be' when the tool is applied. Here the word 'be' is isolated from the front. The file 'EndCuts' contains strings that may (probabilistically) be isolated from the end if they appear in the text. For example, if the word 'be' also appears in the 'EndCuts' file, then the word 'bear' might be split into 'be', return, and 'ar' when the tool is applied. Here the word 'be' is isolated from the end. Note that such isolation would not occur from the appearance of the word 'be' in the 'FrontCuts' file. The 'FrameMe' file contains strings that will be isolated from both sides at once. If 'be' appears in that file, then the word 'unbearable' might be split up as 'un', return, 'be', return, and 'arable'- with 'be' isolated (= 'framed') from both sides at once.
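The three kinds of isolation can be sketched as below. This is a Python illustration of the 'babe', 'bear' and 'unbearable' examples, not JanusNode's implementation: the function names, the 'chance' parameter, the order in which the three lists are tried, and the rule of at most one isolation per word are all assumptions.

```python
import random


def isolate(word, cut, mode):
    """Insert line breaks around the first `cut` in `word`, if possible.

    'front' breaks before the cut, 'end' breaks after it, and
    'frame' breaks on both sides. Otherwise `word` is returned unchanged.
    """
    i = word.find(cut)
    if i == -1:
        return word
    j = i + len(cut)
    has_left, has_right = i > 0, j < len(word)
    if mode == "front" and has_left:
        return word[:i] + "\n" + word[i:]
    if mode == "end" and has_right:
        return word[:j] + "\n" + word[j:]
    if mode == "frame" and has_left and has_right:
        return word[:i] + "\n" + cut + "\n" + word[j:]
    return word


def eecummingsfy(text, front_cuts, end_cuts, frame_me, chance=0.3, rng=random):
    """Apply at most one isolation per word, each with probability `chance`."""
    out = []
    for word in text.split():
        for mode, cuts in (("frame", frame_me), ("front", front_cuts), ("end", end_cuts)):
            hits = [c for c in cuts if isolate(word, c, mode) != word]
            if hits and rng.random() < chance:
                word = isolate(word, rng.choice(hits), mode)
                break
        out.append(word)
    return " ".join(out)


print(isolate("babe", "be", "front"))        # ba / be on two lines
print(isolate("bear", "be", "end"))          # be / ar on two lines
print(isolate("unbearable", "be", "frame"))  # un / be / arable on three lines
```

Note that isolate("bear", "be", "front") leaves the word alone, since there is nothing in front of 'be' to cut it away from.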


'Dadaize' will randomly choose words from the original text and print them in a randomly-arrayed manner. There are two forms, representing sampling with or without replacement. If you choose 'No replacement' each word from the original text will be used at most once in each round (though you may ask your JanusNode to go through the original text multiple times, by setting the output size to an integer higher than the number of words in the input). Some of you may recognize this as the original formula for producing Dadaist poetry, as conceived of by the patron saint of JanusNodes, Tristan Tzara: "And here you are a writer, infinitely original and endowed with a sensibility that is charming though beyond the understanding of the vulgar". If you choose 'With replacement', a single word from the original text may be chosen more than once. Note that punctuation that is separated by a space from any word will be treated as a word by the Dadaize function, so by adding such spaces one can have random punctuation in the Dadaized text.
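The two sampling modes can be sketched as follows. This is a Python illustration with an invented function name; it models only the choice of words, not the randomly-arrayed layout of the output.

```python
import random


def dadaize(text, size, replacement=False, rng=random):
    """Draw `size` words at random from `text`."""
    words = text.split()  # space-separated punctuation counts as a word
    if not words:
        return ""
    if replacement:
        picked = [rng.choice(words) for _ in range(size)]
    else:
        picked = []
        while len(picked) < size:
            # One round: every word of the text, used once, in random order.
            one_round = rng.sample(words, len(words))
            picked.extend(one_round[:size - len(picked)])
    return " ".join(picked)


source = "take a newspaper , take some scissors"
print(dadaize(source, 7))                    # one full round, shuffled
print(dadaize(source, 7, replacement=True))  # words may repeat or go missing
```

Without replacement, asking for more words than the text contains simply starts another round through the text.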


'Random sentences' will randomly print entire sentences from the original text, sampling without replacement: in other words, it's like the Dadaize function, but works with sentences instead of words.


[Note to long-time users: 'Replace terms', a function which appeared in JanusNode's predecessors, has been removed. The 'Make A Rule' function (described below) became so much better than 'Replace Terms' was that it became misleading to allow users the choice between the two. If you want to see what a text would look like with its terms randomly replaced, use the 'Make A Rule' function to make an executable rule from the text, and then execute the rule. You can also use the 'Replace Words' function to replace specific user-chosen words.]


'Randomize' will randomly swap words in the text.


'Make TextDNA' will attempt to turn any text in the JanusNode window into an executable line of TextDNA. If you are too lazy to write TextDNA, you can now simply write (or import) a sentence (or more) of the form which you would like to produce, and let this function translate that sentence into TextDNA. The TextDNA produced can then be used like any other line of TextDNA, if you paste it into the TextDNA field. The function can only work if you use words in your template that appear in your JanusNode's BrainFood files. Each recognized word will be replaced with a call to a global variable that is set to a word from the same file as the recognized word. Do not use texts that are too long, or, if you do, break them up afterwards into smaller rules connected by 'ChooseTextDNA' calls. Although the function itself can generate very long rules, such rules will not run on your JanusNode, which has a limit on how many times it can recurse.

There are two options offered you: local and global. The global option replaces every word it recognizes with a call to a global variable: so, for example, every occurrence of 'cat' will be replaced by the same word (maybe 'dog') when the rule is run. If you have the sentence 'Cats hate cats' in the input and choose the global option, it might come out as 'Dogs love dogs' when the rule is run. The local option replaces each word it recognizes with a random call to the file in which that word was found. This means that the same word in the input text will not (necessarily) be replaced by the same word in the output. So, if you have that sentence 'Cats hate cats' in the input and choose the local option, it might come out as 'Mice love pigs' when the rule is run. You also get the option to replace pronouns with the local option, by request from John Waterman. If anyone wants that option in the global case, let me know and I'll put it in.
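The difference between the two options can be sketched in Python as follows. This is only an illustration of the behaviour, not of TextDNA itself: it collapses rule generation and rule execution into one step, the 'WORD_FILES' dictionary is a made-up stand-in for the BrainFood files, and capitalization is ignored.

```python
import random

# Hypothetical stand-ins for two BrainFood word files.
WORD_FILES = {
    "animals": ["cats", "dogs", "mice", "pigs"],
    "verbs": ["hate", "love", "fear"],
}


def find_file(word):
    """Return the name of the first word file containing `word`, if any."""
    for name, words in WORD_FILES.items():
        if word.lower() in words:
            return name
    return None


def run_rule(sentence, mode, rng=random):
    """Replace recognized words; mode is 'global' or 'local'."""
    chosen = {}  # global mode: one remembered replacement per recognized word
    out = []
    for word in sentence.split():
        name = find_file(word)
        if name is None:
            out.append(word)  # unrecognized words pass through untouched
        elif mode == "global":
            key = word.lower()
            if key not in chosen:
                chosen[key] = rng.choice(WORD_FILES[name])
            out.append(chosen[key])
        else:
            # Local mode: a fresh random draw for every occurrence.
            out.append(rng.choice(WORD_FILES[name]))
    return " ".join(out)


print(run_rule("cats hate cats", "global"))  # first and last word always match
print(run_rule("cats hate cats", "local"))   # first and last word may differ
```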

After you have generated a rule, you will be offered the option to run it right away. If you accept this, the contents of the TextDNA field will be over-written with the newly-generated rule, and your JanusNode will enter text-generation mode. You can run the new rule in the usual manner- by clicking on the icon of Janus. If there is an error in your new rule, you can edit it in the TextDNA window.

Your JanusNode's rule-generating tool is not perfect. Because it blindly replaces words that it finds in the order that it searches, it sometimes makes errors in deciding which Brain Food file a word should come from. It has turned out to be (to me) surprisingly difficult to generate rules that work perfectly every time, and the 'Make TextDNA' function does occasionally produce TextDNA that contains (usually very minor) errors. However, it most often generates TextDNA which is either useable as it is, or in need of only minimal repairs. JanusNode's standard example files now include many lines of TextDNA that were generated automatically. The 'Robot Johnson' project is now making very heavy use of the 'Make TextDNA' function, resulting in a huge speed-up and greatly increased ease in generating the TextDNA files used by that project.


The 'Replace Words' command allows you to choose specific words to be replaced with words from JanusNode's database. When you choose a word, JanusNode will look for a word-list that contains that word, and randomly replace the word with another one from that same word-list.


The 'Steal words' command allows you to use the main tool from rule-fix mode (described above) without having to be in the 'Make A Rule' command: the tool for snatching words from a text and adding them to JanusNode's database. The sole purpose of the command is to make it easier to add words to JanusNode. After selecting 'Steal words' from the menu, click on the Janus icon, and you will enter the 'Steal words' mode. While in 'Steal words' mode you can simply click on words which do not (or might not) appear in your JanusNode's database. The JanusNode will ask you what kind of word it is and, if it does not already appear in its database, it will add it to the proper file. The purpose, of course, is to make it easy to use texts from other sources as input to JanusNode's word database: simply read in the text (by selecting 'Open File' from the File menu) and run the 'Steal words' tool. You exit the 'Steal words' mode by pressing command-period. No words which are chosen from the original text are replaced while using the 'Steal words' command.

4) TextMorphers

JanusNodes have a mechanism by which new text-morphing functions can be easily added by anyone who knows Hypertalk. These functions must be contained in the 'TextMorphers' folder, in files whose title is the same as the name of a Hypertalk text-morphing function that is defined within that file. These can be accessed using the 'Use External Morpher' command. Simply select the external morpher you wish to use, and the function will be automatically applied to the text in the JanusNode field.

A word of warning is probably in order: Anyone who knows Hypertalk can write an external text-morpher. It is simple- trivial- for anyone to write a text-morpher that does damage to your computer by erasing files or engaging in other mischievous behaviour. When you use an external text-morphing function, your JanusNode is simply a means of running that function- it has no control at all over what that function does. As a result, it is impossible for me to accept responsibility for the actions of external text-morphing functions. User beware. However, with that warning given, it should be pointed out that external text-morphers are in this respect no more dangerous than any other shareware, since any program you run on your computer might be a rogue program. External functions are actually a little safer than most shareware, because their code is wide-open: anyone can look and see what the code for an external text-morpher does, because it is not compiled.

I will distribute new text-morphers (if any) here on Janus's web site, and will attempt to make sure that any I distribute through that channel are safe. The official version of a JanusNode ships with three examples of text-morphers (two of which were built-in to JanusNode's predecessor).

'NeoDadaize' is an experimental randomization, loosely inspired by Tristan Tzara's method. It will randomly choose short ordered sets of words from the original text (with replacement) and print them (two per line) in a randomly-arrayed manner. It formats its output into equal-length stanzas with a 'chorus', which is occasionally changed. This function works best when the input text is quite long.

The 'ReverseText' text-morpher will reverse the entire text.

'LengthHopper' works in a deterministic manner (i.e. it treats the same text the same way every time), deleting words in the input field according to a simple formula based on the length of those words. It is highly selective, in the sense that it will always delete the vast majority of the input text.

Writing An External TextMorpher

The remainder of this section is devoted to very brief instructions for those who wish to write their own TextMorphers. It assumes familiarity with Hypertalk. If you want your TextMorpher to be distributed on the JanusNode web site, send it to me.

The basic idea is simple: Define a procedure which acts on text. Name your TextMorphing file with the same name as that procedure. After it loads in a TextMorpher, your JanusNode tries to run a function with the same name as that file.

You have one requisite global variable which you must declare in order to access text at all: TextField. The text you can morph (and the place to which you can write out) is 'card field TextField'.

You are very strongly encouraged to call the procedure 'FollowTheWay' as often as possible. This rotates the Taoist cursor, and, more importantly, checks to see if the user has hit 'command-.', the halt command.

If you are writing text (rather than just randomizing) then it is nice to call the 'scrollcontrol' and 'checksize' procedures. 'Scrollcontrol' will scroll the window so that the end of the text is always visible. 'Checksize' will check to see if the output field is nearly full, and write it to a text file if it is.

The only other function your JanusNode gives you is 'TypeIt', which takes a single argument. It will write its argument to the end of the output window.

Other than that, just use your imagination and Hypertalk.

Here's a simple and heavily-annotated example, the 'ReverseText' morpher:

on ReverseText
  -- This function will only run if it is in a file also called 'ReverseText'
  global TextField
  -- The text is in 'card field TextField'; this variable must be
  -- declared in every text-morpher.

  put card field TextField into x
  -- Take control of the input text by putting it into a variable, 'x'

  put "" into card field TextField
  -- Delete the input text

  put the number of chars in x into y
  -- Start with the last character

  repeat while y > 0

    FollowTheWay
    -- This animates the cursor, and, more importantly, allows
    -- the user to force the function to quit. Please call the
    -- 'FollowTheWay' procedure as often as possible.

    TypeIt(char y of x)
    -- Write the current last character
    -- 'TypeIt' just writes its argument x to the output field
    -- You can also write 'put x after card field TextField', of
    -- course. But 'typeit' will use the 'Type-o-matic' if it is on.

    ScrollControl
    -- 'ScrollControl' keeps the latest text visible.

    -- There is no need to call 'CheckSize' (which checks to see
    -- if the field is full) since the output text can never be
    -- bigger than the input text here. If we were writing long
    -- output texts, we would need to call 'CheckSize' here.

    put y - 1 into y
    -- Move to the next character

  end repeat

end ReverseText