Power-Programmierung

Text File | 1992-04-26 | 50.7 KB | 1,025 lines

6) LISTING OF DISTRIBUTION FILES

The distribution includes the following files:

README.T1 & README.T2 - this file, on-line documentation.

arg.c - the main module for the assembler/disassembler, written by Tom Uffner. This program converts ASCII assembler files into binary files which can be executed by the Tierran virtual computer.

arg.prj - the Turbo C V 2.0 project file for compiling the assembler/disassembler.

arginst.h - a file containing a structure used by arg.c to map assembler mnemonics to executable opcodes.

bookeep.c - source code for bookkeeping routines, which keep track of how many of what kind of creatures are in the soup, and other stuff like that.

ccarg - a file for compiling the assembler/disassembler on unix systems. This file should be made executable (chmod +x ccarg).

cctierra - a file for compiling Tierra on unix systems. This file should be made executable (chmod +x cctierra).

configur.h - a file for configuring Tierra. You probably won't need to touch this unless you get into advanced stuff.

debug.h - this file claims to provide some useful debugging stuff. I don't know, I didn't create it.

declare.h - all global variables are declared in this file, except those whose values are set by soup_in. Those globals are declared in soup_in.h. declare.h is included by tierra.c, which contains the main function.

depend - a listing of interdependencies of the source code files.

extern.h - all global variables are declared as extern in this file, and this file is included by all *.c files except tierra.c, which includes declare.h instead.

extract.c - functions for extracting creatures from the soup and saving their genomes to disk.

frontend.c - functions for handling input/output for Tierra. Hopefully this module will grow in the near future as we put a better interface on Tierra.

genebank.c - functions for managing the genebank. This module has benefited from a lot of work by Tom Uffner.

genio.c - functions for input/output of creatures.
This stuff is also used by arg.c, the assembler/disassembler. This module has benefited from a lot of work by Tom Uffner.

instruct.c - this module contains generalized executable functions. These generalized functions are mapped to specific functions by the parsing functions in the parse.c module.

memalloc.c - functions for handling memory allocation in the soup, the stuff that ``cell membranes'' are made of.

parse.c - the parsing functions interpret the executable code of the creatures, and map it onto the executable functions contained in the instruct.c module.

portable.c - functions for portability between operating systems.

portable.h - definitions for portability between operating systems and architectures.

prototyp.h - all functions in Tierra are prototyped here.

queues.c - queue management functions for the slicer and reaper queues.

slicers.c - interchangeable slicer functions. This file contains some experiments in the allocation of CPU time to creatures. This is an interesting thing to play with.

soup_in - the ASCII file read by Tierra on startup, which contains all the global parameters that determine the environment, and a list of creatures to use in inoculating the soup at the start of a run.

soup_in.h - this file defines the default values of all the soup_in variables, and defines the instruction set by mapping the assembler mnemonics to the opcodes, parser functions, and executables.

tierra.c - this file contains the main function, and the central code driving the virtual computer.

tierra.h - this file contains all the structure definitions. It is a good source of documentation for anyone trying to understand the code.

tierra.prj - the Turbo C V 2.0 project file for compiling Tierra.

trand.c - random number generation routines from Numerical Recipes in C.

tsetup.c - routines called when Tierra starts up and comes down. Tom Uffner has been putting some work into this module as well.

geneban1: - a subdirectory containing the genomes of the creatures saved during a run.

    0080aaa.tie - the ancestor, written by a human, mother of all other creatures.

    0022abn.tie - the smallest non-parasitic self-replicating creature to evolve.

    0045aaa.tie - the archetypal parasite.

    0072etq.tie - a phenomenal example of optimization through evolution, involving the unrolling of the copy loop.

    list - a list of genotypes in the genebank, which will be read by Tierra at startup. All genotypes listed in soup_in must also be listed in this file. This file will be written to when the system is saved, so to start a fresh run you must start with a fresh copy of the list file. For this reason we provide the two files below, list4580 and list80, which allow you to make a fresh start either with the genome 0080aaa, or with 0080aaa and 0045aaa together.

    list4580 - a fresh list file for starting runs with the genotypes 0080aaa and 0045aaa together. To use this file just copy it to a file named list.

    list80 - a fresh list file for starting runs with the genotype 0080aaa. To use this file just copy it to a file named list.

tiedat: - a subdirectory where a complete record of births and deaths will be written.

    break.1 - a file containing a record of births and deaths.

7) SOUP_IN PARAMETERS

A typical soup_in file looks like the following:

/* begin soup_in file */

tierra core: 6-10-91

alive = 50  how many millions of instructions will we run
BrkupSiz = 5120  size of output file in K, named break.1, break.2 ...
CellsSize = 600  initial size of cells array of structures
debug = 0  0 = off, 1 = on, printf statements for debugging
DiskOut = 1  output data to disk (1 = on, 0 = off)
DistFreq = .1  frequency of disturbance, factor of recovery time
DistProp = .4  proportion of population affected by disturbance
DivSameGen = 0  cells must produce offspring of same genotype, to stop evolution
DivSameSiz = 0  cells must produce offspring of same size, to stop size change
DropDead = 5  stop system if no reproduction in the last x million instructions
GeneBnker = 1  turn genebanker on and off
GenebankPath = geneban1/  path for genebanker output
GenPerBkgMut = 12  mutation rate control by generations ("cosmic ray")
GenPerFlaw = 16  flaw control by generations
GenPerMovMut = 8  mutation rate control by generations (copy mutation)
hangup = 1  0 = exit on error, 1 = hangup on error for debugging
MaxFreeBlocks = 500  initial number of structures for memory allocation
MaxMalMult = 3  multiple of cell size allowed for mal()
MinCellSize = 8  minimum size for cells
MinTemplSize = 3  minimum size for templates
MovPropThrDiv = .7  minimum proportion of daughter cell filled by mov
new_soup = 1  1 = this is a new soup, 0 = restarting an old run
NumCells = 3  number of creatures and gaps used to inoculate new soup
OutPath = tiedat/  path for data output
PhotonPow = 1.5  power for photon match slice size
PhotonWidth = 8  amount by which photons slide to find best fit
PhotonWord = chlorophill  word used to capture photon
RamBankSiz = 20000  array size for genotypes in ram, use with genebanker
SaveFreq = 10  frequency of saving core_out, soup_out and list
SavThrMem = .015  threshold memory occupancy to save genotype
SavThrPop = .015  threshold population proportion to save genotype
SearchLimit = 5
seed = 0  seed for random number generator, 0 uses time to set seed
SizDepSlice = 0  set slice size by size of creature
SlicePow = 1  set power for slice size, use when SizDepSlice = 1
SliceSize = 25  slice size when SizDepSlice = 0
SliceStyle = 2  choose style of determining slice size
SlicFixFrac = 0  fixed fraction of slice size
SlicRanFrac = 2  random fraction of slice size
SoupSize = 60000  size of soup in instructions
WatchExe = 0  mark executed instructions in genome in genebank
WatchMov = 0  set mov bits in genome in genebank
WatchTem = 0  set template bits in genome in genebank

0080aaa
0045aaa
0080aaa

/* end soup_in file */

The meaning of each of these parameters is explained below:

alive = 50  how many millions of instructions will we run

This tells the simulator how long to run, in millions of instructions.

BrkupSiz = 5120  size of output file in K, named break.1, break.2 ...

If this value is set to zero (0), the record of births and deaths will be written to a single file named tierra.run. However, if BrkupSiz has a non-zero value, birth and death records will be written to a series of files with the names break.1, break.2, etc. Each of these files will have the size specified, in K (1024 bytes). The value 5120 indicates that the break files will each be five megabytes in size. The output file(s) will be in the path specified by OutPath (see below). See also DiskOut.

CellsSize = 600  initial size of cells array of structures

The initial size of the ``cells array'', which contains all the demographic data, as well as the CPU of each creature. Due to a bug in the Borland Turbo C farrealloc function, care must be taken to be sure that this array is initially large enough that it does not need to be reallocated. A good rule of thumb is to let CellsSize = SoupSize / 100. If a compiler other than Borland is used, don't worry, any initial value will do.

debug = 0  0 = off, 1 = on, printf statements for debugging

This is used during code development, to turn print statements for debugging on and off.

DiskOut = 1  output data to disk (1 = on, 0 = off)

If this parameter is set to zero (0), no birth and death records will be saved.
Any other value will cause birth and death records to be saved to a file whose name is discussed under BrkupSiz above, in the path discussed under OutPath below.

DistFreq = .1  frequency of disturbance, factor of recovery time

The frequency of disturbance, as a factor of recovery time. This and the next option control the pattern of disturbance. If you do not want the system to be disturbed, set DistFreq to a negative value. If DistFreq has a non-negative value, when the soup fills up the reaper will be invoked to kill cells until it has freed a proportion DistProp of the soup. The system will then keep track of the time it takes for the creatures to recover from the disturbance by filling the soup again. Let's call this recovery time rtime. The next disturbance will occur (rtime * DistFreq) after recovery is complete. Therefore, if DistFreq = 0, each disturbance will occur immediately after recovery is complete. If DistFreq = 1, the time between disturbances will be twice the recovery time, that is, the soup will remain full for a period equal to the recovery time before another disturbance hits.

DistProp = .4  proportion of population affected by disturbance

The proportion of the soup that is freed of cells by each disturbance. This occurs by invoking the reaper to kill cells until the total amount of free memory is greater than or equal to (DistProp * SoupSize). Note that cells are not killed at random; they are killed off the top of the reaper queue.

DivSameGen = 0  cells must produce offspring of same genotype, to stop evolution

This causes attempts at cell division to abort if the offspring is of a genotype different from the parent. This can be used when the mutation rates are set to zero, to prevent sex from causing evolution.

DivSameSiz = 0  cells must produce offspring of same size, to stop size change

Like DivSameGen, but cell division aborts only if the offspring is of a different size than the parent.
Changes in genotype are not prevented, only changes in size are prevented.

DropDead = 5  stop system if no reproduction in the last x million instructions

Sometimes the soup dies, such as when mutation rates are too high. This parameter watches the time elapsed since the last cell division, and brings the system down if it is greater than DropDead million instructions.

GeneBnker = 1  turn genebanker on and off

This parameter turns the genebanker on and off. The value zero turns the genebanker off, any other value turns it on. With the genebanker off, the record of births and deaths will contain the sizes of the creatures, but not their genotypes. Also, no genomes will be saved in the genebank. When the genebanker is turned on, the record of births and deaths will contain a unique three letter name for each genotype, as well as the size of the creatures. Also, any genome whose frequency exceeds the thresholds SavThrMem and SavThrPop (see below) will be saved to the genebank, in the path indicated by GenebankPath (see below).

GenebankPath = geneban1/  path for genebanker output

This is a string variable which describes the path to the genebank where the genomes will be saved. The path name should be terminated by a forward slash.

GenPerBkgMut = 12  mutation rate control by generations ("cosmic ray")

Control of the background mutation rate ("cosmic ray"). The value 12 indicates that in each generation, roughly one in twelve cells will be hit by a mutation. These mutations occur completely at random, and also affect free space where there are no cells. If the value of GenPerBkgMut were 0.5, it would mean that in each generation, each cell would be hit by roughly two mutations.

GenPerFlaw = 16  flaw control by generations

Control of the flaw rate. The value 16 means that in each generation, roughly one in sixteen individuals will experience a flaw. Flaws cause instructions to produce results that are in error by plus or minus one, in some sense.
If the value of GenPerFlaw were 0.5, it would mean that in each generation, each cell would be hit by roughly two flaws.

GenPerMovMut = 8  mutation rate control by generations (copy mutation)

Control of the move mutation rate (copy mutation). The value 8 indicates that in each generation, roughly one in eight cells will be hit by a mutation. These mutations only affect copies of instructions made during replication (by the double indirect mov instruction). When an instruction is affected by a mutation, one of its five bits is selected at random and flipped. If the value of GenPerMovMut were 0.5, it would mean that in each generation, each cell would be hit by roughly two mutations.

hangup = 1  0 = exit on error, 1 = hangup on error for debugging

If an error occurs which is serious enough to bring down the system, having hangup set to 1 will prevent the program from exiting. In this case, the program will hang in a simple loop so that it remains active for debugging purposes.

MaxFreeBlocks = 500  initial number of structures for memory allocation

There is an array of structures used for the virtual memory allocator. This parameter sets the initial size of the allocated array, at startup.

MaxMalMult = 3  multiple of cell size allowed for mal()

When a cell attempts to allocate a second block of memory (presumably to copy its genome into), this parameter is checked. If the amount of memory requested is greater than MaxMalMult times the size of the mother cell, the request will fail. This prevents mutants from requesting the entire soup, which would invoke the reaper to cause a massive kill off.

MinCellSize = 8  minimum size for cells

When a cell attempts to divide, this parameter is checked. If the daughter cell would be smaller than MinCellSize instructions, divide will fail. The reason this is needed is that with no lower limit, there is a tendency for some mutants to spawn large numbers of very small cells.

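
As a rough illustration, the MaxMalMult and MinCellSize checks described above could be sketched in C as follows. The function names and signatures are hypothetical and are not taken from the Tierra source; only the two comparisons come from the descriptions above.

```c
/* Hypothetical helpers: names and signatures are illustrative
   assumptions and are not taken from the Tierra source. */

/* Nonzero if a mal() request may succeed: the request may not be
   greater than MaxMalMult times the size of the mother cell. */
int mal_request_ok(long requested, long mother_size, double max_mal_mult)
{
    return (double) requested <= max_mal_mult * (double) mother_size;
}

/* Nonzero if divide may succeed as far as MinCellSize is concerned:
   the daughter may not be smaller than MinCellSize instructions. */
int divide_size_ok(long daughter_size, long min_cell_size)
{
    return daughter_size >= min_cell_size;
}
```

With the values from the example soup_in file (MaxMalMult = 3, MinCellSize = 8), an 80 instruction mother could request up to 240 instructions, and a daughter of 7 instructions would be refused.
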
MinTemplSize = 3  minimum size for templates

When an instruction (like jump) attempts to use a template, this parameter is checked. If the actual template is smaller than MinTemplSize instructions, the instruction will fail. This is a matter of taste.

MovPropThrDiv = .7  minimum proportion of daughter cell filled by mov

When a cell attempts to divide, this parameter is checked. If the number of instructions the mother cell has moved into the daughter cell is less than MovPropThrDiv times the mother cell size, cell division will abort. A value of .7 means that the mother must fill the daughter at least 70% with instructions (though all these instructions could have been moved to the same spot in the daughter cell). The reason this parameter exists is that without it, mutants will attempt to spew out large numbers of empty cells.

new_soup = 1  1 = this is a new soup, 0 = restarting an old run

This value is checked on startup, to determine if this is a new soup, or if this is restarting an old run where it left off. When the system comes down, all soup_in parameters (and many other global variables) are saved in a file called soup_out. The value of new_soup is set to 0 in soup_out. In order to restart an old run, just use soup_out as the input file rather than soup_in. This is done by using soup_out as a command line parameter at startup:

    tierra soup_out

NumCells = 5  number of creatures and gaps used to inoculate new soup

This parameter is checked at startup, and the system will look for a list of NumCells creatures at the end of the soup_in file. The value 5 indicates that the soup will initially be inoculated by five cells. However, NumCells also counts gaps that are placed between cells (without gaps, all cells are packed together at the bottom of the soup at startup). The gap control feature does not work at present, so don't use it. Notice that after the list of parameters in the soup_in file, there is a blank line, followed by a list of genotypes.
The system will read the first NumCells genotypes from the list, and place them in the soup in the same order that they occur in the list.

OutPath = tiedat/  path for data output

The record of births and deaths will be written to files in a directory specified by OutPath. See BrkupSiz above for a discussion of the name of the file(s) containing the birth and death records.

PhotonPow = 1.5  power for photon match slice size

If SliceStyle (see below) is set to the value 1, then the allocation of CPU cycles to creatures is based on a photon-chlorophyll metaphor. Imagine that photons are raining down on the soup at random. The cell hit by the photon gets a time slice that depends on the goodness of fit between the pattern of instructions that are hit and an arbitrary pattern (defined by PhotonWord, see below). The template of instructions defined by PhotonWord is laid over the sequence of instructions at the site hit by the photon. The number of instructions that match between the two is raised to the power PhotonPow to calculate the slice size.

PhotonWidth = 8  amount by which photons slide to find best fit

When a photon hits the soup, it slides a distance PhotonWidth, counting the number of matching characters at each position, and the slice size will be equal to the number of characters in the best match (raised to the power PhotonPow, see above). If PhotonWidth equals 8, the center of the template will start 4 instructions to the left of the site hit by the photon, and slide to 4 instructions to the right of the site hit.

PhotonWord = chlorophill  word used to capture photon

This string determines the arbitrary pattern that absorbs the photon. It uses a base 32 numbering system: the digits 0-9 followed by the characters a-v. The characters w, x, y and z are not allowed (that is why chlorophyll is misspelled). The string may be any length up to 79 characters.

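
The photon matching described under PhotonPow, PhotonWidth and PhotonWord can be sketched in C roughly as follows. This is not the SlicerPhoton() code from slicers.c: the function names, the wrap-around at the ends of the soup, and the way the template is centered on the hit site are all assumptions made for illustration. The sketch returns the best match count; the slice size would then be that count raised to the power PhotonPow (for instance with pow() from math.h).

```c
#include <string.h>

/* Map one PhotonWord character to a 5-bit opcode in the base 32
   system described above: '0'-'9' -> 0-9, 'a'-'v' -> 10-31.
   Returns -1 for any other character (w, x, y, z are not allowed). */
int photon_char_to_op(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'v')
        return 10 + (c - 'a');
    return -1;
}

/* Count the positions at which the word matches the soup when the
   word is laid over the soup starting at `start`. Wrapping around
   the ends of the soup is an assumption of this sketch. */
int photon_count_matches(const unsigned char *soup, long soup_size,
                         long start, const char *word, int word_len)
{
    int i, n = 0;

    for (i = 0; i < word_len; i++) {
        long pos = ((start + i) % soup_size + soup_size) % soup_size;
        if (soup[pos] == photon_char_to_op(word[i]))
            n++;
    }
    return n;
}

/* Slide the center of the word from PhotonWidth/2 to the left of the
   hit site to PhotonWidth/2 to the right, and return the best count. */
int best_photon_match(const unsigned char *soup, long soup_size,
                      long hit, const char *word, int photon_width)
{
    int word_len = (int) strlen(word);
    int half = photon_width / 2;
    int best = 0;
    long off;

    for (off = -half; off <= half; off++) {
        long start = hit + off - word_len / 2;  /* center sits at hit + off */
        int n = photon_count_matches(soup, soup_size, start, word, word_len);
        if (n > best)
            best = n;
    }
    return best;
}
```

In this sketch a perfect match on chlorophill gives a count of 11, the length of the word.
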
RamBankSiz = 20000  array size for genotypes in ram, use with genebanker

Places an upper limit on the number of genomes that may be stored in the genebank maintained in RAM at any one time. This is a memory management feature provided for DOS systems. When the RAM genebank fills, genomes start swapping out to disk. The genomes that have not been checked for the longest time are swapped out first. At this time the RAM bank management scheme does not work. For this reason, you should be sure that this parameter is set high enough that the bank does not fill up during the run.

SaveFreq = 10  frequency of saving core_out, soup_out and list

Every SaveFreq million instructions, the complete state of the virtual machine is saved. This is a useful feature for long runs, so that the system can be restarted if it is interrupted for some reason.

SavThrMem = .015  threshold memory occupancy to save genotype

If a particular genotype fills SavThrMem of the total space available in the soup, it will be assigned a permanent unique name, and saved to disk. Note that an adjustment is made because only adult cells are counted, and embryos generally fill half the soup. Therefore adult cells of a particular genotype need only occupy SavThrMem * 0.5 of the space to be saved.

SavThrPop = .015  threshold population proportion to save genotype

If a particular genotype amounts to SavThrPop of the total population of (adult) cells in the soup, it will be assigned a permanent unique name, and saved to disk.

SearchLimit = 5

This parameter controls how far instructions may search to match templates. The value five means that search is limited to five times the average adult cell size. The actual distance is updated every million instructions.

seed = 0  seed for random number generator, 0 uses time to set seed

The seed for the random number generator. If you use the value zero, the system clock is used to set the seed. If you use any other value, it will be the seed.
The starting seed (even when provided by the clock) will be written to standard output, and also saved in the soup_out file when the simulator comes down. By using the original seed and all the same initial parameter settings in soup_in, a run may be repeated exactly.

SizDepSlice = 0  set slice size by size of creature

This determines a major slicer option. If this parameter is set to zero, the slice size will either be a constant set by SliceSize (see below) or a uniform random variate, or a mix of the two. The mix is determined by the relative values of SlicFixFrac and SlicRanFrac (see below). The actual slice size will be:

    (SlicFixFrac * SliceSize) + (tlrand() % (I32s) ((SlicRanFrac * SliceSize) + 1))

If SizDepSlice is set to a non-zero value, the slice size will be proportional to the size of the genome. In this case, the base slice size will be the genome size raised to the power SlicePow (see below). To clarify, let slic_siz = genome_size ^ SlicePow; the actual slice size will be:

    (SlicFixFrac * slic_siz) + (tlrand() % (I32s) ((SlicRanFrac * slic_siz) + 1))

SlicePow = 1  set power for slice size, use when SizDepSlice = 1

This parameter is only used when SizDepSlice = 1. In this case, the genome size is raised to the power SlicePow to determine the slice size (see algorithm under SizDepSlice above). If SlicePow = 1, the run will be size neutral: selection will not be biased toward either large or small creatures (the probability of an instruction being executed is not dependent on the size of the genome it is located in). If SlicePow > 1, selection will favor larger genomes. If SlicePow < 1, selection will favor small genomes.

SliceSize = 25  slice size when SizDepSlice = 0

This parameter determines the base slice size when SizDepSlice = 0. The actual slice size in this case depends on the values of SlicFixFrac and SlicRanFrac (see below). The way the slice size is actually calculated is explained under SizDepSlice above.

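
The slice size expression given under SizDepSlice can be tried out with a small stand-alone C function like the one below. This is a sketch, not the code from slicers.c: the function name is hypothetical, the random draw is passed in as an argument in place of a call to tlrand() so that the result is repeatable, and the caller is assumed to supply the base size (SliceSize when SizDepSlice = 0, or genome_size ^ SlicePow otherwise).

```c
/* Hypothetical helper, not taken from the Tierra source.
   base     : SliceSize, or genome_size ^ SlicePow when SizDepSlice != 0
   fix_frac : SlicFixFrac
   ran_frac : SlicRanFrac
   draw     : a random number standing in for the value of tlrand() */
long slice_size(double base, double fix_frac, double ran_frac,
                unsigned long draw)
{
    /* Modulus for the random component; it is at least 1 whenever
       ran_frac * base is not negative. */
    long span = (long) ((ran_frac * base) + 1);

    return (long) (fix_frac * base) + (long) (draw % (unsigned long) span);
}
```

With the values from the example soup_in file (SliceSize = 25, SlicFixFrac = 0, SlicRanFrac = 2) the modulus is 51, so the slice size falls between 0 and 50 instructions. With SlicFixFrac = 1 and SlicRanFrac = 0 every slice is exactly 25.
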
SliceStyle = 2  choose style of determining slice size

The slicer is a pointer to function, and the function actually used is determined by this parameter. At present there are three choices (0-2). The pointer to function is assigned in the tsetup.c module, and the slicer functions themselves are contained in the slicers.c module.

    0 = SlicerQueue() - slice sizes without a random component
    1 = SlicerPhoton() - slice size based on photon interception metaphor
    2 = RanSlicerQueue() - slice size with a fixed and a random component

SlicFixFrac = 0  fixed fraction of slice size

When SliceStyle = 2, the slice size has a fixed component and a random component. This parameter determines the fixed component as a multiple of SliceSize, or of genome_size ^ SlicePow.

SlicRanFrac = 2  random fraction of slice size

When SliceStyle = 2, the slice size has a fixed component and a random component. This parameter determines the random component as a multiple of SliceSize, or of genome_size ^ SlicePow.

SoupSize = 60000  size of soup in instructions

This variable sets the size of the soup, measured in instructions.

WatchExe = 0  mark executed instructions in genome in genebank

If the genebank is on, setting this parameter to a non-zero value will turn on a watch of which instructions are being executed in each permanent genotype (this helps to distinguish junk code from code that is executed), and also of who is executing whose instructions.
There is a bit field in struct g_list (the bit definitions are in the tierra.h module) that keeps track of whether a creature executes its own instructions, those of another creature, whether another creature executes this creature's instructions, etc.:

    bit  2  EXs = executes own instructions (self)
    bit  3  EXd = executes daughter's instructions
    bit  4  EXo = executes other cell's instructions
    bit  5  EXf = executes instructions in free memory
    bit  6  EXh = own instructions are executed by other creature (host)

WatchMov = 0  set mov bits in genome in genebank

If the genebank is on, setting this parameter to a non-zero value will turn on a watch of who moves whose instructions and where. This information is recorded in the bit field in struct g_list:

    bit 17  MFs = moves instruction from self
    bit 18  MFd = moves instruction from daughter
    bit 19  MFo = moves instruction from other cell
    bit 20  MFf = moves instruction from free memory
    bit 21  MFh = own instructions are moved by other creature (host)
    bit 22  MTs = moves instruction to self
    bit 23  MTd = moves instruction to daughter
    bit 24  MTo = moves instruction to other cell
    bit 25  MTf = moves instruction to free memory
    bit 26  MTh = is written on by another creature (host)
    bit 27  MBs = executing other creature's code, moves inst from self
    bit 28  MBd = executing other creature's code, moves inst from daughter
    bit 29  MBo = executing other creature's code, moves inst from other cell
    bit 30  MBf = executing other creature's code, moves inst from free memory
    bit 31  MBh = other creature uses another cpu to move your instructions

WatchTem = 0  set template bits in genome in genebank

If the genebank is on, setting this parameter to a non-zero value will turn on a watch of whose templates are matched by whom.
This information is recorded in the bit field in struct g_list:

    bit  7  TCs = matches template complement of self
    bit  8  TCd = matches template complement of daughter
    bit  9  TCo = matches template complement of other
    bit 10  TCf = matches template complement of free memory
    bit 11  TCh = own template complement is matched by other creature (host)
    bit 12  TPs = uses template pattern of self
    bit 13  TPd = uses template pattern of daughter
    bit 14  TPo = uses template pattern of other
    bit 15  TPf = uses template pattern of free memory
    bit 16  TPh = own template pattern is used by other creature (host)

0080aaa
0045aaa
0080aaa
0045aaa
0080aaa

This is the list of cells that will be loaded into the soup when the simulator starts up. This example indicates that five cells will be loaded at startup, the ancestor 0080aaa alternating with the parasite 0045aaa. These cells will be loaded in the bottom of the soup, with no space between them. Only NumCells genotypes from the list will actually be loaded, so the NumCells parameter should be modified when you change the number of genotypes that you wish to have loaded. Also, all genotypes to be loaded must be listed in the file geneban1/list, and all of the genotypes must occur in the genebank.

8) THE ANCESTOR & WRITING A CREATURE

8.1) The Ancestor

The ASCII assembler code file with comments, for the ancestor, is listed below. Below the listing I have some explanatory material.

**** begin genome file (note blank line at head of file)

format: 1  bits: 45750471  EXsh TCsh TPs MFsofh MTdf MB
genotype: 0080aaa  parent genotype: 0666god
1st_daughter:  flags: 0  inst: 827  mov_daught: 80  breed_true: 1
2nd_daughter:  flags: 0  inst: 809  mov_daught: 80  breed_true: 1
InstExe.m: 0  InstExe.i: 0  origin: 662270168  Wed Dec 26 22:56:08 1990
MaxPropPop: 0.8306  MaxPropInst: 0.4239

ploidy: 1  track: 0

track 0: prot
          xwr
nop_1   ; 010 110 01   0 beginning marker
nop_1   ; 010 110 01   1 beginning marker
nop_1   ; 010 110 01   2 beginning marker
nop_1   ; 010 110 01   3 beginning marker
zero    ; 010 110 04   4 put zero in cx
or1     ; 010 110 02   5 put 1 in first bit of cx
shl     ; 010 110 03   6 shift left cx (cx = 2)
shl     ; 010 110 03   7 shift left cx (cx = 4)
mov_cd  ; 010 110 18   8 move cx to dx (dx = 4)
adrb    ; 010 110 1c   9 get (backward) address of beginning marker -> ax
nop_0   ; 010 100 00  10 complement to beginning marker
nop_0   ; 010 100 00  11 complement to beginning marker
nop_0   ; 010 100 00  12 complement to beginning marker
nop_0   ; 010 100 00  13 complement to beginning marker
sub_ac  ; 010 110 07  14 subtract cx from ax, result in ax
mov_ab  ; 010 110 19  15 move ax to bx, bx now contains start address of mother
adrf    ; 010 110 1d  16 get (forward) address of end marker -> ax
nop_0   ; 010 100 00  17 complement to end marker
nop_0   ; 010 100 00  18 complement to end marker
nop_0   ; 010 100 00  19 complement to end marker
nop_1   ; 010 100 01  20 complement to end marker
inc_a   ; 010 110 08  21 increment ax, to include dummy instruction at end
sub_ab  ; 010 110 06  22 subtract bx from ax to get size, result in cx
nop_1   ; 010 110 01  23 reproduction loop marker
nop_1   ; 010 110 01  24 reproduction loop marker
nop_0   ; 010 110 00  25 reproduction loop marker
nop_1   ; 010 110 01  26 reproduction loop marker
mal     ; 010 110 1e  27 allocate space (cx) for daughter, address to ax
call    ; 010 110 16  28 call template below (copy procedure)
nop_0   ; 010 100 00  29 copy procedure complement
nop_0   ; 010 100 00  30 copy procedure complement
nop_1   ; 010 100 01  31 copy procedure complement
nop_1   ; 010 100 01  32 copy procedure complement
divide  ; 010 110 1f  33 create independent daughter cell
jmp     ; 010 110 14  34 jump to template below (reproduction loop)
nop_0   ; 010 100 00  35 reproduction loop complement
nop_0   ; 010 100 00  36 reproduction loop complement
nop_1   ; 010 100 01  37 reproduction loop complement
nop_0   ; 010 100 00  38 reproduction loop complement
if_cz   ; 010 000 05  39 dummy instruction to separate templates
nop_1   ; 010 110 01  40 copy procedure template
nop_1   ; 010 110 01  41 copy procedure template
nop_0   ; 010 110 00  42 copy procedure template
nop_0   ; 010 110 00  43 copy procedure template
push_ax ; 010 110 0c  44 push ax onto stack
push_bx ; 010 110 0d  45 push bx onto stack
push_cx ; 010 110 0e  46 push cx onto stack
nop_1   ; 010 110 01  47 copy loop template
nop_0   ; 010 110 00  48 copy loop template
nop_1   ; 010 110 01  49 copy loop template
nop_0   ; 010 110 00  50 copy loop template
mov_iab ; 010 110 1a  51 move contents of [bx] to [ax] (copy one instruction)
dec_c   ; 010 110 0a  52 decrement cx (size)
if_cz   ; 010 110 05  53 if cx == 0 perform next instruction, otherwise skip it
jmp     ; 010 110 14  54 jump to template below (copy procedure exit)
nop_0   ; 010 110 00  55 copy procedure exit complement
nop_1   ; 010 110 01  56 copy procedure exit complement
nop_0   ; 010 110 00  57 copy procedure exit complement
nop_0   ; 010 110 00  58 copy procedure exit complement
inc_a   ; 010 110 08  59 increment ax (address in daughter to copy to)
inc_b   ; 010 110 09  60 increment bx (address in mother to copy from)
jmp     ; 010 110 14  61 bidirectional jump to template below (copy loop)
nop_0   ; 010 100 00  62 copy loop complement
nop_1   ; 010 100 01  63 copy loop complement
nop_0   ; 010 100 00  64 copy loop complement
nop_1   ; 010 100 01  65 copy loop complement
if_cz   ; 010 000 05  66 this is a dummy instruction to separate templates
nop_1   ; 010 110 01  67 copy procedure exit template
nop_0   ; 010 110 00  68 copy procedure exit template
nop_1   ; 010 110 01  69 copy procedure exit template
nop_1   ; 010 110 01  70 copy procedure exit template
pop_cx  ; 010 110 12  71 pop cx off stack (size)
pop_bx  ; 010 110 11  72 pop bx off stack (start address of mother)
pop_ax  ; 010 110 10  73 pop ax off stack (start address of daughter)
ret     ; 010 110 17  74 return from copy procedure
nop_1   ; 010 100 01  75 end template
nop_1   ; 010 100 01  76 end template
nop_1   ; 010 100 01  77 end template
nop_0   ; 010 100 00  78 end template
if_cz   ; 010 000 05  79 dummy instruction to separate creature

**** end genome file

Each genome file begins with some header information. Let me explain each item:

format: 1

Because we occasionally change the format of the genome files, this parameter is included for backwards compatibility. It is used by the assembler/disassembler to know how to read and write the files.

bits: 45750471

This is the bit field associated with each genome in the genebank. If the genebanker is on and if any of the parameters WatchExe, WatchMov, or WatchTem are set to a non-zero value, then bits in this field will be set to characterize the ecological characteristics of the genotype. The definitions of the bits in the field are given in the tierra.h module, and above in the description of the soup_in parameters. For more specific details, follow the Watch variables in the source modules to see exactly what they are doing.

EXsh TCsh TPs MFsofh MTdf MB

This is an ASCII summary of the meaning of the bits that are set in the bit field. The meanings of these abbreviations are given in the tierra.h file and above in the description of the soup_in parameters.

genotype: 0080aaa

This is the name of this genotype. The name has two parts. The first part is numeric and must be equal to the size of the cell of this creature (how large its allocated block of memory is). The cell size usually, but not always, corresponds to the size of the genome.
The second part is a unique (and arbitrary) three letter code to distinguish this particular genotype from others of the same size.

parent genotype: 0666god

This is the name of the genotype of the immediate ancestor of this genotype. The immediate ancestor is the creature whose cpu gave rise to the first individual of this genotype. The original creature, 0080aaa, was created by god and the devil.

1st_daughter:

This is a set of metabolic data about what transpired during the production of the first daughter by this genotype.

flags: 0

This tells us how many errors (flags) were generated during the first reproduction. The generation of errors indicates invalid execution of instructions and causes the creature to move up the reaper queue, closer to death.

inst: 827

This tells us how many instructions were executed during the first reproduction. This is an indication of metabolic costs and efficiency.

mov_daught: 80

This tells us how many instructions were copied from the mother to the daughter during the first reproduction.

breed_true: 1

This tells us whether the first daughter ever has the same genotype as the mother.

2nd_daughter: flags: 0 inst: 809 mov_daught: 80 breed_true: 1

This is a set of metabolic data about what transpired during the production of the second daughter by this genotype. The fields have the same meanings as those for the first daughter. The second daughter and those that follow generally have the same metabolic data as one another, but they generally differ from the first daughter, because the second time through, the parent often does not examine itself again, and it does not start the algorithm from the same place.

InstExe.m: 0

At the time this genotype first appeared, the system had executed this many millions of instructions, plus the remainder indicated by the InstExe.i parameter.

InstExe.i: 0

At the time this genotype first appeared, the system had executed this many instructions, plus however many millions are indicated by the InstExe.m parameter.
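The two counters described above combine into one instruction count. The following is only a minimal sketch of that arithmetic, assuming the InstExe.i remainder is always less than one million; the function name total_instructions is hypothetical and does not come from the Tierra source:

```c
/* Sketch only: combine the InstExe.m and InstExe.i header values.
   total_instructions is a hypothetical helper, not a Tierra function. */
unsigned long long total_instructions(unsigned long millions,
                                      unsigned long remainder)
{
    /* millions  = value of InstExe.m (millions of instructions)
       remainder = value of InstExe.i (assumed to be below 1,000,000) */
    return (unsigned long long)millions * 1000000ULL + remainder;
}
```

For the ancestor listed above both values are 0, so the total is 0. A genotype with InstExe.m: 3 and InstExe.i: 250000 would have first appeared after 3,250,000 instructions.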
origin: 662270168

This is the system clock time at the first origin of this genotype.

Wed Dec 26 22:56:08 1990

This is the same system clock time at the first origin of this genotype, in readable form.

MaxPropPop: 0.8306

The maximum proportion of the population of adult cells in the soup attained by this genotype.

MaxPropInst: 0.4239

The maximum proportion of space in the soup attained by adults of this genotype.

ploidy: 1

The ploidy level of this genotype (i.e., this genotype is haploid).

track: 0

Which copy of the genome will start executing at birth. This is only used when the ploidy level is greater than one (i.e., diploid).

track 0: prot
          xwr
nop_1   ; 010 110 01   0 beginning marker

track 0: prot

This tells us that the assembler code that follows is the first track, track 0. If the genotype has a ploidy of 2, a second assembler listing will follow, and it will be labeled track 1. The word prot refers to the protection bits: xwr, or x = execute, w = write, r = read.

nop_1   ; 010 110 01   0 beginning marker

This is the first line of the actual genome. The first word, nop_1, is the assembler mnemonic for one of the two no-operation instructions. The semicolon indicates the beginning of comments. The digits 010 tell us what protection this instruction will have at birth. Only the write bit is set, so this instruction will be write protected, but open to reading or execution at birth. The digits 110 are a record of which instructions were executed by this creature's own CPU (first digit) and by the CPUs of other creatures (second digit); the third digit is not used at present. These bits are set when the WatchExe parameter is set. That the first two digits are set to one indicates that this instruction was executed both by its own CPU and by the CPU of another creature (perhaps a parasite, or a lost instruction pointer). The digits 01 are the actual hexadecimal op code of the instruction. It is this value that will actually be stored in the soup.
The digit 0 just before the words ``beginning marker'' is a count of the Nth instruction in the genome. This is the first instruction, so it is numbered zero. The words ``beginning marker'' are a comment describing the intended purpose of this instruction.

If you study the code of the ancestor, you may be perplexed by the reason for including the following instructions:

zero    ; 010 110 04   4 put zero in cx
or1     ; 010 110 02   5 put 1 in first bit of cx
shl     ; 010 110 03   6 shift left cx (cx = 2)
shl     ; 010 110 03   7 shift left cx (cx = 4)
mov_cd  ; 010 110 18   8 move cx to dx (dx = 4)

In the original version of the simulator, the size of the templates was determined by the value in the dx register. These five instructions loaded the dx register with the value 4, which is the size of the templates in this creature. Later, it was decided that this was a stupid way to determine template sizes. Now the parser just looks to see how many nops follow any instruction using them, and the number of consecutive nops determines the template size. Therefore, these five instructions don't do any useful work in the present model, but they have been left in place because the code still works.

8.2) Writing a Creature

If you write your own creature, you must obey the following conventions:

**** begin genome file (note blank line at top of file)

format: 1  bits: 3
genotype: 0080aaa  parent genotype: 0666god

track 0: prot
          xwr
nop_1   ; 010
nop_1   ; 010
**** end genome file

Yank the above lines into the file you are going to write, to use as a template. You must have the following:

1) a blank line at the top of the file.

2) a line declaring the format and bits, just use the line given.

3) a line stating the genome size and three letter name, and that of the parent genotype. The genome size must match the actual number of instructions in the genome.
The three letter name is arbitrary, you can make up any name, but I advise using a low letter name like aaa because these names are used in a base 26 numbering system by the genebanker, and the genebanker must allocate an array as big as the largest of these numbers. You may make up the parent genotype size and age, it won't be used for anything, so its details don't matter, but it should have the format of four numeric digits followed by three letters.

4) a blank line

5) the line: track 0: prot, just use the line provided

6) the line: xwr, just use the line provided

7) the listing of assembler mnemonics, followed by a semicolon and a three digit code indicating the protection at birth. I recommend that you use the protection indicated.

The listing of the 32 assembler mnemonics can be found at the end of the soup_in.h file. For a description of what they actually do, study the comments on the code of the ancestor listed above, and study the corresponding parser and execute functions in the two modules parse.c and instruct.c.

9) IF YOU WANT TO MODIFY THE SOURCE CODE

If you make some significant improvements to Tierra, we would welcome receiving the source code, so that we may integrate it into our version, and then make it available to others.

All lines of source code should be 78 characters or less, or it will mess up the formatting of the code for distribution.

The simulator has been designed so that it can be brought down, and then brought back up where it left off. This means that there can be no static local variables. Any variables that hang around must be global. They are declared and defined in soup_in.h if they are also soup_in parameters. Otherwise they are declared in declare.h, and all global variables are declared as externals in extern.h.

The code for bringing the simulator up and down is in the tsetup.c module. The system is brought up by GetSoup(), which calls GetAVar() to read soup_in. All soup_in variables are read by the GetAVar() function.
If a new simulation is being started, GetSoup() calls GetNewSoup(). If an old simulation is being restarted, GetSoup() calls GetOldSoup(). GetOldSoup() will read all global variables not contained in soup_in, and will also read in all arrays, such as the soup, the cells array, and the free_mem array.

When the simulator goes down, and periodically during a run, all global variables are written to a file soup_out, and all global arrays such as soup, the cells array, the free_mem array, and the random number generator array, and some structures, are written to a binary file called core_out. Thus if you create any new global variables or arrays, be sure they are read by GetOldSoup(), and written by WriteSoup().

There are several obvious projects that I would like to comment on:

9.1) Creating a Frontend

All I/O to the console is routed through the frontend.c module, so that it can be handled by a variety of front ends now under development. The simplest of these just uses printf to write to standard out. The frontend.c module is just a sketch at the moment. If you are going to work on the frontend, please get back to us for an updated version of the frontend.c module. The module is guaranteed to have been completely rewritten by the end of October 1991.

9.2) Creating New Instruction Sets

If you want to create a new instruction set, more power to you. The relevant modules to study are: instruct.c, parse.c, soup_in.h, arginst.h, and configur.h. You will also need to study the definitions of struct cpu, struct InstDef, struct ArgInstDef, and struct inst, all in the tierra.h module. Note that the cpu structure includes an array of registers. The idea is that you may change the size of this array to make just about any changes you might want to the CPU architecture. You should avoid actually having to alter the structure definition in the tierra.h file.
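As a rough illustration of the register-array idea, the sketch below takes the array size from a single compile-time constant, so that code touching the registers never hard-codes how many there are. All of the names here (NUMREG_SKETCH, struct cpu_sketch, reset_cpu, num_registers) are hypothetical. This is not the struct cpu defined in tierra.h, and the real structure has other members.

```c
#include <string.h>

/* Hypothetical constant: changing this one value changes the CPU
   architecture of the sketch without editing the structure below. */
#ifndef NUMREG_SKETCH
#define NUMREG_SKETCH 4
#endif

/* Hypothetical stand-in for a CPU structure; NOT the one in tierra.h. */
struct cpu_sketch {
    int re[NUMREG_SKETCH];  /* array of general purpose registers */
    int ip;                 /* instruction pointer */
};

/* Number of registers, derived from the array itself. */
int num_registers(const struct cpu_sketch *c)
{
    return (int)(sizeof c->re / sizeof c->re[0]);
}

/* Zero every register and the instruction pointer, whatever the count. */
void reset_cpu(struct cpu_sketch *c)
{
    memset(c->re, 0, sizeof c->re);
    c->ip = 0;
}
```

Compiling the sketch with -DNUMREG_SKETCH=8 gives an eight register CPU with no change to the structure definition or to the two functions.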
9.3) Creating New Slicer Mechanisms

If you want to experiment with artificial rather than natural selection, consider that selection is both a carrot and a stick. The carrot in this model is CPU time, which is allocated by the slicers. The stick is the reaper. If you want to try to evolve algorithms that do useful work, your evaluation functions should be embedded into the slicer, and should allocate more CPU time to creatures that rank high.

9.4) Creating a Sexual Model

Sex emerges spontaneously in runs whenever parasites appear. However, this sex is primitive and disorganized. I believe that the easiest way to engineer organized sex is to work with diploid creatures. The infrastructure to allow multiple ploidy levels is already in place. Notice that the definition of Instruction, the type of which the soup is composed, is:

typedef struct Inst Instruction[PLOIDY];

This means that if PLOIDY is defined as two, there are two parallel tracks for genomes. The instruction pointer will run down the track specified by the ce->c.tr variable in the cpu structure. We have not implemented any other controls over the tracking of the instruction pointer in diploid or higher models. This is future work.

9.5) Creating a Multi-cellular Model

Multi-cellularity was the hallmark of the Cambrian explosion of diversity, and thus is likely a biological feature worth including in Tierra. Also, it is likely that a multi-cellular model is the appropriate one for evolving large application programs on massively parallel machines.

How can we implement multi-cellularity? What does it mean in the context of Tierran creatures? Consider that at the conceptual core, multi-cellularity means that the mother cell determines what portion of the genome its daughter cell will express. For many daughter cells, the mother cell narrows their options by preventing them from expressing (executing) large portions of their genome (code).
In the organic world this is done by loading the daughter cell with regulatory proteins which determine which genes will be expressed. In the Tierran world, the same result can be achieved by allowing the mother cell to set the position of the instruction pointer in the daughter cell, and also the initial values of the CPU registers. These acts can place the daughter cell into a portion of its code from which it may never be able to reach certain other parts of its code. In this way the mother cell determines what parts of the code are executed by the daughter.

To facilitate this process, the divide instruction has been broken into three steps:

1) Create and initialize a CPU for the daughter.

2) Start the daughter CPU running.

3) Become independent from the daughter by losing write privileges on the daughter space.

Now, between steps 1 and 2, the mother can place values into the CPU registers and instruction pointer of the daughter. This will require an inter-CPU move instruction. The divide instruction takes an argument that determines which of the three steps is being performed.

10) KNOWN BUGS

When Tierra runs, if the genebanker is on, a growing number of genomes will accumulate in RAM, causing memory usage to increase throughout a run. This will eventually lead to a memory allocation failure on DOS systems, or to thrashing on Unix systems due to the need to use virtual memory. The parameter RamBankSiz is designed to prevent the accumulation of too many genomes in the RAM bank, by swapping out the least used genomes when there are more than RamBankSiz genomes in the genebank. At present this memory management does not work. Even when this is fixed, memory demands will still grow during a run because the genebanker must keep track of genomes swapped out to disk.

When compiled with a Borland C compiler, Tierra will use the farrealloc() function to realloc several arrays during a run.
The farrealloc() function is supposed to be able to reallocate arrays larger than 64K. Unfortunately the function does not work for arrays larger than 64K in most versions of Borland's compilers. The most recent versions of Borland C++ have fixed this bug. If you have an older version of the compiler, you can usually avoid the problem by setting

CellsSize = SoupSize / 100

This should prevent the need to reallocate the Cells array, which is what usually generates the problem. Just be sure that the initial value of CellsSize is large enough that it does not need to be increased.

When the system is brought down, and then brought back up where it left off, it continues writing birth and death records to the tierra.run or break.X files. However, if the system comes down due to being killed or due to a hardware crash, when it is brought back up, it will resume execution from the state when the simulator was last saved (see SaveFreq variable in section 5 above). The problem is that the birth and death records will now be appended to the end of a file that contains all records up to the last buffered write before the crash. This means that the last part of the birth and death records will be incorrect. This bug will be fixed soon.

Tom Ray
University of Delaware
School of Life & Health Sciences
Newark, Delaware 19716
ray@tierra.slhs.udel.edu
ray@life.slhs.udel.edu
ray@brahms.udel.edu
302-451-2281 (FAX) 302-451-2753