MVSP - A MultiVariate Statistical Package

MVSP is a menu-driven, easy-to-use program for analyzing data using multivariate numerical techniques. The available procedures are principal components analysis, principal coordinates analysis, correspondence analysis, and cluster analysis. A number of similarity and distance measures and diversity indices are also available.

MENU CHOICES may be made by moving the reverse highlighted cursor to the appropriate choice, using the up and down cursor keys (8 & 2 on the numeric keypad), or by typing the letter preceding the menu choice. Help may be obtained by placing the cursor on the desired option and pressing the F1 key.

DATA FILES may be produced with the MVSP data editor or any word processor which creates ASCII files. The rows of the data matrix are the variables and the columns are the individual cases (objects or samples).

ANALYSES are performed by choosing the appropriate menu selection and entering the name of the data file. A menu of options, with the default choices, is presented. You may change any option to tailor the analysis to your needs. Choosing 'RUN' then performs the analysis with those options.

OUTPUT may be sent to the printer, the screen only, or to a file for further manipulation or entry into a graphics program. The file may be edited with a word processor to put it in the proper format for this purpose.

MVSP - (C) Copyright Warren L. Kovach 1986 - 1990, All Rights Reserved

COMMENTS, COMPLAINTS, COMPLIMENTS, AND BUGS should be addressed to:
    Dr. Warren L. Kovach
    Institute of Earth Studies
    University College of Wales
    Aberystwyth, Wales SY23 3DB, GREAT BRITAIN

This is the registered version, MVSP Plus. It is not shareware and may not be freely distributed. You are given a licence to make copies of this program for archival purposes only. See manual for details.

Save Defaults Help Screen

This option allows you to save all current settings for user specified defaults to the configuration file MVSP.CNF. These defaults include those in the configuration menus and those specified in the opening menus for each analytical procedure.

This configuration file is read each time the program is run, and the default settings are reinstated. The file MVSP.CNF should be in the same directory as the program files MVSP.EXE and MVSP.HLP. Alternatively, you may specify where these files are to be found using the DOS SET command:
    SET MVSP=x:\directory
in which "x:" is the disk drive and "\directory" is the directory name.

Quit Menu Help Screen

This allows you to quit this menu and return to the previous menu.

Quit Analysis Help Screen

This option quits this menu and returns to the main menu, rather than continuing on with the analysis. You may then choose to run another analysis or quit MVSP.

Printed Output Help Screen

This option allows you to specify what results and other additional information (such as raw data) should be output and where they should be sent (i.e. screen, printer, or a file).

Your choices of output options may be saved to the configuration file for later recall.

Transform Data Help Screen

You may choose to have your data transformed before analysis. This option allows you to choose square root, log (base 2, 10, or e) or logratio transformation.

The log transformations are performed on the values of (X + 1) to avoid computer errors when the datum is 0, as the log of 0 is undefined, and to avoid negative results when the datum is < 1. The square root transformation is performed on the original datum.

The centered logratio transformation replaces data with the log of the ratio between the data value and the geometric mean of the sample.
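
The three transformations described on the Transform Data screen can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not MVSP's own code: the function names are hypothetical, a "sample" is taken to be one column of the data matrix, and the logratio function assumes every value is greater than zero (the help screen does not say how zeros are handled).

```python
import math

def log_transform(values, base=10.0):
    """log(x + 1) in the chosen base, so that a datum of 0 maps to 0."""
    return [math.log(x + 1.0, base) for x in values]

def sqrt_transform(values):
    """Square root of the original datum (no +1 offset)."""
    return [math.sqrt(x) for x in values]

def centered_logratio(sample):
    """log(x / g), where g is the geometric mean of the sample.

    Assumption: all values are > 0, since log(0) is undefined.
    """
    logs = [math.log(x) for x in sample]
    mean_log = sum(logs) / len(logs)  # this is log(g)
    return [lg - mean_log for lg in logs]

if __name__ == "__main__":
    sample = [0.0, 9.0, 99.0]
    print(log_transform(sample, base=10))
    print(sqrt_transform(sample))
    print(centered_logratio([1.0, 10.0, 100.0]))
```

Because the logratio values are centered on the log of the geometric mean, they always sum to zero within a sample.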

Run Analysis Help Screen

This initiates the running of the analysis, using the options chosen from this menu. If you are having the results output to files, you will be prompted for filenames.

As the analysis is running, you will see status messages telling you what is currently being done.

Print Data Help Screen

You may choose to have the data printed out so that you can check that they have been input correctly.

The page format for this printout may be modified through the data output format section of the program defaults menu (type P O from the analysis menu). You may change the column width, page width, and decimal places. A narrower column width will allow more data to be printed per page.

Print Transformed Data Help Screen

You may choose to have the transformed data printed out so that you can see the results of the log or square root transformation.

The page format for this printout may be modified through the results output format section of the program defaults menu (type P O from the analysis menu). You may change the column width, page width, and decimal places. A narrower column width will allow more data to be printed per page.

Output Destination Help Screen

You may use this option to direct program results to a printer, file, or to the screen. All results will be sent to the chosen device, along with any of the optional output chosen from this menu.

If you choose file or printer only, then only status messages will appear on the screen; otherwise the output will also scroll by on the screen.

If output is to be directed to a file, you will be prompted for a file name before the analysis begins.

Transpose Data Help Screen

This option transposes the data matrix before analysis. Normally, you would have the data file set up with the objects of interest as the columns and the characters or variables defining those objects as rows.
Thus, in an ecological study, the samples would be columns and the taxa rows. In a taxonomic study, the taxa would be columns and the characters rows.

This option allows you to "flip" the data matrix so that the rows of the matrix are considered the objects of interest.

Minimum Eigenvalue Help Screen

This option allows you to specify the minimum eigenvalue to report. Any eigenvalues (and their associated eigenvectors and scores) less than this value will not be reported.

You may enter a specific value for the minimum eigenvalue, or you may choose one of two rules for determining the most appropriate minimum eigenvalue. Help screens for the Minimum Eigenvalue menu explain these rules.

You may also have all eigenvalues and vectors printed out (the same number as the rows of the data matrix).

Accuracy of Solution Help Screen

This value controls the accuracy of the eigenanalysis. The form of eigenanalysis used in MVSP makes repeated approximations of the solution, stopping when the desired accuracy level is reached. A lower value for desired accuracy will give a more accurate solution, but will also cause the program to run longer.

The accuracy value corresponds APPROXIMATELY to the number of significant digits of accuracy in the major eigenvalues and eigenvectors (those accounting for more than 10% of the variance). A value of 1.0E-6 (0.000001) should give a solution with at least six significant digits of accuracy.

Change Program Defaults Help Screen

This option allows you to change many of MVSP's default settings and save these new defaults to a configuration file for later recall.
The following options may be changed:
    Disk drive for temporary storage of work files
    Default data file pathname and data filename extensions
    Printed output format: page width, column width, and number of decimal places
    Codes to send to printer for setting printer mode and resetting printer
    Screen colors and screen output method (fast or slow)
    Form of graphic output
Defaults saved to the configuration file will be automatically reinstated every time MVSP is run.

Graph Results Help Screen

You may choose to have scatter diagrams of the ordination scores drawn on the screen and printer. The graph can be drawn in graphics mode or character mode. You can choose which form to use through the Output Format menu in the Program Defaults menu (option P of main and analysis menus).

Graphics mode plots may be printed to an Epson-compatible dot-matrix printer by pressing 'P' when the graph is displayed on the screen. You may also use the DOS GRAPHICS command, which allows you to dump a graphics screen to a printer using the Shift-PrtSc key combination. This may provide support for other types of printers. See your DOS manual for details.

Matrix Input Help Screen

This option allows you to choose the format for the input matrix. The file may contain only the lower or upper half of the symmetrical matrix, with or without the diagonal, or the full matrix.

    1                      0.2 0.3 0.4      1   0.2 0.3 0.4
    0.2 1                      0.1 0.2      0.2 1   0.1 0.2
    0.3 0.1 1                      0.8      0.3 0.1 1   0.8
    0.4 0.2 0.8 1                           0.4 0.2 0.8 1

    Lower half          Upper half          Full matrix
    with diagonal       no diagonal

Principal Components Analysis Help Screen

Performs an R-mode principal components analysis of the input data, with a variety of options, including centered or non-centered data, covariance or correlation matrix, raw or transformed (log or square root) data.

In this analysis, the rows of the data matrix are the variables.
Component loadings are provided for each row, indicating the 'influence' of that variable on that PCA axis. Component scores are then provided for each column, the objects of interest. These scores are the elements which are usually plotted, although the loadings may also be plotted.

Principal Coordinates Analysis Help Screen

Performs a Q-mode principal coordinates analysis. The input data are in the form of a symmetrical distance matrix. The distance measure must be metric, and any of the measures on the "Distances" menu of the Distances & Similarities procedure of MVSP are suitable, with the exception of the Squared Euclidean Distance. None of the similarity measures are suitable.

Correspondence Analysis Help Screen

Performs a correspondence analysis (or reciprocal averaging) of the input data, with a variety of options, including weighting of scores and raw or transformed (log or square root) data.

In this analysis, the rows of the data matrix are the variables. CA scores are provided for both rows and columns; these are plotted separately here but they may also both be plotted on the same graph.

Similarities and Distances Help Screen

Calculates a variety of similarity and distance measures. The resulting symmetric matrix may be output to a separate matrix file which can then be read by the cluster or principal coordinates analysis procedures or transferred to other programs.

The measures are calculated in a pairwise fashion between the columns of the raw data matrix. The data may be transposed before calculation so that the measures are between the rows of the original data matrix instead.

Cluster Analysis Help Screen

Performs cluster analysis of a matrix of similarity or distance measures. The input file must be a symmetric matrix, and may be a lower half matrix, upper half matrix, or full matrix. The diagonal may be present or absent in the half matrices.

Seven clustering procedures are available, including minimum variance clustering, which is limited to using squared Euclidean distances, which are calculated by the similarity/distance procedure. A description of the dendrogram may be output to a file for plotting using PLOTGRAM.

Diversity Indices Help Screen

Calculates diversity indices for each column (sample) of the data matrix.

Three diversity measures are available: Shannon, Simpson, and Brillouin. Evenness is also calculated by dividing the diversity index by the log of the species count. You are given a choice of the log base to use in the analysis. The number of species per sample is also given.

Data Manipulation Help Screen

This procedure allows for the creation and manipulation of data files. You may use the MVSP data editor to enter and edit the data matrix. There are also options available which allow you to log or square-root transform the data, transpose the data matrix, convert the data to percentages or the octave abundance scale, and to drop rows or columns.

Quit MVSP Help Screen

Ends current MVSP session and returns to the DOS prompt.

If you have made any changes to the program options which you wish to save for use in future sessions, this must be done before you exit. Changes may be saved to the configuration file MVSP.CNF by choosing the "Save defaults" option in the Program Defaults menu (option P in main menu) or in the opening menu for each statistical procedure.

Thank you for using MVSP!

Change Drive or Subdirectory Help Screen

This option allows you to specify the disk drive and directory in which your data files will reside. When prompted, you may enter either a disk drive designation (e.g. "B:") or a valid DOS pathname (e.g. "C:\MVSP\DATA"). The program will then look in that directory or disk for the data files and will write all output files there.
This default may be overridden by entering the full pathname of a file when the program requests a filename.

Center Data Help Screen

You may choose to have the data centered around the origin before the similarity matrix is calculated, or to leave the data uncentered. Generally, the data are centered unless there is a specific reason for not centering.

An uncentered PCA can be useful when the objects being analyzed fall into two or more distinct groups, with little overlap. When such data are analyzed with the uncentered option chosen, these distinct groups will be associated with different axes.

Standardize Data Help Screen

You may choose to have the data either standardized by rows, thus forming a correlation matrix, or not standardized, forming a covariance matrix.

Standardization is desirable to reduce the effects of dominant species or to standardize different units of measurement. Otherwise, the covariance matrix should be used.

Print Similarity Matrix Help Screen

You may choose to have the covariance or correlation matrix printed out before the eigenanalysis is performed. This is printed as a full matrix with the diagonal present.

The page format for this printout may be modified through the output format section of the program defaults menu (type P O from the analysis menu). You may change the column width, page width, and decimal places. A narrower column width will allow more data to be printed per page.

Print Sorted Data Help Screen

This option will cause the raw data to be printed out sorted by the first axes of the ordination. If these first axes account for much of the variance in the data, the sorted data matrix will clearly show the patterns.

This output is most effective when the data matrix is all in one, or a few, blocks.
If you have a wide carriage printer or a dot matrix printer capable of printing in compressed mode, set MVSP to print out pages 130 columns wide and to use columns as narrow as possible. If you have 25 or fewer columns in your data matrix, a column width of 5 and a 130 column page will allow the whole data matrix to be printed in one block. These width options can be changed through the Program Defaults menu (option P on main or analysis menu).

Weighting Strategy Help Screen

You may choose to have the scores weighted in a variety of ways, based on the formulae of Orloci (1978). Rare or common species may be upweighted by multiplying the original score by either the ratio of the total abundance for each species to the total abundance of all species, or the inverse of that ratio. Alternatively, the scores may be adjusted to a percent scale, so that the values for each axis range from 0 to 100.

Matrix Output Help Screen

This option allows you to choose the format for the output matrix. You may have only the lower or upper half of the symmetrical matrix output, with or without the diagonal, or the full matrix.

    1                      0.2 0.3 0.4      1   0.2 0.3 0.4
    0.2 1                      0.1 0.2      0.2 1   0.1 0.2
    0.3 0.1 1                      0.8      0.3 0.1 1   0.8
    0.4 0.2 0.8 1                           0.4 0.2 0.8 1

    Lower half          Upper half          Full matrix
    with diagonal       no diagonal

Coefficient Help Screen

You may choose one of a variety of distances or similarity coefficients to calculate for these data. Twenty coefficients are available on two menus. The first menu contains distance measures and the second contains similarity measures (both quantitative and binary).

These coefficients are calculated in a pair-wise fashion between the columns of the data matrix.

The help screens for the coefficient menus present the mathematical formulae for each coefficient. In these formulae, i and j represent the columns of the data matrix, while k represents the rows.
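
The pairwise calculation between columns, and the half-matrix layouts shown on the Matrix Output screen, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not MVSP's implementation: the function names are hypothetical, the data matrix is a list of rows, and `abs_difference` is only a stand-in measure chosen to keep the example short.

```python
def pairwise_matrix(data, measure):
    """Apply `measure` to every pair of columns (i, j) of `data`.

    `data` is a list of rows; each row k holds one value per column.
    """
    columns = list(zip(*data))
    n = len(columns)
    return [[measure(columns[i], columns[j]) for j in range(n)]
            for i in range(n)]

def lower_half(matrix, diagonal=True):
    """Keep the entries on or below the diagonal of each row."""
    return [row[: i + 1 if diagonal else i] for i, row in enumerate(matrix)]

def upper_half(matrix, diagonal=True):
    """Keep the entries on or above the diagonal of each row."""
    return [row[i if diagonal else i + 1:] for i, row in enumerate(matrix)]

def abs_difference(col_i, col_j):
    """Stand-in measure: sum over rows k of |x_ik - x_jk|."""
    return sum(abs(a - b) for a, b in zip(col_i, col_j))

if __name__ == "__main__":
    data = [[1, 2, 4],
            [3, 5, 5]]          # 2 rows (variables), 3 columns (objects)
    full = pairwise_matrix(data, abs_difference)
    for row in lower_half(full, diagonal=True):
        print(" ".join(str(v) for v in row))
```

Any of the coefficients on the two menus could be passed in place of the stand-in measure, since each takes a pair of columns and returns a single number.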

Print Results Matrix Help Screen

You may have the resulting distance or similarity matrix printed out to the printer or a file by setting this option to YES. If this is set to NO, then the results matrix will only be sent to the output matrix file, for input to the clustering or PCO procedures.

In the formulae on the following screens, $x_{ik}$ and $x_{jk}$ are the values of the kth element of the ith or jth variable, and n is the number of elements in each variable.

Euclidean Distance:

\[ Ed_{ij} = \left[ \sum_k \left( x_{ik} - x_{jk} \right)^2 \right]^{1/2} \]

Squared Euclidean Distance:

\[ SEd_{ij} = \sum_k \left( x_{ik} - x_{jk} \right)^2 \]

Standardized Euclidean Distance:

\[ StEd_{ij} = \left[ \sum_k \left( \frac{x_{ik} - x_{jk}}{sd_k} \right)^2 \right]^{1/2} \]

where: $sd_k$ = standard deviation of all the elements of k

Cosine Theta (Normalized Euclidean) Distance:

\[ CTd_{ij} = \left[ \sum_k \left( \frac{x_{ik}}{ss_i} - \frac{x_{jk}}{ss_j} \right)^2 \right]^{1/2} \]

where: $ss_i = \left[ \sum_k x_{ik}^2 \right]^{1/2}$, and likewise for $ss_j$

Manhattan Metric Distance:

\[ MMd_{ij} = \sum_k \left| x_{ik} - x_{jk} \right| \]

Canberra Metric Distance:

\[ CMd_{ij} = \sum_k \frac{\left| x_{ik} - x_{jk} \right|}{x_{ik} + x_{jk}} \]

Chord Distance:

\[ Cd_{ij} = \left[ \sum_k \left( x_{ik}^{1/2} - x_{jk}^{1/2} \right)^2 \right]^{1/2} \]

Chi-Square Distance:

\[ CSd_{ij} = \left[ \sum_k \frac{\left( x_{ik} - x_{jk} \right)^2}{\sum_l x_{lk}} \right]^{1/2} \]

where: $x_{lk}$ = value of the kth element of the lth variable

Average Distance:

\[ Ad_{ij} = \left[ \frac{\sum_k \left( x_{ik} - x_{jk} \right)^2}{n} \right]^{1/2} \]

Mean Character Difference Distance:

\[ MCd_{ij} = \frac{\sum_k \left| x_{ik} - x_{jk} \right|}{n} \]

More Coefficients Help Screen

This option will take you to a second menu of coefficients containing a number of similarity coefficients.

Pearson Product Moment Correlation Coefficient:

\[ Pcc_{ij} = \frac{\sum_k \left( x_{ik} - \bar{x}_i \right) \left( x_{jk} - \bar{x}_j \right)}{\left[ \sum_k \left( x_{ik} - \bar{x}_i \right)^2 \right]^{1/2} \left[ \sum_k \left( x_{jk} - \bar{x}_j \right)^2 \right]^{1/2}} \]

where: $\bar{x}_i$, $\bar{x}_j$ = mean value of the ith or jth variable

Spearman Rank Order Correlation Coefficient:

\[ Scc_{ij} = 1 - \frac{6 \sum_k \left( r_{ik} - r_{jk} \right)^2}{n^3 - n} \]

where: $r_{ik}$, $r_{jk}$ = rank order of kth element in the ith or jth variable

Percent Similarity Coefficient:

\[ Cc_{ij} = 200 \, \frac{\sum_k \min \left( x_{ik}, x_{jk} \right)}{\sum_k \left( x_{ik} + x_{jk} \right)} \]

where: min = minimum of two values

Gower General Similarity Coefficient:

\[ GGSc_{ij} = \frac{\sum_k w_{ijk} \, s_{ijk}}{\sum_k w_{ijk}} \]

where: $s_{ijk} = 1 - \left| x_{ik} - x_{jk} \right| / \mathrm{range}(k)$ for quantitative data;
       $s_{ijk}$ = 1 for matches of binary or multistate data, otherwise = 0
       $w_{ijk}$ = 0 for negative matches of binary data, otherwise = 1

Data types are declared by the first two characters of the data labels: "B_" for binary, "M_" for multistate, anything else for quantitative.

In the four binary coefficients that follow, a, b, c, and d are the frequencies of matches or mis-matches in variables i and j:

                       Var.j
                    Pres.  Abse.
                  ┌───────────
      Var.i Pres. │   a      b
            Abse. │   c      d

Sorensen's Coefficient:

\[ Sc_{ij} = \frac{2a}{2a + b + c} \]

Jaccard's Coefficient:

\[ Jc_{ij} = \frac{a}{a + b + c} \]

Simple Matching Coefficient:

\[ SMc_{ij} = \frac{a + d}{a + b + c + d} \]

Yule Coefficient:

\[ Yc_{ij} = \frac{ad - bc}{ad + bc} \]

More Coefficients Help Screen

This option will take you to a second menu of coefficients containing a number of distance coefficients.
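
The four binary coefficients (Sorensen, Jaccard, Simple Matching, Yule) all derive from the same a, b, c, d counts, which a short Python sketch can make concrete. The function names are hypothetical and the code is only an illustration of the formulae as given on the help screens; it assumes presence is coded as a non-zero value and does not guard against a zero denominator (for example, two columns with no presences at all).

```python
def match_counts(col_i, col_j):
    """Count a (both present), b (i only), c (j only), d (both absent)."""
    a = b = c = d = 0
    for xi, xj in zip(col_i, col_j):
        if xi and xj:
            a += 1
        elif xi:
            b += 1
        elif xj:
            c += 1
        else:
            d += 1
    return a, b, c, d

def sorensen(a, b, c, d):
    return 2 * a / (2 * a + b + c)

def jaccard(a, b, c, d):
    return a / (a + b + c)

def simple_matching(a, b, c, d):
    return (a + d) / (a + b + c + d)

def yule(a, b, c, d):
    return (a * d - b * c) / (a * d + b * c)

if __name__ == "__main__":
    col_i = [1, 1, 0, 0, 1]
    col_j = [1, 0, 1, 0, 1]
    counts = match_counts(col_i, col_j)   # a=2, b=1, c=1, d=1
    for name, fn in [("Sorensen", sorensen), ("Jaccard", jaccard),
                     ("Simple matching", simple_matching), ("Yule", yule)]:
        print(name, round(fn(*counts), 3))
```

Note that only the Simple Matching and Yule coefficients use d, the count of joint absences; Sorensen and Jaccard ignore it.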

Clustering Method Help Screen

Seven different clustering methods are available. These different methods are based on the way in which the distances between newly fused groups and all other groups or objects are calculated.

The minimum variance method is restricted to using squared Euclidean distance matrices as input, but all other methods may use any distance or similarity matrix as input. However, the user should see works such as Greig-Smith (1983) which discuss the properties of various combinations of distance measures and clustering methods.

Tree Description File Help Screen

You may have a description of the completed dendrogram output to a file, where it can be used to plot a dendrogram using the PLOTGRAM program. This description is composed of nested parentheses and commas that delineate the groups that are fused together in the cluster diagram, and ':' followed by numbers that indicate the length of a branch.

Tree Order File Help Screen

You may have the order of the objects in the dendrogram output to a file. This file can be used as the basis for a translation table for the PLOTGRAM program, allowing names longer than the 8 character limit for MVSP to be plotted. The file can also be used as input for the SORTDATA program, which prints out the data matrix in graphic form, sorted in dendrogram order.

Randomize Input Order Help Screen

Normally, the clustering procedure scans through the data matrix sequentially to find the most similar objects, and fuses the first two most similar objects found. If, however, a number of objects are of equal similarity, a different dendrogram can be produced by fusing other objects of equal similarity first. This option allows you to randomize the input order, so that other possible dendrograms can be generated.

Note that the order of objects in the dendrogram can be changed without changing the actual hierarchy of clusters.
When comparing dendrograms, make sure to focus on the branching order, NOT on the order of the object labels.

Constrained Clustering Help Screen

In constrained clustering, only objects which are next to each other in the data matrix are considered for fusion, so that the order of objects in the final dendrogram is the same as the order in the data matrix.

This is particularly useful when the original order of the objects is significant, such as with stratigraphic geological data.

Unweighted Pair Group (UPGMA)

In this average linkage clustering method, the distance between two groups is defined as the average of all distances between an object in one group and one in the other.

This measure is considered unweighted because each object in the cluster is given equal weight when the average is calculated, whereas weighted clustering places equal emphasis on each group, regardless of the number of objects in the group. The unweighted method is generally considered better except under certain circumstances (see Weighted Pair Group help screen).

Unweighted Centroid

In this average linkage clustering method, the distance between two groups is defined as the distance between the centroids of the two groups. This has the disadvantage that reversals in the dendrogram can form, in which two objects that are clustered can be more distant from each other than their centroid is from another object.

This measure is considered unweighted because each object in the cluster is given equal weight when the average is calculated, whereas weighted clustering places equal emphasis on each group, regardless of the number of objects in the group.

Weighted Pair Group (WPGMA)

In this average linkage clustering method, the distance between two groups is defined as the average of all distances between an object in one group and one in the other.

This measure is considered weighted because each cluster is weighted by the number of objects in the group, so that each group has equal weight, rather than each object. This is useful in cases where one group with many objects may dominate groups with fewer objects when unweighted methods are used.

Weighted Centroid (Median)

In this average linkage clustering method, the distance between two groups is defined as the distance between the centroids of the two groups. This has the disadvantage that reversals in the dendrogram can form, in which two objects that are clustered can be more distant from each other than their centroid is from another object.

This measure is considered weighted because each cluster is weighted by the number of objects in the group, so that each group has equal weight, rather than each object. This is useful in cases where one group with many objects may dominate groups with fewer objects when unweighted methods are used.

Minimum Variance

This clustering method uses a measure of within-group dispersion for each cluster. This is the sum-of-squares of the distance from each point to the centroid of the cluster.

During clustering, the next two objects or groups to be fused are chosen so that the within-group dispersion increases by the least amount.

Unlike other clustering methods, this one is restricted to using squared Euclidean distance as the input matrix. Any attempt to use another type of distance or similarity measure will generate a warning.

Nearest Neighbor

In this clustering method, the distance between two groups is taken to be the distance between the two points, one from each group, which are closest. Thus the 'nearest neighbors' from each group represent the whole group.

This method has a tendency towards 'chaining', in which single objects are continually added to one large cluster, forming a straggling staircase-like dendrogram.
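
The nearest neighbor rule, and the chaining it tends to produce, can be seen in a small Python sketch. This is a naive illustration written for clarity rather than speed, and it is not MVSP's algorithm: the function name is hypothetical, the input is assumed to be a full symmetric distance matrix, and ties are resolved by taking the first closest pair found in a sequential scan.

```python
def nearest_neighbor_clusters(dist):
    """Agglomerate objects using the nearest neighbor (single linkage) rule.

    `dist` is a full symmetric distance matrix as a list of lists.
    Returns a list of (group_a, group_b, distance) fusions in the order made.
    """
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Group distance = distance between the two closest members.
                d = min(dist[i][j] for i in clusters[p] for j in clusters[q])
                if best is None or d < best[0]:   # strict '<': first tie wins
                    best = (d, p, q)
        d, p, q = best
        merges.append((clusters[p], clusters[q], d))
        fused = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(fused)
    return merges

if __name__ == "__main__":
    points = [0, 1, 3, 7]                 # objects placed along a line
    dist = [[abs(a - b) for b in points] for a in points]
    for group_a, group_b, d in nearest_neighbor_clusters(dist):
        print(group_a, "+", group_b, "at distance", d)
```

In this example each fusion adds a single object to the one growing cluster, which is the staircase pattern the help screen describes as chaining.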

Farthest Neighbor

Log Base Help Screen

Logarithms are used in the calculation of evenness and most of the diversity indices. You may choose which base of logarithms to use. The logarithm will affect the size of the diversity unit, so the logarithm used should be explicitly stated in any publication.

Diversity Index Help Screen

MVSP can calculate three different types of diversity indices. These are Simpson's, Shannon's, and Brillouin's. See the help screens for each index for the equations for these indices.

Besides the diversity index, the number of species in each sample and the evenness will also be calculated. Evenness is defined as:

\[ E = \frac{H}{\log(n)} \]

where: H = diversity index
       n = number of species in sample

In the formulae for the three indices, N is the total number of individuals in the sample and $n_j$ is the number of individuals of the jth species.

Simpson

\[ H = 1 - \frac{1}{N(N - 1)} \sum_j n_j \left( n_j - 1 \right) \]

Shannon

\[ H = - \sum_j \left[ \frac{n_j}{N} \log \left( \frac{n_j}{N} \right) \right] \]

Brillouin

\[ H = \log N! - \sum_j \log n_j! \]

This index is appropriate if the data represent the whole of a finite population, rather than a random sample from an indefinitely large population.

Screen Colors Configuration Help Screen

This option allows you to change the colors of six parts of the screen display, including: regular text, regular background, menu text, menu background, menu frame, and error messages.

When one of the six parts of the screen display is chosen, you are then presented with a menu of the 16 colors which can be displayed on the Color Graphics Adapter (8 colors for backgrounds). You may try different color combinations until you find one you like.
An option is provided to reset the colors to black and white.

Default Data File and Work File Path Help Screen

This option allows you to specify the disk drive and directory in which your data files will reside. When prompted, you may enter either a disk drive designation (e.g. "B:") or a valid DOS pathname (e.g. "C:\MVSP\DATA"). The program will then look in that directory or disk for the data files and will write all output files there. This default may be overridden by entering the full pathname of a file when the program requests a filename.

You will also be asked for a disk drive for storage of temporary work files which are created if the data do not all fit in memory. A hard disk will be much faster than a floppy. If your machine has memory greater than 640K, set the extra up as a RAMdisk and use that for the work files.

Default Data File Extension Help Screen

This option allows you to specify different filename extensions for different types of input and output files. These extensions will be assumed if you don't include an extension when entering a filename. These defaults will be overridden if you specify another extension with the filename.

Four different extensions may be entered for the following file types:
    Raw data files (rectangular matrices)
    Symmetric matrix data files (input to PCO and cluster analysis)
    Regular output files (results of analyses)
    PCO and cluster analysis output files

Output Format Help Screen

This option allows you to change default settings for a number of aspects of the program output. These aspects include:

    Width of printed page
    Width of output columns
    Number of decimal places for output numbers
    Printer codes for setting appropriate print mode and resetting printer
    Screen output method

Reread Configuration File Help Screen

This option will reread the configuration file MVSP.CNF and reinstate all default options as specified in the file.

This is useful for when you are experimenting with different default options and wish to return to the original settings. If you have saved the options during experimentation, however, it will reinstate the options as specified at the time of the save.

Text Color

This is the color which will be used for most of the display text, including analysis results and status messages.

Special Text Color

This is the color which will be used for special display text, including error messages and help screens.

Background Color

This is the color which will be used as the background for most of the normal display text.

Background colors for the Color Graphics Adapter are restricted to the eight low intensity colors, while text can be displayed in sixteen colors, low or high intensity.

Menu Text Color

This color is used for the main text in the menu windows.

The cursor bar of the menus is created by reversing the text and background colors of the menu, so that the text is the background color and the bar is a low intensity version of the text color. This reversing process can create some unusual and unreadable results, so care must be taken in choosing the menu colors. Option F may be used to reset black and white colors if your new color combination is unreadable.

Menu Background Color

This color is used for the background of the menu windows.

The cursor bar of the menus is created by reversing the text and background colors of the menu, so that the text is the background color and the bar is a low intensity version of the text color. This reversing process can create some unusual and unreadable results, so care must be taken in choosing the menu colors. Option F may be used to reset black and white colors if your new color combination is unreadable.

Menu Frame Color

This color is used to draw the frame around the menus.
The background
of the frame will be in the background color of the main text.

The menu title which is embedded in the frame will be in the menu text color
rather than the menu frame color.

Reset Black And White

Choosing this option will reset all colors to black and white.

Remember, if any color combinations become so unreadable that you can't
read the menu, typing "F" will let you start again.

Raw Input File Default Extension Help Screen

This option allows you to specify the default extension for the raw data
matrix files used as input for the PCA, RA, similarity/distance, and
diversity index procedures.

These should be rectangular matrices, not square symmetrical, and the file
header should specify the number of both rows and columns.

PCO & Cluster Analysis Input File Default Extension Help Screen

The input files for PCO and cluster analysis, which must be square symmetric
matrices, may be given different extensions to distinguish them from
rectangular data matrix files. The PCO and cluster analysis procedures will
assume this extension unless another one is specified by the user.

The similarities/distances procedure can output a symmetric matrix
to a file in a format ready to be read by the PCO and cluster analysis
procedures. The default extension you choose with this configuration option
will be added to the original filename to produce a new filename
automatically, unless you specify another name for the symmetric matrix file.

Regular Output File Default Extension Help Screen

This option allows you to specify a default extension to be used for output
files. If you choose to have output placed into a disk file, the output file
will by default be given the same name as the input file, but with a different
extension; the one you specify here. You may also specify a different filename
for the output.

PCO & Cluster Analysis Output File Default Extension Help Screen

The output for the PCO and cluster analysis procedures may be given a
separate default extension to avoid overwriting the output file of the
similarities/distances procedure, if it was run before the PCO or
cluster analysis.

Tree Description Output File Default Extension Help Screen

This is the default extension for the file which will contain the description
of the dendrogram produced by the cluster analysis procedure. This extension
will automatically be added to the original input file name to produce a new
filename, unless you type in another name.

Note that the PLOTGRAM program, which reads these description files, assumes
an extension of .PLG unless another one is specified.

Tree Order Output File Default Extension Help Screen

This is the default extension for the file which will contain the order of
objects in the dendrogram produced by the cluster analysis procedure. This
extension will automatically be added to the original input file name to
produce a new filename, unless you type in another name.

Page Width Help Screen

This option allows you to set the default page width of printed output.
Normally, this would be 80 columns, but if you are using a printer with a wide
carriage and wide paper, or if you have a dot matrix printer which can be set
to print in compressed mode, you can set the width to 130 columns.

If you opt for using compressed mode on a dot matrix printer, you may also
use the "Printer Codes" option to automatically send the appropriate codes to
the printer for switching to compressed mode and the codes for resetting the
printer at the end of an MVSP analysis.

Results Column Width Help Screen

This option allows you to set the default column width for printed output.
All numbers and labels on the printout will be limited to this width.
If a
number to be output has more digits than the default column width, the whole
number is output, and adjacent numbers will be shifted over. Therefore,
be sure to use a column width wide enough to accommodate your data and expected
results.

With numbers, this column width is for the entire number, including the
decimal point, decimal fraction, and the space between numbers. Thus
" 2345.67" requires a column width of at least 8 spaces.

Results Decimal Place Help Screen

This option allows you to set the number of decimal places to display for
the results. All numbers will be rounded to this number of decimal places.

This setting is also used by the distances/similarities procedure when
writing values to the symmetric matrix output file, and thus the accuracy
of numbers input to the cluster analysis program will be limited by this
option. I would suggest using at least three decimal places in this
situation.

Data Column Width Help Screen

This option allows you to set the default column width for printed output of
the original data. All numbers and labels on the printout will be limited to
this width. If a number to be output has more digits than the default column
width, the whole number is output, and adjacent numbers will be shifted over.
Therefore, be sure to use a column width wide enough to accommodate your data.

With numbers, this column width is for the entire number, including the
decimal point, decimal fraction, and the space between numbers. Thus
" 2345.67" requires a column width of at least 8 spaces.

Data Decimal Place Help Screen

This option allows you to set the number of decimal places to display for
the original data. All numbers will be rounded to this number of decimal
places in the printed output, the full-screen editing of data, and the output
files for the data conversion routines.

If you are only working with whole numbers, then you may set this value to 0.
Remember, however, that if you create a new file of log transformed or
percentage data, that these will also be rounded to whole numbers, which would
not be appropriate.

This value does not affect the accuracy of data once they have been read.

Printer Codes Help Screen

This option allows you to specify printer codes which will be sent to your
printer at the beginning of an analysis. For instance, if you wish to have
the results printed in compressed mode on a dot matrix printer, you may specify
the codes which invoke the compressed mode on your printer. A separate code
may be entered to reset the printer at the end of an analysis.

Example codes for Epson-compatible printers are given when you choose this
option. See your printer manual for codes for other modes and/or other
printers.

Screen Output Method Help Screen

This option lets you toggle between two methods of screen output, direct
screen memory output and BIOS output. The direct memory method writes data
directly to the area of memory which controls the screen, while the BIOS
method uses calls to your computer's BIOS (basic input/output system).

The direct output method is much faster, but only works on computers which
are hardware-compatible with the IBM-PC (almost all IBM compatibles sold these
days are hardware-compatible). Direct output also will cause problems when
used under windowing, multitasking environments such as Microsoft's "Windows".
If you are using one of these environments, then choosing BIOS output should
allow MVSP to run correctly.

Check Video "Snow" Help Screen

On some brands of color graphics adapter boards (most notably IBM's
original), the fast method of writing directly to the screen memory can cause
interference, or "snow", on the screen.
This occurs when both the program and
the computer's operating system try to work on the screen memory at the same
time.

This option forces the program to check the screen memory before writing to
it to make sure there will be no interference. This eliminates snow, but also
slows down the output somewhat. If your graphics adapter is not susceptible
to snow, then this option should be set to "No" for optimal speed. If snow
appears, then set the option to "Yes".

Graphics Default Help Screen

This option will allow you to change a number of defaults relating to the
format of the scatterplots produced by the PCA and CA procedures. These
defaults include the type of scatterplot (produced with text characters or
drawn in graphics mode), the width of the text plots, the number of plots per
page of paper, and the manner in which to label the data points.

Scatterplot Type Help Screen

MVSP can produce scatterplots in two ways, in graphics mode where the plot
is drawn on the screen using lines and dots, or in text mode where the graph
is made up of characters such as '|' and '-'. Text mode plots are less
precise, since the placement of the points is limited to a grid of 70x22 or
110x55 characters. Graphics mode plots can only be printed on Epson-compatible
dot matrix printers, or other printers supported by the DOS GRAPHICS screen
dump facility.

Graphics mode plots can only be drawn on computers with graphics adapters
(CGA, EGA, VGA, or Hercules). If you have a monochrome adapter (MDA), the
plots will automatically be drawn in text mode.

Wide Text Plots Help Screen

You may choose to have the text mode graphs plotted so that they fit an 80
column screen and piece of paper, or you can have wide graphs printed out.
The wide plots are 120 columns wide and 60 rows long, so that they must either
be printed on a wide carriage printer, or your printer must be set to print in
compressed mode.
These wide plots are more accurate than the 80 column plots,
since more columns and rows are being plotted per graph.

If you use the tiny print mode (compressed superscript with 12 lines per
inch), then two graphs can be plotted per page; otherwise set the Plots Per
Page option to 1.

Plots Per Page Help Screen

This allows you to specify how many plots to draw before the program issues
a form feed command to your printer to advance to the next page. In text mode
and high resolution (EGA, VGA, Hercules) graphics mode you will normally be
able to fit two plots per page. The printouts of the lower resolution plots
created on a CGA adapter will be smaller, and three can fit on one page. If
you are using the DOS GRAPHICS command to do a screen dump of the plot, then
set this option to one plot per page, as GRAPHICS usually rotates the screen
image to fill a whole sheet of paper.

Various combinations of wide text plots and compressed and 'tiny' print can
require other settings for this option, and experimentation is necessary.

Data Labels Type Help Screen

There are two ways in which MVSP can plot data points. The first is to plot
each point using a sequential letter or symbol which corresponds to the symbols
indicated for each row of the results output. This makes it easy to determine
which point goes with which object. You may also choose to have the first
letter or symbol of the data labels in your data file plotted. If the objects
you are analyzing can be grouped, you can give different letters or symbols to
each group, so that you can see trends in the distribution of these groups.

You should avoid using numbers as the first symbols of your labels, as the
text graphs use numbers to indicate the number of data points which are
plotted in the same space.

400 Line Graphics Mode Help Screen

Some computers, like the AT&T 6300 and the Compaq Portables III and 386,
provide a special high resolution 640 x 400 graphics mode in addition to the
normal 640 x 200 high resolution mode of the Color Graphics Adaptor (CGA).

MVSP uses the special graphics features of the Borland Turbo Pascal compiler
to automatically detect the type of graphics hardware in a computer and to
adjust the graphics output accordingly. However, it is not able to detect
computers which utilize the special 400 line graphics mode; it will think they
are normal CGA computers. If your computer is capable of generating a 400 line
graphics mode, then set this option to 'Yes' to take advantage of that higher
resolution.

Plots Per Analysis Help Screen

This allows you to specify how many axes to plot for each analysis in the
PCA, PCO, and CA procedures. If, for instance, you choose 3, the first three
axes will be plotted against each other in all possible combinations (1 x 2,
1 x 3, 2 x 3). Entering -1 will cause the program to allow you to specify
the number of axes to plot at runtime, after the eigenanalysis, so that you can
view the results before deciding.

Print Graphics, Epson Help Screen

If you have a dot-matrix printer which is compatible with Epson printers,
you may automatically print out plots drawn in graphics mode. Setting this
option to "Yes" will cause the plots to be sent to the printer after they are
drawn on the screen.

If this option is set to "No", the plots will not be automatically printed.
You may still print the graphics plot on an Epson printer by pressing "P" when
the graph is displayed. The DOS GRAPHICS command may work in printing graphs
on non-Epson compatible printers; see your DOS manual.

A beep while attempting to print indicates an error; check your printer.
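
The pairing described in the Plots Per Analysis screen (the first n axes plotted against each other in all possible combinations) can be sketched as follows. This is only an illustration in Python, not MVSP's own code; the function name axis_pairs is hypothetical, and the -1 "ask at runtime" setting is not modelled.

```python
from itertools import combinations


def axis_pairs(n_axes):
    """Return every pair (i, j) with i < j among axes 1..n_axes.

    Hypothetical helper: it mirrors the "1 x 2, 1 x 3, 2 x 3" ordering
    given in the help text for n_axes = 3.
    """
    return list(combinations(range(1, n_axes + 1), 2))


# Choosing 3 axes gives the three plots named in the help screen.
print(axis_pairs(3))  # [(1, 2), (1, 3), (2, 3)]
```

The number of plots grows as n(n-1)/2, so choosing 4 axes already yields 6 plots per analysis.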

Print Centered Similarity Matrix Help Screen

Before eigenanalysis, the input matrix is centered by subtracting the row
and column means and adding the grand mean to all data points. Distance
matrices are first converted to similarities by multiplying the square of
each value by -0.5.

You may choose to have this new centered matrix printed out before
eigenanalysis so that you can see the results of these transformations.

All Eigenvalues Help Screen

When this option is chosen, all resulting eigenvalues and associated
eigenvectors and scores will be printed out. There will be as many eigenvalues
and eigenvectors as there are rows in the data matrix.

It is often useful in PCA to examine the last eigenvectors. Outlying or
unusual variables will often have high loadings on either the first few
eigenvectors (those with the highest eigenvalues) or the last few. Those which
appear on the last few eigenvectors will usually be outliers in respect to
their correlation with other variables, and therefore will not be obviously
different in the original data matrix.

Kaiser's Rule Help Screen

When this option is chosen, only those eigenvalues (and associated
eigenvectors) which are greater than the mean of all eigenvalues will be
printed out. When a correlation matrix is used, all eigenvalues greater than
1.0 are printed.

Note that choosing this rule is the equivalent of entering '-1' for the
minimum eigenvalue in MVSP ver. 1.31.

Jolliffe's Rule Help Screen

This is a modification of Kaiser's rule. The minimum eigenvalue is taken to
be the mean eigenvalue multiplied by 0.7 (or simply 0.7 when the correlation
matrix is used). This gives a slightly lower minimum eigenvalue and usually
results in one or more extra eigenvalues and vectors being printed.

User Supplied Minimum Eigenvalue Help Screen

This option gives you the flexibility of specifying the exact value for the
minimum eigenvalue.
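
The two cutoff rules described in the Kaiser's Rule and Jolliffe's Rule screens can be illustrated with a short Python sketch. It is not MVSP's implementation: the function names are hypothetical, and a strict "greater than" comparison is assumed for both rules, since the help text states it only for Kaiser's rule.

```python
def minimum_eigenvalue(eigenvalues, rule="kaiser"):
    """Return the cutoff implied by the chosen rule.

    kaiser:   the mean of all eigenvalues
    jolliffe: 0.7 times the mean of all eigenvalues
    """
    mean = sum(eigenvalues) / len(eigenvalues)
    if rule == "kaiser":
        return mean
    if rule == "jolliffe":
        return 0.7 * mean
    raise ValueError("unknown rule: " + rule)


def retained(eigenvalues, rule="kaiser"):
    """Keep the eigenvalues strictly greater than the cutoff (an assumption)."""
    cutoff = minimum_eigenvalue(eigenvalues, rule)
    return [ev for ev in eigenvalues if ev > cutoff]


# Made-up eigenvalues of a 4-variable correlation matrix: they sum to 4,
# so the mean is 1.0 and the cutoffs are 1.0 (Kaiser) and 0.7 (Jolliffe).
example = [2.25, 0.875, 0.625, 0.25]
print(retained(example, "kaiser"))    # [2.25]
print(retained(example, "jolliffe"))  # [2.25, 0.875]
```

With these values Jolliffe's rule keeps one more eigenvalue than Kaiser's rule, which is the behaviour the help screen describes.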

Enter/Edit Data Help Screen

MVSP contains a spreadsheet-like data editor which allows you to enter your
data directly into an MVSP file in the proper format. You may edit an existing
file or create a new one, and you may insert existing data into the new file
so that modifications are saved under the new name.

All changes and additions are saved by pressing the F9 or F10 keys. A backup
file with the previous version of the file will be created, with the extension
".BAK".

Pressing F1 while in the data editor will give a help screen explaining other
keystrokes.

Transform Data File Help Screen

You may choose to have your data transformed and saved to a new file. This
option allows you to choose square root, log (base 2, 10, or e) or logratio
transformation.

The log transformations are performed on the values of (X + 1) to avoid
computer errors when the datum is 0, as the log of 0 is undefined, and to
avoid negative results when the datum is < 1. The square root
transformation is performed on the original datum.

The centered logratio transformation replaces data with the log of the ratio
between the data value and the geometric mean of the sample.

Transpose Data File Help Screen

This option transposes the data matrix and saves it to a new file. Normally,
you would have the data file set up with the objects of interest as the columns
and the characters or variables defining those objects as rows. Thus, in an
ecological study, the samples would be columns and the taxa rows. In a
taxonomic study, the taxa would be columns and the characters rows.

This option allows you to "flip" the data matrix so that the rows of the
matrix are considered the objects of interest.

Convert Data Help Screen

This option allows you to convert your data to a number of different scales.
The possible scales currently include percentage (0-100), proportions (0-1.0),
binary (0,1), octave, and range-through.
The octave scale is a 10 point
abundance class scale which is logarithmically-based. With ecological data,
this scale smooths over minor variation that may only reflect "noise", thus
allowing the major trends to be studied more easily. The range through scale
is used by biostratigraphers and assumes that a taxon is present in every
sample (column) between the first and last stratigraphic occurrence.

All these conversions are calculated column-wise, so that each datum is
replaced by its percentage or proportion of the column total.

Drop Rows or Columns Help Screen

This option allows you to create a new file containing a subset of the
original data. You will be presented with lists of the row and column labels.
You may then move among them using the cursor keys and select rows or columns
to be dropped by pressing the space bar. This will cause the label to blink,
indicating that it has been selected for deletion. Pressing the space bar
again will turn off the "selection".

Drop 'Zero' Help Screen

This option allows you to remove all rows and columns that have totals of
zero. Zero totals will have adverse effects on a number of the analyses, and
some will not run if either the rows or columns have all zeros.

It is particularly useful to use this option when you are creating a new
matrix by dropping rows or columns. For instance, if a column only has two
non-zero entries and you drop both those rows from the data matrix, you will
be left with a zero column total. If you also choose the Drop Zero option,
MVSP will drop this column.
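
The Drop 'Zero' example above (dropping two rows leaves a column with a zero total, which is then dropped as well) can be sketched in Python. This is an illustration under stated assumptions, not MVSP's code: the matrix is a plain list of rows, row and column labels are ignored, and the names drop_rows and drop_zero are hypothetical.

```python
def drop_rows(matrix, row_indices):
    """Return a copy of matrix without the rows at the given 0-based indices."""
    drop = set(row_indices)
    return [row for i, row in enumerate(matrix) if i not in drop]


def drop_zero(matrix):
    """Remove every row and every column whose total is zero."""
    if not matrix:
        return []
    n_cols = len(matrix[0])
    # Columns to keep: those with a non-zero total over all rows.
    keep_cols = [j for j in range(n_cols)
                 if sum(row[j] for row in matrix) != 0]
    # Keep rows with a non-zero total, restricted to the kept columns.
    return [[row[j] for j in keep_cols]
            for row in matrix if sum(row) != 0]


# Made-up data: the middle column is non-zero only in rows 1 and 3.
data = [
    [3, 0, 1],
    [0, 4, 0],
    [2, 0, 5],
    [0, 7, 0],
]
subset = drop_rows(data, [1, 3])   # the middle column total is now zero
print(drop_zero(subset))           # [[3, 1], [2, 5]]
```

For non-negative data a single pass is enough, because removing an all-zero row cannot change any column total.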