Text File | 1994-02-04 | 49.0 KB | 1,043 lines |
- 1
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- EPISTAT
- Statistical Package
- for the IBM Personal Computer
-
- Version 3.2
-
-
-
-
- Written by:
-
- Tracy L. Gustafson, M.D.
-
- Copyright 1985
-
- 2
-
-
-
- INTRODUCTION
-
-
- EPISTAT is a collection of programs written in BASICA for
- statistical analysis of small to medium-sized data samples (up to 28
- samples or variables and up to 2000 total data entries per file).
- The 25 programs in EPISTAT perform more than 40 common statistical
- tests or functions and provide utilities for data entry, editing,
- printing, graphing, sorting, selecting, transforming and crosstabs.
-
- The programs are intended to be as self-explanatory and user-
- friendly as possible. You do not need to memorize this guide
- before using the programs. On the other hand, neither the programs
- nor this manual purport to TEACH the proper use or interpretation
- of statistics. The user must have some familiarity with the kinds
- of data required and the underlying assumptions appropriate to each
- statistical test.
-
-
- For further explanations of tests, refer to:
-
- 1. Colton, Theodore. Statistics in Medicine. Little, Brown and Co.
- Boston, 1974.
- 2. Fleiss, Joseph. Statistical Methods for Rates and Proportions.
- John Wiley and Sons. New York, 1981.
- 3. Snedecor, George W. and Cochran, William G. Statistical Methods.
- Iowa State Univ. Press. Ames, Iowa, 1978.
- 4. Schlesselman, James. Case-Control Studies. Oxford Univ. Press.
- New York, 1982.
-
-
-
-
-
-
-
-
-
- CAVEAT:
- These programs have been tested extensively, but I cannot
- guarantee that they will work correctly with every possible data set.
- Incorrect results are usually due to errors in format or type of
- data entered. If you believe you have discovered an error in the
- programs, please write me. I intend to correct any bugs that are
- brought to my attention.
- It is good practice to regularly compare the results obtained
- by programs in EPISTAT with results obtained by your previous method
- of calculation. ANY unexpected result should be questioned and
- double-checked by reference to tables or another method of
- calculation.
-
- 3
-
-
-
-
-
-
-
- INDEX TO EPISTAT
-
- The following statistical tests and functions are available:
-
- TEST or FUNCTION PROGRAM NAME
- ---------------- ------------
- Analysis of variance (1 and 2-way)...................ANOVA
- Bayes' theorem.......................................BAYES
- Binomial distribution................................BINOMIAL
- Chi-square test and distribution.....................CHISQR
- Correlation coefficients.............................CORRELAT
- F distribution.......................................ANOVA
- Fisher's exact test..................................FISHERS
- Linear regression analysis...........................LNREGRES
- Mantel-Haenszel Chi-square test......................MHCHISQR
- Mantel-Haenszel for multiple controls................MHCHIMLT
- McNemar's test.......................................MCNEMAR
- Mean, median and standard deviation..................DATA-ONE
- Normal distribution..................................NORMAL
- Poisson distribution.................................POISSON
- Random sample generator..............................RANDOMIZ
- Rank sum test........................................RANKTEST
- Rates adjusted (direct and indirect).................RATEADJ
- Sample size calculations.............................SAMPLSIZ
- Signed rank test.....................................RANKTEST
- Student's T-test and T distribution..................T-TEST
-
-
-
-
-
-
- The following data-handling capabilities are provided:
-
- DATA MANIPULATION PROGRAM NAME
- ----------------- ------------
- Determine best test and program names................EPISTAT
- Graph histograms.....................................HISTOGRM
- Graph scattergrams...................................SCATRGRM
- Perform data transformations.........................LNREGRES
- Print data (sorted or input order)...................DATA-ONE
- Print crosstab reports...............................XTAB
- Select specific records..............................SELECT
- Transfer data between EPISTAT files..................FILETRAN
- Transfer data from FORTRAN to EPISTAT files..........FORTRANS
-
- 4
-
-
-
- SYSTEM REQUIREMENTS FOR EPISTAT
-
-      MINIMUM                      OPTIMAL
-      IBM PC with 64K RAM          IBM PC with 96K RAM
-      One 160K disk drive          Two 320K disk drives
-      Monochrome monitor           Color graphics adapter
-      BASICA                       Hi-res color monitor
-                                   BASICA
-                                   IBM, Epson, Okidata, or
-                                   C. Itoh Prowriter printer
-                                   with graphics capability
-
-
-
-
- OVERALL PROGRAM DESCRIPTION
-
-
- All calculations in EPISTAT are performed using single precision.
- Although it may first appear that double precision would be more
- appropriate for statistical tests, "double" precision makes little or
- no real improvement in the accuracy of these programs. For best
- results, data entries should be numbers between 1E+7 and 1E-7. Larger
- or smaller numbers should be multiplied by an appropriate power of 10
- before entry and analysis in EPISTAT.
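The advice about the range of data entries follows from how few digits a single-precision number can hold. The sketch below is only an illustration in Python, not part of EPISTAT: it rounds a value to IEEE single precision, used here as a stand-in on the assumption that BASICA's own single-precision format carries a similar number of significant digits (about seven). The helper name `to_single` is hypothetical.

```python
import struct

def to_single(x):
    """Round a Python float (double precision) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]

# 2**24 + 1 needs one more significant bit than single precision holds,
# so the last digit is lost.
print(to_single(16777217.0))   # 16777216.0

# A value inside the recommended range keeps all seven of its digits.
print(to_single(1234567.0))    # 1234567.0
```

Scaling large values by a power of 10 before entry, as recommended above, keeps the meaningful digits inside this limit.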
-
-
- All EPISTAT programs are written so that as much pertinent
- information about the test as possible can fit on the final screen.
- This feature allows a summary printed copy to be produced simply by
- pressing <Shift-PrtSc>. This will work any time there is a pause in
- the program display. Six programs, "DATA-ONE", "HISTOGRM", "RANDOMIZ",
- "SCATRGRM", "SELECT", and "XTAB" produce printed reports without using
- <Shift-PrtSc>. In these, follow program instructions to route output
- to your printer.
-
-
- EPISTAT is the introductory program in the EPISTAT package.
- DATA-ONE is the major data entry, editing, and printing program. Most
- of the programs in EPISTAT can evaluate data entered and saved using
- DATA-ONE. Many of the programs can, in addition, evaluate summary
- data. The programs marked with a star (*) below can evaluate data
- entered in DATA-ONE. Non-starred programs provide their own data entry
- routines.
-
-
- The EPISTAT disk should be placed in drive A (or other default
- drive) when loading any program because "EPIMRG" and "EPISETUP.DAT" are
- used by every program. Once a program is running, EPISTAT can be
- removed from drive A if necessary.
-
- 5
-
-
-
-
- INDIVIDUAL PROGRAM DESCRIPTIONS
-
-
-
- (1) "EPISTAT"
-
- This introductory program lists the available programs. It also aids
- the user in selecting the best statistical test. To do so, choose menu
- option 2 and decide whether you are interested in tests for a single
- sample, tests for 2 or more samples, other statistical functions, or data
- handling utilities.
-
- You are also allowed to specify hardware configuration and colors for a
- color monitor. Choose colors 7,0,0 if you have a monochrome monitor
- connected to the color/graphics adapter. If yours is not one of the
- listed printers, check your printer's codes for the typeface you want.
- For example, the code for elite type on the Prowriter is ESC "E". If you
- press Escape then E, the display will show the decimal ASCII codes: 27
- 69. An alternate method is to press <Alt> and enter the decimal code on
- the numeric keypad. Press <Enter> when the complete code is entered.
-
-
- (2) "DATA-ONE" *
-
- A. DATA ENTRY:
- This is the central keyboard data entry program for the EPISTAT package
- (for non-keyboard data entry, see FILETRAN and FORTRANS). Initial data
- entry (Option 1) first asks you to name your samples or variables. Then
- type in the data, pressing <Enter> after each entry. Press the TAB key
- to back up one or two items on the SAME ROW. The maximum number of
- samples or variables (S) allowed is 28 with a color adapter and 7 with a
- monochrome adapter. The maximum number of records in each sample is
- 2000/S. A missing value can be entered by pressing <Enter> only. Note
- that this is different from entering a zero (0). To exit, press key F10.
- The mean, median and (n-1) standard deviation are then displayed. When
- you return to the main menu, SAVE your datafile to disk (Option 5) for
- future modification or use by other programs in the EPISTAT package.
- Although all entries in a datafile are treated as numbers by
- DATA-ONE, it is possible to enter characters (names) in a record.
- Characters will be treated as zeros in calculations. Nevertheless, it
- improves data readability to use the "Sample 1" column for record or case
- names. Thus, DATA-ONE allows one to specify a name for each column
- (variable) and each row (case) in the datafile.
-
- B. DATA MODIFICATION:
- APPEND (Option 2) allows one to add more observations to a sample at
- a later session. EDIT (Option 3) allows one to delete or replace
- incorrect data entries and to change sample or variable names. When you
- return to the main menu, SAVE modified data to disk again.
-
- 6
-
-
-
-
-
- C. PRINTING DATA:
- To view or review a datafile, a printout to screen or printer can
- be selected (Option 4). To print a datafile exactly as it was keyed in,
- request the printout in INPUT order. DATA-ONE can also print the data
- SORTED by any selected sample. Only numeric data is sorted by DATA-ONE,
- so it will not alphabetize a character field. Blank records are not
- sorted, either.
-
-
- D. SAVING DATAFILES and LOADING DATAFILES:
- SAVING data (Option 5) writes your data to disk in a sequential
- file for later editing, review, or use by another program. DATA MUST BE
- SAVED TO DISK before it can be used by other programs in EPISTAT. Since
- EPISTAT must be in drive A: (or other default drive) to begin, you will
- probably want to SAVE datafiles on drive B. To do so, precede each
- datafile name with B: (e.g. B:TESTDATA). Do not enclose filenames in
- quotation marks.
-
-
- (3) "ANOVA" *
-
- A. ONE-way ANOVA:
- PURPOSE: To compare the means of 3 or more samples.
- DATA REQUIRED: A DATA-ONE datafile with 3 or more columns/variables.
- EXAMPLE: Are the mean ages of three groups of individuals
- significantly different?
- COMMENT: Sample means, (n-1) variances, the mean variance and the
- variance of the means are displayed. Total sum of squares,
- Treatment sum of squares and Error sum of squares are also
- shown. Finally the F value, degrees of freedom (df) in the
- numerator and df in the denominator and p value are given.
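For readers who want to check an F value by hand, the following is a minimal Python sketch of the standard one-way ANOVA arithmetic (treatment and error sums of squares, degrees of freedom, F). It is not the ANOVA program's code, the function name is hypothetical, and the three small samples are made-up illustration data. The p value is omitted because it requires the F distribution.

```python
def one_way_anova(samples):
    """Return (F value, df numerator, df denominator) for a list of samples."""
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]

    # Treatment (between-sample) and error (within-sample) sums of squares.
    treatment_ss = sum(len(s) * (m - grand_mean) ** 2
                       for s, m in zip(samples, means))
    error_ss = sum((x - m) ** 2
                   for s, m in zip(samples, means) for x in s)

    df_num = len(samples) - 1
    df_den = n_total - len(samples)
    f_value = (treatment_ss / df_num) / (error_ss / df_den)
    return f_value, df_num, df_den

# Made-up data: three samples of three observations each.
f_value, df_num, df_den = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_value, 3), df_num, df_den)   # 13.0 2 6
```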
-
- B. TWO-way ANOVA:
- PURPOSE: To evaluate the combined effects of 2 variables on a third
- variable (ROW and COLUMN effects).
- DATA REQUIRED: A DATA-ONE datafile with at least 2 columns and 2 rows.
- EXAMPLE: How much of the variance in transparency of glass types is
- attributable to the kind of sand and how much to the process
- used to make it?
- COMMENT: All samples in two-way ANOVA must have the same number of
- elements. Sample means, (n-1) variances, Total sum of
- squares, Row sum of squares, Column sum of squares and
- Residual are all displayed. The F value, df in numerator,
- df in denominator and corresponding p values are shown for
- both the Row and Column effects.
-
- C. F-value:
- PURPOSE: To evaluate the p value associated with a known F value.
- DATA REQUIRED: F value, df in numerator, and df in denominator.
-
- REFERENCE: Snedecor, pp. 258-338.
-
- 7
-
-
-
-
- (4) "BAYES"
-
- A. Probabilities of false positive and false negative tests:
- PURPOSE: To evaluate a test or procedure in terms of its sensitivity
- and specificity.
- DATA REQUIRED: Sensitivity and specificity of a test in relation to
- a specific condition it tests for. The estimated incidence of
- this condition in the population being tested.
- EXAMPLE: If a test has a specificity of .99 and a sensitivity of .99,
- how many false positives will occur in a population where the
- incidence of this disease is only 100/10,100 ?
- Answer: About 50% of positives will be false positives
- (99 true positives and 100 false positives per 10,100 tested).
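The arithmetic behind this kind of question can be sketched in a few lines of Python. This is an illustration of Bayes' theorem applied to the figures in the example, not the BAYES program itself; the function name is hypothetical.

```python
def false_positive_fraction(sensitivity, specificity, prevalence):
    """Fraction of all positive tests that come from people without the condition."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return false_pos / (true_pos + false_pos)

# Figures from the example: 100 affected people among 10,100 tested.
frac = false_positive_fraction(0.99, 0.99, 100 / 10100)
print(round(frac, 3))   # 0.503
```

With these figures there are 99 true positives and 100 false positives, so roughly half of all positive results are false.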
-
- B. Probability of disease given a positive test:
- PURPOSE: To determine the most likely disease given a certain positive
- test.
- DATA REQUIRED: The estimated incidence of several diseases in the test
- population. (Use `OTHER' as the last disease so that the sum
- of all percentages is 100). The probability of a positive
- test in people known to have each disease (test sensitivities).
- EXAMPLE: If antithyroid antibodies are found in patients with diabetes,
- thyroiditis and other diseases, what is the posterior
- probability of each diagnosis given a positive test? This
- will vary as the relative incidence of these diseases varies
- in the test population.
- COMMENT: Although the examples deal with the use of medical tests, the
- same statistical test applies to the relation of any test for
- any condition.
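A minimal Python sketch of the calculation described in option B follows. It is not the BAYES program's code; the function name is hypothetical and the incidences and positive-test rates are invented purely for illustration.

```python
def posterior_probabilities(incidence, positive_rate):
    """Probability of each disease given a positive test (Bayes' theorem).

    incidence: disease -> proportion of the test population (sums to 1).
    positive_rate: disease -> probability of a positive test in that disease.
    """
    joint = {d: incidence[d] * positive_rate[d] for d in incidence}
    total = sum(joint.values())
    return {d: j / total for d, j in joint.items()}

# Invented figures, for illustration only.
incidence = {"diabetes": 0.05, "thyroiditis": 0.01, "other": 0.94}
positive_rate = {"diabetes": 0.20, "thyroiditis": 0.90, "other": 0.02}

for disease, p in posterior_probabilities(incidence, positive_rate).items():
    print(disease, round(p, 3))
```

Changing the incidence figures changes the result, which is the point made in the example above.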
-
- REFERENCE: Fleiss, p. 5.
-
-
- (5) "BINOMIAL"
-
- PURPOSE: The binomial distribution allows calculation of the probability
- of an observed number compared to a known expected.
- DATA REQUIRED: A dichotomous variable that has an equal probability of
- occurring in each of N trials.
- EXAMPLE: What is the chance of obtaining 2 or fewer heads in 10 tosses
- of a fair coin?
- Answer: p = .055
- COMMENT: BINOMIAL calculates the ONE-tailed probability of the observed
- number and all more extreme situations. For example the
- ONE-tailed probability of 2 heads in 10 tosses of a coin is the
- sum of the probabilities for 0,1 and 2 heads.
-
- REFERENCE: Colton, p. 151.
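The one-tailed sum described in the comment can be reproduced with a short Python sketch. This illustrates the binomial formula only; it is not the BINOMIAL program's code and the function name is hypothetical.

```python
from math import comb

def binomial_lower_tail(observed, trials, p):
    """Probability of `observed` or fewer successes in `trials` trials."""
    return sum(comb(trials, k) * p ** k * (1 - p) ** (trials - k)
               for k in range(observed + 1))

# The coin example: 0, 1 or 2 heads in 10 tosses of a fair coin.
print(round(binomial_lower_tail(2, 10, 0.5), 3))   # 0.055
```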
-
- 8
-
-
-
- (6) "CHISQR"
-
- A. Table of data:
- PURPOSE: The Chi-square program evaluates a possible relationship
- between the row variable and the column variable.
- DATA REQUIRED: The counts for each cell of the table.
- EXAMPLE: Is there a relationship between race and socioeconomic group?
- COMMENT: 2 by 2 tables are evaluated using Yates' correction and the
- odds ratio and its confidence limits are calculated using
- Cornfield's method.
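As a hand check on a 2 by 2 table, the Yates-corrected Chi-square and the odds ratio can be sketched in Python as below. This uses the standard textbook formulas and is not taken from the CHISQR program; the function name and the cell counts are hypothetical, and Cornfield's confidence limits are not attempted here.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Yates-corrected Chi-square, odds ratio and p value (1 df) for cells a, b / c, d."""
    n = a + b + c + d
    # Yates' correction: shrink |ad - bc| by n/2, never below zero.
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    odds_ratio = (a * d) / (b * c)
    # For 1 degree of freedom, the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, odds_ratio, p_value

# Made-up table: 20 and 10 in the first row, 10 and 20 in the second.
chi2, odds_ratio, p_value = chi_square_2x2(20, 10, 10, 20)
print(round(chi2, 2), odds_ratio)   # 5.4 4.0
```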
-
- B. Chi-square value:
- PURPOSE: To evaluate the p value associated with a known Chi-square value.
- DATA REQUIRED: The chi-square value and the degrees of freedom.
-
- C. Chi-square test for trend:
- PURPOSE: To evaluate a possible directional relationship between the
- row variable and the column variable. If the row is exposure
- level and the column is outcome, the relationship is called a
- `dose-response.'
- DATA REQUIRED: A number that describes each `exposure level'. (If they
- are not quantifiable, just use consecutive numbers.) The
- number of cases and controls at each exposure level.
- EXAMPLE: Is the risk of lung cancer directionally related to the
- number of pack-years of smoking?
-
- REFERENCE: Schlesselman, pp. 175, 177.
-
-
- (7) "CORRELAT" *
-
- A. Pearson's correlation coefficient:
- PURPOSE: To assess the linear relationship between two variables.
- DATA REQUIRED: A DATA-ONE datafile containing the two samples/variables
- of interest.
- EXAMPLE: How closely do age and blood pressure correlate?
- COMMENT: The correlation coefficient is calculated and then tested
- using the Student's T distribution for the probability that
- such a correlation would occur by chance.
-
- B. R value:
- PURPOSE: To evaluate the p value associated with a known R value.
- DATA REQUIRED: The R value and the number of observations in the sample
- from which it came.
-
- C. Spearman's rank correlation:
- PURPOSE: To assess the relationship between two variables that are not
- normally distributed (and only a small sample is available).
- DATA REQUIRED: A DATA-ONE datafile containing the 2 variables of interest.
- EXAMPLE: How closely do infants' ages at death correlate
- with birthweight?
- COMMENT: The correlation coefficient is calculated but associated
- p values are not calculated.
-
- REFERENCE: Colton, p. 212.
-
- 9
-
-
-
-
-
-
- (8) "FILETRAN" *
-
- PURPOSE: To transfer a sample or column of data from one EPISTAT
- datafile to another. This makes it unnecessary to re-enter
- data, even if you need to compare 2 samples that are in separate
- datafiles, or you have a data set with more than 28 variables
- that you split between two or more datafiles. You may
- create a new datafile by selecting one sample from DATAFILE #1
- and another from DATAFILE #2. FILETRAN can also combine two
- samples by APPENDING one to the other.
- DATA REQUIRED: Two DATA-ONE datafiles. First enter the datafile you
- wish to replace, add or append a sample TO. Then enter the
- datafile you wish to transfer data FROM. After the data
- sample has been added, you may save the data under the original
- filename, or create a new datafile with the additional data
- in it. You may also cancel the file modification if you find
- you have made an error.
- EXAMPLE: You performed the same experiment on two different days and
- analyzed the results separately. Now you want to combine the
- results of both experiments and analyze the combined data
- set. FILETRAN will allow you to append the two files together
- and save that data under a new filename.
- COMMENT: If you want to append several columns of data from one
- datafile to another, do not return to the main menu until all
- columns have been appended. Exiting between appending will
- leave large blank spaces in the file.
-
-
- (9) "FISHERS"
-
- PURPOSE: Fisher's exact test evaluates 2 by 2 tables of discrete
- variables.
- DATA REQUIRED: The counts for each of 4 cells of the table.
- EXAMPLE: Is there a relationship between being bald and dying of
- coronary heart disease?
- COMMENT: Fisher's exact test is particularly valuable when the
- Chi-square test is inappropriate because the expected value
- for a cell is less than 5. However, this program can
- evaluate some tables where A+B+C+D > 200.
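The exact calculation can be sketched in Python as follows. This is an illustration, not the FISHERS program's code: the function name is hypothetical, and the two-sided rule used here (summing every table with the same margins that is no more probable than the observed one) is an assumption, since the manual does not say which rule the program applies.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2 by 2 table a, b / c, d."""
    row1, row2, col1 = a + b, c + d, a + c
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)

    def weight(k):
        # Number of ways to get k in the top-left cell with fixed margins.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    total = comb(row1 + row2, col1)
    # Integer weights are compared, so ties are handled exactly.
    extreme = sum(weight(k) for k in range(lowest, highest + 1)
                  if weight(k) <= observed)
    return extreme / total

# Made-up table: 3 and 1 in the first row, 1 and 3 in the second.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))   # 0.4857
```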
-
- 10
-
-
-
-
- (10) "FORTRANS"
-
- PURPOSE: To transfer data from an SDF, FORTRAN, or sequential card
- image file into EPISTAT DATA-ONE format.
- DATA REQUIRED: A sequential card image file of equal-length records
- each delimited by a carriage return and line feed. The
- end of file must be marked by a CHR(26). You must know the
- record length (including spaces, but NOT including the carriage
- return and line feed at the end of each line), the beginning
- column number and width of each data item you want to transfer.
- If your datafile contains understood (but not marked) decimal
- places, then enter the number of decimal places. If your
- datafile contains marked decimal places, then enter 0 for
- (understood) decimal places. Finally, specify a missing value
- code like 9999. If you have no missing values, then enter a
- code that does not occur in your data set.
- EXAMPLE: You have a FORTRAN file on the mainframe with 10 years worth
- of data. You can select a subset of that data from a 6-month
- period and read that into EPISTAT for some pilot analyses
- before using mainframe time to analyze the entire data set.
- COMMENT: FORTRANS can be used to extract selected data items from
- DBASE(R) "SDF" type files and from LOTUS(R) "PRN" print files.
- Be sure to first look at the datafile you create from DBASE or
- LOTUS with your word processor in non-document mode to be sure
- that all records are of equal length and that you know which
- columns contain which data items. Some programs add extra
- spaces here and there when creating an SDF file. FORTRANS
- will not successfully read a datafile with more than 255
- columns of data in each record.
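To make the terms above concrete (beginning column, width, understood decimal places, missing value code), here is a small Python sketch that pulls items out of one card-image record. It is only an illustration of the idea, not FORTRANS itself; the function name, the sample record and the field layout are all hypothetical, and comparing the raw number with the missing value code is an assumption about how such a code would be applied.

```python
def parse_record(record, fields, missing_code=9999):
    """Extract fixed-width items from one card-image record.

    fields: list of (start_column, width, understood_decimals), with
    start_column counted from 1 as in the description above.
    """
    values = []
    for start, width, decimals in fields:
        text = record[start - 1:start - 1 + width]
        number = float(text)
        if number == missing_code:
            values.append(None)            # missing value
        else:
            values.append(number / 10 ** decimals)
    return values

# Hypothetical 15-column record holding three data items.
record = "00125 0372 9999"
fields = [(1, 5, 2), (7, 4, 1), (12, 4, 0)]
print(parse_record(record, fields))   # [1.25, 37.2, None]
```

An item with a marked decimal point is read as written when its number of understood decimal places is given as 0.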
-
-
- (11) "HISTOGRM" *
-
- PURPOSE: To graph a data sample according to user specifications in the
- form of a histogram on the high resolution graphics screen.
- DATA REQUIRED: A DATA-ONE datafile. The full name of the variable to
- be graphed, its units, and the width of each cell in the
- histogram.
- EXAMPLE: What is the distribution of scores on the last exam?
- COMMENT: You determine the appearance of the report by entering a label
- for the horizontal axis and the interval width. To obtain a
- printed copy on the IBM, Epson, Okidata or Prowriter printer
- (specified in "EPISTAT" when you set up) press key F1. Press
- F10 to return to the program.
-
- 11
-
-
-
-
- (12) "LNREGRES" *
-
- A. Linear regression:
- PURPOSE: To calculate the least-squares regression line for paired
- samples.
- DATA REQUIRED: A DATA-ONE datafile and the sample numbers of the
- predictor and dependent variables.
- EXAMPLE: What is the regression line relating IQ to income?
- COMMENT: The regression line is displayed in the form Y = b + aX.
- The T distribution is applied to determine if the calculated
- slope is significantly different from zero. The T value,
- degrees of freedom and p value are shown.
-
- REFERENCE: Colton, p. 199.
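The least-squares line Y = b + aX and the T value for its slope can be checked by hand with a sketch like the one below. It uses the standard textbook formulas and is not the LNREGRES program's code; the function name and the five data pairs are hypothetical. The p value is omitted because it requires the T distribution.

```python
from math import sqrt

def linear_regression(x, y):
    """Return (intercept b, slope a, T value for the slope, degrees of freedom)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Standard error of the slope from the residual sum of squares.
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2
                      for xi, yi in zip(x, y))
    se_slope = sqrt(residual_ss / (n - 2) / sxx)
    return intercept, slope, slope / se_slope, n - 2

# Made-up paired samples.
b, a, t_value, df = linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b, 3), round(a, 3), round(t_value, 3), df)   # 2.2 0.6 2.121 3
```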
-
- B. Data transformations:
- PURPOSE: To change a data set in a regular way, either to normalize
- it or to identify a non-linear relationship between two
- variables.
- DATA REQUIRED: A DATA-ONE datafile with fewer than 28 variables in it.
- EXAMPLE: In my sample, IQ and income were not linearly related, so I
- will try a transformation to see if they are related
- logarithmically.
- COMMENT: Nine transformations are available:
-      1. Ax + B                     6. A * ln(x) + B
-      2. A(x)squared + B            7. ln(x/(100-x))
-      3. A*square root(x) + B       8. Sample A + Sample B
-      4. A/x + B                    9. Sample A * Sample B
-      5. x - mean
-
- Specify the value for A and B and the program will apply that
- formula to each value in the sample you want transformed. It
- then adds this transformed sample to the datafile as an
- additional column/variable. You may save the new datafile
- containing this transformed variable under the old name or
- under a new datafile name as you choose.
-
-
- (13) "MHCHISQR"
-
- PURPOSE: To evaluate the relationship between two discrete variables
- while controlling for the effect of a third variable.
- DATA REQUIRED: The names of the factors you wish to test for and control
- for as well as the counts of cases and controls that have and
- do not have the test and control variables. This is the
- equivalent of a series of 2 by 2 tables, one for each category
- of the control variable.
- EXAMPLE: Is there a relationship between smoking and lung cancer,
- controlled for occupation?
- COMMENT: The factor you are testing must be dichotomous, but the control
- variable may have more than 2 categories. The Chi-square value,
- degrees of freedom, and p value are displayed. Also shown
- are an odds ratio and 95% confidence limits on the odds ratio.
-
- REFERENCE: Schlesselman, pp. 183,206.
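The idea of a series of 2 by 2 tables, one per category of the control variable, can be illustrated with the standard Mantel-Haenszel summary odds ratio. The Python sketch below is not the MHCHISQR program's code; the function name and the two strata are hypothetical, and the Chi-square value and confidence limits are not attempted here.

```python
def mantel_haenszel_odds_ratio(strata):
    """Summary odds ratio over a list of 2 by 2 tables given as (a, b, c, d).

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Made-up data: one table for each of two occupations.
strata = [(10, 20, 5, 40), (15, 10, 10, 20)]
print(round(mantel_haenszel_odds_ratio(strata), 3))   # 3.423
```

With a single stratum the result reduces to the ordinary odds ratio ad/bc.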
-
- 12
-
-
- (14) "MHCHIMLT" *
-
- PURPOSE: To evaluate the relationship between cases and controls and a
- test factor when each case is matched with 2 or more controls.
- DATA REQUIRED: A DATA-ONE datafile or manually entered summary data. If
- using DATA-ONE, a case sample and 2 or more control samples
- should be present. Data is coded as "1" for factor present,
- and "0" for factor absent in each case and control sample.
- EXAMPLE: Is there a relationship between illness and eating raw potatoes?
- COMMENT: The Chi-square value, degrees of freedom and p value are
- displayed. Also shown are an odds ratio and 95% confidence
- limits on the odds ratio. This test does not apply if each
- case is matched with a different number of controls.
-
- REFERENCE: Fleiss, p. 125.
-
-
- (15) "MCNEMAR"
-
- PURPOSE: Also called a paired Chi-square test, McNemar's test evaluates
- a relationship between two variables by analyzing the number
- of discordant PAIRS.
- DATA REQUIRED: The name of the factor being tested in CASES and CONTROLS
- and the number of pairs that belong in each of 4 cells.
- EXAMPLE: In twins in which one developed a stroke and the other did not,
- is there a relationship between high-fat diet and stroke?
- COMMENT: The Chi-square value is calculated using Yates' correction, and
- degrees of freedom and p value are displayed. Also shown are an
- odds ratio and 95% confidence limits on the odds ratio.
-
- REFERENCE: Schlesselman, p. 210.
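Only the two discordant cells enter the calculation, as the following Python sketch shows. It uses the standard textbook form of McNemar's test with Yates' correction and is not the MCNEMAR program's code; the function name and the pair counts are hypothetical, and confidence limits are not attempted here.

```python
from math import erfc, sqrt

def mcnemar(case_only, control_only):
    """McNemar's Chi-square (Yates-corrected), odds ratio and p value (1 df).

    case_only: pairs where only the case has the factor.
    control_only: pairs where only the control has the factor.
    """
    chi2 = max(0, abs(case_only - control_only) - 1) ** 2 / (case_only + control_only)
    odds_ratio = case_only / control_only
    # For 1 degree of freedom, the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, odds_ratio, p_value

# Made-up counts: 15 pairs with only the case exposed, 5 with only the control.
chi2, odds_ratio, p_value = mcnemar(15, 5)
print(round(chi2, 2), odds_ratio)   # 4.05 3.0
```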
-
-
- (16) "NORMAL" *
-
- A. Comparing a sample mean to the population mean:
- PURPOSE: To see if your sample mean is different from a known population mean.
- DATA REQUIRED: A DATA-ONE datafile and a known population mean.
- EXAMPLE: Is the mean blood pressure in my sample statistically different
- from the U.S. population mean?
- COMMENT: The mean for the sample and the p value are displayed.
-
- B. Percent of test values in a given range:
- PURPOSE: To determine the percent of sample values that will fall between
- two values in a normally distributed population.
- DATA REQUIRED: The mean and standard deviation of the population being
- sampled. The upper and lower limits of the range in question.
- EXAMPLE: If the population mean height is 70 inches and the standard
- deviation is 3 inches, what proportion of the population are
- at least 65 inches but no more than 73 inches tall?
- Answer: 79.4 % of the population.
-
- C. Z value:
- PURPOSE: To evaluate the p value associated with a known Z value.
- DATA REQUIRED: The known Z value.
- COMMENT: A two-tailed p value is returned.
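Options B and C can both be checked against the normal distribution function, which Python can evaluate through the error function. The sketch below is an illustration only, not the NORMAL program's code, and the function names are hypothetical.

```python
from math import erf, erfc, sqrt

def normal_cdf(x, mean=0.0, sd=1.0):
    """Proportion of a normal population lying below x."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

def proportion_between(lower, upper, mean, sd):
    """Proportion of a normal population between lower and upper (option B)."""
    return normal_cdf(upper, mean, sd) - normal_cdf(lower, mean, sd)

def two_tailed_p(z):
    """Two-tailed p value for a known Z value (option C)."""
    return erfc(abs(z) / sqrt(2.0))

# The height example: mean 70 inches, standard deviation 3 inches.
print(proportion_between(65, 73, 70, 3))
print(two_tailed_p(1.96))
```

For the height example the result is a little over 0.79, in line with the answer quoted above, and a Z value of 1.96 gives a two-tailed p value very close to 0.05.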
-
- 13
-
-
-
-
- (17) "POISSON"
-
- PURPOSE: To determine the probability of a certain number of cases or
- events, when the expected rate is known but the number of
- times when the case or event did not occur cannot be counted.
- DATA REQUIRED: The number of cases observed and the expected number of
- cases (calculated as expected rate * time interval).
- EXAMPLE: Is it unusual for lightning to strike 5 people in one county
- this year, given that in the last 5 years lightning has struck
- only 8 people in this county?
- Answer: p = .024
-
- COMMENT: The ONE-tailed probability of observing the given number AND
- all more extreme cases is displayed.
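The one-tailed probability in the lightning example can be reproduced with a short Python sketch. It is an illustration of the Poisson formula, not the POISSON program's code, and the function name is hypothetical. The expected number is 8 cases in 5 years, or 1.6 cases per year.

```python
from math import exp

def poisson_upper_tail(observed, expected):
    """Probability of `observed` or more events when `expected` are expected."""
    term = exp(-expected)          # probability of exactly 0 events
    below = 0.0
    for k in range(observed):
        below += term              # add probability of exactly k events
        term *= expected / (k + 1)
    return 1.0 - below

# Lightning example: 5 or more cases in a year when 1.6 are expected.
print(round(poisson_upper_tail(5, 8 / 5), 3))   # 0.024
```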
-
-
- (18) "RANDOMIZ"
-
- A. Survey sample:
- PURPOSE: To provide a series of random numbers to aid in selecting a
- survey sample from a large number of possible respondents.
- DATA REQUIRED: The smallest number and the largest number you want,
- and the number of random numbers between those values you
- want selected.
- EXAMPLE: I want to survey 100 individuals from the pages of the
- telephone book. The telephone book has 700 pages so I will
- ask for 100 numbers between 1 and 700 and then phone the
- tenth person on each of the randomly selected pages.
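The telephone book example can be sketched in Python as follows. This is not the RANDOMIZ program's code; the function name is hypothetical, and drawing the numbers without repeats is an assumption, since the manual does not say whether the program can return the same number twice.

```python
import random

def survey_sample(smallest, largest, count, seed=None):
    """Return `count` distinct random integers from smallest to largest, sorted."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(smallest, largest + 1), count))

# 100 page numbers from a 700-page telephone book.
pages = survey_sample(1, 700, 100)
print(pages[:10])
```

Passing a fixed `seed` makes the same list reappear on a later run, which can be useful for documenting how a sample was drawn.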
-
- B. Unpaired case-control sample:
- PURPOSE: To assign subjects to two equal groups randomly.
- DATA REQUIRED: The total number of subjects in the study.
- EXAMPLE: Assign 50 patients to receive drug A and 50 to receive drug B.
- COMMENT: You are also asked if subjects will enter the study over a
- period longer than one month. If so, you are warned that in
- many studies it is preferable to randomize each month's cases
- independently, so that seasonal biases do not creep in.
-
- C. Paired case-control sample:
- PURPOSE: To assign members of pairs to case and control groups randomly.
- DATA REQUIRED: The total number of pairs. You must also decide on an
- objective way of deciding which one of each pair is #1 and
- which is #2.
- EXAMPLE: Assign 20 pairs of patients to case and control groups randomly.
- COMMENT: Consecutive order of patients admitted to the hospital is not
- always a satisfactory method of deciding which of each is #1
- and which is #2. Alphabetic criteria, day of week, or other
- criteria entirely beyond the investigator's control are usually
- better.
-
- REFERENCE: Colton, p. 259.
-
- 14
-
-
-
- (19) "RANKTEST" *
-
- A. Rank sum test:
- PURPOSE: To evaluate the difference between two unpaired non-parametric
- samples. Comparable to the unpaired T-test for normally
- distributed samples. It also specifically applies when
- quantitative variables are not available but qualitative
- ranks are.
- DATA REQUIRED: A DATA-ONE datafile or the number of observations in each
- of two samples and the sum of ranks for the first sample.
- EXAMPLE: Is the duration of remission different for leukemia patients
- treated with regimen #1 compared with regimen #2? Duration of
- remission is measured in months and 8 cases and 10 controls
- have been followed for 5 years.
- COMMENT: If a DATA-ONE file is used, the medians and sums of ranks are
- displayed for both groups. The two-tailed exact p value is
- then calculated. However, for samples larger than 12 to 15,
- the p value calculation can overflow the computer
- capabilities. In that case, stop the program by pressing
- Ctrl-Break, and refer to tables to evaluate the rank sums
- displayed. Note that even non-parametric samples larger than
- 30 can often be evaluated with parametric tests like the
- T-test (the central limit theorem).
-
- B. Signed rank test:
- PURPOSE: To evaluate the difference between two paired non-parametric
- samples. Comparable to the paired T-test for normally
- distributed samples. It also specifically applies when
- quantitative variables are not available but qualitative
- ranks are.
- DATA REQUIRED: A DATA-ONE datafile or the number of non-zero differences
- ranked and the sum of negative and then the sum of positive-signed
- ranks.
- EXAMPLE: For paired rats from the same litter, does extra dietary
- vitamin E shorten the time it takes to complete a maze?
- COMMENT: If a DATA-ONE file is used, the medians and sums of ranks are
- displayed for both groups. The two-tailed exact p value is
- then calculated. However, for samples larger than 12 to 15,
- the p value calculation can overflow the computer
- capabilities. In that case, stop the program by pressing
- Ctrl-Break, and refer to tables to evaluate the rank sums
- displayed. Note that even non-parametric samples larger than
- 30 can often be evaluated with parametric tests like the
- T-test (the central limit theorem).
-
- REFERENCE: Colton, pp. 219-222.
-
- 15
-
-
-
-
-
- (20) "RATEADJ" *
-
- A. Direct rate adjustment:
- PURPOSE: To adjust a rate to a standard population for comparison
- to other published rates.
- DATA REQUIRED: A DATA-ONE datafile that includes one sample containing
- the study rates to be adjusted (e.g. the rate in each age
- group if age-adjusting). A second sample must contain the
- standard population counts for the same groups. Rates in the
- first sample may use any denominator (per 1000, per million,
- etc), as you supply that denominator at the time of the
- calculation.
- EXAMPLE: Studying bladder cancer in Eskimos, you want to age-adjust
- to the standard U.S. population to compare to other studies.
- COMMENT: Direct adjustment may not be appropriate if the number of
- cases in any one cell is fewer than 5.
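Direct adjustment is a weighted average of the study rates, with the standard population counts as weights. The Python sketch below illustrates that arithmetic only; it is not the RATEADJ program's code, the function name is hypothetical, and the rates and counts are invented.

```python
def direct_adjusted_rate(study_rates, standard_counts):
    """Weighted average of group rates, weighted by the standard population.

    The result uses the same denominator (per 1000, per million, etc.)
    as the study rates supplied.
    """
    weighted = sum(rate * count
                   for rate, count in zip(study_rates, standard_counts))
    return weighted / sum(standard_counts)

# Invented figures: rates per 1000 in three age groups and the
# standard population count for the same three groups.
rates_per_1000 = [2.0, 5.0, 10.0]
standard_population = [50000, 30000, 20000]
print(direct_adjusted_rate(rates_per_1000, standard_population))   # 4.5
```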
-
- B. Indirect rate adjustment:
- PURPOSE: To adjust sample observations to a standard population rate
- for comparison to other published rates.
- DATA REQUIRED: A DATA-ONE datafile that includes one sample containing
- the number of cases observed in the study. A second sample
- must contain the standard population rates for the same
- groups. The standard population rates may use any denominator
- (per 1000, per million, etc), as you supply that denominator
- at the time of the calculation.
- EXAMPLE: Studying bladder cancer in Eskimos, you find only 2 or 3 cases
- in several of the younger age groups. You want to age-adjust
- to standard U.S. population rates to compare to other studies.
- COMMENT: In addition to age-adjusting, RATEADJ will calculate the
- probability of observing the number of cases (total) that you
- observed in your study. Enter the number observed and the
- Expected number will be displayed as well as the one-tailed
- POISSON probability of this outcome. The adjusted rate is
- displayed in the form: ` X times the standard population rate.'
-
- REFERENCE: Colton, pp. 47-51.
-
- 16
-
-
- (21) "SAMPLSIZ"
-
- A. Survey sample size:
- PURPOSE: To determine the sample size required for a survey sample.
- DATA REQUIRED: The approximate size of the population from which
- you plan to draw the sample, your estimate of the rate of the
- study characteristic (the result of your study), the accuracy
- you require, and the z(alpha) level you wish to test.
- EXAMPLE: What sample size is required to determine the immunization
- levels in 2-year-olds within 1% of the true value, given that
- there are 100,000 2-year-olds in the state, and we believe that
- 95% are immunized? Let z(alpha) correspond to 95% certainty.
- Answer: N = 1792
- COMMENT: TP = total population; pi = population proportion;
- d = maximum acceptable error in sample proportion
-
- n = [ z(a)*SQR(pi*(1-pi)) / d ] squared and N = n / (1+n/TP)
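
The two formulas above can be checked with a minimal Python sketch. The function name is hypothetical, and the value 1.96 for z(alpha) at 95% certainty is an assumption made for illustration; this is not the SAMPLSIZ source code.

```python
import math

def survey_sample_size(total_population, proportion, max_error, z_alpha=1.96):
    """Survey sample size with the finite-population adjustment N = n / (1 + n/TP)."""
    # Unadjusted size: n = [z(a) * SQR(pi * (1 - pi)) / d] squared
    n = (z_alpha * math.sqrt(proportion * (1 - proportion)) / max_error) ** 2
    # Adjust for the size of the population being sampled.
    return n / (1 + n / total_population)

# Values from the immunization example: TP = 100,000, pi = 0.95, d = 0.01.
size = survey_sample_size(100_000, 0.95, 0.01)
print(round(size))  # 1792
```

With the values from the example, the sketch reproduces the answer N = 1792.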
-
- B. Sample size for a paired case-control study:
- PURPOSE: To determine the number of cases and controls required for a
- paired case-control study.
- DATA REQUIRED: An estimate of the population rate of the study
- characteristic, the smallest difference you wish to be able to
- detect, and the z(beta) and z(alpha) levels of certainty you
- require.
- EXAMPLE: Paired rats are fed a normal diet plus or minus a suspected
- carcinogen. How many rat pairs must be studied to detect a
- 1% increase in the population cancer rate of 3%, given that
- z(beta) = 90% and z(alpha) = 95%?
- Answer: N = 3429
- COMMENT: pi = population rate; PT = rate expected in the test group
- N = [(z(a)*SQR(pi*(1-pi)) + |z(b)|*SQR(PT*(1-PT))) / (PT-pi)] squared
-
- REFERENCE: Colton, p. 161.
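
The paired formula above can be sketched in Python as follows. The function name is hypothetical, and the z values are assumptions: z(alpha) = 1.96 (two-sided 95%) and z(beta) = 1.282 (90%) are used because they reproduce the answer printed in the example.

```python
import math

def paired_sample_size(pop_rate, test_rate, z_alpha=1.96, z_beta=1.282):
    """N = [(z(a)*SQR(pi*(1-pi)) + |z(b)|*SQR(PT*(1-PT))) / (PT-pi)] squared."""
    numerator = (
        z_alpha * math.sqrt(pop_rate * (1 - pop_rate))
        + abs(z_beta) * math.sqrt(test_rate * (1 - test_rate))
    )
    return (numerator / (test_rate - pop_rate)) ** 2

# Rat example: population rate pi = 3%, test rate PT = 4% (a 1% increase).
pairs = paired_sample_size(0.03, 0.04)
print(round(pairs))  # 3429
```

Because the difference (PT - pi) is squared in the denominator, halving the detectable difference roughly quadruples the number of pairs required.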
-
- C. Sample size for an unpaired case-control study:
- PURPOSE: To determine the number of cases and controls required for an
- unpaired case-control study.
- DATA REQUIRED: An estimate of the Control group rate (used as the
- population rate), whether the test group will be higher or lower
- than the controls, the smallest difference you wish to be able to
- detect, and the z(beta) and z(alpha) levels of certainty you
- require.
- EXAMPLE: How many case and control animals should be studied to determine
- if a new antibiotic cures cattle disease 10% better than current
- standard therapy? Current therapy cures 70% of animals. Let
- z(beta) = 90% and z(alpha) = 95%.
- Answer: 392 cases and 392 controls.
- COMMENT: PT = test group rate; PC = control group rate
- N = [ (z(a)*SQR(2*pi*(1-pi)) + |z(b)|*SQR(PT*(1-PT)+PC*(1-PC)))
- / (PT-PC) ] squared
-
- REFERENCE: Fleiss, p. 41 and Schlesselman, p. 168.
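
The unpaired formula above can be sketched in Python as follows. The comment does not define pi for this case; the sketch assumes pi is the mean of PT and PC, a common convention, because that choice reproduces the printed answer of 392. The function name and the z values (1.96 and 1.282) are likewise assumptions.

```python
import math

def unpaired_sample_size(control_rate, test_rate, z_alpha=1.96, z_beta=1.282):
    """Per-group size for comparing two independent proportions."""
    # Assumption: pi is taken as the average of the two group rates.
    pi = (control_rate + test_rate) / 2
    numerator = (
        z_alpha * math.sqrt(2 * pi * (1 - pi))
        + abs(z_beta) * math.sqrt(
            test_rate * (1 - test_rate) + control_rate * (1 - control_rate)
        )
    )
    return (numerator / (test_rate - control_rate)) ** 2

# Cattle example: current therapy cures 70%, new antibiotic hoped to cure 80%.
per_group = unpaired_sample_size(0.70, 0.80)
print(round(per_group))  # 392
```

The result is the number needed in each group, so the example calls for 392 cases and 392 controls.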
-
- 17
-
-
-
-
-
- (22) "SCATRGRM" *
-
- PURPOSE: To graph the relationship between paired variables according to
- user specifications on the high resolution graphics screen. To
- display the linear regression line.
- DATA REQUIRED: A DATA-ONE datafile containing two paired variables. The
- minimum and maximum values in each variable are displayed. You
- specify the labels and units to be printed on horizontal and
- vertical axes. Then enter an interval width for each variable.
- EXAMPLE: Graph the relationship between advertising expenditures and
- gross sales based on the last 10 years of experience at
- Company A.
- COMMENT: Be sure to pick an interval width that will result in 20 or
- fewer intervals on the vertical, and 60 or fewer intervals on
- the horizontal axis. To display the linear regression line
- press key F5. The formula for this regression line is
- displayed in LNREGRES (number 12 above). To obtain a printed
- copy on the IBM, Epson, Okidata or Prowriter (specified in
- "EPISTAT"), press key F1. Press key F10 to return to the
- program.
-
-
-
- (23) "SELECT" *
-
- PURPOSE: To select a subset of a datafile based on user specifications.
- Data can be selected for printing, or to create a new datafile
- on disk.
- DATA REQUIRED: A DATA-ONE datafile and knowledge of the selection
- criteria you want to apply. One can select on any variable
- with "AND" and "OR" specifications. As many as 10 selection
- criteria can be set at one time. SELECT evaluates "AND" before
- "OR", as if each "AND" group were in parentheses. For example:
- "SELECT IF Sample #1>10 AND Sample #2=1 OR Sample #1<Sample #3"
-
- is interpreted as meaning:
-
- "SELECT IF (Sample #1>10 AND Sample #2=1) OR Sample #1<Sample #3"
-
- EXAMPLE: You have a datafile containing all of the quality control
- results for a particular machine part this month. You want a
- new file created which contains only those parts that failed
- specifications. You may select all the samples that exceed
- quality criteria.
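
The grouping rule for "AND" and "OR" described under DATA REQUIRED can be illustrated with a small Python sketch. The field names s1, s2 and s3 stand in for Sample #1, #2 and #3 and are hypothetical, as are the data rows; this is not how SELECT itself is written.

```python
# Each row holds one case; s1, s2, s3 stand in for Sample #1, #2, #3.
rows = [
    {"s1": 12, "s2": 1, "s3": 5},  # satisfies the AND group
    {"s1": 3, "s2": 0, "s3": 9},   # satisfies only s1 < s3
    {"s1": 8, "s2": 1, "s3": 2},   # satisfies neither
]

def keep(row):
    """(Sample #1 > 10 AND Sample #2 = 1) OR Sample #1 < Sample #3."""
    return (row["s1"] > 10 and row["s2"] == 1) or row["s1"] < row["s3"]

def keep_other_grouping(row):
    """The grouping SELECT does NOT use: Sample #1 > 10 AND (... OR ...)."""
    return row["s1"] > 10 and (row["s2"] == 1 or row["s1"] < row["s3"])

selected = [row for row in rows if keep(row)]
print(len(selected), "rows selected")
```

The second row shows why the grouping matters: it is selected under the documented rule but would be rejected if the "OR" were grouped first.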
-
- 18
-
-
-
-
-
- (24) "T-TEST" *
-
- A. Paired and unpaired T-test:
- PURPOSE: To determine if the means of two samples are statistically
- different.
- DATA REQUIRED: A DATA-ONE datafile with the two samples to be compared.
- If a paired test is being performed, both samples must contain
- the same number of items.
- EXAMPLE: Is the mean weight gain of a herd fed on new Brand X
- significantly greater than the weight gain of a second herd
- fed the standard brand feed?
- COMMENT: The means and variances of the two samples will be displayed,
- followed by the T value, degrees of freedom, and the p value.
- For the unpaired T-test, the equality of variances is tested
- to be sure that the assumptions of the T-test are met. If
- the variances are statistically different, the F value
- supporting that conclusion will be displayed. The confidence
- limits on the difference between the two means are also
- displayed.
-
- REFERENCE: Snedecor, p. 116.
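
The quantities listed in the comment above (T value, degrees of freedom, and the F value for the variances) can be sketched in Python. The manual does not say which form of the unpaired test is used; the sketch assumes the pooled-variance (Student) form, and the function names and data are hypothetical. The p value is omitted because it requires the t distribution, which the Python standard library does not provide.

```python
import math
import statistics

def unpaired_t(a, b):
    """Pooled-variance t statistic, degrees of freedom, and variance ratio F."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    df = na + nb - 2
    pooled = ((na - 1) * va + (nb - 1) * vb) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled * (1 / na + 1 / nb)
    )
    f_ratio = max(va, vb) / min(va, vb)  # larger variance over smaller
    return t, df, f_ratio

def paired_t(a, b):
    """t statistic and degrees of freedom for the paired differences."""
    if len(a) != len(b):
        raise ValueError("paired samples must contain the same number of items")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical weight gains for two herds.
herd_a = [1, 2, 3, 4, 5]
herd_b = [2, 4, 6, 8, 10]
print(unpaired_t(herd_a, herd_b))
print(paired_t(herd_a, herd_b))
```

The paired version raises an error when the samples differ in length, mirroring the requirement that both samples contain the same number of items.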
-
- B. T value:
- PURPOSE: To evaluate the p value associated with a given T value.
- DATA REQUIRED: The T value and the degrees of freedom.
-
-
-
- (25) "XTAB" *
-
- PURPOSE: To crosstabulate data in 1-, 2- or 3-way reports. This provides
- the tabular counterpart of a scattergram.
- DATA REQUIRED: A DATA-ONE datafile containing at least as many variables
- as the number of ways you want to crosstabulate. The minimum
- and maximum values for each sample will be displayed and then
- you choose the interval width for each cell of the table. If
- you have coded data with sequential integers, choose a width
- of 1. If you have quantitative data, it is usually best to
- choose an interval that will result in fewer than 10 cells or
- the table will be difficult to read. In addition to choosing
- the interval, you are offered the opportunity to label each
- row and column interval with the label of your choice to make
- a more readable report.
- EXAMPLE: What is the age by sex breakdown of hospitalized cases of
- meningitis?
- COMMENT: The crosstab report is printed on screen or printer. The
- number of missing values displayed is the number of cases
- where one or more of the samples involved contained a blank.
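
The way interval widths and missing values shape a crosstab can be illustrated with a short Python sketch. It is a simplified stand-in for XTAB, not its actual code: the function name, the use of None for a blank, and the data are all hypothetical.

```python
from collections import Counter

def crosstab(rows, minimums, widths):
    """Count cases per cell; a case with any blank (None) is counted as missing."""
    table = Counter()
    missing = 0
    for row in rows:
        if any(value is None for value in row):
            missing += 1
            continue
        # Cell index along each variable = which interval the value falls in.
        cell = tuple(
            int((value - low) // width)
            for value, low, width in zip(row, minimums, widths)
        )
        table[cell] += 1
    return table, missing

# Hypothetical (age, sex code) pairs; sex is coded 1 or 2, so its width is 1.
cases = [(3, 1), (7, 2), (12, 1), (None, 2), (4, 1)]
table, missing = crosstab(cases, minimums=(0, 1), widths=(5, 1))
print(dict(table), "missing:", missing)
```

Here ages are grouped in 5-year intervals, while the coded variable uses a width of 1 as the manual recommends for sequential integer codes.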
-
-
- 19
-
-
-
-
-
- THE EXAMPLE DATAFILE
-
-
- An example datafile, named "EXAMPLE", showing a sample of people,
- their ages and their systolic blood pressures, is included on the EPISTAT
- disk. To gain some familiarity with the appearance of an EPISTAT
- datafile, follow these steps:
-
- 1.) Press <Ctrl> and <Alt> and <Del> at the same time (or load BASICA,
- then type RUN "EPISTAT") to run the introductory program. Do not change
- the default configuration for now, but move on to the main menu.
-
- 2.) Choose Menu option 3 to run specific programs in the EPISTAT package.
-
- 3.) Choose program number 2 to run "DATA-ONE", the main data entry and
- printing program in EPISTAT.
-
- 4.) Choose Menu option 6 to load data from disk. Then enter the filename
- EXAMPLE without any quotation marks.
-
- 5.) Return to the main DATA-ONE menu and choose option 4 to print this
- datafile on your screen or printer. Print it once in input order,
- then try printing it sorted by Sample 2 or 3.
-
- 6.) Choose menu option 7 to exit DATA-ONE, then enter Y because EXAMPLE
- was already saved to disk. Choose other EPISTAT program numbers to
- run ANOVA, HISTOGRM, LNREGRES, SCATRGRM, or XTAB with this datafile.
-
- 7.) Return to DATA-ONE to enter your own data for analysis.
-
- 20
-
-
-
-
- NOTICE
-
- ---------------------------------------------------------------------
- Users may copy EPISTAT and distribute it to others on the following
- conditions:
- 1. The programs are not modified in any way.
- 2. Individual programs are not distributed separately.
- 3. No fee is charged for copying or distribution.
- ---------------------------------------------------------------------
-
-
- ====USER-SUPPORTED SOFTWARE====
-
- The concept of user-supported software is based on three
- principles:
-
- 1. The value and utility of a software package is best assessed
- by each user on his or her own system with his or her own data.
- Only after using a program can one determine whether it serves
- one's personal applications, needs, and tastes.
-
- 2. The creation of independent personal computer software requires
- a substantial commitment of time and effort. Rather than
- replicate this effort time after time, the computing community
- can and should support individual creative efforts.
-
- 3. By encouraging users to copy programs, rather than spending
- large sums on copy-protection, authors can supply quality
- software at reduced cost. Users will support useful programs.
-
-
- If, after using EPISTAT, you find it of value, your contribution
- in any amount will be appreciated ($25 suggested).
-
- Send contributions to:
-
- Tracy L. Gustafson, M.D.
- 1705 Gattis School Road
- Round Rock, Texas 78664
-
-
-
- Thank you.
-
-