- INDUCE
-
-
- Copyright 1986 - MicroExpert Systems
- Box 430 R.D. 2
- Nassau, NY 12123
-
-
- INDUCE implements the ID3 algorithm for the generation of rules from
- a data set, to be described (we hope) in the article "Finding Knowledge
- in Data" in the November 1986 issue of BYTE. For details on the ID3
- algorithm see Quinlan's article in Machine Learning: An Artificial
- Intelligence Approach, edited by Michalski, Carbonell and Mitchell. It is
- published by Tioga Publishing Co., Palo Alto CA, 1983.
-
- The program has been tested using Turbo Pascal Version 3.01A on an IBM
- PC. It has been run under both DOS 2.1 and Concurrent 4.1. The source for
- this program is contained in two files, INDUCE.PAS and INDUCE.INC. The
- program produces one overlay file, INDUCE.000.
-
- INDUCE produces a knowledge base which can be used with MicroExpert.
- MicroExpert is an expert system shell written in Turbo Pascal for the IBM
- PC and Apple II. It is available for $49.95 and comes with complete
- source code. It can be ordered by writing to:
-
- McGraw-Hill Book Company
- P.O. Box 400
- Hightstown, NJ 08520
-
- Or calling 1-800-628-004 or in New York state 212/512-2999.
-
- We would be pleased to hear your comments, good or bad, or any
- applications and modifications of the program. Contact us at the above
- address or on BIX. Our id is bbt and we may be contacted via BIXmail or
- by leaving comments in the MicroExpert conference.
-
- - Bill and Bev Thompson
-
-
- Operation
-
- To start the program simply switch to the directory containing
- INDUCE.COM and INDUCE.000 and at the DOS prompt type INDUCE and press the
- ENTER key. The screen will clear and the message
-
- Example File (Press <ENTER> to quit.) :
-
- will appear. Type in the name of your example file and press the ENTER
- key. The file name should include the drive and path name if necessary.
- The default extension for example files is ".EX". The program will now
- read the example file. Error messages will be displayed on the screen.
- The program does not do very extensive error checking, so examine the
- example files and the resulting knowledge base to be sure that they make
- sense.
-
- Once the file has been read, the program will attempt to classify
- the example set. Each time an attempt is made to classify a partition of
- the example set, a "." is printed on the screen. The program is not
- particularly fast, so you will see the "."s crawl across the screen.
-
- You may see a "*" appear on the screen from time to time and then
- disappear. This indicates that garbage collection is in progress. The
- program is attempting to reclaim memory which has been used, but is no
- longer accessible.
-
- When the classification process has been completed, the message
-
- Output the tree to what file (Press <ENTER> for screen) ?
-
- will appear. You may save the tree to a file or press <ENTER> to print it
- on the screen. The format of the tree is described in the BYTE article.
- If the tree is wider than 80 columns, it may not print properly. After
- the tree has been displayed on the screen, a message will tell you to
- press any key to continue. To print the tree on the printer, enter "lst:"
- as the file name.
-
- For instance, the example file profit.ex, included with the program,
- will produce the following tree as output:
-
- (age (old ( profit (down)))
-      (new ( profit (up )))
-      (midlife (competition (no (profit (up)))
-                            (yes (profit (down))))
-
-
- Next, the program will display
-
- Output the rules to what file (Press <ENTER> for screen) ?
-
- Enter the name of the file which is to contain the rules. If this file is
- to be a MicroExpert knowledge base, be sure to add the extension
- ".KB" to the file name. The program will also write a series of prompts
- for the attributes.
-
- The file profit.ex will produce the following rules and prompts:
-
-
- 1
- If age is old
- then profit is down
- .
-
- 2
- If age is midlife
- and competition is no
- then profit is up
- .
-
- 3
- If age is midlife
- and competition is yes
- then profit is down
- .
-
- 4
- If age is new
- then profit is up
- .
-
-
- Prompt competition
- What is the value of competition ?
- .
-
- Prompt age
- What is the value of age ?
- .
-
- Finally the program will clear the screen and request a new example
- file. At this point you can enter a new example file or press <ENTER> to
- exit the program.
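-
- The rules listed above follow the tree branch by branch: each path from
- the root to a leaf becomes one rule. The Python sketch below illustrates
- that correspondence. It is not taken from INDUCE.PAS; the nested tuple
- and dictionary representation of the tree and the names tree_to_rules,
- format_rules and format_prompt are hypothetical, chosen for illustration,
- and the text layout only approximates the listing above.
-
```python
# Hypothetical in-memory form of the profit.ex tree: a leaf is a class value
# (a string); an inner node is (attribute name, {attribute value: subtree}).
TREE = ("age", {
    "old": "down",
    "midlife": ("competition", {"no": "up", "yes": "down"}),
    "new": "up",
})


def tree_to_rules(tree, class_name, conditions=()):
    """Return one (conditions, conclusion) pair per root-to-leaf path."""
    if isinstance(tree, str):                      # leaf: emit a rule
        return [(list(conditions), (class_name, tree))]
    attribute, branches = tree
    rules = []
    for value, subtree in branches.items():
        path = conditions + ((attribute, value),)
        rules.extend(tree_to_rules(subtree, class_name, path))
    return rules


def format_rules(rules):
    """Render rules as numbered If/and/then text, each ended by a '.' line."""
    lines = []
    for number, (conditions, (name, value)) in enumerate(rules, start=1):
        lines.append(str(number))
        for i, (attribute, val) in enumerate(conditions):
            keyword = "If" if i == 0 else "and"
            lines.append("%s %s is %s" % (keyword, attribute, val))
        lines.append("then %s is %s" % (name, value))
        lines.append(".")
        lines.append("")
    return "\n".join(lines)


def format_prompt(attribute):
    """Render one prompt entry for an attribute."""
    return "Prompt %s\nWhat is the value of %s ?\n.\n" % (attribute, attribute)


if __name__ == "__main__":
    print(format_rules(tree_to_rules(TREE, "profit")))
    print(format_prompt("competition"))
    print(format_prompt("age"))
```
-
- The order of the rules depends on the order in which the branches are
- visited; the order used in TREE was picked to match the listing.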
-
- Example Files
-
- Example files are simply ASCII text files which are created with a
- text editor. The program ignores blank lines and comments in the files.
- Comments begin with "(*" and end with "*)". A comment may extend over
- several lines. The first line in the file which is not a comment or a
- blank line must contain the attribute names. The format of this line is
-
- class name,attribute1,attribute2,.....
-
- The class name must come first, followed by the names of the attributes
- separated by commas. Leading and trailing blanks in attribute names are
- not significant, but internal spaces are. Therefore, "dog and cat" is not
- the same as "dogandcat". The program is also case sensitive, so "Dog" is
- considered different from "dog". The program does not check for duplicate
- attributes, but of course, any knowledge base produced using duplicate
- attribute names is likely to be incorrect.
-
- Following the line containing the attribute names are one or more
- lines containing examples. Each example line contains a class value
- followed by a series of attribute values separated by commas. Each
- example must fit on one line. The general format is
-
- class value,value of attribute1,value of attribute2,.....
-
- The attribute values must be in the same order as the attributes are
- listed in the first line, although the program has no way to check this.
- As with attribute and class names, internal spaces are significant;
- leading and trailing spaces are not. "Don't care" values are indicated by
- a "*". A "don't care" value indicates that the value does not contribute
- to the example, i.e. you will get the same result no matter what value
- this attribute takes on.
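-
- The file format described above can be read with a few lines of code.
- The following Python sketch is one possible reader, not the routine used
- by INDUCE. The function names strip_comments and parse_examples are
- hypothetical, storing a "don't care" value as None is our own choice, and
- dropping the rest of the file after an unterminated comment is an
- assumption.
-
```python
def strip_comments(text):
    """Remove (* ... *) comments, which may extend over several lines."""
    pieces = []
    pos = 0
    while True:
        start = text.find("(*", pos)
        if start == -1:
            pieces.append(text[pos:])
            break
        pieces.append(text[pos:start])
        end = text.find("*)", start + 2)
        if end == -1:        # unterminated comment: ignore the rest (assumption)
            break
        pos = end + 2
    return "".join(pieces)


def parse_examples(text):
    """Return (class_name, attribute_names, examples).

    Each example is (class_value, [attribute values]); a "*" value
    ("don't care") is stored as None.
    """
    rows = []
    for line in strip_comments(text).splitlines():
        if line.strip() == "":
            continue                     # blank lines are ignored
        # Leading and trailing blanks are dropped; internal spaces and
        # upper/lower case are kept exactly as written.
        rows.append([field.strip() for field in line.split(",")])
    if not rows:
        raise ValueError("no attribute line found")
    header = rows[0]
    class_name, attributes = header[0], header[1:]
    examples = []
    for row in rows[1:]:
        if len(row) != len(header):
            raise ValueError("wrong number of values: %r" % (row,))
        values = [None if v == "*" else v for v in row[1:]]
        examples.append((row[0], values))
    return class_name, attributes, examples


if __name__ == "__main__":
    sample = "(* demo *)\nprofit ,age ,competition\ndown ,old ,no\nup ,new ,*\n"
    print(parse_examples(sample))
```
-
- Unlike the program itself, this sketch rejects an example line whose
- number of values differs from the number of names on the first line.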
-
- The following is the example set from the BYTE article. It is included
- with the program in the file profit.ex:
-
-
- (* Example file for Byte Article *)
-
- (* Copyright [c] 1986 MicroExpert Systems
- Box 430 RD 2
- Nassau, NY 12123 *)
-
- (* Attributes *)
-
- profit ,age ,competition ,type
-
- (* Examples *)
- down ,old ,no ,software
- down ,midlife ,yes ,software
- up ,midlife ,no ,hardware
- down ,old ,no ,hardware
- up ,new ,no ,hardware
- up ,new ,no ,software
- up ,midlife ,no ,software
- up ,new ,yes ,software
- down ,midlife ,yes ,hardware
- down ,old ,yes ,software
-
- The first example line can be read, "profit is down for a company whose
- age is old, which has no significant competition and whose product type
- is software."
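-
- As a rough illustration of how ID3 chooses attributes for this example
- set, the following Python sketch computes the information gain of each
- attribute and splits on the largest one. It is a textbook-style sketch,
- not the INDUCE source: the tuple and dictionary tree representation, the
- tie-breaking, and the majority vote used when the attributes run out are
- all assumptions, and "don't care" values are not handled.
-
```python
import math
from collections import Counter

ATTRIBUTES = ["age", "competition", "type"]
EXAMPLES = [  # (class value, attribute values), copied from profit.ex
    ("down", ["old", "no", "software"]),
    ("down", ["midlife", "yes", "software"]),
    ("up", ["midlife", "no", "hardware"]),
    ("down", ["old", "no", "hardware"]),
    ("up", ["new", "no", "hardware"]),
    ("up", ["new", "no", "software"]),
    ("up", ["midlife", "no", "software"]),
    ("up", ["new", "yes", "software"]),
    ("down", ["midlife", "yes", "hardware"]),
    ("down", ["old", "yes", "software"]),
]


def entropy(examples):
    """Entropy, in bits, of the class values in a set of examples."""
    counts = Counter(cls for cls, _ in examples)
    total = len(examples)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def partition(examples, index):
    """Group examples by the value of the attribute at position index."""
    groups = {}
    for cls, values in examples:
        groups.setdefault(values[index], []).append((cls, values))
    return groups


def gain(examples, index):
    """Information gained by splitting the examples on one attribute."""
    groups = partition(examples, index).values()
    remainder = sum(len(g) / len(examples) * entropy(g) for g in groups)
    return entropy(examples) - remainder


def id3(examples, indices):
    """Return a class value (leaf) or (attribute, {value: subtree})."""
    classes = Counter(cls for cls, _ in examples)
    if len(classes) == 1:
        return next(iter(classes))             # every example agrees
    if not indices:
        return classes.most_common(1)[0][0]    # majority vote (assumption)
    best = max(indices, key=lambda i: gain(examples, i))
    rest = [i for i in indices if i != best]
    branches = {value: id3(group, rest)
                for value, group in partition(examples, best).items()}
    return (ATTRIBUTES[best], branches)


if __name__ == "__main__":
    for i, name in enumerate(ATTRIBUTES):
        print(name, round(gain(EXAMPLES, i), 4))
    print(id3(EXAMPLES, [0, 1, 2]))
```
-
- On these ten examples the gain for age is 0.6 bits, against about 0.12
- for competition and 0 for type, so age is chosen at the root. Only the
- midlife branch needs a further split, on competition, which gives a tree
- of the same shape as the one shown under Operation.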
-
- Numerical Attributes
-
- Numerical attributes are handled in the same manner as symbolic
- (non-numerical) attributes, except that ":number" is appended to the
- attribute name. ":number" is removed from the attribute name before
- printing and will not appear in either the tree or the knowledge base.
- Values for numeric attributes must be within the range +/- 1.0E-37 to
- +/- 1.0E+37. The numbers may be entered in integer, real or floating
- point format. The following example set demonstrates the use of
- numerical attributes. There is a "don't care" value in the second
- example.
-
-
- (* Numerical Attribute Example file *)
-
- (* Copyright [c] 1986 MicroExpert Systems
- Box 430 RD 2
- Nassau, NY 12123 *)
-
- (* Attributes *)
-
- profit ,age:number ,competition ,type
-
- (* Examples *)
- down ,5.0 ,no ,software
- down ,2.5 ,* ,software
- up ,2.5 ,no ,hardware
- down ,5 ,no ,hardware
- up ,1 ,no ,hardware
- up ,1 ,no ,software
- up ,2.5 ,no ,software
- up ,1 ,yes ,software
- down ,2 ,yes ,hardware
- down ,5 ,yes ,software
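-
- One way to read the ":number" convention is sketched below in Python.
- The names split_attribute and parse_number are hypothetical. This
- document does not say how INDUCE treats a value of zero, nor how numeric
- values are used when the tree is built, so the sketch covers only the
- parsing and range check; accepting zero is an assumption.
-
```python
NUMBER_SUFFIX = ":number"
SMALLEST = 1.0e-37   # smallest non-zero magnitude allowed
LARGEST = 1.0e37     # largest magnitude allowed


def split_attribute(field):
    """Return (name without ":number", is_numeric) for one header field."""
    name = field.strip()
    if name.endswith(NUMBER_SUFFIX):
        return name[:-len(NUMBER_SUFFIX)].strip(), True
    return name, False


def parse_number(field):
    """Convert one numeric value; "*" ("don't care") becomes None."""
    text = field.strip()
    if text == "*":
        return None
    value = float(text)          # accepts "5", "2.5" and "1.0E+3" alike
    magnitude = abs(value)
    # Zero is let through (assumption); anything else must be in range.
    if magnitude != 0 and not (SMALLEST <= magnitude <= LARGEST):
        raise ValueError("value out of range: %s" % text)
    return value


if __name__ == "__main__":
    header = "profit ,age:number ,competition ,type".split(",")
    print([split_attribute(f) for f in header])
    print([parse_number(f) for f in ["5.0", "2.5", "5", "1", "*"]])
```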
-
-