-
-
-
-
-
-
-
-
-
- A Guide to Parallel Programming
-
-
-
- Using
-
-
-
- PCS-Linda and the Parallel Lan System
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Parallel Computer Solutions
-
- November 30, 1991
-
-
- Contents
-
-
-
- Disclaimer:. . . . . . . . . . . . . . . . . . . . . . . . . . 4
-
- Linda. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
- The Language Linda. . . . . . . . . . . . . . . . . . . . 5
- Linda Primitives. . . . . . . . . . . . . . . . . . . . . 6
- Tuples. . . . . . . . . . . . . . . . . . . . . . . . . . 6
- Elements. . . . . . . . . . . . . . . . . . . . . . . . . 6
- Actuals . . . . . . . . . . . . . . . . . . . . . . . . . 6
- Formals . . . . . . . . . . . . . . . . . . . . . . . . . 7
- Tuple Space . . . . . . . . . . . . . . . . . . . . . . . 7
- Tuple Matching. . . . . . . . . . . . . . . . . . . . . . 8
- Further Examples. . . . . . . . . . . . . . . . . . . . . 9
- Note on real elements . . . . . . . . . . . . . . . . . . 9
-
- Linda Instructions . . . . . . . . . . . . . . . . . . . . . 10
- IN. . . . . . . . . . . . . . . . . . . . . . . . . . . 10
- RD. . . . . . . . . . . . . . . . . . . . . . . . . . . 10
- OUT . . . . . . . . . . . . . . . . . . . . . . . . . . 11
- EVAL. . . . . . . . . . . . . . . . . . . . . . . . . . 11
- INP . . . . . . . . . . . . . . . . . . . . . . . . . . 12
- RDP . . . . . . . . . . . . . . . . . . . . . . . . . . 12
- Hello World Example . . . . . . . . . . . . . . . . . . 12
-
- Parallel Lan System. . . . . . . . . . . . . . . . . . . . . 14
- Requirements. . . . . . . . . . . . . . . . . . . . . . 14
- Packet Drivers. . . . . . . . . . . . . . . . . . . . . 15
-
- Master . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
- Multiple Processor Communication. . . . . . . . . . . . 17
- Different Masters . . . . . . . . . . . . . . . . . . . 18
- Machine . . . . . . . . . . . . . . . . . . . . . . . . 19
- DOS . . . . . . . . . . . . . . . . . . . . . . . . . . 19
- Extended Memory . . . . . . . . . . . . . . . . . . . . 19
- Master Installation . . . . . . . . . . . . . . . . . . 20
-
- Worker . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
- Worker Installation . . . . . . . . . . . . . . . . . . 22
-
- Developer. . . . . . . . . . . . . . . . . . . . . . . . . . 24
- Compiler Directives . . . . . . . . . . . . . . . . . . 26
- Units and System Code . . . . . . . . . . . . . . . . . 27
- Startup Procedure . . . . . . . . . . . . . . . . . . . 27
- Procedure Declarations. . . . . . . . . . . . . . . . . 30
- Global Variables. . . . . . . . . . . . . . . . . . . . 30
- Readln/Writeln. . . . . . . . . . . . . . . . . . . . . 30
- Formal Parameters . . . . . . . . . . . . . . . . . . . 31
- Nesting Procedures. . . . . . . . . . . . . . . . . . . 31
- Multiple Procedures . . . . . . . . . . . . . . . . . . 31
- CAUTION!. . . . . . . . . . . . . . . . . . . . . . . . 31
- Developer Program Design. . . . . . . . . . . . . . . . 32
- Tuning. . . . . . . . . . . . . . . . . . . . . . . . . 32
- Debugging . . . . . . . . . . . . . . . . . . . . . . . 33
- Main . . . . . . . . . . . . . . . . . . . . . . . . . 33
- Skeleton Code . . . . . . . . . . . . . . . . . . . . . 34
- What the Developer Doesn't Show . . . . . . . . . . . . 35
- RELAX!. . . . . . . . . . . . . . . . . . . . . . . . . 35
-
- PCS-Linda. . . . . . . . . . . . . . . . . . . . . . . . . . 36
- Linda Conversion Program. . . . . . . . . . . . . . . . 36
- Tuple Elements. . . . . . . . . . . . . . . . . . . . . 36
- Actuals - Integer/Real. . . . . . . . . . . . . . . . . 37
- Integers. . . . . . . . . . . . . . . . . . . . . . . . 37
- REALS . . . . . . . . . . . . . . . . . . . . . . . . . 38
- Formals . . . . . . . . . . . . . . . . . . . . . . . . 38
- INP & RDP . . . . . . . . . . . . . . . . . . . . . . . 38
- EVAL. . . . . . . . . . . . . . . . . . . . . . . . . . 39
- Tuple Size. . . . . . . . . . . . . . . . . . . . . . . 40
- What Does OUT (); Produce . . . . . . . . . . . . . . . 40
-
- An Example . . . . . . . . . . . . . . . . . . . . . . . . . 41
- Analysis. . . . . . . . . . . . . . . . . . . . . . . . 43
- Final Code. . . . . . . . . . . . . . . . . . . . . . . 44
-
- Mandelbrot . . . . . . . . . . . . . . . . . . . . . . . . . 46
- Graphics Screen . . . . . . . . . . . . . . . . . . . . 46
- Sequential Program. . . . . . . . . . . . . . . . . . . 47
- Parallel Programming Methods. . . . . . . . . . . . . . 51
- Developer . . . . . . . . . . . . . . . . . . . . . . . 51
- Columns . . . . . . . . . . . . . . . . . . . . . . . . 52
- Results . . . . . . . . . . . . . . . . . . . . . . . . 52
- Analysis. . . . . . . . . . . . . . . . . . . . . . . . 54
- Worker. . . . . . . . . . . . . . . . . . . . . . . . . 55
- Complete Code . . . . . . . . . . . . . . . . . . . . . 59
-
-
- Disclaimer:
-
- Copyright 1991 Parallel Computer Solutions. All Rights Reserved
-
-
-
- The terms and conditions governing the sale of Parallel Computer
- Solutions, hardware products and the licensing of Parallel Computer
- Solutions, software consist solely of those set forth in the
- written contracts between Parallel Computer Solutions, and its
- customers. No representation or other affirmation of facts
- contained in this publication, including but not limited to
- statements regarding capacity, response time, suitability for use
- or performance of products described herein shall be deemed to be
- a warranty by Parallel Computer Solutions, for any purpose or give
- rise to any liability by Parallel Computer Solutions, whatever.
-
-
-
-
- No part of this document may be copied or reproduced in any form or
- by any means without the prior written consent of Parallel Computer
- Solutions. Information is subject to change without notice.
-
-
-
-
-
-
-
-
-
- Printed in the United States of America.
-
-
-
-
-
-
-
-
-
-
-
-
- Linda is a registered trademark of Scientific Computing Associates,
- Inc. PCS-LINDA is a trademark of Parallel Computer Solutions.
- Parallel Lan System is a trademark of Parallel Computer Solutions.
- Turbo Pascal is a copyright of Borland International, Inc.
-
- Linda
-
-
-
- Linda is a parallel processing coordination language. To
- most computer scientists and engineers, this is a new concept.
- Commonly, sequential languages such as C and Pascal are enhanced to
- support parallel processing; Concurrent C and Parallel Pascal are
- two such languages. However, if we look at the actual code that
- makes up a parallel application we see that not all of the code is
- parallel. It can't be. If a parallel application were completely
- parallel, one instruction would be executed on each machine, which
- of course is not reasonable. Parallel programs are written such
- that certain pieces of code are farmed off to other processors
- while other code is executed on the host machine to pull all of
- the pieces together. Now there are machines where this is not
- true, but we will not discuss those. These pieces of code farmed to
- other processors are typically small and perform a specific
- operation on some data. They are commonly called processes.
- Let's look at a common real world situation to put these ideas
- together.
-
- Large, busy offices in a corporation usually are made up of
- many different people doing some task or process. If we were to
- let this office run itself, it would become a mess in a very short
- time. Some people would be doing tasks that had already been done.
- There would be miscommunication. You can probably think of many
- more things that could go wrong. What this office needs is an
- office manager. An office manager is responsible for pulling
- the individual people into a working team, each team member
- working for the greater good of the corporation.
-
-
- The Language Linda
-
- The same can be said for Linda. Linda is a language developed
- at Yale University and is copyrighted by Scientific Computing
- Associates, Inc. Linda, however, lacks the features you and I
- commonly think of when we hear about a new language. There are
- no loops, decisions, records, etc. in Linda. All of these common
- functions are provided in some other language used in conjunction
- with Linda. This language could be C, Pascal, Forth or any other
- language. The primitive functions of Linda are coded in a
- particular language and can be used just as any other statement
- can be used.
-
- Linda coordinates the activities of many different executing
- processes on different machines. As we will see later, these
- machines do not have to be a single parallel machine such as the
- Connection Machine. The Linda system we will be using coordinates
- the activities of different processes running on IBM PCs connected
- by an ethernet local area network.
-
- As we describe some of the characteristics and features of
- Linda, we will introduce our version of Linda called PCS-Linda.
- There are features of Linda that can be difficult to implement on
- top of an existing language and compiler; therefore, we have not
- implemented everything in our version.
-
-
- Linda Primitives
-
- It may be surprising to find out that Linda has just six
- instructions. These six instructions are in, out, rd, eval, inp,
- and rdp, each of which will be discussed in detail later in this
- section. First we must talk about how Linda coordinates
- processes. Linda uses a distributed data structure called a
- tuple. A tuple is a data structure that can be accessed by any
- process, hence the distributed part.
-
-
- Tuples
-
- Tuples have two parts: a name and some number of elements.
- The name of a tuple is any combination of up to sixteen
- characters. The name serves as the primary matching characteristic
- for the tuples.
-
- ( name, e1, e2, e3, e4, e5, e6 )
-
- The name can be either a string variable or a literal enclosed in
- single quotes. The elements are filled from left to right.
-
-
- Elements
-
- A tuple can have from zero to six elements. Each of the
- elements will be a value or variable of a particular type
- supported in the host language Linda is coded in. In PCS-Linda the
- types can be integer, real, array, and record. Strings are not
- allowed in the current PCS-Linda system. Elements, in addition to
- having a specific type, can have a further description of actual
- or formal.
-
-
- Actuals
-
- When we speak of an actual, we are concerned with an actual
- value. This value can be either the numerical representation or
- can be contained in a variable. When using a number as an
- element, we simply state the number in one of the element fields.
-
-
- ( 'example1', 1, 2 )
-
- is an example of a tuple with name 'example1' and actuals 1 and 2,
- each of type integer. If we had two variables a and b of type
- integer,
-
- a, b : integer;
-
-
- and we wanted to use the values contained in these variables, we
- would precede each of the variables with the & symbol. The tuple
-
- ( 'example2', &a, &b )
-
- would be an example of a tuple with name 'example2' and actuals a
- and b, of type integer. If this tuple were used in a program, the
- integer values assigned to the variables a and b would be placed
- into the tuple.
-
- In addition to integers, reals or floating point values can be
- used as actuals. Thus
-
- ( 'example3', 3.141592, 1.2e+08 )
-
- can also be used as a tuple. In most cases, the only thing that
- differentiates an integer from a real number is the decimal point,
- which therefore must be included when specifying a real number.
-
-
- Formals
-
- Formal elements are more like variables. They are used to
- collect a value from a tuple. They are distinguishable by the
- lack of the & symbol. A tuple with formal elements would appear
- as
-
- ( 'example4', a, b, c )
-
- The variables a, b, and c would be assigned the values returned
- after a match has taken place on the tuple. Formal variables can
- be of any of the previously mentioned data types except strings.
-
-
- Tuple Space
-
- Now that we have tuples, we have to have a place to keep them.
- This storage place is called the tuple space. The tuple space
- could be envisioned as a large bag full of tuples.
- The tuple space can be likened to shared memory in that all
- processors have access to the tuples in the tuple space. However,
- unlike shared memory, there are no critical sections in the tuple
- space. Any of the tuples can be used by any processor at any time.
- Now there are ways to control access to the tuple space by using
- tuples that emulate semaphores and the like. In order for a
- process to access a tuple, a match has to take place. Thus a
- processor sends a template of the tuple it would like to
- match to the tuple space. The machine holding the tuple space
- then performs a search on all of the tuples in the tuple space. As
- soon as a match is found, the matching tuple is sent to the
- processor that requested the match.
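-
- As a small illustration of this idea, a single tuple can act as
- a crude lock. The sketch below is only an illustration ( the tuple
- name 'lock' is our own ); it assumes one processor OUTs the lock
- tuple once, and any processor that INs it holds the lock until it
- OUTs it again.
-
- out ( 'lock' );          { done once, before the workers start }
-
- in ( 'lock' );           { wait for and remove the lock tuple }
- { ... code that must run on only one processor at a time ... }
- out ( 'lock' );          { put the lock back for the next processor }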
-
- The tuple space in our system is located on a single machine
- called a master. The master is responsible for holding tuples and
- matching tuples. When tuples are put into the tuple space, the
- master does not check for duplicate tuples. Any number of
- duplicate tuples can exist in the tuple space. There are specific
- rules that the master will follow when matching tuples.
-
-
- Tuple Matching
-
- RULE 1 : Tuple names must match both length and character for
- character.
-
- RULE 2 : Actuals match actuals if of the same type and contain the
- same value.
-
- RULE 3 : Actuals match formals if they are of the same type and
- length.
-
- RULE 4 : Formals never match formals.
-
- Let's look at each rule using some examples. Assume we have
- the following tuples in our tuple space.
-
- ( 'stuff', row, col, 1 );
-
- ( 'coord', &x, &y );
-
- ( 'work', &segments, &adder );
-
- ( 'Junk', 1, 3, 5 );
-
-
- Row, col, x, y are integers;
- Adder is a procedure;
- Segments is a record of length 18;
-
-
- We have to match the tuple ( 'stuff', 100, 200, start ) with
- a tuple in our tuple space. First off, RULE 1 is satisfied with
- the first tuple in tuple space because they both have names of
- length 5 and match character for character ( stuff = stuff ). The
- elements of the tuples are matched next. The tuple to be matched
- has an actual element of type integer in the first position and
- value 100. The first tuple in tuple space has a formal of type
- integer in the first position. By RULE 3, the first elements
- match in each of these tuples. Further study shows that the second
- and third elements match as well.
-
-
- Further Examples
-
- The tuple ( 'stuff', 3.4, 200, start ) does not match the
- tuple ( 'stuff', row, col, 1 ) because the types of the first
- element are different.
-
- The tuple ( 'stuff', &a, &b, &start ) may or may not match the
- tuple ( 'stuff', row, col, 1 ). A determination cannot be made
- because we do not have a value for the variable start. If start
- equals one, then we have a match but if start equals any other
- integer, the tuples do not match.
-
-
- Note on real elements
-
- As we all know, computers have a tough time representing real
- or floating point numbers well. Some numbers can be represented
- exactly, such as 0.5, but how does a computer represent the value
- of 2/3? At some point, the computer has to round to 0.666666667,
- which is not exact. This inaccuracy must be kept in mind when
- using real numbers as element types in tuples. If during a
- calculation, a real number is used in a tuple, there may not be a
- match because of rounding. Thus it is probably wise to avoid
- matching on reals if at all possible. In most cases, you will
- probably get a match but that one time when you don't, keep the
- above in mind when looking for the problem.
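-
- One workaround ( our own suggestion, not a feature of the
- system ) is to match on a scaled integer and carry the real number
- only as data in a formal position. The variable names below are
- hypothetical.
-
- key := round ( x * 1000 );      { scale the real to an integer key }
- out ( 'point', &key, &x );      { the real x is never matched on }
-
- in ( 'point', &key, value );    { match on the integer actual, }
-                                 { receive the real into the formal }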
-
- Linda Instructions
-
- Now that we have tuples and a place to put them, we must look
- at the individual Linda instructions and determine what each one
- of them does and how to use them. This section will be ended with
- the common 'Hello World' problem coded in Linda and Pascal. The
- first four Linda instructions discussed are the most common ones
- used. The last two are variants of two of the common ones.
-
-
- IN
-
- The IN instruction is used to request a match on a tuple from
- tuple space. The format of the instruction is
-
- IN ( name, e1, e2, e3, e4, e5, e6 )
-
- Only the elements necessary are included in the tuple. The
- tuple specified with the IN instruction is sent to the master for
- a match with a tuple in tuple space. If there is a positive
- match, the master will return the matching tuple. If there are any
- formal elements, they are assigned the appropriate values from the
- matching tuple. If there is not a tuple in tuple space that
- matches the tuple sent by the IN instruction, the master keeps the
- request and continually checks the tuple space for a match. In the
- meantime, the processor that sent the IN instruction will block
- execution until a matching tuple is sent. Once the master has a
- match, it is sent. When the master sends the matching tuple to
- the requesting processor, it is permanently deleted from tuple
- space. If there are multiple copies of the same tuple in tuple
- space, the first one is taken. The master will take the first
- tuple that it determines matches the tuple sent with the
- instruction.
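-
- As a brief illustration ( the tuple name 'result' and the
- variables col and pixels are hypothetical ), the following IN
- blocks until some worker OUTs a result tuple for column col, then
- fills in the formal pixels with the data from the matching tuple.
-
- in ( 'result', &col, pixels );    { &col is an actual to match on, }
-                                   { pixels is a formal to be filled }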
-
-
- RD
-
- The RD instruction is nearly identical to the IN
- instruction. The format of the RD is
-
- RD ( name, e1, e2, e3, e4, e5, e6 )
-
- The RD instruction sends a tuple to the master for a match
- with a tuple in tuple space. If the master has a match, it sends
- a copy of the tuple found in tuple space. If the master does not
- have a match, it will continually search tuple space for a match.
- Meanwhile, the processor that sent the RD instruction will block
- execution until a tuple is returned. When the master finds a
- tuple, it sends a copy of the tuple to the requesting processor.
- The tuple is not deleted from tuple space. This allows a single
- tuple to be read from any processor without the overhead of INing
- the tuple and OUTing the tuple just to get a copy of the tuple.
- The RD instruction should be used when communicating common data
- to a number of different processors.
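-
- For instance, the developer might OUT a single tuple holding
- parameters that every worker needs, and each worker then RDs a
- copy without removing it. The tuple name 'params' and the
- variables are our own illustration.
-
- { developer }
- out ( 'params', &width, &height );
-
- { each worker procedure }
- rd ( 'params', width, height );   { copies the values; the tuple }
-                                   { stays in tuple space for others }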
-
-
- OUT
-
- We have seen how to request tuples from tuple space, but how do
- we get tuples into tuple space to begin with? The OUT instruction
- enables us to put tuples into tuple space. The format of the OUT
- instruction is
-
- out ( name, e1, e2, e3, e4, e5, e6 )
-
- The tuple with the OUT instruction is sent to the master and
- is put into tuple space. Any number of the same tuples can be put
- into tuple space. In all cases of actuals, the actual values are
- sent to the master in the tuple. There are no pointers referencing
- data in a different processor's memory.
-
- The master does not respond to the OUT instruction. It is
- simply taken for granted the tuple is put into tuple space. Once
- an OUT has been performed, any processor can access the tuple
- including the processor that issued the OUT instruction in the
- first place.
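-
- For example, if a and b are integers, the sequence below sends
- the current values themselves to the master; changing a afterwards
- has no effect on the tuple already in tuple space. The tuple name
- is arbitrary.
-
- a := 10;
- b := 20;
- out ( 'data', &a, &b );    { the values 10 and 20 travel in the tuple }
- a := 99;                   { the tuple in tuple space still holds 10 }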
-
-
- EVAL
-
- The EVAL instruction is the most important of the Linda
- instructions. Without it, the other instructions are of no use.
- The EVAL instruction is used to put code into tuple space. This
- code is picked up by workers and executed. The format of the EVAL
- instruction is
-
- eval ( name, e1, e2, e3, e4, e5, e6 )
-
-
- In PCS-Linda, the eval instruction has not been fully
- developed to the specification of the original Linda EVAL. In
- PCS-Linda the format of the EVAL instruction is
-
- eval ( 'work', &procedure );
-
- All EVAL instructions must name the tuple 'work'. Because
- tuples with the same name can reside in tuple space at the same
- time, this is not a problem. The first element of the tuple will be
- an actual designated by the & operator and the name of the
- procedure that is to be put into tuple space. So if we wanted a
- procedure called MANDEL to be put into tuple space to be executed,
- the following instruction would be used
-
- eval ( 'work', &mandel );
-
-
- INP
-
- The INP instruction is identical to the IN instruction except
- for one condition. If the master finds a tuple match when the
- instruction is received, it will return the tuple. If a tuple is
- not found in tuple space, the master returns a null tuple to the
- requestor. This allows the INP instruction to be evaluated
- as either true or false depending on whether or not a tuple was
- matched from tuple space. PCS-Linda implementation information for
- this instruction will be given later.
-
-
- RDP
-
- The RDP instruction is identical to the RD instruction except
- the master does not continually search for a match if its first
- attempt is unsuccessful. If a tuple is found, the matching tuple
- is returned to the requestor. If the master does not find a
- matching tuple in tuple space, the master will return null. This
- allows the RDP instruction to be evaluated as either true or
- false depending on whether or not a tuple is returned.
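-
- The exact PCS-Linda syntax for these predicate forms is given
- in a later section. As a rough sketch only, assuming INP can be
- treated as a boolean expression and that process is some ordinary
- procedure of ours, a worker could poll for optional work without
- blocking:
-
- if inp ( 'extra', item ) then
-    process ( item );         { a tuple was found and removed }
- { otherwise no matching tuple was available; carry on without blocking }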
-
- Those are the six instructions that make up the Linda
- coordination language. Let's look at an example of a 'Hello World'
- program using Linda and Pascal.
-
-
- Hello World Example
-
- We will assume we are working on a system with 8 worker
- processors. The beginning of our program would be standard
- Pascal.
-
- program hello;
-
- const
- num_proc = 8;
-
- Next we have to write the procedure for the workers.
-
- procedure world;
- var
- count : integer;
-
- begin
-
- in ( 'count', count );
- inc ( count );
- out ( 'count', &count );
- out ( 'hello', &count );
-
- end;
- Now the main procedure. The loop variables must be declared first.
-
- var
- i, proc : integer;
-
- begin
-
- out ( 'count', 0 );
- for i := 1 to num_proc do
- eval ( 'work', &world );
-
- for i := 1 to num_proc do
- begin
- in ( 'hello', proc );
- writeln ( 'Hello from processor ', proc );
- end;
- in ( 'count', 8 );
-
- end.
-
-
- The program begins with the tuple called COUNT being put into
- tuple space with an integer actual of 0. This is followed by
- eight copies of the WORLD procedure; one for each worker. The
- main program goes into a loop requesting a HELLO tuple one at a
- time. Note that the first element is a formal, thus we are only
- matching on the name of the tuple HELLO. The formal element will
- have some value on each loop iteration. The numbers may not be in
- order. As each tuple is found in tuple space, a message is
- printed that says Hello from processor -. The code ends with an
- IN which cleans up the tuple space.
-
- Each worker is instructed to IN the COUNT tuple, increment
- the number it finds in the first element position, and put the
- tuple back in tuple space for the next processor. After it does
- this, it puts a tuple in tuple space called HELLO with the number
- it put into the COUNT tuple. Each worker will get a different
- number to report back to the main code.
-
- Now as we stated above, this code will receive eight tuples
- with the name HELLO in any order. If we changed the main program's
- loop slightly, we could guarantee to get the HELLO tuples in order
- from 1 to 8.
-
- for i := 1 to 8 do
- begin
- in ( 'hello', &i );
- writeln ( 'Hello from process - ', i );
- end;
-
- We have changed the first element from a formal to an actual.
- Thus instead of the master matching only the name, it must match
- the first element as well. Therefore, the master will return a
- HELLO tuple only when its first element matches the current value
- of i, which gives us the tuples in order from 1 to 8.
-
- Parallel Lan System
-
-
-
- Now that we know how to program Linda and we have seen a
- parallel program, we need to begin doing some real work. So now
- we can sit down to our PC and begin programming the examples.
-
- Well not exactly. Our common PC is a single processor
- machine. We have no way of dividing up work and allowing
- different processors to work on the pieces. We have two options.
- We can purchase an expensive parallel machine for several
- thousand dollars, have it installed, and teach ourselves what is
- necessary to program it. Or we can use the
- Parallel Lan System.
-
- The Parallel Lan System is a software package that allows IBM
- PC or compatible machines to operate as a parallel processor. The
- machines must be connected together by an ethernet local area
- network. A minimum system consists of three machines: a master,
- worker, and a developer.
-
- Using Turbo Pascal and the Parallel Lan System ( PLS ), a
- parallel program can be constructed for any parallel algorithm we
- so desire. In addition, we have a version of Linda called PCS-
- Linda that we will use to coordinate the activities of the
- different processes created by our parallel program.
-
- The PLS was created to co-exist with other network products on
- the market such as Novell Netware and DECNet. The system will not
- interfere with NCSA or PCSA or any of the TCP/IP programs. The
- Parallel Lan System can even be configured to run several parallel
- programs on the same network sharing the same processors of the
- system.
-
-
- Configuration
-
- The configuration of the Parallel Lan System is very important
- to the efficiency of the parallel system. The remaining sections
- will document the different components of the PLS. Several
- examples will be presented in the end of the guide.
-
-
- Requirements
-
- The Parallel Lan System operates by using an ethernet local
- area network ( LAN ). LANs are very popular among educational
- institutions and businesses. Various packages are available to
- run on networks including Novell Netware and MS Lan Manager.
-
- The Parallel Lan System communicates on an ethernet by way of
- a packet driver. Most ethernet card manufacturers have these
- drivers available at no cost. In addition, there are public
- domain packet drivers available for most cards.
-
-
- Packet Drivers
-
- Packet drivers are terminate-and-stay-resident ( TSR )
- programs which act as an interface between application software
- and an ethernet card. Ethernet cards use a hardware interrupt of a
- PC. When this interrupt is activated, the packet driver code is
- activated to perform some function. Likewise, the Parallel Lan
- System software is able to activate a software interrupt which also
- activates the packet driver code for its own use. Software can be
- written that locates the packet driver installed in a machine, thus
- allowing any PC and ethernet card to be used without changing the
- system software.
-
- Manufacturers who provide packet drivers will also include
- information on how to install them. Packet drivers can be used
- with most network packages. An advantage of using packet drivers
- is that multiple applications can access the same ethernet card. The
- packet driver has the ability to give a packet from the LAN to one
- application or another based on a type field located in the
- packet.
-
- What we want to do now is install the packet drivers for the
- machines the system will run on. Follow the instructions given by
- the manufacturer. With that, we want to verify that everything to
- this point is operating correctly. On the disk labeled system
- software is a file called PKTINFO.EXE. When executed, this
- program will give us information about the ethernet card and
- packet driver installed on a particular machine. After the packet
- driver has been installed in a machine, place the system disk in
- drive A: and type
-
- A:PKTINFO
-
- If a page of information appears on the screen, the packet driver
- is working correctly. If a message appears saying no packet
- driver was found then the packet driver was not installed
- correctly.
-
- After the packet drivers have been verified we want to check
- the LAN itself. Located on the same disk is a program called
- TRANS.EXE. This is a simple LAN communication program that will
- send and receive packets from one PC to another. Pick two machines
- to execute this software on. On one of them, write down the
- ethernet address as shown by PKTINFO. Insert the system disk in
- drive A: and type
-
- A:TRANS
- Take the disk out and do the same for the other machine. On
- the PC that will receive the packets press SHIFT and R. We now
- want to select what receive mode to use. Press 2. The screen
- will blank and a status bar will appear at the top. On the sender
- machine press SHIFT and S. We need to select a receive mode as we
- did for the receiving machine. The reason is that the receiving
- machine will send an acknowledgement packet back to the sender. So
- press 2. We are now asked if we want to use the broadcast
- feature. Press N. The program then asks for the address of the
- receiving machine. Enter the six bytes written down separated by
- spaces and press return. Enter a message to be sent to the
- receiving machine and press enter.
-
- The screen will blank and ask you to press a key to send a
- packet or shift Q to quit. Press any key. If you look at the
- receiving machine, your message should be on the screen. On each of
- the screens, there should be a 1 in the upper right hand corner.
- This indicates that one packet has been sent and one packet has
- been received. You can continue to press any key to send
- additional packets. Press SHIFT and Q when ready to quit.
-
-
- Master
-
-
-
- The master of the Parallel Lan System is the most important
- machine. It can be likened to the server of a local area network.
- It must be powerful and able to handle a flood of activity by the
- worker nodes and the developer. As documented in the manual Using
- the Parallel Lan System, the master should consist of
-
- * At least a 25-MHZ 386 IBM PC or Compatible.
-
- * 4 MB of RAM
-
- * 16-bit 32k buffer Ethernet card
-
- Anything above these specifications will allow for better
- efficiency in the system. Other than running the master system
- software, nothing else needs to be done on the master.
-
- Duties
-
- The duties of the master are as follows
-
- * Administer the MAIN tuple space of the PLS.
-
- * Receive and interpret Linda instructions from multiple
- processors.
-
- * Keep track of unanswered Linda instructions (IN,RD).
-
-
- Those are the only responsibilities of the master. However it
- is a larger responsibility than it may appear.
-
-
- Multiple Processor Communication
-
- When considering the concept of multiple processor
- communication, imagine this situation. You are sitting in the
- middle of a ring of sixteen people. All of these people are trying
- to hold a conversation with you AT THE SAME TIME. The human mind
- simply cannot handle that much information at the same time. Most
- people can't even keep track of a single conversation. That is
- one of the jobs of the master.
-
- Things begin with the ethernet card receiving a packet. This
- packet is put into some memory set aside for incoming packets.
- The master system software periodically checks to see if there are
- any packets in the first incoming buffer. If there are packets, it
- will take as many as there are, partially process them, and put
- them into a secondary buffer. This step is crucial. The first
- buffer is static; in other words, it is of a constant size. As
- much information about this buffer as possible was defined when
- the master program began executing. If it were to overflow, some
- packets would be lost ( the sender would send duplicates, however,
- at a loss of execution time ). After the packets are in the
- secondary buffer, the system software will process them fully,
- one at a time, as it has time. Processed packets will end up in one
- of two places. Packets which represent the Linda instructions OUT
- and EVAL wind up in the tuple space. Packets representing IN, INP,
- RD, and RDP will end up in a request queue. The master system
- software picks from the request queue when it tries to match
- tuples.
-
- In addition to the above queues, the master also has internal
- data structures that it uses to guarantee that no duplicate
- packets are processed. We all know what it's like to be told the
- same thing over and over. The master doesn't like it either,
- therefore it eliminates duplicates. Also, the master must keep
- track of the order of packets from different processors. Just like
- we try to say a person's first name before their last name, the
- master must guarantee that a packet sent before another packet
- arrives first.
-
- For more technical information about the system software,
- refer to the document The Parallel Lan System - A Look Inside.
-
-
- Different Masters
-
- There is a difference in performance between different
- masters. We are going to take a look at how masters can influence
- the overall speed of a parallel program. The masters used are
-
- * a 4.77-MHZ 8086 IBM PC Compatible, and
-
- * a 25-MHZ 80386 IBM PC Compatible.
-
- Each of the masters was used in a comprehensive set of 40 tests
- using the Linda Mandelbrot program explained in a later chapter.
- The test documented here was performed using from 1 to 16 workers
- ( 4.77 MHZ 8086 PC's using 8087 math coprocessors ) each doing 4
- columns per computation. The following table shows the difference
- between the masters.
-
- Cpus 8086 Master 80386 Master Speedup
- 1 642.52 577.91 11%
- 2 358.89 274.89 24%
- 4 210.80 141.65 33%
- 8 188.72 80.36 58%
- 16 ----.-- 88.87 -----
-
- The chart shows the total seconds for each test using the 8086
- master and the 80386 master. The far right column shows the
- speedup between the two systems. There is a considerable speedup
- as we approach 8 workers. There was no test performed for 16
- workers using the 8086 master.
-
- Notice the increase in time between 8 and 16 workers. This
- example also illustrates the idea that adding more workers does
- not necessarily decrease execution time of a program. A faster
- master does indeed make a considerable impact on the speed of the
- execution of the program.
-
-
- Machine
-
- The faster the machine, the faster the tuple space can be
- searched for needed tuples. The most work the master will perform
- is interpreting tuples sent to it. The faster it can do this the
- better.
-
-
- DOS
-
- Microsoft recently released DOS 5.0. This DOS has many memory
- saving features that the system can benefit from. If the master
- is equipped with 1 MB of memory or more, DOS can be loaded into
- high memory as well as device drivers and TSR's. All of this can
- save precious lower memory.
-
- The master software uses lower memory ( as well as extended )
- to hold the tuple space. A system using DOS 3.3 or 4.01 will have
- approximately 300k of heap space available for the tuple space.
- DOS 5.0 increases this amount to approximately 360k.
-
-
- Extended Memory
-
- If we have 1 MB available we can take advantage of the
- features of DOS 5.0. Any memory over 1 MB can be configured as
- extended memory by using an extended memory manager. The master
- software will take advantage of any extended memory available.
-
- Using extended memory for some of the tuple space is expensive
- as far as processing speed is concerned. The software uses a
- system put into public domain to manage extended memory. Unlike
- conventional memory below 640k, extended memory cannot be accessed
- directly. A system of segments is created in extended memory.
- These segments are copied into conventional memory to be
- manipulated and then put back. The segments are 32k in size. This
- means there is a 64k memory copy performed for each and every
- packet that must be put into extended memory. Now simple memory
- management has been put into place which will keep the most
- recently used extended memory extent in conventional memory.
- However, to execute programs which have a large amount of data,
- extended memory is essential. At Parallel Computer Solutions, a
- 4 MB system has been used in all cases with no problem as far as
- space requirements.
-
- Network Cards
-
- Not all cards are created equal. We have 8-bit, 16-bit, and
- 32-bit cards available for PCs. To the best of my knowledge, 32-
- bit cards are available for EISA bus systems only. Therefore, most
- PC's will use either the 8 or 16 bit cards. If we have an 80286 or
- better machine, we will want to use 16-bit cards. 16-bit
- ethernet cards have several advantages.
-
- The first is the addition of 8 data lines. More information
- can be transferred on 16 lines than 8 lines. A second reason is
- buffer size. Most 8-bit ethernet cards have an 8k buffer for
- incoming packets. For most applications this is fine but the
- master will be receiving packets from many different PCs at the
- same time. We want to have a large buffer available if the master
- is busy searching tuple space or some other system function. 16-
- bit ethernet cards have either 32k or 64k buffers for incoming
- packets.
-
- You usually get what you pay for when buying a computer
- system. The master is the most important component in the
- Parallel Lan System and therefore it should be the best system
- available.
-
-
-
- Master Installation
-
-
-
- Obtain a blank diskette to copy the master software from the
- system diskette. To make the system easy to install, create a
- bootable diskette with the appropriate memory managers, etc. for
- your system. Copy the packet driver for the ethernet card onto the
- new diskette. Copy the file MASTER.EXE to the new diskette.
- Label this disk as bootable and containing the master system
- software.
-
- Execute the master software on the appropriate machine by
- typing
-
- A:MASTER
-
- A message will appear telling how much extended memory is
- available and used by the master system. Press return to get to
- the next screen. The following should appear
- ( master screen - initial picture )
-
- This is the initial screen for the master. The master sits in
- a loop waiting for a tuple to be sent to it. Under normal
- circumstances, the master does not need any attention. The
- operation of the master can be disabled by pressing any key. As
- activity starts on the system, the screen will change. This screen
-
- ( master screen - 2 cpu picture )
-
- shows that 2 cpus are active on the system and the packets sent
- and received from those cpus. In addition, the sizes of the heap
- and extended memory are given.
-
- Worker
-
-
- The workers are an important part of the Parallel Lan System.
- The workers do one thing: perform calculations. The minimum
- configuration for a worker machine is:
-
- * 4.77 Mhz 8088 IBM PC or Compatible
-
- * DOS 3.3
-
- * 640K RAM
-
- * 8-bit ethernet card
-
- Just as in the case of the master, the performance of the
- Parallel Lan System can be determined by the power of the worker
- machines. Testing of the Parallel Lan System was performed on
- both 4.77 Mhz 8088 workers and 20 Mhz 80386 workers. Both systems
- provided speedup simply because of the parallel execution of the
- test program. But the 80386 workers gave additional speed
- advantages by as much as 60 to 80%. The following four charts show
- the times required to perform mandelbrot using 8088 workers and
- 80386 workers. The master in the first two charts was an 8088
- machine. In the second two charts, the master was an 80386
- machine.
-
- ( charts )
-
- In an established LAN we do not have much choice in what type
- of machine is used as a worker. The charts should show you that
- processor power certainly makes a difference in both the master
- and workers.
-
-
-
- Worker Installation
-
- We must now install software on each of the worker machines.
- Follow the previous directions for setting up a master diskette
- but instead of copying master.exe we need to copy a worker
- program.
-
- Turbo Pascal is available in two different versions: 5.5 and
- 6.0. They are different in code generation. Therefore, we have
- two different worker pieces of software called worker5.exe and
- worker6.exe. If you are using Turbo Pascal 5.5 then copy
- worker5.exe to a:worker.exe and if you are using Turbo Pascal 6.0
- then copy worker6.exe to a:worker.exe.
-
- Insert the diskette into drive A: and type
-
- A:worker
-
- The screen will blank and you will be asked the address of the
- master. If you did not write the address of the master down, you
- can look on the screen of the machine it is executing on. In the
- upper left hand corner of the screen is the ethernet address of
- this machine. On each of the worker systems, enter the six bytes
- separated by spaces. Once the system has digested the master
- address, it will attempt to establish communications with the
- master.
-
- Recall that the purpose of the worker is to execute code given
- to it by the developer system. The worker will get work from the
- master by sending a tuple of the form
-
- in ( 'work' );
-
- to the master. All packets sent over the ethernet are given a
- specific number in order to keep a sequence. The system software
- gives the number 2 to the first packet sent out by any system. We
- can verify that a worker is communicating with the master by the
- messages put on the master and worker screens. A worker has
- communicated successfully if on the screen of the master is a
- message IN - 2 for each of the workers.
-
- Since a packet has been sent to the master, the master has to
- acknowledge the packet, therefore each of the workers should have
- a message in the right most box with the number 2 next to it. If
- this is the case, then this worker has been successfully added to the
- system. If this is not the case, then either the master's address
- was not correctly entered or the address is not that of the
- master. The worker software halts when a key is pressed. So if no
- message appears on the screen, press a key on the worker machine
- and try again by executing the worker software again.
-
-
- Developer
-
-
-
- The developer is a machine on the system which runs and
- coordinates the executing parallel program. It will begin a
- program by submitting any number of worker processes. The
- responsibilities of the developer are
-
- * Submit jobs for worker processors
-
- * Coordinate the submitted jobs
-
- * Produce results
-
- The developer is the foreman for the Parallel Lan System. A
- programmer will code a parallel program on the developer, compile
- it, and run it without moving to different machines. One way to
- look at it is you are programming a single PC which just happens to
- have any number of subprocessors available to help speed up a
- particular program.
-
- Again it needs to be pointed out that if you are using Turbo
- Pascal 5.5, the workers should be executing worker5.exe and if you
- are using Turbo Pascal 6.0, the workers should be executing
- worker6.exe.
-
- The Parallel Lan System developer software is set up to
- initially recognize files with an extension of .PLS. Turbo Pascal
- and our system files can be set up in such a way as to simplify
- use of the system. In the root directory of your hard drive create
- a directory called PLS using the command
-
- mkdir pls
-
- Move into that directory with
-
- cd pls
-
- Insert the system software diskette into drive A: and copy the
- following files into our new directory
-
- work*.tpu
-
- both*.tpu
-
- *.pls
-
- linda.exe
-
- *.doc
-
- Now back out of this directory with
-
- cd ..
-
- Using an editor change your autoexec.bat file to include the
- turbo directory in the system path. If no PATH directive appears
- in your autoexec.bat file, enter the following line
-
- path c:\tp - or whatever the turbo pascal directory name is
-
- After you have changed the file, type
-
- autoexec
-
- This will make the change effective. Now change into your PLS
- directory and type
-
- turbo linear.pls
-
- Turbo's main screen will appear. We need to make sure the compiler
- settings are set correctly before we do anything else. Press ALT
- and O, move to the COMPILER and press return. Move the bar to
- FORCE FAR CALLS and change the entry to YES. Check that the entry
- BOOLEAN EXPRESSIONS is set to SHORT-CIRCUIT. Move to the first
- menu and select the SAVE OPTIONS entry and press return. Save the
- new options.
-
- Now on the screen will be the parallel program for solving
- linear equations. Try to compile this program by pressing F9.
- You should get an error on the first Linda command IN. What we
- need to do is execute the conversion program LINDA.EXE. Linda.exe
- is explained in the document, Linda Conversion Program User Guide.
- To do this in a convenient manner, press ALT and F. Move to the
- entry EXIT TO DOS and press return. This will exit us to DOS but
- keep Turbo Pascal in memory. Run the conversion program by typing
-
- linda linear
-
- The linda program will create a file called Linear.pas that is
- made up of Turbo Pascal statements only. Type exit and press
- return. This will return us back to Turbo Pascal. We are still
- working on the file Linear.PLS. Press ALT and F and move the bar
- to PICK and press return. Pick a new file and enter linear.pas.
- This will bring up the file linear.pas. Press F9 to compile the
- program; it will compile successfully. If there had been an error,
- we would have wanted to press ALT and F, move the bar to PICK and
- select linear.pls to make any changes, run linda again, and select
- linear.pas. Although this may seem a hassle, it is the only way to
- incorporate a preprocessor in the Turbo environment. Future
- releases will not require this.
-
-
- Compiler Directives
-
- During the development of the system, several things were
- learned about how Turbo Pascal ( TP ) generates code. TP allows
- for several different types of floating point variables. Real,
- double, and extended are among the choices that we have.
-
- In most applications, real variables are good enough for our
- calculations. TP will generate code for reals automatically. In
- order to generate code to handle double or extended variables, the
- compiler directives {$N+, E+} are necessary.
-
- So what is the problem? The problem comes about when there are
- constants in your code, such as
-
- a := 1.0;
-
- The 1.0 is a constant as far as TP is concerned. When TP generates
- code for normal reals, it appears that the constant value is
- stored in the code itself. When double or extended reals are used,
- TP does not put the constant into the code itself. The constant
- value is stored at the top of the procedure that the constant
- appears in. For the code
-
- procedure test;
- var a : extended;
- begin
- a := 1.0;
- end;
-
- TP generates:
-
- 0000803F - 1.0
- 55 - push bp
- 89E5 - mov bp,sp
- B80A00 - mov ax, 000A
- 9A7C02AF5C - call 5CAF:027C
- 83EC0A - sub sp,000A
- CD3C99060000 - fld cs:dword ptr[0000]
-
-
- The mnemonic FLD loads the real constant into the 80x87 or
- emulator. The constant is located at cs:dword ptr[0000] which
- when disassembled, is the first line of this code segment. The
- main code has a call statement CALL 0004 which calls this
- procedure code thus bypassing the real value.
-
- Now what this means to us is that we should always include the
- compiler directive {$N+,E+} in our program. It is not always
- needed, obviously, but it is safe to include it. If you write an
- application that uses reals, there may be an error.
-
- The system code has been tested on machines with and without
- numerical coprocessors. One test included some workers with
- coprocessors and some without. The above compiler directive worked
- for both setups.
-
-
- Units and System Code
-
- There are a considerable number of routines that are needed to
- perform the operations of the Parallel Lan System. By
- incorporating these routines into a single Turbo Pascal Unit, we
- are able to successfully hide them from the developer.
-
- Turbo Pascal requires units to be included in a program by
- using the command USES. Programs intended to execute on the PLS
- should start with the code:
-
- Program ..........
-
- {$N+,E+}
-
- Uses work, both;
-
- There are two units for the PLS, work and both, and they should
- be in the directory where the program being written is. By USEing
- these units, we have access to procedures and variables that we will
- use directly in the writing of our parallel programs. Now in the
- working directory, there are two different versions of the work and
- both units. Work5 and both5 should be used when using Turbo Pascal
- version 5.x and work6 and both6 should be used when using Turbo
- Pascal 6.0.
-
-
- Startup Procedure
-
- We have developed a parallel environment which operates in
- coordination with Turbo Pascal. Because of this, there are
- several problems that must be dealt with directly. To solve
- several problems, a parallel program must include, what we call,
- a startup procedure. The purpose of this procedure is two-fold.
- The first is to initialize the system. The second purpose of the
- startup procedure is to border the procedures which are destined
- to be sent to worker processes.
-
- Turbo Pascal creates, as far as I can tell, procedures by
- first coding all real constants and placing them into the code.
- So, given a statement in the language like
-
- a := 1.0;
-
- Turbo Pascal will encode the 1.0 into the code such as
-
- 0000 803F ( or something like this )
-
- After the constants, code is generated to set up the stack for
- any formal or local variables.
-
- push bp
- mov bp,sp
- mov ax, ----   ( the number of bytes required for variables )
- call ----
-
- There are one or two calls to Turbo Pascal procedures. It is my
- guess that the stack is checked for overflow as well as other
- housekeeping activities associated with the stack. The code to
- perform the function of the procedure is generated next. After
- this code, the stack is returned to its original state. The last
- code generated is a return instruction. The 80x86 family of
- microprocessors has several different return opcodes. These
- return opcodes are for near, far, and interrupt routines. Our
- system requires all procedure and function calls to be generated
- as far. The reason for this is so Turbo Pascal will always
- generate a RETF instruction, which is $CB hex. By always generating
- this particular instruction, the system software can make a fairly
- good guess at where a procedure ends and another begins. For
- example, if we wanted to generate code for the procedures
-
- procedure startup;
-
- begin
- end;
-
- procedure tosend;
-
- begin
- end;
-
- The code generated would look something like
-
-
- ( procedure startup )
-
- $55 push bp
- mov bp,sp
- mov ax,----
- call
-
- .
- .
- .
-
- $CB retf
-
- ( procedure tosend )
- $55 push bp
- mov bp,sp
- mov ax,----
- call
-
- .
- .
- .
-
- $CB retf
-
- If our code was going to send the procedure tosend to a worker
- node, we would grab the starting location of the procedure, say
- 0030 hex, by using the @ operator of Turbo Pascal. The system code
- backtracks until it finds a $CB value. What we are doing is
- looking for any real constants. The memory from the start of a
- procedure back to the next procedure's end $CB is copied to the end
- of the code we are sending to the workers.
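-
- A rough Pascal sketch of this backtracking idea ( not the
- actual system code ) is shown below. It uses Seg and Ofs on the
- procedure identifier, which is equivalent to taking its address
- with the @ operator, and steps backwards one byte at a time until
- the previous procedure's RETF byte is found.
-
- type
-    byte_ptr = ^byte;
- var
-    s, o : word;
- begin
-    s := seg ( tosend );              { code segment of the procedure }
-    o := ofs ( tosend );              { offset of its entry point }
-    repeat
-       dec ( o )
-    until byte_ptr ( ptr ( s, o ) )^ = $CB;
-    { the bytes between offset o and the entry point are the real }
-    { constants that belong to tosend and must travel with it }
- end;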
-
- Now why the startup procedure. If we did not have a procedure
- coded before the procedure we are sending to the nodes, the system
- could search for a long time before another $CB is encountered. By
- incorporating the startup procedure, we are guaranteed to find a
- $CB. The Turbo Debugger was invaluable when trying to determine
- exactly what Turbo Pascal does during code generation.
-
- The startup procedure should be included in the developer code
- right after the compiler directives.
-
- Program ...............
-
- {$N+,E+}
-
- Procedure startup;
-
- begin
-
- end;
-
- Now that we have the basics of the startup procedure, we need
- to put some code into it. The first two lines of the procedure
- are
-
-
- exitsave := exitproc;
- exitproc := @myexit;
-
- These lines of code link our exit procedure into the system exit
- procedure which is part of any Turbo Pascal program. In the event
- of a run-time error, our exit routine will do some internal memory
- deallocation and other system functions that allow the system to
- fail gracefully.
- The next six lines of code identify which machine is the
- master. Ethernet addresses are coded as six bytes, such as 00 00
- C0 05 86 24 hex. Find out the address of the ethernet card the
- master system software will execute on using the pktinfo program
- included with the system. The lines of code necessary to identify
- the master to the developer program look like
-
- master[x] := $yy;
-
- There should be six lines identical to the above with x ranging
- from 1 to 6. The $ operator identifies the following two
- characters as hexadecimal.
-
- The last line of code in the startup procedure is
-
- init_system;
-
- The system initialization code has been put into this single call
- in order to keep the startup procedure as simple as possible.
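-
- Putting the pieces together, a complete startup procedure looks
- something like the following. The address bytes shown are just the
- example address from above; substitute the address reported by
- PKTINFO for your own master.
-
- procedure startup;
- begin
-    exitsave := exitproc;      { chain in our exit handler }
-    exitproc := @myexit;
-
-    master[1] := $00;          { ethernet address of the master, }
-    master[2] := $00;          { as reported by PKTINFO }
-    master[3] := $C0;
-    master[4] := $05;
-    master[5] := $86;
-    master[6] := $24;
-
-    init_system;               { initialize the Parallel Lan System }
- end;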
-
-
- Procedure Declarations
-
- The power of the Parallel Lan System lies in the ability to
- send procedures to worker nodes. The system is not limited to
- just sending of a single procedure. Any number of procedures can
- be sent to workers. There are several rules that have to be
- followed when designing procedure which will be sent to workers.
-
-
- Global Variables
-
- When Turbo Pascal compiles a program, space is set aside in a
- separate data segment in the PC's memory for variables. The
- memory location of a specific variable is recorded in the compiled
- code. If a global variable is referenced in a procedure sent to a
- worker, the value obtained when this variable is used will not be
- the value that we want. The contents of the memory location
- referenced will be undetermined because the worker was not aware
- of that particular global variable.
-
- Worse yet is the assignment of a global variable. The value
- assigned to the variable will be placed in the worker's memory
- somewhere. It is possible that the assignment will cause the
- worker to fail.
-
- Global variables should be passed by the developer to worker
- machines through the Linda instructions and the tuple space.
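-
- As an illustration ( the tuple and variable names are our own ),
- instead of letting a worker procedure read a global such as
- max_iter directly, the developer OUTs its value and the worker RDs
- a local copy:
-
- { developer code, where max_iter is a global }
- out ( 'max_iter', &max_iter );
-
- { worker procedure }
- procedure compute;
- var
-    max_iter : integer;        { local copy, filled from tuple space }
- begin
-    rd ( 'max_iter', max_iter );
-    { ... use max_iter as usual ... }
- end;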
-
-
- Readln/Writeln
-
- Procedures sent to workers cannot have readln or writeln
- statements in them. The reason for this is that Turbo Pascal treats
- the screen and keyboard as files. If used, a FILE NOT OPENED error
- will appear and the worker will fail.
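-
- If a worker needs to report something, it should OUT a tuple
- and let the developer do the writing. For example ( the names are
- hypothetical ):
-
- { in the worker procedure - no writeln allowed here }
- out ( 'status', &proc_id, &value );
-
- { in the developer's main code - writing is safe here }
- in ( 'status', id, value );
- writeln ( 'worker ', id, ' reports ', value );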
-
-
- Formal Parameters
-
- There cannot be formal parameters in the procedures sent to
- workers. Because we are using a version of the Linda coordination
- language, there is no need for formal parameters. The function of
- formal parameters is performed by Linda instructions.
-
-
-
- Nesting Procedures
-
- Turbo Pascal does not follow the source code exactly when
- generating code. One such case is nesting procedures. Logic would
- dictate that the code generated would model that of the source,
- whereby the procedure code generated by the compiler would
- include procedure nesting. Turbo Pascal does not do this, nor
- should it. Nested procedures are coded as single procedures and a
- call is made to them as needed. This, however, causes us some
- problems. When our system generates tuples to send to workers, it
- is performed at runtime, therefore we do not have access to the
- symbol table. We cannot determine which procedure to send and
- which not to. The end result is we are unable to handle nested
- procedures with this release of the system.
-
-
- Multiple Procedures
-
- There are programs whose decomposition requires multiple
- procedures. The system is set up such that procedures sent
- to workers can coexist with normal procedures. The developer can
- use normal procedures during the execution of a program. When a
- program has more than just a single procedure to be sent to the
- worker nodes, order is not important. The system will locate the
- appropriate procedure when asked for it.
-
- Say we have a program which requires two different procedures
- to be sent to workers. Can the developer program or programmer
- determine which worker will get which procedure? No, we cannot
- determine in what order the workers will get the procedures. If
- the order is important, a system of semaphore tuples could be
- used, as sketched below.
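-
- One way to impose an order ( again just a sketch with made-up
- tuple and procedure names ) is to gate the second procedure on a
- tuple that the first procedure OUTs when it has finished its stage:
-
- procedure phase1;
- begin
-    { ... first stage of the work ... }
-    out ( 'phase1_done' );     { signal that phase 2 may start }
- end;
-
- procedure phase2;
- begin
-    rd ( 'phase1_done' );      { block until phase 1 has signalled }
-    { ... second stage of the work ... }
- end;
-
- { developer }
- eval ( 'work', &phase1 );
- eval ( 'work', &phase2 );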
-
-
- CAUTION!
-
- There is one important note that must be made about the code
- that can appear in a worker procedure. Turbo Pascal performs smart
- linking when ever a program is compiled. In other words, if you
- use the sqrt library function, it is extracted from the system unit
- and placed in your code. The entire system unit is not compiled
- with your code. Therefore, STATEMENTS FROM THE SYSTEM UNIT CANNOT
- BE USED IN YOUR WORKER PROCEDURES! The reason for this is the
- worker software cannot effectively be compiled with all of the
- functions included in the system unit. Once again, this is a
- limitation brought about from building this system on top of the
- conventions of a pre-existing compiler.
-
-
- Developer Program Design
-
- Parallel programming is not easy, nor is it straightforward.
- In this section, I would like to give a few points on
- understanding the task at hand. This is by no means a tutorial on
- parallel programming.
-
- All parallel systems are different. Then again, so are all
- programming languages. But as computer scientists, we are
- required to be able to learn new programming languages easily.
- Learning to program the Parallel Lan System can be an
- uncomplicated task if the problem we want to solve is correctly
- decomposed.
-
- The Parallel Lan System is designed as a distributed structure
- machine. The information or data structures we develop for our
- problem are kept on a single machine, the master. In order to
- access this information, a request is sent to the master by way of
- a tuple. There can potentially be a large amount of information
- passed among the different systems. The designer of a parallel
- program on the PLS must keep this in mind. Take for instance one
- of the example programs described in the main documentation.
- Mandelbrot is typically designed as a sequential program which
- executes on a single pixel at a time. When writing Mandelbrot for
- a parallel machine, several options are available. Do we want each
- processor to execute the algorithm on a single pixel before
- returning a result, a column, a row, or a multiple of each? The
- main documentation describes the result. But suffice it to say,
- executing on a single pixel at a time will achieve parallelism,
- because some number of processors will be calculating a pixel value
- at any given moment. But it is not optimal. There is too much
- communication between processors. We can achieve better results
- with a particular number of processors by increasing the work to a
- column or more.
-
- This brings up another point: there are times when adding
- processors to the system will result in a degradation of the
- performance of the system as a whole.
-
-
- Tuning
- There will almost always be some parameters in a parallel
- program that determine the ultimate speed of the program on the
- system. It must be kept in mind that the key to parallel
- programming is to get an initial program up and running. After
- the program is running, test it and compare it to other platforms:
-
- A sequential program executing on a single processor
-
- A one-worker Parallel Lan System
-
- A two-worker Parallel Lan System
-
- A four-worker Parallel Lan System
-
- An eight-worker Parallel Lan System
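- The execution time of each configuration can be collected with
- a minimal sketch like the one below, which uses the GetTime
- procedure from the Dos unit. The wrapper is an assumption, not
- part of the system, and runs that cross midnight are ignored.
-
- uses dos;
-
- var
- h1, m1, s1, c1,
- h2, m2, s2, c2 : word;
- elapsed : real;
-
- begin
- gettime ( h1, m1, s1, c1 ); { start of the run }
- { ... the program being measured ... }
- gettime ( h2, m2, s2, c2 ); { end of the run }
- elapsed := ( h2 - h1 ) * 3600.0 + ( m2 - m1 ) * 60.0 +
- ( s2 - s1 ) + ( c2 - c1 ) / 100.0;
- writeln ( 'execution time: ', elapsed:0:2, ' seconds' );
- end.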
-
- Graph the results ( using execution time ) on a chart. This
- will help illustrate the speedup achieved by going to parallel
- execution. Do not be surprised if several of the tests are slower
- than the sequential program. At some point, adding more processors
- should achieve results faster than the sequential program. A rule
- of thumb can be used in general: if, after executing the algorithm
- on a four-processor system, the results are not achieved faster
- than with the sequential program, something is wrong with the
- design of the program. But before submitting to a redesign, check
- the tunable parameters, or try to establish some. There are
- always some variables or aspects of the algorithm used in the
- program which can be tuned to achieve faster results.
-
- The number of pixels computed per exchange was a tunable
- parameter in the Mandelbrot program. In a finance program it could
- be the number of variables used in a regression or the number of
- data points examined. In the linear algebra program shown in the
- main documentation, it is the number of columns calculated by each
- processor.
-
-
- Debugging
-
- Debugging a parallel program can sometimes be more of a
- challenge than writing the parallel program itself. As of this
- release there are no debugging facilities, but the source code is
- provided so you can add any support you feel necessary. My only
- remark on this subject is to run your program on a one-worker
- system and comment out pieces of the code until you have
- identified where the problem occurs. Then run on a two-worker
- system and do the same thing. Just remember that you have many
- different tasks executing at the same time. A key to debugging a
- parallel program is to understand the relationship between these
- different tasks.
-
-
- Main
-
- In the main section of the developer program, the first
- statement should be a call to start_up. Your code for the actual
- program goes next. After the code, the statement close_system ends
- the main section.
-
- begin
-
- start_up;
-
- { user code }
-
- close_system;
-
- end.
- Skeleton Code
-
- The following is a skeleton of the necessary code for a
- developer program. It should be reproduced and placed in a .PLS
- file for easy access. The easiest way to keep the code separate
- is to name this code skel.pls and copy it when developing a
- new application.
-
- program ------;
-
- {$N+,E+}
-
- uses both, work; { remember: append 5 or 6 to the end of work and
- both depending on the version of tp }
-
- procedure start_up;
-
- begin
-
- exitsave := exitproc;
- exitproc := @myexit;
-
- master[1] := $--;
- master[2] := $--;
- master[3] := $--;
- master[4] := $--;
- master[5] := $--;
- master[6] := $--;
-
- init_system;
-
- end;
-
- { user procedures }
-
- begin
-
- start_up;
- { user code }
-
- close_system;
-
- end.
-
-
- What the Developer Doesn't Show
-
- The developer is an independent program. You will not see the
- same displays as the master and worker show. Therefore, any
- activity on the developer's screen has to be provided by the
- programmer. The example programs are all graphical in nature, so
- it is quite easy to see that everything is going normally. The
- programmer will want some kind of indication from the developer as
- to the state of the system as far as the developer is concerned.
-
-
- RELAX!
-
- The most important thing to remember when developing parallel
- programs is to relax. Parallel programming takes time and
- patience. Parallel programs have to be planned, in my opinion,
- much more than sequential programs. The reason for this is that
- in a sequential program, one instruction is going to follow
- another, even in the case of decisions and loops. There is no
- question about it. It can be traced and reproduced many times
- ( unless there is a subtle bug in the program that occurs only
- every three thousand iterations ). In a parallel program, there is
- absolutely no guarantee as to the order in which instructions will
- be executed by the different machines, even if the workers are
- executing the same piece of code.
-
- I strongly recommend coding the programs presented later and
- running them. Get comfortable with the system and then
- experiment. You can always reset the machines and start the
- system again if something really messes up.
-
- PCS-Linda
-
-
-
- In the above section on Linda, we touched on the different
- instructions and their use in coordinating parallel programs.
- Now we want to look at PCS-Linda. PCS-Linda is the dialect of
- Linda that the Parallel Lan System is programmed with, along with
- Turbo Pascal. In order to get Linda to execute using a pre-
- existing compiler, several concessions had to be made as far as
- using Linda was concerned. This section will discuss those
- requirements.
-
-
- Linda Conversion Program
-
- Turbo Pascal will not recognize PCS-Linda. Try it once. You
- will get an error message on any of the six instructions. The
- Linda Conversion Program ( LCP ) is a preprocessor for the PLS and
- PCS-Linda. After you have coded a parallel program using Turbo
- Pascal and PCSL, you must give the program a name ending with the
- extension '.PLS'. Once this file is created, you can run the LCP
- to convert your program. LCP will perform some magic and produce
- a file with the same name ending with '.PAS'. This file can now
- be successfully compiled with Turbo Pascal.
-
- The LCP will produce code that Turbo Pascal can recognize for
- each of the Linda commands. If there is an error in the Turbo
- Pascal code, verify that all variables have been declared, and so
- on. Always remember to make changes to the '.PLS' file and not
- the '.PAS' file. For more information about LCP, refer to the
- document Linda Conversion Program.
-
-
- Tuple Elements
-
- Tuples in PCS-Linda can consist of zero to six elements. The
- elements can be one of four different types
-
- * Actuals
-
- * Formals
-
- * Data
-
- * Null
-
- Actual and formal elements will be discussed next. The last
- two types the programmer and user have no control over. When an
- EVAL instruction is executed, a procedure is sent to the master or
- worker. This procedure is considered to be a data element. In
- reality it is an actual, but there are special needs that must be
- met when searching for and transferring procedures to and from the
- different machines. The data type allows for cleaner code.
- The null type is used in every tuple with fewer than six
- elements. Elements not used in a tuple are given the type null.
- The null type uses no memory; therefore it is a convenient form of
- termination for these empty elements.
-
-
- Actuals - Integer/Real
-
- Using a pre-existing compiler caused several different things
- to happen in the development of the Parallel Lan System and its
- associated Linda language. This can be seen when using actuals
- and formals in the tuples.
-
-
- Integers
-
- Turbo Pascal has several different types of integers: integer,
- longint, word, and byte. Each of these can be used in a tuple,
- but there is a rule to their use. If an integer is to be sent in
- a tuple, it can be represented in two ways. The first is to put
- the integer directly into the tuple
-
- ('info', 1, 5, 8)
-
- The 1, 5 and 8 will be interpreted as integers and stored
- accordingly. The integer used is the two's-complement two-byte
- integer. Therefore the tuple
-
- ('info', 1234567)
-
- would cause a compile error when the final code was compiled. This
- number is too large for an integer stored in two bytes. The same
- is true if a single byte were to be sent: it would be stored in
- two bytes instead of one. In order to send the above number, we
- could use the tuple
-
-
- bigint : longint;
-
- bigint := 1234567;
-
- ( 'info', &bigint );
-
- This tuple would effectively send the large number to the
- master. We would do the same for a byte
-
- smallint : byte;
-
- smallint := $32;
-
- ( 'info', &smallint );
-
- Remember that the tuple sent to match either of the above
- tuples would also have to have the receiving variable declared as
- either a longint or a byte.
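- For example, a minimal sketch of the receiving side for the
- longint case ( the variable name received is only an
- illustration ):
-
- var
- received : longint;
-
- in ( 'info', received );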
-
-
- REALS
-
- The same situation as above occurs with reals as well. When
- a real is put into a tuple
-
- ( 'info', 1.234, 1.4E+3 )
-
- the real is evaluated as a 6-byte real number. This is an
- acceptable real number for most applications, but in the case of
- more complex mathematics, a larger real may be necessary. Turbo
- Pascal supports reals as real, single, double, extended, and
- comp. Each of these types can be used just as we did for
- integers. To send the real number 1.234e+1023 we would use the
- tuple
-
- bigreal : extended;
-
- bigreal := 1.234e+1023;
-
- ( 'info', &bigreal );
-
- Again, the variable to receive this real number would have to
- be declared as an extended as well.
-
-
- Formals
-
- Formals must match the types they are to receive. Because we
- are programming both the code the worker will receive and the code
- the developer will execute, we can easily match the types.
- However, it may happen that we change from reals to extended reals
- because the accuracy was not enough, and we miss a declaration.
- The system will then not match the tuple, because a standard real
- is 6 bytes and an extended is 10 bytes in length.
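- A hedged illustration of such a mismatch ( the names are only
- illustrations ): the developer now sends an extended, but the
- worker still declares the old type, so the IN below never finds a
- match and the worker blocks.
-
- { developer }
- var precise : extended;
-
- out ( 'data', &precise );
-
- { worker: declaration was never updated }
- var precise : real;
-
- in ( 'data', precise ); { 10-byte element never matches 6 bytes }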
-
-
-
- INP & RDP
-
- The pre-existing compiler strikes again. The INP and RDP
- instructions are normally treated as functions: they return
- either true or false depending on whether a tuple was matched by
- the master. However, we were unable to reproduce the
- functionality of these two instructions exactly. In PCS-Linda
- the RDP and INP instructions must be assigned to a boolean
- variable, such as
-
- ok := RDP ( 'info', bigreal );
-
- If a tuple was matched, bigreal will contain the real value
- sent with the tuple and ok will be set to true. If a null tuple
- was sent by the master, ok will be set to false and bigreal will
- not be changed. If we were to use RDP ( INP can be
- substituted wherever RDP is used ) to control a loop, we would do
- the following
-
- ok := rdp ( 'info', bigreal );
- counter := 1;
- while not ok do
- begin
- inc ( counter );
- ok := rdp ( 'info', bigreal );
- end;
-
-
- This loop counts the number of times the RDP instruction
- was used before a successful tuple match was found. In original
- Linda from Scientific Computing Associates, Inc. this loop would
- appear as
-
-
- counter := 1;
- while not rdp ( 'info', bigreal ) do
- inc ( counter );
-
-
- This is not a big change but it is one that should be kept in
- mind when using the RDP and INP instructions.
-
-
- EVAL
-
- The EVAL instruction is used to send a procedure to be
- executed by a worker. PCS-Linda only allows one element in the
- tuple used for the EVAL instruction:
-
- * The first element must be &procedure name
-
- The name of the tuple must be 'work'. The worker system
- software has been precompiled to look for a tuple of this name
- with the above element. Because duplicate elements are allowed
- in tuple space, the name 'work' will not cause a problem when
- sending more than a single procedure to be executed by workers.
- Thus the tuples
-
- eval ( 'work', &compute );
-
- eval ( 'work', &result );
-
- will not conflict with each other.
-
-
- Tuple Size
-
- The tuple size of the present Parallel Lan System is 16000
- bytes. This allows a large amount of data to be sent between
- machines per instruction. If this is not adequate for your
- situation ( you receive garbage when trying to execute your
- parallel program, or the workers get confused as your code
- executes ), call us and we can increase your system's tuple size.
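- As a rough worked example ( assuming two-byte integers and
- ignoring the small overhead of the tuple name and the other
- elements ):
-
- array[1..4000] of integer = 8000 bytes fits in one tuple
- array[1..9000] of integer = 18000 bytes exceeds the limit
-
- An array that will not fit must be split across several OUT
- instructions.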
-
-
-
- What Does OUT (); Produce
-
- Lastly, we would like to show what the instruction
-
-
- var
-
- start_col : integer;
- results : array[1..200] of integer;
-
- out ( 'col', &start_col, &results );
-
-
- produces when put through the Linda Conversion Program.
-
-
- begin
- make_tuple ( 3, 'c', 'o', 'l', ' ',' ',' ',' ',' ',' ',' ',' ','
- ',' ',' ',' ',' ', @start_col, sizeof (start_col), yes,
- @results, sizeof (results), yes,
- nil,0,null,
- nil,0,null,
- nil,0,null,
- nil,0,null
- );
- send_tuple ( out );
- end;
-
- All of the above is required for a single PCS-Linda
- instruction. It is beyond the scope of this guide to detail what
- is being done here. For more information about the technical
- nature of the Parallel Lan System consult the paper - Parallel Lan
- System - A Course In Overcoming.
-
-
-
- An Example
-
-
-
- Let's start using the things we have learned by formulating an
- example exercise. We want to have multiple processors find the
- sum of all the integers between 1 and 20. We want to write the
- program in PCS-Linda and Turbo Pascal. Because of the nature of
- the PLS, if your system only has a single worker, the code will
- still execute correctly.
-
- We want our developer to put the numbers 1 through 20 into
- tuple space and wait for a sum to be computed. The first thing we
- should consider is what our workers are going to do.
-
- Our workers are supposed to take a number and add it to a sum.
- The workers are going to get this number and the sum from tuple
- space. So let's get these two things
-
- in ( 'num', a );
- in ( 'sum', sum );
-
- Both a and sum will be declared as integers for this problem.
- Once the worker has these two things, it will add the number in a
- to the running sum.
-
- sum := sum + a;
-
- After which, the worker has to put the new sum back in tuple
- space.
-
- out ( 'sum', &sum );
-
- The worker is finished. Several questions should come to
- mind. First of all, why did we IN both a and sum instead of
- using RD?
-
- The first thing that we want to happen is for a worker to get
- the sum, which should be zero since we are starting the addition
- sequence. The worker should then get one of the numbers to add to
- the current sum. The first worker will request the current 'sum'
- tuple and a 'num' tuple. The worker will add the number to the
- sum and get, say, a value of 1 for sum. This new sum is placed
- back in tuple space for the next worker. If we did not remove the
- 'sum' and 'num' tuples, we would have a tuple space that was
- filled with multiple 'sum' tuples and duplicate 'num' tuples. When
- the next worker requests a 'sum' and a 'num' tuple, it might get
- the 'sum' tuple that has a value of 0 and not the one with the
- value of 1.
-
- The duty of the developer program is to put the work
- procedure and the necessary values in tuple space. The code will
- look like
- out ( 'sum', 0 );
- for i := 1 to 20 do
- begin
- eval ( 'work', &worker );
- out ( 'num', &i );
- end;
-
- The developer will end by INing the final sum and printing
- the result.
-
- in ( 'sum', sum );
-
- The program looks like this:
-
- program add;
- {$N+, E+}
- uses work, both;
-
- var i, sum : integer;
-
- procedure startup;
- begin
- exitsave := exitproc;
- exitproc := @myexit;
-
- master[1] := $--;
- master[2] := $--;
- master[3] := $--;
- master[4] := $--;
- master[5] := $--;
- master[6] := $--;
-
- init_system;
- end;
-
- procedure worker;
- var a, sum : integer;
- begin
- in ( 'num' , a );
- in ( 'sum', sum );
-
- sum := sum + a;
-
- out ( 'sum', &sum );
- end;
-
- begin
- startup;
-
- out ( 'sum', 0 );
-
- for i := 1 to 20 do
- begin
- eval ( 'work', &worker );
- out ( 'num', &i );
- end;
-
- in ( 'sum', sum );
- writeln ( sum );
-
- close_system;
- end.
-
-
- Analysis
-
- Now look at this code carefully. As soon as the workers
- receive their procedures to execute, they try to IN two tuples
- named 'num' and 'sum'. But what does the developer do? It
- immediately tries to IN the 'sum' tuple. Now we cannot predict who
- will get the 'sum' tuple first, but the developer is in line to get
- it. We want the developer to IN the 'sum' tuple after all workers
- have added their values to sum, not while they are still adding
- the values.
-
- We must put something into the code which will delay the
- developer from getting the 'sum' tuple until after all workers
- have added all values to sum. We could put a time delay
- into the developer code. Delays are dangerous though: too many
- parameters go into the circumstances surrounding delays. We need
- a tuple that can be incremented when a value is added to sum, like
- a loop control variable. We can do this fairly easily in PCS-
- Linda. Let's define a tuple called 'count'
-
- ( 'count', count );
-
- Before any values are added to sum, count should be set to
- zero and placed into the tuple space.
-
- out ( 'count', 0 );
-
- The worker code must include instructions to read the 'count'
- tuple and increment it when a value is added to sum.
-
- in ('count', count );
- inc ( count );
- out ( 'count', &count );
-
- The most important part is up to the developer. We do not
- want to read the final 'sum' tuple until after all values have
- been added to sum. We have two instructions that can be used,
- either IN or RD. If we use RD, it will leave the 'count' tuple in
- the tuple space, so let's use IN. But what do we want to IN? We
- want to read the final count of 20.
- in ( 'count', 20 );
-
- This IN instruction will send a tuple to be matched to the
- master. Nothing will happen in the developer until a match is
- made with the tuple 'count' having a value of 20.
-
-
- Final Code
-
- program add;
- {$N+, E+}
- uses work, both;
-
- var i, sum, count : integer;
-
- procedure startup;
- begin
- exitsave := exitproc;
- exitproc := @myexit;
-
- master[1] := $--;
- master[2] := $--;
- master[3] := $--;
- master[4] := $--;
- master[5] := $--;
- master[6] := $--;
-
- init_system;
- end;
-
- procedure worker;
- var a, sum, count : integer;
- begin
- in ( 'num' , a );
- in ( 'sum', sum );
-
- sum := sum + a;
-
- out ( 'sum', &sum );
-
- in ( 'count', count );
- inc ( count );
- out ( 'count' , &count );
-
- end;
-
- begin
- startup;
-
- out ( 'sum', 0 );
- out ( 'count', 0 );
-
- for i := 1 to 20 do
- begin
- eval ( 'work', &worker );
- out ( 'num', &i );
- end;
-
- in ( 'count', 20 );
-
- in ( 'sum', sum );
- writeln ( sum );
-
- close_system;
- end.
-
-
-
-
- Mandelbrot
-
-
-
- Mathematicians are able to generate wonderful things with
- numbers. One particular thing created from numbers is pictures.
- The mandelbrot set is a set of complex numbers that, when
- put through a particular set of equations, generates a picture.
- This manual is not the place for a detailed description of the
- mandelbrot calculations. Books on fractals will typically have a
- detailed look at mandelbrot numbers. We will describe as much as
- is needed for the calculations and the parallel program we will
- write.
-
-
- Graphics Screen
-
- In order to plot a picture on a graphics screen, we have to
- create a relationship between each of the pixels on the screen and
- a particular complex number. For the mandelbrot program we
- present, we are going to concentrate on a 320 x 200 portion of the
- VGA graphics screen available on IBM PCs or compatibles.
-
- Since we are going to be using complex numbers in the
- calculations, we need to define what a complex number is. A
- complex number has both a real and an imaginary part. There will
- always be two numbers for every complex number.
- Turbo Pascal does not have a complex number type. Using the TYPE
- feature of Pascal, we can create a complex number type
-
- type
- complex = record
- realp,
- imag : extended;
- end;
-
- The nature of the calculations suggests that we use extended
- reals in order to achieve the best possible accuracy.
-
- For the mandelbrot calculations, a corner value is selected which
- acts as a reference point for the calculations
-
- bcorner : complex;
-
- This corner represents the upper left hand corner. For the
- standard mandelbrot picture, bcorner is given the value
-
- bcorner.realp := -2.0;
- bcorner.imag := -1.25;
-
- In addition to the corner, we must know the lengths of the sides of
- the picture, both width and height. The value typically used for
- both the width and the height is 2.50. Notice that all of these
- values can be changed to get different pictures of the mandelbrot
- set.
-
- The last values needed for the calculations are called gap values.
- The calculations are such that the precision of the numbers can be
- changed for different screen resolutions. The calculations use
- two different gap values, one for the width of the graphics screen
- and one for the height.
-
- gap1 : extended; {width}
- gap2 : extended; {height}
-
- Because all of the calculations will use the same gap values, they
- can be predefined. The gap values are calculated by taking the
- length of each side ( 2.50 ) and dividing it by the number of
- pixels in each dimension. Therefore we have the values
-
- gap1 := 2.50 / 320;
- gap2 := 2.50 / 200;
-
- What these gaps say is that there is 2.50/320 of space between
- pixels on the graphics screen being used to display the mandelbrot
- number set. When higher resolution screens are used, the gap gets
- smaller and smaller, thus enhancing the resolution of the picture.
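- Worked out for the 320 x 200 screen used here, the gaps are
-
- gap1 = 2.50 / 320 = 0.0078125
- gap2 = 2.50 / 200 = 0.0125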
-
-
- Sequential Program
-
- We have the basic values for the mandelbrot picture we wish to
- produce. The only thing left is to do the actual calculations.
- The calculation that must be done for each pixel is
-
- z := z*z + c
-
- where c is the complex number associated with the pixel
- ( ncomplex, described below ). When the size of z ( the sum of
- the squares of its real and imaginary parts ) grows to be greater
- than 4.0, we can stop the calculations. Mandelbrot numbers are
- characterized by never reaching 4.0, so these numbers could be
- put through the above equation forever. This suggests that we
- need some sort of variable to control the total number of times we
- do the calculation. A good control value would be 50. If a value
- is put through the calculation 50 times and the size is still less
- than 4.0, the number is taken to belong to the mandelbrot set.
- Values in the mandelbrot set are colored black in our picture.
- Values that reach 4.0 before 50 iterations are colored different
- colors.
-
- The actual calculation procedure will appear this way
-
- function calculate ( col, row : integer;
- bcorner, ncomplex : complex ) : integer;
-
- var
- original : complex;
- temp, size : extended;
- result : integer;
-
- begin
- size := 0.0;
- result := 0;
- original.realp := ncomplex.realp;
- original.imag := ncomplex.imag;
-
- while ( result < 50 ) and ( size < 4.0 ) do
- begin
- temp := sqr ( original.realp ) - sqr ( original.imag ) +
- ncomplex.realp;
- original.imag := 2 * original.realp * original.imag +
- ncomplex.imag;
- original.realp := temp;
- size := sqr ( original.realp ) + sqr ( original.imag );
- inc ( result );
- end;
- calculate := result;
-
- end;
-
- This function is performed for each pixel on the screen. In
- a 320 x 200 graphics screen there are 64000 calls to this function.
- In a larger graphics screen such as 1024 x 768, there would be
- 786,432 calls to the function.
-
- One parameter in the equation not yet examined is ncomplex.
- Ncomplex is a complex number which represents the actual pixel on
- the graphics screen. Ncomplex is calculated by adding to the
- corner the distance the pixel is from the corner of the screen
-
- ncomplex.realp := current_column_pixel * gap1 +
- bcorner.realp;
- ncomplex.imag := current_row_pixel * gap2 +
- bcorner.imag;
-
- So each pixel starts at bcorner and adds the number of
- gaps in each direction the pixel is from the upper left corner,
- thus the multiplication by the gaps.
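- For example, with the standard values above, the pixel in
- column 160, row 100 maps to the complex number
-
- ncomplex.realp = 160 * 0.0078125 + ( -2.0 ) = -0.75
- ncomplex.imag = 100 * 0.0125 + ( -1.25 ) = 0.0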
-
- The entire sequential program appears as
-
- program mandel;
-
- {$N+,E+} (* use 8087 if present or emulate if not *)
-
- uses dos, crt , graph;
-
- type
-
- complex = record
- realp : extended;
- imag : extended;
- end;
-
- var
-
- bcorner,
- ncomplex : complex;
- ccol,
- crow,
- row,
- column,
- result : integer;
- gap1,
- gap2,
- side1,
- side2 : extended;
-
- procedure get_coordinates;
-
- begin
-
- clrscr;
- write ( 'Enter the real part of the lower left corner ( -2.0 )
- :'); readln ( bcorner.realp );
- write ( 'Enter the imaginary part of the lower left corner (
- -1.25 ) :'); readln ( bcorner.imag );
- write ( 'Enter the length of real edge ( 2.50 ) :');
- readln ( side1 );
- write ( 'Enter the length of imaginary edge ( 2.50 ) :');
- readln ( side2 );
- write ( 'Enter the pixels length of a row ( 320 ) :');
- readln ( row );
- write ( 'Enter the pixels length of a column ( 200 ) :');
- readln ( column );
-
- end;
-
- procedure compute_stuff;
-
- begin
-
- gap1 := side1 / row;
- gap2 := side2 / column;
-
- end;
-
- procedure prepare_screen;
-
- var
-
- graphdriver,
- graphmode : integer;
-
-
- begin
- graphdriver := vga;
- graphmode := vgahi;
- initgraph ( graphdriver , graphmode , 'c:\tp' );
-
- end;
-
- procedure plot ( crow , ccol , result : integer );
-
- var
- color : word;
-
- begin
-
- color := 1;
-
- if result < 2 then color := 1;
- if result > 2 then color := 9;
- if result > 4 then color := 2;
- if result > 6 then color := 10;
- if result > 8 then color := 4;
- if result > 10 then color := 12;
- if result > 12 then color := 5;
- if result > 14 then color := 13;
- if result > 16 then color := 8;
- if result > 18 then color := 7;
- if result > 20 then color := 0;
- putpixel ( ccol , crow , color );
-
- end;
-
- procedure get_complex;
-
- begin
-
- ncomplex.realp := ccol * gap1 + bcorner.realp;
- ncomplex.imag := crow * gap2 + bcorner.imag;
-
- end;
-
- procedure calculate_mandel;
-
- var
-
- original : complex;
- temp,
- size : extended;
-
- begin
-
- result := 0;
- size := 0.0;
- original.realp := 0.0;
- original.imag := 0.0;
-
- while ( result <= 21 ) and ( size < 4.0 ) do
- begin
- temp := original.realp*original.realp -
- original.imag*original.imag + ncomplex.realp;
- original.imag := 2 * ( original.realp * original.imag ) +
- ncomplex.imag;
- original.realp := temp;
- size := original.realp * original.realp + original.imag *
- original.imag;
- inc ( result );
- end;
-
- end;
-
- begin
-
- get_coordinates;
- prepare_screen;
- compute_stuff;
- for ccol := 1 to row do
- for crow := 1 to column do
- begin
- get_complex;
- calculate_mandel;
- plot ( crow , ccol , result );
- end;
- end.
-
- This program can be compiled using Turbo Pascal 5.5 or 6.0 and
- executed to demonstrate the picture that will be produced. The
- row and column entries can be increased to 640 by 480 if desired.
-
-
- Parallel Programming Methods
-
- The mandelbrot program represents a large number of tasks
- which perform the same operations on a large set of data. Each and
- every pixel in the graphics screen has to be calculated using
- the calculate function presented above. There is no way around it.
- Thus if we have two processors available for work, we can be doing
- two pixels at any given moment instead of just one. Just think: if
- we had 64000 processors, all pixels would be calculated at the
- same time. Think about the speedup.
-
- Our job is to write a parallel program using PCS-Linda and
- Turbo Pascal that will perform the mandelbrot calculations using
- any given number of processors.
-
-
- Developer
-
- The easiest way to approach this problem is to first consider
- the job or duties of the developer. After we have determined the
- job of the developer, we can look at the code the worker will
- execute. Recall from above that the calculations for each pixel
- all rely on a set of simple values which are computed only
- once. These include
-
- number of pixels in each column
- number of pixels in each row
- length of column
- length of row
- gap value for column
- gap value for row
- corner value
- column to compute
- number of columns to compute
- In addition, the graphics screen of the system must also be
- initialized.
-
-
- Columns
-
- We must ask a question about how to distribute the work to the
- workers. The sequential program computed pixels one at a time.
- Each pixel is then plotted on the screen. We could allow each
- worker to compute only one pixel at a time. Because of the nature
- of the Parallel Lan System, there would be some communication
- overhead for each pixel. In reality it might take longer for a
- number of processors to compute the screen than it would for a
- single processor if we only allow each worker to compute one
- pixel.
-
- A solution to this problem is to allow each worker to
- compute an entire column of pixels. This significantly cuts
- down on the communication because there would be only one exchange
- for every 200 pixels in our example problem. This is a much better
- solution. By allowing each worker to compute a column of pixels,
- we can introduce another value that will tell each worker how many
- columns to compute. We might run tests to determine if each
- worker should have one column, two, four, or more between
- communication calls.
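- As a rough worked count for the 320 x 200 example ( counting
- only the 'col' result exchanges, not the 'screen' bookkeeping
- tuples ):
-
- one pixel per exchange : 320 * 200 = 64000 exchanges
- one column per exchange : 320 / 1 = 320 exchanges
- four columns per exchange : 320 / 4 = 80 exchanges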
-
-
- Results
-
- The developer is responsible for setting up the calculations
- as well as receiving the results. As each worker finishes its
- column or columns, it will have to put the results into the tuple
- space for the developer to access. Once the developer has a set
- of results, it can pass them to a procedure to be plotted.
- After result packets covering all 320 columns have been received,
- the developer can perform any housecleaning necessary and quit
- execution.
-
- The code for the developer looks like this
- procedure plot ( col : integer; results : resu );
-
- var i,j : integer;
- color : word;
-
- begin
-
-
- for j := 0 to num_col-1 do
- for i := 1 to 200 do
- begin
- if results[i+j*200] <= 2 then color := 1;
- if results[i+j*200] > 2 then color := 9;
- if results[i+j*200] > 4 then color := 2;
- if results[i+j*200] > 6 then color := 10;
- if results[i+j*200] > 8 then color := 4;
- if results[i+j*200] > 10 then color := 12;
- if results[i+j*200] > 12 then color := 5;
- if results[i+j*200] > 14 then color := 13;
- if results[i+j*200] > 16 then color := 8;
- if results[i+j*200] > 18 then color := 7;
- if results[i+j*200] > 20 then color := 0;
-
- putpixel ( col+j , i, color );
- end;
- end;
-
- begin
-
- start_up;
-
- graphdriver := vga;
- graphmode := vgahi;
- initgraph ( graphdriver, graphmode, 'a:' );
-
- gap1 := 2.50 / 320;
- gap2 := 2.50 / 200;
- bcorner.realp := -2.0;
- bcorner.imag := -1.25;
-
- cur_col := 1;
- tot_col := 320;
- pix_col := 200;
- num_col := 1;
-
- for i := 1 to num_proc do
- begin
- eval ( 'work', &adder );
- out ( 'stuff', &gap1, &gap2, &bcorner );
- end;
-
- out ( 'screen', &cur_col, &tot_col, &pix_col, &num_col );
- for i := 1 to tot_col div num_col do
- begin
- in ( 'col', start_col, results );
- plot ( start_col, results );
- end;
-
- readln;
-
-
- { system_shutdown }
- close_system
-
-
- end.
-
-
-
- Analysis
-
- The developer starts by calling the start_up procedure to
- initialize the network. After the network is set up, the
- screen must be changed to graphics mode.
-
- graphdriver := vga;
- graphmode := vgahi;
- initgraph ( graphdriver, graphmode, 'a:' );
-
- These Turbo Pascal commands instruct the graphics card in the
- PC to switch to VGA mode. The Turbo Pascal BGI driver file for
- VGA must be on the drive specified in the initgraph statement.
-
- The developer proceeds to establish the constants for the
- program. Gap1, gap2, and bcorner are the same as in the
- sequential program.
-
- gap1 := 2.50 / 320;
- gap2 := 2.50 / 200;
- bcorner.realp := -2.0;
- bcorner.imag := -1.25;
-
- The next four statements set up the working environment for
- the workers.
-
- cur_col := 1;
- tot_col := 320;
- pix_col := 200;
- num_col := 1;
-
- Cur_col is the number of the next column that needs to be
- computed. Tot_col is the total number of columns that are to be
- computed. Pix_col is the total number of pixels in each column.
- Num_col is the step value, or the number of columns each worker is
- supposed to compute. These values will all be shared by the
- different workers in the system.
-
- Once all of the values are computed, the developer is ready to
- put the work and initialization values in tuple space for the
- workers to pick up.
-
- for i := 1 to num_proc do
- begin
- eval ( 'work', &adder );
- out ( 'stuff', &gap1, &gap2, &bcorner );
- end;
-
- Num_proc is the total number of processors on the system, or the
- number of processors to be utilized in the calculations. The
- common values for the workers form the next tuple put into tuple
- space.
-
- out ( 'screen', &cur_col, &tot_col, &pix_col, &num_col );
-
- The developer is now free to concentrate on its own activities,
- mainly receiving and plotting the results.
-
- for i := 1 to tot_col DIV num_col do
- begin
- in ( 'col', start_col, results );
- plot ( start_col, results );
- end;
-
- Num_col divides the total number of columns to give the number
- of 'col' tuples to receive, since each result packet carries
- num_col columns.
-
- The last thing the developer needs to do is clean up the tuple
- space and shut down the system
-
- in ( 'screen', cur_col, tot_col, pix_col, num_col );
- close_system;
-
-
- Worker
-
- Next we must determine what the duty of the worker is. The
- worker must IN the common data from tuple space. This data
- includes the gaps and the corner value. The worker then INs the
- screen information. It is this information that determines
- whether or not a column needs to be computed. If the worker INs
- the tuple and the cur_col to compute is greater than the tot_col
- value, then the system has successfully computed all columns. The
- worker can quit executing this procedure. If cur_col is not
- greater than the tot_col value, the worker has to determine how
- many columns to compute and compute them. The code for the worker
- looks like this
- procedure adder;
-
-
- type
-
- complex = record
- realp,
- imag : double;
- end;
-
- var
-
-
- plen : integer;
- pkt : pointer;
- a_tuple : tuple_pointer;
-
- gap1,gap2,
- a,b,c,
- size : double;
- bcorner,
- ncomplex,
- original : complex;
- cur_col,
- tot_col,
- pix_col,
- num_col,
- cc,
- row,
- r,
- indexc,
- result,
- start_col,
- end_col : integer;
- results : array[1..200*num_colu] of integer; { num_colu is a constant }
- finished : boolean;
-
- begin
-
-
- in ( 'stuff', gap1, gap2, bcorner );
-
- in ( 'screen', cur_col, tot_col, pix_col, num_col );
- if cur_col > tot_col then
- begin
- finished := true;
- end
- else
- begin
- finished := false;
- start_col := cur_col;
- end_col := cur_col + num_col - 1;
- cur_col := cur_col + num_col;
- end;
-
- out ( 'screen', &cur_col, &tot_col, &pix_col, &num_col );
-
- while not finished do
- begin
- indexc := 0;
- for cc := start_col to end_col do
- begin
- for r := 1 to pix_col do
- begin
- ncomplex.realp := cc * gap1 + bcorner.realp;
- ncomplex.imag := r * gap2 + bcorner.imag;
- result := 0;
- size := 0.0;
- original.realp := 0.0;
- original.imag := 0.0;
- while ( result <= 21 ) and ( size < 4.0 ) do
- begin
- a := original.realp * original.realp;
- b := original.realp * original.imag;
- c := original.imag * original.imag;
- original.realp := a-c+ncomplex.realp;
- original.imag := b+b+ncomplex.imag;
- size :=
- original.realp*original.realp+original.imag*original.imag;
- inc ( result );
- end;
- results[indexc+r] := result;
- end;
- indexc := indexc+pix_col;
- end;
- out ( 'col', &start_col, &results );
-
- in ( 'screen', cur_col, tot_col, pix_col, num_col );
- if cur_col > tot_col then
- begin
- finished := true;
- end
- else
- begin
- start_col := cur_col;
- end_col := cur_col + num_col-1;
- cur_col := cur_col + num_col;
- end;
-
- out ( 'screen', &cur_col, &tot_col, &pix_col, &num_col );
- end;
-
- end;
-
- The worker begins by INing the common information
-
- in ( 'stuff', gap1, gap2, bcorner );
-
- followed by the screen information
-
- in ( 'screen', cur_col, tot_col, pix_col,
- num_col )
-
- and determines if there is anything to compute
-
- if cur_col > tot_col then
-
- If there isn't anything to do, the boolean variable finished will
- be set to true and the compute loop will not be entered. If there
- is work to do, the worker will obtain its starting column and put
- it into the variable start_col. The ending column will be put
- into the variable end_col. Cur_col will be increased by the
- number of columns the worker is going to compute. Once all of
- this has been determined, the screen tuple is put back into tuple
- space.
-
- begin
- finished := true;
- end
- else
- begin
- finished := false;
- start_col := cur_col;
- end_col := cur_col + num_col - 1;
- cur_col := cur_col + num_col;
- end;
- out ( 'screen', &cur_col, &tot_col, &pix_col, &num_col );
-
- Notice that in either case of the if statement, the screen
- tuple is put back into the tuple space. If this were not done,
- the other workers in the system would block because they could not
- find the screen tuple. Therefore, it must be put back into tuple
- space.
-
- In order to simplify the way results are sent back to the
- developer, a single array is used that includes enough space for
- the total number of pixels a worker computes. This includes the
- situation where multiple columns are computed. The indexc variable
- does the job of coordinating where the results go in the
- array. The first column of pixels uses locations 1-200 while the
- next column uses locations 201-400, etc. Indexc is
- incremented by pix_col for every column computed.
-
- Once all columns have been computed for this particular group
- of columns, they are put into tuple space with the
-
- out ( 'col', &start_col, &results );
-
- command. The worker will then IN another screen tuple and begin
- again.
-
- Complete Code
-
- The complete code for the system is
-
- program devman;
-
- {$N+,E+}
-
- uses dos, crt, both, work, graph;
-
- const
- num_colu = 2;
- num_proc = 1;
-
- type
- complex = record
- realp,
- imag : double;
- end;
-
- resu = array[1..200*num_colu] of integer;
-
-
- var
- gap1,
- gap2 : double;
- bcorner : complex;
-
- i,
- j,
- cur_col,
- pix_col,
- start_col,
- num_col,
- tot_col : integer;
- results : resu;
-
- graphdriver,
- graphmode : integer;
-
-
-
- procedure start_up;
-
- begin
-
- exitsave := exitproc;
- exitproc := @myexit;
-
- master[1] := $00;
- master[2] := $00;
- master[3] := $C0;
- master[4] := $05;
- master[5] := $86;
- master[6] := $24;
-
- init_system;
-
- end;
-
-
- procedure adder;
-
-
- type
-
- complex = record
- realp,
- imag : double;
- end;
-
- var
- gap1, gap2,
- a,b,c,
- size : double;
- bcorner,
- ncomplex,
- original : complex;
- cur_col,
- tot_col,
- num_col,
- pix_col,
- cc,
- row,
- r,
- indexc,
- result,
- start_col,
- end_col : integer;
- results : array[1..200*num_colu] of integer;
- finished : boolean;
-
- begin
-
-
- in ( 'stuff', gap1, gap2, bcorner );
-
- in ( 'screen', cur_col, tot_col, pix_col, num_col );
- if cur_col > tot_col then
- begin
- finished := true;
- end
- else
- begin
- finished := false;
- start_col := cur_col;
- end_col := cur_col + num_col - 1;
- cur_col := cur_col + num_col;
- end;
- out ( 'screen', &cur_col, &tot_col, &pix_col, &num_col );
-
- while not finished do
- begin
- indexc := 0;
- for cc := start_col to end_col do
- begin
- for r := 1 to pix_col do
- begin
- ncomplex.realp := cc * gap1 + bcorner.realp;
- ncomplex.imag := r * gap2 + bcorner.imag;
- result := 0;
- size := 0.0;
- original.realp := 0.0;
- original.imag := 0.0;
- while ( result <= 21 ) and ( size < 4.0 ) do
- begin
- a := original.realp * original.realp;
- b := original.realp * original.imag;
- c := original.imag * original.imag;
- original.realp := a-c+ncomplex.realp;
- original.imag := b+b+ncomplex.imag;
- size :=
- original.realp*original.realp+original.imag*original.imag;
- inc ( result );
- end;
- results[indexc+r] := result;
- end;
- indexc := indexc+pix_col;
- end;
- out ( 'col', &start_col, &results );
-
- in ( 'screen', cur_col, tot_col, pix_col, num_col );
- if cur_col > tot_col then
- begin
- finished := true;
- end
- else
- begin
- start_col := cur_col;
- end_col := cur_col + num_col-1;
- cur_col := cur_col + num_col;
- end;
- out ( 'screen', &cur_col, &tot_col, &pix_col, &num_col );
- end;
-
- end;
-
-
-
- procedure plot ( col : integer; results : resu );
-
- var i,j : integer;
- color : word;
-
- begin
-
-
- for j := 0 to num_col-1 do
- for i := 1 to 200 do
- begin
- if results[i+j*200] <= 2 then color := 1;
- if results[i+j*200] > 2 then color := 9;
- if results[i+j*200] > 4 then color := 2;
- if results[i+j*200] > 6 then color := 10;
- if results[i+j*200] > 8 then color := 4;
- if results[i+j*200] > 10 then color := 12;
- if results[i+j*200] > 12 then color := 5;
- if results[i+j*200] > 14 then color := 13;
- if results[i+j*200] > 16 then color := 8;
- if results[i+j*200] > 18 then color := 7;
- if results[i+j*200] > 20 then color := 0;
-
- putpixel ( col+j , i, color );
- end;
-
- end;
-
-
-
-
-
- begin
-
- start_up;
-
- graphdriver := vga;
- graphmode := vgahi;
- initgraph ( graphdriver, graphmode, 'a:' );
-
- gap1 := 2.50 / 320;
- gap2 := 2.50 / 200;
- bcorner.realp := -2.0;
- bcorner.imag := -1.25;
- cur_col := 1;
- tot_col := 320;
- pix_col := 200;
- num_col := num_colu;
-
- for i := 1 to num_proc do
- begin
- eval ( 'work', &adder );
- out ( 'stuff', &gap1, &gap2, &bcorner );
- end;
-
- out ( 'screen', &cur_col, &tot_col, &pix_col, &num_col );
-
- for i := 1 to tot_col div num_col do
- begin
- in ( 'col', start_col, results );
- plot ( start_col, results );
- end;
-
- in ( 'screen', cur_col, tot_col, pix_col, num_col );
-
- readln;
-
-
- { system_shutdown }
- close_system
-
-
- end.
-
- This code is contained on the samples disk provided with the
- system. Convert it to standard Turbo Pascal, compile it, and run
- it for an interesting result. In order to understand what the
- code is doing exactly, follow the code for a single developer and
- a single worker. Notice the changes that take place in the screen
- tuple. This screen tuple is the main controller of the entire
- program.
-
- Once you have the code operational, try changing some of the
- initial values to obtain different pictures. The color codes (
- numbers ) used in the plotting routine can also be changed to
- different sequences. These numbers were taken directly from those
- in the Turbo Pascal Reference Manual.
-
-
-
-