- .pa
- VECTOR EMULATIONS
-
- Vector emulations are software procedures that mimic the operation of
- vector processing hardware. Of course, the software is not based on the
- same principle as the hardware; but the concept is the same: specific
- procedures designed to perform similar repetitive tasks on contiguously
- stored real numbers as efficiently as possible. No, I won't tell you
- how I do it,
- so don't ask. My vector emulations are completely compatible with
- Hewlett-Packard's Vector Instruction Set (VIS). They have the same
- calling syntax and function (that's why I developed them in the first
- place: to run programs downloaded from an HP-1000F). HP has a very nice
- manual with examples. If you are interested, perhaps they would sell
- you one (I wouldn't even hazard a guess as to the cost).
-
- Vector Instruction Set (VIS) User's Manual
- Part No. 12824-90001
- Hewlett-Packard Company
- Data Systems Division
- 11000 Wolfe Road
- Cupertino, CA 95014
-
- You do not need a math coprocessor (Intel 8087/80287) in order to run a
- program linked with LIBRY.LIB; but it makes a TREMENDOUS difference (a
- factor of 120 or so for floating point operations). The vector
- emulations will run even without a math coprocessor; but in that case
- the speed is already so slow that nothing will help. The improvement in
- speed with the vector emulations varies depending on the relative speed
- of your processor and coprocessor. The greatest improvement is realized
- on a PC with a 5MHz-8086/5MHz-8087 pair; and the least improvement is
- realized on an AT with an 8MHz-80286/5MHz-80287 pair.
-
- Note that the increments (INCR1, INCR2, INCR3), the index (M), and the
- count (N) are of type INTEGER*2. Reals are of type REAL*4 and double
- precision reals are of type REAL*8. REAL*4 and REAL*8 types cannot be
- mixed in the same emulation. For double precision, add a "D" prefix to
- the routine name: use "CALL DVABS(...)" rather than "CALL VABS(...)".
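-
- As a sketch of the type rules above, the following routine declares the
- count and increment as INTEGER*2 and calls VABS on REAL*4 data and DVABS
- on REAL*8 data. The name SHOWAB and its variables are hypothetical. The
- VABS and DVABS shown here are plain FORTRAN stand-ins written for this
- example from the calling syntax and operation in the summary table; they
- are not the library code and should be left out when linking LIBRY.LIB.

```fortran
      SUBROUTINE SHOWAB
C
C     HYPOTHETICAL DEMO: SINGLE AND DOUBLE PRECISION CALLS
C
      INTEGER*2 INC,N
      REAL*4 A(4),B(4)
      REAL*8 DA(4),DB(4)
      DATA A/-1.0,2.0,-3.0,4.0/
      DATA DA/-1.0D0,2.0D0,-3.0D0,4.0D0/
C
C     COUNT AND INCREMENT MUST BE INTEGER*2 VARIABLES
      INC=1
      N=4
C     REAL*4 VECTORS GO TO VABS
      CALL VABS(A,INC,B,INC,N)
C     REAL*8 VECTORS GO TO DVABS (NEVER MIX THE TWO TYPES)
      CALL DVABS(DA,INC,DB,INC,N)
      WRITE(*,*) B
      WRITE(*,*) DB
      RETURN
      END
C
      SUBROUTINE VABS(V1,INCR1,V2,INCR2,N)
C
C     STAND-IN FOR THE LIBRARY ROUTINE: V2=ABS(V1), REAL*4
C
      IMPLICIT INTEGER*2 (I-N)
      IMPLICIT REAL*4 (A-H,O-Z)
      DIMENSION V1(*),V2(*)
C
      I1=1
      I2=1
      DO 100 I=1,N
      V2(I2)=ABS(V1(I1))
      I1=I1+INCR1
      I2=I2+INCR2
  100 CONTINUE
      RETURN
      END
C
      SUBROUTINE DVABS(V1,INCR1,V2,INCR2,N)
C
C     STAND-IN FOR THE LIBRARY ROUTINE: V2=ABS(V1), REAL*8
C
      IMPLICIT INTEGER*2 (I-N)
      IMPLICIT REAL*8 (A-H,O-Z)
      DIMENSION V1(*),V2(*)
C
      I1=1
      I2=1
      DO 100 I=1,N
      V2(I2)=ABS(V1(I1))
      I1=I1+INCR1
      I2=I2+INCR2
  100 CONTINUE
      RETURN
      END
```

- Calling SHOWAB from any main program prints the absolute values of both
- vectors. Passing an INTEGER*4 count or a REAL*8 array to VABS would break
- the rules stated above.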
-
- It is very important to BE SURE THAT NO VECTOR CROSSES A SEGMENT
- BOUNDARY (refer to Microsoft FORTRAN manual section 8). What this means
- to the machine is that a vector must reside within a single segment
- (65536 bytes); otherwise the machine cannot address all of the elements
- as a group. To ensure this, NEVER use the $LARGE metacommand.
- If you have no COMMON then you never have to worry about this. If you
- do have COMMON, make sure that each COMMON contains no more than 65536
- bytes. Of course, you can have several named COMMONs, so this is not too
- restrictive a limit on your programs. Also, if more than one vector is
- passed to the emulator, they need not reside in the same segment.
- For instance, you can add one REAL*4 vector with 16384 elements
- (16384 x 4 = 65536 bytes) to another with 16384 elements and store the
- result in a third, as long as all three are in different COMMONs. Of
- course, you can add two REAL*4 vectors in the same COMMON provided their
- total number of elements does not exceed 16384. There is a way of
- getting around this; but it is too involved to explain here.
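-
- The COMMON layout described above might be sketched as follows. The
- routine name ADDBIG and the block names BLKA, BLKB and BLKC are
- hypothetical. Each named COMMON holds exactly one REAL*4 vector of 16384
- elements, which is 65536 bytes, so no vector shares a COMMON with
- another. The VADD included here is a plain FORTRAN stand-in so that the
- sketch compiles on its own; leave it out when linking LIBRY.LIB.

```fortran
      SUBROUTINE ADDBIG
C
C     HYPOTHETICAL DEMO: EACH 16384-ELEMENT REAL*4 VECTOR
C     (65536 BYTES) SITS ALONE IN ITS OWN NAMED COMMON
C
      INTEGER*2 INC,N
      REAL*4 A,B,C
      COMMON /BLKA/ A(16384)
      COMMON /BLKB/ B(16384)
      COMMON /BLKC/ C(16384)
C
      INC=1
      N=16384
C     C=A+B, ONE VECTOR PER COMMON
      CALL VADD(A,INC,B,INC,C,INC,N)
      RETURN
      END
C
      SUBROUTINE VADD(V1,INCR1,V2,INCR2,V3,INCR3,N)
C
C     STAND-IN FOR THE LIBRARY ROUTINE: V3=V1+V2
C
      IMPLICIT INTEGER*2 (I-N)
      IMPLICIT REAL*4 (A-H,O-Z)
      DIMENSION V1(*),V2(*),V3(*)
C
      I1=1
      I2=1
      I3=1
      DO 100 I=1,N
      V3(I3)=V1(I1)+V2(I2)
      I1=I1+INCR1
      I2=I2+INCR2
      I3=I3+INCR3
  100 CONTINUE
      RETURN
      END
```

- The count of 16384 still fits in an INTEGER*2, whose largest value is
- 32767. A main program that fills A and B through the same three COMMON
- declarations can call ADDBIG and read the sums back from C.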
-
- A word of warning... vector emulations do not like being interrupted.
- This is the whole point of "speed at any cost" procedures. For this
- reason, the emulations may interfere with the operation of some "pop-up"
- programs and such things as windowing and multi-tasking. This is
- regrettably unpredictable. I can say that the emulations don't interfere
- with any of the "pop-up" programs that I have developed (like my DOS
- command stack full-screen editor and improved scroller) that "lurk" in
- the background; but I don't know about such programs that others have
- developed.
- .pa
- SAMPLE FORTRAN EQUIVALENT OF A VECTOR ADD
-
-
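-       SUBROUTINE VADD(V1,INCR1,V2,INCR2,V3,INCR3,N)
- C
- C     VECTOR V3=V1+V2
- C
-       IMPLICIT INTEGER*2 (I-N)
-       IMPLICIT REAL*4 (A-H,O-Z)
- C     ASSUMED-SIZE ARRAYS: WITH INCREMENTS OTHER THAN 1 THE
- C     SUBSCRIPTS RUN PAST N, SO THE BOUND CANNOT BE N
-       DIMENSION V1(*),V2(*),V3(*)
- C
- C     NOTHING TO DO FOR AN EMPTY VECTOR
-       IF(N.LT.1) GO TO 999
- C     EACH VECTOR STARTS AT ITS FIRST ELEMENT
-       I1=1
-       I2=1
-       I3=1
- C
- C     STEP THROUGH EACH VECTOR BY ITS OWN INCREMENT
-       DO 100 I=1,N
-       V3(I3)=V1(I1)+V2(I2)
-       I1=I1+INCR1
-       I2=I2+INCR2
-   100 I3=I3+INCR3
- C
-   999 RETURN
-       END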
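-
- To illustrate the increments, the following sketch adds row 1 of a
- 3 by 3 matrix to its column 2. The routine name ROWCOL and the matrix
- are hypothetical. FORTRAN stores arrays by column, so consecutive
- elements of a row are 3 apart (increment 3) while elements of a column
- are adjacent (increment 1). The sketch calls VADD as listed above, or
- the library version with the same calling syntax.

```fortran
      SUBROUTINE ROWCOL(S)
C
C     HYPOTHETICAL DEMO: S = ROW 1 OF A + COLUMN 2 OF A
C
      INTEGER*2 INCROW,INCCOL,N
      REAL*4 A(3,3),S(3)
C     COLUMN-MAJOR FILL: ROW 1 IS 1,4,7 AND COLUMN 2 IS 4,5,6
      DATA A/1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0/
C
      INCROW=3
      INCCOL=1
      N=3
C     PASS THE FIRST ELEMENT OF EACH VECTOR AND LET THE
C     INCREMENT SELECT THE REST
      CALL VADD(A(1,1),INCROW,A(1,2),INCCOL,S,INCCOL,N)
C     S IS NOW 5.0, 9.0, 13.0
      RETURN
      END
```

- The increments are passed as INTEGER*2 variables rather than literal
- constants, so that their type does not depend on the compiler's default
- integer size.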
- .pa
- .ft c
- .in 15
- SUMMARY OF VECTOR INSTRUCTION SET
- ------------------------------------------------------------------------------------------------------------------
-                                                                    SPEED IMPROVEMENT:   PC WITH 8087      HP-1000F
- CALLING SYNTAX                             OPERATION                   VECTOR LENGTH:    N=10  N=100   N=10  N=100
- ------------------------------------------------------------------------------------------------------------------
- CALL VABS(V1,INCR1,V2,INCR2,N)             (V2(I)=ABS(V1(I)),I=1,N)                       4.0    7.5    4.5    5.1
- CALL VADD(V1,INCR1,V2,INCR2,V3,INCR3,N)    (V3(I)=V1(I)+V2(I),I=1,N)                      3.3    3.8    4.9    4.8
- CALL VDIV(V1,INCR1,V2,INCR2,V3,INCR3,N)    (V3(I)=V1(I)/V2(I),I=1,N)                      2.5    3.2    5.0    5.7
- CALL VDOT(S,V1,INCR1,V2,INCR2,N)           S=SUM(V1(I)*V2(I),I=1,N)                       4.0    4.8    3.5    3.6
- CALL VMAB(M,V1,INCR1,N)                    V1(M)=AMAX1(ABS(V1(I)),I=1,N)                  3.5    4.4    3.6    3.6
- CALL VMAX(M,V1,INCR1,N)                    V1(M)=AMAX1(V1(I),I=1,N)                       3.5    3.3    4.2    4.4
- CALL VMIB(M,V1,INCR1,N)                    V1(M)=AMIN1(ABS(V1(I)),I=1,N)                  3.8    4.8    3.7    3.2
- CALL VMIN(M,V1,INCR1,N)                    V1(M)=AMIN1(V1(I),I=1,N)                       3.5    3.5    4.2    3.9
- CALL VMOV(V1,INCR1,V2,INCR2,N)             (V2(I)=V1(I),I=1,N)                            3.3    9.0    5.2    6.5
- CALL VMPY(V1,INCR1,V2,INCR2,V3,INCR3,N)    (V3(I)=V1(I)*V2(I),I=1,N)                      3.5    3.8    4.8    4.7
- CALL VNRM(S,V1,INCR1,N)                    S=SUM(ABS(V1(I)),I=1,N)                        5.3    4.7    3.8    4.5
- CALL VPIV(S,V1,INCR1,V2,INCR2,V3,INCR3,N)  (V3(I)=S*V1(I)+V2(I),I=1,N)                    3.4    3.5    4.6    5.2
- CALL VSAD(S,V1,INCR1,V2,INCR2,N)           (V2(I)=S+V1(I),I=1,N)                          3.4    4.0    3.7    4.2
- CALL VSDV(S,V1,INCR1,V2,INCR2,N)           (V2(I)=S/V1(I),I=1,N)                          3.0    3.2    4.5    4.4
- CALL VSMY(S,V1,INCR1,V2,INCR2,N)           (V2(I)=S*V1(I),I=1,N)                          3.4    4.0    4.0    4.5
- CALL VSSB(S,V1,INCR1,V2,INCR2,N)           (V2(I)=S-V1(I),I=1,N)                          3.4    5.3    3.6    4.1
- CALL VSUB(V1,INCR1,V2,INCR2,V3,INCR3,N)    (V3(I)=V1(I)-V2(I),I=1,N)                      3.3    3.8    5.5    5.6
- CALL VSUM(S,V1,INCR1,N)                    S=SUM(V1(I),I=1,N)                             3.5    4.3    3.5    4.1
- CALL VSWP(V1,INCR1,V2,INCR2,N)             (V1(I)<->V2(I),I=1,N)                          3.2    5.0    5.0    5.7
- CALL VMIX(INDEX,V1,V2,N)                   (V2(I)=V1(INDEX(I)),I=1,N)                     1.8    2.7     NA     NA
- CALL VMXI(INDEX,V1,V2,N)                   (V2(INDEX(I))=V1(I),I=1,N)                     1.8    1.7     NA     NA
- CALL CLAMP(VMIN,VMAX,V,N)                  (V(I)=AMAX1(VMIN,AMIN1(VMAX,V(I))),I=1,N)      8.0    9.0     NA     NA
- H=HORNER(C,X,N)                            H=SUM(C(I)*X**(I-1),I=1,N)                     3.5    4.3     NA     NA
- ------------------------------------------------------------------------------------------------------------------
- The above table shows, for instance, that an emulated add of two vectors of length 100 is 3.8 times as fast
- as the same operation in FORTRAN on a "stock" PC with an 8087 math coprocessor.
-
- note 1: there is little or no improvement for n<10 and runtimes may increase for n<5.
- note 2: for double precision add a "D" prefix (e.g. DVABS, DVADD, ..., DCLAMP, DHORNE).
- note 3: vectors must not cross a segment boundary (see section 8 of Microsoft FORTRAN user's guide).
- note 4: all integers (e.g. INCR1,INCR2,INCR3,M,N) are of the INTEGER*2 type.
- note 5: increments (viz. INCR1,INCR2,INCR3) can be positive, negative, or zero.
- .ft e
- .in 10
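-
- In the style of the VADD sample, two more entries of the summary table
- might be written in plain FORTRAN as follows. These are sketches derived
- only from the OPERATION column, not the library code; the order of
- accumulation and the rounding in LIBRY.LIB may differ. The use of
- Horner's rule for HORNER is an assumption suggested by the name.

```fortran
      SUBROUTINE VDOT(S,V1,INCR1,V2,INCR2,N)
C
C     SKETCH: S=SUM OF V1(I)*V2(I) FOR I=1,N
C
      IMPLICIT INTEGER*2 (I-N)
      IMPLICIT REAL*4 (A-H,O-Z)
      DIMENSION V1(*),V2(*)
C
      S=0.0
      IF(N.LT.1) GO TO 999
      I1=1
      I2=1
      DO 100 I=1,N
      S=S+V1(I1)*V2(I2)
      I1=I1+INCR1
      I2=I2+INCR2
  100 CONTINUE
  999 RETURN
      END
C
      FUNCTION HORNER(C,X,N)
C
C     SKETCH: C(1)+C(2)*X+...+C(N)*X**(N-1) BY HORNER'S RULE,
C     WORKING DOWN FROM THE HIGHEST COEFFICIENT
C
      IMPLICIT INTEGER*2 (I-N)
      IMPLICIT REAL*4 (A-H,O-Z)
      DIMENSION C(*)
C
      H=0.0
      IF(N.LT.1) GO TO 999
      H=C(N)
      IF(N.EQ.1) GO TO 999
      DO 100 J=N-1,1,-1
      H=H*X+C(J)
  100 CONTINUE
  999 HORNER=H
      RETURN
      END
```

- Horner's rule needs one multiply and one add per coefficient and never
- forms a power of X. With C holding 1.0, 2.0, 3.0 and X=2.0, the function
- returns 1+2*2+3*4=17.0.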