- Newsgroups: comp.parallel
- Path: sparky!uunet!zaphod.mps.ohio-state.edu!magnus.acs.ohio-state.edu!usenet.ins.cwru.edu!gatech!hubcap!fpst
- From: Steven Ericsson Zenith <zenith@kai.com>
- Subject: Re: Linda / The Parform
- Message-ID: <1992Dec30.212742.24207@hubcap.clemson.edu>
- Sender: fpst@hubcap.clemson.edu (Steve Stevenson)
- Organization: Clemson University
- Date: Wed, 30 Dec 92 09:25:27 -0600
- Approved: parallel@hubcap.clemson.edu
- Lines: 205
-
- Well, regular readers of comp.parallel over the years will know that
- this isn't the first time my idiosyncratic use of the English language
- has gotten me into trouble. Colorful, and perhaps inappropriate at
- times, it is a reflection of me, my background and culture (or lack of
- it) and I do not apologize for it. I never -ever- have malicious intent
- in my remarks - I am sorry if anything I say should be interpreted so.
-
- I do, however, comment on areas in which I have some knowledge or have
- experience undertaking similar experiments. I may criticize the work of
- others in my field but such criticism should not be taken as an attack
- on the person - though I understand the temptation to initially
- interpret it so. I hope I am as gracious when receiving criticism of my
- own work. It too has strengths and weaknesses, and that work and my
- postings here over the years have revealed both - I am ashamed of
- neither.
-
- I have read the "old" report to which Dr. Cap refers and as he points out
- it does not answer the questions raised in my posting. I look forward to
- reading his coming publication. I will also note that the ability to
- fund a project has never in my experience been an accurate reflection of
- its worth, most especially in Europe.
-
- There are just a couple of points that need clarifying. When I referred
- to the use of the same base compiler in these experiments I meant *the
- same*. Using different C compilers with the same switches is no
- guarantee of the same code generation. Therefore, if different compilers
- were used I do expect to see some variation. Depending on the problem
- the variation may or may not be important.
-
- For example, in the graph presented in the referred paper a time of
- 1370.6 is reported under the table headings "Linda", "PVM", and
- "PARFORM". The same number appears in the recent posting for Yale (SCA)
- Linda. Now, the last time I looked, Yale (SCA) Linda had an integrated
- compiler while PVM and POSYBL, as libraries, were able to use a native
- compiler, so I expect to see a variation of some kind - even a small
- one. (The source of this single number becomes clear in the paper and
- will be revealed in a following paragraph.) But whatever the case, the
- central point is this: the actual compiler used, the maker and version
- number, was not reported. It is not good enough to say "all platforms
- use the C language and C compiler with identical compiler options": was
- this the same C compiler? The native Sun C compiler? The GNU C compiler?
- Which?
-
- I am particularly interested in your base time because it, and not any
- caching artifact, may be the source of your superlinear speedup. I'm
- not saying that it is so; I'm simply asking the question.
-
- However, you might like to consider the following scenario, a
- reinterpretation of your figures, that is the source for my concern:
-
- Here are the original numbers.
-
- > Procs POSYBL SCA linda PVM MC-2 The Parform
- > 1 1370.6 1370.6 1370.6 -- 1370.6 (1.0)*
- > 2 737.2 662.2 648.0 -- 654.8 (2.1)
- > 4 442.6 342.6 328.0 921.4 332.3 (4.1)
- > 6 339.3 235.5 219.0 618.5 221.7 (6.2)
- > 8 284.6 175.8 168.4 466.7 170.2 (8.0)
- > 10 260.2 144.3 143.6 376.6 137.4 (10.0)
- > 12 244.7 122.1 116.6 318.2 116.0 (11.8)
- > 14 242.7 104.5 100.1 276.2 103.5 (13.5)
- > 16 239.5 92.8 90.0 240.0 89.0 (15.4)
- > 18 242.6 84.5 97.5 215.9 80.9 (16.9)
- > 20 241.6 76.0 85.8 196.6 73.5 (18.7)
- > 22 71.5 68.5 182.8 67.5 (20.3)
- > 24 66.5 63.6 170.9 62.5 (21.9)
- > 26 63.1 60.5 160.9 58.6 (23.4)
- > 28 58.5 56.7 151.9 55.8 (24.7)
- > 30 55.1 53.5 144.7 53.0 (25.9)
- > 32 54.0 54.0 138.5 51.0 (26.9)
- > 34 52.4 54.0 50.8 (27.0)
- > 36 51.4 52.0 48.5 (28.3)
- > 38 51.3 54.0 48.4 (28.3)
- > 40 52.9 47.2 (29.0)
-
- > * speedup in parentheses
-
- I assume speedup is computed using the same method as for the graph on
- page 11 of the "old" paper. There it is stated that the speedup is "with
- respect to the sequential run on the slowest workstation in our LAN, a
- Sparcstation1." This is justified later by the comment "In the case of
- homogeneous partitioning [of the problem] the slowest machine in the net
- is responsible for the speedup. All faster machines idle while waiting
- at the synchronization points, which are the communication statements.
- Thus it is reasonable to calculate speedup values with respect to the
- runtime on the slowest machine."
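-
The paper's reasoning, restated: with homogeneous partitioning over heterogeneous machines, every synchronization point waits for the slowest machine. A toy sketch of that argument (the per-machine sequential times are the ones quoted in this post; the two-machine configuration is my own illustration):

```python
# With homogeneous partitioning, each machine gets an equal share of work,
# so the time per synchronized step is set by the slowest machine - all
# faster machines finish early and idle.
seq_times = {"sparc1": 1370.6, "sparc2": 564.0}  # sequential times, seconds

def parallel_time(work_shares):
    """Time for one synchronized step: the max over all machines of
    (that machine's share of work) * (its sequential cost)."""
    return max(seq_times[m] * share for m, share in work_shares.items())

# Equal halves of the problem on one SPARCstation1 and one SPARCstation2:
t = parallel_time({"sparc1": 0.5, "sparc2": 0.5})
print(t)  # half the SPARC1 time; the SPARC2 idles at the sync point
```

Granting this, the slowest machine does set the parallel runtime; the dispute below is only over which machine should set the *base* time.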
-
- I understand the point here but I'm not convinced by the reasoning. I
- would disagree with the paper on this point, arguing that the base time
- for a parallel speedup measure must be the fastest possible sequential
- time, not the slowest, regardless of the disparate performance of the
- respective processing elements. By your reasoning I can sit a Cray on
- the network (which I'll assume here can do the task in zero time) and we
- will still see your "speedup"; yet if I walk up to your network at
- random and sit at the Cray, I will not be able to prove your case.
- However, let us move on.
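-
To make the dependence on the base time concrete, here is a minimal sketch (all numbers are taken from the tables and times quoted in this post; nothing here is measured):

```python
def speedup(t_base, t_parallel):
    """Classic speedup: sequential base time divided by parallel time."""
    return t_base / t_parallel

t2 = 654.8          # The Parform on 2 processors, from the table below
slow_base = 1370.6  # sequential run on the slowest machine (SPARCstation1)
fast_base = 564.0   # SPARCstation2 sequential time, quoted later in this post

print(round(speedup(slow_base, t2), 2))  # 2.09 - already "superlinear"
print(round(speedup(fast_base, t2), 2))  # 0.86 - a slowdown on 2 processors
```

The same parallel run looks superlinear against one base and like a slowdown against the other; the measurement itself never changed.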
-
- Reconsider the figures with the base numbers modified to be the time
- given for 2 processors, times 2. This allows that each system, for this
- problem, speeds up perfectly on 2 processors (which it probably
- doesn't), but I'll argue it may give me a fair measure of the sequential
- code generation of each compiler and is equally fair to each system
- implementation. Not surprisingly, we see the variation I expected. To
- save time, permit me to focus on the Yale (SCA) Linda, PVM and PARFORM
- numbers.
-
- Procs SCA linda PVM The Parform
- 1 1324.4(1) 1296.0(1) 1309.6 (1)*
- 2 662.2(2) 648.0(2) 654.8 (2) [2.1]**
- 4 342.6(3.86) 328.0(3.95) 332.3 (3.94) [4.1]
- 6 235.5(5.62) [5.82] 219.0(5.92) [6.26] 221.7 (5.9) [6.2]
- 8 175.8(7.53) [7.8] 168.4(7.70) [8.14] 170.2 (7.69) [8.0]
- 10 144.3(9.18) [9.5] 143.6(9.03) [9.54] 137.4 (9.53) [10.0]
- 12 122.1(10.8) [11.23] 116.6(11.11)[11.75] 116.0 (11.29)[11.8]
- 14 104.5(12.67) 100.1(13) [13.6] 103.5 (12.7) [13.2]
- 16 92.8(14.27) 90.0(14.4) 89.0 (14.71)[15.4]
- 18 84.5(15.67) 97.5(13.29) 80.9 (16.19)[16.9]
- 20 76.0(17.43) 85.8(15.01) 73.5 (17.82)[18.7]
- 22 71.5(18.52) 68.5(18.92) 67.5 (19.40)[20.3]
- 24 66.5(19.92) 63.6(20.38) 62.5 (20.95)[21.9]
- 26 63.1(20.99) 60.5(21.42) 58.6 (22.35)[23.4]
- 28 58.5(22.64)[23.4] 56.7(22.86)[24.2] 55.8 (23.47)[24.7]
- ...
-
- * speedup in parentheses
- ** old numbers in square brackets.
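-
The recalculation is mechanical. A short sketch (parallel times copied from the table above; the base time is assumed, as argued, to be twice the 2-processor time; only a few processor counts are shown for brevity):

```python
# Recompute speedups with base = 2 * t(2): assume each system speeds up
# perfectly on 2 processors. Times (seconds) are copied from the posted table.
times = {
    "SCA Linda":   {2: 662.2, 4: 342.6, 12: 122.1, 28: 58.5},
    "PVM":         {2: 648.0, 4: 328.0, 12: 116.6, 28: 56.7},
    "The Parform": {2: 654.8, 4: 332.3, 12: 116.0, 28: 55.8},
}

for system, t in times.items():
    base = 2 * t[2]                      # revised sequential base time
    for procs in sorted(t):
        s = base / t[procs]
        print(f"{system:11s} {procs:2d} procs: speedup {s:6.2f}")
```

With this base no entry exceeds the processor count, which is the point: the superlinear artifact lives entirely in the choice of base time.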
-
- These numbers draw a slightly different graph - not greatly different
- I'll admit but, significantly, there are no superlinear artifacts. Let
- me make it clear that this is a fiction because we do not have the real
- base times but I will argue that it is a legitimate interpretation of
- the numbers. I like these numbers better since there is no superlinear
- speedup (well you knew there wouldn't be) - and they illustrate the
- effect of a small change to the respective base times. Imagine now the
- result if we had taken the SPARCstation2 base time (564) given in the
- paper: a speedup of less than one on 2 processors (564/654.8 is about
- 0.86) - which may, in fact, be the right information to give a potential
- user with this application whose network configuration is two SPARC1s
- and one SPARC2.
-
- I did read the paper. I understand the issues surrounding swapping; I
- know, as do most people by now, that the effective increase in localized
- memory and cache size can produce superlinear artifacts. I'm just not
- convinced that these numbers illustrate that effect at the 3800x100
- problem size for which they were produced.
-
- These are not trivial points either since they have ramifications in the
- subsequent analytical conclusions presented in the paper.
-
- We certainly should not interpret from these numbers that the PARFORM
- implementation performs better than the other two. The Linda compiler
- may still be a version of an old GNU C compiler (it was the last time I
- dug into its guts, 2+ years ago) and thus its code generation may not
- be quite as good as the compiler used in the PVM and PARFORM case - we
- don't know since we don't know the compiler used - it may be enough to
- remove any disparity shown above.
-
- I am not suggesting that the numbers were in any way faked. I am
- questioning the use, attribution, and validity, for the stated cause, of
- the numbers you have, and I welcome clarification. I expect other peer
- reviewers would do the same.
-
- The stated cause is important here for my comments were aimed at the
- contention made in the earlier reposting of the numbers that they (the
- numbers) "heated up the discussion whether the Linda or message passing
- paradigm are superior" and I am particularly interested in the
- comparison of parallel programming *models*. The same implication is
- made in the paper. However, the explicit claim is different. The
- explicit claim is (p. 18): "The presented results and experiences with our
- platform show that we can regard a workstation network as a tightly
- coupled multiprocessor system when exploiting its resources with a
- system like the PARFORM." with which I doubt anyone in this field would
- disagree. But let us be clear: these numbers do not tell us anything
- about the intrinsic merit of the models involved. They tell us something
- of the specific implementations; in this case I believe they tell us
- more about the parallel decomposition of this problem: pretty much any
- interaction model will do, and new scheduling techniques are important
- and applicable to all implementations.
-
- To emphasize: comparing a poor implementation of Linda to any
- implementation of Message Passing and vice versa reveals no intrinsic
- knowledge about the relative merit of each model.
-
- And so I must return to my earlier remarks and restate them: for a valid
- comparison of the models to be made the implementations must be shown to
- be comparable. The base compiler must be the same; i.e., the sequential
- code generation must be equivalent. The implementation techniques of the
- model must be comparable; i.e., process creation, synchronization,
- interaction implementations must be understood at the lowest level and
- be shown to be comparable. Furthermore, the problems run should
- exercise the primitives of the model. A straight comparison of a program
- running under Yale (SCA) Linda and PVM is not at all valid in this
- sense, just as the earlier comparison with POSYBL illustrated (yes,
- though I rest in silence, I do still read this newsgroup).
-
- The above comments should not be taken as showing favor on my part
- toward either the message passing or Linda model; anyone knowing my
- work will know that I have arguments against both for general-purpose
- parallel programming, be it in shared or distributed environments. I
- have solutions but *that* is for another time.
-
- Peace and balance to you all in this coming New Year.
- --
- Steven Ericsson Zenith
- Disclaimer: Opinions expressed are my own and not necessarily those of KAI.