Xref: sparky comp.sys.intel:2823 comp.arch:11898
Path: sparky!uunet!pipex!bnr.co.uk!uknet!gdt!aber!fronta.aber.ac.uk!pcg
From: pcg@aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.sys.intel,comp.arch
Subject: Re: Superscalar vs. multiple CPUs ?
Message-ID: <PCG.92Dec23150744@decb.aber.ac.uk>
Date: 23 Dec 92 15:07:44 GMT
References: <WAYNE.92Dec4093422@backbone.uucp> <37595@cbmvax.commodore.com>
	<PCG.92Dec13170504@aberdb.aber.ac.uk>
	<1992Dec21.133318.2975@athena.mit.edu>
Sender: news@aber.ac.uk (USENET news service)
Reply-To: pcg@aber.ac.uk (Piercarlo Grandi)
Organization: Prifysgol Cymru, Aberystwyth
Lines: 77
In-Reply-To: solman@athena.mit.edu's message of 21 Dec 92 13:33:18 GMT
Nntp-Posting-Host: decb.aber.ac.uk

On 21 Dec 92 13:33:18 GMT, solman@athena.mit.edu (Jason W Solinsky) said:

solman> If you define the codes which we are concerned with to be codes
solman> which can only exploit ILP, then of course the level of
solman> parallelism is limited, but you are not dealing with general
solman> purpose computing anymore.

Ah, this discussion was indeed mostly about ILP/superscalarity/VLIW.

pcg> Indeed pipeline designs with more than a few stages of pipelining
pcg> run into huge problems, and are worth doing only if a significant
pcg> proportion of SIMD-like operation is expected. Pipeline bubbles
pcg> start to become a significant problem beyond 4 pipeline stages on
pcg> general purpose codes, even on non superscalar architectures.

solman> This can be taken care of by interleaving different threads in
solman> the software, or using hardware which will take care of the
solman> interleaving on its own. The above statement is only true when
solman> the compiler is too dumb to notice higher level parallelism.

Well, here you are saying, if I read you right, that if an application
is suited to MIMD-style computing then multithreading is the answer.
This in itself is nearly a tautology. If you are also implying that many
more applications are suited to *massive* MIMD-style (macro) parallelism
than is commonly believed, I would be skeptical; what I believe is that
quite a few *important* applications can be successfully (macro)
parallelized, MIMD style. For example, compiling GNU Emacs: the
compilation of each source module can proceed in parallel, and indeed
one could spawn a separate thread for each function in each source
module.
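
To make that concrete, here is a toy sketch of the module-level case
(mine, not anything from the GNU build tree; the file names are
invented stand-ins): fork one cc per module, reap them all, done.

    /* parcc.c -- compile N modules in parallel, one child per module. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    static const char *modules[] = { "alloc.c", "buffer.c", "eval.c" };
    #define NMOD (sizeof modules / sizeof modules[0])

    int main(void)
    {
        size_t i;
        int status, failures = 0;

        for (i = 0; i < NMOD; i++) {     /* spawn one compiler per module */
            pid_t pid = fork();
            if (pid == 0) {
                execlp("cc", "cc", "-c", modules[i], (char *) NULL);
                _exit(127);              /* exec itself failed */
            } else if (pid < 0) {
                perror("fork");
                return EXIT_FAILURE;
            }
        }
        for (i = 0; i < NMOD; i++)       /* reap all the children */
            if (wait(&status) < 0 || !WIFEXITED(status)
                || WEXITSTATUS(status) != 0)
                failures++;
        return failures ? EXIT_FAILURE : EXIT_SUCCESS;
    }

This is of course just what a parallel make does; the point is that the
units of work are statically known and independent, which is the easy,
common case of MIMD parallelism.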

solman> The key question in choosing how large register files and caches
solman> should be, is "How large a {register file or cache} do I need
solman> for `good' performance on the algorithms I want to run?"
solman> Invariably, the size chosen is too small some of the time, while
solman> much of it is left unused at other times. In the multiple CPU
solman> version, this still happens. In the hyperscalar version,
solman> however, some of the execution units and threads will need a
solman> larger {cache or reg file} and some will be unable to utilize
solman> the existing space, but because they can share the same caches
solman> and register files, it is far less likely for performance to be
solman> limited by cache or register file size.

I agree with the sentiment here; partitioning resources can indeed lead
to starvation in one place but to overcommitment in another.

The problem is that sharing resources can have tremendous costs; sharing
register files between multiple parallel functional units looks nice
until one figures out the cost in terms of multiporting.
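
To put a very rough number on that cost (a back-of-the-envelope model
of mine, not a figure from any data sheet): each functional unit wants
about 2 read ports and 1 write port, and since every port adds a word
line and a bit line to every cell, cell area grows roughly with the
square of the port count:

    /* portcost.c -- crude model: register file cell area ~ ports^2. */
    #include <stdio.h>

    int main(void)
    {
        int units;
        for (units = 1; units <= 8; units *= 2) {
            int ports = 3 * units;       /* 2 read + 1 write per unit */
            printf("%d unit(s): %2d ports, relative cell area ~ %2dx\n",
                   units, ports, (ports * ports) / 9);
        }
        return 0;
    }

Eight fully shared units would thus pay something like 64 times the
per-cell area of a one-unit file, which is why clustered, that is
partitioned, register files keep being proposed.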

It's obvious you know this, but maybe you are underestimating this cost;
moreover, avoiding partitioning really solves problems only when there
are huge swings in resource demands between different threads; if these
are fairly predictable, then static partitioning does not look so bad. I
reckon that this is the case in many instances. Vectorization is
successful precisely because a lot of common important applications
(array oriented ones) have fairly predictable resource profiles, and so
you can build the architecture around them.
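
As an illustration of what "fairly predictable resource profile" means,
take the textbook DAXPY kernel (nothing exotic, just the usual
y := a*x + y): trip count, stride and operand traffic are all visible
before the loop runs, so the architecture can be provisioned for it
statically.

    /* daxpy.c -- y := a*x + y; unit stride, no data-dependent branches,
       so registers and memory banks can be allocated statically. */
    void daxpy(int n, double a, const double *x, double *y)
    {
        int i;
        for (i = 0; i < n; i++)
            y[i] += a * x[i];
    }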

solman> [ ... ] It's an open question how you want to do it. I
solman> personally favor tagging everything, and avoiding the concept of
solman> contexts in a dataflowish fashion.

Ah, this again would be my sentiment; I am a great admirer, for example,
of the Manchester Dataflow prototype. But the practical difficulty is
that dataflowish machines are built for the worst possible case
(everything unpredictable and fluid), while 80% of important codes are
rather more static than that, and tagging everything is then quite a
waste.
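
For the record, "tagging everything" in a tagged-token machine of the
Manchester flavour comes down to a matching store: a two-input node
fires only when two tokens with the same tag arrive for it. A toy
sketch of just that mechanism (the structures are invented; the real
machine used a hashed, parallel matching unit):

    /* match.c -- toy tagged-token matching store. */
    #include <stdio.h>

    #define STORE 64

    struct token { int tag, dest; double val; int used; };
    static struct token waiting[STORE];

    static void fire(int dest, double a, double b)
    {
        printf("fire node %d with (%g, %g)\n", dest, a, b);
    }

    /* A token arrives: pair it with a waiting partner, or park it. */
    static void arrive(int tag, int dest, double val)
    {
        int i, slot = -1;
        for (i = 0; i < STORE; i++) {
            if (waiting[i].used && waiting[i].tag == tag
                && waiting[i].dest == dest) {
                waiting[i].used = 0;          /* partner found: fire */
                fire(dest, waiting[i].val, val);
                return;
            }
            if (!waiting[i].used && slot < 0)
                slot = i;
        }
        if (slot >= 0) {                      /* no partner yet: wait */
            waiting[slot].tag = tag;
            waiting[slot].dest = dest;
            waiting[slot].val = val;
            waiting[slot].used = 1;
        }
    }

    int main(void)
    {
        arrive(7, 3, 1.5);   /* first operand of node 3, iteration 7 */
        arrive(7, 3, 2.5);   /* partner arrives: node 3 fires */
        return 0;
    }

Every operand pays that associative lookup, even when, as in a static
loop, the pairing was knowable at compile time; that is exactly the
waste complained about above.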

The open research question is how to run as fast as possible on 80% of
codes while not defining an architecture that is unsuited for the
remaining 20%. As to this I think that a dirtier design, a compromise,
will work better than a neat, fluid, tagged dataflow machine. I only
hope I am wrong.
--
Piercarlo Grandi, Dept of CS, PC/UW@Aberystwyth <pcg@aber.ac.uk>
And the Italian sang, and sang. And his desperate invocations reached
the ears of his divine protector, the god of the joke.