- Newsgroups: comp.sys.sgi.misc
- Path: sparky!uunet!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!venezia!penev
- From: penev@venezia (Penio Penev)
- Subject: Re: Where does the CPU go, when its not doing nuthin' (gr_osview)
- References: <1992Dec27.213821.29605@sol.ctr.columbia.edu>
- Sender: nobody@ctr.columbia.edu
- Organization: Rockefeller University
- Date: Thu, 31 Dec 1992 12:25:15 GMT
- X-Newsreader: TIN [version 1.1 PL6]
- Message-ID: <1992Dec31.122515.12993@sol.ctr.columbia.edu>
- Reply-To: penev@venezia.rockefeller.edu
- X-Posted-From: venezia.rockefeller.edu
- NNTP-Posting-Host: sol.ctr.columbia.edu
- Lines: 151
-
- Penio Penev (penev@venezia) wrote:
- : ForthNet articles from GEnie (ForthNet@willett.pgh.pa.us) wrote:
- : : Category 9, Topic 2
- : : Message 127 Sat Dec 26, 1992
- : : ELLIOTT.C at 13:40 EST
- : :
- : : -----via CRS Premium Bulletin Board -
- : : USR Dual Standard 16.8K (416) 629-7000
- : :
- : : Date: 12-22-92 (02:09)
- : : To: ALL
- : : From: MARCEL HENDRIX
- : : Subj: THREADING SPEED
- : :
- : : Penio Penev wrote about M. Anton Ertl's ``Threading speed''
- : :
- : : M. Anton Ertl's benchmark intrigued me too, but it was in a
- : : foreign language (C). However, your Forth version I can
- : : reproduce. Here is what I found for the TMS320C30, a 32-bit,
- : : 33 MHz DSP chip from Texas Instruments. The output shown is
- : : produced from within my interactive target compiler for this
- : : chip (the tc is written in iForth, a 32-bit Forth for the '386;
- : : it runs under GO32 in protected mode on my PC).
- : :
- : ..
- : : | code cdummy next, end-code
- : : | : dummy ;
- : : | : dd FOR dummy NEXT ;
- : : | : cc FOR cdummy NEXT ;
- : : | : tara FOR NEXT ;
- : : | cr .( tara : ) timer-reset 10000000 tara .elapsed .( Indigo: 1300 )
- : : | cr .( dd : ) timer-reset 10000000 dd .elapsed .( Indigo: 3100 )
- : : | cr .( cc : ) timer-reset 10000000 cc .elapsed .( Indigo: 2500 )
- : : tara : 3.630 seconds elapsed. Indigo: 1300
- : : dd : 10.065 seconds elapsed. Indigo: 3100
- : : cc : 10.065 seconds elapsed. Indigo: 2500 ok
- : : <TARGET> see cc
- : : $000005C7 ldi *ar0++(1),r0 08402001 .... MH> dpop,
- : : $000005C8 ldi $5CF,r2 086205CF ....
- : : $000005C9 push r2 0F220000 ....
- : : $000005CA push r0 0F200000 ....
- : : $000005CB push r0 0F200000 ....
- : : $000005CC ldi r0,r7 08070000 .... MH> loop count
- : : $000005CD addi 1,r2 02620001 ....
- : : $000005CE bu r2 68000002 ....
- : : $000005CF bu $5D3 6A000003 .... MH> LEAVE use
- : : $000005D0 callu $5B7 7200FFE6 ....
- : : $000005D1 subi 1,r7 18670001 .... MH> NEXT
- : : $000005D2 bne $5D0 6A06FFFD ....
- : : $000005D3 subi 3,sp 18740003 ....
- : : $000005D4 retsu 78800000 .... ok
- : : <TARGET> words
- : :
- : : 2
- : : tara cc dd dummy
- : : cdummy ok
- : : <TARGET> see cdummy
- : : $000005B7 retsu 78800000 .... ok
- : : <TARGET> see dummy
- : : $000005B8 retsu 78800000 .... ok
- : : <TARGET> close-log
- : : ---
- : ..
- : : The long intro to FOR makes work a bit easier for the target
- : : compiler, and allows me to code FOR ... LEAVE ... NEXT if I want
- : : to. You cannot nest FOR NEXT's.
- :
- : The definition of FOR NEXT I posted _can_ be nested. There is another
- : definition, which cannot. Words can be invoked from it, but you do not
- : have I in it.
- :
- : 1 : FOR( ( - a) TS V1 mov Drop V1 dec begin ; IMMEDIATE
- : 2 : )NEXT ( a) V1 Z= until V1 dec ; IMMEDIATE
- :
- : Measurements:
- : : TARA() FOR( )NEXT ; ok
- : COUNTER 1000000000 TARA() TIMER 62000 ok
- :
- : In dbx:
- : [RETRY, 0x10011ef4] addiu sp,sp,-4
- : [RETRY, 0x10011ef8] move v1,s0
- : [RETRY, 0x10011efc] lw s0,0(s8)
- : [RETRY, 0x10011f00] addiu s8,s8,4
- : [RETRY, 0x10011f04] addiu v1,v1,-1
- : [RETRY, 0x10011f08] bne v1,zero,0x10011f08
- : [RETRY, 0x10011f0c] addiu v1,v1,-1
- : *[RETRY, 0x10011f10] addiu sp,sp,4
- : [RETRY, 0x10011f14] jr ra
- : [RETRY, 0x10011f18] lw ra,4(sp)
- :
- : : The constant 100,000,000 was changed to 10,000,000 because, as
- : : you can see, the code takes about three to four times as long to
- : : run as on your Indigo. It is possible to optimize CC above, by
- : : using bned instead of bne, but that won't give me 300% more
- : : speed(?) I really wonder how the R3000 does it. Branch
- : : prediction?
- :
- : The R3000 can deliver one result per clock if the pipe is kept
- : full. It can utilize the branch delay slot (the instruction after the
- : branch). I was rather (pleasantly) surprised when I realised that
- : the branch delay slot is enough to fill the pipe. Two clocks/loop at
- : 33 MHz means 17.5 loops/us = 17.5 Mloops/second. The 1000M loops
- : should be performed in 57.1 seconds in theory. The agreement with
- : experiment is very good, because I have at least 10 other open windows,
- : one of which is a graphical clock, updated every second. This
- : means that I can utilise ~ 92.1% of the power of my machine without
- : sacrificing any convenience.
-
- What activities are carried out on an unburdened system? 8% of this
- powerful CPU seems a little bit much for the clock. The other windows
- just waited for input from the keyboard. When I checked this with
- gr_osview, the CPU time was almost all user, with a very small amount
- of sys.
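
For anyone who wants to redo the arithmetic behind the 8% figure, here is a small Python sketch (not part of the original thread). It uses only the numbers quoted above and assumes that TIMER reports milliseconds, which the post does not state explicitly. The last three lines are a side calculation of my own: they show what two clocks per loop would give if the clock were exactly 33 MHz, a figure the post gives only as a round number.

```python
# Figures copied from the quoted post; nothing here is newly measured.
loops = 1_000_000_000        # iterations of TARA()
measured_s = 62000 / 1000    # TIMER printed 62000, assumed to be milliseconds
quoted_rate_mloops = 17.5    # loop rate stated in the post, in Mloops/second

# Predicted run time and the resulting share of the CPU, as in the post.
predicted_s = loops / (quoted_rate_mloops * 1e6)
utilisation = predicted_s / measured_s

# Side calculation (assumption: the clock is exactly 33 MHz, 2 clocks/loop).
nominal_rate_mloops = 33.0 / 2
nominal_s = loops / (nominal_rate_mloops * 1e6)

print(f"predicted from quoted rate : {predicted_s:.1f} s")
print(f"share of measured time     : {utilisation:.1%}")
print(f"predicted from 33 MHz / 2  : {nominal_s:.1f} s")
```

The first two printed values correspond to the 57.1 seconds and roughly 92% mentioned above; the third depends entirely on the assumed clock rate.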
-
- :
- : In my opinion the R3000 has some way of processing two instructions
- : in the early stages of the pipe (perhaps the first 2 of 5 total). I
- : made the following test:
- :
- : : TT FOR( [ begin 0 Z= until nop ] )NEXT ; ok
- : COUNTER 100000000 TT TIMER 13000 ok
- :
- : This is one branch which is taken and one which is not. The timing
- : is ~ two times the original one, which means that both branches
- : execute in the same time. On the other hand, the Reference Manual
- : clearly states that the branch decision is made one clock after the
- : calculation of the target address and the condition. This is one clock
- : after the Instruction Fetch phase for the target instruction.
- :
- : The R4000 has an 8 stage pipe, and there (IMHO) maintaining two early
- : pipes is not possible (feasible). It has 'branch if condition
- : likely' instructions.
- :
- : Another consequence of this feature is that you can make
- : call = jump and link, store, dec = 3 clocks
- : ret = jump register, load, inc = 3 clocks
- : ----------
- : 6 clocks
- :
- : call unnestable = jump and link, store = 2 clocks
- : ret from unnestable = jump register, load = 2 clocks
- : ----------
- : 4 clocks
- :
- : I like this processor.
- :
- : -- Penio.
-
- -- Penio.
-