home *** CD-ROM | disk | FTP | other *** search
- From: eliot@cs.qmw.ac.uk (Eliot Miranda)
- Newsgroups: comp.compilers,gnu.gcc.bug,alt.sources
- Subject: re: Threaded Code
- Message-ID: <3520@sequent.cs.qmw.ac.uk>
- Date: 5 Apr 91 11:43:51 GMT
- Approved: compilers@iecc.cambridge.ma.us
-
- I recently posted articles to comp.compilers & alt.sources on how
- to write threaded code machines in C. I received the following questions
- from Simon Peyton Jones at Glasgow. They are specific to GCC.
- Since they have non-obvious answers & since the answers suggest
- augmentation of the GCC compiler I'm posting my response to Simon.
-
- >From: Simon L Peyton Jones <simonpj@cs.gla.ac.uk>
- >
- >I saw your article about threaded code. Like you and others, we are
- >using C as an assembler, only for a pure functional language, Haskell.
- >I have some brief questions.
- >
- >1. By telling GCC not to use a frame pointer, one can eliminate
- >the prolog entirely, can't one? So why edit it out?
-
- No, registers local to the procedure will still be saved & stack space
- allocated for automatic variables. This IS a problem since the
- threaded-code jump at the end of the procedure will miss the register
- restores before the epilogue. Consequently the stack will grow unless
- these register saves & stack-space allocations are removed. Also
- GCC can not always eliminate the frame pointer.
-
- >I guess the answer is going to be local variables, allocated once for
- >all by the StartTCMachine routine. Still, it seems quite a pain. I guess
- >one could sacrifice some (perhaps slight) speed by using a bunch of
- >globals instead.
- For certain routines, not using register variables will be expensive
- (e.g. a simple integer arithmetic primitive).
-
- >2. You edit out the epilogue for tidiness only, I take it. It doesn't
- >cause any problems if you leave it in, does it?
- No, but given that the prologue has to be removed & removing the epilogue
- is as easy (& given finite memory :-) one might as well remove it.
- >
- >3. Why do you use no-defer-pop (with the associated bug)?
- OK. This is again to avoid stack growth. On conventional stack architectures
- gcc will try & combine the argument popping code of a sequence of procedure calls.
- e.g.
- extern long a, b, c;
- void doit() {
- foo(a); bar(b); baz(c);
- }
-
- with -O -no-defer-pop one might expect gcc to generate
-
- link %a6,#0
- movel a,%sp@-
- jbsr foo
- addqw #4,%sp
- movel b,%sp@-
- jbsr bar
- addqw #4,%sp
- movel c,%sp@-
- jbsr baz
- addqw #4,%sp
- unlk %a6
- rts
-
- but because gcc knows that the unlk instruction will roll back the stack
- in fact gcc generates:
-
- link %a6,#0
- movel a,%sp@-
- jbsr foo
- addqw #4,%sp
- movel b,%sp@-
- jbsr bar
- addqw #4,%sp
- movel c,%sp@-
- jbsr baz
- unlk %a6
- rts
-
- With -O -fdefer-pop gcc optimizes out the pops completely & generates:
-
- link %a6,#0
- movel a,%sp@-
- jbsr foo
- movel b,%sp@-
- jbsr bar
- movel c,%sp@-
- jbsr baz
- unlk %a6
- rts
-
- with -O -fomit-frame-pointer -fdefer-pop gcc generates:
-
- movel a,%sp@-
- jbsr foo
- movel b,%sp@-
- jbsr bar
- movel c,%sp@-
- jbsr baz
- addw #12,%sp
- rts
-
- & with -O -fomit-frame-pointer -fno-defer-pop gcc generates:
-
- movel a,%sp@-
- jbsr foo
- addqw #4,%sp
- movel b,%sp@-
- jbsr bar
- addqw #4,%sp
- movel c,%sp@-
- jbsr baz
- addqw #4,%sp
- rts
-
- All the above cases are as one would wish. The elimination of all
- defered pops in the unlk instruction is especially clever.
-
- However, in the presence of the threaded-code jump the waters darken!
- Consider what gcc generates for:
-
- register void (**tcip)() asm("%a5");
-
- #define JUMPNEXT do{asm("movl %a5@+,%a0; jmp %a0@");return;}while(0)
-
- extern long a, b;
- void doit() {
- foo(a); bar(b); JUMPNEXT;
- }
- with -O -fdefer-pop gcc generates
-
- doit:
- link %a6,#0
- movel a,%sp@-
- jbsr foo
- movel b,%sp@-
- jbsr bar
- #APP
- movl %a5@+,%a0; jmp %a0@
- #NO_APP
- unlk %a6
- rts
-
- This is clearly no good because the arguments to both foo & bar
- will never be popped. Every time doit() is executed the stack will grow
- by 8 bytes. Soon your program will dump core with a very large stack
- segment!
-
- with -O -fno-defer-pop gcc generates:
-
- link %a6,#0
- movel a,%sp@-
- jbsr foo
- addqw #4,%sp
- movel b,%sp@-
- jbsr bar
- #APP
- movl %a5@+,%a0; jmp %a0@
- #NO_APP
- unlk %a6
- rts
-
- Again useless because bar's pop has been folded into the unlk
- which won't be executed.
-
- with -O -fdefer-pop -fomit-frame-pointer gcc generates
-
- movel a,%sp@-
- jbsr foo
- movel b,%sp@-
- jbsr bar
- addqw #8,%sp
- #APP
- movl %a5@+,%a0; jmp %a0@
- #NO_APP
- rts
-
- This is great. However, not all functions are susceptible to
- the omit-frame-pointer optimization (i.e. functions
- with local variables). E.g. the code generated for:
-
- register void (**tcip)() asm("%a5");
-
- #define JUMPNEXT do{asm("movl %a5@+,%a0; jmp %a0@");return;}while(0)
-
- extern long a, b;
- void doit() {
- char large[1024];
- foo(a,large); bar(b); JUMPNEXT;
- }
-
- with -O -fomit-frame-pointer -fdefer-pop is:
-
- link %a6,#-1024
- pea %a6@(-1024)
- movel a,%sp@-
- jbsr foo
- movel b,%sp@-
- jbsr bar
- #APP
- movl %a5@+,%a0; jmp %a0@
- #NO_APP
- unlk %a6
- rts
-
- so in general one can't rely on -fomit-frame-pointer.
- For the above example both
- -O -fomit-frame-pointer -fno-defer-pop
- and
- -O -fno-defer-pop
- generate:
-
- doit:
- link %a6,#-1024
- pea %a6@(-1024)
- movel a,%sp@-
- jbsr foo
- addqw #8,%sp
- movel b,%sp@-
- jbsr bar
- #APP
- movl %a5@+,%a0; jmp %a0@
- #NO_APP
- unlk %a6
- rts
-
- This is also useless because bar's argument pop has been folded away. The
- problem is that gcc will always fold away the last call's argument pop if
- the function has a frame pointer, and -fomit-frame-pointer can't allways
- get rid of the frame pointer. In fact, in the presence of variable sized
- automatic variables or calls on alloca it would be very hard (impossible
- for recursive functions?) to do.
-
- The neatest solution I've come up with is to use -fno-defer-pop and a
- dummy function call between the threaded-code jump and the return:
-
- register void (**tcip)() asm("%a5");
-
- #define JUMPNEXT do{asm("movl %a5@+,%a0; jmp %a0@");SBD();return;}while(0)
-
- extern long a, b;
- void doit() {
- foo(a); bar(b); JUMPNEXT;
- }
- with -O -fno-defer-pop gcc generates:
-
- link %a6,#0
- movel a,%sp@-
- jbsr foo
- addqw #4,%sp
- movel b,%sp@-
- jbsr bar
- addqw #4,%sp
- #APP
- movl %a5@+,%a0; jmp %a0@
- #NO_APP
- jbsr SBD
- unlk %a6
- rts
-
- Now bar's argument pop is not folded because its no longer the last
- call in the routine, SBD is.
- So the call to SBD
- a) prevents gcc's 'last call argument pop fold into unlk' optimization
- which prevents uncontrolled stack growth.
- b) doesn't get executed because of the jump
- c) is trivial to remove from the assembler with a sed-script
-
-
- One an try to use -fcaller-saves, but this surrounds calls with unnecessary
- register saves & restores that for the code to be optimal have to be edited out.
-
- >4. Why does JUMPNEXT have a loop? Surely the jump leaves the loop right
- >away. Presumably you are tricking the compiler somehow.
-
- This is old C lore. The problem is
- 'How do you write a macro that is a sequence of statements
- that can be used wherever a single statement can?'
-
- take the following definition of JUMPNEXT:
- #define JUMPNEXT asm("movl %a5@+,%a0; jmp %a0@");return;
-
- Now invoke it here:
- if (its_time_to_jump)
- JUMPNEXT;
- do_something_else();
-
- This expands to:
- if (its_time_to_jump)
- asm("movl %a5@+,%a0; jmp %a0@");
- return;
- do_something_else();
-
- Not at all whats intended!
-
- There are two tricks I know of (the first I saw in Berkely Smalltalk,
- the second in Richard Stallman's gcc manual. I expect they're both
- quite old).
- The first is to surround your statements with
- if (TRUE){statements}else
- i.e.
- #define JUMPNEXT if(1){asm("movl %a5@+,%a0; jmp %a0@");return;}else
- So now we get:
- if (its_time_to_jump)
- if (1){
- asm("movl %a5@+,%a0; jmp %a0@");
- return;
- else;
- do_something_else();
-
- which works because C binds elses innermost first. However, some
- compilers will whine about dangling elses. The second scheme is
- more elegant (-:
-
- Surround your statements with
- do{statements}while(FALSE);
- which will execute statements precisely once (its NOT a loop).
- i.e.
- #define JUMPNEXT do{asm("movl %a5@+,%a0; jmp %a0@");SBD();return;}while(0)
- expands to
-
- if (its_time_to_jump)
- do {
- asm("movl %a5@+,%a0; jmp %a0@");
- return;
- while(0);
- do_something_else();
-
- which does what's wanted and doesn't incur compiler whines.
-
-
- >Thanks
- >
- >Simon L Peyton Jones, Glasgow University
-
- More and more people are taking the 'use C as an assembler' route, and
- more and more people are using GCC to do it (because its code quality is
- good, it had global register variables, and it has an excellent asm
- facility). The threaded-code in C idea is also becomming more popular.
- But as the code above demonstrates, one does have to side-step
- optimizations and develop system-specific assembler editing scripts.
-
- I'd like to ask Richard Stallman & the GCC development team for
- -fno-prolog -fno-epilog
- flags that instruct gcc to generate
- a) no register saves or restores
- b) no automatic variable allocation
- c) no procedure linkage/frame creation
-
- Then the optimal 'Threaded-Code Machine in GCC C' can be compiled without
- any assembler editing scripts at all.
-
- Also nice would be a way of telling GCC that an asm statement
- changed the flow of control so GCC could
- a) warn about not-reached code
- b) eliminate unnecessary code (do more code folding)
- --
- Eliot Miranda email: eliot@cs.qmw.ac.uk
- Dept of Computer Science Tel: 071 975 5229 (+44 71 975 5229)
- Queen Mary Westfield College ARPA: eliot%cs.qmw.ac.uk@nsf.ac.uk
- Mile End Road UUCP: eliot@qmw-cs.uucp
- LONDON E1 4NS
- --
- Send compilers articles to compilers@iecc.cambridge.ma.us or
- {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.
-