GNU Info File | 1992-02-16 | 48.0 KB | 1,120 lines
This is Info file gcc.info, produced by Makeinfo-1.43 from the input
file gcc.texi.
This file documents the use and the internals of the GNU compiler.
Copyright (C) 1988, 1989, 1992 Free Software Foundation, Inc.
Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.
Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided also
that the section entitled "GNU General Public License" is included
exactly as in the original, and provided that the entire resulting
derived work is distributed under the terms of a permission notice
identical to this one.
Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that the section entitled "GNU General Public
License" and this permission notice may be included in translations
approved by the Free Software Foundation instead of in the original
English.
File: gcc.info, Node: Standard Names, Next: Pattern Ordering, Prev: Constraints, Up: Machine Desc
Standard Names for Patterns Used in Generation
==============================================
Here is a table of the instruction names that are meaningful in the
RTL generation pass of the compiler. Giving one of these names to an
instruction pattern tells the RTL generation pass that it can use the
pattern to accomplish a certain task.
`movM'
Here M stands for a two-letter machine mode name, in lower case.
This instruction pattern moves data with that machine mode from
operand 1 to operand 0. For example, `movsi' moves full-word
data.
If operand 0 is a `subreg' with mode M of a register whose own
mode is wider than M, the effect of this instruction is to store
the specified value in the part of the register that corresponds
to mode M. The effect on the rest of the register is undefined.
This class of patterns is special in several ways. First of all,
each of these names *must* be defined, because there is no other
way to copy a datum from one place to another.
Second, these patterns are not used solely in the RTL generation
pass. Even the reload pass can generate move insns to copy
values from stack slots into temporary registers. When it does
so, one of the operands is a hard register and the other is an
operand that may need to be reloaded into a register.
Therefore, when given such a pair of operands, the pattern must
generate RTL which needs no reloading and needs no temporary
registers--no registers other than the operands. For example, if
you support the pattern with a `define_expand', then in such a
case the `define_expand' mustn't call `force_reg' or any other
such function which might generate new pseudo registers.
This requirement exists even for subword modes on a RISC machine
where fetching those modes from memory normally requires several
insns and some temporary registers. Look in `spur.md' to see how
the requirement can be satisfied.
During reload a memory reference with an invalid address may be
passed as an operand. Such an address will be replaced with a
valid address later in the reload pass. In this case, nothing
may be done with the address except to use it as it stands. If
it is copied, it will not be replaced with a valid address. No
attempt should be made to make such an address into a valid
address and no routine (such as `change_address') that will do so
may be called. Note that `general_operand' will fail when
applied to such an address.
The global variable `reload_in_progress' (which must be explicitly
declared if required) can be used to determine whether such
special handling is required.
The variety of operands that have reloads depends on the rest of
the machine description, but typically on a RISC machine these
can only be pseudo registers that did not get hard registers,
while on other machines explicit memory references will get
optional reloads.
If a scratch register is required to move an object to or from
memory, it can be allocated using `gen_reg_rtx' prior to reload.
But this is impossible during and after reload. If there are
cases needing scratch registers after reload, you must define
`SECONDARY_INPUT_RELOAD_CLASS' and/or
`SECONDARY_OUTPUT_RELOAD_CLASS' to detect them, and provide
patterns `reload_inM' or `reload_outM' to handle them. *Note
Register Classes::.
The constraints on a `movM' must permit moving any hard register
to any other hard register provided that `HARD_REGNO_MODE_OK'
permits mode M in both registers and `REGISTER_MOVE_COST' applied
to their classes returns a value of 2.
It is obligatory to support floating point `movM' instructions
into and out of any registers that can hold fixed point values,
because unions and structures (which have modes `SImode' or
`DImode') can be in those registers and they may have floating
point members.
There may also be a need to support fixed point `movM'
instructions in and out of floating point registers.
Unfortunately, I have forgotten why this was so, and I don't know
whether it is still true. If `HARD_REGNO_MODE_OK' rejects fixed
point values in floating point registers, then the constraints of
the fixed point `movM' instructions must be designed to avoid
ever trying to reload into a floating point register.
`reload_inM'
`reload_outM'
Like `movM', but used when a scratch register is required to move
between operand 0 and operand 1. Operand 2 describes the scratch
register. See the discussion of the `SECONDARY_RELOAD_CLASS'
macro in *note Register Classes::..
`movstrictM'
Like `movM' except that if operand 0 is a `subreg' with mode M of
a register whose natural mode is wider, the `movstrictM'
instruction is guaranteed not to alter any of the register except
the part which belongs to mode M.
`addM3'
Add operand 2 and operand 1, storing the result in operand 0.
All operands must have mode M. This can be used even on
two-address machines, by means of constraints requiring operands
1 and 0 to be the same location.
`subM3', `mulM3'
`divM3', `udivM3', `modM3', `umodM3'
`sminM3', `smaxM3', `uminM3', `umaxM3'
`andM3', `iorM3', `xorM3'
Similar, for other arithmetic operations.
`mulhisi3'
Multiply operands 1 and 2, which have mode `HImode', and store a
`SImode' product in operand 0.
`mulqihi3', `mulsidi3'
Similar widening-multiplication instructions of other widths.
`umulqihi3', `umulhisi3', `umulsidi3'
Similar widening-multiplication instructions that do unsigned
multiplication.
`divmodM4'
Signed division that produces both a quotient and a remainder.
Operand 1 is divided by operand 2 to produce a quotient stored in
operand 0 and a remainder stored in operand 3.
For machines with an instruction that produces both a quotient
and a remainder, provide a pattern for `divmodM4' but do not
provide patterns for `divM3' and `modM3'. This allows
optimization in the relatively common case when both the quotient
and remainder are computed.
If an instruction that just produces a quotient or just a
remainder exists and is more efficient than the instruction that
produces both, write the output routine of `divmodM4' to call
`find_reg_note' and look for a `REG_UNUSED' note on the quotient
or remainder and generate the appropriate instruction.
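As a rough illustration of what a `divmodM4' pattern computes, the
following C sketch models the operation for a 32-bit mode. The name
`divmod_si4_model' is hypothetical, and rounding toward zero is an
assumption made here because that is what the C operators `/' and `%'
do; the sketch is not taken from any machine description.

```c
#include <stdint.h>

/* Hypothetical model of a `divmodsi4' pattern: operand 1 is divided
   by operand 2, the quotient is stored in operand 0 and the remainder
   in operand 3.  The caller must ensure op2 != 0 and must avoid the
   overflowing case INT32_MIN / -1.  */
void
divmod_si4_model (int32_t *op0, int32_t op1, int32_t op2, int32_t *op3)
{
  *op0 = op1 / op2;   /* quotient, truncated toward zero (C99) */
  *op3 = op1 % op2;   /* remainder, takes the sign of op1 (C99) */
}
```

Both results come from one conceptual operation, which is the case the
single `divmodM4' pattern is meant to expose to the optimizer.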
`udivmodM4'
Similar, but does unsigned division.
`ashlM3'
Arithmetic-shift operand 1 left by a number of bits specified by
operand 2, and store the result in operand 0. Operand 2 has mode
`SImode', not mode M.
`ashrM3', `lshlM3', `lshrM3', `rotlM3', `rotrM3'
Other shift and rotate instructions.
Logical and arithmetic left shift are the same. Machines that do
not allow negative shift counts often have only one instruction
for shifting left. On such machines, you should define a pattern
named `ashlM3' and leave `lshlM3' undefined.
`negM2'
Negate operand 1 and store the result in operand 0.
`absM2'
Store the absolute value of operand 1 into operand 0.
`sqrtM2'
Store the square root of operand 1 into operand 0.
`ffsM2'
Store into operand 0 one plus the index of the least significant
1-bit of operand 1. If operand 1 is zero, store zero. M is the
mode of operand 0; operand 1's mode is specified by the
instruction pattern, and the compiler will convert the operand to
that mode before generating the instruction.
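The value an `ffsM2' pattern must produce can be sketched in C as
follows. The name `ffs_model' and the choice of a 32-bit operand are
assumptions made only for this illustration.

```c
#include <stdint.h>

/* Hypothetical model of `ffsM2': return one plus the index of the
   least significant 1-bit of X, or zero if X is zero.  */
int
ffs_model (uint32_t x)
{
  int pos = 1;

  if (x == 0)
    return 0;
  while ((x & 1) == 0)
    {
      x >>= 1;   /* discard a zero bit and advance the position */
      pos++;
    }
  return pos;
}
```

For instance, an operand of 8 (binary 1000) gives 4, since the lowest
1-bit has index 3.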
`one_cmplM2'
Store the bitwise-complement of operand 1 into operand 0.
`cmpM'
Compare operand 0 and operand 1, and set the condition codes.
The RTL pattern should look like this:
(set (cc0) (compare (match_operand:M 0 ...)
(match_operand:M 1 ...)))
`tstM'
Compare operand 0 against zero, and set the condition codes. The
RTL pattern should look like this:
(set (cc0) (match_operand:M 0 ...))
`tstM' patterns should not be defined for machines that do not
use `(cc0)'. Doing so would confuse the optimizer since it would
no longer be clear which `set' operations were comparisons. The
`cmpM' patterns should be used instead.
`movstrM'
Block move instruction. The addresses of the destination and
source strings are the first two operands, and both are in mode
`Pmode'. The number of bytes to move is the third operand, in
mode M.
The fourth operand is the known shared alignment of the source and
destination, in the form of a `const_int' rtx. Thus, if the
compiler knows that both source and destination are word-aligned,
it may provide the value 4 for this operand.
These patterns need not give special consideration to the
possibility that the source and destination strings might overlap.
`cmpstrM'
Block compare instruction, with five operands. Operand 0 is the
output; it has mode M. The remaining four operands are like the
operands of `movstrM'. The two memory blocks specified are
compared byte by byte in lexicographic order. The effect of the
instruction is to store a value in operand 0 whose sign indicates
the result of the comparison.
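The result expected from `cmpstrM' resembles that of the C library
function `memcmp'. A minimal sketch follows; the name `cmpstr_model'
is hypothetical, and comparing the bytes as unsigned values is an
assumption of this sketch rather than something stated above.

```c
#include <stddef.h>

/* Hypothetical model of `cmpstrM': compare N bytes of A and B in
   lexicographic order.  Only the sign of the result is meaningful.  */
int
cmpstr_model (const unsigned char *a, const unsigned char *b, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    if (a[i] != b[i])
      return a[i] < b[i] ? -1 : 1;   /* first differing byte decides */
  return 0;                          /* all N bytes are equal */
}
```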
`floatMN2'
Convert signed integer operand 1 (valid for fixed point mode M) to
floating point mode N and store in operand 0 (which has mode N).
`floatunsMN2'
Convert unsigned integer operand 1 (valid for fixed point mode M)
to floating point mode N and store in operand 0 (which has mode
N).
`fixMN2'
Convert operand 1 (valid for floating point mode M) to fixed
point mode N as a signed number and store in operand 0 (which has
mode N). This instruction's result is defined only when the
value of operand 1 is an integer.
`fixunsMN2'
Convert operand 1 (valid for floating point mode M) to fixed
point mode N as an unsigned number and store in operand 0 (which
has mode N). This instruction's result is defined only when the
value of operand 1 is an integer.
`ftruncM2'
Convert operand 1 (valid for floating point mode M) to an integer
value, still represented in floating point mode M, and store it
in operand 0 (valid for floating point mode M).
`fix_truncMN2'
Like `fixMN2' but works for any floating point value of mode M by
converting the value to an integer.
`fixuns_truncMN2'
Like `fixunsMN2' but works for any floating point value of mode M
by converting the value to an integer.
`truncMN'
Truncate operand 1 (valid for mode M) to mode N and store in
operand 0 (which has mode N). Both modes must be fixed point or
both floating point.
`extendMN'
Sign-extend operand 1 (valid for mode M) to mode N and store in
operand 0 (which has mode N). Both modes must be fixed point or
both floating point.
`zero_extendMN'
Zero-extend operand 1 (valid for mode M) to mode N and store in
operand 0 (which has mode N). Both modes must be fixed point.
`extv'
Extract a bit field from operand 1 (a register or memory
operand), where operand 2 specifies the width in bits and operand
3 the starting bit, and store it in operand 0. Operand 0 must
have mode `word_mode'. Operand 1 may have mode `byte_mode' or
`word_mode'; often `word_mode' is allowed only for registers.
Operands 2 and 3 must be valid for `word_mode'.
The RTL generation pass generates this instruction only with
constants for operands 2 and 3.
The bit-field value is sign-extended to a full word integer
before it is stored in operand 0.
`extzv'
Like `extv' except that the bit-field value is zero-extended.
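The difference between `extv' and `extzv' can be sketched in C. The
function names are hypothetical, a 32-bit word is assumed, and the
starting bit is assumed to count from the least significant end of
the word; a particular machine may number bits the other way.

```c
#include <stdint.h>

/* Hypothetical model of `extzv': extract WIDTH bits beginning at bit
   START and zero-extend them.  Requires 1 <= WIDTH and
   START + WIDTH <= 32.  */
uint32_t
extzv_model (uint32_t word, unsigned width, unsigned start)
{
  uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);

  return (word >> start) & mask;
}

/* Hypothetical model of `extv': the same field, sign-extended to a
   full word.  */
int32_t
extv_model (uint32_t word, unsigned width, unsigned start)
{
  uint32_t field = extzv_model (word, width, start);
  uint32_t sign = 1u << (width - 1);

  /* (field ^ sign) - sign copies the field's top bit upward.  */
  return (int32_t) ((field ^ sign) - sign);
}
```

With a word of 0xF0, a width of 4 and a starting bit of 4, the
zero-extended result is 15 and the sign-extended result is -1.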
`insv'
Store operand 3 (which must be valid for `word_mode') into a bit
field in operand 0, where operand 1 specifies the width in bits
and operand 2 the starting bit. Operand 0 may have mode
`byte_mode' or `word_mode'; often `word_mode' is allowed only for
registers. Operands 1 and 2 must be valid for `word_mode'.
The RTL generation pass generates this instruction only with
constants for operands 1 and 2.
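The effect of `insv' can be modelled in the same spirit. Again the
name `insv_model' is hypothetical, a 32-bit word is assumed, and bits
are assumed to be numbered from the least significant end.

```c
#include <stdint.h>

/* Hypothetical model of `insv': store the low WIDTH bits of VALUE
   into the field of DEST that begins at bit START, leaving the other
   bits of DEST unchanged.  Requires 1 <= WIDTH and
   START + WIDTH <= 32.  */
uint32_t
insv_model (uint32_t dest, unsigned width, unsigned start, uint32_t value)
{
  uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);

  return (dest & ~(mask << start)) | ((value & mask) << start);
}
```

The argument order follows the operand order given above: operand 0
is the destination, operand 1 the width, operand 2 the starting bit
and operand 3 the value to store.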
`sCOND'
Store zero or nonzero in the operand according to the condition
codes. Value stored is nonzero iff the condition COND is true.
COND is the name of a comparison operation expression code, such
as `eq', `lt' or `leu'.
You specify the mode that the operand must have when you write the
`match_operand' expression. The compiler automatically sees
which mode you have used and supplies an operand of that mode.
The value stored for a true condition must have 1 as its low bit,
or else must be negative. Otherwise the instruction is not
suitable and you should omit it from the machine description.
You describe to the compiler exactly which value is stored by
defining the macro `STORE_FLAG_VALUE' (*note Misc::.). If a
description cannot be found that can be used for all the `sCOND'
patterns, you should omit those operations from the machine
description.
These operations may fail, but should do so only in relatively
uncommon cases; if they would fail for common cases involving
integer comparisons, it is best to omit these patterns.
If these operations are omitted, the compiler will usually
generate code that copies the constant one to the target and
branches around an assignment of zero to the target. If this
code is more efficient than the potential instructions used for
the `sCOND' pattern followed by those required to convert the
result into a 1 or a zero in `SImode', you should omit the
`sCOND' operations from the machine description.
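The rule on the value stored for a true condition can be expressed as
a small C predicate. The function name `store_flag_value_ok' is
hypothetical and exists only to restate the rule; it is not part of
the compiler.

```c
/* Hypothetical check of the rule stated above: the value stored for
   a true condition must have 1 as its low bit, or else must be
   negative.  */
int
store_flag_value_ok (int v)
{
  return (v & 1) != 0 || v < 0;
}
```

Thus 1 and -1 are both acceptable values, while 2 is not.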
`bCOND'
Conditional branch instruction. Operand 0 is a `label_ref' that
refers to the label to jump to. Jump if the condition codes meet
condition COND.
Some machines do not follow the model assumed here where a
comparison instruction is followed by a conditional branch
instruction. In that case, the `cmpM' (and `tstM') patterns
should simply store the operands away and generate all the
required insns in a `define_expand' (*note Expander
Definitions::.) for the conditional branch operations. All calls
to expand `bCOND' patterns are immediately preceded by calls to
expand either a `cmpM' pattern or a `tstM' pattern.
Machines that use a pseudo register for the condition code value,
or where the mode used for the comparison depends on the
condition being tested, should also use the above mechanism.
*Note Jump Patterns::
The above discussion also applies to `sCOND' patterns.
`call'
Subroutine call instruction returning no value. Operand 0 is the
function to call; operand 1 is the number of bytes of arguments
pushed (in mode `SImode', except it is normally a `const_int');
operand 2 is the number of registers used as operands.
On most machines, operand 2 is not actually stored into the RTL
pattern. It is supplied for the sake of some RISC machines which
need to put this information into the assembler code; they can
put it in the RTL instead of operand 1.
Operand 0 should be a `mem' RTX whose address is the address of
the function. Note, however, that this address can be a
`symbol_ref' expression even if it would not be a legitimate
memory address on the target machine. If it is also not a valid
argument for a call instruction, the pattern for this operation
should be a `define_expand' (*note Expander Definitions::.) that
places the address into a register and uses that register in the
call instruction.
`call_value'
Subroutine call instruction returning a value. Operand 0 is the
hard register in which the value is returned. There are three
more operands, the same as the three operands of the `call'
instruction (but with numbers increased by one).
Subroutines that return `BLKmode' objects use the `call' insn.
`call_pop', `call_value_pop'
Similar to `call' and `call_value', except used if defined and if
`RETURN_POPS_ARGS' is non-zero. They should emit a `parallel'
that contains both the function call and a `set' to indicate the
adjustment made to the frame pointer.
For machines where `RETURN_POPS_ARGS' can be non-zero, the use of
these patterns increases the number of functions for which the
frame pointer can be eliminated, if desired.
`return'
Subroutine return instruction. This instruction pattern name
should be defined only if a single instruction can do all the
work of returning from a function.
Like the `movM' patterns, this pattern is also used after the RTL
generation phase. In this case it is to support machines where
multiple instructions are usually needed to return from a
function, but some class of functions only requires one
instruction to implement a return. Normally, the applicable
functions are those which do not need to save any registers or
allocate stack space.
For such machines, the condition specified in this pattern should
only be true when `reload_completed' is non-zero and the
function's epilogue would only be a single instruction. For
machines with register windows, the routine `leaf_function_p' may
be used to determine if a register window push is required.
Machines that have conditional return instructions should define
patterns such as
(define_insn ""
[(set (pc)
(if_then_else (match_operator 0 "comparison_operator"
[(cc0) (const_int 0)])
(return)
(pc)))]
"CONDITION"
"...")
where CONDITION would normally be the same condition specified on
the named `return' pattern.
`nop'
No-op instruction. This instruction pattern name should always
be defined to output a no-op in assembler code. `(const_int 0)'
will do as an RTL pattern.
`indirect_jump'
An instruction to jump to an address which is operand zero. This
pattern name is mandatory on all machines.
`casesi'
Instruction to jump through a dispatch table, including bounds
checking. This instruction takes five operands:
1. The index to dispatch on, which has mode `SImode'.
2. The lower bound for indices in the table, an integer
constant.
3. The total range of indices in the table--the largest index
minus the smallest one (both inclusive).
4. A label that precedes the table itself.
5. A label to jump to if the index has a value outside the
bounds. (If the machine-description macro
`CASE_DROPS_THROUGH' is defined, then an out-of-bounds index
drops through to the code following the jump table instead
of jumping to this label. In that case, this label is not
actually used by the `casesi' instruction, but it is always
provided as an operand.)
The table is an `addr_vec' or `addr_diff_vec' inside of a
`jump_insn'. The number of elements in the table is one plus the
difference between the upper bound and the lower bound.
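The bounds check implied by the first three operands can be sketched
in C. The name `casesi_model' and the convention of returning -1 for
the out-of-bounds label are assumptions of this sketch; a real
`casesi' pattern jumps rather than returning a value.

```c
#include <stdint.h>

/* Hypothetical model of the `casesi' bounds check.  INDEX is the
   value dispatched on, LOWER the lower bound, and RANGE the largest
   index minus the smallest.  Returns the table slot to use, or -1
   when the index is out of bounds.  */
int
casesi_model (int32_t index, int32_t lower, uint32_t range)
{
  /* Unsigned subtraction wraps indices below LOWER around to large
     values, so one comparison rejects both kinds of bad index.  */
  uint32_t offset = (uint32_t) index - (uint32_t) lower;

  if (offset > range)
    return -1;
  return (int) offset;
}
```

With a lower bound of 3 and a range of 4 the table has five elements,
and indices 3 through 7 select slots 0 through 4.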
`tablejump'
Instruction to jump to a variable address. This is a low-level
capability which can be used to implement a dispatch table when
there is no `casesi' pattern.
This pattern requires two operands: the address or offset, and a
label which should immediately precede the jump table. If the
macro `CASE_VECTOR_PC_RELATIVE' is defined then the first operand
is an offset which counts from the address of the table;
otherwise, it is an absolute address to jump to.
The `tablejump' insn is always the last insn before the jump
table it uses. Its assembler code normally has no need to use the
second operand, but you should incorporate it in the RTL pattern
so that the jump optimizer will not delete the table as
unreachable code.
File: gcc.info, Node: Pattern Ordering, Next: Dependent Patterns, Prev: Standard Names, Up: Machine Desc
When the Order of Patterns Matters
==================================
Sometimes an insn can match more than one instruction pattern.
Then the pattern that appears first in the machine description is the
one used. Therefore, more specific patterns (patterns that will match
fewer things) and faster instructions (those that will produce better
code when they do match) should usually go first in the description.
In some cases the effect of ordering the patterns can be used to
hide a pattern when it is not valid. For example, the 68000 has an
instruction for converting a fullword to floating point and another
for converting a byte to floating point. An instruction converting an
integer to floating point could match either one. We put the pattern
to convert the fullword first to make sure that one will be used
rather than the other. (Otherwise a large integer might be generated
as a single-byte immediate quantity, which would not work.) Instead of
using this pattern ordering it would be possible to make the pattern
for convert-a-byte smart enough to deal properly with any constant
value.
File: gcc.info, Node: Dependent Patterns, Next: Jump Patterns, Prev: Pattern Ordering, Up: Machine Desc
Interdependence of Patterns
===========================
Every machine description must have a named pattern for each of the
conditional branch names `bCOND'. The recognition template must
always have the form
(set (pc)
(if_then_else (COND (cc0) (const_int 0))
(label_ref (match_operand 0 "" ""))
(pc)))
In addition, every machine description must have an anonymous pattern
for each of the possible reverse-conditional branches. Their templates
look like
(set (pc)
(if_then_else (COND (cc0) (const_int 0))
(pc)
(label_ref (match_operand 0 "" ""))))
They are necessary because jump optimization can turn
direct-conditional branches into reverse-conditional branches.
It is often convenient to use the `match_operator' construct to
reduce the number of patterns that must be specified for branches. For
example,
(define_insn ""
[(set (pc)
(if_then_else (match_operator 0 "comparison_operator"
[(cc0) (const_int 0)])
(pc)
(label_ref (match_operand 1 "" ""))))]
"CONDITION"
"...")
In some cases machines support instructions identical except for the
machine mode of one or more operands. For example, there may be
"sign-extend halfword" and "sign-extend byte" instructions whose
patterns are
(set (match_operand:SI 0 ...)
(extend:SI (match_operand:HI 1 ...)))
(set (match_operand:SI 0 ...)
(extend:SI (match_operand:QI 1 ...)))
Constant integers do not specify a machine mode, so an instruction to
extend a constant value could match either pattern. The pattern it
actually will match is the one that appears first in the file. For
correct results, this must be the one for the widest possible mode
(`HImode', here). If the pattern matches the `QImode' instruction,
the results will be incorrect if the constant value does not actually
fit that mode.
Such instructions to extend constants are rarely generated because
they are optimized away, but they do occasionally happen in
nonoptimized compilations.
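The hazard can be shown with ordinary C arithmetic. In this sketch
the function names are hypothetical; they model sign-extension from a
16-bit (`HImode') and an 8-bit (`QImode') source respectively.

```c
#include <stdint.h>

/* Sign-extend the low BITS bits of VALUE; BITS is 8 or 16 here.  */
static int32_t
sign_extend (int32_t value, unsigned bits)
{
  uint32_t mask = (1u << bits) - 1u;
  uint32_t sign = 1u << (bits - 1);
  uint32_t low = (uint32_t) value & mask;

  return (int32_t) ((low ^ sign) - sign);
}

/* Hypothetical models of the two extend patterns shown above.  */
int32_t
extend_hi_to_si (int32_t c)
{
  return sign_extend (c, 16);
}

int32_t
extend_qi_to_si (int32_t c)
{
  return sign_extend (c, 8);
}
```

The constant 300 survives the halfword extension unchanged, but the
byte extension turns it into 44 because only the low eight bits are
kept.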
If a constraint in a pattern allows a constant, the reload pass may
replace a register with a constant permitted by the constraint in some
cases. Similarly for memory references. You must ensure that the
predicate permits all objects allowed by the constraints to prevent the
compiler from crashing.
Because of this substitution, you should not provide separate
patterns for increment and decrement instructions. Instead, they
should be generated from the same pattern that supports
register-register add insns by examining the operands and generating
the appropriate machine instruction.
File: gcc.info, Node: Jump Patterns, Next: Insn Canonicalizations, Prev: Dependent Patterns, Up: Machine Desc
Defining Jump Instruction Patterns
==================================
For most machines, GNU CC assumes that the machine has a condition
code. A comparison insn sets the condition code, recording the
results of both signed and unsigned comparison of the given operands.
A separate branch insn tests the condition code and branches or not
according to its value. The branch insns come in distinct signed and
unsigned flavors. Many common machines, such as the Vax, the 68000
and the 32000, work this way.
Some machines have distinct signed and unsigned compare
instructions, and only one set of conditional branch instructions.
The easiest way to handle these machines is to treat them just like
the others until the final stage where assembly code is written. At
this time, when outputting code for the compare instruction, peek
ahead at the following branch using `next_cc0_user (insn)'. (The
variable `insn' refers to the insn being output, in the output-writing
code in an instruction pattern.) If the RTL says that it is an unsigned
branch, output an unsigned compare; otherwise output a signed compare.
When the branch itself is output, you can treat signed and unsigned
branches identically.
The reason you can do this is that GNU CC always generates a pair of
consecutive RTL insns, possibly separated by `note' insns, one to set
the condition code and one to test it, and keeps the pair inviolate
until the end.
To go with this technique, you must define the machine-description
macro `NOTICE_UPDATE_CC' to do `CC_STATUS_INIT'; in other words, no
compare instruction is superfluous.
Some machines have compare-and-branch instructions and no condition
code. A similar technique works for them. When it is time to
"output" a compare instruction, record its operands in two static
variables. When outputting the branch-on-condition-code instruction
that follows, actually output a compare-and-branch instruction that
uses the remembered operands.
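The technique of remembering the compare operands might look roughly
like this in C. Everything here is hypothetical: the function names,
the use of strings to stand for operands, and the assembler syntax
`cbCOND' are invented for the illustration and do not correspond to
any real machine description.

```c
#include <stdio.h>

/* The two static variables that remember the compare operands.  */
static const char *saved_op0;
static const char *saved_op1;

/* "Output" a compare instruction: emit nothing, just record the
   operands for the branch that follows.  */
void
output_compare_model (const char *op0, const char *op1)
{
  saved_op0 = op0;
  saved_op1 = op1;
}

/* Output the branch: write one compare-and-branch instruction into
   BUF using the remembered operands.  Returns what snprintf
   returns.  */
int
output_branch_model (char *buf, size_t size,
                     const char *cond, const char *label)
{
  return snprintf (buf, size, "cb%s %s,%s,%s",
                   cond, saved_op0, saved_op1, label);
}
```

Calling `output_compare_model ("r1", "r2")' and then
`output_branch_model' with condition `eq' and label `L5' leaves the
text `cbeq r1,r2,L5' in the buffer.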
It also works to define patterns for compare-and-branch
instructions. In optimizing compilation, the pair of compare and
branch instructions will be combined according to these patterns. But
this does not happen if optimization is not requested. So you must
use one of the solutions above in addition to any special patterns you
define.
In many RISC machines, most instructions do not affect the condition
code and there may not even be a separate condition code register. On
these machines, the restriction that the definition and use of the
condition code be adjacent insns is not necessary and can prevent
important optimizations. For example, on the IBM RS/6000, there is a
delay for taken branches unless the condition code register is set
three instructions earlier than the conditional branch. The
instruction scheduler cannot perform this optimization if it is not
permitted to separate the definition and use of the condition code
register.
On these machines, do not use `(cc0)', but instead use a register
to represent the condition code. If there is a specific condition code
register in the machine, use a hard register. If the condition code or
comparison result can be placed in any general register, or if there
are multiple condition registers, use a pseudo register.
On some machines, the type of branch instruction generated may
depend on the way the condition code was produced; for example, on the
68k and Sparc, setting the condition code directly from an add or
subtract instruction does not clear the overflow bit the way that a
test instruction does, so a different branch instruction must be used
for some conditional branches. For machines that use `(cc0)', the set
and use of the condition code must be adjacent (separated only by
`note' insns) allowing flags in `cc_status' to be used. (*Note
Condition Code::.) Also, the comparison and branch insns can be
located from each other by using the functions `prev_cc0_setter' and
`next_cc0_user'.
However, this is not true on machines that do not use `(cc0)'. On
those machines, no assumptions can be made about the adjacency of the
compare and branch insns and the above methods cannot be used.
Instead, we use the machine mode of the condition code register to
record different formats of the condition code register.
Registers used to store the condition code value should have a mode
that is in class `MODE_CC'. Normally, it will be `CCmode'. If
additional modes are required (as for the add example mentioned above
in the Sparc), define the macro `EXTRA_CC_MODES' to list the
additional modes required (*note Condition Code::.). Also define
`EXTRA_CC_NAMES' to list the names of those modes and `SELECT_CC_MODE'
to choose a mode given an operand of a compare.
If it is known during RTL generation that a different mode will be
required (for example, if the machine has separate compare instructions
for signed and unsigned quantities, like most IBM processors), it can
be specified at that time.
If the cases that require different modes would be made by
instruction combination, the macro `SELECT_CC_MODE' determines which
machine mode should be used for the comparison result. The patterns
should be written using that mode. To support the case of the add on
the Sparc discussed above, we have the pattern
(define_insn ""
[(set (reg:CC_NOOV 0)
(compare:CC_NOOV (plus:SI (match_operand:SI 0 "register_operand" "%r")
(match_operand:SI 1 "arith_operand" "rI"))
(const_int 0)))]
""
"...")
The `SELECT_CC_MODE' macro on the Sparc returns `CC_NOOVmode' for
comparisons whose argument is a `plus'.
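The decision made by such a macro can be pictured with a toy C
function. The enumerations and the name `select_cc_mode_model' are
hypothetical stand-ins invented for this sketch; the real macro
operates on RTL expressions and machine modes, and its exact
arguments are not shown here.

```c
/* Toy stand-ins for RTL expression codes and condition code modes.  */
enum rtx_code_model { CODE_PLUS, CODE_MINUS, CODE_REG, CODE_OTHER };
enum cc_mode_model { MODE_CC_PLAIN, MODE_CC_NOOV };

/* Hypothetical model of the Sparc rule described above: a comparison
   whose argument is a `plus' gets the no-overflow mode, and any other
   comparison gets the ordinary mode.  */
enum cc_mode_model
select_cc_mode_model (enum rtx_code_model operand_code)
{
  return operand_code == CODE_PLUS ? MODE_CC_NOOV : MODE_CC_PLAIN;
}
```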
File: gcc.info, Node: Insn Canonicalizations, Next: Peephole Definitions, Prev: Jump Patterns, Up: Machine Desc
Canonicalization of Instructions
================================
There are often cases where multiple RTL expressions could
represent an operation performed by a single machine instruction. This
situation is most commonly encountered with logical, branch, and
multiply-accumulate instructions. In such cases, the compiler
attempts to convert these multiple RTL expressions into a single
canonical form to reduce the number of insn patterns required.
In addition to algebraic simplifications, the following
canonicalizations are performed:
* For commutative and comparison operators, a constant is always
made the second operand. If a machine only supports a constant
as the second operand, only patterns that match a constant in the
second operand need be supplied.
For these operators, if only one operand is a `neg', `not',
`mult', `plus', or `minus' expression, it will be the first
operand.
* For the `compare' operator, a constant is always the second
operand on machines where `cc0' is used (*note Jump Patterns::.).
On other machines, there are rare cases where the compiler might
want to construct a `compare' with a constant as the first
operand. However, these cases are not common enough for it to be
worthwhile to provide a pattern matching a constant as the first
operand unless the machine actually has such an instruction.
An operand of `neg', `not', `mult', `plus', or `minus' is made
the first operand under the same conditions as above.
* `(minus X (const_int N))' is converted to `(plus X (const_int
-N))'.
* Within address computations (i.e., inside `mem'), a left shift is
converted into the appropriate multiplication by a power of two.
De Morgan's Law is used to move bitwise negation inside a bitwise
logical-and or logical-or operation. If this results in only one
operand being a `not' expression, it will be the first one.
A machine that has an instruction that performs a bitwise
logical-and of one operand with the bitwise negation of the other
should specify the pattern for that instruction as
(define_insn ""
[(set (match_operand:M 0 ...)
(and:M (not:M (match_operand:M 1 ...))
(match_operand:M 2 ...)))]
"..."
"...")
Similarly, a pattern for a "NAND" instruction should be written
(define_insn ""
[(set (match_operand:M 0 ...)
(ior:M (not:M (match_operand:M 1 ...))
(not:M (match_operand:M 2 ...))))]
"..."
"...")
In both cases, it is not necessary to include patterns for the
many logically equivalent RTL expressions.
* The only possible RTL expressions involving both bitwise
exclusive-or and bitwise negation are `(xor:M X Y)' and `(not:M
(xor:M X Y))'.
* The sum of three items, one of which is a constant, will only
appear in the form
(plus:M (plus:M X Y) CONSTANT)
* On machines that do not use `cc0', `(compare X (const_int 0))'
will be converted to X.
* Equality comparisons of a group of bits (usually a single bit)
with zero will be written using `zero_extract' rather than the
equivalent `and' or `sign_extract' operations.
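The equivalences behind two of these rules can be checked numerically.
The following C sketch is only an illustration: the function names are
hypothetical and are not part of GNU CC, and 32-bit unsigned arithmetic
is assumed so that negating a constant wraps around in a defined way.

```c
#include <stdint.h>

/* Canonical "NAND" form: (ior:M (not:M a) (not:M b)).  */
uint32_t
nand_canonical (uint32_t a, uint32_t b)
{
  return ~a | ~b;
}

/* Equivalent form before De Morgan's Law is applied:
   (not:M (and:M a b)).  */
uint32_t
nand_uncanonical (uint32_t a, uint32_t b)
{
  return ~(a & b);
}

/* (minus x (const_int n)).  */
uint32_t
minus_const (uint32_t x, uint32_t n)
{
  return x - n;
}

/* Canonical form: (plus x (const_int -n)).  The negation is
   written as 0u - n so that it is well defined for every n.  */
uint32_t
plus_negated_const (uint32_t x, uint32_t n)
{
  return x + (0u - n);
}
```

For any inputs, `nand_canonical' and `nand_uncanonical' return the same
value, as do `minus_const' and `plus_negated_const'. This is why only
the canonical pattern needs to appear in the machine description.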
File: gcc.info, Node: Peephole Definitions, Next: Expander Definitions, Prev: Insn Canonicalizations, Up: Machine Desc
Defining Machine-Specific Peephole Optimizers
=============================================
In addition to instruction patterns, the `md' file may contain
definitions of machine-specific peephole optimizations.
The combiner does not notice certain peephole optimizations when
the data flow in the program does not suggest that it should try them.
For example, sometimes two consecutive insns related in purpose can
be combined even though the second one does not appear to use a
register computed in the first one. A machine-specific peephole
optimizer can detect such opportunities.
A definition looks like this:
(define_peephole
[INSN-PATTERN-1
INSN-PATTERN-2
...]
"CONDITION"
"TEMPLATE"
"OPTIONAL INSN-ATTRIBUTES")
The last string operand may be omitted if you are not using any
machine-specific information in this machine description. If present,
it must obey the same rules as in a `define_insn'.
In this skeleton, INSN-PATTERN-1 and so on are patterns to match
consecutive insns. The optimization applies to a sequence of insns
when INSN-PATTERN-1 matches the first one, INSN-PATTERN-2 matches the
next, and so on.
Each of the insns matched by a peephole must also match a
`define_insn'. Peepholes are checked only at the last stage just
before code generation, and only optionally. Therefore, any insn which
would match a peephole but no `define_insn' will cause a crash in code
generation in an unoptimized compilation, or at various optimization
stages.
The operands of the insns are matched with `match_operand',
`match_operator', and `match_dup', as usual. What is not usual is
that the operand numbers apply to all the insn patterns in the
definition. So, you can check for identical operands in two insns by
using `match_operand' in one insn and `match_dup' in the other.
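The idea of operand numbers shared across several insn patterns can be
sketched in plain C. This is a toy model, not the matcher that GNU CC
generates: `struct toy_move' and `match_copy_back' are hypothetical
names, and each "insn" is reduced to a register-to-register copy.

```c
/* Toy stand-in for an insn that copies register SRC to register
   DEST.  Hypothetical; real insns are RTL expressions.  */
struct toy_move
{
  int dest;
  int src;
};

/* Match two consecutive moves against the pair of patterns

     (set (match_operand 0) (match_operand 1))
     (set (match_dup 1) (match_dup 0))

   that is, a copy followed by a copy straight back.  Returns 1 on
   a match and 0 otherwise.  OPERANDS receives operands 0 and 1.  */
int
match_copy_back (const struct toy_move *first,
                 const struct toy_move *second,
                 int operands[2])
{
  /* match_operand: accept whatever appears and record it.  */
  operands[0] = first->dest;
  operands[1] = first->src;

  /* match_dup: must be identical to the operand recorded above,
     even though it occurs in a different insn.  */
  return second->dest == operands[1] && second->src == operands[0];
}
```

Because operands 0 and 1 are recorded while matching the first insn and
compared while matching the second, the two patterns together test for
identical operands across the sequence.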
The operand constraints used in `match_operand' patterns do not have
any direct effect on the applicability of the peephole, but they will
be validated afterward, so make sure your constraints are general
enough to apply whenever the peephole matches. If the peephole matches
but the constraints are not satisfied, the compiler will crash.
It is safe to omit constraints in all the operands of the peephole;
or you can write constraints which serve as a double-check on the
criteria previously tested.
Once a sequence of insns matches the patterns, the CONDITION is
checked. This is a C expression which makes the final decision
whether to perform the optimization (we do so if the expression is
nonzero). If CONDITION is omitted (in other words, the string is
empty) then the optimization is applied to every sequence of insns
that matches the patterns.
The defined peephole optimizations are applied after register
allocation is complete. Therefore, the peephole definition can check
which operands have ended up in which kinds of registers, just by
looking at the operands.
The way to refer to the operands in CONDITION is to write
`operands[I]' for operand number I (as matched by `(match_operand I
...)'). Use the variable `insn' to refer to the last of the insns
being matched; use `prev_nonnote_insn' to find the preceding insns.
When optimizing computations with intermediate results, you can use
CONDITION to match only when the intermediate results are not used
elsewhere. Use the C expression `dead_or_set_p (INSN, OP)', where
INSN is the insn in which you expect the value to be used for the last
time (from the value of `insn', together with use of
`prev_nonnote_insn'), and OP is the intermediate value (from
`operands[I]').
Applying the optimization means replacing the sequence of insns
with one new insn. The TEMPLATE controls ultimate output of assembler
code for this combined insn. It works exactly like the template of a
`define_insn'. Operand numbers in this template are the same ones
used in matching the original sequence of insns.
The result of a defined peephole optimizer does not need to match
any of the insn patterns in the machine description; it does not even
have an opportunity to match them. The peephole optimizer definition
itself serves as the insn pattern to control how the insn is output.
Defined peephole optimizers are run as assembler code is being
output, so the insns they produce are never combined or rearranged in
any way.
Here is an example, taken from the 68000 machine description:
(define_peephole
[(set (reg:SI 15) (plus:SI (reg:SI 15) (const_int 4)))
(set (match_operand:DF 0 "register_operand" "f")
(match_operand:DF 1 "register_operand" "ad"))]
"FP_REG_P (operands[0]) && ! FP_REG_P (operands[1])"
"*
{
rtx xoperands[2];
xoperands[1] = gen_rtx (REG, SImode, REGNO (operands[1]) + 1);
#ifdef MOTOROLA
output_asm_insn (\"move.l %1,(sp)\", xoperands);
output_asm_insn (\"move.l %1,-(sp)\", operands);
return \"fmove.d (sp)+,%0\";
#else
output_asm_insn (\"movel %1,sp@\", xoperands);
output_asm_insn (\"movel %1,sp@-\", operands);
return \"fmoved sp@+,%0\";
#endif
}
")
The effect of this optimization is to change
jbsr _foobar
addql #4,sp
movel d1,sp@-
movel d0,sp@-
fmoved sp@+,fp0
into
jbsr _foobar
movel d1,sp@
movel d0,sp@-
fmoved sp@+,fp0
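That the two sequences leave the machine in the same state can be
checked with a small simulation. The sketch below is a toy model under
stated assumptions: the stack is addressed in 32-bit words rather than
bytes, the double value occupies the register pair d0/d1, and all names
(`struct toy_cpu', `run_original', `run_peephole') are hypothetical.

```c
#include <stdint.h>

struct toy_cpu
{
  uint32_t stack[8];        /* word-addressed toy stack */
  int sp;                   /* index of the top word; grows downward */
  uint32_t d0, d1;          /* register pair holding the double */
  uint32_t fp0_hi, fp0_lo;  /* the two words loaded into fp0 */
};

/* movel REG,sp@-  */
static void
push (struct toy_cpu *c, uint32_t v)
{
  c->sp -= 1;
  c->stack[c->sp] = v;
}

/* fmoved sp@+,fp0  */
static void
pop_double (struct toy_cpu *c)
{
  c->fp0_hi = c->stack[c->sp];
  c->sp += 1;
  c->fp0_lo = c->stack[c->sp];
  c->sp += 1;
}

/* addql #4,sp; movel d1,sp@-; movel d0,sp@-; fmoved sp@+,fp0  */
void
run_original (struct toy_cpu *c)
{
  c->sp += 1;
  push (c, c->d1);
  push (c, c->d0);
  pop_double (c);
}

/* movel d1,sp@; movel d0,sp@-; fmoved sp@+,fp0  */
void
run_peephole (struct toy_cpu *c)
{
  c->stack[c->sp] = c->d1;  /* overwrite the word being popped */
  push (c, c->d0);
  pop_double (c);
}
```

Starting from the same state, both functions leave `sp' one word above
its initial value and load the same two words into fp0. The peephole
version simply reuses the stack slot that `addql' would have discarded.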
INSN-PATTERN-1 and so on look *almost* like the second operand of
`define_insn'. There is one important difference: the second operand
of `define_insn' consists of one or more RTX's enclosed in square
brackets. Usually, there is only one: then the same action can be
written as an element of a `define_peephole'. But when there are
multiple actions in a `define_insn', they are implicitly enclosed in a
`parallel'. Then you must explicitly write the `parallel', and the
square brackets within it, in the `define_peephole'. Thus, if an insn
pattern looks like this,
(define_insn "divmodsi4"
[(set (match_operand:SI 0 "general_operand" "=d")
(div:SI (match_operand:SI 1 "general_operand" "0")
(match_operand:SI 2 "general_operand" "dmsK")))
(set (match_operand:SI 3 "general_operand" "=d")
(mod:SI (match_dup 1) (match_dup 2)))]
"TARGET_68020"
"divsl%.l %2,%3:%0")
then the way to mention this insn in a peephole is as follows:
(define_peephole
[...
(parallel
[(set (match_operand:SI 0 "general_operand" "=d")
(div:SI (match_operand:SI 1 "general_operand" "0")
(match_operand:SI 2 "general_operand" "dmsK")))
(set (match_operand:SI 3 "general_operand" "=d")
(mod:SI (match_dup 1) (match_dup 2)))])
...]
...)
File: gcc.info, Node: Expander Definitions, Next: Insn Splitting, Prev: Peephole Definitions, Up: Machine Desc
Defining RTL Sequences for Code Generation
==========================================
On some target machines, some standard pattern names for RTL
generation cannot be handled with a single insn, but a sequence of RTL
insns can represent them. For these target machines, you can write a
`define_expand' to specify how to generate the sequence of RTL.
A `define_expand' is an RTL expression that looks almost like a
`define_insn'; but, unlike the latter, a `define_expand' is used only
for RTL generation and it can produce more than one RTL insn.
A `define_expand' RTX has four operands:
* The name. Each `define_expand' must have a name, since the only
use for it is to refer to it by name.
* The RTL template. This is just like the RTL template for a
`define_peephole' in that it is a vector of RTL expressions each
being one insn.
* The condition, a string containing a C expression. This
expression is used to express how the availability of this
pattern depends on subclasses of target machine, selected by
command-line options when GNU CC is run. This is just like the
condition of a `define_insn' that has a standard name.
* The preparation statements, a string containing zero or more C
statements which are to be executed before RTL code is generated
from the RTL template.
Usually these statements prepare temporary registers for use as
internal operands in the RTL template, but they can also generate
RTL insns directly by calling routines such as `emit_insn', etc.
Any such insns precede the ones that come from the RTL template.
Every RTL insn emitted by a `define_expand' must match some
`define_insn' in the machine description. Otherwise, the compiler
will crash when trying to generate code for the insn or trying to
optimize it.
The RTL template, in addition to controlling generation of RTL
insns, also describes the operands that need to be specified when this
pattern is used. In particular, it gives a predicate for each operand.
A true operand, which needs to be specified in order to generate
RTL from the pattern, should be described with a `match_operand' in
its first occurrence in the RTL template. This enters information on
the operand's predicate into the tables that record such things. GNU
CC uses the information to preload the operand into a register if that
is required for valid RTL code. If the operand is referred to more
than once, subsequent references should use `match_dup'.
The RTL template may also refer to internal "operands" which are
temporary registers or labels used only within the sequence made by the
`define_expand'. Internal operands are substituted into the RTL
template with `match_dup', never with `match_operand'. The values of
the internal operands are not passed in as arguments by the compiler
when it requests use of this pattern. Instead, they are computed
within the pattern, in the preparation statements. These statements
compute the values and store them into the appropriate elements of
`operands' so that `match_dup' can find them.
There are two special macros defined for use in the preparation
statements: `DONE' and `FAIL'. Use them with a following semicolon,
as a statement.
`DONE'
Use the `DONE' macro to end RTL generation for the pattern. The
only RTL insns resulting from the pattern on this occasion will be
those already emitted by explicit calls to `emit_insn' within the
preparation statements; the RTL template will not be generated.
`FAIL'
Make the pattern fail on this occasion. When a pattern fails, it
means that the pattern was not truly available. The calling
routines in the compiler will try other strategies for code
generation using other patterns.
Failure is currently supported only for binary (addition,
multiplication, shifting, etc.) and bitfield (`extv', `extzv',
and `insv') operations.
Here is an example, the definition of left-shift for the SPUR chip:
(define_expand "ashlsi3"
[(set (match_operand:SI 0 "register_operand" "")
(ashift:SI
(match_operand:SI 1 "register_operand" "")
(match_operand:SI 2 "nonmemory_operand" "")))]
""
"
{
if (GET_CODE (operands[2]) != CONST_INT
|| (unsigned) INTVAL (operands[2]) > 3)
FAIL;
}")
This example uses `define_expand' so that it can generate an RTL insn
for shifting when the shift-count is in the supported range of 0 to 3
but fail in other cases where machine insns aren't available. When it
fails, the compiler tries another strategy using different patterns
(such as a library call).
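The range test in the preparation statements relies on a common C
idiom: casting the count to `unsigned' makes negative values very
large, so one comparison rejects counts below 0 as well as counts
above 3. The following sketch isolates that test. The function name is
hypothetical, and the count is assumed to be already known to be a
constant integer.

```c
/* Return nonzero when the shift count lies outside the range
   0 to 3, that is, when the expander would execute FAIL.  */
int
shift_count_rejected (int count)
{
  /* A negative COUNT converts to a large unsigned value, so it
     compares greater than 3 and is rejected too.  */
  return (unsigned) count > 3;
}
```

With this definition, counts 0 through 3 are accepted, while 4 and -1
are both rejected.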
If the compiler were able to handle nontrivial condition-strings in
patterns with names, then it would be possible to use a `define_insn'
in that case. Here is another case (zero-extension on the 68000)
which makes more use of the power of `define_expand':
(define_expand "zero_extendhisi2"
[(set (match_operand:SI 0 "general_operand" "")
(const_int 0))
(set (strict_low_part
(subreg:HI
(match_dup 0)
0))
(match_operand:HI 1 "general_operand" ""))]
""
"operands[1] = make_safe_from (operands[1], operands[0]);")
Here two RTL insns are generated, one to clear the entire output
operand and the other to copy the input operand into its low half.
This sequence is incorrect if the input operand refers to [the old
value of] the output operand, so the preparation statement makes sure
this isn't so. The function `make_safe_from' copies `operands[1]'
into a temporary register if it refers to `operands[0]'. It does this
by emitting another RTL insn.
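The hazard can be reproduced outside the compiler. The sketch below is
a toy model with hypothetical names: registers are entries of an array,
and the "safe" variant always copies the input to a temporary first,
whereas `make_safe_from' makes the copy only when the operands overlap.

```c
#include <stdint.h>

/* Clear the output register, then copy the low half of the input
   into its low half.  Wrong when OUT and IN are the same register,
   because the input has been cleared before it is read.  */
void
zero_extend_naive (uint32_t regs[], int out, int in)
{
  regs[out] = 0;
  regs[out] = (regs[out] & 0xFFFF0000u) | (regs[in] & 0xFFFFu);
}

/* Same sequence, but the input is first saved in a temporary, in
   the spirit of make_safe_from.  (This sketch copies always, for
   simplicity.)  */
void
zero_extend_safe (uint32_t regs[], int out, int in)
{
  uint32_t tmp = regs[in];

  regs[out] = 0;
  regs[out] = (regs[out] & 0xFFFF0000u) | (tmp & 0xFFFFu);
}
```

When `out' and `in' differ, both functions give the same result. When
they name the same register, `zero_extend_naive' yields zero, while
`zero_extend_safe' still yields the low half of the original value.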
Finally, a third example shows the use of an internal operand.
Zero-extension on the SPUR chip is done by `and'-ing the result
against a halfword mask. But this mask cannot be represented by a
`const_int' because the constant value is too large to be legitimate
on this machine. So it must be copied into a register with
`force_reg' and then the register used in the `and'.
(define_expand "zero_extendhisi2"
[(set (match_operand:SI 0 "register_operand" "")
(and:SI (subreg:SI
(match_operand:HI 1 "register_operand" "")
0)
(match_dup 2)))]
""
"operands[2]
= force_reg (SImode, gen_rtx (CONST_INT,
VOIDmode, 65535)); ")
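The arithmetic performed by the generated `and' insn is simple to state
in C. In the sketch below, the function name is hypothetical and a
local variable stands in for the register that `force_reg' returns.

```c
#include <stdint.h>

/* Zero-extend the low halfword of REG by masking, as the `and'
   insn in the expansion does.  */
uint32_t
zero_extend_by_mask (uint32_t reg)
{
  uint32_t mask = 65535;  /* plays the role of operands[2] */

  return reg & mask;
}
```

For instance, an input of 0xFFFF8001 gives 0x8001: the upper sixteen
bits are cleared and the lower sixteen are unchanged.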
*Note:* If the `define_expand' is used to serve a standard binary
or unary arithmetic operation or a bitfield operation, then the last
insn it generates must not be a `code_label', `barrier' or `note'. It
must be an `insn', `jump_insn' or `call_insn'. If you don't need a
real insn at the end, emit an insn to copy the result of the operation
into itself. Such an insn will generate no code, but it can avoid
problems in the compiler.