-
- ARM600: RISC Goes OOP
-
-
-
-
- Apple's investment in Advanced RISC Machines was seen as
- purely financial, but the object-orientated ARM600 processor
- is making pundits reassess the move.
-
-
-
- The Acorn RISC Machine, one of the world's first (if
- not the first) commercially available 32-bit RISC
- processors, designed by the Cambridge, U.K., computer firm
- Acorn, has gone through three major revisions over the years.
- The ARM2 and the ARM3 are still the engines for Acorn's
- Archimedes range of home and educational computers and its
- UNIX workstations. In addition, VLSI Technology, the U.S.
- silicon foundry that fabricated the ARM for Acorn, has been
- selling the same RISC chip as an embedded controller for
- drives and graphics accelerators.
-
- Earlier in 1991, Acorn intrigued industry observers by
- `spinning off' the ownership and development of the ARM
- architecture into a separate company, called Advanced RISC
- Machines, which is jointly owned by Acorn/Olivetti, VLSI
- Technology, and, of all people, Apple. Of all people, because
- Apple had until then been seen as wholly committed to the
- Motorola 680x0 chip family. Apple denied at the time that
- its ARM involvement was anything but a bit of blue-sky
- research, but then, only a few months later, came the
- bombshell: Apple and IBM (of all people) had forged a
- strategic alliance to develop a future range of computers
- based on a new object-orientated operating system, and not
- based on Motorola 680x0 processors.
-
- The ARM600 features low power consumption, suitable for
- battery portables, and a highly innovative on-chip memory
- management unit (MMU) that provides hardware support for an
- object-orientated operating system. Interested readers are
- referred to an article by Apple's Larry Tesler in the
- September 1991 issue of Scientific American.
-
- The ARM6 Macro Cell
-
- Advanced RISC Machines, based in Swaffham Bulbeck,
- outside Cambridge, inherited most of the team that designed
- the original ARM chips, the notable exception being Steve
- Furber, who is now a professor at Manchester University but
- maintains productive links with the company. The company's
- strategic aim is to design, though not manufacture, custom
- processors and application-specific integrated circuits
- (ASICs) based on its RISC CPU, with a particular focus on
- the embedded-controller market. The first step in realising
- this strategy was to fine-tune the original ARM architecture
- into what is now called the ARM6 (the last Acorn-originated
- revision was the ARM3).
-
- The ARM6 is not a chip, but a macrocell, stored in the
- Compass CAD system (supplied by VLSI Technology) that
- Advanced RISC Machines uses to design its chips. This means
- that the ARM6 is a standard VLSI circuit layout that can be
- called up and incorporated into larger chip designs, just as
- you might include a source file in a Pascal program -- an
- off-the-peg CPU subunit, if you like. The simplest chip
- incorporating the ARM6 will be called the ARM60, which is
- just an ARM6 cell with pads added to join it to the outside
- world. The ARM60 is compatible with and functionally
- equivalent to the old ARM2 chip: it lacks the on-chip
- processor cache of the ARM3.
-
- Advanced RISC Machines intends to produce a family of
- chips based around the ARM6 macrocell core, with other
- components added as required. For example, the ARM600 is an
- ARM6 cell, a processor cache, and an MMU. Other products
- will be customised ASICs, which might contain an ARM6 core
- and, say, D/A convertors or serial communications drivers on
- a single chip.
-
- What makes this strategy viable is the simplicity and
- the resulting small die size of the original ARM
- architecture. In 1985, Acorn had in mind a 32-bit
- replacement for the defunct 6502 chip, the mainstay of the
- early personal computers from the Apple II to the Commodore
- Pet to the Acorn BBC Micro itself. The goal was high
- execution speed, fast interrupt response, and low silicon
- cost at the expense of fancy types of instructions. A purist
- interpretation of the RISC philosophy resulted in a chip
- with only 10 types of instructions, no microcode ROM, and
- only 25,000 transistors, at the time when Motorola was using
- 200,000 transistors in its 68020.
-
- Those original ARM chips, fabricated in 3-micron CMOS
- technology, would fit on a 7-millimetre die. Six years
- later, Advanced RISC Machines can have the ARM6 fabricated
- in 1-micron CMOS and have it occupy a silicon area of about
- 2.8mm square, which can be tucked away neatly in one corner
- of an average-size die, leaving plenty of room for custom
- circuitry. The ARM6 can be clocked at 20 to 25 MHz, compared
- to the 5 MHz of the original ARM chips, giving a performance
- of over 20 million instructions per second (MIPS).
-
- The ARM6 is designed to be fabricated using fully
- static CMOS devices so that you can slow down or even stop
- the clock during any phase of its operation and then restart
- it with no loss of data. When stopped, the processor
- consumes only a few microamperes due to some residual
- leakage in the transistors. A system designer can stop the
- clock whenever the CPU is idling and thus reduce power
- consumption substantially, a feature that will make ARM6-
- based chips suitable for hand-held battery-powered portable
- computers. Even though the ARM6 is by no means the fastest
- RISC design nowadays, it offers substantially better MIPS-
- to-watt and MIPS-to-dollar ratios than its rivals.
-
- To prepare for the future, Advanced RISC Machines has
- enhanced the ARM6 architecture slightly while maintaining
- code compatibility with earlier ARM chips. The most
- significant difference between the original ARM chips and
- the ARM6 is that the ARM6 has a full 32-bit address bus,
- while the original ARM chips use only 26 bits. Two extra
- signal lines, called PROG32 and DATA32, have been added to
- switch between three operating configurations: 32-bit
- program and 32-bit data, 26-bit program and 32-bit data, and
- 26-bit program and 26-bit data (to run the old ARM
- programs).
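-
- As a rough illustration, the three configurations can be
- tabulated against the two signal lines. The article does not
- give signal polarities, so this sketch assumes that 1 selects
- 32-bit operation; the function name is hypothetical.
-
```python
# Hypothetical decoding of the PROG32 and DATA32 configuration signals.
# Assumption: 1 selects 32-bit operation, 0 selects 26-bit operation.
CONFIGURATIONS = {
    (1, 1): "32-bit program, 32-bit data",
    (0, 1): "26-bit program, 32-bit data",
    (0, 0): "26-bit program, 26-bit data (old ARM programs)",
}

# Size of the address space reachable with each address width.
ADDRESS_SPACE_BYTES = {26: 2 ** 26, 32: 2 ** 32}


def describe_configuration(prog32: int, data32: int) -> str:
    """Return the operating configuration for a (PROG32, DATA32) pair."""
    try:
        return CONFIGURATIONS[(prog32, data32)]
    except KeyError:
        # Only three configurations are listed in the text; the remaining
        # pairing is treated as unsupported in this sketch.
        raise ValueError("32-bit program with 26-bit data is not listed") from None
```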
-
- One consequence of longer addresses is that the flags
- can no longer be held in the top 6 bits of the program
- counter. A separate current-program-status register has been
- added, making 17 rather than 16 registers visible to the
- programmer. Another new signal, called BIGEND, toggles
- between `big-endian' and `little-endian' byte ordering --
- the price of admission to a cosmopolitan club that includes
- Apple and IBM?
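-
- To see why 32-bit addresses push the flags out, it helps to
- unpack the old combined register. The sketch below assumes
- the 26-bit layout documented for the earlier ARM chips (six
- flag bits at the top, a 24-bit word address in the middle,
- two mode bits at the bottom); the function name is
- hypothetical and the layout detail is not from this article.
-
```python
# Assumed 26-bit-mode layout of R15 on the earlier ARM chips:
#   bits 31..26  flags N, Z, C, V, I, F
#   bits 25..2   program counter (word address)
#   bits 1..0    processor mode
FLAG_NAMES = ("N", "Z", "C", "V", "I", "F")  # bit 31 down to bit 26


def unpack_r15(r15: int) -> dict:
    """Split a combined PC/status word into flags, address and mode."""
    r15 &= 0xFFFFFFFF
    flags = {name: (r15 >> (31 - i)) & 1 for i, name in enumerate(FLAG_NAMES)}
    return {
        "flags": flags,
        "pc": r15 & 0x03FFFFFC,  # only 26 bits of byte address survive
        "mode": r15 & 0x3,
    }


# The highest word address that fits is just below 2**26 bytes (64 Mb),
# which is why a full 32-bit address leaves no room for the flags.
HIGHEST_PC = 0x03FFFFFC
```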
-
- Other subtle modifications improve the way the ARM6
- works in virtual memory systems, correcting deficiencies in
- the old ARM chips. A virtual memory system is one in which
- disk space (or other external storage) is employed to extend
- the computer's apparent memory capacity. When the processor
- tries to read a word from a memory address that does not
- exist because that data is actually stored on disk, a
- hardware exception (usually called a page fault) occurs.
- This exception causes the processor to jump to a software
- service routine that reads a new page from disk into memory
- and then to return and repeat the interrupted instruction to
- read the now-present data.
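-
- The fault-and-retry cycle just described can be sketched in
- a few lines. This is a toy model under stated assumptions,
- not ARM code: the class names and the 4096-byte page size
- are made up for illustration, and the "disk" is a dictionary.
-
```python
PAGE_SIZE = 4096  # bytes; an arbitrary page size chosen for this sketch


class PageFault(Exception):
    """Raised when an access touches a page that is not in memory."""

    def __init__(self, page: int):
        super().__init__(f"page {page} not resident")
        self.page = page


class VirtualMemory:
    def __init__(self, disk: dict):
        self.disk = disk      # page number -> bytes (backing store)
        self.resident = {}    # page number -> bytearray (physical memory)
        self.faults = 0

    def _read(self, address: int) -> int:
        page, offset = divmod(address, PAGE_SIZE)
        if page not in self.resident:
            raise PageFault(page)  # the hardware exception
        return self.resident[page][offset]

    def _service(self, page: int) -> None:
        # The software service routine: bring the page in from disk.
        self.resident[page] = bytearray(self.disk[page])
        self.faults += 1

    def load_byte(self, address: int) -> int:
        try:
            return self._read(address)
        except PageFault as fault:
            self._service(fault.page)
            return self._read(address)  # repeat the interrupted access
```
-
- Only the first access to a page pays for the disk read;
- later accesses to the same page succeed directly.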
-
- Unlike the original ARM chips, the ARM6 can handle
- exceptions (e.g., page faults) in the supervisor mode and
- the user mode. This is essential for running fault-driven
- virtual memory operating systems.
-
- Another refinement is only half implemented. Older ARM
- chips could accept a Data Abort signal caused by a page
- fault only if they received it during the first half of a
- memory cycle. This put rather stringent demands on cache
- timings, because a cache lookup has to be completed before
- you can look into external memory, discover that the data
- doesn't exist, and issue an Abort. The refinement accepts an
- Abort signal right up to the end of a memory cycle during
- single data transfers (LDR and STR
- instructions), but in so doing, it forces the Abort service
- routine to perform extra cleanup operations. The ARM6 does
- not yet allow you to exploit this feature to relax cache
- timings, but future versions will.
-
- To prepare programmers for its coming, the ARM6 has a
- LATEABT signal line that, when active, simulates late
- Aborts. You can incorporate the extra cleanup code now so
- that your programs will execute correctly on the next
- generation of chips. Pulling LATEABT low makes the ARM6
- compatible with earlier chips.
-
- The ARM600 Processor
-
- The ARM600 CPU is made up of an ARM6 core surrounded by
- three extra on-chip functional units: the processor cache,
- the write buffer, and the MMU. Like the old ARM3 chip, the
- ARM600 has a 4Kb on-chip cache RAM that can hold data and
- instructions. The cache contains 256 lines of 4 words (16
- bytes), organised into four blocks of 64 lines; that is, it's
- a 64-way set-associative cache.
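-
- The cache figures are easy to check with a little
- arithmetic. The sketch below also splits an address into
- tag, block, and byte offset; which address bits select the
- block is an assumption made here, not something the article
- states.
-
```python
WORD_BYTES = 4
WORDS_PER_LINE = 4
LINES = 256
BLOCKS = 4

LINE_BYTES = WORD_BYTES * WORDS_PER_LINE  # 16 bytes per line
CACHE_BYTES = LINES * LINE_BYTES          # 4096 bytes, i.e. 4Kb
WAYS = LINES // BLOCKS                    # 64 lines per block


def split_address(address: int) -> tuple:
    """Return (tag, block, offset) for an address.

    Assumption: the bits just above the byte offset pick one of the
    four blocks, and the remaining upper bits form the tag that is
    compared against all 64 lines of that block.
    """
    offset = address % LINE_BYTES
    block = (address // LINE_BYTES) % BLOCKS
    tag = address // (LINE_BYTES * BLOCKS)
    return tag, block, offset
```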
-
- As with most modern RAM designs, this cache employs
- differential sense amplifiers to minimise its cycle time,
- but these analog devices consume a lot of power. To further
- reduce power consumption, the ARM600 chip switches off the
- sense amps after the first access during sequential cache
- accesses -- a nice detail.
-
- The write buffer provides a way to further improve
- CPU throughput without forcing you to use expensive fast
- memory. The write-through cache writes into this on-chip
- buffer, which has room for two write operations of 8 words,
- rather than directly into external memory. The write buffer
- will complete the writes to external memory in its own time,
- and the RISC core is free to execute the next instruction.
-
- By far the most intriguing aspect of the ARM600 chip is
- the MMU, which provides a fairly conventional virtual memory
- controller but a radical scheme for partitioning memory on
- object-orientated lines. The MMU translates virtual
- addresses generated in the CPU into physical data addresses,
- and it also controls memory access permissions.
-
- The virtual memory scheme works through translation
- tables stored in physical memory, and these table entries
- are cached in an on-chip translation lookaside buffer (TLB).
- The virtual address space can be mapped either into 1Mb
- sections, which only requires a one-level table lookup, or
- in pages, which require a two-level lookup. The ARM600 MMU
- supports two page sizes: small pages of 4Kb or large pages
- of 64Kb. Large pages allow single table entries to map large
- data objects, which helps keep the translation tables small.
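-
- A quick calculation shows the scale involved. The sketch
- below counts how many mapping units of each size are needed
- to cover an object; the 256Kb object is a made-up example,
- and the helper name is hypothetical.
-
```python
KB = 1024
MB = 1024 * KB

ADDRESS_SPACE = 2 ** 32   # full 32-bit virtual address space
SECTION = 1 * MB
SMALL_PAGE = 4 * KB
LARGE_PAGE = 64 * KB


def units_needed(object_bytes: int, unit_bytes: int) -> int:
    """Number of mapping units required to cover an object (rounded up)."""
    return -(-object_bytes // unit_bytes)  # ceiling division


# Number of 1Mb sections in the whole address space.
SECTIONS_IN_ADDRESS_SPACE = ADDRESS_SPACE // SECTION

# A hypothetical 256Kb data object mapped with each page size.
EXAMPLE_OBJECT = 256 * KB
SMALL_PAGES_FOR_EXAMPLE = units_needed(EXAMPLE_OBJECT, SMALL_PAGE)
LARGE_PAGES_FOR_EXAMPLE = units_needed(EXAMPLE_OBJECT, LARGE_PAGE)
```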
-
- When the CPU requests a memory access, the MMU's
- access-control logic first looks in the TLB to see if a
- translation for the virtual address exists among the 32
- cached entries. If there is a TLB `hit' (i.e. the
- translation is already in the TLB, which should be the case
- for most accesses), the access-control logic checks whether
- the access is permitted. If it is, the physical address is
- output immediately.
-
- If the TLB `misses', the MMU computes an index into the
- translation table, offset from an address held in the on-
- chip translation table base register. If this translation-
- table entry is for a section, it will contain the base
- address of the section, which is combined with an index
- contained in the virtual address to give the physical
- address. If the translation table entry is for a page, it
- contains the base address of another table, the page table,
- and a second lookup is required to get the physical address.
- In both cases, permission is checked before the access can
- proceed, and the TLB is updated with the resulting physical
- address, overwriting the existing entry.
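-
- The hit and miss paths above can be modelled roughly as
- follows. This is a simplified sketch, not the real
- descriptor format: table entries are Python tuples, only
- 4Kb pages are modelled in the second level, the TLB caches
- everything at 4Kb granularity, and the oldest-first
- replacement policy is an assumption.
-
```python
SECTION_SIZE = 1 << 20  # 1Mb sections
PAGE_SIZE = 1 << 12     # 4Kb small pages
TLB_ENTRIES = 32


class TranslationFault(Exception):
    pass


class PermissionFault(Exception):
    pass


class Mmu:
    def __init__(self, first_level: dict):
        # first_level maps a section number to either
        #   ("section", physical_base, allowed)  or  ("pages", page_table)
        # where page_table maps a page index to (physical_base, allowed)
        # and allowed is a string such as "r" or "rw".
        self.first_level = first_level
        self.tlb = {}   # virtual page base -> (physical page base, allowed)
        self.walks = 0  # number of table walks performed

    def _walk(self, vaddr: int) -> tuple:
        self.walks += 1
        entry = self.first_level.get(vaddr // SECTION_SIZE)
        if entry is None:
            raise TranslationFault(hex(vaddr))
        page_index = (vaddr % SECTION_SIZE) // PAGE_SIZE
        if entry[0] == "section":  # one-level lookup
            _, base, allowed = entry
            return base + page_index * PAGE_SIZE, allowed
        _, page_table = entry      # two-level lookup
        second = page_table.get(page_index)
        if second is None:
            raise TranslationFault(hex(vaddr))
        return second

    def translate(self, vaddr: int, write: bool = False) -> int:
        vpage = vaddr - vaddr % PAGE_SIZE
        hit = self.tlb.get(vpage)
        if hit is None:                       # TLB miss: walk the tables
            hit = self._walk(vaddr)
            if len(self.tlb) >= TLB_ENTRIES:  # overwrite the oldest entry
                self.tlb.pop(next(iter(self.tlb)))
            self.tlb[vpage] = hit
        base, allowed = hit
        if ("w" if write else "r") not in allowed:
            raise PermissionFault(hex(vaddr))
        return base + vaddr % PAGE_SIZE
```
-
- Repeating a translation for the same page costs no further
- table walk, which is the whole point of the TLB.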
-
- The ARM designers have incorporated a neat trick into
- the MMU to allow more efficient `table walking' (i.e.
- traversing chains of indirect pointers). The MMU traps all
- data accesses that are not aligned on word boundaries,
- raising a hardware alignment fault (this does not apply to
- instruction fetches or to byte-accessing instructions).
- Because table entries are always word aligned, the bottom 2
- bits of a valid table address should always be 0. By using
- an address whose bottom 2 bits are not 0 to mark the end of
- a chain of pointers, you can detect the end of a table by
- the alignment fault it produces. This eliminates having to
- make a time-consuming check for the end of a table after
- each link traversal.
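-
- The trick is easy to mimic in software. In this sketch,
- memory is a dictionary of word addresses, the function
- names are hypothetical, and a Python exception stands in
- for the hardware alignment fault.
-
```python
class AlignmentFault(Exception):
    """Stands in for the hardware fault on a non-word-aligned access."""


def load_word(memory: dict, address: int) -> int:
    if address & 0b11:  # bottom 2 bits must be 0 for a word access
        raise AlignmentFault(hex(address))
    return memory[address]


def walk_chain(memory: dict, head: int) -> list:
    """Follow a chain of pointers until a misaligned link ends it.

    The loop body contains no explicit end-of-chain test; the fault
    raised by the terminating link does that job.
    """
    visited = []
    address = head
    try:
        while True:
            next_address = load_word(memory, address)
            visited.append(address)
            address = next_address
    except AlignmentFault:
        return visited
```
-
- A chain such as 0x100 -> 0x200 -> 0x300 -> 0x1 stops at
- the last link because 0x1 is not word aligned.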
-
- Object-orientated Memory
-
- While the virtual memory function of the ARM600
- MMU is sophisticated but conventional, the access-permission
- function is something rather new. Permissions are mapped
- separately from virtual addresses, and they can be
- manipulated independently of them. Additionally, address
- faults (e.g. page faults) are handled separately from
- permission faults via different hardware signals.
-
- The MMU maps permissions using domains, each of which
- is a contiguous area of virtual memory. Domains are quite
- distinct from sections and pages, which are just the units in
- which a domain's virtual memory is managed.
-
- There are 16 such domains, and each one has a 2-bit
- field in the MMU's domain access control register to define
- its access type. These bits are used to classify programs
- that use the domains as either clients (users of the domain)
- or managers (controllers of the domain). Clients always have
- their access permissions checked, and a domain fault is
- raised if they are not valid. Managers are not checked, and
- they can always access their domain. A manager can define the
- permissions for its own domain and give different
- permissions to different clients. A client task may have a
- different set of permissions for each domain it uses, and
- this set of permissions is called the task's environment.
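-
- Sixteen 2-bit fields fit exactly into one 32-bit register.
- The sketch below packs and checks them. The article does
- not give the bit encodings, so the values used for client,
- manager, and no access, and the placement of domain n at
- bits 2n and 2n+1, are assumptions; all names are
- hypothetical.
-
```python
# Assumed 2-bit encodings for each domain's access type.
NO_ACCESS, CLIENT, MANAGER = 0b00, 0b01, 0b11
DOMAINS = 16


class DomainFault(Exception):
    pass


def set_domain(dacr: int, domain: int, access: int) -> int:
    """Return a new register value with one domain's field replaced."""
    if not 0 <= domain < DOMAINS:
        raise ValueError("domain must be in 0..15")
    shift = 2 * domain
    cleared = dacr & ~(0b11 << shift)
    return (cleared | (access << shift)) & 0xFFFFFFFF


def get_domain(dacr: int, domain: int) -> int:
    return (dacr >> (2 * domain)) & 0b11


def check_access(dacr: int, domain: int, permitted: bool) -> bool:
    """Apply the client/manager rule to one access.

    `permitted` is the result of the ordinary permission check, which
    only matters for clients; managers bypass it.
    """
    access = get_domain(dacr, domain)
    if access == MANAGER:
        return True
    if access == CLIENT and permitted:
        return True
    raise DomainFault(f"domain {domain}")
```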
-
- This may all sound rather puzzling, adding
- complications to an already far-from-simple system. The
- purpose of permission mapping only starts to become clear
- when you think about the requirements of a truly object-
- orientated operating system.
-
- In an object-orientated programming system, all data is
- encapsulated in coherent memory objects, and these objects
- must be manipulated only by particular programs, called the
- methods of the class to which the object belongs. It is not
- difficult to see how hardware-enforced client permissions
- can be used to protect objects from access by methods other
- than their own. Domains would be used to distinguish types
- of object -- in effect, a hardware expression of classes.
-
- Many object-orientated programming researchers favour a
- secondary-storage scheme called a persistent object store.
- Objects become entities with their own life span,
- independent of the invocation of a particular application.
- When you are not using them, they live on disk, and when you
- are using them, they live in memory. The transition
- between these two states is transparent. If you need to
- access an object that is not in memory, this secondary-
- storage scheme brings the object into memory without your
- having to issue a load command or type a filename. In fact,
- the concept of a file and the distinction between files and
- memory variables disappear altogether.
-
- Some PC application systems, such as Borland's VROOM,
- possess a persistent aspect (i.e., for code, not data), but
- real persistence should be a property of the operating
- system itself. The ARM600 MMU can offer hardware support for
- such a persistent object-orientated operating system by
- combining its virtual memory and permission-mapping roles.
- Such an operating system would be fault-driven: the raising
- and subsequent correction of address, permission, and domain
- faults is what drives your computation.
-
- There is more, however. In a persistent object-
- orientated operating system, garbage collection becomes an
- important issue. In any system that permits dynamic (i.e.,
- run-time) allocation of memory, the heap fills up with
- objects that are no longer being used, and the system would
- grind to a halt when no new objects could be created (or
- paged in) were it not for garbage collection. Traditional
- garbage-collection algorithms involve temporarily halting
- the system and scanning through memory to release space
- occupied by `dead' objects. You will be familiar with this
- annoying interruption if you've done serious programming in
- Lisp or Smalltalk.
-
- The latest algorithms advocate the use of concurrent
- garbage collection. A garbage-collection manager runs as a
- background task, reclaiming dead objects as soon as they
- become inaccessible (i.e., when nothing else in the system
- points to them). This is preferable in interactive, real-
- time systems with GUIs, where a sudden freezing of the
- system would be most disconcerting.
-
- One way of implementing concurrent garbage collection
- is to divide memory into a live region, where computation
- occurs, and a dead region, where garbage collection is
- continually proceeding. One technique is to make the live
- region a narrow window that moves cyclically through the
- memory space, leaving a dead region behind it. The problem
- is that a few (if the algorithm is correctly tuned) live
- objects are likely to be left stranded in the dead region,
- still pointed to by objects in the live region. Such
- stranded objects have illegal addresses and must not be
- accessed (or even referred to).
-
- A permission-mapping MMU like that in the ARM600 chip
- can prevent such accesses by raising a permission fault. To
- recover from such a fault, the operating system rescues the
- stranded object by copying it into the live region, where
- the requested access can proceed normally. The ARM600 chip's
- domain system suggests a separate, concurrent manager task
- to collect garbage in each domain. It makes possible the use
- of a second chip as a parallel garbage-collection processor.
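-
- The rescue step can be sketched as a fault handler. This
- is a toy model under stated assumptions: the heap is a
- dictionary, the live region is a fixed address range, and
- the forwarding table that redirects old addresses is an
- illustrative device, not something the article describes.
-
```python
class PermissionFault(Exception):
    def __init__(self, address: int):
        super().__init__(hex(address))
        self.address = address


class Heap:
    def __init__(self, live_start: int, live_end: int):
        self.live_start, self.live_end = live_start, live_end
        self.next_free = live_start  # next free slot in the live region
        self.cells = {}              # address -> stored object
        self.forwarding = {}         # stranded address -> rescued address
        self.rescues = 0

    def in_live_region(self, address: int) -> bool:
        return self.live_start <= address < self.live_end

    def _checked_read(self, address: int):
        # Models the MMU: anything outside the live region faults.
        if not self.in_live_region(address):
            raise PermissionFault(address)
        return self.cells[address]

    def _rescue(self, address: int) -> int:
        # Models the operating system copying a stranded object
        # into the live region.
        new_address = self.next_free
        self.next_free += 1
        self.cells[new_address] = self.cells.pop(address)
        self.forwarding[address] = new_address
        self.rescues += 1
        return new_address

    def read(self, address: int):
        address = self.forwarding.get(address, address)
        try:
            return self._checked_read(address)
        except PermissionFault as fault:
            return self._checked_read(self._rescue(fault.address))
```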
-
- From the looks of things, you might wonder how any
- memory access ever succeeds. In practice, most accesses
- succeed immediately, from the TLB, and the system has been
- highly optimised to make the most likely cases the least
- time-consuming. The MMU can also reduce power consumption.
- Cycling the fast RAM of the TLB consumes most of the power.
- So, the address (as opposed to permission) segment of the
- TLB is turned on only in the case of a main cache miss.
-
- Boundary-Scan Testing
-
- The ARM600 chip (and all its future siblings) is
- noteworthy in another respect, because Advanced RISC
- Machines has adopted the IEEE's Joint Test Action Group
- (JTAG) standard for testing circuits by boundary-scan
- methods. Modern VLSI chips are becoming too complex to test
- by the old method of prodding their pins with a logic probe.
- They have too many pins that you cannot reach when the chip
- is inserted in a circuit board.
-
- Boundary-scanning involves designing additional
- components into a chip's layout, which will typically use
- about 5 percent of the total silicon area. Each pad on the
- silicon die (which will ultimately be connected to a pin on
- the chip package) is associated with a boundary-scan cell,
- whose main active component is a shift register. An outer
- circuit ring connects all these cells to four pins that make
- up the test access port. Using a special serial protocol, an
- engineer can send signals down the TAP and read out the
- state of the other pins or drive any pin to a desired state.
- The protocol allows the TAPs of several chips to be daisy
- chained, so you can test or drive the pins of a collection
- of chips installed in a circuit board. Boundary scan
- promises to revolutionise the field of circuit testing,
- debugging, and diagnostics.
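-
- The essence of the scheme is a shift register threaded
- past every pad. The sketch below models only that chain:
- it omits the real serial protocol and its state machine,
- and the class and method names are hypothetical.
-
```python
class BoundaryScanChain:
    def __init__(self, pin_names):
        self.pin_names = list(pin_names)
        self.cells = [0] * len(self.pin_names)        # one stage per pad
        self.pins = dict.fromkeys(self.pin_names, 0)  # current pin states

    def capture(self) -> None:
        """Copy the current pin states into the scan cells."""
        self.cells = [self.pins[name] for name in self.pin_names]

    def shift(self, bits_in) -> list:
        """Shift bits in at one end and collect the bits pushed out."""
        bits_out = []
        for bit in bits_in:
            bits_out.append(self.cells[-1])       # last cell leaves first
            self.cells = [bit] + self.cells[:-1]  # everything moves along
        return bits_out

    def update(self) -> None:
        """Drive every pin from the value now held in its scan cell."""
        for name, value in zip(self.pin_names, self.cells):
            self.pins[name] = value
```
-
- Capturing and then shifting reads the pins out serially,
- last pin first, while the same shift loads the values that
- a following update will drive onto the pins.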
-
- The JTAG protocol specifies that each chip using it
- must have a unique manufacturer and device ID implanted in
- it, which means that smart software will be able to
- identify, exercise, and diagnose board-level products in
- their normal working environment. So, the days of worrying
- about whose drive controller chip or universal asynchronous
- receiver/transmitter is used in your clone board may be
- numbered. Advanced RISC Machines is only one of the
- companies to adopt the standard, but it is hoped the whole
- industry will embrace it in the near future.
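-
- Diagnostic software would decode such an ID along these
- lines. The field layout assumed here is the 32-bit
- identification register of IEEE 1149.1, not something
- given in this article, and the sample value in the usage
- note is made up rather than a real ARM device ID.
-
```python
def decode_idcode(idcode: int) -> dict:
    """Split a 32-bit JTAG identification code into its fields.

    Assumed layout (IEEE 1149.1 device identification register):
      bits 31..28  version
      bits 27..12  part number
      bits 11..1   manufacturer identity
      bit  0       always 1
    """
    idcode &= 0xFFFFFFFF
    if idcode & 1 != 1:
        raise ValueError("bit 0 of an identification code must be 1")
    return {
        "version": (idcode >> 28) & 0xF,
        "part_number": (idcode >> 12) & 0xFFFF,
        "manufacturer": (idcode >> 1) & 0x7FF,
    }
```
-
- For the made-up value 0x10600247, this yields version 1,
- part number 0x0600, and manufacturer 0x123.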
-
- The ARM600 chip demonstrates once again that U.K.
- companies can devise leading-edge technology, even if the
- finance has to come from Italy and the U.S. (the City of
- London has abandoned even the pretense of supporting high-
- tech ventures). If Apple builds the ARM600 chip into a
- product, it will be the first design win for a British CPU
- in the volume market.
-
-
- Compiled from BYTE (December 1991) magazine