ARM600: RISC Goes OOP

Apple's investment in Advanced RISC Machines was seen as a purely financial investment, but the ARM600 object-orientated processor is making pundits reassess the move.

The Acorn RISC Machine, one of the world's first (if not the first) commercially available 32-bit RISC processors, designed by the Cambridge, U.K., computer firm Acorn, has gone through three major revisions over the years. The ARM2 and the ARM3 are still the engines for Acorn's Archimedes range of home and educational computers and for its UNIX workstations. In addition, VLSI Technology, the U.S. silicon foundry that fabricated the ARM for Acorn, has been selling the same RISC chip as an embedded controller for drives and graphics accelerators.

Earlier in 1991, Acorn intrigued industry observers by `spinning off' the ownership and development of the ARM architecture into a separate company, called Advanced RISC Machines, which is jointly owned by Acorn/Olivetti, VLSI Technology, and, of all people, Apple. Of all people, because Apple had until then been seen as wholly committed to the Motorola 680x0 chip family. At the time, Apple denied that its ARM involvement was anything but a bit of blue-sky research. Only a few months later came the bombshell: Apple and IBM (of all people) had forged a strategic alliance to develop a future range of computers based on a new object-orientated operating system, and not on Motorola 680x0 processors.

The ARM600 features low power consumption, suitable for battery-powered portables, and a highly innovative on-chip memory management unit (MMU) that provides hardware support for an object-orientated operating system. Interested readers are referred to an article by Apple's Larry Tesler in the September 1991 issue of Scientific American.
The ARM6 Macrocell

Advanced RISC Machines, based in Swaffham Bulbeck, outside Cambridge, inherited most of the team that designed the original ARM chips. The notable exception is Steve Furber, who is now a professor at Manchester University but maintains productive links with the company. The company's strategic aim is to design, though not manufacture, custom processors and application-specific integrated circuits (ASICs) based on its RISC CPU, with a particular focus on the embedded-controller market.

The first step in realising this strategy was to fine-tune the original ARM architecture into what is now called the ARM6 (the last Acorn-originated revision was the ARM3). The ARM6 is not a chip but a macrocell, stored in the Compass CAD system (supplied by VLSI Technology) that Advanced RISC Machines uses to design its chips. This means that the ARM6 is a standard VLSI circuit layout that can be called up and incorporated into larger chip designs, just as you might include a source file in a Pascal program. It is, if you like, an off-the-peg CPU subunit.

The simplest chip incorporating the ARM6 will be called the ARM60, which is just an ARM6 cell with pads added to connect it to the outside world. The ARM60 is compatible with, and functionally equivalent to, the old ARM2 chip: it lacks the on-chip processor cache of the ARM3. Advanced RISC Machines intends to produce a family of chips based around the ARM6 macrocell core, with other components added as required. For example, the ARM600 is an ARM6 cell plus a processor cache and an MMU. Other products will be customised ASICs, which might contain an ARM6 core and, say, D/A converters or serial communications drivers on a single chip. What makes this strategy viable is the simplicity, and the resulting small die size, of the original ARM architecture.
In 1985, Acorn had in mind a 32-bit replacement for the defunct 6502 chip, the mainstay of early personal computers from the Apple II to the Commodore PET to Acorn's own BBC Micro. The goal was high execution speed, fast interrupt response, and low silicon cost, at the expense of fancy instruction types. A purist interpretation of the RISC philosophy resulted in a chip with only 10 types of instructions, no microcode ROM, and only 25,000 transistors. At the time, Motorola was using 200,000 transistors in its 68020. Those original ARM chips, fabricated in 3-micron CMOS technology, would fit on a 7-millimetre die. Six years later, Advanced RISC Machines can have the ARM6 fabricated in 1-micron CMOS, where it occupies a silicon area of about 2.8 mm square. That is small enough to be tucked away neatly in one corner of an average-size die, leaving plenty of room for custom circuitry.

The ARM6 can be clocked at 20 to 25 MHz, compared with the 5 MHz of the original ARM chips, giving a performance of over 20 million instructions per second (MIPS). The ARM6 is designed to be fabricated using fully static CMOS devices. You can therefore slow down or even stop the clock during any phase of its operation and then restart it with no loss of data. When stopped, the processor consumes only a few microamperes, due to residual leakage in the transistors. A system designer can stop the clock whenever the CPU is idling and thus reduce power consumption substantially. This feature will make ARM6-based chips suitable for hand-held, battery-powered portable computers. Even though the ARM6 is by no means the fastest RISC design nowadays, it offers substantially better MIPS-to-watt and MIPS-to-dollar ratios than its rivals. To prepare for the future, Advanced RISC Machines has enhanced the ARM6 architecture slightly while maintaining code compatibility with earlier ARM chips.
The most significant difference between the original ARM chips and the ARM6 is that the ARM6 has a full 32-bit address bus, while the original ARM chips use only 26 bits. Two extra signal lines, called PROG32 and DATA32, have been added to switch between three operating configurations:

- 32-bit program and 32-bit data;
- 26-bit program and 32-bit data;
- 26-bit program and 26-bit data, to run old ARM programs.

One consequence of longer addresses is that the flags can no longer be held in the top 6 bits of the program counter. A separate current program status register has therefore been added, making 17 rather than 16 registers visible to the programmer. Another new signal, called BIGEND, toggles between `big-endian' and `little-endian' byte ordering. Is this the price of admission to a cosmopolitan club that includes Apple and IBM?

Other subtle modifications improve the way the ARM6 works in virtual memory systems, correcting deficiencies in the old ARM chips. A virtual memory system is one in which disk space (or other external storage) is employed to extend the computer's apparent memory capacity. Suppose the processor tries to read a word from a virtual address whose data is not currently in physical memory, because it is actually stored on disk. A hardware exception, usually called a page fault, then occurs. This exception causes the processor to jump to a software service routine. The routine reads the missing page from disk into memory, then returns and repeats the interrupted instruction to read the now-present data. Unlike the original ARM chips, the ARM6 can handle exceptions (e.g., page faults) in both supervisor mode and user mode. This is essential for running fault-driven virtual memory operating systems.

Another refinement is only half implemented. Older ARM chips could accept a Data Abort signal caused by a page fault only if they received it during the first half of a memory cycle.
This put rather stringent demands on cache timings. Within that half cycle, a cache lookup has to be completed before you can look into external memory, discover that the data doesn't exist, and issue an Abort signal. The ARM6 can accept an Abort signal right up to the end of a memory cycle during single data transfers (LDR and STR instructions). In so doing, however, it forces the Abort service routine to perform extra cleanup operations. The ARM6 does not yet allow you to exploit this feature to relax cache timings, but future versions will. To prepare programmers for its coming, the ARM6 has a LATEABT signal line that, when active, simulates late Aborts. You can incorporate the extra cleanup code now so that your programs will execute correctly on the next generation of chips. Pulling LATEABT low makes the ARM6 compatible with earlier chips.

The ARM600 Processor

The ARM600 CPU is made up of an ARM6 core surrounded by three extra on-chip functional units: the processor cache, the write buffer, and the MMU. Like the old ARM3 chip, the ARM600 has a 4KB on-chip cache that can hold both data and instructions. The cache contains 256 lines of 4 words (16 bytes), organised into four blocks of 64 lines; that is, it's a 64-way set-associative cache. As with most modern RAM designs, this cache employs differential sense amplifiers to minimise its cycle time, but these analog devices consume a lot of power. To reduce power consumption, the ARM600 switches off the sense amps after the first access during sequential cache accesses. It is a nice detail.

The write buffer provides a way to further improve CPU throughput without forcing you to use expensive fast memory. The write-through cache writes into this on-chip buffer rather than directly into external memory. The buffer has room for two write operations of 8 words. It completes the writes to external memory in its own time, leaving the RISC core free to execute the next instruction.
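The following is a minimal Python sketch of a tag-only cache that makes this geometry concrete. It has 256 lines of 16 bytes, arranged as four blocks (sets) of 64 lines. Two details are assumptions for illustration, because the article does not give them: how an address is split into offset, block index, and tag, and the random choice of a victim line on a miss. `ToyCache` and its methods are hypothetical names.

```python
import random

LINE_BYTES = 16      # 4 words of 4 bytes per line
NUM_BLOCKS = 4       # "four blocks of 64 lines": each block acts as one set
WAYS = 64            # lines per block, hence 64-way set associative


class ToyCache:
    """Tag-only model of a 4 KB, 64-way set-associative cache (no data stored)."""

    def __init__(self, seed: int = 0):
        self.blocks = [[None] * WAYS for _ in range(NUM_BLOCKS)]
        self.rng = random.Random(seed)   # random replacement is an assumption
        self.hits = 0
        self.misses = 0

    @staticmethod
    def split(addr: int):
        """Assumed address split into (tag, block index, byte offset in line)."""
        offset = addr % LINE_BYTES
        block = (addr // LINE_BYTES) % NUM_BLOCKS
        tag = addr // (LINE_BYTES * NUM_BLOCKS)
        return tag, block, offset

    def access(self, addr: int) -> bool:
        """Look up addr. On a miss, allocate a line. Returns True on a hit."""
        tag, block, _ = self.split(addr)
        lines = self.blocks[block]
        if tag in lines:                 # the tag may sit in any of the 64 lines
            self.hits += 1
            return True
        self.misses += 1
        # Use an empty line if one is left, otherwise evict a random victim.
        victim = lines.index(None) if None in lines else self.rng.randrange(WAYS)
        lines[victim] = tag
        return False
```

Two sweeps through 4KB of consecutive words behave differently. The first pass misses once per 16-byte line, giving 256 misses. The second pass hits on every access, because the sweep exactly fills the 4 × 64 lines.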
By far the most intriguing aspect of the ARM600 chip is the MMU. It provides a fairly conventional virtual memory controller but a radical scheme for partitioning memory on object-orientated lines. The MMU translates virtual addresses generated by the CPU into physical addresses, and it also controls memory access permissions. The virtual memory scheme works through translation tables stored in physical memory, and these table entries are cached in an on-chip translation lookaside buffer (TLB). The virtual address space can be mapped in two ways: in 1MB sections, which require only a one-level table lookup, or in pages, which require a two-level lookup. The ARM600 MMU supports two page sizes: small pages of 4KB and large pages of 64KB. Large pages allow a single table entry to map a large data object, which helps keep the translation tables small.

When the CPU requests a memory access, the MMU's access-control logic first looks in the TLB. It checks whether a translation for the virtual address exists among the 32 cached entries. A TLB `hit' means the translation is already in the TLB, which should be the case for most accesses. On a hit, the access-control logic checks whether the access is permitted. If it is, the physical address is output immediately.

If the TLB `misses', the MMU computes an index into the translation table, offset from an address held in the on-chip translation table base register. If this translation-table entry is for a section, it contains the base address of the section. That base is combined with an index taken from the virtual address to give the physical address. If the entry is for a page, it contains the base address of another table, the page table, and a second lookup is required to get the physical address. In both cases, permission is checked before the access can proceed. The TLB is then updated with the resulting physical address, overwriting an existing entry.
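The Python sketch below shows this miss path: one lookup for a section, two for a page. The article gives no descriptor formats, so the encodings used here are assumptions modelled on later ARM documentation:

- The low two bits of each entry select section, page table, large page, or small page.
- The first-level index comes from virtual address bits 31..20.
- The second-level index comes from bits 19..12.

Permission checking is omitted. The round-robin TLB replacement is also an assumption, and `ToyMMU` is a hypothetical name.

```python
SECTION = 1 << 20          # 1 MB section
LARGE_PAGE = 64 << 10      # 64 KB large page
SMALL_PAGE = 4 << 10       # 4 KB small page
TLB_ENTRIES = 32


class TranslationFault(Exception):
    pass


class ToyMMU:
    """Section/page translation with a 32-entry TLB. Permissions are omitted.

    Assumed encodings (borrowed from later ARM documentation):
      level 1, bits 1..0: 0b10 = section, 0b01 = pointer to a page table
      level 2, bits 1..0: 0b01 = large page, 0b10 = small page
    """

    def __init__(self, memory: dict, ttb: int):
        self.mem = memory        # physical address -> 32-bit word (missing = 0)
        self.ttb = ttb           # translation table base register
        self.tlb = []            # cached (virtual base, size, physical base)
        self.victim = 0          # round-robin replacement pointer (assumed)
        self.walks = 0           # number of table walks performed

    def translate(self, va: int) -> int:
        for vbase, size, pbase in self.tlb:
            if vbase <= va < vbase + size:          # TLB hit: no table walk
                return pbase + (va - vbase)
        self.walks += 1
        l1 = self.mem.get(self.ttb + (va >> 20) * 4, 0)   # index = VA bits 31..20
        if (l1 & 3) == 0b10:                        # section: one-level lookup
            size, pbase = SECTION, l1 & 0xFFF00000
        elif (l1 & 3) == 0b01:                      # page: second-level lookup
            table = l1 & 0xFFFFFC00
            l2 = self.mem.get(table + ((va >> 12) & 0xFF) * 4, 0)
            if (l2 & 3) == 0b01:
                size, pbase = LARGE_PAGE, l2 & 0xFFFF0000
            elif (l2 & 3) == 0b10:
                size, pbase = SMALL_PAGE, l2 & 0xFFFFF000
            else:
                raise TranslationFault(hex(va))
        else:
            raise TranslationFault(hex(va))
        vbase = va & ~(size - 1)
        entry = (vbase, size, pbase)
        if len(self.tlb) < TLB_ENTRIES:
            self.tlb.append(entry)
        else:                                       # overwrite an existing entry
            self.tlb[self.victim] = entry
            self.victim = (self.victim + 1) % TLB_ENTRIES
        return pbase + (va - vbase)
```

Once one translation for a 64KB large page is in the TLB, any other address in that page is translated without a table walk. This is the table-size saving that large pages are meant to give.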
The ARM designers have incorporated a neat trick into the MMU to allow more efficient `table walking' (i.e., traversing chains of indirect pointers). The MMU traps all data accesses that are not aligned on word boundaries, raising a hardware alignment fault. This does not apply to instruction fetches or to byte-accessing instructions. Because table entries are always word aligned, the bottom 2 bits of a valid table address should always be 0. You can mark the end of a chain of pointers with an address whose bottom 2 bits are not 0. The end of the table is then detected by the alignment fault that this address produces. This eliminates the time-consuming check for the end of a table after each link traversal.

Object-orientated Memory

The virtual memory functions of the ARM600 MMU are sophisticated but conventional; the access-permission function, however, is something rather new. Permissions are mapped separately from virtual addresses, and they can be manipulated independently of them. Additionally, address faults (e.g., page faults) are handled separately from permission faults, via different hardware signals.

The MMU maps permissions using domains, each of which is a contiguous area of virtual memory. Domains are quite distinct from sections and pages, which are just the units in which a domain's virtual memory is managed. There are 16 such domains, and each one has a 2-bit field in the MMU's domain access control register to define its access type. These bits classify each program that uses a domain as either a client (a user of the domain) or a manager (a controller of the domain). Clients always have their access permissions checked, and a domain fault is raised if the permissions are not valid. Managers are not checked and have unrestricted access to their domain. A manager can define the permissions for its own domain and give different permissions to different clients.
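The two mechanisms in this passage can be illustrated with small Python sketches.

- **Alignment-fault trick.** `load_word` raises an exception for any address whose bottom two bits are non-zero. A pointer chain terminated by such a value therefore needs no end-of-chain comparison in the walking loop. In hardware the check costs nothing; in this simulation it is merely moved into `load_word`.
- **Domain access types.** The second sketch packs 2-bit access types for 16 domains into one 32-bit word and then checks an access.

Several details are assumptions. The encodings 00 = no access, 01 = client and 11 = manager follow later ARM documentation. The `permitted` flag stands in for the section or page permission bits. All function and class names are hypothetical.

```python
# --- Sketch 1: ending a pointer chain with a misaligned "address" ---

class AlignmentFault(Exception):
    """Simulates the MMU trap on a word load from a non-word-aligned address."""

END_OF_CHAIN = 0x1          # any value with non-zero bottom two bits works

def load_word(memory: dict, addr: int) -> int:
    if addr & 3:            # free in hardware; simulated here
        raise AlignmentFault(hex(addr))
    return memory[addr]

def walk_chain(memory: dict, head: int) -> list:
    """Collect node values; each node is assumed to be [next_pointer, value]."""
    values = []
    node = head
    try:
        while True:                         # no explicit end-of-chain test
            next_node = load_word(memory, node)
            values.append(load_word(memory, node + 4))
            node = next_node
    except AlignmentFault:                  # reached the misaligned end marker
        return values

# --- Sketch 2: per-domain client/manager access types ---

NO_ACCESS, CLIENT, MANAGER = 0b00, 0b01, 0b11   # assumed 2-bit encodings

class AccessFault(Exception):
    pass

def set_domain(dacr: int, domain: int, kind: int) -> int:
    """Return the 32-bit register value with one domain's 2-bit field replaced."""
    if not 0 <= domain < 16:
        raise ValueError("there are only 16 domains")
    shift = 2 * domain
    return (dacr & ~(0b11 << shift)) | (kind << shift)

def check_access(dacr: int, domain: int, permitted: bool) -> None:
    """permitted: hypothetical stand-in for the section/page permission bits."""
    kind = (dacr >> (2 * domain)) & 0b11
    if kind == MANAGER:
        return                              # managers are never checked
    if kind == CLIENT and permitted:
        return                              # clients pass only if permitted
    raise AccessFault(f"domain {domain}")
```

Giving each task its own packed word is one way to picture the per-task `environment' that the article goes on to describe.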
A client task may have a different set of permissions for each domain it uses, and this set of permissions is called the task's environment. This may all sound rather puzzling, adding complications to an already far-from-simple system. The purpose of permission mapping only starts to become clear when you think about the requirements of a truly object-orientated operating system. In an object-orientated programming system, all data is encapsulated in coherent memory objects. These objects must be manipulated by particular programs, called the methods of the class to which the object belongs. It is not difficult to see how hardware-enforced client permissions could protect objects from access by methods other than their own. Domains would be used to distinguish types of object. In effect, they would be a hardware expression of classes.

Many object-orientated programming researchers favour a secondary-storage scheme called a persistent object store. Objects become entities with their own life span, independent of the invocation of a particular application. When you are not using them, they live on disk, and when you are using them, they live in memory. The transition between these two states is transparent. If you need to access an object that is not in memory, the store brings it into memory without your having to issue a load command or type a filename. In fact, the concept of a file, and the distinction between files and memory variables, disappears altogether. Some PC application systems, such as Borland's VROOM, possess a persistent aspect (i.e., for code, not data), but real persistence should be a property of the operating system itself. The ARM600 MMU can offer hardware support for such a persistent object-orientated operating system by combining its virtual memory and permission-mapping roles.
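The transparent loading described above is itself fault-driven. A minimal Python sketch makes the control flow visible:

1. An access to a non-resident object raises a fault.
2. A handler copies the object in from a dictionary that stands in for disk.
3. The access is repeated.

`PersistentStore`, `ObjectFault` and the method names are hypothetical. In the scheme the article outlines, the fault would come from the ARM600 MMU rather than from a dictionary lookup.

```python
class ObjectFault(Exception):
    """Stands in for the address fault raised when an object is not resident."""


class PersistentStore:
    """Toy persistent object store: objects stay on 'disk' until touched."""

    def __init__(self, disk: dict):
        self.disk = disk          # object id -> stored state ('disk')
        self.resident = {}        # object id -> in-memory state
        self.faults = 0

    def _load(self, oid):
        if oid not in self.resident:
            raise ObjectFault(oid)
        return self.resident[oid]

    def get(self, oid):
        """Access an object; a fault pages it in and the access is retried."""
        try:
            return self._load(oid)
        except ObjectFault:
            self.faults += 1
            self.resident[oid] = self.disk[oid]   # the 'page-in' step
            return self._load(oid)                # repeat the interrupted access

    def evict(self, oid):
        """Write an object back to 'disk' and drop it from memory."""
        self.disk[oid] = self.resident.pop(oid)
```

The caller never issues a load command. It simply calls `get`, and only the fault counter reveals that an object had to be brought in.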
A persistent object-orientated operating system of this kind would be fault-driven: the raising and subsequent correction of address, permission, and domain faults is what drives your computation. There is more, however. In such a system, garbage collection becomes an important issue. Any system that permits dynamic (i.e., run-time) allocation of memory has a heap that fills up with objects that are no longer being used. Without garbage collection, the system would grind to a halt when no new objects could be created (or paged in). Traditional garbage-collection algorithms involve temporarily halting the system and scanning through memory to release space occupied by `dead' objects. You will be familiar with this annoying interruption if you've done serious programming in Lisp or Smalltalk.

The latest algorithms advocate concurrent garbage collection. A garbage-collection manager runs as a background task, reclaiming dead objects as soon as they become inaccessible (i.e., when nothing else in the system points to them). This is preferable in interactive, real-time systems with GUIs, where a sudden freezing of the system would be most disconcerting. One way of implementing concurrent garbage collection is to divide memory into two regions: a live region, where computation occurs, and a dead region, where garbage collection is continually proceeding. One technique is to make the live region a narrow window that moves cyclically through the memory space, leaving a dead region behind it. The problem is that some live objects are likely to be left stranded in the dead region, still pointed to by objects in the live region. If the algorithm is correctly tuned, there will be only a few. Such stranded objects have illegal addresses and must not be accessed (or even referred to). A permission-mapping MMU like the ARM600's can prevent such accesses by raising a permission fault.
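The stranded-object problem can be modelled in a few lines of Python, together with the rescue-by-copying that the operating system performs when the resulting fault occurs. This is a hypothetical simplification:

- The live window only moves forward rather than cyclically.
- The collector's reclamation of dead space is not modelled.
- The `permission fault' is a range check that raises `StrandedAccess`, not a real MMU trap.

```python
class StrandedAccess(Exception):
    """Stands in for the permission fault the MMU would raise."""


class ToyHeap:
    """Heap with a live window [live_lo, alloc); lower addresses are 'dead'."""

    def __init__(self):
        self.cells = {}      # address -> object payload
        self.live_lo = 0     # start of the live window
        self.alloc = 0       # next free address, always inside the live window
        self.rescued = 0

    def new(self, payload) -> int:
        addr = self.alloc
        self.cells[addr] = payload
        self.alloc += 1
        return addr

    def advance_window(self, new_lo: int) -> None:
        """Move the window forward, leaving older objects in the dead region."""
        self.live_lo = new_lo

    def _checked_load(self, addr: int):
        if not self.live_lo <= addr < self.alloc:
            raise StrandedAccess(addr)       # the simulated 'permission fault'
        return self.cells[addr]

    def deref(self, holder: dict, field: str):
        """Follow holder[field]; on a fault, rescue the object and retry."""
        try:
            return self._checked_load(holder[field])
        except StrandedAccess:
            copy = self.new(self.cells[holder[field]])   # copy into live region
            holder[field] = copy                          # redirect the pointer
            self.rescued += 1
            return self._checked_load(copy)              # repeat the access
```

Only the first access after the window has moved past an object pays for the rescue. Later accesses go through the redirected pointer and succeed directly.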
To recover from a permission fault on a stranded object, the operating system rescues the object by copying it into the live region, where the requested access can proceed normally. The ARM600 chip's domain system suggests a separate, concurrent manager task to collect garbage in each domain. It also makes it possible to use a second chip as a parallel garbage-collection processor. From the looks of things, you might wonder how any memory access ever succeeds. In practice, most accesses succeed immediately, from the TLB, and the system has been highly optimised to make the most likely cases the least time-consuming. The MMU can also reduce power consumption. Cycling the fast RAM of the TLB consumes most of the power. The address segment of the TLB, as opposed to the permission segment, is therefore turned on only in the case of a main cache miss.

Boundary-Scan Testing

The ARM600 chip (and all its future siblings) is noteworthy in another respect. Advanced RISC Machines has adopted the IEEE's Joint Test Action Group (JTAG) standard for testing circuits by boundary-scan methods. Modern VLSI chips are becoming too complex to test by the old method of prodding their pins with a logic probe. They have too many pins that you cannot reach when the chip is inserted in a circuit board. Boundary scanning involves designing additional components into a chip's layout, which will typically use about 5 percent of the total silicon area. Each pad on the silicon die, which will ultimately be connected to a pin on the chip package, is associated with a boundary-scan cell. The main active component of each cell is a shift register. An outer circuit ring connects all these cells to four pins that make up the test access port (TAP). Using a special serial protocol, an engineer can send signals down the TAP. The engineer can then read out the state of the chip's other pins or drive any of them to a desired state.
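The following is a minimal Python sketch of the boundary-scan idea. Each chip holds one scan cell per pad, and three operations act on the cells:

- `capture` samples the pads into the cells.
- `shift` clocks bits serially through the cells.
- `update` drives the pads from the cells.

The sketch also chains two chips, since the protocol allows TAPs to be daisy-chained. Real IEEE 1149.1 devices sequence these steps through a TAP state machine controlled by the TMS and TCK pins, with data entering on TDI and leaving on TDO. That state machine is not modelled here, and the class and function names are hypothetical.

```python
class BoundaryScanChip:
    """Toy model of one chip's boundary-scan register: one cell per pad."""

    def __init__(self, n_pads: int):
        self.pins = [0] * n_pads     # logic level currently on each pad
        self.cells = [0] * n_pads    # the shift-register stages

    def capture(self) -> None:
        self.cells = list(self.pins)     # sample every pad in parallel

    def update(self) -> None:
        self.pins = list(self.cells)     # drive every pad from its cell


def shift(chips: list, tdi_bits: list) -> list:
    """Clock bits through a daisy chain of chips; return the bits seen at TDO.

    Each clock, a bit enters cell 0 of chips[0]. Every chip passes its last
    cell on to the next chip, and the last chip's last cell appears at TDO.
    """
    tdo = []
    for bit in tdi_bits:
        carry = bit
        for chip in chips:
            out = chip.cells[-1]                 # old value leaves this chip
            chip.cells = [carry] + chip.cells[:-1]
            carry = out
        tdo.append(carry)
    return tdo
```

Suppose the first chip's pads read [1, 0, 1] and the second chip's read [0, 1, 1]. After both chips capture, shifting six bits produces the second chip's cells at TDO first, last pad first: [1, 1, 0, 1, 0, 1]. Shifting a chosen pattern in and then calling `update` drives the pins instead.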
The protocol allows the TAPs of several chips to be daisy-chained, so you can test or drive the pins of a collection of chips installed in a circuit board. Boundary scan promises to revolutionise the field of circuit testing, debugging, and diagnostics. The JTAG protocol specifies that each chip using it must have a unique manufacturer and device ID implanted in it. Smart software will therefore be able to identify, exercise, and diagnose board-level products in their normal working environment. So the days of worrying about whose drive controller chip or universal asynchronous receiver/transmitter is used in your clone board may be numbered. Advanced RISC Machines is only one of the companies to adopt the standard, but it is hoped the whole industry will embrace it in the near future.

The ARM600 chip demonstrates once again that U.K. companies can devise leading-edge technology, even if the finance has to come from Italy and the U.S. (the City of London has abandoned even the pretense of supporting high-tech ventures). If Apple builds the ARM600 chip into a product, it will be the first design win for a British CPU in the volume market.

Compiled from BYTE (December 1991) magazine