Source Code 1992 March

home *** CD-ROM | disk | FTP | other *** search

/ Source Code 1992 March / Source_Code_CD-ROM_Walnut_Creek_March_1992.iso / usenet / altsrcs / 1 / 1229 / scbdoc

Wrap

Text File | 1990-12-28 | 25.8 KB | 686 lines

.nr LL 8i .ND September 13, 1985 .RP .TL A MULTIUSER BENCHMARK .br TO COMPARE .br UNIX SYSTEMS .AU Philip M. Mills .AI NCR Corporation 3325 Platt Springs Road West Columbia, SC 29169 phone 803 791 6377 .2C .SH Unix Benchmarks Inadequate .PP Despite all the evaluation of Unix systems using benchmarks, no readily available benchmark today provides an accurate estimate of a multiuser Unix system's overall processing effectiveness. Published articles showing benchmark results as well as the results from executing benchmarks, typically provide tables of data containing measurements of different parts of the system, leaving one with the job of interpreting the results as best they can. What one really wants to know is will a set of application programs execute faster on one Unix system compared to another Unix system. .PP The problem with the results provided by most benchmarks is that one cannot accurately infer which system can execute a set of application programs the best for the following reasons: .IP (1) The benchmark does not contain a workload that provided an adequate simulation of a Unix multiuser application environment. .IP (2) The benchmark ignores the system's ability to overlap operations. .IP (3) One is unable to correlate the time to perform simple operations with the system's overall ability to process work. .IP (4) The benchmark cannot evaluate Unix systems with multiple main processors. .PP To overcome the difficulties just described, a self-measuring portable multiuser multiprocessor unix benchmark has been developed. This benchmark provides: .IP (1) A reasonable approximation to a multiuser multiprocessor unix application environment. .IP (2) A single measure of the overall system's processing effectiveness. .IP (3) An easy way to compare the relative overall system's processing effectiveness between systems. .IP (4) A method of relating application requirements to the benchmark results. .SH Defining Performance .PP Whether buying computers or using them, one of the qualities of major importance is performance. Since performance may be inadvertently confused with other attributes of a computer system it is useful to describe performance as it applies to computers and other mechanisms. .PP Performance may be described as the result of carrying out a set of actions in a specific environment to satisfy an objective or goal. In track and field each event has a different environment and goal and the results may be running a distance in some amount of time or jumping so many feet. In performance the results of carrying out the prescribed set of actions is usually described in terms of measurable quantities. .PP There are two main results that are important to measure in a Unix based computer system: responsiveness and throughput. Responsiveness measures the system's ability to respond to a work request in a given amount of time. Throughput measures the total amount of work performed over a given period of time. The work on a computer system consists of the actions of electronic and electro-mechanical processing. Performance results are often measured by executing a given workload and measuring the time spent processing the work. .SH Functionality .PP A computer has three major areas of functionality that a workload should contain. The first is the processing by the central processing unit (CPU) where all the calculations are performed; the second is the disk subsystem where the computer stores data files; the third is serial I/O subsystem which controls such items as video displays, terminals and printers. .SH Overlapped Operations .PP In a multiuser Unix system there are many user processes or programs executing at the same time. This type of processing allows the I/O processing of some user programs to be overlapped with the CPU processing of other user programs. The result of overlapping the I/O operations with CPU operations depends on the amount of intelligence built into the I/O processors. .PP An intelligent serial I/O subsystem may handle interrupts from multiple on-line users, provide a slave serial I/O processor, handle buffering for both input and output and may even execute parts of the Unix kernel's tty code. Besides providing overlapped serial I/O operations for the terminals and printers with the main CPU processing, the intelligent serial I/O subsystem may also greatly reduce the amount of processing required by the main CPU. The reason for this reduction is that functions or processing that use to be done by the main CPU are now done by the serial I/O subsystem. .PP An intelligent disk subsystem will have one or more disk controllers and disk drives and may also have a slave processor. The disk subsystem handles the control of the disk drive operations, handles interrupts, handles buffering and provides a high speed DMA data transfer in and out of main memory. If a slave processor is available it may be used to execute some of the Unix kernel file code. Intelligent disk subsystems provide overlapped operations with the main CPU and also reduces the load on the main processor. In some disk subsystems, operations on separate disk drives can also be overlapped with each other producing a much higher effective transfer rate for the disk subsystem. .SH Overlapped Performance .PP The effects of overlapped operations are extremely important to computer system performance. This point is illustrated by the following example. Assume a workload is developed that contains 100 seconds of CPU processing, 100 seconds of disk file processing and 100 seconds of terminal I/O processing on computer system A and also on computer system B. System A has intelligent disk and tty subsystems which can overlap operations with the main CPU and each other and system A also supports the common Unix multiuser mode. Therefore system A can execute the workload in 100 seconds. Although system B has the same CPU power it cannot overlap I/O operations. System B will take 300 seconds of real time to execute the workload. For the workload described system A is three times more powerful than system B in that it will have a throughput rate three times that of system B. Another way of putting it is that for a fixed amount of time system A will accomplish three times as much work as system B. .PP Although the example was developed to illustrate a point, a typical unix workload contains a significant amount of disk and serial I/O and the overlapping of these operations by the hardware/software can make a substantial improvement in performance. .SH Application Benchmarks .PP A sound approach to evaluating the performance of various computer systems is to take the application or sets of application programs that make up the user environment and execute them on each computer system to be evaluated. It is most important that the set of application programs be truly representative of the overall user environment. This approach is often not feasible because the set of applications do not exist or it would take too much manpower to port the application programs to run on each computer system being evaluated. Also it may take too much time to execute a large set of applications on a number of computer systems. .SH Portable Benchmarks .PP For the reasons just given portable benchmarks that are easy to obtain and execute in a relatively short period of time are often used to simulate the application program environment. These benchmarks should measure the results needed in performance evaluation. The most important results that should be provided are measurements of throughput and responsiveness. .PP A well designed general purpose benchmark used to simulate the workload of the Unix application environment should have the following features: .IP (1) A broad weighted mix of basic CPU operations used with different data referencing modes. .IP (2) A number of user processes executing in a multiprocess mode. .IP (3) Terminal I/O taking place with a number of terminals at the same time. .IP (4) Disk I/O taking place with a number of files scattered over the disk drive surface to cause head movement. The benchmark should have the ability to use multiple disk drives. .IP (5) Results that can provide an accurate measure of the computer system's ability to execute work over time. The workload producing the results should contain all the processing operations on the system so the system's ability to overlap operations will be taken into account. .IP (6) Benchmark should be portable across various Unix systems. .SH Evaluating Existing Benchmarks .PP Before developing a new benchmark an evaluation of existing available benchmarks was performed. The benchmarks are of two types. The first type evaluates the time to perform a set of some simple basic operations such as add integer by placing each operation in a loop and measuring the time to complete the loop. The second type of benchmark consists of one or more programs that are designed to be appropriate for a benchmark. These programs are run and the time to execute the programs is measured. .PP The AIM Benchmark Suite II is a well known benchmark of the first type used to evaluate Unix computer system's performance (see Vol 11 No. 3 Page 89 for an example). Given the objectives of providing an accurate estimate of throughput or responsiveness through the simulation of a Unix multiuser application environment this benchmark did not meet our goals for the following reasons: .IP (1) The benchmark failed to include the basic operations for comparisons, logical, bit and branch operations which find significant use in the Unix environment. .IP (2) The benchmark uses mostly register variables. Where data references across the various basic operations, such as, automatic, static, pointer and array references are commonly used in the Unix environment. Array referencing is included as a separate test. .IP (3) The tty test only involves one or two terminals. The use of multiple tty lines all active at the same time represents the actual user environment and its need can best be illustrated by an example. Assume the tty peripheral controller can handle 3200 characters per second total on output and it is connected to 8 tty lines at 9600 baud. For a single user the tty controller can maintain 960 characters per second to that user terminal. However, for 8 heavily active users the tty controller can only maintain an average of only 400 characters per second per tty line. For certain workloads this limitation will affect the overall throughput of the computer system. .IP (4) The disk test consists of reading, writing and copying a single file. In a multiuser Unix environment a number of separate file disk I/O requests are made which are scattered randomly over the disk drive surface. This sequence of random disk I/O requests requires significant disk head movement. .IP (5) Unix computer systems are often purchased with multiple disk drives. It is therefore necessary to be able to include in the benchmark multiple disk requests at the same time for all the disk drives in the system. It is important to evaluate the system in this mode because it more accurately simulates the user environment. Many systems are able to overlap some disk I/O operations between drives as well as with the main CPU. The AIM benchmark did not provide this capability. .IP (6) There is no way to correlate the point system used by the AIM benchmark in a multiuser environment with throughput or responsiveness. The AIM point system cannot take into account the overlapped operations of tty and disk I/O and the reduction in the main CPU processing time due to the intelligent I/O processors. Even a simple floppy disk controller can overlap operations with the main CPU. .PP Although the AIM Benchmark Suite II was developed to execute in a single user mode it has been used extensively to evaluate multiuser Unix systems. For our purposes of evaluating Unix multiuser systems the AIM benchmark does not provide an adequate simulation of the Unix user environment, and second, does not provide an accurate measure of the Unix computer system's ability to process work. .PP The AIM benchmark does provide information that is quite useful in tuning unix systems. This information can best be used (and is) by organizations marketing Unix computer systems who are able to modify the hardware and unix software to improve performance. .PP The Dhrystone is another benchmark which provides a good mix of statement types, operators and data reference types but it does not provide any I/O operations which are needed for the multiple tty lines and multiple disk drives used by the Unix multiuser environment. .SH Throughput Vs. Responsiveness .PP In the development of the benchmark a decision had to be made to measure throughput or responsiveness. Although response time is a very important performance measure in an interactive Unix multiuser system it requires the use of a remote terminal emulator computer to provide tty inputs to obtain repeatable results. .PP Because responsiveness and throughput are both very much related to a computer system's ability to process work the relationship between the two was investigated. The results of this investigation showed that a 20% increase in throughput would provide a 20% or greater improvement in response time. How much greater than 20% the improvement in response time would be depends on the ratio of the time to process an interactive user request to the average user think time. If the think time goes to zero, throughput and responsiveness vary the same from system to system for a given workload. .PP In an interactive system like Unix the user think time tends to create idle time on the various system components. Given that the user is not a part of the computer system that is to be purchased, throughput is really the best measure of the computer system's ability to process work. For the reasons just stated the benchmark measures throughput as a result of executing a particular workload. .SH The System Characterization Benchmark .PP The purpose of the benchmark is to compare Unix systems on their ability to do work (throughput) using a simulated Unix multiuser environment with multiple tty lines, multiple disk drives and one or more main processors. For a fair comparison each computer system should execute the same workload with the same number of tty lines and the same number of disk drives. .PP The application requirements for a computer system will vary in each environment such as word processing, accounting, graphics, data base management, spread sheet, compilers and scientific computations. For this reason the benchmark (SCB) executes a number of different workloads each with a different mix of I/O operations. Because of the many different workloads used by the benchmark, it is more of a system characterization than a single benchmark. For this reason the benchmark is called the System Characterization Benchmark (SCB). .PP The SCB has the following characteristics: .IP (1) The SCB measures the rate (throughput) at which fixed units of work can be executed in a unit of time (minutes). .IP (2) The SCB takes into account the effects of all overlapped operations executed in parallel, including multiple main processor units, if they exist. .IP (3) The SCB simulates a multiuser Unix application environment with .RS .IP (a) A broad mix of main processor operations and data references .IP (b) Multiple tty lines with terminals .IP (c) Multiple disk drives each with its own set of data files .RE .IP (4) The SCB consists of 50 different runs to provide a system characterization that will cover the many different application environments. .IP (5) The SCB provides graphical output for a quick analysis and comparisons on the basis of throughput. .IP (6) The SCB provides information on the effective rates of the main central processing unit (CPU), the disk subsystem and the tty subsystem. .PP The first part of the SCB consists of executing three programs crun, drun and trun to produce the report shown in Figure 1 that shows the effective capabilities of the one or more main CPU , the disk subsystem and the tty subsystems respectively. .PP The purpose of crun is to obtain the relative processing power of the one or more main CPUs. The crun program also provides for multiprocessor systems an estimate of the ratio of the multiprocessor's throughput to a single processor's throughput. If there is only one CPU this ratio is one. .PP The crun code consists of a weighted mix of basic operations on various types of data references. The frequency of execution of the basic operations involving the different types of data references are weighted according to an internal unpublished compiler study of Unix based C code. Detail information on the mix of operations and data references is contained in the SCB documentation. The time to execute the crun code is measured and used to determine the user CPU time per work unit process. .PP In setting up the SCB each disk drive specified by the user has 200 files of 20K bytes each placed on it. In the execution of drun the user may specify the percent of files to be read and written on each disk drive. The disk program drun produces the information on the disk subsystem in the subsystem report by reading and writing file blocks randomly over the different disk drive surfaces specified. .PP The serial I/O subsystem is usually made up of one or more tty controllers where each tty controller handles a fixed number of tty lines/terminals. The tty controller typically can handle some maximum level of characters per second for tty output and input across all lines. The information in the subsystem report is obtained by writing 32 character lines to all tty ports as fast as the tty controllers can handle them. .PP The second part of the SCB consists of 50 separate runs controlled by a shell script. Each run is made up of 200 work units. The user processes execute 8 or some setable multiuser level at a time. Each work unit consists of the following components. .IP (1) The execution of a copy of xrun code with a CPU user time proportional to the system's crun time. .IP (2) A given number of 1K disk block reads and writes. .IP (3) A given number of 32 character line writes to a tty line/terminal. .PP Each work unit process in the run is identical. The disk and tty requests are equally spaced in the CPU user time to provide more consistent and repeatable results. Each work unit has its own unique disk read file and write file. The disk files are preassigned to each work unit at random. A work unit process has a unique tty line during its execution. On completion the tty line is given to the next work unit process started. .PP The 50 runs made by a shell script uses all the combinations of the following disk and tty levels: .IP disk blocks 0, 4, 8, 12, 16, 20, 24, 28, 32, 36 .IP tty lines 0, 30, 60, 90, 120 .PP The disk operations are divided evenly between reads and writes of existing files. The work units executed per minute are calculated for each of the 50 runs and the results are presented in a graphical form. .SH The Results Of The SCB Benchmark .PP The results of executing the SCB are presented in the graphical form shown in Figure 2. The left axis represents the number of work units that can be executed per minute (throughput). The bottom axis is the number of disk requests in a unit of work process. .PP The letters on the graph (A B C E F) represent the different levels of tty line writes contained in each work unit process. .LD Letter Number of tty line writes A 0 B 30 C 60 D 90 E 120 .DE .PP The graphical results are produced on a 130 column printer and the border on the graphical report is 8 1/2" x 11" so that it may be placed into reports when desired. .PP There are 50 points on a graph each specified by a letter. Each point represents the throughput rate in work units per minute. Each point is determined by executing 200 work unit processes each with a unit of CPU user processing and the number of disk and tty requests specified on the graph. .PP The SCB results show how a computer system performs under a variety of loads. On moving to the right on the graph, the effect of executing more disk requests is indicated by the steepness of the curve towards the bottom axis. The more limited the disk subsystem in capability the steeper the downward slope to the right. The different letters show the effect of increasing the tty load per work unit process. The more limited the tty subsystem in capability the greater the downward distance between the curves of the same letter. Point A next to the left axis shows the system's ability to execute CPU user processing with no I/O. .PP Facilities are also provided for comparing two different computer systems by taking the ratio of the throughputs of all 50 points and graphing the results shown in Figure 3. The left axis shows the ratio and the bottom axis and the letters (A B C E F) are the same as Figure 2. This graph shows in which area or areas (main CPU, tty, disk) one system is superior to the other. Ratios greater than one show an increase in capability in percent above one and ratios less than one show less capability in percent under one. .PP The ratio graph of Figure 3 shows machine A is superior in handling disk requests. Machine B has a cpu/compiler speed that is about five percent faster. The two machines are equivalent in handling tty write requests. .SH Obtaining And Running The SCB .PP The software for the system characterization benchmark (SCB) may be obtained free on IBM PC or NCR Tower compatible floppy from NCR by writing on company letterhead to: .ID SCB Distribution NCR Corporation 3325 Platt Springs Road West Columbia, SC 29169 Attention: Ms. Darlene Amick .DE .PP If you wish the software on nine track tape you need to send one with your request. The set up and operation is described in the help file on the media. Once a simple config file is set up, two shell scripts that are provided handle the rest. .SH Relating User Application To The SCB .PP Using the ratio graph system A can now be compared with system B across a broad mix of different workloads. However system A may be better than system B (ratio greater than one) over certain regions of the graph and worse than system B (ratio less than one) on other regions of the graph. For this reason users may wish to identify the region of the SCB graph for their typical applications. .PP To obtain information on the application, execute the application with the unix timex command. The timex command with the o option gives the CPU user time, the total number of blocks read or written to disk and the total characters transferred to the terminals. Divide the total number of CPU user seconds into total disk blocks read and written and into the total characters transferred to the terminal. This gives two quantities: the number of disk transfers per second of user CPU time and the number of tty characters transferred per second of user CPU time. .PP These two quantities can be related to a location on the SCB graphs. First obtain the user CPU time per work unit process by taking it from the subsystem report and calling this quantity C. Multiply C by the number of disk requests per second of user CPU time from the application. This gives the number of disk requests on the bottom axis of the SCB graphs. Next multiply C times the number of characters transferred per second of user CPU time from the application divided by 32 giving the number of tty lines on the SCB graphs. Having related the application to a location on the SCB graphs the relative performance for the application can now be estimated for every system on which the SCB was run. The run time for the application on a new system can be estimated by multiplying the run time on the old system by the ratio on the SCB ratio graph at the application point. .PP The Unix timex command may also be used to execute shell commands which means whole sets of applications may be related to locations on the SCB graphs. Using the SCB ratio graphs the percent improvement in throughput can be estimated for a number of new systems for different types of applications. These estimates can be obtained without the need to port any of the application programs. .SH Conclusions .PP By the very nature of benchmarks they are an approximation of the user application environment. While there are many different benchmarks measuring various elements of Unix computer systems, a decision to buy a certain computer should be based on the system's ability to handle the workload in the form of electronic and electro-mechanical processing. The time to perform an integer add or read a single disk file is useful information but is certainly not sufficient information to base a computer purchase on. .PP The problem with gathering a lot of details on Unix computer systems in the form of specifications and the results of running various benchmarks is that it is extremely difficult, if not impossible, because of the system's internal complexity to accurately relate this information to a single measure of processing effectiveness. The SCB uses the system itself to relate all the various operations and provide one measure of processing effectiveness. Because the processing effectiveness varies with the nature of the workload the SCB estimates performance for a number of different workloads. Providing workloads with various I/O levels is an important part of the SCB design because many Unix systems are I/O intensive. For a number of applications I/O turns out to be the limiting factor that determines throughput. .PP The SCB provides a reasonable approximation to a multiuser Unix application environment, provides an easy way to compare the relative processing effectiveness between systems and provides a method of narrowing performance estimates to specific or sets of application programs. .PP The SCB is easy to set up and run and it provides a comprehensive and accurate benchmark. .SH Acknowledgment .PP I wish to thank Myron Merritt for his many hours of discussion and his ideas used in the development of the system characterization benchmark.