- .nr LL 8i
- .ND September 13, 1985
- .RP
- .TL
- A MULTIUSER BENCHMARK
- .br
- TO COMPARE
- .br
- UNIX SYSTEMS
- .AU
- Philip M. Mills
- .AI
- NCR Corporation
- 3325 Platt Springs Road
- West Columbia, SC 29169
- phone 803 791 6377
- .2C
- .SH
- Unix Benchmarks Inadequate
- .PP
- Despite all the evaluation of Unix systems using benchmarks,
- no readily available benchmark today provides an accurate
- estimate of a multiuser Unix system's overall
- processing effectiveness.
- Published articles showing benchmark results, as well as
- the output of the benchmarks themselves,
- typically provide tables of data containing
- measurements of different parts of the system,
- leaving readers to interpret the results
- as best they can.
- What one really wants to know is whether a set of application programs
- will execute faster on one Unix system than
- on another.
- .PP
- The problem with the results provided by most
- benchmarks is that one cannot accurately infer
- which system can execute a set of application programs
- the best for the following reasons:
- .IP (1)
- The benchmark does not contain a workload that provides
- an adequate simulation of a Unix multiuser application environment.
- .IP (2)
- The benchmark ignores the system's ability to overlap operations.
- .IP (3)
- One is unable to correlate the time to perform simple operations with the
- system's overall ability to process work.
- .IP (4)
- The benchmark cannot evaluate Unix systems with multiple main processors.
- .PP
- To overcome the difficulties just described,
- a self-measuring portable multiuser multiprocessor Unix
- benchmark has been developed.
- This benchmark provides:
- .IP (1)
- A reasonable approximation to a multiuser multiprocessor
- Unix application environment.
- .IP (2)
- A single measure of the system's overall processing effectiveness.
- .IP (3)
- An easy way to compare overall
- processing effectiveness between systems.
- .IP (4)
- A method of relating application requirements to the benchmark results.
- .SH
- Defining Performance
- .PP
- Whether buying computers or using them, one
- of the qualities of major importance is performance.
- Since performance may be inadvertently confused with other attributes
- of a computer system it is useful to describe
- performance as it applies to computers and other mechanisms.
- .PP
- Performance may be described as
- the result of carrying out a set
- of actions in a specific environment to satisfy an objective
- or goal.
- In track and field each event has a
- different environment and goal and the
- results may be running a distance in
- some amount of time or jumping
- so many feet.
- In performance measurement the results of carrying out the prescribed set
- of actions are usually
- described in terms of measurable quantities.
- .PP
- There are two main results that are important
- to measure in a Unix based computer system:
- responsiveness and throughput.
- Responsiveness
- measures the system's ability to respond to a
- work request in a given amount of time.
- Throughput measures the total amount of
- work performed over a given period of
- time.
- The work on a computer system
- consists of the actions of electronic and electro-mechanical
- processing.
- Performance results are often measured by executing a given workload and
- measuring the time spent processing the work.
- .SH
- Functionality
- .PP
- A computer has three major areas of functionality
- that a workload should contain.
- The first is the processing by the central processing unit (CPU)
- where all the calculations are performed;
- the second is the disk subsystem
- where the computer stores data files; the third is the serial I/O subsystem, which
- controls such items as video displays, terminals and printers.
- .SH
- Overlapped Operations
- .PP
- In a multiuser Unix system there
- are many user processes or programs executing at
- the same time.
- This type of processing
- allows the I/O processing of some user programs
- to be overlapped with the CPU processing of other user programs.
- The result
- of overlapping the I/O operations with
- CPU operations depends on the amount of intelligence
- built into the I/O processors.
- .PP
- An intelligent serial I/O subsystem
- may handle interrupts from multiple
- on-line users, provide a slave
- serial I/O processor, handle buffering for both
- input and output and may even execute
- parts of the Unix kernel's tty code.
- Besides providing overlapped serial I/O operations for
- the terminals and printers
- with the main CPU processing, the intelligent serial I/O
- subsystem may also greatly reduce
- the amount of processing required by the main CPU.
- The reason for this reduction is that processing
- that used to be done by the main CPU is now done by the serial
- I/O subsystem.
- .PP
- An intelligent disk subsystem will have one or more
- disk controllers and disk drives and may also have
- a slave processor.
- The disk subsystem
- handles the control of the
- disk drive operations, handles interrupts, handles buffering
- and provides a high speed DMA data transfer in and out of main memory.
- If a slave processor is available it may be used to execute some of
- the Unix kernel file code.
- Intelligent disk subsystems provide overlapped operations with the main CPU
- and also reduce the load on the main processor.
- In some disk subsystems, operations on separate disk drives can
- also be overlapped with each other producing a much higher effective transfer
- rate for the disk subsystem.
- .SH
- Overlapped Performance
- .PP
- The effects of overlapped operations
- are extremely important to computer system performance.
- This point is illustrated
- by the following example.
- Assume a workload
- is developed that contains 100 seconds
- of CPU processing, 100 seconds of disk file
- processing and 100 seconds of terminal I/O
- processing on computer system A and also on computer
- system B.
- System A has intelligent
- disk and tty subsystems
- which can overlap operations with the main CPU and each other
- and system A also supports the common Unix multiuser mode.
- Therefore system A can execute the workload in 100 seconds.
- Although system B has the same CPU power it cannot overlap I/O operations.
- System B will take 300 seconds of real time to
- execute the workload.
- For the workload
- described system A
- is three times more powerful than system B
- in that it will have a throughput rate
- three times that of system B.
- Another way of putting
- it is that for a fixed amount of time
- system A will accomplish three times as
- much work as system B.
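- .PP
- The arithmetic of this example can be checked with a short sketch.
- It is an illustration in Python, not part of the benchmark, and the
- function names are hypothetical.
- It assumes the two idealized limits used in the example:
- perfect overlap for system A (elapsed time equals the longest component)
- and no overlap for system B (elapsed time equals the sum of the components).
```python
def elapsed_fully_overlapped(cpu_s, disk_s, tty_s):
    """Idealized system A: CPU, disk and tty work proceed in parallel,
    so elapsed time is bounded by the longest single component."""
    return max(cpu_s, disk_s, tty_s)


def elapsed_serial(cpu_s, disk_s, tty_s):
    """Idealized system B: nothing overlaps, so the components add up."""
    return cpu_s + disk_s + tty_s


# Workload from the example: 100 seconds of each kind of processing.
time_a = elapsed_fully_overlapped(100, 100, 100)
time_b = elapsed_serial(100, 100, 100)

# Same work in less time means proportionally higher throughput.
throughput_ratio = time_b / time_a
print(time_a, time_b, throughput_ratio)
```
- Real systems fall between these two limits, which is why the overlap
- has to be measured rather than inferred from component speeds.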
- .PP
- Although the example was developed
- to illustrate a point, a typical Unix
- workload contains a significant amount of disk and serial I/O
- and the overlapping
- of these operations by the hardware/software can make a
- substantial improvement in performance.
- .SH
- Application
- Benchmarks
- .PP
- A sound approach to evaluating the
- performance of various computer systems is to
- take the application or sets of application
- programs that make up the user environment
- and execute them on each computer system
- to be evaluated.
- It is most important that the set of application
- programs be truly representative of the overall
- user environment.
- This approach is often not feasible because the
- set of applications does not exist or it would
- take too much manpower to port the application
- programs to run on each computer system being
- evaluated.
- Also it may take too much time to execute a large
- set of applications on a number of computer systems.
- .SH
- Portable Benchmarks
- .PP
- For the reasons just given portable
- benchmarks that are easy to obtain and
- execute in a relatively short period of time
- are often used to simulate the application program
- environment.
- These benchmarks should measure the results needed in performance evaluation.
- The most important results
- that should be provided are measurements
- of throughput and responsiveness.
- .PP
- A well designed general purpose
- benchmark used to simulate the workload
- of the Unix application environment
- should have the following features:
- .IP (1)
- A broad weighted mix of basic CPU operations used with different data
- referencing modes.
- .IP (2)
- A number of user processes executing in a multiprocess mode.
- .IP (3)
- Terminal I/O taking place with a number of terminals at the same time.
- .IP (4)
- Disk I/O taking place with a number of files scattered
- over the disk drive surface to cause head movement.
- The benchmark should have the ability to use
- multiple disk drives.
- .IP (5)
- Results that can provide an accurate
- measure of the computer system's ability to execute work
- over time.
- The workload producing the results should contain all the
- processing operations on the system so the system's ability
- to overlap operations will be taken into account.
- .IP (6)
- The benchmark should be portable across various Unix systems.
- .SH
- Evaluating Existing Benchmarks
- .PP
- Before developing a
- new benchmark an evaluation of existing
- available benchmarks was performed.
- The benchmarks are of two types.
- The first type evaluates the time to perform a set of
- some simple basic operations such as add integer by placing
- each operation in a loop and measuring the time to complete the loop.
- The second type of benchmark consists of one or more
- programs that are designed to be appropriate for a benchmark.
- These programs are run and the time to execute the programs is measured.
- .PP
- The AIM Benchmark Suite II is a well known benchmark of the first type
- used to evaluate the performance of Unix computer systems
- (see Vol 11 No. 3 Page 89 for an example).
- Given the objectives of providing an
- accurate estimate of throughput or responsiveness
- through the simulation of a Unix multiuser
- application environment this benchmark did not meet our
- goals for the following reasons:
- .IP (1)
- The benchmark failed to include the basic operations for comparisons,
- logical, bit and branch operations which
- find significant use in the Unix environment.
- .IP (2)
- The benchmark uses mostly register variables,
- whereas automatic, static, pointer and array data references
- are commonly used across the various basic operations
- in the Unix environment.
- Array referencing is included as a separate test.
- .IP (3)
- The tty test only involves one or two terminals.
- The use of multiple tty lines all active at the same time
- represents the actual user
- environment and its need can best be illustrated
- by an example.
- Assume the tty peripheral controller can handle 3200 characters per
- second total on output and it is connected to 8 tty lines at 9600 baud.
- For a single user the tty controller can maintain 960 characters per second
- to that user terminal.
- However, for 8 heavily active users the tty controller
- can maintain an average of only 400 characters per second per tty
- line.
- For certain workloads this limitation will affect
- the overall throughput of the computer system.
- .IP (4)
- The disk test consists of reading, writing and copying a single file.
- In a multiuser Unix environment a number of separate file disk I/O requests
- are made which are scattered randomly over the disk drive surface.
- This sequence of random disk I/O requests requires significant disk head movement.
- .IP (5)
- Unix computer systems
- are often purchased with multiple disk drives.
- It is therefore necessary to be able to include
- in the benchmark multiple disk requests at the same
- time for all the disk drives in the system.
- It is important to evaluate the system in this mode
- because it more accurately simulates the user environment.
- Many systems are able to overlap some disk I/O operations
- between drives as well as with the main CPU.
- The AIM benchmark did not provide this capability.
- .IP (6)
- There is no way to correlate the point system used by the AIM benchmark
- in a multiuser environment with throughput or responsiveness.
- The AIM point system cannot take into account the overlapped operations of tty
- and disk I/O and the reduction in the main CPU
- processing time due to the intelligent I/O processors.
- Even a simple floppy disk controller can overlap operations with the
- main CPU.
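- .PP
- The tty example in item (3) above reduces to a one-line calculation,
- sketched here in Python.
- The function name is hypothetical, and the figure of 10 bits per character
- (start bit, 8 data bits, stop bit) is an assumption chosen because it
- matches the 960 characters per second quoted for a 9600 baud line.
```python
def per_line_rate(controller_cps, line_baud, active_lines, bits_per_char=10):
    """Average output rate per tty line in characters per second.

    Each line is limited by its own baud rate, and all active lines
    together are limited by the controller's total capacity.
    bits_per_char=10 is an assumption (start + 8 data + stop bits).
    """
    if active_lines == 0:
        return 0.0
    line_cps = line_baud / bits_per_char
    return min(line_cps, controller_cps / active_lines)


# Controller limited to 3200 characters per second, lines at 9600 baud.
print(per_line_rate(3200, 9600, 1))  # single user: limited by the line
print(per_line_rate(3200, 9600, 8))  # eight users: limited by the controller
```
- A test that drives only one or two lines never reaches the controller
- limit, so it cannot reveal this loss.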
- .PP
- Although the AIM Benchmark Suite II was developed to execute in a single user
- mode it has been used extensively to evaluate multiuser Unix systems.
- For our purposes of evaluating Unix multiuser systems the AIM benchmark
- first, does not provide an adequate simulation of the Unix user environment,
- and second, does not provide an accurate measure of the Unix computer
- system's ability to process work.
- .PP
- The AIM benchmark does provide information that is quite useful in tuning
- Unix systems. This information can best be used (and is) by organizations
- marketing Unix computer systems that are able to modify the hardware and
- Unix software to improve performance.
- .PP
- The Dhrystone is another
- benchmark which provides a good mix of statement
- types, operators and data reference types but it does not
- provide any I/O operations which are needed for the multiple
- tty lines and multiple disk drives used by the Unix multiuser environment.
- .SH
- Throughput Vs. Responsiveness
- .PP
- In the development of the
- benchmark a decision had to be made whether to
- measure throughput or responsiveness.
- Although response time is a very important performance
- measure in an interactive Unix multiuser
- system it requires the use of a remote
- terminal emulator computer to provide tty inputs
- to obtain repeatable results.
- .PP
- Because responsiveness and throughput are
- both very much related to a computer system's ability
- to process work the relationship
- between the two was investigated.
- The results of this investigation showed that a 20% increase in throughput would
- provide a 20% or greater improvement
- in response time.
- How much greater than 20% the improvement in response time would be
- depends on the ratio of the time to process an interactive user
- request to the average user think time.
- If the think time goes to zero, throughput and
- responsiveness vary by the same proportion from system to system
- for a given workload.
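- .PP
- One standard way to see this behavior, not necessarily the model used
- in the investigation, is the interactive response time law for a closed
- system: response time equals the number of users divided by throughput,
- minus think time.
- The Python sketch below applies it under the assumption that the system
- is saturated, so that a more capable system raises throughput in proportion.
- The function names and the sample numbers are hypothetical.
```python
def response_time(n_users, throughput, think_time):
    """Interactive response time law: R = N / X - Z."""
    return n_users / throughput - think_time


def response_ratio(n_users, throughput, think_time, speedup):
    """Old response time divided by new response time when throughput
    is multiplied by `speedup` and everything else is unchanged."""
    old = response_time(n_users, throughput, think_time)
    new = response_time(n_users, throughput * speedup, think_time)
    return old / new


# Hypothetical load: 20 users, 2 requests per second, 20% more throughput.
print(response_ratio(20, 2.0, 0.0, 1.2))  # zero think time
print(response_ratio(20, 2.0, 5.0, 1.2))  # 5 seconds of think time
```
- With zero think time the response time improves by the same factor as
- throughput; with a nonzero think time the improvement factor is larger.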
- .PP
- In an interactive system like Unix the user think time
- tends to create idle time on the various system components.
- Given that the user is not a part of the computer system that is to be purchased,
- throughput is really the best measure of the computer system's
- ability to process work.
- For the reasons just stated the benchmark
- measures throughput as a result of
- executing a particular workload.
- .SH
- The System Characterization Benchmark
- .PP
- The purpose of the benchmark is
- to compare Unix systems on their ability
- to do work (throughput) using
- a simulated Unix multiuser environment with
- multiple tty lines, multiple disk drives and one or more main processors.
- For a fair comparison each computer system should
- execute the same workload with the same number
- of tty lines and the same number of disk drives.
- .PP
- The application requirements for a computer
- system will vary in each environment such as
- word processing, accounting, graphics, data base management, spread sheet, compilers and
- scientific computations.
- For this reason the
- benchmark (SCB) executes a number of different
- workloads each with a different mix of I/O operations.
- Because of the many different workloads used by the benchmark,
- it is more of a system characterization than a single benchmark.
- For this reason the benchmark is called the
- System Characterization Benchmark (SCB).
- .PP
- The SCB has the following characteristics:
- .IP (1)
- The SCB measures the rate (throughput) at which fixed
- units of work can be executed in
- a unit of time (minutes).
- .IP (2)
- The SCB takes into account the effects of all
- overlapped operations executed in parallel,
- including multiple main processor units, if they exist.
- .IP (3)
- The SCB simulates a multiuser Unix application
- environment with
- .RS
- .IP (a)
- A broad mix of main processor operations and data references
- .IP (b)
- Multiple tty lines with terminals
- .IP (c)
- Multiple disk drives each with its own set of data files
- .RE
- .IP (4)
- The SCB consists of 50 different runs to provide
- a system characterization that will cover the many
- different application environments.
- .IP (5)
- The SCB provides graphical output for a quick analysis
- and comparisons on the basis of throughput.
- .IP (6)
- The SCB provides information on the effective rates
- of the main central processing unit (CPU),
- the disk subsystem and the tty subsystem.
- .PP
- The first part of the SCB consists of executing three programs
- crun, drun and trun to produce the report shown in Figure 1, which
- shows the effective capabilities of the one or more main CPUs, the disk subsystem
- and the tty subsystem, respectively.
- .PP
- The purpose of crun is to obtain the relative processing power of
- the one or more main CPUs.
- The crun program also provides for multiprocessor systems
- an estimate of the ratio of the multiprocessor's throughput
- to a single processor's throughput.
- If there is only one CPU this ratio is one.
- .PP
- The crun code consists of a weighted mix of basic operations
- on various types of data references.
- The frequency of execution of the basic operations involving the
- different types of data references are weighted according to an
- internal unpublished compiler study of Unix based C code.
- Detailed information on the mix of operations and data references
- is contained in the SCB documentation.
- The time to execute the crun code is measured and used to
- determine the user CPU time per work unit process.
- .PP
- In setting up the SCB each disk drive
- specified by the user has 200 files of
- 20K bytes each placed on it.
- In the execution
- of drun the user may specify the percent of
- files to be read and written on each disk drive.
- The disk program drun produces the
- information on the disk subsystem in the
- subsystem report by reading and writing file blocks
- randomly over the different disk drive surfaces specified.
- .PP
- The serial I/O subsystem is usually made up
- of one or more tty controllers where each
- tty controller handles a fixed number of
- tty lines/terminals.
- The tty controller typically
- can handle some maximum level
- of characters per
- second for tty output and input across
- all lines.
- The information in the subsystem report is obtained by writing 32
- character lines
- to all
- tty ports as fast as the tty controllers can handle them.
- .PP
- The second part of the SCB consists of 50
- separate runs controlled by a shell script.
- Each run is made up of 200 work units.
- The user processes execute 8 at a time, or at some other settable
- multiuser level.
- Each work unit consists of the following components.
- .IP (1)
- The execution of a copy of xrun code with a
- CPU user time proportional to the system's crun time.
- .IP (2)
- A given number of 1K disk block reads and writes.
- .IP (3)
- A given number of 32 character line writes
- to a tty line/terminal.
- .PP
- Each work unit process in the run is
- identical.
- The disk and tty requests are equally spaced in the
- CPU user time to provide more consistent and repeatable results.
- Each work unit has its own unique disk read file and write file.
- The disk files are preassigned to each work unit at random.
- A work unit process has a unique tty line during its execution.
- On completion the tty line is given to the next work unit process started.
- .PP
- The 50 runs made by a shell script use all the combinations of
- the following disk and tty levels:
- .IP
- disk blocks 0, 4, 8, 12, 16, 20, 24, 28, 32, 36
- .IP
- tty lines 0, 30, 60, 90, 120
- .PP
- The disk operations are divided evenly between reads and writes of existing
- files.
- The work units executed per minute are calculated for each of the 50 runs and
- the results are presented in a graphical form.
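- .PP
- The run matrix and the throughput figure can be sketched as follows.
- This is an illustration in Python, not the shell script shipped with
- the SCB; the names are hypothetical, and the throughput formula assumes
- that the measured quantity is the elapsed real time of a run.
```python
from itertools import product

# Levels per work unit, as listed above.
DISK_BLOCKS = (0, 4, 8, 12, 16, 20, 24, 28, 32, 36)
TTY_LINES = (0, 30, 60, 90, 120)
WORK_UNITS_PER_RUN = 200

# Every combination of a tty level with a disk level is one run.
runs = list(product(TTY_LINES, DISK_BLOCKS))


def work_units_per_minute(elapsed_seconds, work_units=WORK_UNITS_PER_RUN):
    """Throughput of one run, assuming elapsed_seconds is the real time
    taken to complete all work units in the run."""
    return work_units * 60.0 / elapsed_seconds


print(len(runs))
print(work_units_per_minute(120.0))  # a run that took two minutes
```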
- .SH
- The Results Of The SCB Benchmark
- .PP
- The results of executing the SCB are presented in the graphical
- form shown in Figure 2.
- The left axis represents the number of work units
- that can be executed per minute (throughput).
- The bottom axis is the number of disk requests
- in a unit of work process.
- .PP
- The letters on the graph (A B C D E) represent the different levels
- of tty line writes contained in each work unit process.
- .LD
- Letter    Number of tty
-           line writes
-
- A         0
- B         30
- C         60
- D         90
- E         120
-
- .DE
- .PP
- The graphical results are produced on a 130 column printer and
- the border on the graphical report is 8 1/2" x 11" so that it
- may be placed into reports when desired.
- .PP
- There are 50 points on a graph each specified by a letter.
- Each point represents the throughput rate in work units per minute.
- Each point is determined by executing 200 work unit processes each with
- a unit of CPU user processing and the number of disk and tty requests
- specified on the graph.
- .PP
- The SCB results show how a computer system performs under a variety
- of loads.
- On moving to the right on the graph, the effect of executing more
- disk requests is indicated by the steepness of the curve towards
- the bottom axis.
- The more limited the disk subsystem in capability the steeper
- the downward slope to the right.
- The different letters show the effect of increasing the
- tty load per work unit process.
- The more limited the tty subsystem in capability the greater the downward
- distance between the curves for successive letters.
- Point A next to the left axis shows the system's
- ability to execute CPU user processing with no I/O.
- .PP
- Facilities are also provided for comparing two different computer
- systems by taking the ratio of the throughputs of all 50 points and
- graphing the results shown in Figure 3.
- The left axis shows the ratio and the bottom axis and the
- letters (A B C D E) are the same as Figure 2.
- This graph shows in which area or areas (main CPU, tty, disk) one system
- is superior to the other.
- A ratio greater than one shows greater capability, by the percentage it lies above one,
- and a ratio less than one shows less capability, by the percentage it lies below one.
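- .PP
- The ratio calculation can be sketched in a few lines of Python.
- The function names are hypothetical, and the throughput values are
- invented sample numbers used only to show the arithmetic; they are not
- measurements from any system.
```python
def ratio_table(throughput_a, throughput_b):
    """Ratio of system A to system B throughput at each graph point.
    Both arguments map (letter, disk_requests) to work units per minute."""
    return {point: throughput_a[point] / throughput_b[point]
            for point in throughput_a}


def percent_difference(ratio):
    """Percent above (positive) or below (negative) a ratio of one."""
    return (ratio - 1.0) * 100.0


# Invented sample values for two graph points.
system_a = {("A", 0): 120.0, ("A", 36): 60.0}
system_b = {("A", 0): 100.0, ("A", 36): 80.0}

ratios = ratio_table(system_a, system_b)
for point, ratio in sorted(ratios.items()):
    print(point, ratio, round(percent_difference(ratio), 1))
```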
- .PP
- The ratio graph of Figure 3 shows machine A is superior in handling disk requests.
- Machine B has a CPU/compiler speed that is about five percent faster.
- The two machines are equivalent in handling tty write requests.
- .SH
- Obtaining And Running The SCB
- .PP
- The software for the system characterization benchmark (SCB)
- may be obtained free on IBM PC or NCR Tower compatible floppy from NCR
- by writing on company letterhead to:
- .ID
-
- SCB Distribution
- NCR Corporation
- 3325 Platt Springs Road
- West Columbia, SC 29169
- Attention: Ms. Darlene Amick
- .DE
- .PP
- If you wish to receive the software on nine-track tape, you need to send a tape
- with your request.
- The set up and operation is described in the help
- file on the media.
- Once a simple config file is set up, two shell scripts that are provided
- handle the rest.
- .SH
- Relating User Application To The SCB
- .PP
- Using the ratio graph system A can now be compared
- with system B across a broad mix of different workloads.
- However system A may be better than system B (ratio greater than one)
- over certain regions of the graph
- and worse than system B (ratio less than one)
- on other regions of the graph.
- For this reason users may wish to identify the region
- of the SCB graph for their typical applications.
- .PP
- To obtain information on the application,
- execute the application with the Unix
- timex command.
- The timex command with the -o option
- gives the CPU user time, the total number of blocks read or written to disk
- and the total characters transferred to the terminals.
- Divide the total disk blocks
- read and written, and the total
- characters transferred to the terminal,
- by the total number of CPU user seconds.
- This gives two quantities:
- the number of disk transfers per second of user CPU time and
- the number of tty characters transferred per second of user CPU time.
- .PP
- These two quantities can be
- related to a location on the SCB graphs.
- First obtain the user CPU time per work unit process by taking it
- from the subsystem report
- and calling this quantity C.
- Multiply C by the number of disk requests per second
- of user CPU time from the application.
- This gives the number of disk requests on the bottom
- axis of the SCB graphs.
- Next multiply C
- by the number of characters transferred
- per second of user CPU time from the application
- and divide by 32 (the characters in one tty line write), giving the number of tty line writes on the SCB graphs.
- Having related the application to a location on the SCB graphs
- the relative performance for the application can now be estimated
- for every system on which the SCB was run.
- The run time for the application on a new system can be estimated
- by multiplying the run time on the old system by the ratio on the SCB
- ratio graph at the application point, with the ratio taken as
- old-system throughput over new-system throughput.
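- .PP
- The mapping from timex figures to a location on the SCB graphs can be
- sketched in Python.
- The function and parameter names are hypothetical, and the sample
- inputs are invented to show the arithmetic.
```python
TTY_LINE_CHARS = 32  # characters in one SCB tty line write


def scb_location(cpu_user_s, disk_blocks, tty_chars, c_seconds):
    """Return (disk requests, tty line writes) per work unit.

    cpu_user_s  -- CPU user seconds reported for the application
    disk_blocks -- total blocks read and written
    tty_chars   -- total characters transferred to terminals
    c_seconds   -- C, the user CPU time per work unit process taken
                   from the subsystem report
    """
    disk_per_cpu_s = disk_blocks / cpu_user_s
    chars_per_cpu_s = tty_chars / cpu_user_s
    disk_requests = c_seconds * disk_per_cpu_s
    tty_line_writes = c_seconds * chars_per_cpu_s / TTY_LINE_CHARS
    return disk_requests, tty_line_writes


# Invented sample: 50 CPU seconds, 400 blocks, 48000 characters, C = 2.
print(scb_location(50.0, 400, 48000, 2.0))
```
- The first value is read against the bottom axis and the second selects
- the nearest lettered curve.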
- .PP
- The Unix timex command may also be used to execute shell commands
- which means whole sets of applications may be related to locations on the
- SCB graphs.
- Using the SCB ratio graphs the percent improvement
- in throughput can be estimated for a number of new systems
- for different types of applications.
- These estimates can be obtained without the need to port any of
- the application programs.
- .SH
- Conclusions
- .PP
- By their very nature, benchmarks are
- an approximation of the user application environment.
- While there are many different benchmarks measuring various elements
- of Unix computer systems, a decision to buy a certain computer should be based
- on the system's ability to handle the workload in the form
- of electronic and electro-mechanical processing.
- The time to perform an integer add or read a single
- disk file is useful information but is certainly not sufficient information
- to base a computer purchase on.
- .PP
- The problem with gathering a lot of details on Unix computer systems
- in the form of specifications and the results of running various
- benchmarks is that it is extremely difficult, if not impossible,
- because of the system's internal complexity to accurately
- relate this information to a single measure of processing effectiveness.
- The SCB uses the system itself to relate all the various operations
- and
- provide one measure of processing effectiveness.
- Because the processing effectiveness varies with the nature of
- the workload the SCB estimates performance
- for a number of different workloads.
- Providing workloads with various I/O levels is an important part
- of the SCB design because many Unix systems are I/O intensive.
- For a number of applications I/O turns out to be the limiting factor
- that determines throughput.
- .PP
- The SCB provides a reasonable approximation to a multiuser Unix application
- environment, provides an easy way to compare the relative processing
- effectiveness between systems and provides a method of narrowing
- performance estimates to specific application programs or sets of application programs.
- .PP
- The SCB is easy to set up and run and it provides a comprehensive and
- accurate benchmark.
- .SH
- Acknowledgment
- .PP
- I wish to thank Myron Merritt for his many hours of discussion and his ideas used in the development of the system characterization benchmark.
-