home *** CD-ROM | disk | FTP | other *** search
-
-
-
-
-
-
-
-
-
- AAAA MMMMUUUULLLLTTTTIIIIUUUUSSSSEEEERRRR BBBBEEEENNNNCCCCHHHHMMMMAAAARRRRKKKK
- TTTTOOOO CCCCOOOOMMMMPPPPAAAARRRREEEE
- UUUUNNNNIIIIXXXX SSSSYYYYSSSSTTTTEEEEMMMMSSSS
-
-
- Philip M. Mills
-
- NCR Corporation
- 3325 Platt Springs Road
- West Columbia, SC 29169
- phone 803 791 6377
-
-
-
-
-
-
- September 13, 1985
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- AAAA MMMMUUUULLLLTTTTIIIIUUUUSSSSEEEERRRR BBBBEEEENNNNCCCCHHHHMMMMAAAARRRRKKKK
- TTTTOOOO CCCCOOOOMMMMPPPPAAAARRRREEEE
- UUUUNNNNIIIIXXXX SSSSYYYYSSSSTTTTEEEEMMMMSSSS
-
-
- Philip M. Mills
-
- NCR Corporation
- 3325 Platt Springs Road
- West Columbia, SC 29169
- phone 803 791 6377
-
-
-
-
-
-
- _U_n_i_x _B_e_n_c_h_m_a_r_k_s _I_n_a_d_e_q_u_a_t_e operations.
-
- Despite all the evaluation of (3) One is unable to correlate the
- Unix systems using benchmarks, no time to perform simple opera-
- readily available benchmark today tions with the system's overall
- provides an accurate estimate of a ability to process work.
- multiuser Unix system's overall pro-
- cessing effectiveness. Published (4) The benchmark cannot evaluate
- articles showing benchmark results as Unix systems with one or multi-
- well as the results from executing ple main processors.
- benchmarks, typically provide tables
- of data containing measurements of To overcome the difficulties
- different parts of the system, leav- just described, a self-measuring
- ing one with the job of interpreting portable multiuser multiprocessor
- the results as best they can. What unix benchmark has been developed.
- one really wants to know is will a This benchmark provides:
- set of application programs execute
- faster on one Unix system compared to (1) A reasonable approximation to a
- another Unix system. multiuser multiprocessor unix
- application environment.
- The problem with the results
- provided by most benchmarks is that (2) A single measure of the overall
- one cannot accurately infer which system's processing effective-
- system can execute a set of applica- ness.
- tion programs the best for the fol-
- lowing reasons: (3) An easy way to compare the rela-
- tive overall system's processing
- (1) The benchmark did not contain a effectiveness between systems.
- workload that provided an ade-
- quate simulation of a Unix mul- (4) A method of relating application
- tiuser application environment. requirements to the benchmark
- results.
- (2) The benchmark ignored the
- system's ability to overlap This benchmark is being made
-
-
-
-
-
-
-
-
-
- - 2 -
-
-
- available free to the Unix community. as video displays, terminals and
- printers.
- _D_e_f_i_n_i_n_g _P_e_r_f_o_r_m_a_n_c_e
- _O_v_e_r_l_a_p_p_e_d _O_p_e_r_a_t_i_o_n_s
- Whether buying computers or
- using them, one of the qualities of In a multiuser Unix system there
- major importance is performance. are many user processes or programs
- Since performance may be inadver- executing at the same time. This
- tently confused with other attributes type of processing allows the I/O
- of a computer system it is useful to processing of some user programs to
- describe performance as it applies to be overlapped with the CPU processing
- computers and other mechanisms. of other user programs. The result
- of overlapping the I/O operations
- Performance may be described as with CPU operations depends on the
- the result of carrying out a set of amount of intelligence built into the
- actions in a specific environment to I/O processors.
- satisfy an objective or goal. In
- track and field each event has a dif- An intelligent serial I/O sub-
- ferent environment and goal and the system may handle interrupts from
- results may be running a distance in multiple on-line users, provide a
- some amount of time or jumping so slave serial I/O processor, handle
- many feet. In performance the buffering for both input and output
- results of carrying out the and may even execute parts of the
- prescribed set of actions is usually Unix kernel's tty code. Besides pro-
- described in terms of measurable viding overlapped serial I/O opera-
- quantities. tions for the terminals and printers
- with the main CPU processing, the
- There are two main results that intelligent serial I/O subsystem may
- are important to measure in a Unix also greatly reduce the amount of
- based computer system: responsiveness processing required by the main CPU.
- and throughput. Responsiveness meas- The reason for this reduction is that
- ures the system's ability to respond functions or processing that use to
- to a work request in a given amount be done by the main CPU are now done
- of time. Throughput measures the by the serial I/O subsystem.
- total amount of work performed over a
- given period of time. The work on a An intelligent disk subsystem
- computer system consists of the will have one or more disk controll-
- actions of electronic and electro- ers and disk drives and may also have
- mechanical processing. Performance a slave processor. The disk subsys-
- results are often measured by execut- tem handles the control of the disk
- ing a given workload and measuring drive operations, handles interrupts,
- the time spent processing the work. handles buffering and provides a high
- speed DMA data transfer in and out of
- _F_u_n_c_t_i_o_n_a_l_i_t_y main memory. If a slave processor is
- available it may be used to execute
- A computer has three major areas some of the Unix kernel file code.
- of functionality that a workload Intelligent disk subsystems provide
- should contain. The first is the overlapped operations with the main
- processing by the central processing CPU and also reduces the load on the
- unit (CPU) where all the calculations main processor. In some disk subsys-
- are performed; the second is the disk tems, operations on separate disk
- subsystem where the computer stores drives can also be overlapped with
- data files; the third is serial I/O each other producing a much higher
- subsystem which controls such items effective transfer rate for the disk
-
-
-
-
-
-
-
-
-
- - 3 -
-
-
- subsystem. overall user environment. This
- approach is often not feasible
- _O_v_e_r_l_a_p_p_e_d _P_e_r_f_o_r_m_a_n_c_e because the set of applications do
- not exist or it would take too much
- The effects of overlapped opera- manpower to port the application pro-
- tions are extremely important to com- grams to run on each computer system
- puter system performance. This point being evaluated. Also it may take
- is illustrated by the following exam- too much time to execute a large set
- ple. Assume a workload is developed of applications on a number of com-
- that contains 100 seconds of CPU pro- puter systems.
- cessing, 100 seconds of disk file
- processing and 100 seconds of termi- _P_o_r_t_a_b_l_e _B_e_n_c_h_m_a_r_k_s
- nal I/O processing on computer system
- A and also on computer system B. For the reasons just given port-
- System A has intelligent disk and tty able benchmarks that are easy to
- subsystems which can overlap opera- obtain and execute in a relatively
- tions with the main CPU and each short period of time are often used
- other and system A also supports the to simulate the application program
- common Unix multiuser mode. There- environment. These benchmarks should
- fore system A can execute the work- measure the results needed in perfor-
- load in 100 seconds. Although system mance evaluation. The most important
- B has the same CPU power it cannot results that should be provided are
- overlap I/O operations. System B measurements of throughput and
- will take 300 seconds of real time to responsiveness.
- execute the workload. For the work-
- load described system A is three A well designed general purpose
- times more powerful than system B in benchmark used to simulate the work-
- that it will have a throughput rate load of the Unix application environ-
- three times that of system B. ment should have the following
- Another way of putting it is that for features:
- a fixed amount of time system A will
- accomplish three times as much work (1) A broad weighted mix of basic
- as system B. CPU operations used with dif-
- ferent data referencing modes.
- Although the example was
- developed to illustrate a point, a (2) A number of user processes exe-
- typical unix workload contains a sig- cuting in a multiprocess mode.
- nificant amount of disk and serial
- I/O and the overlapping of these (3) Terminal I/O taking place with a
- operations by the hardware/software number of terminals at the same
- can make a substantial improvement in time.
- performance.
- (4) Disk I/O taking place with a
- _A_p_p_l_i_c_a_t_i_o_n _B_e_n_c_h_m_a_r_k_s number of files scattered over
- the disk drive surface to cause
- A sound approach to evaluating head movement. The benchmark
- the performance of various computer should have the ability to use
- systems is to take the application or multiple disk drives.
- sets of application programs that
- make up the user environment and exe- (5) Results that can provide an
- cute them on each computer system to accurate measure of the computer
- be evaluated. It is most important system's ability to execute work
- that the set of application programs over time. The workload produc-
- be truly representative of the ing the results should contain
-
-
-
-
-
-
-
-
-
- - 4 -
-
-
- all the processing operations on (3) The tty test only involves one
- the system so the system's abil- or two terminals. The use of
- ity to overlap operations will multiple tty lines all active at
- be taken into account. the same time represents the
- actual user environment and its
- (6) Benchmark should be portable need can best be illustrated by
- across various Unix systems. an example. Assume the tty
- peripheral controller can handle
- _E_v_a_l_u_a_t_i_n_g _E_x_i_s_t_i_n_g _B_e_n_c_h_m_a_r_k_s 3200 characters per second total
- on output and it is connected to
- Before developing a new bench- 8 tty lines at 9600 baud. For a
- mark an evaluation of existing avail- single user the tty controller
- able benchmarks was performed. The can maintain 960 characters per
- benchmarks are of two types. The second to that user terminal.
- first type evaluates the time to per- However, for 8 heavily active
- form a set of some simple basic users the tty controller can
- operations such as add integer by only maintain an average of only
- placing each operation in a loop and 400 characters per second per
- measuring the time to complete the tty line. For certain workloads
- loop. The second type of benchmark this limitation will affect the
- consists of one or more programs that overall throughput of the com-
- are designed to be appropriate for a puter system.
- benchmark. These programs are run
- and the time to execute the programs (4) The disk test consists of read-
- is measured. ing, writing and copying a sin-
- gle file. In a multiuser Unix
- The AIM Benchmark Suite II is a environment a number of separate
- well known benchmark of the first file disk I/O requests are made
- type used to evaluate Unix computer which are scattered randomly
- system's performance (see Vol 11 No. over the disk drive surface.
- 3 Page 89 for an example). Given the This sequence of random disk I/O
- objectives of providing an accurate requests requires significant
- estimate of throughput or responsive- disk head movement.
- ness through the simulation of a Unix
- multiuser application environment (5) Unix computer systems are often
- this benchmark did not meet our goals purchased with multiple disk
- for the following reasons: drives. It is therefore neces-
- sary to be able to include in
- (1) The benchmark failed to include the benchmark multiple disk
- the basic operations for com- requests at the same time for
- parisons, logical, bit and all the disk drives in the sys-
- branch operations which find tem. It is important to evalu-
- significant use in the Unix ate the system in this mode
- environment. because it more accurately simu-
- lates the user environment.
- (2) The benchmark uses mostly regis- Many systems are able to overlap
- ter variables. Where data some disk I/O operations between
- references across the various drives as well as with the main
- basic operations, such as, CPU. The AIM benchmark did not
- automatic, static, pointer and provide this capability.
- array references are commonly
- used in the Unix environment. (6) There is no way to correlate the
- Array referencing is included as point system used by the AIM
- a separate test. benchmark in a multiuser
-
-
-
-
-
-
-
-
-
- - 5 -
-
-
- environment with throughput or Because responsiveness and
- responsiveness. The AIM point throughput are both very much related
- system cannot take into account to a computer system's ability to
- the overlapped operations of tty process work the relationship between
- and disk I/O and the reduction the two was investigated. The
- in the main CPU processing time results of this investigation showed
- due to the intelligent I/O pro- that a 20% increase in throughput
- cessors. Even a simple floppy would provide a 20% or greater
- disk controller can overlap improvement in response time. How
- operations with the main CPU. much greater than 20% the improvement
- in response time would be depends on
- Although the AIM Benchmark Suite the ratio of the time to process an
- II was developed to execute in a sin- interactive user request to the aver-
- gle user mode it has been used exten- age user think time. If the think
- sively to evaluate multiuser Unix time goes to zero, throughput and
- systems. For our purposes of responsiveness vary the same from
- evaluating Unix multiuser systems the system to system for a given work-
- AIM benchmark does not provide an load.
- adequate simulation of the Unix user
- environment, and second, does not In an interactive system like
- provide an accurate measure of the Unix the user think time tends to
- Unix computer system's ability to create idle time on the various sys-
- process work. tem components. Given that the user
- is not a part of the computer system
- The AIM benchmark does provide that is to be purchased, throughput
- information that is quite useful in is really the best measure of the
- tuning unix systems. This information computer system's ability to process
- can best be used (and is) by organi- work. For the reasons just stated
- zations marketing Unix computer sys- the benchmark measures throughput as
- tems who are able to modify the a result of executing a particular
- hardware and unix software to improve workload.
- performance.
- _T_h_e _S_y_s_t_e_m _C_h_a_r_a_c_t_e_r_i_z_a_t_i_o_n _B_e_n_c_h_m_a_r_k
- The Dhrystone is another bench-
- mark which provides a good mix of The purpose of the benchmark is
- statement types, operators and data to compare Unix systems on their
- reference types but it does not pro- ability to do work (throughput) using
- vide any I/O operations which are a simulated Unix multiuser environ-
- needed for the multiple tty lines and ment with multiple tty lines, multi-
- multiple disk drives used by the Unix ple disk drives and one or more main
- multiuser environment. processors. For a fair comparison
- each computer system should execute
- _T_h_r_o_u_g_h_p_u_t _V_s. _R_e_s_p_o_n_s_i_v_e_n_e_s_s the same workload with the same
- number of tty lines and the same
- In the development of the bench- number of disk drives.
- mark a decision had to be made to
- measure throughput or responsiveness. The application requirements for
- Although response time is a very a computer system will vary in each
- important performance measure in an environment such as word processing,
- interactive Unix multiuser system it accounting, graphics, data base
- requires the use of a remote terminal management, spread sheet, compilers
- emulator computer to provide tty and scientific computations. For
- inputs to obtain repeatable results. this reason the benchmark (SCB) exe-
- cutes a number of different workloads
-
-
-
-
-
-
-
-
-
- - 6 -
-
-
- each with a different mix of I/O The first part of the SCB con-
- operations. Because of the many dif- sists of executing three programs
- ferent workloads used by the bench- crun, drun and trun to produce the
- mark, it is more of a system charac- report shown in Figure 1 that shows
- terization than a single benchmark. the effective capabilities of the one
- For this reason the benchmark is or more main CPU , the disk subsystem
- called the System Characterization and the tty subsystems respectively.
- Benchmark (SCB).
- The purpose of crun is to obtain
- The SCB has the following the relative processing power of the
- characteristics: one or more main CPUs. The crun pro-
- gram also provides for multiprocessor
- (1) The SCB measures the rate systems an estimate of the ratio of
- (throughput) at which fixed the multiprocessor's throughput to a
- units of work can be executed in single processor's throughput. If
- a unit of time (minutes). there is only one CPU this ratio is
- one.
- (2) The SCB takes into account the
- effects of all overlapped opera- The crun code consists of a
- tions executed in parallel, weighted mix of basic operations on
- including multiple main proces- various types of data references.
- sor units, if they exist. The frequency of execution of the
- basic operations involving the dif-
- (3) The SCB simulates a multiuser ferent types of data references are
- Unix application environment weighted according to an internal
- with unpublished compiler study of Unix
- based C code. Detail information on
- (a) A broad mix of main proces- the mix of operations and data refer-
- sor operations and data ences is contained in the SCB docu-
- references mentation. The time to execute the
- crun code is measured and used to
- (b) Multiple tty lines with determine the user CPU time per work
- terminals unit process.
-
- (c) Multiple disk drives each In setting up the SCB each disk
- with its own set of data drive specified by the user has 200
- files files of 20K bytes each placed on it.
- In the execution of drun the user may
- (4) The SCB consists of 50 different specify the percent of files to be
- runs to provide a system charac- read and written on each disk drive.
- terization that will cover the The disk program drun produces the
- many different application information on the disk subsystem in
- environments. the subsystem report by reading and
- writing file blocks randomly over the
- (5) The SCB provides graphical out- different disk drive surfaces speci-
- put for a quick analysis and fied.
- comparisons on the basis of
- throughput. The serial I/O subsystem is usu-
- ally made up of one or more tty con-
- (6) The SCB provides information on trollers where each tty controller
- the effective rates of the main handles a fixed number of tty
- central processing unit (CPU), lines/terminals. The tty controller
- the disk subsystem and the tty typically can handle some maximum
- subsystem. level of characters per second for
-
-
-
-
-
-
-
-
-
- - 8 -
-
-
- tty output and input across all cuted per minute are calculated for
- lines. The information in the sub- each of the 50 runs and the results
- system report is obtained by writing are presented in a graphical form.
- 32 character lines to all tty ports
- as fast as the tty controllers can _T_h_e _R_e_s_u_l_t_s _O_f _T_h_e _S_C_B _B_e_n_c_h_m_a_r_k
- handle them.
- The results of executing the SCB
- The second part of the SCB con- are presented in the graphical form
- sists of 50 separate runs controlled shown in Figure 2. The left axis
- by a shell script. Each run is made represents the number of work units
- up of 200 work units. The user that can be executed per minute
- processes execute 8 or some setable (throughput). The bottom axis is the
- multiuser level at a time. Each work number of disk requests in a unit of
- unit consists of the following com- work process.
- ponents.
- The letters on the graph (A B C
- (1) The execution of a copy of xrun E F) represent the different levels
- code with a CPU user time pro- of tty line writes contained in each
- portional to the system's crun work unit process.
- time.
- Letter Number of tty
- (2) A given number of 1K disk block line writes
- reads and writes.
- A 0
- (3) A given number of 32 character B 30
- line writes to a tty C 60
- line/terminal. D 90
- E 120
- Each work unit process in the
- run is identical. The disk and tty
- requests are equally spaced in the
- CPU user time to provide more con- The graphical results are pro-
- sistent and repeatable results. Each duced on a 130 column printer and the
- work unit has its own unique disk border on the graphical report is 8
- read file and write file. The disk 1/2" x 11" so that it may be placed
- files are preassigned to each work into reports when desired.
- unit at random. A work unit process
- has a unique tty line during its exe- There are 50 points on a graph
- cution. On completion the tty line each specified by a letter. Each
- is given to the next work unit pro- point represents the throughput rate
- cess started. in work units per minute. Each point
- is determined by executing 200 work
- The 50 runs made by a shell unit processes each with a unit of
- script uses all the combinations of CPU user processing and the number of
- the following disk and tty levels: disk and tty requests specified on
- the graph.
- disk blocks 0, 4, 8, 12, 16, 20,
- 24, 28, 32, 36 The SCB results show how a com-
- puter system performs under a variety
- tty lines 0, 30, 60, 90, 120 of loads. On moving to the right on
- the graph, the effect of executing
- The disk operations are divided more disk requests is indicated by
- evenly between reads and writes of the steepness of the curve towards
- existing files. The work units exe- the bottom axis. The more limited
-
-
-
-
-
-
-
-
-
- - 10 -
-
-
- the disk subsystem in capability the track tape you need to send one with
- steeper the downward slope to the your request. The set up and opera-
- right. The different letters show tion is described in the help file on
- the effect of increasing the tty load the media. Once a simple config file
- per work unit process. The more lim- is set up, two shell scripts that are
- ited the tty subsystem in capability provided handle the rest.
- the greater the downward distance
- between the curves of the same _R_e_l_a_t_i_n_g _U_s_e_r _A_p_p_l_i_c_a_t_i_o_n _T_o _T_h_e _S_C_B
- letter. Point A next to the left
- axis shows the system's ability to Using the ratio graph system A
- execute CPU user processing with no can now be compared with system B
- I/O. across a broad mix of different work-
- loads. However system A may be
- Facilities are also provided for better than system B (ratio greater
- comparing two different computer sys- than one) over certain regions of the
- tems by taking the ratio of the graph and worse than system B (ratio
- throughputs of all 50 points and less than one) on other regions of
- graphing the results shown in Figure the graph. For this reason users may
- 3. The left axis shows the ratio and wish to identify the region of the
- the bottom axis and the letters (A B SCB graph for their typical applica-
- C E F) are the same as Figure 2. tions.
- This graph shows in which area or
- areas (main CPU, tty, disk) one sys- To obtain information on the
- tem is superior to the other. Ratios application, execute the application
- greater than one show an increase in with the unix timex command. The
- capability in percent above one and timex command with the o option gives
- ratios less than one show less capa- the CPU user time, the total number
- bility in percent under one. of blocks read or written to disk and
- the total characters transferred to
- The ratio graph of Figure 3 the terminals. Divide the total
- shows machine A is superior in han- number of CPU user seconds into total
- dling disk requests. Machine B has a disk blocks read and written and into
- cpu/compiler speed that is about five the total characters transferred to
- percent faster. The two machines are the terminal. This gives two quanti-
- equivalent in handling tty write ties: the number of disk transfers
- requests. per second of user CPU time and the
- number of tty characters transferred
- _O_b_t_a_i_n_i_n_g _A_n_d _R_u_n_n_i_n_g _T_h_e _S_C_B per second of user CPU time.
-
- The software for the system These two quantities can be
- characterization benchmark (SCB) may related to a location on the SCB
- be obtained free on IBM PC or NCR graphs. First obtain the user CPU
- Tower compatible floppy from NCR by time per work unit process by taking
- writing on company letterhead to: it from the subsystem report and cal-
- ling this quantity C. Multiply C by
- the number of disk requests per
- SCB Distribution second of user CPU time from the
- NCR Corporation application. This gives the number
- 3325 Platt Springs Road of disk requests on the bottom axis
- West Columbia, SC 29169 of the SCB graphs. Next multiply C
- Attention: Ms. Darlene Amick times the number of characters
- transferred per second of user CPU
- time from the application divided by
- If you wish the software on nine 32 giving the number of tty lines on
-
-
-
-
-
-
-
-
-
- - 12 -
-
-
- the SCB graphs. Having related the processing effectiveness. Because
- application to a location on the SCB the processing effectiveness varies
- graphs the relative performance for with the nature of the workload the
- the application can now be estimated SCB estimates performance for a
- for every system on which the SCB was number of different workloads. Pro-
- run. The run time for the applica- viding workloads with various I/O
- tion on a new system can be estimated levels is an important part of the
- by multiplying the run time on the SCB design because many Unix systems
- old system by the ratio on the SCB are I/O intensive. For a number of
- ratio graph at the application point. applications I/O turns out to be the
- limiting factor that determines
- The Unix timex command may also throughput.
- be used to execute shell commands
- which means whole sets of applica- The SCB provides a reasonable
- tions may be related to locations on approximation to a multiuser Unix
- the SCB graphs. Using the SCB ratio application environment, provides an
- graphs the percent improvement in easy way to compare the relative pro-
- throughput can be estimated for a cessing effectiveness between systems
- number of new systems for different and provides a method of narrowing
- types of applications. These esti- performance estimates to specific or
- mates can be obtained without the sets of application programs.
- need to port any of the application
- programs. The SCB is easy to set up and
- run and it provides a comprehensive
- _C_o_n_c_l_u_s_i_o_n_s and accurate benchmark.
-
- By the very nature of benchmarks _A_c_k_n_o_w_l_e_d_g_m_e_n_t
- they are an approximation of the user
- application environment. While there I wish to thank Myron Merritt
- are many different benchmarks measur- for his many hours of discussion and
- ing various elements of Unix computer his ideas used in the development of
- systems, a decision to buy a certain the system characterization bench-
- computer should be based on the mark.
- system's ability to handle the work-
- load in the form of electronic and
- electro-mechanical processing. The
- time to perform an integer add or
- read a single disk file is useful
- information but is certainly not suf-
- ficient information to base a com-
- puter purchase on.
-
- The problem with gathering a lot
- of details on Unix computer systems
- in the form of specifications and the
- results of running various benchmarks
- is that it is extremely difficult, if
- not impossible, because of the
- system's internal complexity to accu-
- rately relate this information to a
- single measure of processing effec-
- tiveness. The SCB uses the system
- itself to relate all the various
- operations and provide one measure of
-
-
-
-
-
-
-