Source Code 1992 March

home *** CD-ROM | disk | FTP | other *** search

/ Source Code 1992 March / Source_Code_CD-ROM_Walnut_Creek_March_1992.iso / usenet / altsrcs / 1 / 1230 / scbprdoc

Wrap

Text File | 1990-12-28 | 30.3 KB | 661 lines

AAAA MMMMUUUULLLLTTTTIIIIUUUUSSSSEEEERRRR BBBBEEEENNNNCCCCHHHHMMMMAAAARRRRKKKK TTTTOOOO CCCCOOOOMMMMPPPPAAAARRRREEEE UUUUNNNNIIIIXXXX SSSSYYYYSSSSTTTTEEEEMMMMSSSS Philip M. Mills NCR Corporation 3325 Platt Springs Road West Columbia, SC 29169 phone 803 791 6377 September 13, 1985 AAAA MMMMUUUULLLLTTTTIIIIUUUUSSSSEEEERRRR BBBBEEEENNNNCCCCHHHHMMMMAAAARRRRKKKK TTTTOOOO CCCCOOOOMMMMPPPPAAAARRRREEEE UUUUNNNNIIIIXXXX SSSSYYYYSSSSTTTTEEEEMMMMSSSS Philip M. Mills NCR Corporation 3325 Platt Springs Road West Columbia, SC 29169 phone 803 791 6377 _U_n_i_x _B_e_n_c_h_m_a_r_k_s _I_n_a_d_e_q_u_a_t_e operations. Despite all the evaluation of (3) One is unable to correlate the Unix systems using benchmarks, no time to perform simple opera- readily available benchmark today tions with the system's overall provides an accurate estimate of a ability to process work. multiuser Unix system's overall pro- cessing effectiveness. Published (4) The benchmark cannot evaluate articles showing benchmark results as Unix systems with one or multi- well as the results from executing ple main processors. benchmarks, typically provide tables of data containing measurements of To overcome the difficulties different parts of the system, leav- just described, a self-measuring ing one with the job of interpreting portable multiuser multiprocessor the results as best they can. What unix benchmark has been developed. one really wants to know is will a This benchmark provides: set of application programs execute faster on one Unix system compared to (1) A reasonable approximation to a another Unix system. multiuser multiprocessor unix application environment. The problem with the results provided by most benchmarks is that (2) A single measure of the overall one cannot accurately infer which system's processing effective- system can execute a set of applica- ness. tion programs the best for the fol- lowing reasons: (3) An easy way to compare the rela- tive overall system's processing (1) The benchmark did not contain a effectiveness between systems. workload that provided an ade- quate simulation of a Unix mul- (4) A method of relating application tiuser application environment. requirements to the benchmark results. (2) The benchmark ignored the system's ability to overlap This benchmark is being made - 2 - available free to the Unix community. as video displays, terminals and printers. _D_e_f_i_n_i_n_g _P_e_r_f_o_r_m_a_n_c_e _O_v_e_r_l_a_p_p_e_d _O_p_e_r_a_t_i_o_n_s Whether buying computers or using them, one of the qualities of In a multiuser Unix system there major importance is performance. are many user processes or programs Since performance may be inadver- executing at the same time. This tently confused with other attributes type of processing allows the I/O of a computer system it is useful to processing of some user programs to describe performance as it applies to be overlapped with the CPU processing computers and other mechanisms. of other user programs. The result of overlapping the I/O operations Performance may be described as with CPU operations depends on the the result of carrying out a set of amount of intelligence built into the actions in a specific environment to I/O processors. satisfy an objective or goal. In track and field each event has a dif- An intelligent serial I/O sub- ferent environment and goal and the system may handle interrupts from results may be running a distance in multiple on-line users, provide a some amount of time or jumping so slave serial I/O processor, handle many feet. In performance the buffering for both input and output results of carrying out the and may even execute parts of the prescribed set of actions is usually Unix kernel's tty code. Besides pro- described in terms of measurable viding overlapped serial I/O opera- quantities. tions for the terminals and printers with the main CPU processing, the There are two main results that intelligent serial I/O subsystem may are important to measure in a Unix also greatly reduce the amount of based computer system: responsiveness processing required by the main CPU. and throughput. Responsiveness meas- The reason for this reduction is that ures the system's ability to respond functions or processing that use to to a work request in a given amount be done by the main CPU are now done of time. Throughput measures the by the serial I/O subsystem. total amount of work performed over a given period of time. The work on a An intelligent disk subsystem computer system consists of the will have one or more disk controll- actions of electronic and electro- ers and disk drives and may also have mechanical processing. Performance a slave processor. The disk subsys- results are often measured by execut- tem handles the control of the disk ing a given workload and measuring drive operations, handles interrupts, the time spent processing the work. handles buffering and provides a high speed DMA data transfer in and out of _F_u_n_c_t_i_o_n_a_l_i_t_y main memory. If a slave processor is available it may be used to execute A computer has three major areas some of the Unix kernel file code. of functionality that a workload Intelligent disk subsystems provide should contain. The first is the overlapped operations with the main processing by the central processing CPU and also reduces the load on the unit (CPU) where all the calculations main processor. In some disk subsys- are performed; the second is the disk tems, operations on separate disk subsystem where the computer stores drives can also be overlapped with data files; the third is serial I/O each other producing a much higher subsystem which controls such items effective transfer rate for the disk - 3 - subsystem. overall user environment. This approach is often not feasible _O_v_e_r_l_a_p_p_e_d _P_e_r_f_o_r_m_a_n_c_e because the set of applications do not exist or it would take too much The effects of overlapped opera- manpower to port the application pro- tions are extremely important to com- grams to run on each computer system puter system performance. This point being evaluated. Also it may take is illustrated by the following exam- too much time to execute a large set ple. Assume a workload is developed of applications on a number of com- that contains 100 seconds of CPU pro- puter systems. cessing, 100 seconds of disk file processing and 100 seconds of termi- _P_o_r_t_a_b_l_e _B_e_n_c_h_m_a_r_k_s nal I/O processing on computer system A and also on computer system B. For the reasons just given port- System A has intelligent disk and tty able benchmarks that are easy to subsystems which can overlap opera- obtain and execute in a relatively tions with the main CPU and each short period of time are often used other and system A also supports the to simulate the application program common Unix multiuser mode. There- environment. These benchmarks should fore system A can execute the work- measure the results needed in perfor- load in 100 seconds. Although system mance evaluation. The most important B has the same CPU power it cannot results that should be provided are overlap I/O operations. System B measurements of throughput and will take 300 seconds of real time to responsiveness. execute the workload. For the work- load described system A is three A well designed general purpose times more powerful than system B in benchmark used to simulate the work- that it will have a throughput rate load of the Unix application environ- three times that of system B. ment should have the following Another way of putting it is that for features: a fixed amount of time system A will accomplish three times as much work (1) A broad weighted mix of basic as system B. CPU operations used with dif- ferent data referencing modes. Although the example was developed to illustrate a point, a (2) A number of user processes exe- typical unix workload contains a sig- cuting in a multiprocess mode. nificant amount of disk and serial I/O and the overlapping of these (3) Terminal I/O taking place with a operations by the hardware/software number of terminals at the same can make a substantial improvement in time. performance. (4) Disk I/O taking place with a _A_p_p_l_i_c_a_t_i_o_n _B_e_n_c_h_m_a_r_k_s number of files scattered over the disk drive surface to cause A sound approach to evaluating head movement. The benchmark the performance of various computer should have the ability to use systems is to take the application or multiple disk drives. sets of application programs that make up the user environment and exe- (5) Results that can provide an cute them on each computer system to accurate measure of the computer be evaluated. It is most important system's ability to execute work that the set of application programs over time. The workload produc- be truly representative of the ing the results should contain - 4 - all the processing operations on (3) The tty test only involves one the system so the system's abil- or two terminals. The use of ity to overlap operations will multiple tty lines all active at be taken into account. the same time represents the actual user environment and its (6) Benchmark should be portable need can best be illustrated by across various Unix systems. an example. Assume the tty peripheral controller can handle _E_v_a_l_u_a_t_i_n_g _E_x_i_s_t_i_n_g _B_e_n_c_h_m_a_r_k_s 3200 characters per second total on output and it is connected to Before developing a new bench- 8 tty lines at 9600 baud. For a mark an evaluation of existing avail- single user the tty controller able benchmarks was performed. The can maintain 960 characters per benchmarks are of two types. The second to that user terminal. first type evaluates the time to per- However, for 8 heavily active form a set of some simple basic users the tty controller can operations such as add integer by only maintain an average of only placing each operation in a loop and 400 characters per second per measuring the time to complete the tty line. For certain workloads loop. The second type of benchmark this limitation will affect the consists of one or more programs that overall throughput of the com- are designed to be appropriate for a puter system. benchmark. These programs are run and the time to execute the programs (4) The disk test consists of read- is measured. ing, writing and copying a sin- gle file. In a multiuser Unix The AIM Benchmark Suite II is a environment a number of separate well known benchmark of the first file disk I/O requests are made type used to evaluate Unix computer which are scattered randomly system's performance (see Vol 11 No. over the disk drive surface. 3 Page 89 for an example). Given the This sequence of random disk I/O objectives of providing an accurate requests requires significant estimate of throughput or responsive- disk head movement. ness through the simulation of a Unix multiuser application environment (5) Unix computer systems are often this benchmark did not meet our goals purchased with multiple disk for the following reasons: drives. It is therefore neces- sary to be able to include in (1) The benchmark failed to include the benchmark multiple disk the basic operations for com- requests at the same time for parisons, logical, bit and all the disk drives in the sys- branch operations which find tem. It is important to evalu- significant use in the Unix ate the system in this mode environment. because it more accurately simu- lates the user environment. (2) The benchmark uses mostly regis- Many systems are able to overlap ter variables. Where data some disk I/O operations between references across the various drives as well as with the main basic operations, such as, CPU. The AIM benchmark did not automatic, static, pointer and provide this capability. array references are commonly used in the Unix environment. (6) There is no way to correlate the Array referencing is included as point system used by the AIM a separate test. benchmark in a multiuser - 5 - environment with throughput or Because responsiveness and responsiveness. The AIM point throughput are both very much related system cannot take into account to a computer system's ability to the overlapped operations of tty process work the relationship between and disk I/O and the reduction the two was investigated. The in the main CPU processing time results of this investigation showed due to the intelligent I/O pro- that a 20% increase in throughput cessors. Even a simple floppy would provide a 20% or greater disk controller can overlap improvement in response time. How operations with the main CPU. much greater than 20% the improvement in response time would be depends on Although the AIM Benchmark Suite the ratio of the time to process an II was developed to execute in a sin- interactive user request to the aver- gle user mode it has been used exten- age user think time. If the think sively to evaluate multiuser Unix time goes to zero, throughput and systems. For our purposes of responsiveness vary the same from evaluating Unix multiuser systems the system to system for a given work- AIM benchmark does not provide an load. adequate simulation of the Unix user environment, and second, does not In an interactive system like provide an accurate measure of the Unix the user think time tends to Unix computer system's ability to create idle time on the various sys- process work. tem components. Given that the user is not a part of the computer system The AIM benchmark does provide that is to be purchased, throughput information that is quite useful in is really the best measure of the tuning unix systems. This information computer system's ability to process can best be used (and is) by organi- work. For the reasons just stated zations marketing Unix computer sys- the benchmark measures throughput as tems who are able to modify the a result of executing a particular hardware and unix software to improve workload. performance. _T_h_e _S_y_s_t_e_m _C_h_a_r_a_c_t_e_r_i_z_a_t_i_o_n _B_e_n_c_h_m_a_r_k The Dhrystone is another bench- mark which provides a good mix of The purpose of the benchmark is statement types, operators and data to compare Unix systems on their reference types but it does not pro- ability to do work (throughput) using vide any I/O operations which are a simulated Unix multiuser environ- needed for the multiple tty lines and ment with multiple tty lines, multi- multiple disk drives used by the Unix ple disk drives and one or more main multiuser environment. processors. For a fair comparison each computer system should execute _T_h_r_o_u_g_h_p_u_t _V_s. _R_e_s_p_o_n_s_i_v_e_n_e_s_s the same workload with the same number of tty lines and the same In the development of the bench- number of disk drives. mark a decision had to be made to measure throughput or responsiveness. The application requirements for Although response time is a very a computer system will vary in each important performance measure in an environment such as word processing, interactive Unix multiuser system it accounting, graphics, data base requires the use of a remote terminal management, spread sheet, compilers emulator computer to provide tty and scientific computations. For inputs to obtain repeatable results. this reason the benchmark (SCB) exe- cutes a number of different workloads - 6 - each with a different mix of I/O The first part of the SCB con- operations. Because of the many dif- sists of executing three programs ferent workloads used by the bench- crun, drun and trun to produce the mark, it is more of a system charac- report shown in Figure 1 that shows terization than a single benchmark. the effective capabilities of the one For this reason the benchmark is or more main CPU , the disk subsystem called the System Characterization and the tty subsystems respectively. Benchmark (SCB). The purpose of crun is to obtain The SCB has the following the relative processing power of the characteristics: one or more main CPUs. The crun pro- gram also provides for multiprocessor (1) The SCB measures the rate systems an estimate of the ratio of (throughput) at which fixed the multiprocessor's throughput to a units of work can be executed in single processor's throughput. If a unit of time (minutes). there is only one CPU this ratio is one. (2) The SCB takes into account the effects of all overlapped opera- The crun code consists of a tions executed in parallel, weighted mix of basic operations on including multiple main proces- various types of data references. sor units, if they exist. The frequency of execution of the basic operations involving the dif- (3) The SCB simulates a multiuser ferent types of data references are Unix application environment weighted according to an internal with unpublished compiler study of Unix based C code. Detail information on (a) A broad mix of main proces- the mix of operations and data refer- sor operations and data ences is contained in the SCB docu- references mentation. The time to execute the crun code is measured and used to (b) Multiple tty lines with determine the user CPU time per work terminals unit process. (c) Multiple disk drives each In setting up the SCB each disk with its own set of data drive specified by the user has 200 files files of 20K bytes each placed on it. In the execution of drun the user may (4) The SCB consists of 50 different specify the percent of files to be runs to provide a system charac- read and written on each disk drive. terization that will cover the The disk program drun produces the many different application information on the disk subsystem in environments. the subsystem report by reading and writing file blocks randomly over the (5) The SCB provides graphical out- different disk drive surfaces speci- put for a quick analysis and fied. comparisons on the basis of throughput. The serial I/O subsystem is usu- ally made up of one or more tty con- (6) The SCB provides information on trollers where each tty controller the effective rates of the main handles a fixed number of tty central processing unit (CPU), lines/terminals. The tty controller the disk subsystem and the tty typically can handle some maximum subsystem. level of characters per second for - 8 - tty output and input across all cuted per minute are calculated for lines. The information in the sub- each of the 50 runs and the results system report is obtained by writing are presented in a graphical form. 32 character lines to all tty ports as fast as the tty controllers can _T_h_e _R_e_s_u_l_t_s _O_f _T_h_e _S_C_B _B_e_n_c_h_m_a_r_k handle them. The results of executing the SCB The second part of the SCB con- are presented in the graphical form sists of 50 separate runs controlled shown in Figure 2. The left axis by a shell script. Each run is made represents the number of work units up of 200 work units. The user that can be executed per minute processes execute 8 or some setable (throughput). The bottom axis is the multiuser level at a time. Each work number of disk requests in a unit of unit consists of the following com- work process. ponents. The letters on the graph (A B C (1) The execution of a copy of xrun E F) represent the different levels code with a CPU user time pro- of tty line writes contained in each portional to the system's crun work unit process. time. Letter Number of tty (2) A given number of 1K disk block line writes reads and writes. A 0 (3) A given number of 32 character B 30 line writes to a tty C 60 line/terminal. D 90 E 120 Each work unit process in the run is identical. The disk and tty requests are equally spaced in the CPU user time to provide more con- The graphical results are pro- sistent and repeatable results. Each duced on a 130 column printer and the work unit has its own unique disk border on the graphical report is 8 read file and write file. The disk 1/2" x 11" so that it may be placed files are preassigned to each work into reports when desired. unit at random. A work unit process has a unique tty line during its exe- There are 50 points on a graph cution. On completion the tty line each specified by a letter. Each is given to the next work unit pro- point represents the throughput rate cess started. in work units per minute. Each point is determined by executing 200 work The 50 runs made by a shell unit processes each with a unit of script uses all the combinations of CPU user processing and the number of the following disk and tty levels: disk and tty requests specified on the graph. disk blocks 0, 4, 8, 12, 16, 20, 24, 28, 32, 36 The SCB results show how a com- puter system performs under a variety tty lines 0, 30, 60, 90, 120 of loads. On moving to the right on the graph, the effect of executing The disk operations are divided more disk requests is indicated by evenly between reads and writes of the steepness of the curve towards existing files. The work units exe- the bottom axis. The more limited - 10 - the disk subsystem in capability the track tape you need to send one with steeper the downward slope to the your request. The set up and opera- right. The different letters show tion is described in the help file on the effect of increasing the tty load the media. Once a simple config file per work unit process. The more lim- is set up, two shell scripts that are ited the tty subsystem in capability provided handle the rest. the greater the downward distance between the curves of the same _R_e_l_a_t_i_n_g _U_s_e_r _A_p_p_l_i_c_a_t_i_o_n _T_o _T_h_e _S_C_B letter. Point A next to the left axis shows the system's ability to Using the ratio graph system A execute CPU user processing with no can now be compared with system B I/O. across a broad mix of different work- loads. However system A may be Facilities are also provided for better than system B (ratio greater comparing two different computer sys- than one) over certain regions of the tems by taking the ratio of the graph and worse than system B (ratio throughputs of all 50 points and less than one) on other regions of graphing the results shown in Figure the graph. For this reason users may 3. The left axis shows the ratio and wish to identify the region of the the bottom axis and the letters (A B SCB graph for their typical applica- C E F) are the same as Figure 2. tions. This graph shows in which area or areas (main CPU, tty, disk) one sys- To obtain information on the tem is superior to the other. Ratios application, execute the application greater than one show an increase in with the unix timex command. The capability in percent above one and timex command with the o option gives ratios less than one show less capa- the CPU user time, the total number bility in percent under one. of blocks read or written to disk and the total characters transferred to The ratio graph of Figure 3 the terminals. Divide the total shows machine A is superior in han- number of CPU user seconds into total dling disk requests. Machine B has a disk blocks read and written and into cpu/compiler speed that is about five the total characters transferred to percent faster. The two machines are the terminal. This gives two quanti- equivalent in handling tty write ties: the number of disk transfers requests. per second of user CPU time and the number of tty characters transferred _O_b_t_a_i_n_i_n_g _A_n_d _R_u_n_n_i_n_g _T_h_e _S_C_B per second of user CPU time. The software for the system These two quantities can be characterization benchmark (SCB) may related to a location on the SCB be obtained free on IBM PC or NCR graphs. First obtain the user CPU Tower compatible floppy from NCR by time per work unit process by taking writing on company letterhead to: it from the subsystem report and cal- ling this quantity C. Multiply C by the number of disk requests per SCB Distribution second of user CPU time from the NCR Corporation application. This gives the number 3325 Platt Springs Road of disk requests on the bottom axis West Columbia, SC 29169 of the SCB graphs. Next multiply C Attention: Ms. Darlene Amick times the number of characters transferred per second of user CPU time from the application divided by If you wish the software on nine 32 giving the number of tty lines on - 12 - the SCB graphs. Having related the processing effectiveness. Because application to a location on the SCB the processing effectiveness varies graphs the relative performance for with the nature of the workload the the application can now be estimated SCB estimates performance for a for every system on which the SCB was number of different workloads. Pro- run. The run time for the applica- viding workloads with various I/O tion on a new system can be estimated levels is an important part of the by multiplying the run time on the SCB design because many Unix systems old system by the ratio on the SCB are I/O intensive. For a number of ratio graph at the application point. applications I/O turns out to be the limiting factor that determines The Unix timex command may also throughput. be used to execute shell commands which means whole sets of applica- The SCB provides a reasonable tions may be related to locations on approximation to a multiuser Unix the SCB graphs. Using the SCB ratio application environment, provides an graphs the percent improvement in easy way to compare the relative pro- throughput can be estimated for a cessing effectiveness between systems number of new systems for different and provides a method of narrowing types of applications. These esti- performance estimates to specific or mates can be obtained without the sets of application programs. need to port any of the application programs. The SCB is easy to set up and run and it provides a comprehensive _C_o_n_c_l_u_s_i_o_n_s and accurate benchmark. By the very nature of benchmarks _A_c_k_n_o_w_l_e_d_g_m_e_n_t they are an approximation of the user application environment. While there I wish to thank Myron Merritt are many different benchmarks measur- for his many hours of discussion and ing various elements of Unix computer his ideas used in the development of systems, a decision to buy a certain the system characterization bench- computer should be based on the mark. system's ability to handle the work- load in the form of electronic and electro-mechanical processing. The time to perform an integer add or read a single disk file is useful information but is certainly not suf- ficient information to base a com- puter purchase on. The problem with gathering a lot of details on Unix computer systems in the form of specifications and the results of running various benchmarks is that it is extremely difficult, if not impossible, because of the system's internal complexity to accu- rately relate this information to a single measure of processing effec- tiveness. The SCB uses the system itself to relate all the various operations and provide one measure of