-
-
- Dhrystone Benchmark: Rationale for Version 2 and Measurement Rules
-
-
- Reinhold P. Weicker
- Siemens AG, E STE 35
- Postfach 3240
- D-8520 Erlangen
- Germany (West)
-
-
-
-
- 1. Why a Version 2 of Dhrystone?
-
- The Dhrystone benchmark program [1] has become a popular benchmark for
- CPU/compiler performance measurement, in particular in the area of
- minicomputers, workstations, PCs and microprocessors. It apparently
- satisfies a need for an easy-to-use integer benchmark; it gives a first
- performance indication which is more meaningful than MIPS numbers
- which, in their literal meaning (million instructions per second),
- cannot be used across different instruction sets (e.g. RISC vs. CISC).
- With the increasing use of the benchmark, it seems necessary to
- reconsider the benchmark and to check whether it can still fulfill this
- function. Version 2 of Dhrystone is the result of such a
- re-evaluation; it has been made for two reasons:
-
- o Dhrystone has been published in Ada [1], and versions in Ada, Pascal
- and C have been distributed by Reinhold Weicker via floppy disk.
- However, the version used most often for benchmarking has been the
- one made by Rick Richardson through another translation from the
- Ada version into the C programming language; this is the version
- distributed via the UNIX network Usenet [2].
-
- There is an obvious need for a common C version of Dhrystone, since C
- is at present the most popular system programming language for the
- class of systems (microcomputers, minicomputers, workstations) where
- Dhrystone is used most. There should be, as far as possible, only
- one C version of Dhrystone such that results can be compared without
- restrictions. In the past, the C versions distributed by Rick
- Richardson (Version 1.1) and by Reinhold Weicker had small (though
- not significant) differences.
-
- Together with the new C version, the Ada and Pascal versions have
- been updated as well.
-
- o As far as it is possible without changes to the Dhrystone statistics,
- optimizing compilers should be prevented from removing significant
- statements. It has turned out in the past that optimizing compilers
- suppressed code generation for too many statements (by "dead code
- removal" or "dead variable elimination"). This has led to the
- danger that benchmarking results obtained by a naive application of
- Dhrystone - without inspection of the code that was generated - could
- become meaningless.
-
- The overall policy for version 2 has been that the distribution of
- statements, operand types and operand locality described in [1] should
- remain unchanged as much as possible. (Very few changes were
- necessary; their impact should be negligible.) Also, the order of
- statements should remain unchanged. Although I am aware of some
- critical remarks on the benchmark - I agree with several of them - and
- know some suggestions for improvement, I didn't want to change the
- benchmark into something different from what has become known as
- "Dhrystone"; the confusion generated by such a change would probably
- outweigh the benefits. If I were to write a new benchmark program, I
- wouldn't give it the name "Dhrystone" since this denotes the program
- published in [1]. However, I do recognize the need for a larger number
- of representative programs that can be used as benchmarks; users should
- always be encouraged to use more than just one benchmark.
-
- The new versions (version 2.0 for C, Pascal and Ada) will be
- distributed as widely as possible. Readers who want to use the
- benchmark for their own measurements can obtain a copy in machine-
- readable form on floppy disk (MS-DOS or XENIX format) from the author.
- In addition, the new versions have been posted to the UNIX network
- Usenet.
-
-
- 2. Overall Characteristics of Version 2
-
- In general, version 2 follows - in the parts that are significant for
- performance measurement, i.e. within the measurement loop - the
- published (Ada) version and the C versions previously distributed.
- Where the versions distributed by Rick Richardson [2] and Reinhold
- Weicker have been different, it follows the version distributed by
- Reinhold Weicker. (However, the differences have been so small that
- their impact on execution time in all likelihood has been negligible.)
- The initialization and UNIX instrumentation part - which had been
- omitted in [1] - follows mostly the ideas of Rick Richardson [2].
- However, any changes in the initialization part and in the printing of
- the result have no impact on performance measurement since they are
- outside the measurement loop. As a concession to older compilers,
- names have been made unique within the first 8 characters for the C
- version.
-
- The original publication of Dhrystone did not contain any statements
- for time measurement since they are necessarily system-dependent.
- However, it turned out that it is not enough just to enclose the main
- procedure of Dhrystone in a loop and to measure the execution time. If
- the variables that are computed are not used somehow, there is the
- danger that the compiler considers them as "dead variables" and
- suppresses code generation for a part of the statements. Therefore in
- version 2 all variables of "main" are printed at the end of the
- program. This also permits a plausibility check for correct
- execution of the benchmark.
-
- At several places in the benchmark, code has been added, but only in
- branches that are not executed. The intention is that optimizing
- compilers should be prevented from moving code out of the measurement
- loop, or from removing code altogether. Statements that are executed
- have been changed in very few places only. In these cases, only the
- role of some operands has been changed, and it was made sure that the
- numbers defining the "Dhrystone distribution" (distribution of
- statements, operand types and locality) still hold as much as possible.
- Except for sophisticated optimizing compilers, execution times for
- version 2.0 should be the same as for previous versions.
-
- Because of the self-imposed limitation that the order and distribution
- of the executed statements should not be changed, there are still cases
- where optimizing compilers may not generate code for some statements.
- To a certain degree, this is unavoidable for small synthetic
- benchmarks. Users of the benchmark are advised to check the code
- listings to verify that code is generated for all statements of Dhrystone.
-
- Contrary to the suggestion in the published paper and its realization
- in the versions previously distributed, no attempt has been made to
- subtract the time for the measurement loop overhead. (This calculation
- has proven difficult to implement in a correct way, and its omission
- makes the program simpler.) However, since the loop check is now part
- of the benchmark, this does have an impact - though a very minor one -
- on the distribution statistics which have been updated for this
- version.
-
-
- 3. Discussion of Individual Changes
-
- In this section, all changes are described that affect the measurement
- loop and that are not just renamings of variables. All remarks refer to
- the C version; the other language versions have been updated similarly.
-
- In addition to adding the measurement loop and the printout statements,
- changes have been made at the following places:
-
- o In procedure "main", three statements have been added in the non-
- executed "then" part of the statement
-
- if (Enum_Loc == Func_1 (Ch_Index, 'C'))
-
- they are
-
- strcpy (Str_2_Loc, "DHRYSTONE PROGRAM, 3'RD STRING");
- Int_2_Loc = Run_Index;
- Int_Glob = Run_Index;
-
- The string assignment prevents movement of the preceding assignment
- to Str_2_Loc (5th statement of "main") out of the measurement loop.
- (This probably will not happen for the C version, but it did happen
- with another language and compiler.) The assignment to Int_2_Loc
- prevents value propagation for Int_2_Loc, and the assignment to
- Int_Glob makes the value of Int_Glob possibly dependent on the
- value of Run_Index.
-
- o In the three arithmetic computations at the end of the measurement
- loop in "main", the roles of some variables have been exchanged to
- prevent the division from simply cancelling out the multiplication,
- as it did in [1]. A very smart compiler might have recognized this and
- suppressed code generation for the division.
-
- o For Proc_2, no code has been changed, but the value of its actual
- parameter has changed due to changes in "main".
-
- o In Proc_4, the second assignment has been changed from
-
- Bool_Loc = Bool_Loc | Bool_Glob;
-
- to
-
- Bool_Glob = Bool_Loc | Bool_Glob;
-
- It now assigns the result to a global variable instead of the local
- variable Bool_Loc; in the old form, Bool_Loc was a "dead variable",
- since it is not used afterwards.
-
- o In Func_1, the statement
-
- Ch_1_Glob = Ch_1_Loc;
-
- was added in the non-executed "else" part of the "if" statement, to
- prevent the suppression of code generation for the assignment to
- Ch_1_Loc.
-
- o In Func_2, the second character comparison statement has been changed
- to
-
- if (Ch_Loc == 'R')
-
- ('R' instead of 'X') because a comparison with 'X' is implied in the
- preceding "if" statement.
-
- Also in Func_2, the statement
-
- Int_Glob = Int_Loc;
-
- has been added in the non-executed part of the last "if" statement,
- in order to prevent Int_Loc from becoming a dead variable.
-
- The distribution statistics have been changed only by the addition of
- the measurement loop iteration (1 additional statement, 4 additional
- local integer operands) and by the change in Proc_4 (one operand
- changed from local to global). The distribution statistics in the
- comment headers have been updated accordingly.
-
-
- 4. String Operations
-
- The string operations (string assignment and string comparison) have
- not been changed, to keep the program consistent with the original
- version.
-
- There has been some concern that the string operations are
- over-represented in the program, and that execution time is dominated
- by these operations. This was true in particular when optimizing
- compilers removed too much code in the main part of the program; this
- should have been mitigated in version 2.
-
- It should be noted that this is a language-dependent issue: Dhrystone
- was first published in Ada, and with Ada or Pascal semantics, the time
- spent in the string operations is, at least in all implementations
- known to me, considerably smaller. In Ada and Pascal, assignment and
- comparison of strings are operators defined in the language, and the
- upper bounds of the strings occurring in Dhrystone are part of the type
- information known at compilation time. The compilers can therefore
- generate efficient inline code. In C, string assignment and comparison
- are not part of the language, so the C library functions "strcpy" and
- "strcmp" have to be used. In addition to the overhead caused by two
- additional function calls, these functions are defined for null-
- terminated strings where the length of the strings is not known at
- compilation time; the function has to check every byte for the
- termination condition (the null byte).
-
- Obviously, a C library which includes efficiently coded "strcpy" and
- "strcmp" functions helps to obtain good Dhrystone results. However, I
- don't think that this is unfair since string functions do occur quite
- frequently in real programs (editors, command interpreters, etc.). If
- the string functions are implemented efficiently, this helps real
- programs as well as benchmark programs.
-
- I admit that the string comparison in Dhrystone terminates later (after
- scanning 20 characters) than most string comparisons in real programs.
- For consistency with the original benchmark, I didn't change the
- program despite this weakness.
-
-
- 5. Intended Use of Dhrystone
-
- When Dhrystone is used, the following "ground rules" apply:
-
- o Separate compilation (Ada and C versions)
-
- As mentioned in [1], Dhrystone was written to reflect actual
- programming practice in systems programming. The division into
- several compilation units (5 in the Ada version, 3 in the C version)
- is intended, as is the distribution of inter-module and intra-module
- subprogram calls. Although on many systems there will be no
- difference in execution time to a Dhrystone version where all
- compilation units are merged into one file, the rule is that separate
- compilation should be used. The intention is that real programming
- practice, where programs consist of several independently compiled
- units, should be reflected. This also implies that the compiler,
- while compiling one unit, has no information about the use of
- variables, register allocation etc. occurring in other compilation
- units. Although in real life compilation units will probably be
- larger, the intention is that these effects of separate compilation
- are modeled in Dhrystone.
-
- A few language systems have post-linkage optimization available
- (e.g., final register allocation is performed after linkage). This
- is a borderline case: Post-linkage optimization involves additional
- program preparation time (although not as much as compilation in one
- unit) which may prevent its general use in practical programming. I
- think that since it defeats the intentions given above, it should not
- be used for Dhrystone.
-
- Unfortunately, ISO/ANSI Pascal does not contain language features for
- separate compilation. Although most commercial Pascal compilers
- provide separate compilation in some way, we cannot use it for
- Dhrystone since such a version would not be portable. Therefore, no
- attempt has been made to provide a Pascal version with several
- compilation units.
-
- o No procedure merging
-
- Although Dhrystone contains some very short procedures where
- execution would benefit from procedure merging (inlining, macro
- expansion of procedures), procedure merging is not to be used. The
- reason is that the percentage of procedure and function calls is part
- of the "Dhrystone distribution" of statements contained in [1].
-
- o Other optimizations are allowed, but they should be indicated
-
- It is often hard to draw an exact line between "normal code
- generation" and "optimization" in compilers: Some compilers perform
- operations by default that are invoked in other compilers only when
- optimization is explicitly requested. Also, we cannot avoid that in
- benchmarking people try to achieve results that look as good as
- possible. Therefore, optimizations performed by compilers - other
- than those listed above - are not forbidden when Dhrystone execution
- times are measured. Dhrystone is not intended to be non-optimizable,
- but to be about as optimizable as normal programs. For
- example, there are several places in Dhrystone where performance
- benefits from optimizations like Common Subexpression Elimination,
- Value Propagation etc., but normal programs usually also benefit from
- these optimizations. Therefore, no effort was made to artificially
- prevent such optimizations. However, measurement reports should
- indicate which compiler optimization levels have been used, and
- reporting results with different levels of compiler optimization for
- the same hardware is encouraged.
-
- o Default results are those without "register" declarations (C version)
-
- When Dhrystone results are quoted without additional qualification,
- they should be understood as results obtained without use of the
- "register" attribute. Good compilers should be able to make good use
- of registers even without explicit register declarations ([3], p.
- 193).
-
- Of course, for experimental purposes, post-linkage optimization,
- procedure merging and/or compilation in one unit can be done to
- determine their effects. However, Dhrystone numbers obtained under
- these conditions should be explicitly marked as such; "normal"
- Dhrystone results should be understood as results obtained following
- the ground rules listed above.
-
- In any case, for serious performance evaluation, users are advised to
- ask for code listings and to check them carefully. In this way, when
- results for different systems are compared, the reader can get a
- feeling for how much performance difference is due to compiler optimization
- and how much is due to hardware speed.
-
-
- 6. Acknowledgements
-
- The C version 2.0 of Dhrystone has been developed in cooperation with
- Rick Richardson (Tinton Falls, NJ); it incorporates many ideas from the
- "Version 1.1" distributed previously by him over the UNIX network
- Usenet. Through his activity with Usenet, Rick Richardson has made a
- very valuable contribution to the dissemination of the benchmark. I
- also thank Chaim Benedelac (National Semiconductor), David Ditzel
- (SUN), Earl Killian and John Mashey (MIPS), Alan Smith and Rafael
- Saavedra-Barrera (UC at Berkeley) for their help with comments on
- earlier versions of the benchmark.
-
-
- 7. Bibliography
-
- [1]
- Reinhold P. Weicker: Dhrystone: A Synthetic Systems Programming
- Benchmark.
- Communications of the ACM 27, 10 (Oct. 1984), 1013-1030
-
- [2]
- Rick Richardson: Dhrystone 1.1 Benchmark Summary (and Program Text)
- Informal Distribution via "Usenet", Last Version Known to me: Sept.
- 21, 1987
-
- [3]
- Brian W. Kernighan and Dennis M. Ritchie: The C Programming
- Language.
- Prentice-Hall, Englewood Cliffs (NJ) 1978
-
-