- Newsgroups: comp.software-eng
- Path: sparky!uunet!spool.mu.edu!sgiblab!munnari.oz.au!metro!usage!syacus!paulb
- From: paulb@syacus.acus.oz.au (Paul Bandler)
- Subject: Value of Code Coverage Analysis Metrics - Summary
- Message-ID: <1992Dec21.061509.25208@syacus.acus.oz.au>
- Organization: ACUS Australian Centre for Unisys Software, Sydney
- Date: Mon, 21 Dec 1992 06:15:09 GMT
- Lines: 1115
-
-
- Sender: news@lmpsbbs.comm.mot.com (Net News)
- Organization: Motorola Land Mobile Products Sector
- Lines: 55
- Nntp-Posting-Host: 145.16.3.73
-
- In article <1992Dec14.072812.13689@syacus.acus.oz.au>, paulb@syacus.acus.oz.au (Paul Bandler) writes:
-
- <preamble deleted>
-
- |> I believe we have a tool to measure the %BFA 'Branch Flow Analysis' but
- |> of course the engineers are responsible for producing the test cases to
- |> exercise the code.
- |>
- |> I have 3 questions:-
- |>
- |> 1) Do people think that this is a valuable metric?
- |> 2) Is it a cost effective exercise to get engineers to achieve a particular
- |>    %BFA as a completion criterion?
- |> 3) What is a realistic %BFA to aim for?
- |> Paul Bandler
-
- BFA is a valuable tool for testing. %BFA, like all unitless numbers, is
- of a more dubious nature.
-
- %BFA does not tell the person looking at it which paths were not tested
- and why. This is important because engineers could, subconsciously or
- otherwise, use their quota of untested branches to avoid testing the more
- complex/niggly areas of code.
-
- The nature of the testing is also important - executing every line of code
- is useful by itself only if you are programming in an interpreted language
- where syntax must be checked by execution. (You can run through every branch
- of a square root function but it won't test what happens with a negative
- parameter. There is always a temptation to say that a line 'i++' passes
- 'because it increments i' without checking the cases where it shouldn't.)
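-
- To illustrate with a made-up C fragment (not code from any of the projects
- discussed here), a square root routine can reach full branch coverage without
- its behaviour for a negative parameter ever being examined:
-
-     #include <math.h>
-
-     /* Newton-Raphson square root.  Every branch in this function is
-        covered by tests with positive inputs alone. */
-     double my_sqrt(double x)
-     {
-         double guess = (x > 1.0) ? x / 2.0 : 1.0;
-         int i;
-
-         for (i = 0; i < 50; i++)
-             guess = (guess + x / guess) / 2.0;
-         return guess;
-     }
-
-     /* my_sqrt(4.0) and my_sqrt(0.25) between them take every branch,
-        yet say nothing about my_sqrt(-1.0), which quietly returns a
-        meaningless value.  Coverage cannot ask for the test that the
-        specification needs. */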
-
- As to cost effectiveness, this is so dependent upon the available tools,
- maintenance costs, whether or not testability was built into the code, etc.
- that any global statement would be rash.
-
- A realistic %BFA target is again dependent upon specific circumstances.
- I don't know if anyone has done any work to find out the percentage of
- defects found against testing coverage and time taken, but I would
- be interested to find out. The benefits of higher percentages are not
- linear and certainly peak.
-
- Setting %BFA as a part of completion criteria is probably worthwhile, but
- getting dogmatic about it is probably not. BFA is very useful for helping
- engineers see where more testing is required; it is not that useful as an
- exam mark for the testing.
-
- -- David
-
- ========= All opinions are mine and not necessarily Motorola's ============
- = @mail : David Alexander, Channel Tunnel Software, Motorola, Lyon Way, =
- = Camberley (ZUK20), Surrey GU15 3QG, U.K. =
- = Email : (Internet) davidal@comm.mot.com Motorola X400-gateway : CDA004 =
- = Telephone : (office) +44 (0)276-413340 (home)+44 (0)276-24249 =
- ===========================================================================
-
- Subject: >>> Value of High Code Coverage Metrics in Testing - Request for Opinion
- Sender: usenet@news.eng.convex.com (news access account)
- Message-ID: <ssimmons.724336081@convex.convex.com>
- Date: Mon, 14 Dec 1992 12:28:01 GMT
- Nntp-Posting-Host: pixel.convex.com
- Organization: Engineering, CONVEX Computer Corp., Richardson, Tx., USA
- X-Disclaimer: This message was written by a user at CONVEX Computer
- Corp. The opinions expressed are those of the user and
- not necessarily those of CONVEX.
- Lines: 27
-
-
- Performing test coverage analysis on code is more a matter of good attitude than
- a silver bullet for removing bugs. There is an old management saying,
- "What gets inspected is what gets done".
-
- > 1) Do people think that this is a valuable metric?
-
- Again, it improves the code but it does not remove all bugs. Design flaws -
- cases that are simply not handled - cannot be found. Also, any code that uses
- data-driven tables (e.g. finite state parsers) cannot be measured effectively.
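-
- To make the data-driven point concrete (a hypothetical sketch, not CONVEX code):
- a table-driven recognizer can have all of its dispatch code covered while most
- of its table entries are never exercised.
-
-     /* Transitions live in data, not in branches, so a coverage tool
-        cannot see which of them have been tried. */
-     enum { START, IN_NUM, DONE, NSTATES };
-     enum { DIGIT, SPACE, OTHER, NCLASSES };
-
-     static const int next_state[NSTATES][NCLASSES] = {
-         /* START  */ { IN_NUM, START, DONE },
-         /* IN_NUM */ { IN_NUM, DONE,  DONE },
-         /* DONE   */ { DONE,   DONE,  DONE },
-     };
-
-     static int classify(char c)
-     {
-         if (c >= '0' && c <= '9') return DIGIT;
-         if (c == ' ')             return SPACE;
-         return OTHER;
-     }
-
-     int scan(const char *s)
-     {
-         int state = START;
-
-         while (*s && state != DONE)
-             state = next_state[state][classify(*s++)];
-         return state;
-     }
-
-     /* The single input "1 " executes every line of scan() and both
-        outcomes of its loop test, yet uses only 2 of the 9 table
-        entries.  The coverage report says "done"; the table is not. */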
-
- > 2) Is it a cost effective exercise to get engineers to achieve a particular
- > %BFA as a completion criterion?
-
- Sure... if you have the time in the schedule and effective tools to do it.
- Usually, it is best to have people do coverage analysis on their own code
- and have people who don't know the code test it for unanticipated conditions.
-
- > 3) What is a realistic %BFA to aim for?
-
- A fairly low percentage - 50% is usually the maximum achievable value, since much
- code is assertion testing. However, every condition should be accounted for.
-
- Thank you.
-
-
- Steve Simmons
-
- From comp.software-eng Wed Dec 16 08:41:18 1992
- Newsgroups: comp.software-eng
- Path: syacus!usage!metro!munnari.oz.au!spool.mu.edu!sdd.hp.com!zaphod.mps.ohio-state.edu!menudo.uh.edu!sugar!claird
- From: claird@NeoSoft.com (Cameron Laird)
- Subject: Re: >>> Value of High Code Coverage Metrics in Testing - Request for Opinion
- Organization: NeoSoft Communications Services -- (713) 684-5900
- Date: Mon, 14 Dec 1992 14:16:41 GMT
- Message-ID: <Bz96Bu.EBw@NeoSoft.com>
- References: <ssimmons.724336081@convex.convex.com>
- Lines: 31
-
- In article <ssimmons.724336081@convex.convex.com> ssimmons@convex.com (Steve Simmons) writes:
- .
- .
- .
- >> 2) Is it a cost effective exercise to get engineers to achieve a particular
- >> %BFA as a completion criterion?
- >
- >Sure... if you have the time in the schedule and effective tools to do it.
- PARTICULARLY if you don't have time in the schedule.
- Try it; in my experience, people learn a lot from their
- first coverage exercises. It's far better to learn
- those things *before* shipping the product.
- >Usually, it is best to have people do coverage analysis on their own code
- >and have people who don't know the code test it for unanticipated conditions.
- >
- >> 3) What is a realistic %BFA to aim for?
- >
- >A fairly low percentage - 50% is usually the maximum achievable value, since much
- >code is assertion testing. However, every condition should be accounted for.
- 85%. Serious.
- .
- .
- .
- If you're lucky, Brian Marick will tune in to this
- conversation; he's the gentleman with the most experience
- and insight on this topic.
- --
-
- Cameron Laird
- claird@Neosoft.com (claird%Neosoft.com@uunet.uu.net) +1 713 267 7966
- claird@litwin.com (claird%litwin.com@uunet.uu.net) +1 713 996 8546
-
- To: paulb@syacus.acus.oz.au
- Subject: Re: Value of High Code Coverage Metrics in Testing - Request for Opinion
- Newsgroups: comp.software-eng
- References: <1992Dec14.072812.13689@syacus.acus.oz.au>
- Status: RO
-
- In comp.software-eng you write:
- >I have 3 questions:-
-
- >1) Do people think that this is a valuable metric?
- >2) Is it a cost effective exercise to get engineers to achieve a particular
- >   %BFA as a completion criterion?
- >3) What is a realistic %BFA to aim for?
- >Paul Bandler
-
- Brian Marick has written a couple of papers that answer your questions;
- they are "Experience with the Cost of Test Suite Coverage Measures" and
- "Three Ways to Improve Your Testing". They discuss various types of code
- coverage, and how useful they are. Both are available by anonymous ftp
- from <something>.cs.uiuc.edu. Unfortunately, I don't remember what the
- <something> is; however, you could mail him at marick@cs.uiuc.edu, and
- I'm sure he'd be glad to tell you how to get them.
-
- --Samuel Bates
- samuel@cs.wisc.edu
-
- Date: Mon, 14 Dec 92 09:25:53 PST
- From: Todd Huffman <huffman@yoko.STAT.ORST.EDU>
- Message-Id: <9212141725.AA21682@yoko>
- To: paulb%syacus.acus.OZ.AU
- Subject: Re: Value of High Code Coverage Metrics in Testing - Request for Opinion
- Newsgroups: comp.software-eng
- In-Reply-To: <1992Dec14.072812.13689@syacus.acus.oz.au>
- Organization: Oregon State University Math Department
- Cc:
- Status: RO
-
- Branch coverage metrics are quite useful, and they are accepted by the
- software engineering community at large as being useful. The only
- consideration to work out is efficiency of getting these numbers and
- cost vs. benefit for the organization.
-
- I heard a good comment from a seminar by Tsun Chow (he's at AT&T,
- Naperville, Ill.). If your line coverage is not 100% then you
- are shipping code that has never been executed. Sounds risky!
- If your branch coverage is not 100% then you are shipping
- branches that have never been taken. Also risky.
-
- My experience is that this sort of metric collection must be automated
- if it is to be cost effective. It must be done at the UNIT test stage.
- Something in the 80-100% range would be useful. If programmers do not have to
- get 80% coverage, then you will have some of them release barely tested
- modules when the schedule gets tight.
-
- This sort of metric must be put in perspective with the whole QA program.
- Even with 100% coverage there will remain bugs. I think it is important
- to track bugs discovered by the integration/system test group per module.
- Then you will know which programmers have released buggy code to test.
- The 80% (or whatever you choose) branch coverage level should prevent
- the very bad code from being released, and the subsequent code churn.
-
- Here is a reference where test coverage metrics are used quite well--
- "Experience in Testing the Motif Interface", Jason Su, Paul Ritter
- (they're at Hewlett-Packard). March 1991, IEEE Software.
- That whole issue is devoted to testing--other articles are also good.
-
- That's all for my 2 cents' worth.
- Todd Huffman
-
-
- Received: from hotel.mitre.org.w151_sparc by milner.mitre.org.w151_sparc (4.1/SMI-4.1)
- id AA16924; Mon, 14 Dec 92 12:06:24 EST
- Date: Mon, 14 Dec 92 12:06:24 EST
- From: drodman@milner.mitre.org (David B Rodman)
- Message-Id: <9212141706.AA16924@milner.mitre.org.w151_sparc>
- To: paulb@syacus.acus.oz.au
- Subject: A Brian Marick testing case study (long)
- Status: RO
-
-
- ------- Forwarded Message
-
- [Brian Marick's case study, "A CASE STUDY IN COVERAGE TESTING", was forwarded
- here in full.  The identical text, with a later note on memory-checking tools,
- appears in Marick's own reply further down in this summary.]
-
- ------- End of Forwarded Message
-
-
- From drodman@milner.mitre.org@usage.csd.unsw.oz Tue Dec 15 04:05:25 1992
- From: drodman@milner.mitre.org (David B Rodman)
- Message-Id: <9212141702.AA16882@milner.mitre.org.w151_sparc>
- To: paulb@syacus.acus.oz.au
- Subject: Value of High Code Coverage Metrics in Testing
- Cc: drodman@milner.mitre.org
- Status: RO
-
- >I have 3 questions:-
-
- >1) Do people think that this is a valuable metric?
- >2) Is it a cost effective exercise to get engineers to achieve a particular
- >   %BFA as a completion criterion?
- >3) What is a realistic %BFA to aim for?
-
- 1) (is a valuable metric?)
- My personal opinion is that it is.  However, it must be taken within the
- context of a comprehensive test program and not used as the sole metric.  Also, the
- metric should not be abused by the development engineers as a substitute for thought!
- What I mean by this is that the drive to achieve high branch coverage sometimes causes
- engineers to design simple tests which get high coverage but miss errors.
-
- example pseudo code:
-      if value greater than 0.0 then
-          next value = 1 / sqr(value)    -- error: should be sqroot()
-      end_if
-
- Simple test cases would be 1, -1 and maybe 0.0. However, all of these test cases
- produce the same result as the correct code, leaving what could be a very hard
- to find error to s/w integration.
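-
- Written out in C (hypothetical code), a coverage-driven test takes the branch
- without distinguishing the wrong operation from the right one; a value such as
- 4.0 would:
-
-     #include <math.h>
-
-     double next_value(double value)
-     {
-         if (value > 0.0)
-             return 1.0 / (value * value);  /* error: should be 1.0 / sqrt(value) */
-         return 0.0;
-     }
-
-     /* Tests with 1.0, -1.0 and 0.0 cover both directions of the branch,
-        but on all three the buggy and the correct code agree: 1/(1*1)
-        equals 1/sqrt(1), and the branch is not taken for -1.0 or 0.0.
-        A test such as next_value(4.0) - expecting 0.5 but getting
-        0.0625 - exposes the error. */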
-
- 2) (Is it a cost effective exercise?)
- I do not have any numbers available, but I have a copy of Brian Marick's
- (marick@cs.uiuc.edu) "A CASE STUDY IN COVERAGE TESTING" (previously posted).
- He has done some measurements based on his own programming habits.  I will forward it as
- a separate posting.
-
- 3) (What is a realistic %BFA to aim for?)
- I would make it 100%, then make exceptions on a case-by-case basis.
- Exception handling code, spurious error paths and other conditions that generally
- cannot be produced are good candidates for exceptions.
-
-
- I have some interest in other opinions/answers to your questions. Could you
- please return/post a summary of the responses to your questions.
- Thanks
-
-
- David Rodman
- MITRE Corporation
- Mclean, VA
- drodman@mitre.org
-
- From Michael_P._Kirby.roch803@xerox.com@usage.csd.unsw.oz Tue Dec 15 04:02:29 1992
- Date: Mon, 14 Dec 1992 08:04:38 PST
- From: Michael_P._Kirby.roch803@xerox.com
- Subject: Re: Value of High Code Coverage Metrics in Tes
- To: paulb@syacus.acus.oz.au
- Cc: kirby.roch803@xerox.com
- Reply-To: Michael_P._Kirby.roch803@xerox.com
- Message-Id: <"14-Dec-92 11:04:38 EST".*.Michael_P._Kirby.roch803@Xerox.com>
- Status: RO
-
- Received: by milo.isdl (4.1/SMI-4.0) id AA07499; Mon, 14 Dec 92 11:04:33 EST
-
- Paul,
-
- Do you guys already practice software inspections? From my experience it is
- very good practice to hold formal software inspections. There are many good
- articles that describe both the process and the results:
-
- Russell, "Experience with Inspection in Ultralarge-Scale Developments," IEEE Software,
- January 1991.
-
- Fagan, "Design and code inspections to reduce errors in program development," IBM
- Systems Journal, No. 3, 1976, pp. 184-211.
-
- Fagan, "Advances in Software Inspections," IEEE Transactions on Software Engineering,
- July 1986, pp. 744-751.
-
- These are just a couple.
-
- I have talked to several people who say that once good inspections are in place,
- unit testing becomes unnecessary. System testing, of course, is still very important.
-
- As for BFA, I don't have any experience with it, but I'm of the philosophy that
- testing by exhaustion is not feasible.  Therefore BFA only tells you how close you are
- to a reasonable test coverage.  Perhaps an alternative approach is to apply some kind
- of reliability growth modeling.  (John Musa has written several books on the subject.)
- Here the idea is that we test based on a customer operational profile (i.e. we
- test features, not code paths).  We then model the number of "failures" that a customer
- will see.  From this we can set a threshold of failures/unit time that is "acceptable"
- to the customer.  At this point the product is ready to ship.
-
- After shipping the product we can do reliability modeling to determine how
- good our integration testing really was.
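-
- (For a flavour of what such a model looks like - quoting Musa's basic
- execution-time model from memory, so check his books for the details - the
- failure intensity after an amount tau of execution time under the operational
- profile is roughly
-
-     lambda(tau) = lambda0 * exp(-(lambda0 / nu0) * tau)
-
- where lambda0 is the initial failure intensity and nu0 the total number of
- failures expected over the software's life; testing continues until lambda
- falls below the failures-per-unit-time objective agreed with the customer.)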
-
- These are just other ideas.
-
- Mike Kirby
- Xerox Corp
- E-mail: kirby.roch803@xerox.com
-
- From lawrence@uk.ate.slb.com@usage.csd.unsw.oz Wed Dec 16 04:07:02 1992
- Date: Tue, 15 Dec 92 14:11:24 GMT
- From: lawrence%ukfca1@sj.ate.slb.com
- Message-Id: <9212151411.AA07293@juniper.uk.ate.slb.com>
- To: paulb@syacus.acus.oz.au
- Subject: Code coverage metrics.
- Status: RO
-
-
- Paul,
-
- At the current site where I am working the Sun utility 'tcov' is used
- to give a simple metric for test coverage.
-
- As I understand it tcov gives a form of BFA based on statements.
-
- The powers that be here have decided that all code will be tested to
- give a minimum of 90% coverage.
-
- I don't know how the figure 90% was reached, I suspect it just sounded good.
-
- All the same, looking at the history of software engineering at this
- site, including a "test coverage" criterion in the engineering standards
- has been beneficial.
-
- The main benefit has been that engineers now think about coverage when
- designing the tests.
-
- (Yes, unit testing is done by the engineers here).
-
- The downside is that the engineers are aware of the (probably arbitrary)
- coverage figure when designing and coding, and this affects their efforts.
- In particular, engineers are encouraged to introduce a lot of executable
- code (rather than making it table/data driven) since they know that this
- will increase the amount of code which will be exercised by "easy" test
- cases, and the proportion of code which is hard to test will be relatively
- small. If they can push the hard-to-test code into the 10% which doesn't
- have to be tested, it makes their job easier/quicker.
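-
- (With made-up numbers, the arithmetic behind that incentive: a module with 80
- trivially exercised statements and 20 hard-to-reach ones needs half of the hard
- ones tested to reach 90% (90/100).  Pad the module with 100 more easily covered
- statements and the easy tests alone already give 180/200 = 90% - the hard code
- never has to be touched at all.)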
-
- Of course the brownie points are awarded for speed rather than quality.
-
- I hope this is of interest to you.
-
- I would appreciate a summary of what you learn. Maybe you could mail
- me, as I don't always have access to the news feed.
-
- (PS: I forgot to mention - this is in response to Article 6755 in
- comp.software-eng.)
-
- Glenn
- < lawrence@uk.ate.slb.co >
-
- From philip@mama.research.canon.oz Tue Dec 15 18:02:17 1992
- Date: Tue, 15 Dec 92 17:45:29 EST
- From: philip@research.canon.oz.au (Philip Craig)
- Message-Id: <9212150645.AA01582@denis.research.canon.oz.au>
- To: paulb@syacus.acus.oz.au
- Subject: Re: Value of High Code Coverage Metrics in Testing - Request for Opinion
- Newsgroups: comp.software-eng
- In-Reply-To: <1992Dec14.072812.13689@syacus.acus.oz.au>
-
- Organization: Canon Information Systems Research Australia
- Status: RO
-
- In article <1992Dec14.072812.13689@syacus.acus.oz.au> you write:
- >I have been tasked with the Quality Assurance of a Product Development. One of
- >the metrics I have been asked to measure is the 'Branch Flow Analysis' (BFA)
- >percentage achieved during unit and system testing, i.e. how many of the
- >potential paths through the code are actually exercised during these
- >testing phases.
- >
- >I believe we have a tool to measure the %BFA but of course the engineers are
- >responsible for producing the test cases to exercise the code.
- >
- >I have 3 questions:-
- >
- >1) Do people think that this is a valuable metric?
- >2) Is it a cost effective exercise to get engineers to achieve a particular
- >   %BFA as a completion criterion?
- >3) What is a realistic %BFA to aim for?
-
- Hi Paul,
-
- I'm responsible for some testing here at CISRA, so I'd be thrilled to get
- copies of any replies that you get (apart from me-too ones!).
-
- What are you using to measure BFA? GCT?
-
- It seems to me that it can be a valuable, objective, quantitative metric,
- particularly if the people writing the test suites are *not* initially
- striving for a particular BFA percentage.
-
- That is, have them write tests to test functionality, function points, or
- whatever, and then look at the BFA percentage. This can tell you whether
- a good portion of the functionality is being tested (where good depends
- very much on how much confidence you need to have in the product).
-
- Have I ever seen you at soccer? I know some ACUS people go. Are you the
- Paul who goes?
- --
- Philip Craig philip@research.canon.oz.au Phone:+61 2 8052951 Fax:+61 2 8052929
- "Now bid me run, and I will strive with things impossible -
- yea, and get the better of them."
- -- W. Shakespeare, JULIUS CAESAR
-
- From marick@hal.cs.uiuc.edu@usage.csd.unsw.oz Sat Dec 19 04:33:12 1992
- Date: Fri, 18 Dec 1992 11:30:36 -0600
- From: Brian Marick <marick@hal.cs.uiuc.edu>
- Message-Id: <199212181730.AA00537@hal.cs.uiuc.edu>
- To: paulb@syacus.acus.oz.au
- Subject: Re: Value of High Code Coverage Metrics in Testing - Request for Opinion
- Newsgroups: comp.software-eng
- References: <1992Dec14.072812.13689@syacus.acus.oz.au>
- Status: RO
-
- This may be of interest:
-
- A CASE STUDY IN COVERAGE TESTING
- Brian Marick
- Testing Foundations
-
- Abstract
-
- I used a C coverage tool to measure the quality of its own test suite.
- I wrote new tests to improve the coverage of a 2600-line segment of
- the tool. I then reused and extended those tests for the next
- release, which included a complete rewrite of that segment. The
- experience reinforced my beliefs about coverage-based testing:
-
- 1. A thorough test suite should achieve nearly 100% feasible coverage.
- 2. Adding tests for additional coverage can be cheap and effective.
- 3. To be effective, testing should not be a blind attempt to achieve
- coverage. Instead, use coverage as a signal that points to weakly-tested
- parts of the specification.
- 4. In addition to suggesting new tests, coverage also tells you when existing
- tests aren't doing what you think, a common problem.
- 5. Coverage beyond branch coverage is worthwhile.
- 6. Even with thorough testing, expect documentation, directed
- inspections, beta testing, and customers to find bugs, especially design
- and specification bugs.
-
- The Generic Coverage Tool
-
- GCT is a freeware coverage tool for C programs, based on the GNU C
- compiler. It measures these kinds of coverage:
- - branch coverage (every branch must be taken in both directions)
- - multi-condition coverage (in 'if (a && b)', both subexpressions must
- evaluate to true and false).
- - loop coverage (require loop not to be taken, to be traversed exactly once,
- and traversed more than once)
- - relational coverage (require tests for off-by-one errors)
- - routine entry and call point coverage.
- - race coverage (extension to routine coverage for multiprocessing)
- - weak mutation coverage (a research technique)
-
- (For more, see [Marick92].)
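-
- As a rough illustration of what these criteria ask for (a made-up C fragment,
- not GCT's own documentation or output):
-
-     void sketch(int a, int b, int n, int limit, int i, int max)
-     {
-         int total = 0;
-
-         if (a && b)          /* branch: must be taken both ways; multi-
-                                 condition: a and b must each be seen
-                                 true and false                          */
-             total++;
-
-         while (n < limit)    /* loop: not taken, taken exactly once,
-                                 and taken more than once                */
-             n++;
-
-         if (i <= max)        /* relational: also wants the boundary
-                                 case i == max, to catch an off-by-one
-                                 (i < max written for i <= max)          */
-             total += i;
-
-         (void) total;        /* routine/call coverage: sketch() itself
-                                 must be entered, and each call site of
-                                 it executed                             */
-     }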
-
- The tool comes with a large regression test suite, developed in
- parallel with the code, using a "design a little, test a little, code
- a little" approach, much like that described in [Rettig91]. About
- half the original development time was spent in test construction (with, I
- believe, a corresponding reduction in the amount of frantic debugging
- when problems were found by users - though of course there was some of
- that). Most of the tests are targeted to particular subsystems, but
- they are not unit tests. That is, the tests invoke GCT and deduce
- subsystem correctness by examining GCT's output. Only a few routines
- are tested in isolation using stubs - that's usually too expensive. When
- needed, test support code was built into GCT to expose its internal
- state.
-
- In early releases, I had not measured the coverage of GCT's own test
- suite. However, in planning the 1.3 release, I decided to replace the
- instrumentation module with two parallel versions. The original
- module was to be retained for researchers; commercial users would use
- a different module that wouldn't provide weak mutation coverage but
- would be superior in other ways. Before redoing the implementation, I
- wanted the test suite to be solid, because I knew a good test suite
- would save implementation time.
-
- Measuring Coverage
-
- I used branch, loop, multi-condition, and relational coverage. I'm
- not convinced weak mutation coverage is cost-effective. Here were my
- initial results for the 2617 lines of code I planned to replace.
- (The count excludes comments, blank lines, and lines including only
- braces.)
-
- BINARY BRANCH INSTRUMENTATION (402 conditions total)
- 47 (11.69%) not satisfied.
- 355 (88.31%) fully satisfied.
-
- SWITCH INSTRUMENTATION (90 conditions total)
- 14 (15.56%) not satisfied.
- 76 (84.44%) fully satisfied.
-
- LOOP INSTRUMENTATION (24 conditions total)
- 5 (20.83%) not satisfied.
- 19 (79.17%) fully satisfied.
-
- MULTIPLE CONDITION INSTRUMENTATION (390 conditions total)
- 56 (14.36%) not satisfied.
- 334 (85.64%) fully satisfied.
-
- OPERATOR INSTRUMENTATION (45 conditions total) ;; This is relational coverage
- 7 (15.56%) not satisfied.
- 38 (84.44%) fully satisfied.
-
- SUMMARY OF ALL CONDITION TYPES (951 total)
- 129 (13.56%) not satisfied.
- 822 (86.44%) fully satisfied.
-
- These coverage numbers are consistent with what I've seen using black
- box unit testing combined with judicious peeks into the code. (See
- [Marick91].) I do not target coverage in my test design; it's more
- important to concentrate on the specification, since many important
- faults will be due to omitted code [Glass81].
-
- When the uncovered conditions were examined more closely (which took
- less than an hour), it was clear that the tests were more thorough
- than appears from the above. The 129 uncovered conditions broke down
- as follows:
-
- 28 were impossible to satisfy (sanity checks, loops with fixed bounds
- can't be executed 0 times, and so on).
-
- 46 were support code for a feature that was never implemented (because
- it turned out not to be worthwhile); these were also impossible to
- exercise.
-
- 17 were from temporary code, inserted to work around excessive stack
- growth on embedded systems. It was always expected to be removed, so
- was not tested.
-
- 24 were due to a major feature, added late, that had never had
- regression tests written for it.
-
- 14 conditions corresponded to 10 untested minor features.
-
- All in all, the test suite had been pleasingly thorough.
-
- New Tests Prior to the Rewrite
-
- I spent 4 hours adding tests for the untested major feature. I was
- careful not to concentrate on merely achieving coverage, but rather on
- designing tests based on what the program was supposed to do. Coverage
- is seductive - like all metrics, it is only an approximation of what's
- important. When "making the numbers" becomes the prime focus,
- they're often achieved at the expense of what they're supposed to measure.
-
- This strategy paid off. I found a bug in handling switches within
- macros. A test designed solely to achieve coverage would likely have
- missed the bug. (That is, the uncovered conditions could have
- been satisfied by an easy - but inadequate - test.)
-
- There was another benefit. Experience writing these tests clarified
- design changes I'd planned to make anyway. Writing tests often has
- this effect. That's why it's good to design tests (and write user
- documentation) as early as possible.
-
- I spent two more hours testing the minor features. I did not write
- tests for features that were only relevant to weak mutation.
-
- Branch coverage discovered one pseudo-bug: dead code. A particular
- special case check was incorrect. It was testing a variable against
- the wrong constant. This check could never be true, so the special
- case code was never executed. However, the special case code turned
- out to have the same effect as the normal code, so it was removed.
- (This fault was most likely introduced during earlier maintenance.)
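-
- The shape of that pseudo-bug, in hypothetical C: a branch whose condition
- tests the wrong constant, which coverage duly reports as never taken.
-
-     enum kind { KIND_IF = 1, KIND_SWITCH = 2 };
-
-     /* How many probes to plant for a statement with 'arms' branches. */
-     int probe_count(enum kind k, int arms)
-     {
-         /* Intended to special-case switches, but an earlier edit left
-            it testing the wrong constant; callers only reach here with
-            KIND_SWITCH, so the condition can never be true - exactly
-            what the branch-coverage report shows.  Since the special
-            case computes the same value as the normal code anyway, the
-            whole branch can simply be deleted. */
-         if (k == KIND_IF && arms > 1)
-             return arms;
-
-         return (arms > 1) ? arms : 1;    /* normal case */
-     }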
-
- At this point, tests written because of multi-condition, loop, and relational
- coverage revealed no bugs. My intuitive feel was that the tests
- were not useless - they checked situations that might well have led to
- failures, but didn't.
-
- I reran the complete test suite overnight and rechecked coverage the
- next day. One test error was discovered; a typo caused the test to
- miss checking what it was designed to test. Rechecking took 1/2 hour.
-
- Reusing the Test Suite
-
- The rewrite of the instrumentation module was primarily a
- re-implementation of the same specification. All of the test suite
- could be reused, and there were few new features that required new
- tests. (I did go ahead and write tests for the weak mutation features
- I'd earlier ignored.) About 20% of the development time was spent on
- the test suite (including new tests, revisions to existing tests, and
- a major reorganization of the suite's directory structure and
- controlling scripts).
-
- The regression test suite found minor coding errors; they always do,
- in a major revision like this. It found no design flaws. Rewriting
- the internal documentation (code headers) did. (After I finish code,
- I go back and revise all the internal documentation. The shift in
- focus from producing code to explaining it to an imaginary audience
- invariably suggests improvements, usually simplifications. Since I'm
- a one-man company, I don't have the luxury of team code reads.)
-
- The revised test suite achieved 96.47% coverage. Of 37 unsatisfied
- conditions:
-
- 27 were impossible to satisfy.
- 2 were impossible to test portably (GNU C extensions).
- 2 were real (though minor) omissions.
- 1 was due to a test that had been misplaced in the reorganization.
- 5 were IF tests that had been made redundant by the rewrite. They were removed.
-
- It took an hour to analyse the coverage results and write the needed
- tests. They found no bugs. Measuring the coverage for the augmented
- test suite revealed that I'd neglected to add one test file to the
- test suite's controlling script.
-
- Other Tests
-
- The 1.3 release also had other features, which were duly tested. For
- one feature, relational operator coverage forced the discovery of a
- bug. A coverage condition was impossible to satisfy because the code
- was wrong. I've found that loop, multi-condition, and relational
- operator coverage are cheap to satisfy, once you've satisfied branch
- coverage. This bug was severe enough that it alone justified the time
- I spent on coverage beyond branch.
-
- Impossible conditions due to bugs happen often enough that I believe
- goals like "85% coverage" are a mistake. The problem with such goals
- is that you don't look at the remaining 15%, deciding, without
- evidence, that they're either impossible or not worth satisfying. It's
- better - and not much more expensive - to decide each case on its
- merits.
-
- What Testing Missed
-
- Three bugs were discovered during beta testing, one after release (so
- far). I'll go into some detail, because they nicely illustrate the
- types of bugs that testing tends to miss.
-
- The first bug was a high level design omission. No testing technique
- would force its discovery. ("Make sure you instrument routines with a
- variable number of arguments, compile them with the GNU C
- compiler, and do that on a machine where GCC uses its own copy of
- <varargs.h>.") This is exactly the sort of bug that beta testing is
- all about.
-
- Fixing the bug required moderately extensive changes and additions,
- always dangerous just before a release. Sure enough, the fix
- contained two bugs of its own (perhaps because I rushed to meet a
- self-imposed deadline).
-
- - The first was a minor design omission. Some helpful code was added
- to warn GCC users iff they need to worry about <varargs.h>. This code
- made an assumption that was violated in one case. Coverage would not
- force a test to detect this bug, which is of the sort that's fixed by
- changing
-
- if (A && B)
-
- to
-
- if (A && B && C)
-
- It would have been nice if GCT had told me that "condition C,
- which you should have but don't, was never false", but this is more
- than a coverage tool can reasonably be expected to do. I found the
- bug by augmenting the acceptance test suite, which consists of
- instrumenting and running several large "real" programs. (GCT's test
- suite contains mostly small programs.) Instrumenting a new real
- program did the trick.
-
- - As part of the original fix, a particular manifest constant had to
- be replaced by another in some cases. I missed one of the cases. The
- result was that a few too few bytes of memory were allocated for a
- buffer and later code could write past the end. Existing tests did
- indeed force the bytes to be written past the end; however, this
- didn't cause a failure on my development machine (because the memory
- allocator rounds up). It did cause a failure on a different machine.
- Memory allocation bugs, like this one and the next, often slip past
- testing. [Later note: tools like Cahill's dbmalloc/Sentinel and Pure's
- Purify can move some of these bugs from the realm of the untestable to
- the realm of the testable. I now use dbmalloc, and it would have
- caught this bug.]
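-
- A hypothetical C sketch of that class of bug - the buffer is sized with the
- wrong constant, the tests really do write past the end, and a rounding
- allocator hides the damage on one machine but not another:
-
-     #include <stdlib.h>
-     #include <string.h>
-
-     #define OLD_PREFIX "gct-"           /* 4 characters  */
-     #define NEW_PREFIX "gct-mapfile-"   /* 12 characters */
-
-     char *make_name(const char *base)
-     {
-         /* Bug: still sized with the old constant, so the strcpy/strcat
-            below write several bytes past the end of the buffer.  If
-            malloc rounds small requests up, the overrun may land in the
-            slack and the tests happen to pass; on a machine that doesn't
-            round, the same tests corrupt the heap.  A bounds-checking
-            allocator such as dbmalloc flags it either way. */
-         char *name = malloc(strlen(OLD_PREFIX) + strlen(base) + 1);
-
-         if (name == NULL)
-             return NULL;
-         strcpy(name, NEW_PREFIX);
-         strcat(name, base);
-         return name;
-     }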
-
- The final bug was a classic: freeing memory that was not supposed to
- be freed. None of the tests caused the memory to be reused after
- freeing, but a real program did. I can envision an implementable type
- of coverage that would force detection of this bug, but it seems as
- though a code-read checklist ought to be better. I use such a
- checklist, but I still missed the bug.
-
-
- References
-
- [Rettig91] Marc Rettig, "Practical Programmer: Testing Made
- Palatable", CACM, May, 1991.
-
- [Marick91] Brian Marick, "Experience with the Cost of Different
- Coverage Goals for Testing", Pacific Northwest Software Quality
- Conference, 1991.
-
- [Marick92] Brian Marick, "A Tutorial Introduction to GCT",
- "Generic Coverage Tool (GCT) User's Guide", "GCT Troubleshooting",
- "Using Race Coverage with GCT", and "Using Weak Mutation Coverage with GCT",
- Testing Foundations, 1992.
-
- [Glass81] Robert L. Glass, "Persistent Software Errors", IEEE Transactions
- on Software Engineering, vol. SE-7, no. 2, pp. 162-168,
- March 1981.
-
- Brian Marick, marick@cs.uiuc.edu, uiucdcs!marick
- Testing Foundations: Consulting, Training, Tools.
-
-
- From pilchuck!phred!timf@uunet.uu.net@usage.csd.unsw.oz Fri Dec 18 12:33:31 1992
- >From pilchuck!phred!timf@uunet.uu.net Wed Dec 16 09:15:43 1992
- Message-Id: <m0n2CFL-000FCGC@data-io.com>
- Date: Wed, 16 Dec 92 17:15:43 -0800
- To: pilchuck!paulb@syacus.acus.oz.au
- Subject: Re: Value of High Code Coverage Metrics in Testing - Request for Opinion
- From: pilchuck!phred!timf@uunet.uu.net (Tim Farley)
- Status: RO
-
- This is probably getting to you really late since our news
- here is really slow, but...
-
- I think code coverage is a valuable metric, as long as it's
- not your only measurement of test completeness. As long as
- you're doing other things to assure that you have adequate
- data coverage for your tests, and can show functional coverage
- as well, you should come out okay. If you haven't already, you
- might want to check through the IEEE standard for measures to
- produce reliable software (982.2-1988) for some other coverage
- measurements.
-
- Depending on the tool you're using to calculate code coverage
- it can be very cost effective. The tool that I've used (TekDB
- with CCA) prints a variety of reports to show exactly what code
- has been tested and what hasn't. For engineers, this makes it
- really easy to get the coverage up since all you do is run some
- tests, generate the report, see what you haven't hit, then add
- some more tests for these. We usually start the engineers off
- with the high level test cases developed from the requirements
- specifications and design documents, have them run through those
- and check the coverage, then add in additional tests for what
- hasn't been hit. It's not difficult, and if time has already been
- scheduled for the engineer to do testing, it doesn't really add
- any more time (assuming a reasonable estimate for testing was
- given) and can actually decrease the amount of time if they aren't
- following any method with their testing.
-
- I saw someone else post something that said that 50% or so was
- a good target. We set our target at 95%, with the expectation
- that anything less than 100% had to include a description of
- why the remaining code had not been tested. Saying you didn't have
- enough time wasn't accepted. The probability of that code being
- executed, and the severity of a defect in the functionality
- directly supported by that code, had to be included.
-
- We always got at least 95% coverage, usually higher. With our
- products, turning on the unit and running very basic functions
- got about 60-70% coverage. You could get up to 90% or so with
- the basic error and boundary conditions. The rest were usually
- very rare hardware faults detected by the software. (I should
- add that these were all test and measurement devices with
- embedded software, in case you're wondering.)
-
- I don't know what kind of devices you're working on. I'm currently
- working on medical devices which have a lot more error detection
- software and redundant systems, and I'm not sure what it would take
- to get the same amount of coverage. However, I will set 100% as
- the goal, with anything less than that requiring a complete justification
- (which would still have to be approved).
-
- What development environment are you working with? Just curious.
-
- Tim Farley
- Sr. SQA Engineer
-
-