Discussions

News: Beautiful Tests chapter from Alberto Savoia

  1. Beautiful Tests chapter from Alberto Savoia (11 messages)

    JUnitFactory's Alberto Savoia wrote a chapter for a book called "Beautiful Code," entitled "Beautiful Tests." The chapter, published online, includes some criteria for beautiful testing - where beauty also means "working." Three criteria: simplicity, breadth and depth, and being illustrative of elegant usage. It's a great chapter. He walks through testing a binary search implementation that contains a well-known bug (Your Humble Editor saw the bug immediately, of course, right after having had it clearly explained to him.) Finding the bug is an interesting challenge, and it gives Alberto an excellent opportunity to walk through the testing process. The bug is caused by integer overflow, which sounds like a great reason to have a holiday (or not.) The chapter uses JUnit, but the concepts apply to any testing framework. He starts with smoke tests ("does it work at all?"), then moves on to boundary tests (searches through empty and very small arrays). Boundary tests pose an interesting problem for him, because the bug lies in a failure to handle very large arrays properly, and his testing environment doesn't have the resources to build a dataset large enough to trigger the failure.
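    For context, the overflow lives in the classic midpoint line. Here is a minimal sketch of the well-known pattern and the two standard fixes (the method names here are illustrative, not the chapter's code):

        // Classic buggy midpoint: once low + high exceeds Integer.MAX_VALUE
        // (arrays of a billion-plus elements), the sum wraps to a negative
        // number and the search indexes out of bounds.
        static int buggyMidpoint(int low, int high) {
            return (low + high) / 2;
        }

        // Fix 1: the intermediate value (high - low) can never overflow.
        static int midpointBySubtraction(int low, int high) {
            return low + (high - low) / 2;
        }

        // Fix 2: the unsigned shift treats the wrapped sum as the unsigned
        // value it really is, so dividing by two still comes out right.
        static int midpointByShift(int low, int high) {
            return (low + high) >>> 1;
        }

    Here's how Alberto reasons his way to a testing strategy for those enormous arrays: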
    But I also want to test the search with a very large array, and this is where it gets interesting (especially with the hindsight knowledge that the bug manifests itself only on arrays with over one billion elements). My first thought is to create an array large enough to ensure that the integer-overflow bug has been fixed, but I immediately recognize a testability issue: my laptop does not have enough resources to create an array that large in memory. But I know that there are systems that do have many gigabytes of memory and keep large arrays in memory. I want to make sure, one way or another, that the mid integer does not overflow in those cases. What can I do? I know that by the time I am done with some of the other tests I have in mind, I will have enough tests to give me confidence that the basic algorithm and implementation works provided that the midpoint is calculated correctly and does not overflow into a negative number. So, here’s a summary of my reasoning, leading to a possible testing strategy for enormous arrays:
    1. I cannot test binarySearch directly with arrays large enough to verify that the overflow bug in the calculation of mid does not occur anymore.
    2. However, I can write enough tests to give me confidence that my binarySearch implementation works correctly on smaller arrays.
    3. I can also test the way mid is calculated when very large values are used, without getting arrays involved.
    4. So, if I can gain enough confidence through testing that:
      • My implementation of the basic binarySearch algorithm is sound as long as mid is calculated correctly, and
      • The way the midpoint is calculated is correct,
    then I can have confidence that binarySearch will do the right thing on very large arrays.
    So the not-so-obvious, but beautiful, testing strategy is to isolate and test the pesky, overflow-prone calculation independently. One possibility is to create a new method:

        static int calculateMidpoint(int low, int high) {
            return (low + high) >>> 1;
        }

    then change the following line in the code from:

        int mid = (low + high) >>> 1;

    to:

        int mid = calculateMidpoint(low, high);

    and then test the heck out of the calculateMidpoint method to make sure it always does the right thing.
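    To make the strategy concrete, here is a minimal JUnit sketch of what "testing the heck out of" the extracted method might look like (the test values are illustrative, not from the chapter):

        import static org.junit.Assert.assertEquals;
        import org.junit.Test;

        public class MidpointTest {

            static int calculateMidpoint(int low, int high) {
                return (low + high) >>> 1;
            }

            @Test
            public void midpointOfSmallRange() {
                assertEquals(5, calculateMidpoint(0, 10));
            }

            @Test
            public void midpointDoesNotOverflowNearIntegerMax() {
                // low + high wraps negative as a signed int, but the
                // unsigned shift still yields the correct midpoint.
                assertEquals(Integer.MAX_VALUE - 1,
                        calculateMidpoint(Integer.MAX_VALUE - 1, Integer.MAX_VALUE));
            }
        }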
    There's actually quite a bit more in the chapter, including dataset generation and mutation testing. It's fascinating reading, and bodes well for the book in which it's published.

    Threaded Messages (11)

  2. What other test books do you recommend?

    I am interested in learning about software testing: specifically, what the guidelines are, how to write test cases, and more.
  3. I am interested in learning about software testing: specifically, what the guidelines are, how to write test cases, and more.
    I have one :-) Hani and I authored "Next Generation Testing in Java", which is available for preorder on Amazon. -- Cedric
  4. Cedric, I haven't looked at that book on Amazon, but I'm quite sure that it'll be about TestNG. If I use Spring, not Seam, will that book be helpful to me? I mean, is there anything that helps me use TestNG with Spring (because Spring's default test framework is JUnit 3.8.x)?
  5. Re: What other test books do you recommend?

    Thai, Check out Spring 2.1 M4 - this has a new testing framework that supports TestNG. I haven't tried it myself yet - but I intend to over the weekend. Bob
  6. Hi Thai,
    Cedric, I haven't looked at that book on Amazon, but I'm quite sure that it'll be about TestNG. If I use Spring, not Seam, will that book be helpful to me? I mean, is there anything that helps me use TestNG with Spring (because Spring's default test framework is JUnit 3.8.x)?

    Yes, Spring is extensively covered in the book. We do use TestNG to illustrate most of the concepts, but most of the advice contained in the book is fairly general. Alex Miller wrote a short pre-review: http://tech.puredanger.com/2007/09/05/next-generation-java-testing-on-the-way/ -- Cedric
  7. Why Just JAVA???

    Testing has to be addressed from an enterprise-wide view, with an enterprise-wide solution. I hate the "Java test, .NET test, mainframe test, my test and your test" mindset, which always ends with "my test is better than your test." Thanks
  8. Interesting chapter. A few comments:

    1) "Theories" look a lot like a new name for "preconditions", which have been around for a long time. You can call them "test preconditions" if you want to separate them somewhat from the design-by-contract terminology, but I don't see any fundamental difference (and by the way, TestNG supports these more formally with @Test(dependsOn...)). I also think the boundary between a theory and a test is narrow to the point of being meaningless... When do I express a condition in a theory and when in a test? What do I learn from a theory failing compared to a test failing? I'm not really sure. With TestNG, I would just write tests for all of them and put them in different groups ("smoke-test", "functional-test", etc...).

    2) I don't think approaching performance (or scaling, actually, since that's what you're really testing) works by counting an arbitrary statement inside the implementation, because you are testing a derived value instead of the real value. And how do you generalize this idea to more complex algorithms? For better testing of scalability, we need to go back to the basics. The definition of an O(log(n)) algorithm is that if the data size grows by 10, then the time should grow by log(10). Why not test that directly? Added bonus to this approach: the test will keep working regardless of how your CPU is upgraded (we have an entire section dedicated to just how to do that in our book as well).

    3) The code in the random section doesn't declare "rand", so the reader is probably going to be confused about what this variable is and, more importantly, how it's initialized (edit: just noticed it's introduced further down; it would still be nice to let the reader know so they won't panic). More generally, I think Alberto's approach suffers from a critical oversight: it's not reproducible. Random testing is fine (we cover this extensively in our own book) but, as Dmitry hinted, it needs to be reproducible, so that if a test fails, you can rerun it and obtain the same failure. For this reason, I would recommend passing the seed to the random number generator as a parameter of the test method or, better, injecting it:

        public int[] generateRandomSortedArray(Random rand, int maxArraySize, int maxValue) {...}

    After that, initialize rand with either a @DataProvider or by injection with Guice... By the way, some of the most beautiful testing code I have seen used dependency injection to create code that was strikingly simple and easy to understand; too bad DI didn't get mentioned in this chapter... -- Cedric, author of "Next Generation Testing with Java."
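    A minimal TestNG sketch of that seeded-generator idea (the seed value, array sizes, and test body here are illustrative assumptions, not Cedric's code):

        import static org.testng.Assert.assertEquals;

        import java.util.Arrays;
        import java.util.Random;
        import org.testng.annotations.DataProvider;
        import org.testng.annotations.Test;

        public class RandomSearchTest {

            // A fixed seed makes the "random" data identical on every run,
            // so any failure can be reproduced exactly.
            @DataProvider(name = "seededRandom")
            public Object[][] seededRandom() {
                return new Object[][] { { new Random(42L) } };
            }

            // The generator takes the Random as a parameter instead of
            // creating its own, so the caller controls reproducibility.
            int[] generateRandomSortedArray(Random rand, int maxArraySize, int maxValue) {
                int[] array = new int[1 + rand.nextInt(maxArraySize)];
                for (int i = 0; i < array.length; i++) {
                    array[i] = rand.nextInt(maxValue);
                }
                Arrays.sort(array);
                return array;
            }

            @Test(dataProvider = "seededRandom")
            public void searchFindsEveryElement(Random rand) {
                int[] array = generateRandomSortedArray(rand, 1000, 1000);
                for (int value : array) {
                    int index = Arrays.binarySearch(array, value);
                    // With duplicates, the index found may point at a
                    // different slot, but the value there must match.
                    assertEquals(array[index], value);
                }
            }
        }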
  9. Cedric,

    In software testing, a theory is a statement, in code, of something that should hold true over a possibly infinite set of values. To take a different example than Alberto's, consider a function factors(n), which finds the prime numbers that, when multiplied together, equal n. A test for this function might look like this (using JUnit 4.4 throughout):

        @Test
        public void factorsOf15() {
            // 15 = 3 * 5
            assertThat(factors(15), is(asList(3, 5)));
        }

    This is great as an example, but it only finds bugs if they show up when finding the factors of 15. A theory can check for correctness on any input:

        @Theory
        public void factorsOfAnyNumber(int n) {
            List<Integer> factors = factors(n);
            // each factor is prime
            assertThat(factors, each(isPrime()));
            // the product of the factors is n
            assertThat(productOf(factors), is(n));
        }

    You're right that test evaluation should be reproducible. In JUnit, a developer specifies a list of data points that will be tried in the theory at test run-time:

        @DataPoints
        int[] ints = {0, 3, 777, 1776};

    However, the advantage of a theory is that it can also be _explored_, for example by throwing a bunch of random values at it, as Alberto does. We recommend exploration as a bug-finding process separate from running the tests to find regressions.

    I hope that helps to clarify what a theory is. I think "precondition" is best left as a term for method specifications, and for the assertions in the implementation code that check them (for example, if (input == null) throw new IllegalArgumentException();), design-by-contract or not. Different again are test assumptions, in both theories and tests, which state the conditions that a test assumes (for example, assumeTrue(internetConnected())), and which may be very different from the conditions that the tested method assumes. All of these are different from a test dependency in TestNG, which is a statement about the relationship between two tests: if A fails, we know B will fail, so don't waste time testing B. Theories, preconditions, test assumptions, and test dependencies are all different concepts, useful in different places.
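    For a runnable version of the idea above, here is a self-contained sketch against JUnit 4.4's experimental Theories runner (the factors implementation and the n >= 2 assumption are filled in for illustration; they are not from David's post):

        import static org.junit.Assert.assertEquals;
        import static org.junit.Assume.assumeTrue;

        import java.util.ArrayList;
        import java.util.List;
        import org.junit.experimental.theories.DataPoints;
        import org.junit.experimental.theories.Theories;
        import org.junit.experimental.theories.Theory;
        import org.junit.runner.RunWith;

        @RunWith(Theories.class)
        public class FactorsTheoryTest {

            @DataPoints
            public static int[] ints = {2, 3, 15, 777, 1776};

            // Trial division: collects prime factors, smallest first.
            static List<Integer> factors(int n) {
                List<Integer> result = new ArrayList<Integer>();
                for (int f = 2; n > 1; f++) {
                    while (n % f == 0) {
                        result.add(f);
                        n /= f;
                    }
                }
                return result;
            }

            @Theory
            public void productOfFactorsIsN(int n) {
                assumeTrue(n >= 2); // the theory claims nothing for n < 2
                int product = 1;
                for (int factor : factors(n)) {
                    product *= factor;
                }
                assertEquals(n, product);
            }
        }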
  10. David has answered the question about the differences between theories, preconditions, and dependencies quite thoroughly, so let me address the other two comments.
    2) I don't think approaching performance (or scaling, actually, since that's what you're really testing) works by counting an arbitrary statement inside the implementation, because you are testing a derived value instead of the real value. And how do you generalize this idea to more complex algorithms? For better testing of scalability, we need to go back to the basics. The definition of an O(log(n)) algorithm is that if the data size grows by 10, then the time should grow by log(10). Why not test that directly? Added bonus to this approach: the test will keep working regardless of how your CPU is upgraded (we have an entire section dedicated to just how to do that in our book as well).
    I believe both approaches are useful. One is more algorithm-oriented, the other more system/real-world-oriented. One may be better than the other depending on the situation. If the code in question makes heavy use of system resources (e.g. file access, swapping, etc.), using real time might be my first choice. In this case, binary search is so darn simple and so darn fast that I was running against the coarse granularity of the system clock, and I would have had to complicate things to get around that (and I was already running against my word-count limit.) On top of that, since pretty much everyone knows how to put a 'start_timer()/stop_timer()' around the code they want to measure, I thought I'd introduce the lesser-known technique of counting operations. I am still glad I did, and I was not alone; as a matter of fact, I was in good company: Jon Bentley uses the very same technique of counting comparisons in chapter 3 of the same book, in his section on quicksort.
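    A minimal sketch of what asserting the growth rate directly from the comparison count might look like (illustrative code, not the chapter's; the instrumentation and the bound are our own):

        import static org.junit.Assert.assertTrue;
        import org.junit.Test;

        public class ProbeCountTest {

            static int probes; // bumped once per loop iteration of the search

            static int binarySearch(int[] a, int key) {
                int low = 0, high = a.length - 1;
                while (low <= high) {
                    probes++;
                    int mid = (low + high) >>> 1; // overflow-safe midpoint
                    if (a[mid] < key) low = mid + 1;
                    else if (a[mid] > key) high = mid - 1;
                    else return mid;
                }
                return -1;
            }

            @Test
            public void probeCountGrowsLogarithmically() {
                for (int size = 1000; size <= 1000000; size *= 10) {
                    int[] array = new int[size];
                    for (int i = 0; i < size; i++) {
                        array[i] = i;
                    }
                    probes = 0;
                    binarySearch(array, -1); // absent key forces the worst case
                    // O(log n): the probe count should stay within log2(n) + 1,
                    // no matter how fast the machine running the test is.
                    int bound = (int) (Math.log(size) / Math.log(2)) + 1;
                    assertTrue("size " + size + ": " + probes + " probes",
                            probes <= bound);
                }
            }
        }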
    ... I think Alberto's approach suffers from a critical oversight: it's not reproducible.

    Random testing is fine (we cover this extensively in our own book) but, as Dmitry hinted, it needs to be reproducible, so that if a test fails, you can rerun it and obtain the same failure.

    For this reason, I would recommend passing the seed to the random number generator as a parameter of the test method...
    I agree. I am glad people are reading the chapter carefully enough to notice (several other people made the same comment.) The funny thing is that I wrote the original version of the code with the random number generator seeded (I even remember using my favorite number, 42, as the seed value), then I took it out at some point for some reason, and forgot to reseed it when I cut-and-pasted it from the IDE into the document. Just because those tests are meant to be exploratory in nature does not mean that they should not be reproducible and re-used in regression testing if so desired - especially since I took some time to write them. Fortunately, most of the people reading the chapter should be smart enough to realize how to change 'new Random()' into 'new Random(seed)'. Alberto
  11. I disagree with some of the approaches the author uses in this chapter.

    First of all, the "random acts of testing." These can lead to different test results - in theory, your tests may pass or fail even though you run them against the same unchanged piece of code. How many times do you need to run your tests to be assured that the random generator has covered all branches of your code? And if your code changes, should you rerun the tests N times again to regain that assurance? You vary the parameters of the random generator (maxArraySize, maxValue, and experiments) to test your implementation - from 1000 to 10,000, then 100,000, and so on. You finish with 1,000,000 - and then what value will you leave in the test you commit to your version control system, where your code will be tested on your build server's schedule? And if you modify your code, do you need to vary those parameters again from 1000 to 1,000,000? I don't think you need such tests at all. A test's inputs and outputs should be strictly defined (right down to constant values).

    I like the idea of theories, but I feel this is just another view of JUnit's TestCases and their test methods; to me there's no difference between a testTheories() method and JUnit's TestCase.setUp() method - the only difference is that you apply random acts of testing to run your assertTheoryX methods, versus JUnit's TestCase.testXXX methods. As I already said, if you strike out the random test inputs and consider a set of predefined test inputs, you may write your test methods as usual to test different aspects of your implementation's functionality (as you actually do with, say, the searchArrayOfSizeOne test method).

    Another thing is the unnecessary tests in this particular example (binary search). For instance, why test different values of integers? How does the fact that I'm trying to find the position of a negative (or zero) integer element of the array affect the search algorithm (testForMinAndMaxInteger)? Also, while writing my own implementation of binarySearch (as you recommend ;) ), using TDD of course :), I wrote the following test:
    public void testFindTInNonEmptyArrayIfItContainsT() throws Exception {
        int[] testArray = {5, 3, 0, 6, 1, 2, 4};
        Array array = new Array(testArray);
        for (int i = 0; i < testArray.length; i++) {
            int j = testArray[i];
            assertEquals(i, array.indexOf(j));
        }
    }
    Yes, my binarySearch implementation differs from yours a little :) but the point is that you can have a smoke test (actually, when I wrote the first version of the test it looked like smokeTestsForBinarySearch), a boundary test (testBoundaryCasesForItemLocation), and your Theory 4 assertion. While writing the test I actually mistyped and didn't notice that I had placed two equal integer values in the array, which led to the test's failure. Truth be told, I didn't know the specification of binary search (and still don't :shameonme) and how it should act in this situation. But the fact that I used small numbers (from 0 to 9) gave me the opportunity to stumble on an input with two equal values in the array. As I said, there's no need to use large or negative integers in test cases for this algorithm. I've also written 3 more tests that I think cover my implementation:
    public void testFindTInSingleElementArrayOfT() throws Exception {
        int[] testArray = {1};
        Array array = new Array(testArray);
        for (int i = 0; i < testArray.length; i++) {
            int j = testArray[i];
            assertEquals(i, array.indexOf(j));
        }
    }

    public void testFindTInEmptyArray() throws Exception {
        int[] testArray = {};
        Array array = new Array(testArray);
        assertEquals(-1, array.indexOf(1));
    }

    public void testFindTInNonEmptyArrayThatDoesntContainsT() throws Exception {
        int[] testArray = {1, 2, 3};
        Array array = new Array(testArray);
        assertEquals(-1, array.indexOf(4));
    }
    I think they're a little more compact than those from the chapter. Truth be told, I didn't write a test for the mid statement, but luckily I implemented it as middleIndex = leftIndex + (rightIndex - leftIndex) / 2; which isn't buggy :) These are just my thoughts, inspired by the chapter - that's why I'd rate it very good and a "must read" :) P.S. Here's my implementation (written in 30 minutes and not optimised as well as the one from the chapter :)
    package dmitrygusev;

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class Array {

        private List<Pair> pairs;

        class Pair implements Comparable<Pair> {
            public final int index;
            public final int value;

            public Pair(int index, int value) {
                this.index = index;
                this.value = value;
            }

            public int compareTo(Pair o) {
                return new Integer(value).compareTo(new Integer(o.value));
            }

            @Override
            public String toString() {
                return "pair[" + index + "] = " + value;
            }
        }

        public Array(int[] array) {
            if (array == null) {
                pairs = new ArrayList<Pair>();
            } else {
                pairs = new ArrayList<Pair>(array.length);
                for (int i = 0; i < array.length; i++) {
                    int j = array[i];
                    pairs.add(new Pair(i, j));
                }
                Collections.sort(pairs);
            }
        }

        public int indexOf(int t) {
            int leftIndex = 0;
            int rightIndex = pairs.size() - 1;
            if (rightIndex < leftIndex) {
                return -1;
            }
            int middleIndex = (rightIndex - leftIndex) / 2;
            int hits = 0;
            try {
                while (rightIndex - leftIndex > 1) {
                    hits++;
                    Pair pair = pairs.get(middleIndex);
                    if (pair.value == t) {
                        return pair.index;
                    }
                    if (pair.value > t) {
                        rightIndex = middleIndex;
                    } else {
                        leftIndex = middleIndex;
                    }
                    middleIndex = leftIndex + (rightIndex - leftIndex) / 2;
                }
                Pair pair = pairs.get(rightIndex);
                if (pair.value == t) {
                    return pair.index;
                }
                pair = pairs.get(leftIndex);
                if (pair.value == t) {
                    return pair.index;
                }
                return -1;
            } finally {
                System.out.println(hits);
            }
        }
    }
  12. Some clarifications

    Hi Dmitry,

    Thank you for your extensive feedback. If my chapter encouraged you to think about, and improve on, how I approached testing in this particular case, then it did what I wanted it to do :-). I agree with you that some of the tests might appear to be overkill. Keep in mind, however, that a tight limit on the number of words for each chapter forced me to use a simple example (i.e. binary search) and trivial data types (i.e. integers and arrays of integers) to explain and demonstrate some non-trivial testing concepts and techniques. The main objective was to show the basic techniques so people can apply them to their much more complex code and data types.

    Regarding "random acts of testing": I stand by the idea of testing with lots of random inputs using theories. There is a lot of research on the value of random test data and, more often than not, it has helped me find bugs that I would have completely missed. When we hand-pick all the test data, we often (unconsciously) apply our preconceived notions and assumptions to the data selection - the same preconceived notions and assumptions we used when implementing the code. In many cases, it's those assumptions that cause problems, because the users of the code might use it in ways, and with data sets, that you might not have thought of.

    Having said that, I would not typically check in the random tests as part of my continuous integration and testing cycle. Some tests are written just to prove to yourself that the algorithm and the code do what they are supposed to do while you are actively developing and experimenting. They are very useful for giving you initial confidence (and making sure you have not made any wrong assumptions), but it's probably overkill to run them again and again. If you do want to check in random tests, you can make the test data repeatable and deterministic by seeding the random number generator.

    Thanks again for taking the time to post and share your own techniques.

    Alberto

    P.S. My AgitarLabs colleagues David Saff and Marat Boshernitsan have written a very good paper on combining test theories (i.e. forall-type assertions) with large (and possibly random) input data sets using a TDD approach: http://shareandenjoy.saff.net/tdd-specifications.pdf