JUnitFactory's Code C.R.A.P. metric

  1. JUnitFactory's Code C.R.A.P. metric (22 messages)

    Alberto Savoia has written an article for Artima covering a plugin for Eclipse called crap4j, which uses cyclomatic complexity and code coverage from automated tests to help estimate the effort and risk associated with maintaining legacy code, yielding a "Change Risk Analysis and Prediction" score - i.e., a prediction of whether a reader will say "this is crap!"
    There is no fool-proof, 100% objective and accurate way to determine if a particular piece of code is crappy or not. However, our intuition – backed by research and empirical evidence – is that unnecessarily complex and convoluted code, written by someone else, is the code most likely to elicit a "This is crap!" response. If the person looking at the code is also responsible for maintaining it going forward, the response typically changes into "Oh crap!"

    Since writing automated tests (e.g., using JUnit) for complex code is particularly hard to do, crappy code usually comes with few, if any, automated tests. The presence of automated tests implies not only some degree of testability (which in turn seems to be associated with better, or more thoughtful, design), but it also means that the developers cared enough and had enough time to write tests – which is a good sign for the people inheriting the code.

    Because the combination of complexity and lack of tests appears to be a good indicator of code that is potentially crappy – and a maintenance challenge – my Agitar Labs colleague Bob Evans and I have been experimenting with a metric based on those two measurements. The Change Risk Analysis and Prediction (CRAP) score uses cyclomatic complexity and code coverage from automated tests to help estimate the effort and risk associated with maintaining legacy code. We started working on an open-source experimental tool called "crap4j" that calculates the CRAP score for Java code. We need more experience and time to fine-tune it, but the initial results are encouraging and we have started to experiment with it in-house.

    Crap4J is currently a prototype, implemented as an Eclipse (3.2.1 or later) plug-in which finds and runs any JUnit tests in the project to calculate the coverage component. If you are interested in contributing to crap4j's open-source effort to support other environments and test frameworks (e.g. TestNG) and/or coverage tools (e.g. Emma), please let us know.
    Now we just need to watch out for a tool testing for a Software Heuristic for Implementability and Testability.

    Threaded Messages (22)

  2. Excellent idea....how tied is the current source code to Eclipse? A Maven plugin for this would be perfect for my organization. Mike
  3. Hi Mike, I am glad you like it and, as Jorge (see below) mentioned, it should be easy to take the output of any metric tool that calculates method coverage and method complexity and calculate the CRAP out of it. Are you already using something like Cobertura with Maven? If not, that would be a great first step. Here are some instructions: http://mojo.codehaus.org/cobertura-maven-plugin/ Alberto
  4. And an Ant plugin

    Looks worth trying. We do all this type of analysis as part of CI build and report in CruiseControl, so Maven/Ant integration is essential.
  5. Re: And an Ant plugin

    Looks worth trying. We do all this type of analysis as part of CI build and report in CruiseControl, so Maven/Ant integration is essential.
    Hi Stephen, We've had other requests for CI/CC/Maven/Ant support, so it's high on our priority list. It looks like the shortest/quickest path is to leverage Cobertura - since it already collects coverage and complexity numbers. If anyone wants to help out feel free to jump in. Thanks, Alberto
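    For anyone who wants to experiment with the Cobertura route described above before official support arrives, here is a rough, hypothetical sketch (not part of crap4j). It assumes a Cobertura-style coverage.xml in which each method element exposes name, line-rate, and complexity attributes - older Cobertura releases may only report complexity at the class level, so verify against your own report - and it combines the two numbers with a CRAP-style penalty for untested complexity (the formula itself is discussed later in the thread and in the Artima article):

      import javax.xml.parsers.DocumentBuilderFactory;
      import org.w3c.dom.Document;
      import org.w3c.dom.Element;
      import org.w3c.dom.NodeList;

      // Hypothetical sketch: derive a CRAP-like score per method from a
      // Cobertura-style coverage.xml. The attribute names are assumptions
      // to check against the actual report format.
      public class CoberturaCrapSketch {
          public static void main(String[] args) throws Exception {
              DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
              // Avoid fetching the external DTD referenced by Cobertura reports.
              dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
              Document doc = dbf.newDocumentBuilder().parse(args[0]); // path to coverage.xml

              NodeList methods = doc.getElementsByTagName("method");
              for (int i = 0; i < methods.getLength(); i++) {
                  Element m = (Element) methods.item(i);
                  double complexity = Double.parseDouble(m.getAttribute("complexity"));
                  double coverage = Double.parseDouble(m.getAttribute("line-rate")); // 0..1
                  // Complexity squared, scaled by the cube of the uncovered fraction,
                  // plus the complexity itself.
                  double crap = complexity * complexity * Math.pow(1.0 - coverage, 3) + complexity;
                  System.out.printf("%-50s %8.1f%n", m.getAttribute("name"), crap);
              }
          }
      }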
  6. Love the acronym!

    crap4j - finally a "Beavis & Butthead" style acronym that is needed in the Java realm. Sometimes we take life / coding too seriously.....time for a laugh once in a while....and anything that helps with testing is high on the cool meter as well! :-) Regards, Tom
  7. Atlassian's Clover product combines complexity and coverage metrics in reports to show developers where to prioritise their testing effort. Clover's reports show this in a handy "cloud" view that makes it easy to quickly spot complex untested code - the big red classes jump out at you on the report page. This makes reports easier to process, rather than having the eyes glaze over scanning columns and columns of numbers. A live "Cloud" report example: http://downloads.atlassian.com/software/clover/samples/index_project_risks.html This kind of reporting is also available via a tightly integrated Eclipse plugin: http://confluence.atlassian.com/display/CLOVER/Clover+Eclipse+Plugin Read more here: http://www.atlassian.com/software/clover/whats-new.jsp Cheers, -Brendan http://www.atlassian.com
  8. If someone can point me to the sister project F.I.X.C.R.A.P. that would be much appreciated.
  9. This is GREAT!

    I love the sense of humor. Just the name alone guarantees this plugin will be a success. And the FIXCRAP is a good idea but probably much more difficult to implement ;-)
  10. I mean... We can tweak any Cobertura plugin in a few minutes to give that info, can't we?
  11. Go for it!

    Isn't this just a calculation on Cobertura's output?
    Well, not just on Cobertura's output. Cobertura is just one of many tools that report popular metrics such as coverage and complexity. It would be more accurate to say that C.R.A.P. is a mapping from a set of software metrics/measurements (currently only cyclomatic complexity and basis path coverage, but we plan to add more) to a "higher"-level metric. An analogy would be the Body Mass Index (BMI), which maps a person's height and weight (using the formula BMI = weight in kg / (height in meters)^2) into one of 4 categories:
    * Underweight = below 18.5
    * Normal weight = 18.5-24.9
    * Overweight = 25-29.9
    * Obesity = BMI of 30 or greater
    The BMI, like the CRAP, does not work in all cases (e.g. body-builders), nor does it account for how the fat is distributed (apparently belly-fat is more of a health risk than fat in other locations). Nevertheless, the BMI appears to be a useful first-pass tool for doctors to determine whether they/you should pay attention to your weight - I've been subjected to it at my latest check-up (23.3 - if you must know :-) ).
    I mean... We can tweak any Cobertura plugin in a few minutes to give that info, can't we?
    Absolutely. You should do that and share the results so the existing Cobertura users can use it with minimal effort. What's important and worth experimenting with is the concept and the formula. The implementation is secondary. We got things started with an IDE plug-in because, in my experience, the easier, more point-and-click, you make something, the more likely people are to try it. But if people find CRAP useful and worth tracking, we'll definitely need a command-line version so it can be used as part of the build/test process. Alberto
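    The combination Alberto describes is a small formula rather than anything tool-specific; the exact definition and threshold values are in the Artima article linked from the original post. Purely as an illustration of that kind of complexity-and-coverage mapping (a sketch to verify against the article, not the plug-in's actual implementation), in Java:

      // Illustrative sketch of a CRAP-style mapping from per-method cyclomatic
      // complexity and test coverage to a single risk score. The authoritative
      // formula and thresholds are defined in the Artima article.
      public final class CrapScoreSketch {
          private CrapScoreSketch() {}

          // complexity: cyclomatic complexity of the method
          // coveragePercent: coverage of the method's paths, 0..100
          public static double score(int complexity, double coveragePercent) {
              double uncovered = 1.0 - (coveragePercent / 100.0);
              return complexity * complexity * Math.pow(uncovered, 3) + complexity;
          }

          public static void main(String[] args) {
              // Complex, untested code scores dramatically worse than simple,
              // well-tested code; full coverage leaves only the complexity term.
              System.out.println(score(30, 0));   // 930.0
              System.out.println(score(5, 100));  // 5.0
              System.out.println(score(30, 100)); // 30.0
          }
      }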
  12. Doesn't work for me

    I installed the plugin on Eclipse 3.3.0. I get the toilet paper icon, as well as a Crap4j menu (which, in my opinion, is a waste of screen space, my menu bar is already almost too long on my MacBook). But when I click on the icon, or on the "Run Crap4j" menu entry, nothing happens. BTW, Alberto: I also have constructive comments about your online JUnit tests generator, if you have the bandwidth for them ;). -- S. Fermigier - CEO - Nuxeo - www.nuxeo.com
  13. Re: Doesn't work for me

    Bonsoir Stephane, Sorry you are having problems. We tested the plug-in on 3.3 (both on Mac and Windows) and it worked just fine. A common user error is not clicking on the top-level project icon; you need to do that to make sure that the Eclipse "focus" is on the right item. I've also seen some rare problems where strange project configurations, paths, or miscellaneous project errors create problems. If you have tests, make sure that they run and complete. Here's a quick, 2-minute test to see if the problem is with the plug-in installation or with the particular project you used it on: Create a new Java project (say "TestCrap4J"). Eclipse should by default create a source directory "src". Inside that directory create a new class, say "Testing123". At this point you should see the following in the Package Explorer view:
    TestCrap4J
      src
        (default package)
          Testing123.java
    Click at the top level of the project (i.e. TestCrap4J) to make sure the focus is on the project. Then click the toilet paper icon. You should get a pretty barren crap4j report in a couple of seconds. Could you please try that for me? Thanks. Alberto
    BTW, Alberto: I also have constructive comments about your online JUnit tests generator, if you have the bandwidth for them ;).
    Ah, my other fun pet project. I'd love to get your feedback, but we probably should not use this thread for it. Feel free to email me at "my first name" at agitar.com and we can talk. Looking forward to it.
  14. Validation of the metric?

    How was the metric validated? At first glance, the metric looks like something that was taken out of a hat and is used to market Agitator (which seems like a cool tool). Can we have an idea of which systems were analysed to justify the use of this metric? After all, it was mentioned that the metric's usefulness is "backed by research and empirical evidence". I found "From daikon to agitator: lessons and challenges in building a commercial tool for developer testing" which describes Agitator, but doesn't mention this metric.
  15. How was the metric validated?
    Hi Stephane, As I made very clear at the beginning of the blog - and throughout it - the whole point of releasing this initial version of Crap4J is to make it available so people can experiment with it. We are *in the process* of validating it through experimentation. We are following the scientific method, which includes:
    * Characterizations (quantifications, observations, and measurements)
    * Hypotheses (theoretical, hypothetical explanations of observations and measurements)
    * Predictions (reasoning, including logical deduction from hypotheses and theory)
    * Experiments (tests of all of the above)
    You can read more about the thought process and background by reading some of my replies to the original CRAP thread on Artima - which I included in my post. http://www.artima.com/forums/flat.jsp?forum=106&thread=210575 If the metric proves useful after we, and hopefully others, experiment with it, we will then publish a paper about it. Actually, it might be just as interesting to publish a paper if it turns out not to be useful or have predictive value. If you want something fully backed and a more conclusive result, wait for the paper.
    At first glance, the metric looks like something that was taken out of a hat and is used to market Agitator (which seems like a cool tool).
    Perhaps you should take a second glance, because this is a bit insulting and accusatory. With apologies to Elvis: "We can't go on together / With suspicious minds / And we can't build our dreams / On suspicious minds." You know, Stephane, there are some people who actually are driven to do and share interesting work related to, but outside, their main line of business. I, and most of my colleagues, have a passion for software development and testing that goes beyond our daytime job and responsibilities. I am actually being extra careful to distance Crap4J from Agitar and Agitator because, for one, the company doesn't really want to be associated with something that has "crap" in the name. Furthermore, Crap4J doesn't even take into account the kind of tests and assertions that you generate with Agitator. It's designed for projects with traditional, manually created, JUnit tests. I am fortunate enough that the research arm of Agitar (AgitarLabs) allows me to do some interesting experimental work and contribute to the open-source and academic community. In addition to JUnitFactory and Crap4J, we are the primary source of funding for further development of JUnit, where two of the three key contributors, JUnit co-creator Kent Beck and MIT's David Saff, are sponsored by us. The main contributor to CruiseControl also happens to be our VP of Product Management.
    I found "From daikon to agitator: lessons and challenges in building a commercial tool for developer testing" which describes Agitator, but doesn't mention this metric.
    Of course not; as I said, the two are unrelated. One is a product, the other is a research project. The ACM-published paper you mention, "From Daikon to Agitator ..." (and the associated invited talk given at ISSTA - a major conference for software testing researchers and practitioners), is further evidence that we enjoy learning and experimenting, but we also share what we have learned to advance the state of the art. There is so much left to do, learn, and improve when it comes to software development and testing. I believe that collaboration and cross-pollination between industry, open-source, and academia is one of the best ways to accelerate progress. That's what we are trying to do with Crap4J. Alberto
  16. As you say, "Metrics should never be an end unto themselves", but it's not clear what you are specifically trying to measure, and why what you are measuring is better than size alone (such as V(g) or LOC) and coverage (and maybe a simple ratio). With other quality models (like COCOMO), people tend to use curve fitting. COCOMO uses man-months as the value to build the model, but I don't know what you are planning on using. I recommend that you read Wohlin's Experimentation in Software Engineering: An Introduction and Fenton's Software Metrics: A Rigorous and Practical Approach. For your information, I'm part of the academic community that conducts these kinds of studies, and I want to know why this metric is better than the plethora of others that measure quality.
  17. As you say, "Metrics should never be an end unto themselves", but it's not clear what you are specifically trying to measure, and why what you are measuring is better than size alone (such as V(g) or LOC) and coverage (and maybe a simple ratio). With other quality models (like COCOMO), people tend to use curve fitting. COCOMO uses man-months as the value to build the model, but I don't know what you are planning on using.

    I recommend that you read Wohlin's Experimentation in Software Engineering: An Introduction and Fenton's Software Metrics: A Rigorous and Practical Approach.

    For your information, I'm part of the academic community that conducts these kinds of studies, and I want to know why this metric is better than the plethora of others that measure quality.
    Hi Stephane, I believe that I addressed and answered these questions already in the series of articles about the CRAP metric and the ensuing discussions and dozens of replies in the blogs. It's worth reading the discussions and the replies as well as the original blogs. But, to cut to the chase: despite the plethora of software metrics and the hundreds of papers and books on the subject (with which we are VERY familiar), I think we can agree that, outside of academia, software metrics are seeing very little use. There is no lack of metrics, but an appalling lack of usage of any of those metrics. You mention COCOMO; I can't remember the last time I saw an organization that used COCOMO, and most of them probably have never even heard the term. The few "popular" metrics that are used with some frequency, like code coverage, are more often than not ABused and MISused.

    I talk with dozens of software organizations and hundreds of programmers every year and I am continually surprised at how most software organizations are flying blind. They have no high-level insights into the code. Their code base is a giant black-box; a dark and scary place that harbors all sorts of programming horrors. By keeping the lights off, it's very hard to fix those horrors and prevent new ones. We have seen Java classes with 10,000+ lines of code, hand-written methods with a cyclomatic complexity of 300+, and worse. Usually, these problems only surface when the ancient crappy code changes hands and has to be maintained by some poor newbie developer. Without some help from metrics like CRAP, the poor newbie developers can only discover the magnitude of the task ahead of them by exploring the code a bit at a time, like a cave explorer with a flashlight. One of the main objectives of the CRAP metric is to help organizations and individual developers understand what they are getting into when they are handed a steaming pile of legacy code.

    The CRAP metric is not about code quality, and I don't think I ever said that. The CRAP metric is about identifying code that, by virtue of excessive complexity and lack of adequate tests, is likely to present a maintenance challenge - especially for the poor people who inherit it after the original developers are long gone. The kind of code that elicits the "this is crap" or "oh crap!" responses from other programmers. It's important to understand that lack of crappiness does not guarantee or imply quality - the same way that low blood-pressure does not guarantee or imply overall health. High CRAP is just a risk factor to be taken into account (and possibly prevented), like high blood-pressure.

    CRAP was inspired by metrics like the SEI's Software Maintainability Index, which combines various code complexity and volume metrics (more volume and complexity --> harder to understand and maintain) with code comments (comments ~= docs/specs --> easier to understand and maintain). I have come to believe that automated tests are a better way to specify and preserve code behavior than comments, so in CRAP we are using automated tests instead of comments. We did not pull this metric out of a hat, as you suggested in your first reply, but built upon and modernized existing metrics to take advantage of changes in technology and practices (e.g. these days, developer/unit testing is a growing best practice, so it makes sense to take advantage of the presence of automated tests). We tried to make CRAP simple to understand (i.e. excessive complexity = bad, tests = good), free and easy to download and use and, most importantly, easy to act upon. We are also making CRAP, and its various implementations, open-source so people can help refine it and modify it. It's meant to evolve and improve with use, not to be perfect out of the box.

    Boy, this is a long post, and I have to get back to my "real job" - better stop here. Stephane, please don't think that I am against adding more rigor and validation to CRAP. That's my goal; it's just that I believe it will be best achieved through actual usage and experimentation, and that requires an initial - if imperfect - implementation. Perhaps we started on the wrong foot, but since we both seem to care enough about the topic to spend time answering each other's posts, perhaps we can continue this discussion off-line. Perhaps even meet/talk on the phone or in person. I would like that very much if you are game. You can email me at "my first name" at agitar.com. Hope to hear from you. Alberto
  18. I wasn't clear; I'm happy to see efforts to add quality measurement to software development processes. My comments come from my experience in industrial settings, and in those cases measurements need to be linked to clear business objectives. Good luck to you with your project.
  19. I wasn't clear; I'm happy to see efforts to add quality measurement to software development processes. My comments come from my experience in industrial settings, and in those cases measurements need to be linked to clear business objectives. Good luck to you with your project.
    Hi Stephane, The need and motivation for the CRAP metric also comes from an industrial setting, and there is a clear business objective: identifying hard-to-maintain and poorly tested code. Dealing with legacy code is a huge business problem. Most developers spend most of their time maintaining and enhancing legacy code that they have not written. This kind of job is hard enough, but when the code is "crap" - using the more polite of the two terms most commonly used by developers - the difficulty, cost, and pain of maintenance increases substantially.

    I can't be sure why none of the other metrics that might serve a similar purpose have caught on. But I suspect that it's a combination of too many metrics to choose from (I know I am adding to the problem), poor "marketing" of the metrics to industry and end users, complexity, lack of easy access/availability (i.e. free plug-ins), etc. This is too bad, because I believe there is a lot to be gained through the insight that software metrics can give us. With CRAP, our goal is to simplify things to encourage broad adoption and consistent usage. We leverage and combine established metrics (each of which is easy to understand and act upon) into a single number, encourage contribution and experimentation from the users, make the tools not only free and open, but also integrated with all the most common IDEs and CIT (Continuous Integration and Testing) frameworks, and get people to talk about it and use it.

    "Marketing" the metric to its intended audience (i.e. developers), and asking for their feedback and participation in evolving it, is an important part of the CRAP effort; and it's something that most existing metrics have lacked. If you ask software developers and managers, most of them take a pretty dim view of most software metrics; they see them as academic exercises, handed off from some ivory tower, which bear little relevance to their day-to-day needs. It's safe to say that, when it comes to broad adoption by industry, software metrics have been mostly a failure. At the beginning we talked about business objectives. Well, it's hard to realize any objectives if the metrics are not used. Our goal with the CRAP metric is to achieve broad awareness and adoption. I believe this would be great for the software industry. We might argue about the fine points and specific values for the components in this first incarnation of the CRAP formula; but, at the heart of it, CRAP encourages testing and discourages bundling too much logic and complexity into a single method. These are both good things.
  20. Is the C.R.A.P. metric helpful in the real world, or is it one of those metrics that is only applicable in theory?
  21. When should we flush?

    What is the value of the C.R.A.P. metric that indicates a potential issue?
  22. Re: When should we flush?

    What is the value of the C.R.A.P. metric that indicates a potential issue?
    Hi Nick, I covered the interpretation issue in the original Artima article: http://www.artima.com/weblogs/viewpost.jsp?thread=215899 The article contains a table that shows the current threshold values. These are initial values based on preliminary experiments. We'll know whether they are too lax or too draconian as people start using it and give us feedback. Ultimately, though, since I doubt that we'll reach universal agreement on what complexity is tolerable and what amount of testing is adequate, the plug-in will probably allow users to set their own thresholds. Alberto
  23. Is the C.R.A.P. metric helpful in the real world, or is it one of those metrics that is only applicable in theory?
    I certainly hope that CRAP turns out to be a VERY practical metric. If you believe the basic premises:
    1) That, generally speaking, overly complex code is more prone to having defects when it's originally created and more prone to the introduction of new defects when it's modified.
    2) That, generally speaking, having tests helps to protect people who work on the code from introducing regressions.
    Then the idea of identifying overly complex code that lacks tests should be of very practical use. Below is a segment of the original post on Artima where we describe our thinking and objectives for the metric:
    We believe that, in order to be useful and become widely adopted, a software metric should be easy to understand, easy to use, and – most importantly – easy to act upon. You should not have to acquire a bunch of additional knowledge in order to use a new metric. If a metric tells you that your inter-class coupling and coherence score (I am making this up) is 3.7, would you know if that's good or bad? Would you know what you need to do to improve it? Are you even in a position to make the kind of deep and pervasive architectural changes that might be required to improve this number?
    You can read the entire post, along with the other objectives, at: http://www.artima.com/forums/flat.jsp?forum=106&thread=210575 Alberto