Using Sonar to Bake a Quality Feedback Loop into the Build Cycle
by Brian Chaplin
The number of business defects in an application tracks with the number of technical defects: more technical defects normally means more business defects. Technical defects can therefore serve as a metric for the overall quality of a development approach. Sonar is a great dashboard for tracking code quality at the project and file level, but how can it be used as part of the daily workflow? With a little work, key Sonar feedback metrics can be integrated into the code/test/commit/build cycle at the individual level. Code quality becomes visible to individual developers, providing immediate feedback.
Implement a Zero-Defect Enhancement Process. Integrating Sonar metrics into the build cycle enables the automation of a zero-defect code quality policy. Use the metrics to inform developers immediately of a quality regression. Every submission can be monitored so that developers can quickly correct and resubmit the offending code. Code quality increases as a by-product of the enhancement process: correcting code early in the project reduces the number of large re-factoring projects later, leaving more resources for projects that automate the business. A system with active enhancements can realize a 15% to 25% annual increase in quality just by baking quality into the build cycle.
Code Quality Czar. As with any quality program, management support is crucial. Like all metrics and quality-management programs, this one carries some overhead; a typical rule of thumb allows 1% extra overhead to measure productivity and quality. In this case, management must also have an experienced Java developer or developers available to answer and resolve developers’ inevitable code quality issues and questions.
Export Key Sonar and Source Committer Data
The basic idea is to
1. Compare a Java file’s Sonar statistics before and after Sonar runs.
2. Determine who changed the file.
3. Execute SQL queries to produce reports.
Integrating Sonar into the continuous improvement cycle requires adding steps to the build that:
1. Identify which files have changed for each Sonar run
2. Retrieve the quality metrics for each changed file
3. Identify which developers touched each file
4. Tie the quality changes to the developer and VCS change-list.
The simple Sonar post processing system runs after the Sonar build. Its functions:
1. Export key Sonar data to a SQL database
2. Extract source repository change data to a SQL database
3. Query the database to publish key feedback reports.
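The three functions above imply a small reporting database. A minimal schema sketch (in Python with SQLite for brevity; the table and column names are illustrative assumptions, not Sonar’s own):

```python
import sqlite3

# Illustrative reporting schema: one per-build snapshot of file metrics,
# plus the VCS commit data that ties files to committers.
SCHEMA = """
CREATE TABLE file_metrics (
    build_id             INTEGER,   -- which Sonar run
    file_key             TEXT,      -- e.g. com.acme:app:com/acme/Foo.java
    uncovered_lines      INTEGER,   -- lines not covered by unit tests
    uncovered_conditions INTEGER,   -- branches not covered by unit tests
    weighted_violations  INTEGER,   -- severity-weighted rule violations
    comment_density      REAL       -- percent comment lines
);
CREATE TABLE commits (
    change_list  TEXT,              -- VCS change-list / revision id
    committer    TEXT,
    file_key     TEXT,
    committed_at TEXT
);
"""

def open_reporting_db(path=":memory:"):
    """Create the reporting database used by the post-processing step."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db
```

Comparing two builds’ rows in file_metrics, joined to commits, yields the before-and-after deltas the reports are built on.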
Bake Quality into the Feedback Loop. Once key data is available, use it to quickly inform the developer when she submits a file in error so she can quickly correct it. At the same time feed key indices to management so they can reward and coach on a timely and accurate basis.
Continuous or at Least Near-Continuous Integration. The reports compare the incremental quality change between the current and previous Sonar builds. Since the crux of the approach is a before-and-after comparison of Sonar statistics, it’s imperative that Sonar run frequently; the more frequently, the better. Small projects can run Sonar as part of their continuous build cycle. Large builds can take an hour, so Sonar typically runs once or twice a day for such projects. Some continuous integration servers (e.g., Hudson) have a Sonar option to easily incorporate it into the build cycle.
Notify Quickly. Send e-mails to the committer and lead when a file regresses in code coverage or violations. Put a permalink in the e-mail so the offender can quickly navigate to the file’s error display in Sonar. Install the Sonar plug-in in the developers’ IDE (e.g., Eclipse) so they can quickly see what regressed.
Exporting Made Easy. Using the Sonar REST API, statistics can be easily exported to an SQL database. Simple Groovy scripts calling the API and the Spring JDBC library make this a fairly straightforward task. Similarly, source code version control systems have an API which can be called and the data exported via the same Spring JDBC library.
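A sketch of the export’s parsing step. The payload shape here assumes Sonar’s legacy /api/resources web service with its "msr" measure list; check it against your Sonar version before relying on it:

```python
import json

def parse_metrics(payload):
    """Return {metric_key: value} for one file's /api/resources response."""
    resource = json.loads(payload)[0]  # the API returns a list of resources
    return {m["key"]: m["val"] for m in resource.get("msr", [])}

# Example response for one file key (shape assumed from the legacy API):
sample = '''[{"key": "com.acme:app:com/acme/Foo.java",
              "msr": [{"key": "uncovered_lines", "val": 20},
                      {"key": "uncovered_conditions", "val": 4}]}]'''
```

The resulting dictionary maps directly onto a parameterized INSERT, whether through Spring JDBC in a Groovy script or any other JDBC-style binding.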
Optional Project Information. Optionally, project or defect tracking data from systems like Atlassian’s JIRA can also be exported. Then it becomes possible to produce additional management reports such as which project had the most Sonar defects.
Data-marts and Reports. Once exported, key reports can be produced via SQL queries. Combining the Spring JDBC library and the Apache POI library allows you to output the query results to Excel with a minimal amount of code. Add Excel pivot tables and charts to the detail results and you have a mini-data mart for your Excel-knowledgeable users. You can use management’s favorite spreadsheet as a template into which you pour the query results using POI. Package this as a Maven task and run it as the final step of the build. If coding API calls is impractical or inconvenient, simply use the SQL report writer of your choice.
Publish Key Reports
Code Quality Metrics
Sonar offers many metrics, but which are useful when integrating it into the code/test/deploy process?
The Two Most Useful Metrics. Coverage is measured by lines and conditions covered by the build’s unit tests. Don’t use line coverage or branch coverage percentage metrics. Percentages are useful as a high-level summary but are nearly useless in measuring file-level process improvement. Instead use lines uncovered by tests and conditions uncovered by tests. Unlike percentages, you can simply add them up to determine a committer’s credit or technical debt. They also show both coverage and complexity management.
The Side Benefit of Tracking Branch Coverage. Regarding branch coverage (conditions uncovered by tests), it’s amazing how code becomes simpler and easier to understand when a programmer has to write unit tests to cover all the branches. For example, he thinks twice before wrapping Java code in a try block (which should be thought through anyway), because now a test must be written for that throw or finally clause. Writing a branch test should make him think about, and even reconsider, the try block or condition. It should also encourage him to document it.
Code compliance is measured by counting violations from Checkstyle, PMD and FindBugs warnings. Group the violations into the 5 Sonar criticality levels as appropriate. Classify Javadoc violations as minor (weight = 1) and complexity violations as major (weight = 5), for example. Score the violations by their severity level and sum the weighted violations into a single compliance metric.
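A sketch of such a weighted score. The minor = 1 and major = 5 weights come from the example above; the remaining severity weights are assumptions to tune against your own rule set:

```python
# Sonar's five severity levels. Only the minor and major weights are from
# the text; info, critical and blocker are illustrative assumptions.
WEIGHTS = {"info": 0, "minor": 1, "major": 5, "critical": 10, "blocker": 20}

def compliance_score(violation_counts):
    """Sum severity-weighted violations into a single compliance metric.

    violation_counts: iterable of (severity, count) pairs, e.g. the
    Checkstyle, PMD and FindBugs warnings grouped by Sonar severity.
    """
    return sum(WEIGHTS[severity] * count for severity, count in violation_counts)
```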
Don’t Change the Rules During the Game. Review the rules and weights carefully up front. If you change them after the measurement program starts, you’ll have to perform extra work adjusting the historical metrics.
Don’t Prohibit Complexity, Manage It. Programmers get paid to add complexity (i.e., logic conditions) to an app. There’s nothing wrong with adding complexity. Rather the concern is managing it. Measure how it’s managed. Require that file complexity must be bounded, methods must be small and conditions must be tested. Limit a Java class file’s total size as measured by total complexity and average method complexity. If it exceeds the limit, consider it a technical debt if it increases (worsens) and a credit if it decreases. Further enforce complexity management by weighting Checkstyle and PMD complexity violations as critical.
Comment density, the hardest metric. In addition to the mantra “code a little, test a little” there should be a corollary mantra, “code a little, comment a little.” If you’re reviewing code submissions, require the submitter to write what he just told the reviewer. Somehow he can verbalize what the logic is doing but just can’t bring himself to type it in the IDE. Comment density is the hardest metric for many programmers to meet. Also, the reviewer needs to ensure that the committer is actually adding meaningful explanations to the code, not just trying to inflate the number. Remember, Sonar can only count the comments, not actually read them.
This is the only key metric expressed as a percentage, so watch for the occasional false positive due to the problem with percentages. A good standard is 25%, meaning one comment line or Javadoc line for every three lines of code.
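For reference, the density calculation can be sketched as comments over total lines; this formula is an assumption that matches the one-in-four reading above:

```python
def comment_density(comment_lines, ncloc):
    """Comment density as a percentage: comments / (comments + code).

    One comment line for every three non-commenting lines of code
    yields the 25% standard.
    """
    return 100 * comment_lines / (comment_lines + ncloc)
```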
Architecture and Design
Changes to a package can also be tracked. For example, statistics such as file tangles are only available at that level. Design metrics are very important but difficult to assign accountability for since submissions are tracked at the file, not package level. They are best evaluated early on in the project. By the time a project is at the code/test/build cycle phase, design metrics can be hard to correct. In that case, a Sonar-guided re-factoring project is more appropriate when attempting to improve design metrics.
The Most Important Report. To be actionable, technical debt must be tracked by metric within committer within file. Metrics include violations (minor, major and critical), uncovered lines and conditions, complexity limit violations, comment density and duplicated lines.
Sum Debt by Metric by Committer by File. Technical debt is the sum of each metric’s degradation, grouped by committer and file. If a developer degrades coverage by 1 line in April and 2 lines in May, his debt for the line coverage metric on that file is 3 (lines).
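The roll-up itself is one GROUP BY over per-submission metric deltas (a positive delta is a degradation). A SQLite sketch of the example above, with illustrative table and column names:

```python
import sqlite3

# Per-submission metric deltas: positive = degradation, negative = credit.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE metric_deltas (
    committer TEXT, file_key TEXT, metric TEXT, delta INTEGER, month TEXT)""")
db.executemany("INSERT INTO metric_deltas VALUES (?,?,?,?,?)", [
    ("alice", "Foo.java", "uncovered_lines", 1, "2011-04"),  # April: +1 line
    ("alice", "Foo.java", "uncovered_lines", 2, "2011-05"),  # May: +2 lines
])

DEBT_QUERY = """
SELECT metric, committer, file_key, SUM(delta) AS debt
FROM metric_deltas
WHERE delta > 0                 -- only degradations count as debt
GROUP BY metric, committer, file_key
"""
debt = db.execute(DEBT_QUERY).fetchall()   # alice owes 3 uncovered lines
```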
Open Defect Tickets. Ensure that the debt is repaid by opening defect tickets against that developer for any technical debt still open at the end of the month.
Developer Debt, Credit and Activity
Create reports that enable leads to both reward and coach as appropriate. Combine debt, credit and activity into one easy-to-read report for management.
A developer has both credit and debt. It is actually quite common for a submitter to have a lot of technical credit and debt, especially if there’s deadline pressure. Display both sides of the ledger. A lead needs to coach and monitor technical debt, even among those with a lot of credit.
Track Times at Bat, Too. It’s important to weigh both quantity and quality to encourage both productivity and quality. Activity can be viewed as a rough proxy of productivity. Track activity in order to provide a perspective for interpreting quality. Both home runs and at-bats must be tracked if an accurate picture is to be gained. Count the developer’s submitted files, change lists and trouble tickets as an indicator of activity.
Summarize, Drill-Down and Color-Code. Put all these metrics for a time period on a single report so a lead can have an at-a-glance perspective of the team. Enable drill-down to the component files and check-ins to enable more detailed coaching. Color-code for easy reference.
Especially when starting a code quality program, it’s important to gain enthusiasm and participation. Publish a daily report that summarizes each committer’s positive contribution for the previous day. Make sure management sees it.
Branch and Line Coverage Added
Two Wrongs Make a Right. Use the decrease in uncovered lines and branches to determine the credit (via the double negative) to be given for adding branch and line coverage. That is, sum the decrease in uncovered lines and branches for the developer’s check-ins to calculate the credit.
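The double negative can be sketched as follows, assuming per-check-in deltas in uncovered lines or branches (negative means coverage was added):

```python
def coverage_credit(uncovered_deltas):
    """Sum the decreases in uncovered lines/branches across a developer's
    check-ins: a delta of -3 uncovered lines is a credit of 3.

    Positive deltas (degradations) are excluded here; they feed the
    technical-debt side of the ledger instead.
    """
    return sum(-d for d in uncovered_deltas if d < 0)
```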
Coverage Merely Correlates to Activity. If a developer is to avoid technical debt, every submission will add line and branch coverage equal to the lines and branches added by that submission. The more lines and branches added, the more covered lines and branches added. So if the developer is properly executing the “code a little, test a little” principle, this metric merely correlates with activity. Enhance it by using…
Code Coverage Opportunity Exploited
Identify the Achievers. This is the percentage coverage increase per file. If a file started at 40% covered and ended at 60% covered, the submitter is credited with a 50% improvement in code coverage for that file. Weight this by file size (usually non-commenting lines of code) and you can determine how much a submitter went beyond the call of duty in increasing code coverage. This identifies those who took the opportunity available to them and ran with it. Encourage such submitters.
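A sketch of this credit using the example numbers above; the function names are illustrative, and the size weighting uses non-commenting lines of code as described:

```python
def coverage_improvement(before_pct, after_pct):
    """Relative coverage improvement: 40% -> 60% covered is a 50% gain."""
    if before_pct == 0:
        # A file with no prior coverage needs a special case; policy for
        # brand-new or never-tested files is left to the quality czar.
        raise ValueError("no prior coverage; handle new files separately")
    return (after_pct - before_pct) / before_pct

def opportunity_credit(before_pct, after_pct, ncloc):
    """Weight the improvement by file size (non-commenting lines of code)."""
    return coverage_improvement(before_pct, after_pct) * ncloc
```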
Increase in Compliance
Reward those who repair. Another double negative: this credit is the decrease in Checkstyle, PMD and FindBugs violations in a submitter’s files for that period. This assumes that the code base is in need of repair, a typical scenario for those who run Sonar only after some development has already occurred. It rewards those who contribute to its repair.
Technical debt is naturally the opposite of technical credit:
Increase in Uncovered Lines and Branches
Sum the increase in uncovered lines and branches for the developer’s check-ins to calculate the debt.
Decrease in Compliance
This debt is the increase in Checkstyle, PMD and FindBugs violations in a submitter’s files for that period.
Initial Defect Rate
Get it Right the First Time. This is the initial defect rate, the count of files with errors (either decreased compliance or code coverage) divided by the total number of file submits. This is an indicator of rework that has to be done to fix the defect.
Keeping it real
Developer Credibility is Crucial. Fixing the code requires involving the developer, and involving the developer requires that they trust the quality metrics system. They may not always like it, but they should at least be confident in the numbers. If the numbers are wrong, the system will be ignored.
Do the Right Thing
Code quality metrics should always support a developer who does the “right thing.” If a developer does the right thing and gets penalized for it, fix the number. It’s important to recognize the corner cases and issues that need to be addressed in order to maintain developers’ support. Here are some:
Occasionally it may make sense to allow a violation to stand. For example, a developer may reason that a large switch statement exceeding the method complexity limit should stay, because fixing it would require a major re-factoring that would jeopardize the code in the short run. That’s what the // NOSONAR feature is for. Allow a developer to use the evil // NOSONAR only as a last resort and only with another developer reviewing the change. Require appropriate comments for such exemptions.
Allow Exceptions to 100% Coverage
There are a few Cobertura defects that cause code coverage issues. While rare, they are worth knowing about so a developer’s complaint can be remedied quickly. Some other cases:
Private Constructor. Checkstyle requires a private constructor in a utility class. However, a unit test would require that static utility class to needlessly instantiate itself just to gain 100% coverage. Cut the developer some slack and allow this constructor to be uncovered by unit tests.
Debug Logging. Another borderline case involves writing debugging information to the log. Arguably, it may not be necessary to require branch coverage of a guard such as (the logger API shown is illustrative):
if (log.isDebugEnabled()) { log.debug(state); } // then again, maybe I should test this
Two Developers Changed the Same File
Even in a continuous build scenario there will be occasional cases where two or more developers changed the same file between Sonar runs: a collision. A table must be maintained to exclude such collisions. Depending on the nature of the file change, either the whole change can be excluded or just a particular metric for that file change for that submitter.
Class, not Method, Level. Changes are tracked only to the file (class) level because this is the smallest level of detail that Sonar exposes. This can have subtle implications:
I Only Care About the Method I Changed. If you (unfortunately) have large files, such a large file can have many methods. In such a case, the programmer only cares about the method she changed and tends to ignore the file statistics which are reported back to her. With prompt attention to a quality warning e-mail she can easily locate the method in the Sonar display for that file and determine what regressed. If she waits, there will be many commits by many developers over time. At that point, assessing responsibility for that file’s quality becomes murky because Sonar displays only current, not prior, versions.
Get Out of Jail Free
What if the defect can’t be fixed? Such cases include:
• The defect may have been fixed by someone else already,
• the code may have changed radically,
• the code may depend on static or un-testable code which has dependencies that can’t be satisfied in a unit test environment or
• the change would require a design change or major re-factoring that would best be done later.
Be Fair. Such cases are rare, less than 1% of the file submissions. But to be fair to the developer it’s important to clear their name for that particular file change metric. A “get-out-of-jail-free” table in the reporting database can be used to clear such errors. The quality czar can judge whether an error must be fixed or merely cleared.
Unit Test Aids
Eliminate the Grunt-Work. Some easy and tedious unit tests can be automated. For example, a unit test harness can be written that will test the accessors and constructors for all Java beans. That alone can increase coverage by 20%. This also helps the credibility of the program by eliminating some grunt work and freeing up developer time to focus on more complicated testing. While not strictly related to Sonar, such efforts gain developer support.
When to Start
Now. In the future, key metrics and information such as who changed the file may well be implemented as a Sonar plug-in. In the meantime, bake the Sonar metrics into your code/test/build process and work the reports. Even a million-line plus code base already in enhancement phase can double its coverage and halve its violations, all without impacting its enhancement deliverables or budget. Reap the benefits now.
The problem with percentages. Percent of code covered would seem to be a useful metric to track. However, consider this common scenario. You have a 100-line class 80% covered by unit tests, leaving 20 lines uncovered. The diligent developer realizes that she no longer needs 50 of those lines and removes them; all 50 were covered by unit tests. Now the uncovered fraction doubles from 20/100 to 20/50 and the coverage percentage slumps from 80% to 60%. She has just been penalized for doing the right thing. Such a metric is death to developers’ support, and ultimately management’s support, of a code quality system. Instead use lines uncovered by tests and conditions uncovered by tests; in this case that number would have remained at 20.
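The arithmetic of the scenario is easy to check; a small sketch:

```python
def coverage_pct(total_lines, uncovered_lines):
    """Percentage of lines covered by unit tests."""
    return 100 * (total_lines - uncovered_lines) / total_lines

# 100-line class with 20 uncovered lines: 80% covered.
# Deleting 50 covered lines leaves 50 lines, still 20 uncovered: 60%.
# The raw uncovered-lines count itself never moved from 20.
```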
Brian Chaplin (brian.chaplin at gmail dawt com) is an independent consultant based in Washington, DC who has been involved with software productivity and quality measurement for over 25 years. He has developed key business metrics systems for the pharmaceutical, insurance, direct marketing and financial markets industries. He is a Sun Certified Enterprise Architect with a BA in Mathematics from Northwestern University.