Tuesday, May 29, 2012

More code coverage treemap fun

Quite some time ago, I built a treemap visualization of code coverage. It was inspired by the information visualization course I was taking at the time, and I have since found a less static version more useful for getting a feel for code coverage than lcov's index pages. Back then, two years ago, the only decent toolkit I found for building treemaps was in Java, which would necessitate the use of applets if I tried to put it on the web. Since most of what I produced was just static images, using Java locally wasn't an issue. Then again, I'm also a contributor to an organization that is pushing hard for being able to do things in pure JS. So I thought I might as well try to replicate it using a JS toolkit.

Two years have made quite a difference in terms of JS toolkits. Back in 2010, the only JS toolkit I found that had treemaps was protovis, which I found fairly unusable. Now there are a few more higher-quality toolkits, but they all share a fairly major problem: they force you to shoehorn your data into a specific format instead of letting you customize how the data is formatted. The JavaScript Infovis Toolkit is the one that left me both happy with the results and annoyed at how hard it was to get working. In the end, I settled on D3, which is still too low-level to make me happy (don't ask me how painful getting animations working was).

The end result is here (the equivalent lcov result is also available). Every rectangle represents a single file of source code. Hover over it, and you get details of coverage in that file. Click on it, and you zoom in one level down the source code hierarchy. If you ctrl-click, you zoom one level back out. The size of each rectangle indicates how large the file is (you can configure it to be by number of lines or number of functions). Its color indicates what percentage of lines (or functions) is covered: a red color means very few are executed in tests, while a green color means most are. There is a scale that indicates what a given percentage of code looks like, and there is even internal code to adjust the scale (so as to make the midpoint muddy red be not 50% but more like 70%), but this is disabled since the scale is more difficult to change. Changing parameters causes the treemap to animate to its new position, but the sheer number of elements here appears to give Gecko heartburn; it's much smoother if you zoom in to smaller directories! I do realize that this also results in a lot of boxes flying around each other, but a squarified treemap is unfortunately a very unstable algorithm.
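To make the scale adjustment concrete, here is a minimal sketch in Python (not the JS in the actual page) of one way to shift the midpoint of a coverage color scale. The function names, the piecewise-linear rescaling, and the plain red-to-green blend are all assumptions for illustration; the real page may well use a different palette and interpolation.

```python
def rescale(coverage, midpoint=0.7):
    """Map a coverage fraction in [0, 1] to a scale position in [0, 1].

    The mapping is piecewise linear, so that `midpoint` lands exactly on
    the middle of the scale (0.5). This is an illustrative choice, not
    necessarily what the visualization itself does.
    """
    if not 0.0 < midpoint < 1.0:
        raise ValueError("midpoint must be strictly between 0 and 1")
    coverage = min(max(coverage, 0.0), 1.0)  # clamp bad input
    if coverage <= midpoint:
        return 0.5 * coverage / midpoint
    return 0.5 + 0.5 * (coverage - midpoint) / (1.0 - midpoint)


def coverage_color(covered, total, midpoint=0.5):
    """Return a hex color for a file: red for low coverage, green for high."""
    if total == 0:
        return "#808080"  # hypothetical fallback: gray for files with nothing to cover
    t = rescale(covered / total, midpoint)
    red = round(255 * (1.0 - t))
    green = round(255 * t)
    return "#{:02x}{:02x}00".format(red, green)
```

With `midpoint=0.7`, a file needs 70% coverage to reach the middle of the scale, so anything below that reads as distinctly red.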

The technology that underlies the visualization isn't complicated. I used lcov to produce a summary file, and then used a Python script to convert the file into a summarized JSON version. Tooltips are provided using tipsy, which I don't like all that much, but I haven't found anything that is as easy to use while also serving my needs (most importantly, placing tooltips so they don't fall off the edge of the screen). As an aside, a visualization toolkit that doesn't have any tooltip support is extremely bad: details-on-demand is a critical component of information visualization.
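For the curious, the conversion step looks roughly like the sketch below. This is not my actual script: it assumes the input is an lcov tracefile (the `.info` format with `SF:`, `LF:`, `LH:`, `FNF:`, `FNH:`, and `end_of_record` lines), and the output field names (`lines`, `lines_hit`, `funcs`, `funcs_hit`) as well as the helper names are made up for illustration. The nested `name`/`children` shape is the one D3's hierarchy layouts expect by default.

```python
import json

# lcov summary records we care about, mapped to (hypothetical) JSON field names.
FIELDS = {"LF": "lines", "LH": "lines_hit", "FNF": "funcs", "FNH": "funcs_hit"}


def parse_tracefile(lines):
    """Collect per-file totals from the lines of an lcov tracefile."""
    files = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("SF:"):
            # Start of a record; the same file may appear more than once.
            current = files.setdefault(line[3:], dict.fromkeys(FIELDS.values(), 0))
        elif line == "end_of_record":
            current = None
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            if key in FIELDS:
                current[FIELDS[key]] += int(value)
    return files


def build_tree(files):
    """Turn {path: stats} into a nested name/children hierarchy."""
    root = {"name": "", "children": {}}
    for path, stats in sorted(files.items()):
        parts = [p for p in path.split("/") if p]
        if not parts:
            continue
        node = root
        for part in parts[:-1]:
            node = node["children"].setdefault(part, {"name": part, "children": {}})
        node["children"][parts[-1]] = dict(stats, name=parts[-1])
    return _listify(root)


def _listify(node):
    """Convert the temporary child dictionaries into lists."""
    if "children" not in node:
        return node  # leaf: a single source file
    children = [_listify(child) for child in node["children"].values()]
    return {"name": node["name"], "children": children}


def tracefile_to_json(lines):
    """Convenience wrapper: tracefile lines in, JSON text out."""
    return json.dumps(build_tree(parse_tracefile(lines)))
```

Usage would be something like `tracefile_to_json(open("coverage.info"))`, where `coverage.info` is a placeholder file name.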

There are some things I don't have yet: I don't scrape branch coverage, for example. A query parameter that lets you zoom into a specific directory immediately would be helpful, as would being able to cut off the tree before the leaves. I would also like to see a nice view of a single file, so I can drop the lcov output completely. Being able to select individual files would also be a boon. Another neat feature would be selecting from multiple sources (e.g., coverage from different test suites, or even monitoring how code coverage changes through the years!).

As a postscript, let me give you a tour of the top-level directories in the view. The tiny sliver in the upper-left is, from top to bottom, the C++ source code of Thunderbird-specific code, lightning, and LDAP. Then there is a sizable block for mork, and the rest of the left column is mailnews code. Then there is a barely noticeable column and another small but noticeable column, which contain between them a lot of small top-level directories in mozilla. The next column (similar in width to the comm-central column) contains, from top to bottom, IPC, spellcheck, widget, security, accessibility, parser, and xpcom. The next column contains editor, toolkit, network, and DOM code. The right 40% of the diagram or so lacks definable columns. The upper-left of this portion (you'll notice it by the large red block with a lot of "S" letters peeking out) is the graphics code. To its right lies JS. Beneath both of these lies the code for layout, and beneath that, at the bottom, is content.

And, no, my code is not on github yet because most of everything that matters is in that one HTML file. I also plan to integrate this view into DXR at some point in time.
