I recently joined PureWRX as CTO. PureWRX is a very new company, which means I am building out the technology team and all of the processes that accompany it. I’m a big believer in the iterative nature of Scrum, so I’d like to comment on the metrics I collect and communicate.
For a little background, I took a ScrumMaster certification course around 2006 and I have been using aspects of Scrum ever since. The metrics I will discuss here are the ones I used at my previous employer, BuildASign.com, where I managed all of the software engineering efforts for the past 2+ years. I took the position at BuildASign because I saw an opportunity to focus entirely on the engineering process and culture. What made this opportunity great was that I had 100% buy-in from management. This doesn’t happen everywhere, so I consider myself lucky.
I’m also a huge believer in Team. We rise and fall as a team. So, even though I take credit for instilling this process, it is the team that accomplished the results. I will now switch from first person singular to first person plural.
Points
We measure points. Points are a measure of complexity and amount of work required to complete a task. The sizes are XS, S, M, L, XL, and XXL. Anything else is too small to mention or too large to estimate.
We equate these sizes to numbers to facilitate visualization. Just as light and sound are measured on logarithmic scales, so are the sizes of tasks. The numbers we use are 1, 5, 13, 30, 70, and 150. The increasing curve represents an increasing error in the estimate of the work. [An aside: the curve is basically a doubling with a bonus. I mention the comparison to light (lumens) and sound (decibels) because humans can most easily recognize a doubling of intensity. For this reason, the traditionally used Fibonacci series for sizes is insufficient in my opinion.]
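To make the scale concrete, here is a minimal sketch of the size-to-points mapping. The dictionary mirrors the numbers above; the “doubling with a bonus” check is just an illustration of the aside, not part of any tool we used.

```python
# Size-to-points mapping from the scale described above.
SIZE_POINTS = {"XS": 1, "S": 5, "M": 13, "L": 30, "XL": 70, "XXL": 150}

# Show that each step is roughly a doubling plus a small bonus.
values = list(SIZE_POINTS.values())
for smaller, larger in zip(values, values[1:]):
    bonus = larger - 2 * smaller
    print(f"{smaller:>3} -> {larger:>3}  (double, plus {bonus})")
```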
The size of a card should not change. In other words, the characteristics that define an M card never change, so that an M card today is comparable to an M card next year. However, the expectation is that we will get better and faster at what we do over time. Thus, we expect to see more points accomplished per cycle over time. This is where we set a goal. We will have to get 2-3 cycles of data to establish a baseline and then set improvement goals from that point.
We publish points in two ways:
1) We publish how we did each cycle and look at trends.
2) We publish the completed points during a cycle to see how we are progressing on a daily basis (a small sketch of this daily tally follows).
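Here is a minimal sketch of that daily view: a running total of points completed so far in the cycle. The per-day numbers are hypothetical sample data.

```python
# Running total of points completed within a cycle, day by day.
from itertools import accumulate

completed_per_day = [0, 5, 13, 1, 0, 30, 5]  # hypothetical points finished each day
for day, total in enumerate(accumulate(completed_per_day), start=1):
    print(f"Day {day}: {total} points completed")
```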
Accuracy
We measure the accuracy of our estimates. At the end of each cycle, we review all of the tasks and discuss where our estimates were wrong and why. Misses can be positive or negative. We sum the misses and look at the percentage of the total cycle. This gets published for all to see. Again, we will establish a baseline and then set goals. I would expect misses of around 10% of the cycle total for several cycles.
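A minimal sketch of that calculation follows: sum the estimate misses for the cycle and express them as a percentage of the total points estimated. The numbers are made up, and summing the misses signed (so over- and under-estimates can offset) is one reading of the process; summing absolute values is another reasonable choice.

```python
# Accuracy of estimates: signed misses as a percentage of the cycle total.
estimated = [13, 5, 30, 1, 13]   # points estimated per task
actual    = [13, 13, 30, 5, 5]   # points each task turned out to be, in hindsight

misses = [a - e for e, a in zip(estimated, actual)]
accuracy_pct = 100 * sum(misses) / sum(estimated)
print(f"Cycle miss: {accuracy_pct:+.1f}% of {sum(estimated)} estimated points")
```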
Stability
Automated testing is the name of the game. Any time a developer changes code in one area, it can impact seemingly unrelated areas. We arrest this problem by writing automated “unit tests” as part of the day-to-day work. Analysis tools then tell us what percentage of the code has a test around it. This is Code Coverage.
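For illustration, here is a minimal sketch of the kind of unit test that builds coverage. The function and tests are hypothetical; a coverage tool (coverage.py in the Python world, for example) then reports what percentage of the code tests like these actually exercise.

```python
# A hypothetical function and the unit tests that guard it.
import unittest

def order_total(prices, tax_rate):
    """Sum line-item prices and apply a flat tax rate."""
    return round(sum(prices) * (1 + tax_rate), 2)

class OrderTotalTest(unittest.TestCase):
    def test_applies_tax_to_sum(self):
        self.assertEqual(order_total([10.00, 5.50], 0.08), 16.74)

    def test_empty_order_is_zero(self):
        self.assertEqual(order_total([], 0.08), 0.0)

if __name__ == "__main__":
    unittest.main()
```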
Bugs
Working on code that failed to do what was intended is an indication of a failure early in the workflow. It is also wasted time. While software bugs are inevitable, that doesn’t mean we shouldn’t work to reduce them. With each cycle, we indicate how many points of the total were spent fixing bugs. This ratio is the bug percentage. We publish this for all to see and expect it to decrease over time.
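The arithmetic is simple; a minimal sketch with made-up numbers:

```python
# Bug percentage: points spent fixing bugs as a share of all points in the cycle.
bug_points = 18
total_points = 120
bug_pct = 100 * bug_points / total_points
print(f"Bug percentage: {bug_pct:.0f}%")  # 15%
```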
Trends
Trend numbers are calculated by looking at 6 consecutive values, removing the high and low, and averaging the remaining 4. These trends are published for all to see.
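In code, that trimmed average looks something like the sketch below. The velocity history is hypothetical sample data.

```python
# Trend: take the 6 most recent values, drop the high and the low,
# and average the remaining 4.
def trend(values):
    last_six = sorted(values[-6:])
    middle_four = last_six[1:-1]          # drop the high and the low
    return sum(middle_four) / len(middle_four)

velocity_history = [96, 110, 84, 120, 105, 101, 131]
print(f"Velocity trend: {trend(velocity_history):.1f} points per cycle")
```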
Culture of Quality
By focusing on the areas mentioned above, we establish a Culture of Quality. Everything else we might want will automatically improve over time. Did we release features on time? Our point velocity and accuracy tell us this. Is our site always up and taking orders? Our automated test coverage and focus on low bug counts keep our site going.
The Results
These metrics were used over a 2 year period with an established team and established code base. Much of the first year involved tuning the team: discovering and moving out ineffective members and adding new ones. Like a good sports team, every good team does this continuously.
One can’t help but like the results. Per-developer productivity increased 40%. This growth was gradual over the entire period. Meanwhile, the time spent on bugs dropped from 50% to 10%. This improvement occurred mostly in the first year as the Culture of Quality took over.
Way to go team!