Relative Estimating and Sizing Stories

Getting used to sizing Stories, whether using t-shirt sizes or modified Fibonacci, takes a long time. With traditional, industrial-era management from the 19th century still so prevalent in the way people work, we’re all programmed to think about “how many hours will it take?” We look at all the tasks we have to do, add up the hours, and give that as our answer. When we give an estimate, people treat it as set in stone and hold us to the date or hours we gave.

Traditional estimation methods are flawed

While adding up tasks and hours is easy, the practice is flawed. You’re not very likely to be accurate when you use this approach. The more precise you try to be, estimating from weeks to days to hours, the less likely your result is to be accurate. When we’re not accurate, we are likely to lose the trust of our customers, stakeholders and clients.
Here’s why this form of estimation is flawed.

Tasks are not independent

People seem to think that when one task takes longer than planned, they can apply what they learned to subsequent tasks in a way that will enable them to be completed more quickly.

“Activities are said to be independent if the duration of one activity does not influence the duration of another activity. … the amount of time it takes to excavate the basement is independent of the amount of time to paint the walls. Flipping a coin multiple times is another example of independent activities.
Are software activities independent? Unfortunately, no. If I’m writing the client portion of an application and the first screen takes 50% longer than scheduled there is a good chance that each of the remaining screens is also going to take longer than planned.”

Mike Cohn [1] reinforces that the real lesson we should be learning is that all of the remaining tasks are likely to take longer too.
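To make Cohn’s point concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the overrun seen on the first task applies in the same proportion to every remaining task. The function name and the plan of five two-day screens are hypothetical; only the 50% overrun comes from the quote.

```python
def reforecast_remaining(planned_days, actual_first_task_days):
    """Scale the remaining plan by the overrun seen on the first task.

    planned_days: planned duration of each task, first task first.
    actual_first_task_days: how long the first task really took.
    """
    if not planned_days or planned_days[0] <= 0:
        raise ValueError("need at least one task with a positive plan")
    # Assumption: tasks are dependent, so the same overrun ratio carries over.
    overrun_ratio = actual_first_task_days / planned_days[0]
    return [days * overrun_ratio for days in planned_days[1:]]


# Five screens planned at 2 days each; the first took 3 days (50% longer).
remaining = reforecast_remaining([2, 2, 2, 2, 2], 3)
print(sum(remaining))  # 12.0 days, not the 8 originally planned
```

Treating the tasks as independent would leave the remaining plan at 8 days, which is exactly the optimism the quote warns against.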

Multi-tasking causes further delays

In studies of time and motion, Clark and Wheelwright [2] found that the time an individual spends on value-adding work drops rapidly when they are working on more than two tasks. This is due to context switching and people attempting to multi-task, says Gerald Weinberg [3].

For each item a person works on beyond the first, approximately 20% inefficiency is introduced due to context switching.
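The 20% rule of thumb can be turned into simple arithmetic. The sketch below assumes the loss is linear (20% of a person’s time for each item beyond the first) and that the remaining time is split evenly across items; the function name and the even split are my assumptions, not something the sources prescribe.

```python
def value_adding_share_per_item(items_in_progress, switch_cost=0.20):
    """Share of a person's time that reaches each item as useful work."""
    if items_in_progress < 1:
        raise ValueError("at least one item must be in progress")
    # Assumption: each item beyond the first costs a fixed 20% of total time.
    lost = min(1.0, switch_cost * (items_in_progress - 1))
    # Assumption: whatever time is left is shared evenly between the items.
    return (1.0 - lost) / items_in_progress


for n in range(1, 5):
    print(n, f"{value_adding_share_per_item(n):.0%}")
```

Under these assumptions one item gets 100% of a person’s attention, two items get 40% each, three get 20% each and four get 10% each.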

This is further exacerbated by the way the brain has to change between processing past information and focusing on the here and now [4]. The brain simultaneously processes information into long-term and short-term memory. When you switch to something new your brain is still processing into memory and has limited capacity to focus effectively and then store new information.

Estimation doesn’t include distractions

How often are you distracted at work? How often does your mobile phone ring? How often do you stop to check your email? How often do you stop to check for text messages or social media messages? How often does someone stop by your desk at the office and ask you a question?
Research indicates that it can take you 10-18 minutes to recover from each distraction and get back to the same level of attention [5].
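A rough sketch of what that adds up to over a day, using the 10-18 minute range above. The function name and the count of six distractions are hypothetical.

```python
def recovery_minutes(distractions, low=10, high=18):
    """Return the (low, high) range of minutes lost to regaining focus."""
    if distractions < 0:
        raise ValueError("distractions cannot be negative")
    return distractions * low, distractions * high


low, high = recovery_minutes(6)
print(f"{low}-{high} minutes")  # 60-108 minutes
```

None of that time appears in an estimate built by adding up task hours.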

But customers still want to know when they’ll get the outcome!

Whether estimation is flawed or not, our customers still want to know when they’ll get their stuff. I’ve heard many Agile Teams refuse to estimate and say “we’re agile: you’ll get it when it’s done”. But for an empirical process like Scrum, one based on observation, data, inspection and adaptation, such a response is just poor practice and fails to build trust.
So, while our traditional methods of estimating in hours by adding up individual tasks are flawed, there are other options that are more effective. Relative estimation and sizing is the choice for most Agile Teams.

Relative Estimation and Sizing

Agile Teams tend not to estimate based on adding up hours. They estimate size and then derive time. This, according to Mike Cohn, is best practice [1].
Example:

A new piece of work is defined by the Scrum Team’s Product Owner. It’s to build in a sort/filter on cost for a list of courses available to students.

The Scrum Master facilitates a discussion amongst the Development Team members about the size of the new piece of work, not how many hours individual tasks will take. Most often, this discussion is supported by an activity known as Planning Poker, and is designed to rapidly achieve consensus amongst different people with different skill sets as to the “size” of the work.

During Planning Poker, the size of the new piece of work is compared to the size of work previously done.

The Story on the left is equivalent in size to which Story on the right?

If the Development Team feels the new work (item #20173 in the picture above) is equivalent in size to the last sort routine they delivered (item #20172), and that previous work was sized “M”, then the new work is also sized “M”.

Using a relative estimation grid

When I’m coaching Scrum Teams, at some point toward the end of Sprint Planning I’ll ask them to put up all of the Stories on a wall, with like sized items all in a row, and then ask them to make some decisions about whether or not each item in each row is similar to the other items in that row.
You can see below that one Story initially had a size of “2” (this team was working with Fibonacci numbers). Once they put all the Stories on the wall, they decided that the Story was more like the items in row “3”. So, they moved the Story and changed the size.

Putting Stories in rows of like size allows teams to easily shuffle Stories around and confirm their estimations

Collecting data throughout the Sprint

I’ll often have a Scrum Master put a “dot” on a Story for each day it is in-progress. It’s the easiest way to determine whether the Story being worked on is indeed the size people thought it was back in Sprint Planning.

This Story was in-progress for 4 days, but the team also had 2 days of waiting.

If “M” Stories usually take 4 days to get to “Done”, then we’d expect the new Story to also spend 4 days in-progress. If the Scrum Master feels that the Story still has a lot of work outstanding by day 2, then there’s a pretty good chance something is impeding the Story. This data gives the Scrum Master the perfect opportunity to check in with the Development Team, discover the root cause of what is actually going on (they might not even realise the Story is impeded) and act to remove those impediments. This kind of data should then be examined during a Retrospective to understand, for example:

Why did the Development Team initially think the Story was an “M”?
When it reached “Done”, did the Story have the same cycle time as an “L”?
Why was the Story larger than expected?
What made this Story really an “L” instead of an “M”? Was there just more work? Was this type of work new?
How can we avoid underestimating this type of Story in the future?

Compare the cycle time in the Retrospective and talk about whether your estimations were accurate or not.
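The daily dots lend themselves to a simple comparison. The sketch below is one possible way to do it, not a prescribed Scrum practice: it flags a Story once it has more dots than its size usually needs, and for the Retrospective it reports which size the actual cycle time most resembles. The function names and the “S” and “L” figures are hypothetical; only the 4 days for “M” comes from the example above.

```python
def closest_size(actual_days, baseline_days):
    """Return the size whose usual cycle time is nearest to actual_days."""
    return min(baseline_days, key=lambda size: abs(baseline_days[size] - actual_days))


def is_worth_checking(days_in_progress, size, baseline_days):
    """True once a Story has more dots than its size usually needs."""
    return days_in_progress > baseline_days[size]


# Usual days in progress per size; "S" and "L" are made-up illustration values.
baseline = {"S": 2, "M": 4, "L": 7}

print(is_worth_checking(5, "M", baseline))  # True: an "M" on its fifth dot
print(closest_size(7, baseline))            # L: it behaved like an "L"
```

A flag is only a prompt for a conversation with the Development Team; the Scrum Master’s judgement about outstanding work still matters more than the count.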

Baseline the comparison data

Once a team have lots of data on their Stories, it’s then easy to create a baseline for future comparison. Once you set the baseline, the trick is not to change it. Any future decreases in the cycle time for a Story’s size will be indicative of an improvement and result in increased velocity.

Always compare new Stories back to the baseline. Once you’ve set the baseline, don’t re-set it.
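As a minimal sketch of how a baseline might be derived from that data, the Python below takes the median days in progress for each size. Using the median is my assumption (it is less sensitive to one unusually slow Story than the mean), and the history is invented for illustration.

```python
from statistics import median


def build_baseline(completed):
    """Map each size to the median days in progress of completed Stories.

    completed: iterable of (size, days_in_progress) pairs.
    """
    days_by_size = {}
    for size, days in completed:
        days_by_size.setdefault(size, []).append(days)
    return {size: median(days) for size, days in days_by_size.items()}


# Hypothetical history of (size, days in progress) for finished Stories.
history = [("S", 2), ("S", 1), ("S", 2),
           ("M", 4), ("M", 5), ("M", 4),
           ("L", 7), ("L", 8)]

baseline = build_baseline(history)
print(baseline)  # {'S': 2, 'M': 4, 'L': 7.5}
```

Computing the baseline once and storing it, rather than recalculating it every Sprint, keeps later improvements in cycle time visible against a fixed reference.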

Communicating “time” back to stakeholders

Once we have a baseline, it’s straightforward to communicate to the business “when they’ll get it”. Compare the new work to work already completed. If the new work is similar to work in the “M” category, then communicate the cycle time for that category back to the stakeholder/client.
In our current example, most size “M” Stories are completed in 4 days (estimate size and then derive time), but sometimes the Development Team have to wait for people outside of their team. If there’s likely to be waiting time, then the stakeholder can see the work in its “Done” state after 6 days.
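The arithmetic behind that answer is small enough to sketch. The function name is hypothetical; the 4 days for “M” and the 2 days of waiting are the figures from the example.

```python
def days_until_done(size, baseline_days, expected_waiting_days=0):
    """Derive time from size: usual cycle time plus any expected waiting."""
    return baseline_days[size] + expected_waiting_days


baseline = {"M": 4}  # usual days in progress for an "M" Story

print(days_until_done("M", baseline))     # 4: no waiting expected
print(days_until_done("M", baseline, 2))  # 6: waiting on people outside the team
```

Keeping waiting time as a separate input makes it clear to stakeholders which part of the answer depends on people outside the team.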

Conclusions

Relative estimation is often a hard concept to understand. We’re so programmed to think in hours that just comparing new work to old work and allocating a size category like a t-shirt is strange and foreign.
The reality is that our normal ways of estimation have critical flaws in them. Relative estimation reduces the impacts of these flaws because it’s based on actual data and not our best guesses, and this is why it’s considered best practice.
M
– – – –
References
[1] Cohn, M. (2016) Agile Estimating and Planning.
[2] Clark, K. B., and Wheelwright, S. C. (1993) Managing New Product and Process Development: Text and Cases. NY: Free Press.
[3] Weinberg, G. (2011) Quality Software Management: Systems Thinking, p284.
[4] MacKinnon, M. (2014) Neuroscience of Mindfulness: Default Mode Network, Meditation, & Mindfulness. Online at: https://www.mindfulnessmd.com/2014/07/08/neuroscience-of-mindfulness-default-mode-network-meditation-mindfulness/
[5] American Psychological Association, March 20, 2006. Online at: http://www.apa.org/research/action/multitask.aspx

About the author

Matthew Hodgson

Matt is the CEO of Zen Ex Machina, Professional Scrum Trainer (PST) and SAFe SPC5. He's an author, keynote speaker, and a regular presenter at international conferences across Australia, USA, Asia, and Europe.
