Getting used to sizing Stories, whether with t-shirt sizes or a modified Fibonacci scale, takes a long time. With traditional, industrial-era management ideas from the 19th century still prevalent in the way people work, we’re all programmed to think “how many hours will it take?” We look at all the tasks we have to do, add them up, and that’s our answer. When we give an estimate, people assume it’s set in stone and hold us to the date or number of hours we quoted.
Traditional estimation methods are flawed
While adding up tasks and hours is easy, the practice is flawed: this approach is unlikely to produce an accurate result. The more precise you try to be, moving from weeks to days to hours, the less likely your estimate is to be accurate. And when we’re not accurate, we risk losing the trust of our customers, stakeholders and clients.
Here’s why this form of estimation is flawed.
Tasks are not independent
People tend to assume that when one task takes longer than expected, they can apply what they learned to the subsequent tasks and complete them more quickly.
“Activities are said to be independent if the duration of one activity does not influence the duration of another activity. … the amount of time it takes to excavate the basement is independent of the amount of time to paint the walls. Flipping a coin multiple times is another example of independent activities.
“Are software activities independent? Unfortunately, no. If I’m writing the client portion of an application and the first screen takes 50% longer than scheduled there is a good chance that each of the remaining screens is also going to take longer than planned.”
Mike Cohn [1] points out that the real lesson to learn is that all of the remaining tasks are likely to take longer too.
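To see why dependence matters, here is a small Monte Carlo sketch. It assumes, hypothetically, ten tasks each estimated at one day, with actual durations drawn uniformly between 0.8 and 1.5 times the estimate. In the independent case each task draws its own factor; in the dependent case a single shared factor applies to every task, as in Cohn’s screens example. The numbers are illustrative, not data.

```python
import random
import statistics

def simulate_totals(n_tasks: int, trials: int, shared: bool, seed: int = 1) -> list[float]:
    """Return simulated total durations (in days) for n_tasks one-day tasks.

    Assumption: each task's actual duration is its 1-day estimate multiplied
    by a factor drawn uniformly from [0.8, 1.5]. If `shared` is True, one
    factor is drawn per trial and applied to every task (dependent tasks);
    otherwise every task draws its own factor (independent tasks).
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        if shared:
            factor = rng.uniform(0.8, 1.5)
            total = n_tasks * factor
        else:
            total = sum(rng.uniform(0.8, 1.5) for _ in range(n_tasks))
        totals.append(total)
    return totals

independent = simulate_totals(10, 10_000, shared=False)
dependent = simulate_totals(10, 10_000, shared=True)

# Both cases have roughly the same average total, but the dependent case is
# far more spread out because overruns no longer cancel each other out.
print(f"independent: mean={statistics.mean(independent):.1f}, stdev={statistics.stdev(independent):.2f}")
print(f"dependent:   mean={statistics.mean(dependent):.1f}, stdev={statistics.stdev(dependent):.2f}")
```

Adding up task estimates quietly assumes the independent case; when tasks share a cause of delay, the real total can land much further from the sum.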
Multi-tasking causes further delays
In time-and-motion studies, Clark and Wheelwright [2] found that the time an individual spends on value-adding work drops rapidly once they are working on more than two tasks. Gerald Weinberg [3] attributes this to context switching and to people attempting to multi-task.
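A toy model makes the shape of this effect visible. The sketch below is not Clark and Wheelwright’s or Weinberg’s data; it simply assumes, hypothetically, that every task beyond the first costs a fixed 20% of the working day in switching overhead, and that the remaining time is split evenly across tasks.

```python
def time_per_task(n_tasks: int, switch_cost: float = 0.2) -> float:
    """Fraction of the day each task actually receives under a toy model.

    Hypothetical assumption: each task beyond the first loses `switch_cost`
    of the day to context switching; the remaining value-adding time is
    shared equally between all tasks.
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    value_adding = max(0.0, 1.0 - switch_cost * (n_tasks - 1))
    return value_adding / n_tasks

for n in range(1, 6):
    share = time_per_task(n)
    print(f"{n} task(s): {share:.0%} of the day on each, {share * n:.0%} value-adding overall")
```

The linear overhead is deliberately crude; the point is only that time spent switching is time not spent on value-adding work, and it grows with every task added.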
This is made worse by the way the brain has to switch between processing past information and focusing on the here and now [4]. The brain simultaneously processes information into long-term and short-term memory. When you switch to something new, your brain is still processing the previous information into memory, so it has limited capacity to focus effectively on the new work and store new information.
Estimation doesn’t include distractions
How often are you distracted at work? How often does your mobile phone ring? How often do you stop to check your email? How often do you stop to check for text messages or social media messages? How often does someone stop by your desk at the office and ask you a question?
Research indicates that it can take 10-18 minutes to recover from each distraction and get back to the same level of attention [5].
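The arithmetic adds up quickly. Assuming, purely for illustration, six interruptions in a working day and the 10-18 minute recovery range cited above, this sketch computes the focus time lost that no task-by-task estimate accounts for.

```python
def lost_focus_minutes(interruptions: int, recovery_min: int = 10, recovery_max: int = 18) -> tuple[int, int]:
    """Return the (low, high) range of minutes lost recovering from interruptions.

    Defaults use the 10-18 minute per-distraction recovery range; the number
    of interruptions is an input you would take from your own working day.
    """
    return interruptions * recovery_min, interruptions * recovery_max

low, high = lost_focus_minutes(6)  # hypothetical: six interruptions in one day
print(f"{low}-{high} minutes of lost focus per day")
```

Six interruptions already cost roughly one to almost two hours of focus, before counting the time spent on the interruptions themselves.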
But customers still want to know when they’ll get the outcome!
Whether estimation is flawed or not, our customers still want to know when they’ll get their stuff. I’ve heard many Agile Teams refuse to estimate and say “we’re agile: you’ll get it when it’s done”. But for an empirical process like Scrum, one based on observation, data, inspection and adaptation, such a response is just poor practice and fails to build trust.
So, while our traditional method of estimating in hours by adding up individual tasks is flawed, there are more effective options. Relative estimation and sizing is the choice of most Agile Teams.
Relative Estimation and Sizing
Agile Teams tend not to estimate by adding up hours. Instead, they estimate size and then derive time. According to Mike Cohn, this is best practice [1].
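In Cohn’s approach, size is usually expressed in story points, and time is derived by dividing the total size by the team’s velocity (the points it typically completes per Sprint). A minimal sketch, with hypothetical numbers:

```python
import math

def sprints_needed(total_points: float, velocity: float) -> int:
    """Derive a duration (in whole Sprints) from an estimate of size.

    `velocity` is the number of points the team typically completes per
    Sprint, taken from observed history rather than guessed.
    """
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    # A partially used Sprint still has to be scheduled, so round up.
    return math.ceil(total_points / velocity)

# Hypothetical backlog of 120 points and an observed velocity of 25 points per Sprint.
print(sprints_needed(120, 25))
```

The estimate of size never has to mention hours; time falls out of the team’s own track record.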
Example:
A new piece of work is defined by the Scrum Team’s Product Owner. It’s to build in a sort/filter on cost for a list of courses available to students.
The Scrum Master facilitates a discussion among the Development Team members about the size of the new piece of work, not about how many hours individual tasks will take. This discussion is most often supported by an activity known as Planning Poker, which is designed to rapidly achieve consensus on the “size” of the work among people with different skill sets.
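One round of Planning Poker can be sketched as follows. Everyone reveals a card at once; if the cards agree, the team has consensus, otherwise the people holding the highest and lowest cards explain their reasoning and the team votes again. The function, the deck and the names are hypothetical, and treating only an identical vote as consensus is a simplifying assumption; many teams accept close votes.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical deck based on the modified Fibonacci scale.
CARDS = [1, 2, 3, 5, 8, 13, 20, 40, 100]

@dataclass
class RoundResult:
    consensus: Optional[int]  # agreed size, or None if the votes differ
    discuss: list[str]        # people who should explain their estimates

def play_round(votes: dict[str, int]) -> RoundResult:
    """Evaluate one simultaneous reveal of cards (hypothetical helper)."""
    for person, card in votes.items():
        if card not in CARDS:
            raise ValueError(f"{person} played a card not in the deck: {card}")
    values = set(votes.values())
    if len(values) == 1:
        return RoundResult(consensus=values.pop(), discuss=[])
    low, high = min(values), max(values)
    # The highest and lowest estimators explain what they see that others don't.
    speakers = [p for p, c in votes.items() if c in (low, high)]
    return RoundResult(consensus=None, discuss=speakers)

first = play_round({"Ana": 3, "Ben": 8, "Cho": 5})
print(first)  # no consensus yet: Ana and Ben explain, then everyone re-votes
second = play_round({"Ana": 5, "Ben": 5, "Cho": 5})
print(second.consensus)
```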

During Planning Poker, the size of the new piece of work is compared to the size of work previously done.

If the Development Team feels the new work (item #20173 in the picture above) is of equivalent size to the last sort routine they delivered (item #20172), which was sized “M”, then the new work is also sized “M”.
Using a relative estimation grid
When I’m coaching Scrum Teams, at some point toward the end of Sprint Planning I’ll ask them to put all of the Stories up on a wall, with like-sized items in the same row, and then decide whether each item in a row really is similar in size to the other items in that row.
You can see below that one Story initially had a size of “2” (this team was working with Fibonacci numbers). Once they put all the Stories on the wall, they decided that the Story was more like the items in row “3”. So, they moved the Story and changed the size.
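The grid is easy to picture as rows of Stories keyed by size. In this hypothetical sketch the Story IDs are invented for illustration; moving a Story to another row is the same as changing its size, mirroring what the team did with the Story they decided was more like a “3”.

```python
from collections import defaultdict

def build_grid(stories: dict[str, int]) -> dict[int, list[str]]:
    """Group Stories into rows of like-sized items (size -> list of Story IDs)."""
    grid: dict[int, list[str]] = defaultdict(list)
    for story_id, size in stories.items():
        grid[size].append(story_id)
    return dict(sorted(grid.items()))

def move_story(stories: dict[str, int], story_id: str, new_size: int) -> None:
    """Resize a Story after the team agrees it belongs in a different row."""
    if story_id not in stories:
        raise KeyError(story_id)
    stories[story_id] = new_size

# Hypothetical Sprint Backlog sized with Fibonacci numbers.
stories = {"S-1": 2, "S-2": 2, "S-3": 3, "S-4": 3, "S-5": 5}
print(build_grid(stories))

# The team compares S-2 with the rest of its row and decides it is really a 3.
move_story(stories, "S-2", 3)
print(build_grid(stories))
```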
Collecting data throughout the Sprint
I’ll often have a Scrum Master put a “dot” on a Story for each day it is in progress. It’s the easiest way to determine whether the Story is indeed the size people thought it was back in Sprint Planning.
If “M” Stories usually take 4 days to get to “Done”, then we’d expect the new Story to also spend about 4 days in progress. If, by day 2, the Scrum Master feels that the Story still has a lot of work outstanding, there’s a pretty good chance something is impeding it. This data gives the Scrum Master the perfect opportunity to check in with the Development Team, discover the root cause of what is actually going on (they might not even realise the Story is impeded) and act to remove those impediments. This kind of data should then be examined during a Retrospective to understand, for example:
- Why did the Development Team initially think the Story was an “M”?
- When it reached “Done”, did the Story have the same cycle time as an “L”?
- Why was the Story larger than expected?
- What made this Story really an “L” instead of an “M”? Was there just more work? Was this type of work new?
- How can we avoid underestimating this type of Story in the future?
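The dot-counting check described above can be sketched in a few lines. The Story record, the typical cycle times for “S” and “L” and the rule of flagging a Story once its dots exceed the typical cycle time for its size are all hypothetical; in practice the Scrum Master’s judgement at day 2 may pick up a problem well before this rule would.

```python
from dataclasses import dataclass

# Hypothetical typical cycle times (days in progress until "Done") per size.
TYPICAL_DAYS = {"S": 2, "M": 4, "L": 7}

@dataclass
class Story:
    story_id: str
    size: str
    dots: int = 0  # one dot per day the Story has been in progress
    done: bool = False

    def add_daily_dot(self) -> None:
        """Record another day in progress (call once per working day)."""
        if not self.done:
            self.dots += 1

    def looks_impeded(self) -> bool:
        """True if the Story has been in progress longer than is typical for its size."""
        return not self.done and self.dots > TYPICAL_DAYS[self.size]

story = Story("20173", "M")
for _ in range(5):
    story.add_daily_dot()
print(story.dots, story.looks_impeded())  # five days on an "M" that usually takes four
```

Stories flagged this way are natural candidates for the Retrospective questions listed above.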

Baseline the comparison data
Once a team have lots of data on their Stories, it’s easy to create a baseline for future comparison. After you set the baseline, the trick is not to change it. Any future decrease in the cycle time for a given Story size will indicate an improvement and result in increased velocity.
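A baseline can be as simple as the median cycle time for each size, calculated once from completed Stories and then frozen. The history below is made up for illustration, and using the median rather than the mean is an assumption that keeps a single outlier from skewing the baseline.

```python
from statistics import median

def build_baseline(history: list[tuple[str, float]]) -> dict[str, float]:
    """Median cycle time (days) per size from completed Stories given as (size, days) pairs."""
    by_size: dict[str, list[float]] = {}
    for size, days in history:
        by_size.setdefault(size, []).append(days)
    return {size: median(days) for size, days in by_size.items()}

def compare_to_baseline(baseline: dict[str, float], size: str, recent_days: list[float]) -> float:
    """Change in median cycle time versus the frozen baseline (negative means faster)."""
    return median(recent_days) - baseline[size]

# Hypothetical completed Stories.
history = [("M", 4), ("M", 5), ("M", 4), ("M", 3), ("M", 9), ("L", 7), ("L", 8)]
BASELINE = build_baseline(history)  # set once; the trick is not to change it
print(BASELINE)

# Later Sprints: "M" Stories now finish in 3-4 days, an improvement against the baseline.
print(compare_to_baseline(BASELINE, "M", [3, 3, 4]))
```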
Communicating “time” back to stakeholders
Once we have a baseline, it’s straightforward to tell the business “when they’ll get it”. Compare the new work to work already completed. If the new work is similar to work in the “M” category, then communicate that category’s cycle time back to the stakeholder or client.
In our current example, most size “M” Stories are completed in 4 days (estimate size and then derive time), but sometimes the Development Team have to wait for people outside of their team. If there’s likely to be waiting time, then the stakeholder can expect to see the work in its “Done” state after 6 days.
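Turning the baseline into an answer for a stakeholder might look like this sketch. The 4-day and 6-day figures come from the example above; treating the 2-day difference as a fixed allowance for waiting on people outside the team is a simplifying assumption, and the function name is hypothetical.

```python
# Baseline cycle time in days, taken from the example in the text.
BASELINE_DAYS = {"M": 4}
# Assumption: waiting on people outside the team adds about 2 days (6 - 4).
EXTERNAL_WAIT_DAYS = 2

def forecast_days(size: str, needs_outside_help: bool) -> int:
    """Days until a Story of this size is likely to be "Done" (hypothetical helper)."""
    days = BASELINE_DAYS[size]
    if needs_outside_help:
        days += EXTERNAL_WAIT_DAYS
    return days

print(forecast_days("M", needs_outside_help=False))
print(forecast_days("M", needs_outside_help=True))
```

Because the answer comes from completed work of a similar size rather than a sum of task-level guesses, it rests on observed data.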
Conclusions
Relative estimation is often a hard concept to understand. We’re so programmed to think in hours that just comparing new work to old work and allocating a size category like a t-shirt is strange and foreign.
The reality is that our normal ways of estimation have critical flaws in them. Relative estimation reduces the impacts of these flaws because it’s based on actual data and not our best guesses, and this is why it’s considered best practice.
M
– – – –
References
[1] Cohn, M. (2016) Agile Estimating and Planning.
[2] Clark, K. B., and Wheelwright, S. C. (1993) Managing New Product and Process Development: Text and Cases. New York: Free Press.
[3] Weinberg, G. (2011) Quality Software Management: Systems Thinking, p. 284.
[4] MacKinnon, M. (2014) Neuroscience of Mindfulness: Default Mode Network, Meditation, & Mindfulness. Online at: https://www.mindfulnessmd.com/2014/07/08/neuroscience-of-mindfulness-default-mode-network-meditation-mindfulness/
[5] American Psychological Association (2006, March 20). Online at: http://www.apa.org/research/action/multitask.aspx