Scalable Agile Estimation and Normalization of Story Points: Estimation Challenges Galore! (Part 2 of 5)

In Part 1 of this multi-part blog series, I introduced the topic and provided an overview. Scalable agile estimation methods are required to provide reliable estimates of workload (work effort) as well as reliable velocity metrics (both estimated and observed velocity) at the team, program and portfolio levels of large-scale agile projects. Without reliable estimates of workload and reliable velocity metrics at all levels, effective and meaningful decisions about costs, return on investment and project prioritization cannot be made. For scalable agile estimation methods to work properly, story points across the space dimension (teams, epic hierarchies, feature group hierarchies, goals, project hierarchies, programs, portfolios, etc.) as well as across the time dimension (sprints, release cycles) need to have the same meaning and the same unit of measure. In other words, story points need to be normalized so they represent the same amount of work across the space and time dimensions.

In this Part 2, I first review the key requirements that must be satisfied for traditional velocity-based planning to work properly. I then present three key challenges associated with traditional velocity-based planning, explain how those challenges get exacerbated as agile projects scale up, and discuss what we need to do to mitigate them.

Critical requirements for traditional velocity-based planning to work properly

It must be recognized that the historical velocity of an agile team is a reliable leading indicator of its future velocity only when the conditions below are met:

  • Exactly the same members of a well-jelled agile team continue into the next sprint. Agile team members are not substitutable commodities. Team members may differ greatly in their individual contributions, and even a single team member change may have a major impact on team dynamics and productivity. Bringing in a brilliant but prima donna new team member may actually reduce team productivity and velocity.
  • The team has demonstrated a stable velocity over the last 3 to 4 sprints.  This is required for the past average velocity to serve as a leading indicator for the future velocity.
  • The number of work days available in the next sprint is almost the same as the number of work days in the past few sprints; i.e., the capacity of an agile team stays the same sprint after sprint.
  • Any technology or platform change efforts will be relatively consistent over time (i.e., no drastic changes from sprint to sprint).
  • The same amount of learning effort is required of team members from sprint to sprint (i.e., no drastic changes in learning effort from sprint to sprint).
  • No new major constraints or impediments are anticipated or discovered.
  • Neither the team nor its members are assigned to multiple concurrent projects.
  • Planned work for the team remains consistent across sprints, i.e., the team is not disrupted by unplanned work, or unexpected or poorly managed dependencies on other teams or external vendors.
  • There is not a significant amount of work carried over from sprint to sprint.

There is a good parallel between this set of requirements and weather forecasting based on the "Yesterday's Weather Model." If the weather has been stable for the last several days, tomorrow's weather can be forecasted based on the recent weather pattern. Similarly, under the set of requirements listed above, velocity for the next sprint (tomorrow's weather) can be estimated to be close to the average velocity over the last 3 to 4 sprints (yesterday's weather pattern). Therefore, the above set of requirements is popularly referred to as the "yesterday's weather model." Agile teams produce a more credible forecast of velocity, duration and costs when they are working in a stable yesterday's weather pattern.
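To make the yesterday's weather idea concrete, here is a minimal sketch (my illustration, not from the original post) of forecasting next-sprint velocity as the average of the last few sprints, applied only when recent velocity has been stable. The function name, the 3-sprint minimum and the 20% stability threshold are assumptions for illustration, not prescribed values.

```python
from statistics import mean, pstdev

def forecast_next_sprint_velocity(recent_velocities, window=4, max_variation=0.20):
    """Forecast next-sprint velocity a la 'yesterday's weather':
    average the last few sprints, but only if velocity has been stable.

    recent_velocities: observed velocities in story points, oldest first
    window:            how many recent sprints to average (3 to 4 is typical)
    max_variation:     illustrative stability threshold (coefficient of variation)
    """
    history = recent_velocities[-window:]
    if len(history) < 3:
        raise ValueError("Start-up phase: not enough history for yesterday's weather")

    average = mean(history)
    variation = pstdev(history) / average  # coefficient of variation
    if variation > max_variation:
        raise ValueError("Velocity is not stable; yesterday's weather does not apply")

    return average

# A stable team: the forecast is ~21.5 points for the next sprint
print(forecast_next_sprint_velocity([20, 22, 21, 23]))
```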

Key challenges with story point estimation and velocity metrics

There are three key challenges associated with story point estimation and velocity metrics for large-scale agile projects. These challenges also exist in large enterprises that have a large number of mostly independent projects, where it is often necessary to aggregate effort estimation and velocity metric data across multiple projects to provide visibility for senior management. Senior management often wants to see organizational velocity history and projected velocity trends, progress barometers and reports aggregated from the story points and velocities of a large number of projects. While these challenges are less severe and more manageable for small agile projects consisting of one or a few agile teams, they cannot be ignored for large-scale agile projects.

Challenge 1 – A single story point is unlikely to represent the same amount of work across teams and across sprints: Using agile project management tools (such as VersionOne), enterprises often do story point roll-up (adding up) across both space and time dimensions:

  • Space dimension: project hierarchies (which may span different application domains), epic hierarchies (which may span different teams or projects), feature group hierarchies, goals, programs and portfolios
  • Time dimension: sprints (iterations), release cycles

Enterprises also do other kinds of story point math, such as story point averaging, % completion in progress bars, burn-up charts showing the number of accepted story points across days in a sprint or across sprints in a release cycle, and velocity trends and projections for teams, programs, portfolios and the whole enterprise. In addition, many reports are generated that consolidate story point and velocity metric data across projects in a program or across programs in a portfolio.
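As an illustration of the kind of roll-up such tools perform, the sketch below sums story points up a small epic hierarchy. The data structure and numbers are hypothetical; the point is that the arithmetic itself is trivial, and it silently assumes every number is on the same scale.

```python
# Hypothetical epic hierarchy with story point estimates from two teams.
epic_hierarchy = {
    "Epic: Online payments": {
        "Feature: Credit card checkout": {"stories": [5, 3, 8]},  # estimated by Team A
        "Feature: PayPal integration":   {"stories": [13, 5]},    # estimated by Team B
    }
}

def roll_up_points(node):
    """Recursively sum story points up the hierarchy (space dimension)."""
    if "stories" in node:
        return sum(node["stories"])
    return sum(roll_up_points(child) for child in node.values())

total = sum(roll_up_points(epic) for epic in epic_hierarchy.values())
print(total)  # 34 "points" -- meaningful only if both teams' points
              # represent the same amount of work
```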

For example, if one story point for Team A represents 1 ideal day of work, and one story point for Team B represents 1.5 ideal days of work, then rolling up (adding) story points across Teams A and B that make up a program does not make sense. In fact, any math involving story points across Team A and Team B does not make sense, because the amount of work represented by a story point is different for each team, i.e., the story point scales used by the two teams are not the same. Simply adding up those story points would be like adding up the financial results of different international divisions of a multinational company by summing each division's figures in its local currency, without any currency conversion, i.e., currency normalization.

For the same reasons, epic points calculated by rolling up the story points of the stories making up the epic will not make sense if those stories are estimated by different teams with different story point scales. Program, portfolio or enterprise velocities calculated by rolling up the velocity numbers of different teams with different story point scales will not make sense either.
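The currency analogy can be sketched in a few lines. This is only an illustration of why a common unit is needed before any cross-team math, not the normalization method presented later in this series; the conversion factors simply reuse the hypothetical numbers from the Team A / Team B example above.

```python
# Hypothetical "exchange rates": ideal days of work represented by one story
# point on each team's scale (from the Team A / Team B example above).
IDEAL_DAYS_PER_POINT = {"Team A": 1.0, "Team B": 1.5}

# Story points completed by each team, each measured on its own scale.
team_story_points = {"Team A": 60, "Team B": 40}

# Naive roll-up: adds points on different scales, like adding dollars to euros.
naive_total = sum(team_story_points.values())  # 100 "points"

# Normalized roll-up: convert to a common unit (ideal days here) before adding.
normalized_total = sum(points * IDEAL_DAYS_PER_POINT[team]
                       for team, points in team_story_points.items())  # 120 ideal days

print(naive_total, normalized_total)
```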

For any story point math or story point reporting to make sense, a single story point needs to represent the same amount of work across teams and sprints. This is a major and critical requirement that is often taken for granted without validation: do story points represent an equal amount of work effort across both space and time dimensions? Is this requirement satisfied? How do you know? Or is it an article of faith born out of blissful ignorance? Unless proven to be true through actual measurements in an enterprise, we must assume that the amount of work represented by one story point is not equal across the space and time dimensions.

This challenge will be exacerbated with larger agile projects as there will be a lot more teams, projects, programs, portfolios, epics, etc., giving rise to natural variations in story point sizes and scales.

Challenge 2 – Bottom-up story point data is not available for estimating work during program and portfolio planning:  In an enterprise, business initiatives often drive portfolio planning.  Business initiatives are realized with large epics that often span multiple release cycles.   A portfolio is managed with a set of coordinated programs, where epics are broken down into features during program planning.  Each feature may take an entire release cycle and may need multiple sprints to complete.   A program is assigned to multiple teams, features are broken down into small stories that can be parceled out to different teams, and each story is small enough to be completed in a single sprint.   During portfolio and program planning sessions, most epics and features are not yet broken down into stories.  So it is not practical to roll up story point estimates of all lower-level stories (most stories and their estimates are not available yet).   As a result, it is very difficult to estimate how much work is involved in an epic hierarchy, program or portfolio.

Challenge 3 – Yesterday's weather model requirements may not apply: When an agile team is about to plan its very first sprint or has completed only 1 or 2 sprints, there is very little reliable historical velocity data that can be used for forecasting the velocity of an upcoming sprint. This can be characterized as the "start-up phase" problem; the expectation is that a team will reach a stable velocity within a few sprints and start benefitting from the stable environment of the yesterday's weather model. Keep in mind that if, at any point in time, the team composition changes in a major way, it is effectively a new team; it is thrown back to the start-up phase and has to wait at least a few sprints to regain a stable environment and reestablish a stable velocity. The yesterday's weather model requirements need to be examined for each sprint (they cannot be taken for granted). Whenever the yesterday's weather model is not applicable, past velocity is not a leading indicator of future velocity for an agile team. In the start-up phase, we need other techniques for forecasting future velocity.

Furthermore, the yesterday's weather model requirements are difficult to hold as agile projects scale up. Even if the yesterday's weather model is valid for a specific individual team, it becomes very challenging to hold as you scale up from a single-team agile project to a multi-team agile program, to a multi-program agile portfolio, and to a multi-portfolio enterprise. If dependencies among multiple teams are not minimized and managed well, they will adversely impact the velocity of one or more teams. In a large project, it is not enough to resolve team-level impediments; there are impediments that need to be resolved at the program and portfolio levels too, and those may impact several teams.

Let us assume that for a single team, the probability of holding the yesterday's weather model requirements is 90%, i.e., there is a 10% probability that one or more requirements of the yesterday's weather model will not be satisfied for a single team as it moves from one sprint to the next. What is the probability that the yesterday's weather model is applicable to a program of 4 teams running in parallel (i.e., a so-called Scrum of Scrums of 4 teams)? That probability drops from 90% for each individual team to 0.9 x 0.9 x 0.9 x 0.9 = 0.656, or about 66%, for a program of 4 teams. There is only a 66% probability that all the requirements of the yesterday's weather model will hold for the next sprint for an entire program of 4 teams. With similar probability math, you can determine that there is only a 19% probability that all the requirements will hold for the next sprint for an entire portfolio of 4 programs (a total of 16 teams), and only a 0.1% (almost zero) probability that they will hold for the next sprint for an entire enterprise of 4 portfolios (a total of 64 teams).
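The arithmetic above can be reproduced directly; the sketch below makes the same simplifying assumption as the paragraph, namely that each team satisfies the yesterday's weather requirements independently with probability 0.9 per sprint.

```python
# Probability that yesterday's weather holds for ALL teams in the next sprint,
# assuming each team holds it independently with probability 0.9.
p_single_team = 0.90

for label, team_count in [("Program (4 teams)", 4),
                          ("Portfolio (4 programs, 16 teams)", 16),
                          ("Enterprise (4 portfolios, 64 teams)", 64)]:
    print(f"{label}: {p_single_team ** team_count:.1%}")

# Program (4 teams): 65.6%
# Portfolio (4 programs, 16 teams): 18.5%
# Enterprise (4 portfolios, 64 teams): 0.1%
```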

It may be easier to apply the yesterday's weather model to a small geographic region (a team); it is not easy at all to scale it up to a large geographic area (programs, portfolios and the enterprise) spanning a larger duration of time (multiple sprints and release cycles). Sooner or later, the weather pattern is going to change somewhere over a large space (adversely impacting one or more teams in an enterprise), and at some point in time over a large time span (adversely impacting one or more sprints).

The bottom line: we may not be able to depend on past velocity data to forecast future velocity for a larger program or portfolio, because the basic requirements of a stable yesterday's weather model are difficult to hold and sustain at the program and portfolio levels. We need other solutions for estimating and forecasting that do not depend on the yesterday's weather model.

In Part 3 of the blog series, I will explain two existing and published scalable agile estimation methods and present my critique of those methods.  The first method is the one covered by Mike Cohn in his Agile Estimating and Planning book.  The second method is the story point normalization method used in SAFe.  The SAFe method is based on identifying a baseline story of 1 ideal developer day (1 IDD) effort by each team.  Therefore, I refer to this method as the “1-IDD Normalization Method” (1NM for short).

In Part 4 I will present a scalable agile estimation method called the Calibrated Normalization Method (CNM). I have developed, taught and applied CNM in my agile training and coaching engagements with clients since 2010. Part 4 emphasizes CNM's bottom-up estimation (from teams to programs up to portfolios). I will also explain how CNM can be used by large enterprises that have a large number of projects that may be largely independent of each other, i.e., not necessarily organized into programs and portfolios.

In Part 5 I will explain how CNM performs top-down estimation (from portfolios to programs down to teams). CNM estimates the scope of work at the portfolio and program levels without needing to know lower-level story point details. In Part 5 I will also compare and relate 1NM with CNM, and explain how CNM fully addresses all three challenges explained here in Part 2.

Acknowledgements: I have greatly benefited from discussions and review comments on this blog series from my colleagues at VersionOne, especially Andy Powell, Lee Cunningham and Dave Gunther.

Your feedback: I would love to hear from readers of this blog, either here, by e-mail (Satish.Thatte@VersionOne.com), or on Twitter (@smthatte).

Part 1: Introduction and Overview of the Blog Series – published on 14 October 2013.

Part 3: Review of published scalable agile estimation methods – published on 7 November 2013.

Part 4: Calibrated Normalization Method for Bottom-Up Estimation – published on 18 November 2013.

Part 5: Calibrated Normalization Method for Top-Down Estimation – published on 2 December 2013.


4 Responses to Scalable Agile Estimation and Normalization of Story Points: Estimation Challenges Galore! (Part 2 of 5)

  1. Most of what I’ve read recently indicates that as long as you’ve made the stories small enough, they tend to fall in a pretty linear range and there isn’t more than a 3-5x factor for any given story versus another story. Therefore we have chosen to have only two story sizes: the small ones are either trivial or simple. These we size at 5 points. The medium ones are kind of hard, but are still doable in a 2-week sprint by no more than 2 people. These we size at 20 points. Anything larger than a medium story is considered too large and must be broken down into smaller components that fit into either small or medium. We’re only a couple of months into this, but we seem to be seeing as much or more predictability now as when we were agonizing over sizes, and now it takes about 3-5 minutes per story for the team to size versus about 10 minutes per story before. This saves us lots of time, which is used for productive work instead of sizing.

    • Satish Thatte says:

      Hi Ted,
      You have outlined an interesting technique for story size estimation. The challenge I see is that as you scale up to teams of teams (programs) and programs of programs (portfolios), how would you ensure that the “5 story point” stories across all teams, programs and portfolios, across all sprints, represent a similar amount of work effort? Similarly, that the “20 story point” stories across all teams, programs and portfolios, across all sprints, represent a similar amount of work effort? And finally, that “20 story point” stories are indeed 4 times as much effort as “5 story point” stories for all teams, programs and portfolios across all sprints? Without these assurances, story point math, reports, progress bars, etc., generated by an agile project management tool will not be meaningful. This is the essence of the story point normalization problem, which is the topic of this 5-part blog series.

      Regards,

      Satish Thatte
      Agile/Lean Coach and Product Consultant
      VersionOne

      • Ted Schroeder says:

        I understand the point of your 5-part blog series. My point is that it doesn’t matter. Recent studies indicate that as long as you’re in the “linear” range of story size (where size is anything from something completely trivial to something that can take at most two weeks for a single person), then you’re just as accurate making everything a single point as doing any other size normalization or whatever. So, making your stories small enough solves all normalization issues and you’ll be just as accurate as if you spend a lot of time trying to get the “perfect” size. The moral of the story is “Spend your time doing the important thing, figuring out how to create the right size stories, not trying to figure out how big your stories are”.

        • Satish Thatte says:

          I have no argument with making stories small. Small batch sizes are good, as lean methods teach us. I am not talking about project sizing or spending effort to make “perfect” size stories, as you have alleged. Even after a team has spent time splitting epics or large stories into small stories, making them equal in size is a non-trivial effort, and could be argued to be wasteful (it is meta-work). When you have several programs in a portfolio, each program with 5 to 10 teams, ensuring that all teams create the SAME equal-size stories across all sprints is even more difficult. So the normalization challenge does not go away. My blog offers three classes of techniques to address the challenge. The blog does not advocate large-size stories or “perfect” size stories, and it does not recommend spending time on wasteful activities. Project sponsors do need estimates of effort, cost and schedule. This reality for most organizations does not and will not go away.
