Posts Tagged ‘root cause analysis’

Why NOT to avoid (or forget) “Walking Skeletons”

January 8, 2010

At Point2, we have recently embraced the concept of a “Walking Skeleton” as the first development work we do when starting a new project/module/bundle. This approach allows us to make sure all of the overhead of a new project is accounted for and functioning properly before the project becomes too complex to do so easily. We create the walking skeleton as part of our first sprint, which ensures we have some deliverable by the end of our first iteration. By the time our walking skeleton is complete, we have worked all of the kinks out of our CI, deployment, and testing strategies, which are still very straightforward at this point.

Unfortunately, when my team recently started its current project, we dropped the ball when it came to finishing the walking skeleton end-to-end before starting on more complex tasks. It didn’t occur to us at the time that this was a bad thing because we were still making visible progress. In retrospect, though, we acknowledged the difficulties and extra tasks we had created for ourselves by neglecting to help the skeleton take its first steps.

One issue we encountered was not being able to easily demonstrate new functionality to the product owners for sign-off. Because we had no certain way of running data through our application end-to-end (not even “Hello World”), we ended up fudging steps in the process just to see the desired results. This didn’t allow the business to try out the features without first having knowledge of the internal mechanics of the product. It also made it difficult, if not impossible, to properly functionally test the system in a true black-box fashion.

Another speed bump we ran into was the extra refactoring we found ourselves doing because the interfaces between components in our system were still evolving. We had not pushed data through each moving part, which meant that the way the parts fit together had not been clearly considered and defined. As the pieces came together, we realized different interfaces were more appropriate, and with interface refactoring comes unit test refactoring. Now, I normally encourage a healthy dose of refactoring for every piece of code, but not when I just finished the code for the task earlier that morning. Had we actually pushed something all the way through the pipe, we might have realized earlier that our initial architecture was not appropriate and, in fact, didn’t even make sense.
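For readers who have not seen one, here is a minimal sketch of what a walking skeleton could look like in Python. It is purely illustrative and not code from our project: the names `Source`, `Processor`, `Sink`, `run_pipeline`, and the stub classes are all hypothetical. The idea is that every stage exists only as a trivial stub, but a “Hello World” payload already travels through the agreed interfaces from one end to the other.

```python
from typing import List, Protocol

# All names below are hypothetical and exist only to illustrate the idea.


class Source(Protocol):
    def read(self) -> str: ...


class Processor(Protocol):
    def process(self, payload: str) -> str: ...


class Sink(Protocol):
    def write(self, payload: str) -> None: ...


class HelloSource:
    """Stub input: always yields the same trivial payload."""

    def read(self) -> str:
        return "Hello World"


class PassThroughProcessor:
    """Stub business logic: returns the payload unchanged."""

    def process(self, payload: str) -> str:
        return payload


class InMemorySink:
    """Stub output: records what it receives so a test can inspect it."""

    def __init__(self) -> None:
        self.received: List[str] = []

    def write(self, payload: str) -> None:
        self.received.append(payload)


def run_pipeline(source: Source, processor: Processor, sink: Sink) -> None:
    """Push one payload through every stage, end to end."""
    sink.write(processor.process(source.read()))


def skeleton_walks() -> bool:
    """Black-box check: does data entering one end come out the other?"""
    sink = InMemorySink()
    run_pipeline(HelloSource(), PassThroughProcessor(), sink)
    return sink.received == ["Hello World"]


if __name__ == "__main__":
    print("skeleton walks:", skeleton_walks())
```

Each stub can later be swapped for a real implementation one at a time, while the end-to-end check keeps passing and the interfaces between the parts are exercised from the first day.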

So why did this happen in the first place? After recognizing what happened, my team agreed we should perform a root cause analysis. This activity produced the following Ishikawa diagram:

[Figure: “Walking Skeleton Fail” Ishikawa diagram]

The causes we came up with were interesting but not surprising. We decided the main reason we forgot about finishing the walking skeleton was that we were just too excited to get started on the new project, using new technology.

We settled on two action items to address these issues:

  1. Simply remember to plan for a walking skeleton on our next project.
  2. Blog about our experience to help others avoid the same problems. 😉

By: Jesse Webb

A Positive Root Cause Analysis

January 8, 2010

At the end of every development iteration we do a retrospective and a root cause analysis meeting. In our retrospective we cover what went well and what did not go so well, and come up with action items for things to try in our next sprint. In our root cause analysis meetings we usually pick some problem that occurred, try to figure out why it happened, and then work out how we could prevent it from happening again in the future. Our root cause analysis meetings are usually on a negative topic. However, on our recent project we decided to change things up and do a root cause analysis of the following question:

Why did our last project go so well?

This may seem like a strange question to ask, but in root cause analysis meetings we tend to focus only on the mistakes made during a project. When problems occur you need to identify the root causes of those problems to prevent them from happening again. When a project goes well, it is just as important to recognize what we did right and to understand “why”. What did we do differently this time that worked? What were the crucial decisions that kept the project on track? In our root cause analysis we found some key decisions that, in retrospect, were important but at the time did not seem critical.

Failing fast when a decision starts to become problematic
For our project we thought we had a clear, straightforward design to work with from the beginning. However, after spending even just a day spiking some ideas, our design immediately started to show cracks. The design, which looked simple on the surface, turned out to be far more complicated to implement than we had imagined. A large part of the reason for this problem was that our project was to make changes to an existing application about which no one on our project had any previous knowledge. We immediately had a team huddle, called a “Just In Time” design meeting, and corrected our course. As a result, we lost a day instead of a week or a month going down the wrong path.

Consulting experts early
When we started our project we knew of a few ways to accomplish our task and had received suggestions that sounded fine, but we really were not sure if our approach was the best solution available. Fortunately, we have some very experienced people in our company who have spent many years contracting and, as a result, have an incredible variety of experiences from which to draw. So we called a quick design meeting with one of these experts, showed them what we were thinking of doing, and just picked their brain for ideas. It turned out our expert was able to come up with an approach that would not only allow us to complete the task within the timeline given to us by our business team, but would also allow us to implement a cleaner solution.

Keeping code ownership high
No single person on the team was a bottleneck: if anyone was sick for a day, it did not prevent a task from being completed. During our project we made sure every line of code was written with a pair (we always try to pair program every line of code) and switched pairs regularly. Because of this knowledge sharing we did not have any “Experts” on any one area of the application. We always had at least 2-3 members of the team who were knowledgeable enough in any given area of the application to be able to bring another developer up to speed.

Break all stories into small tasks with a clear definition of “Done”
The stories we work on during a sprint always show the business value we are adding, but from a developer’s perspective there are usually multiple tasks required to complete each story. At the start of each sprint we held a task breakdown meeting for breaking each story down into a set of small tasks. Our team found that having a set of clearly defined tasks for each story was very important to keeping the project on track. With any story we receive from the business team there will be questions, and we found that the task breakdown meeting helped flush out many of those questions at the start of the sprint, as opposed to after development had already begun, which in the past was usually what happened. It made it clear to our team lead and business analyst exactly what work was being done, who was doing it, what tasks had been completed, and what tasks had not yet been started. It also gave our business analyst and team lead a better idea of when to expect demos, since they could see how many tasks remained before a story would be completed.

Demo to business team often
We started our project doing a fairly poor job of demoing, but this was corrected after one of our sprint retrospectives. Business analysts need to see the work being done and will often think of something that was missed, or see something that spawns another story. One of the easiest ways to ensure that what is being developed matches what the business team wants is to keep them in the loop, and an excellent way to do that is through frequent demos.

Quick feedback from business team
During development there are always going to be cases in business logic spotted by developers that were missed during the initial planning phase. When a developer spots a missed case and brings it up to the business team, quick feedback from the business team can play a major role in keeping the project on schedule. In our last project this turnaround time was usually a matter of hours, and often less.

Keep the systems team involved from the start
The people who will be deploying the application and hosting it should be involved right from the start of the project. Your systems team has experienced many deployments and also knows the pain of hosting a problematic application. Allowing the systems team, who will be responsible for the application after development has been completed, to be involved in key decisions can greatly improve the chances of a successful deployment and potentially reduce the cost of hosting the application.

Conclusion
Having your team do a positive root cause analysis can be very useful. It sometimes seems that after a problem in one sprint, we focus so much on improving in that one area during the next sprint that we slip in areas where we were previously doing well (for our team it was demos). On previous projects, we definitely tried to follow each of the best practices outlined in our positive root cause analysis. However, since we moved to Agile almost two years ago, this was the first project where everything just “clicked”.

by Brian Richardson

Why Did That Happen?

May 15, 2009

During our Sprints it is not uncommon for some problem event to occur. Such events could be anything from a story taking considerably longer to complete than estimated, to a production bug being injected into a Sprint. As a team we would usually acknowledge these events in our retrospective and make a few brief comments about how to eliminate similar occurrences in the future. It seemed, however, as though we would promptly forget about the problem…until it happened again.

After seeing these problem events continue to occur, my team started throwing around the idea of doing root cause analysis. The hope was to get to the bottom of why these things were happening in the first place.

At first we were not quite sure how to conduct a formal root cause analysis, but after a little research we got ourselves pointed in the right direction. Our first analysis was a great exercise in drilling down into the heart of something the team saw as a problem. By focusing on one specific problem we were able to:

  • identify a plethora of individual areas that led to the problem in question.
  • isolate causes that could immediately be acted upon.
  • identify causes that we as a team could not solve alone.
  • bring causes into the foreground spurring discussion, and setting the team up to be mindful of them in the future.

As a team we have seen the benefit of analyzing our problem events and have integrated root cause analysis into our Sprint cycle. Just as we have a retrospective at the end of the Sprint, we also have a root cause analysis session. One team member is responsible for presenting a problem event to the team and leading the analysis. To date we’ve done this three times and have used two different methods of analysis (Ishikawa Diagram, Cause Mapping). The way the analysis is run is completely at the session leader’s discretion.

We’ve already started to deal with areas that were immediately actionable, and have at least started to get the ball rolling in some areas that require a little more thought and organization. There is no doubt in my mind that as my team continues to identify and eliminate the root causes of our most crippling problems, we will reach that hyper-productive state that so many Scrum teams strive for.

By Hemant J. Naidu