Posts Tagged ‘acceptance tests’

Levels of Testing

June 24, 2009 Comments off

I’ve had reason recently to do some thinking on the various “levels” of software testing. I think there’s a rough hierarchy here, though there’s some debate about the naming and terminology in some cases. The general principles are pretty well accepted, however, and I’d like to list them here and expound on what I think each level is all about.

An important concern in each of these levels is to achieve as high a level of automation as possible, along with some mechanism to report to the developers (or other stakeholders, as required) when tests are failing, in a way that doesn’t require them to go somewhere and look at something. I’m a big fan of flashing red lights and loud sirens, myself. :)

Unit testing is one of the most common, and yet in many ways, misunderstood levels of test. I’ve got a separate rant/discussion in the works about TDD (and BDD), but suffice it to say that unit-level testing is a fundamental of test-driven development.

A unit test should test one class (at most – perhaps only part of a class). All other dependencies should be either mocked or stubbed out. If you are using Spring to autowire classes into your test, it’s definitely not a unit test – it’s at least a functional or integration test. There should be no databases or external storage involved – all of those are external and superfluous to a single class that you’re trying to verify is doing the right thing.

Another reason to write comprehensive unit tests is that it’s the easiest place to fix a bug: there are fewer moving parts, and when a simple unit test breaks, it should be entirely clear what’s wrong and how to fix it.
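As a minimal sketch of what this looks like in practice (all class and method names here are hypothetical, invented for illustration), the dependency is replaced with a hand-rolled stub so that only the one class under test is exercised – no Spring wiring, no database:

```java
// Hypothetical example: a hand-rolled stub stands in for the real TaxService
// dependency, so the test exercises OrderCalculator and nothing else.
public class UnitTestSketch {

    // The dependency we want to keep out of the test.
    interface TaxService {
        double rateFor(String region);

    // The one class under test.
    static class OrderCalculator {
        private final TaxService taxService;

        OrderCalculator(TaxService taxService) {
            this.taxService = taxService;

        double totalWithTax(String region, double subtotal) {
            return subtotal * (1.0 + taxService.rateFor(region));

    public static void main(String[] args) {
        // Stub: a canned answer, no external storage involved.
        TaxService stub = region -> 0.10;
        OrderCalculator calc = new OrderCalculator(stub);

        double total = calc.totalWithTax("CA", 100.0);
        if (Math.abs(total - 110.0) > 1e-9) {
            throw new AssertionError("expected 110.0, got " + total);
        System.out.println("unit test passed: " + total);
    }
}
```

If the assertion fires, only one class can be at fault – which is exactly the “easiest place to fix a bug” property described above.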

As you go up the stack to more and more complex levels of testing, it becomes harder and harder to tell what broke and how to fix it.

Generally, unit tests for any given module are executed as part of every build before a developer checks in code – sometimes this includes some functional tests as well, but it’s generally a bad idea to run any higher-level tests before each and every check-in (due to the impact on developer cycle time). Instead, you let your CI server handle those, often on a scheduled basis.

Some people suggest that functional and integration tests are not two separate types, but I’m separating them here. The key differentiation is that a functional test will likely span a number of classes in a single module, but not involve more than one executable unit. It will likely involve a few classes from within a single classpath space (e.g. from within a single jar or such). In the Java world (or other JVM-hosted languages), this means that a functional test is contained within a single VM instance.

This level might include tests that involve a database layer with an in-memory database, such as Hypersonic (HSQLDB) – but they don’t use an *external* service, like MySQL – that would be an integration test, which we explore next.

Generally in a functional test we are not concerned with the low-level sequence of method or function calls, like we might be in a unit test. Instead, we’re doing more “black box” testing at this level, making sure that when we pour in the right inputs we get the right outputs out, and that when we supply invalid input that an appropriate level of error handling occurs, again, all within a single executable chunk.
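To sketch the idea (again, every name below is hypothetical): two real classes collaborate inside one VM, an in-memory store stands in for external storage, and the test only checks inputs against outputs – black-box style:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical functional test: several real classes wired together
// in a single VM, with an in-memory store instead of external storage.
public class FunctionalTestSketch {

    static class InMemoryUserStore {
        private final Map<String, String> emailsByName = new HashMap<>();

        void save(String name, String email) {
            emailsByName.put(name, email);

        String emailFor(String name) {
            return emailsByName.get(name);

    static class RegistrationService {
        private final InMemoryUserStore store;

        RegistrationService(InMemoryUserStore store) {
   = store;

        boolean register(String name, String email) {
            if (name == null || name.isEmpty() || email == null || !email.contains("@")) {
                return false; // graceful rejection of invalid input
  , email);
            return true;

    public static void main(String[] args) {
        InMemoryUserStore store = new InMemoryUserStore();
        RegistrationService service = new RegistrationService(store);

        // Black-box checks: pour the right inputs in, get the right outputs out.
        if (!service.register("alice", ""))
            throw new AssertionError("valid input rejected");
        if (!"".equals(store.emailFor("alice")))
            throw new AssertionError("registration not stored");

        // Invalid input is handled, not blown up on.
        if (service.register("bob", "not-an-email"))
            throw new AssertionError("invalid input accepted");

        System.out.println("functional test passed");
    }
}
```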

As soon as you have a test that requires more than one executable to be running in order to test, it’s an integration test of some sort. This includes all tests that verify API contracts between REST or SOAP services, for instance, or anything that talks to an out-of-process database (as then you’re testing the integration between your app and the database server).

Ideally, this level of test should verify *just* the integration, not repeat the functionality of the unit tests exhaustively, otherwise they are redundant and not DRY.

In other words, you should be checking that the one service connects to the other, makes valid requests and gets valid responses, not comprehensively testing the content of the request or response – that’s what the unit and functional tests are for.

An example of an integration test is one where you fire up a copy of your application with an actual database engine and verify that the operation of your persistence layer is as expected, or where you start the client and server of your REST service and ensure that they exchange messages the way you wanted.
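A rough sketch of the REST variant, using only the JDK’s built-in `com.sun.net.httpserver` package so the example is self-contained (in a real integration test the server would be a separately started executable; here both ends run in one VM but still talk over a real socket, and the endpoint name is invented):

```java
import com.sun.net.httpserver.HttpServer;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical integration test: verify just the integration -- that the
// client can connect and gets a valid response -- not the full payload.
public class IntegrationTestSketch {
    public static void main(String[] args) throws Exception {
        // Stand up a minimal HTTP service on an ephemeral port.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/ping", exchange -> {
            byte[] body = "pong".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        server.start();

        try {
            int port = server.getAddress().getPort();
            HttpURLConnection conn = (HttpURLConnection)
                new URL("http://localhost:" + port + "/ping").openConnection();

            // The integration check: a valid request earns a valid response.
            if (conn.getResponseCode() != 200)
                throw new AssertionError("bad status: " + conn.getResponseCode());
            System.out.println("integration test passed");
        } finally {
            server.stop(0);
        }
    }
}
```

Note that the test asserts only on the status code – the *content* of the response is the business of the unit and functional tests, per the DRY point above.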

Acceptance tests often take the same form as a functional or integration test, but the author and audience are usually different: in this case an acceptance test should be authored by the story originator (the customer proxy, sometimes a business analyst), and should represent a narrative sequence of exercising various application functionality.

They are again not exhaustive in the way that unit tests attempt to be in that they don’t necessarily need to exercise all of the code, just the code required to support the narrative defined by a series of stories.

FitNesse, Specs, easyb, RSpec and Green Pepper are all tools designed to assist with this kind of testing.
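In FitNesse, for instance, such a narrative might be expressed as a decision table; the fixture and column names here are purely hypothetical, but each row reads as one line of the story (given a balance, the user withdraws an amount, and the expected remaining balance is asserted):

```
!|WithdrawCash                  |
|balance|withdrawal|remaining?  |
|100    |40        |60          |
|100    |100       |0           |
```

The business analyst can author and read tables like this directly in the wiki, without touching the fixture code behind them.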

If your application or service is designed to be used by more than one client or user, then it should be tested for concurrency. This is a test that simulates simultaneous concurrent load over a short period of time, and ensures that the replies from the service remain successful under that load.

For a concurrency test, we might verify just that the response contains some valid information, and not an error, as opposed to validating every element of the response as being correct (as this would again be an overlap with other layers of testing, and hence be redundant).
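The shape of such a test can be sketched with the JDK’s `java.util.concurrent` tools (the “service” here is a hypothetical stand-in; a shared counter makes lost updates visible if it were not thread-safe):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical concurrency test: many simulated clients hit the service
// at once; we assert only that every reply is valid, not its full content.
public class ConcurrencyTestSketch {

    static class CounterService {
        private final AtomicInteger hits = new AtomicInteger();
        String handle() { hits.incrementAndGet(); return "OK"; }
        int hits() { return hits.get(); }
    }

    public static void main(String[] args) throws Exception {
        CounterService service = new CounterService();
        int clients = 20;
        int callsPerClient = 100;
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        AtomicInteger failures = new AtomicInteger();

        for (int i = 0; i < clients; i++) {
            pool.submit(() -> {
                for (int j = 0; j < callsPerClient; j++) {
                    if (!"OK".equals(service.handle())) failures.incrementAndGet();
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS))
            throw new AssertionError("load did not complete in time");

        if (failures.get() != 0)
            throw new AssertionError(failures.get() + " failed replies under load");
        if (service.hits() != clients * callsPerClient)
            throw new AssertionError("lost updates under concurrent load");
        System.out.println("concurrency test passed: " + service.hits() + " calls");
    }
}
```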

Performance, not to be confused with load and scalability, is a timing-based test. This is where you load your application with requests (either concurrent or sequential, depending on its intended purpose) and ensure that each request receives a response within a specified time frame (for interactive apps a common rule is the “two second rule”, as it’s thought that users will tolerate a delay up to that level).

It’s important that performance tests be run singly and on an isolated system under a known load, or you will never get consistency from them.

Performance can be measured at various levels, but is most commonly checked at the integration or functional levels.
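A bare-bones timing harness might look like the sketch below (the operation under test is a hypothetical placeholder, and the budget is the generous “two second rule” from above; note the warm-up loop, which matters on the JVM because JIT compilation would otherwise distort the first measurements):

```java
// Hypothetical performance test: measure the worst-case response time
// over a batch of requests and assert it stays inside a fixed budget.
public class PerformanceTestSketch {

    // Stand-in for the request being timed.
    static long work() {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) sum += i;
        return sum;

    public static void main(String[] args) {
        // Warm up so JIT compilation doesn't distort the measurement.
        for (int i = 0; i < 1_000; i++) work();

        int requests = 100;
        long worstNanos = 0;
        for (int i = 0; i < requests; i++) {
            long start = System.nanoTime();
            long elapsed = System.nanoTime() - start;
            worstNanos = Math.max(worstNanos, elapsed);

        long budgetNanos = 2_000_000_000L; // the "two second rule"
        if (worstNanos > budgetNanos)
            throw new AssertionError("too slow: worst case " + worstNanos + " ns");
        System.out.println("performance test passed; worst case " + worstNanos + " ns");
    }
}
```

This is also why the isolation point above matters: on a shared or loaded box, `worstNanos` will swing wildly from run to run.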

A close relative of, but not identical to, concurrency tests are load and/or scalability tests. This is where you (automatically) pound on the app under a simulated user (or client) load, ideally greater than it will experience in production, and make sure that it does not break. At this point you’re not concerned with how slow it gets, only that it doesn’t break – i.e. that you *can* scale, not that you scale linearly or on any other particular performance curve.

Quality Assurance
Many Agile and Lean teams eschew a formal quality assurance group, and the testing such a group does, in favor of the concept of “built in” QA. Quality assurance, however, goes far beyond determining if the software performs as expected. I have a detailed post in the works that talks about how else we can measure the quality of the software we produce, as it’s a topic unto itself.

Alpha/Beta deployments
Not strictly testing at all, the deployment of alpha or beta versions of an application nonetheless relates to testing, even though it is far less formalized and rigorous than mechanized testing.

This is a good place to collect more subjective measures such as usability and perceived responsiveness.

Manual Tests
The bane of every agile project, manual tests should be avoided like the undying plague, IMO. Even the most obscure user interface has an automated tool for scripting the testing of the actual user experience – if nothing else, you should be recording any manual tests with such a tool, so that when it’s done you can “replay” the test without further manual interaction.

At each level of testing here, remember: have confidence in your tests and keep things DRY. Don’t test the same thing over and over again on purpose; let the natural overlap between the layers catch problems at the appropriate level, and when you find a problem, drive the test for it down as low as possible on this stack – ideally right back to the unit test level.

If all of your unit tests were perfect and passing, you’d theoretically never see a failing test at any of the other levels. I’ve never seen that kind of testing nirvana achieved entirely, but I’ve seen projects come close – and those were projects with a defect rate so low it was considered practically unattainable by other teams, yet it was done at a cost that was entirely reasonable.

By: Mike Nash


Creating a Fitnesse Wiki Widget

April 2, 2009 Comments off


We are using FitNesse to acceptance-test our REST APIs. One of our calls requires that a UUID be created and then used in several places throughout the test. The syntax we wanted to use was:

!define myUUID ${!uuid}

This however is not possible, because the !define widget in FitNesse does not support value interpolation. So we hit on a revised syntax, which looks like this:

!uuid{myUUID}
This assigns a randomly generated UUID to the variable myUUID, which is then referenceable by the usual FitNesse wiki syntax ${myUUID}. The implementation looks like this:

import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import fitnesse.html.HtmlUtil;
import fitnesse.wikitext.widgets.ParentWidget;

public class GuidWidget extends ParentWidget {

    public static final String REGEXP = "!uuid\\{[\\w]+\\}";
    private static final Pattern pattern = Pattern.compile("!uuid\\{([\\w]+)\\}");

    private String varName;

    public GuidWidget(ParentWidget parent, String text) throws Exception {
        final Matcher matcher = pattern.matcher(text);
        if (matcher.find()) {
            varName =;

    public String render() throws Exception {
        // Registers the generated UUID on the parent so ${varName} resolves.
        this.parent.addVariable(varName, UUID.randomUUID().toString());
        return HtmlUtil.metaText(varName + " assigned a UUID");

    public String asWikiText() throws Exception {
        return "!uuid{" + varName + "}";
    }
}
When researching how to implement my own custom widget, I read several posts complaining that !define myVar {!MyWidget(thing)} does not evaluate the custom widget inside the define widget. Well, the ParentWidget class, which you have to extend, allows you to call parent.addVariable(varName, varValue), which does the magic for you.
The one slightly tricky part was that you need to register your widget in the plugins.properties file, and then make sure the new class is on the classpath of FitNesse itself (not that of your tests, via the !path property). To do that, all you need to do is add the compiled class to a jar, put the jar in your FitNesse runner’s lib directory, then amend run.bat or Our new run.bat looks like this:

java -cp fitnesse.jar;lib\guid.jar fitnesse.FitNesse ....

When FitNesse starts, it reads all the widgets listed in the WikiWidgets property and loads them. It then outputs to the console a list of the widgets it has added.
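For reference, the registration itself is a single line in plugins.properties; the package name here is hypothetical, and multiple widget classes can be listed comma-separated:

```
WikiWidgets=com.example.fitnesse.GuidWidget
```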

By: Chris Tarttelin

User Stories and Such

March 10, 2009 1 comment

I have been attending sessions and tutorials from the Agile track at SD West for the most part. I am interested in learning about how the experts recommend we write user stories, acceptance tests, AMDD and Agile estimation and planning. This post is about the ‘From Stories to Automated Acceptance Tests’ session by Brett Schuchert. Here are some recommendations I have gathered:

  • We need acceptance tests in the stories to let the devs know when they are ‘done’. We already do this, and our devs are usually good about pointing out when acceptance tests are incomplete or vague.
  • Acceptance tests should be run on a staging machine before we mark the story as complete.
  • Acceptance tests should not be technology-specific. “Click the OK button” implies there is an OK button, which is a web design call, not a BA call. Instead, the BA should state the acceptance criterion as “The user indicates he is done”.
  • An acceptance test should have only a few assertions. Many assertions in an acceptance test are an indication that the story needs to be split up.
  • Too much mocking can lead to integration failures.
  • User stories should be specific and concrete rather than abstract. Examples may be provided to illustrate the point. A good story should not be vague, should not be time-bound (an event that might occur at some point in the future is not testable), and should not be too broad.
  • Include stories of conflict or error that we would like to handle.
  • Remember INVEST and SMART

Acceptance test automation: there is a lot of talk at the conference about FitNesse and Slim. I saw some examples of these being used for acceptance tests. Since FitNesse is wiki-based, we can add text or images to describe what the purpose of the tests is. This additional information is ignored by FitNesse. Also, the FitNesse configs should not be machine-specific (they should not include references to paths, etc.)

By: Veena Mankidy