Monday 28 December 2009

big UI changes and their effect on tests

I recently read this post in Brian Marick's blog, and it set me thinking. He's talking about a test whose intention in some way survived three major GUI revisions. The test code had to be rewritten each time, but the essence of it was retained. He says:

I changed the UI of an app and so… I had to rewrite the UI code and the tests/checks of the UI code. It might seem the checks were worthless during the rewrite (and in 1997, I would have thought so). But it turns out that all (or almost all) of those checks contained an idea that needed to be preserved in the new interface. They were reminders of important things to think about, and that (it seemed to me) repaid the cost of rewriting them.

That was a big surprise to me.

I'm not sure why Brian is so surprised about this. If the user intentions and business rules are the same, then some aspects of the tests should also be preserved. A change in UI layout or technology should mean superficial changes only. In fact, one of the main claims for PyUseCase is that by having the tests written in a domain language decoupled from the specifics of the UI, it enables you to write tests that survive major UI changes. In practice this means when you rewrite the UI, you are saved the trouble of also rewriting the tests. So Geoff and I decided to write some code and see if this was true for the example Brian outlines.

In the blog post, there is only one small screenshot and some vague descriptions of the GUIs these tests are for, so we did some interpolation. I hope we have written an application that gets to the gist of the problem, although it is undoubtedly less beautiful and sophisticated than the one Brian was working on. All the code and tests are on launchpad here.

We started by writing an application which I hope is like his current GUI. You select animals in a list, click "book" and they appear in a new list below. You select procedures from another list, and unsuitable animals disappear.



In my app, I had to make up some procedures, in this case "milking", which is unsuitable for Guicho (no udders on a gelding!), and "abdominocentesis" which is suitable for all animals, (no idea what that is, but it was in Brian's example :-). Brian describes a test where an animal that is booked should not stay booked if you choose a procedure that is unsuitable for it, then change your mind and instead choose a procedure that it is suitable for.


select animals Guicho
book selected animals
choose procedure milking
choose procedure abdominocentesis
quit
This is a list of the actions the user must take in the GUI. So Guicho should disappear when you select "milking", and reappear as available, but not as booked, when you select "abdominocentesis". This information is not in the use case file, since it only documents user actions.

The other part of the test is the UI log, which documents what the application actually does in response to the user actions. This log is auto generated by pyUseCase. For this test, I won't repeat the whole file, (you can view it here), but I will go through the important parts:

'select animals' event created with arguments 'Guicho'

'book selected animals' event created with arguments ''

Updated : booked animals with columns: booked animals ,
-> Guicho | gelding

This part of the log shows that Guicho is listed as booked.


'choose procedure' event created with arguments 'milking'

Updated : available animals with columns: available animals , animal type
-> Good Morning Sunshine | mare
-> Goat 3 | goat
-> Goat 4 | goat
-> Misty | mare

Updated : booked animals with columns: booked animals ,


So you see that after we select "milking" the lists of available and booked animals are updated, Guicho disappears, and the "booked animals" section is now blank. The log goes on to show what happens when we select "abdominocentesis":


'choose procedure' event created with arguments 'abdominocentesis'

Updated : available animals with columns: available animals , animal type
-> Good Morning Sunshine | mare
-> Goat 3 | goat
-> Goat 4 | goat
-> Guicho | gelding
-> Misty | mare

'quit' event created with arguments ''


That is, the "available animals" list is updated and Guicho reappears, but the booked animals list is not updated. This means we know the application behaves as desired - booked animals that become unsuitable for a procedure do not reappear as booked when another procedure is selected.
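The rule this test pins down can be sketched as a small model. This is not the code from the launchpad project: `BookingModel`, `UNSUITABLE` and `replay` are hypothetical names, the suitability table is assumed from the description above, and the animal names are simply the ones that appear in the log.

```python
# Hypothetical model of the booking rule; not the real application code.
ANIMALS = ["Good Morning Sunshine", "Goat 3", "Goat 4", "Guicho", "Misty"]

# Assumed suitability table: milking excludes Guicho, abdominocentesis suits all.
UNSUITABLE = {"milking": {"Guicho"}, "abdominocentesis": set()}


class BookingModel:
    def __init__(self):
        self.selected = []
        self.booked = []
        self.available = list(ANIMALS)

    def select_animals(self, names):
        self.selected = [n for n in names if n in self.available]

    def book_selected_animals(self):
        for name in self.selected:
            if name not in self.booked:
                self.booked.append(name)

    def choose_procedure(self, procedure):
        excluded = UNSUITABLE[procedure]
        self.available = [a for a in ANIMALS if a not in excluded]
        # The rule under test: an unsuitable animal loses its booking for good.
        self.booked = [a for a in self.booked if a not in excluded]


def replay(model, use_case):
    """Run each line of a use case by matching it to a domain action."""
    for line in use_case.strip().splitlines():
        line = line.strip()
        if line.startswith("select animals "):
            model.select_animals(line[len("select animals "):].split(", "))
        elif line == "book selected animals":
            model.book_selected_animals()
        elif line.startswith("choose procedure "):
            model.choose_procedure(line[len("choose procedure "):])
        elif line == "quit":
            break


model = BookingModel()
replay(model, """
select animals Guicho
book selected animals
choose procedure milking
choose procedure abdominocentesis
quit
""")
print(model.available)  # Guicho is back in the available list
print(model.booked)     # but nobody is booked any more
```

The use case text drives the model only through domain actions, which is the property the rest of this post relies on.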

Ok, so far so good. What happens to the test when we completely re-jig the UI and it instead looks like this?



Now there is no book button, and you book animals by ticking a checkbox. Selecting a procedure will remove unsuitable animals from the list in the same way as before. So now if you change your mind about the procedure, animals that reappear on the list should not be marked as booked, even if they were before they disappeared. There is no separate list of booked animals.

What we did was take a copy of the tests and the code, update the code, and see what we needed to do to the tests to make them work again. In the end it was reasonably straightforward. We didn't re-record or rewrite any tests. We just had to modify the use cases to remove the reference to the book button, and save new versions of the UI log to reflect the new UI layout. The use case part of the test looks like this now:


book animal Guicho
choose procedure milking
choose procedure abdominocentesis
quit

which is one line shorter than before, since we no longer have separate user actions for selecting and booking an animal.

So updating the tests to work with the changed UI consisted of:
  1. remove reference to "book" button in UI map file, since button no longer exists
  2. in use case files for all tests, replace "select animals x, y" with a line for each animal, "book animal x" and "book animal y".
  3. Run the tests. All fail in identical manner. Check the changes in the UI log file using a graphical diff tool, once. (no need to look at every test since they are grouped together as identical by TextTest)
  4. Save the updated use cases and UI logs. (the spurious line "book selected animals" is removed from the use case files since the button no longer exists)
  5. Run the tests again. All pass.
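We made the use case changes by editing and saving in TextTest, not with a script, but the textual change itself is mechanical. A minimal sketch of it, assuming animals in a "select animals" line are separated by commas (`migrate_use_case` is a made-up helper, not part of PyUseCase or TextTest):

```python
def migrate_use_case(text):
    """Rewrite an old-style use case so that it fits the checkbox UI."""
    new_lines = []
    for line in text.splitlines():
        if line.startswith("select animals "):
            # One "book animal" line per animal that used to be selected.
            names = line[len("select animals "):].split(",")
            new_lines.extend("book animal " + n.strip() for n in names if n.strip())
        elif line.strip() == "book selected animals":
            continue  # the "book" button no longer exists
        else:
            new_lines.append(line)
    return "\n".join(new_lines)


old = """select animals Guicho
book selected animals
choose procedure milking
choose procedure abdominocentesis
quit"""
print(migrate_use_case(old))
```

Applied to the original use case for this test, it produces the four-line version shown earlier.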
The new UI log file looks like this:

'book animal' event created with arguments 'Guicho'

Updated : available animals with columns: is booked , available animals , animal type
-> Check box | Good Morning Sunshine | mare
-> Check box | Goat 3 | goat
-> Check box | Goat 4 | goat
-> Check box (checked) | Guicho | gelding
-> Check box | Misty | mare

'choose procedure' event created with arguments 'milking'

Updated : available animals with columns: is booked , available animals , animal type
-> Check box | Good Morning Sunshine | mare
-> Check box | Goat 3 | goat
-> Check box | Goat 4 | goat
-> Check box | Misty | mare

'choose procedure' event created with arguments 'abdominocentesis'

Updated : available animals with columns: is booked , available animals , animal type
-> Check box | Good Morning Sunshine | mare
-> Check box | Goat 3 | goat
-> Check box | Goat 4 | goat
-> Check box | Guicho | gelding
-> Check box | Misty | mare

'quit' event created with arguments ''
It is quite explicit that Guicho is marked as booked before he disappears, and not checked when he comes back. Updating the UI log files was very easy - we viewed the changes in a graphical diff tool, noted that the new column for the checkbox and the lack of the list of booked animals were as expected, and clicked "save" in TextTest.

I actually only had about 5 tests, but updating them to cope with the changed UI was relatively straightforward, and would still have been straightforward even if I had had 600 of them.

I'm quite pleased with the way PyUseCase coped in this case. I really believe that with this tool you will be able to write your tests once, and they will be able to survive many generations of your UI. I think this toy example goes some way to showing how.

Wednesday 16 December 2009

PyUseCase 3.0

Geoff has been working really hard for the past few months, writing pyUseCase 3.0. It has some very substantial improvements over previous versions, and I am very excited about it. He's written about how it works here.

It's a tool for testing GUIs with a record-replay paradigm, that actually works. Seriously, you can do agile development with these tests, they don't break the minute you change your GUI. The reason for this is that the tests are written in a high level domain language, decoupled from the actual current layout of your GUI. The tool lets you create and maintain a mapping file from the current widgets to the domain language, and helps you to keep it up to date.

In a way it's a bit like Robot, or Twist, or Cucumber, that your tests end up being very human readable. The main difference is the record-replay capability. Anyone who can use the application GUI can create a test, which they can run straight away. With these other tools, a programmer typically has to go away and map the user domain language of the test into something that actually executes.

The other main way in which pyUseCase is different from other tools, is the way it checks your application did the right thing. Instead of the test writer having to choose some aspects of the GUI and make assertions about what they should look like, pyUseCase just records what the whole GUI looks like, in a plain text log. Then you can use TextTest to compare the log you get today with the one you originally recorded when you created the test. The test writer can concentrate on just normal interaction with the GUI, and still have very comprehensive assertions built into the tests they create.
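To make the idea of "compare today's log with the recorded one" concrete, here is a rough sketch using Python's standard `difflib`. It is only an illustration: TextTest's own comparison is configurable and does much more, and `compare_logs` and the two log snippets are invented for this example.

```python
import difflib


def compare_logs(recorded, current):
    """Return unified-diff lines; an empty list means the logs are identical."""
    return list(difflib.unified_diff(
        recorded.splitlines(), current.splitlines(),
        fromfile="recorded", tofile="current", lineterm=""))


recorded = "Updated : booked animals\n-> Guicho | gelding"
current = "Updated : booked animals\n-> Misty | mare"

for line in compare_logs(recorded, current):
    print(line)
```

Because the whole log is compared, the test writer never has to decide in advance which parts of the GUI deserve an assertion.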

pyUseCase, together with TextTest, makes it really easy to create automated tests, without writing code, that are straightforward to maintain, and readable by application users. Geoff has been developing his approach to testing for nearly a decade, and I think it is mature enough now, and sufficiently far ahead of the competition, that it is going to transform the way we do agile testing.

:-D

Thursday 10 December 2009

Jens Östergaard on Scrum

Today I listened to a presentation about "Scrum for Managers" from Jens Östergaard. He's a big, friendly Dane who grew up in Sweden, and now lives in the UK. I first met Jens at XP2003 in Genoa, when he had just run his first successful Scrum project. These days he spends his time flying around the world, teaching Scrum courses and coaching Scrum Masters. (He'll be doing 2 more CSM courses in Göteborg in the next 6 months, and speaking at Scandinavian Developer Conference).

One thing I noticed about his talk was that most things about Scrum hardly seem to have changed at all. Jens was largely using the same language and examples that are in the original books. The other thing that struck me was that Jens said nothing about the technical practices that are needed to make agile development work. In my world, you can't possibly hope to reliably deliver working software every sprint/iteration if you haven't got basic practices like continuous integration and automated testing in place. I asked Jens about this afterwards, and he said it was deliberate. Scrum is a project management framework that can be applied to virtually any field, not just software development. Therefore he didn't want to talk about software specific practices.

When I first heard Ken Schwaber talk about Scrum (keynote at XP2002) I'm fairly sure he included the XP developer practices. I can't find my notes from that speech, but I remember him being very fiery and enthusiastic and encouraging us to go out and convert the world to Scrum and XP (the word agile wasn't invented then).

Scrum has been hugely successful since then. Today we had a room full of project managers and line managers who all knew something about Scrum, many of whom are using it daily in their organizations. Scrum is relatively easy to understand and get going with at the project level, and has this CSM training course that thousands of people have been on. These are not bad things.

I do think that dropping the XP development practices entirely from the description of Scrum is unhelpful. I chatted with several people who are having difficulty getting Scrum to work in their organizations, and I think lack of developer practices, particularly automated testing, is compounding their problems. I think a talk given to software managers needs to say something about how developers might need coaching and training in new practices if they are going to succeed with Scrum.

Friday 4 December 2009

Scandinavian Developer Conference 2010


The programme for Scandinavian Developer Conference has just been published. I think we have a fantastic line up of speakers this year. I am particularly pleased Michael Feathers, Brian Marick and Diana Larsen have agreed to join us, and that this year my husband Geoff is also a speaker.

I have met Michael and Diana at many XP conferences over the years, but I missed Brian Marick the one time I was at the agile conference in North America, so I'm particularly interested to hear what he has to say. He has been very influential in the testing community, and invented the idea of testing quadrants, which I think is a very helpful way of thinking about testing.

Michael Feathers is known for his book "Working Effectively with Legacy Code", which I reviewed early drafts of back in about 2004. He and I also competed together in "Programming with the Stars" at agile2008. Michael works for Object Mentor, coaching teams in all things agile.

Diana Larsen is chairman of the agile alliance, and has written a book about retrospectives together with Esther Derby. I think I first met her at XP2005, when I attended her tutorial, which I remember as outstanding. It was very interactive and all about communication skills and teambuilding. Her job seems to be all about teaching the people skills needed for agile to work.

Geoff is going to be talking about texttest, which goes from strength to strength, and productive GUI testing with pyUseCase. Geoff has been doing an awful lot of work on this tool lately, and I am really excited about the possibilities it opens up for agile testing. I will have to write a separate post on that though, so watch this space :-)

Many of the other speakers are familiar faces who I look forward to meeting up with again - Bill Wake (books about refactoring, XP and Ruby), Erik Lundh (the earliest Swedish XP coach), Niclas Nilsson (Ruby, programming guru), Jimmy Nilsson (Domain Driven Design book), Neal Ford (Thoughtworks, productive programmer book), Thomas Nilsson (CTO, Responsive, Linköping), Ola Ellnestam (CEO, Agical, Stockholm), Marcus Ahnve (programming guru), Chris Hedgate (programming guru)...

I'm also very pleased that I'm going to be speaking again this year, after the success of my previous presentation on "clean code". This year I hope to talk about agile testing and how best to approach it.

One of the reasons I keep going back to the XP conference is the amount of interaction and discussion generated by the many workshops and open space sessions. There are very few straight talks, and all are either presentations of academic papers, or keynotes. When I saw the proposed programme for SDC a couple of weeks ago, I felt it was lacking something. Eight parallel tracks of presentations is all very well, but where is the interaction, the whole reason to go to a conference and not just watch presentations on infoq? So I proposed a ninth "track", devoted to discussion, called "conversation corner". Luckily my colleagues at iptor, who are organizing the conference, liked my idea.

To get the conversations going, I am organizing four "fishbowl" style discussions, seeded by conference speakers. I've picked topics that interest me, and invited other conference speakers, who I think are also interested in these topics, to join me.

I am hoping that after participating in one or two of my fishbowls, some conference attendees might feel comfortable proposing discussions of their own. To that end there will be a board with timeslots and index cards, so people can write up their topic, assign it to an empty timeslot, and hence invite more people to join them.

It won't be full blown open space, there will be no opening meeting with everyone, or two minute pitches proposing sessions. I won't be explaining the law of two feet or the open space rules. But it is a step in that direction, and I hope a complement to the organized speeches going on in the rest of the conference.

Perhaps you'd like to join us at the conference? Register here.

Thursday 26 November 2009

Iptor


It occurred to me that I should mention that I now work for Iptor Konsult AB rather than IBS JavaSolutions. This is not due to my changing jobs, rather that IBS decided that since we don't do the same things as the rest of the company, we should have a separate name and image. We are even going to be a separate legal entity, although still wholly owned by IBS.

I think it's a very positive development for both parties, and I personally am much more comfortable standing up and presenting myself as from Iptor than I ever was when it was IBS JavaSolutions. For a start it's not also the name of a rather unpleasant illness, and secondly it means I can avoid mentioning Java, a language I now work with daily but don't enjoy nearly so much as Python.

In practice though, the name change probably won't make that much difference in my daily life.

TextTest progress

When I wrote this post about the public Texttest nightjob statistics, I thought they were a bit confusing and hard to understand. Geoff has now rewritten the page to contain fewer numbers, and just report the passed and failed tests from the previous night. I think it's a bit easier to read now. He has also ditched about 500 tests that were running on Solaris, since that platform is not used often, and these tests never found bugs that weren't also found on Linux.

The statistics are still pretty impressive though, don't you think?

Sunday 22 November 2009

TestNG - my opinions

It's Java Forum next week, here in Göteborg. I'm giving a short talk about TestNG, a tool I've been using lately.

My basic conclusion is that TestNG is a very easy step from JUnit, and one you don't need to take if all your tests are true unit tests (i.e. fast and independent). TestNG has some nice features which help when your tests are slow and/or have external dependencies, especially if they are mixed together in the same test classes as true unit tests. I think it's pretty useful for unit and integration tests (aka quadrant 1, technology facing).

Having said that, what bothers me about TestNG is that it means your test code is written in Java. For me, that makes it unsuitable for system tests (aka quadrant 2, business facing). If you have anything resembling an involved customer, you're going to at least want to encourage them to read the system tests to verify they are correct, and to gain confidence that the system is working. Truly agile teams have these people helping write tests. Many customer types won't be happy working with Java. You might be able to get by, though, if you have descriptive test names, good javadoc, and test data in separate files that they can read.

Rather than spending time learning TestNG, I think you may get more payback from tools such as Fitnesse, Robot or TextTest, which all allow you to get customers involved in reading and even writing tests. I think it could be a perfectly sensible choice to stick with JUnit for unit tests, and use one of these tools for both integration and system tests. What you choose will of course depend on the situation, for example the size of the system, the nature of the test data, and how many tools your team is willing to learn.

Wednesday 18 November 2009

Video of "Varför går man till en Coding Dojo?"

I spoke at Smidig2009 about a month ago, and they have now put up videos of all the talks. So I just had the uncanny experience of watching myself speak (here is a link to it). I'm sure my Swedish accent sounds better in my head when I'm talking, but I do apparently get my point across, since about a dozen people turned up to do a code kata with me in the open space in the afternoon.

Sunday 1 November 2009

JFokus

I've just heard that I've been accepted as a speaker at JFokus, in Stockholm in January. I'll be saying something about how to write good tests using Selenium, a tool I've been using a fair amount lately. I'm looking forward to the chance to meet up with the wider Java community in Sweden and find out what's new and what's hot.

Monday 19 October 2009

Smidig 2009

I'm off to Oslo on Wednesday for Smidig 2009. They've just announced the programme, which consists of lightning talks in the morning, and open space in the afternoons. I'll be giving a lightning talk on Friday morning about why you might want to go to a coding dojo. I'm hoping there will be enough interest to run a Randori/dojo in the afternoon. Someone commented that there have been dojo meetings in Oslo before, so I'll be interested to find out if they do them the same way I've been doing them.

The conference talks are mostly going to be in Norwegian, but mine isn't the only item in Swedish. I spotted Ola Ellnestam (agical) and Thomas Nilsson (responsive) on the list too. So if it turns out that I find spoken Norwegian totally incomprehensible, there will be a few people I can talk to!

Sunday 18 October 2009

Friday 2 October 2009

testing through the GUI costs less with the right tools

Bob Martin has just written a post in his blog where he tells the story of a test manager who has 80 000 manual tests, and wishes they were automated instead. Bob writes:

"One common strategy to get your tests automated is to outsource the problem. You hire some team of test writers to transform your manual tests into automated tests using some automation tool. These folks execute the manual test plan while setting up the automation tool to record their actions. Then the tool can simply play the actions back for each new release of the system; and make sure the screens don’t change."

Bob then goes on to explain why this is such a terrible idea - and blames it all on coupling: the tests and the GUI are coupled to the extent that when you change the GUI, loads of tests break. Whereas humans can handle a fair amount of GUI changes and still correctly determine whether a manual test should pass or fail, machines fall over all too easily and just fail as soon as something unexpected happens. So you end up re-recording them, which can cost as much as just doing the tests manually in the first place.

These problems are of course bigger or smaller depending on the GUI automation tool you choose. Anything that records pixel positions will fall over when you simply change the screen resolution, let alone when you add new buttons and features in your GUI. More modern tools record the names or ids of the widgets, so they don't break if the widget simply moves to another part of the screen. In other words, you reduce your coupling.

Geoff has been working on PyUseCase which takes this to another level. Instead of coupling the tests to widget names, you couple them to "domain actions". This makes your tests even more robust in the face of gui changes. A drop down list can turn into a set of radio buttons and your tests won't mind, since they just say something like "select airport SFO". This doesn't isolate you from the big changes, like moving the order of the screens in a wizard around, but since the tests are written in plain text, in a language any domain expert can read, they are relatively cheap to update.
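The indirection can be illustrated with a toy example. Nothing below is PyUseCase's real API or UI map format: `DropDown`, `RadioButtons` and `run_use_case` are made-up stand-ins, and "GOT" is just a second airport code added for the sake of the example. The point is that the script line stays the same while the mapping behind it changes.

```python
class DropDown:
    """Toy widget: a drop down list."""
    def __init__(self, options):
        self.options = options
        self.value = None

    def choose(self, option):
        if option not in self.options:
            raise ValueError("no such option: " + option)
        self.value = option


class RadioButtons:
    """Toy widget: a set of radio buttons."""
    def __init__(self, labels):
        self.labels = labels
        self.value = None

    def click(self, label):
        if label not in self.labels:
            raise ValueError("no such button: " + label)
        self.value = label


def run_use_case(ui_map, script):
    """Replay each script line via the handler mapped to its domain action."""
    for line in script.splitlines():
        for action, handler in ui_map.items():
            if line.startswith(action + " "):
                handler(line[len(action) + 1:])
                break
        else:
            raise ValueError("no domain action matches: " + line)


script = "select airport SFO"

# Old GUI: the airport is chosen from a drop down list.
old_widget = DropDown(["SFO", "GOT"])
run_use_case({"select airport": old_widget.choose}, script)

# New GUI: the same choice is now a set of radio buttons; only the map changes.
new_widget = RadioButtons(["SFO", "GOT"])
run_use_case({"select airport": new_widget.click}, script)

print(old_widget.value, new_widget.value)
```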

There is another respect in which machines under-perform compared to manual testers. An intelligent human will usually do a certain amount of exploration beyond the scripted test steps they have in front of them. They try to understand the purpose of the test, click around a bit and ask questions when parts of the system peripheral to the test in hand start to look odd. Machines don't do any exploration, and in fact often don't even notice errors on parts of the screen they haven't been told to look at.

Geoff's PyUseCase can partly address this kind of a problem. Used together with TextTest, it will continually scan the log the System Under Test produces, and fail the test for example if any stack traces appear. PyUseCase also automatically produces a low fidelity ascii-art-esque log of how the current screen looks, and can compare it against what it looked like last time the test ran. Changes are flagged as test failures, which will bring to your attention the change in an unrelated corner of the screen which says "32nd December" instead of "1st January".

I know that sounds like we just introduced a huge amount of coupling between the tests and the way the GUI looks, and yes, we have. The difference is that this coupling is very easy to manage. If 1000 tests all fail saying "expected: 1st January, found: January 1st", TextTest handily groups all the test failures and lets you accept or reject the change en-masse. So it is very little work to update a lot of tests when the GUI just looks different, but you don't care.
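The grouping idea is simple enough to sketch. This is not how TextTest is implemented, just an illustration with invented test names: failures whose differences are identical are collected together, so each distinct change only needs to be reviewed once.

```python
from collections import defaultdict


def group_failures(failures):
    """Map each distinct diff to the sorted names of the tests that produced it."""
    groups = defaultdict(list)
    for test_name, diff in failures.items():
        groups[diff].append(test_name)
    return {diff: sorted(names) for diff, names in groups.items()}


failures = {
    "test_booking": "expected: 1st January, found: January 1st",
    "test_invoice": "expected: 1st January, found: January 1st",
    "test_login": "expected: Welcome, found: Traceback",
}
for diff, names in group_failures(failures).items():
    print(len(names), "test(s):", diff)
```

With 1000 failures that all share one diff, there would be a single group to accept or reject.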

There is still a problem though, that the machine will not explore outside of the scripted steps you tell it to perform. So you will have to do some manual exploratory testing too, not everything can be automated.

So a simplistic lets-just-automate-our-manual-tests is a bad idea because machines can't handle GUI changes as well as humans can, and because machines don't look around and explore. Potentially your automated tests will cost more than your manual tests, and find fewer bugs.

So should we stick with our manual test suite then? No, of course not. The value of automated tests is not simply that you can run them more cheaply than manual tests, it is that you can run them more often - at every build, constantly supplying developers with valuable feedback rather than just at the end of the release cycle. It is this kind of feedback that enables refactoring, and lets developers build quality code from the start. That is their real gain over manual tests.

Bob Martin's suggestion is that you shouldn't rely on expensive GUI tests for this kind of feedback - only perhaps 15% of your tests should be GUI reliant. The rest run against some kind of api, which is less volatile and hence cheaper to maintain. With the kinds of tools I suspect Bob has been using for GUI testing, I'm not surprised he says this. I just think that with tools like PyUseCase and TextTest the costs are much reduced, and call for reconsideration of this ratio. Looking at Geoff's self tests for TextTest (a GUI intensive tool), around half are testing through the GUI, using pyUseCase. Basically I don't think GUI tests have to be as bad and expensive as Bob makes out.

Thursday 3 September 2009

First JDojo@Gbg meeting

A little while ago we had the first meeting of our new coding dojo here in Göteborg. We are focussing on learning Test Driven Development using Java and Eclipse. I was very encouraged that two of my colleagues, Fredrik and Martin, volunteered to help organize the group. There was actually quite a lot of interest generally, and we filled all 12 places and even have a (small) waiting list. I didn't want the group to grow too big, since the dojo style of learning should be quite participatory, and the time slot is only 2 hours. Everyone should get a chance to be heard, and to take the keyboard.

At the meeting I introduced the dojo concept with a set of slides I have used before. At dojo meetings our focus should be on deliberate practice, acquiring good coding habits, mutual encouragement and feedback.

We then took on KataFizzBuzz which went very smoothly. I started by introducing the Kata, using this picture of the "teacher" pointing at you, asking you to say the next number in the FizzBuzz sequence. She is sufficiently scary looking that you definitely need to write a program to print a FizzBuzz cheat sheet before the next lesson!


I also introduced something I haven't done before at a dojo meeting - starting with some code rather than a blank editor. I had the acceptance test for the Kata already coded up and failing. When I practiced this Kata I realized the hardest part was writing the acceptance test, which captures the sequence that is written to System.out. I could have begun the meeting by writing it in front of the audience, but I really wanted to get them coding, not just watching me.

Martin wrote the first unit test, fizzbuzz(1) -> [1] and I noticed that his style is slightly different from mine. He fixed all the compiler errors as he went along, whereas I would tend to leave them all until I finish the test, or at least until I want to run it. Maybe that is because he has worked in Java/Eclipse longer than me, and that is the way Eclipse likes you to work. Anyway, I then implemented the code to make the test pass (fake it!) and wrote the next test fizzbuzz(2) -> [1,2]. So then he had to write just enough code to make it pass (a simple loop).

Then we handed the keyboard to two members of the audience, and Martin and I sat down in their chairs, and the Randori was really underway. We continued with this pair and two others using this ping-pong style until after about an hour we had completed the first part of the Kata - printing out the basic fizzbuzz sequence up to 100.

I suggested that the pair at the front try to run the acceptance test, which they did, and it failed. The reason was that the unit tests had been testing an internal method fizzbuzz() and the acceptance test checked that when you call main() you get the right sequence written to System.out. It was at this point I wondered if I had made the right decision when I wrote the acceptance test in advance, since that meant the guy at the keyboard clearly didn't really understand what it was for. His first thought of how to make it pass was to change it to call fizzbuzz() instead of main(), until I stopped him - "No! don't change the test! Fix the code!". I felt like I was rapping him over the knuckles with a ruler (something I am thankful my Maths teacher never did).
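The difference between the two kinds of test is easier to see in code. At the dojo we used Java and System.out; the sketch below is in Python, assumes the usual FizzBuzz rules (Fizz for multiples of 3, Buzz for multiples of 5), returns strings rather than numbers, and assumes one item is printed per line. It is not the code we wrote that evening.

```python
import contextlib
import io


def fizzbuzz(n):
    """Internal function: the FizzBuzz sequence from 1 to n as a list."""
    result = []
    for i in range(1, n + 1):
        if i % 15 == 0:
            result.append("FizzBuzz")
        elif i % 3 == 0:
            result.append("Fizz")
        elif i % 5 == 0:
            result.append("Buzz")
        else:
            result.append(str(i))
    return result


def main():
    """Entry point: print the sequence up to 100, one item per line."""
    for item in fizzbuzz(100):
        print(item)


def test_unit():
    # Unit tests exercise the internal function directly.
    assert fizzbuzz(1) == ["1"]
    assert fizzbuzz(2) == ["1", "2"]


def test_acceptance():
    # The acceptance test only looks at what main() writes to standard output.
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        main()
    lines = captured.getvalue().splitlines()
    assert len(lines) == 100
    assert lines[:5] == ["1", "2", "Fizz", "4", "Buzz"]
    assert lines[14] == "FizzBuzz"


test_unit()
test_acceptance()
```

All the unit tests can pass while `main()` still prints nothing, which is the situation the pair at the keyboard found themselves in. Changing the acceptance test to call `fizzbuzz()` would have hidden exactly that gap.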

Towards the end of the meeting we held a 10 minute retrospective. People seemed cautiously positive towards TDD and the dojo in general, but I think they may still be getting used to the format and working out whether it is "ok" to be openly critical. I hope for more dissent, discussion and group learning next time.

Friday 28 August 2009

Testing TextTest

Geoff has just put up a couple of new pages on the texttest website, with some coverage statistics for his self tests. He uses coverage.py to produce this report which shows all the python modules in texttest, and marks covered statements in green. I think it's pretty impressive - he's claiming over 98% statement coverage for the over 17 000 lines of python code in texttest.

I had a poke around looking for some numbers to compare this to, and found on this page someone claiming Fitnesse has 94% statement coverage from its unit tests, and the Java Spring framework has 75% coverage. It's probably unwise to compare figures for different programming languages directly, but it gives you an idea.

Geoff also publishes the results of his nightly run of self tests here. It looks a bit complicated, but Geoff explained it to me. :-) He's got nearly 2000 tests testing texttest on unix, and about 900 testing it on windows. As you can see, the tests don't always pass, some are annoying ones that fail sporadically, some are due to actual bugs, which then get fixed. So even though he rarely has a totally green build, the project looks healthy overall, with new tests and fixes being added all the time.

Out of those 3000 odd tests that get run every night, Geoff has a core of about 1000 that he will run before every significant check-in. Since they run in parallel on a grid, they usually take about 2 minutes to execute. (When he has to run them at home in series on our fairly low spec linux laptop they take about half an hour.)

Note that we aren't talking about unit tests here, these are high level acceptance tests, running the whole texttest system. About half of them use PyUseCase to simulate user actions in the texttest GUI, the rest interact with the command line interface. Many of the tests use automatically generated test doubles to simulate interaction with 3rd party systems like version control, grid engines, diff programs etc.

Pretty impressive, don't you think? Well I'm impressed. But then I am married to him so I'm not entirely unbiased :-)

Monday 17 August 2009

Domain Specific Languages for Selenium tests

I've been doing some work lately creating automated functional test suites using Selenium RC to simulate user interaction with a web GUI. I discovered quickly that the tests you record directly from selenium are rather brittle, and hard to read. In order to make the tests more robust and readable, I have been extracting reusable chunks of script that make sense from the user perspective, into separate methods. For example when testing a page for registering a new provider, you might have a ProviderPage domain class, with method "createNewProvider". This method encapsulates all the selenium calls that interact with the page, and lets your test be written in terms of the domain.
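Roughly, the idea looks like this. It is a minimal sketch in Python, not my real test code: RecordingBrowser is a hypothetical stand-in that only records calls (it is not the Selenium client), and the URL, locators and field names are all invented for illustration. The method is spelled create_new_provider to follow Python naming.

```python
class RecordingBrowser:
    """Hypothetical stand-in for a browser driver; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def open(self, url):
        self.calls.append(("open", url))

    def type(self, locator, text):
        self.calls.append(("type", locator, text))

    def click(self, locator):
        self.calls.append(("click", locator))


class ProviderPage:
    """Domain class wrapping every browser call needed on the provider registration page."""

    URL = "/providers/new"  # hypothetical path

    def __init__(self, browser):
        self.browser = browser

    def create_new_provider(self, name, email):
        # All locators below are invented for illustration.
        self.browser.open(self.URL)
        self.browser.type("id=provider-name", name)
        self.browser.type("id=provider-email", email)
        self.browser.click("id=save-provider")


# The test itself now reads in terms of the domain, not the widgets.
browser = RecordingBrowser()
ProviderPage(browser).create_new_provider("Acme Clinic", "info@acme.example")
```

If the page layout changes, only the body of create_new_provider needs updating; the tests that call it stay as they are.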

I just saw this article from Patrick Wilson Welsh basically saying the same thing, only his DSL has three layers of indirection instead of just two. As well as encapsulating page operations in a Page class, he encapsulates operations on widgets within a page. I hadn't thought of doing that. It makes the code in the Page class more readable. I might try that, and see if it improves my code.

Wednesday 5 August 2009

What other people have written about new dojos

Gathering ideas for my new dojo :-)

Ivan Sanchez wrote about starting a coding dojo, and he reckons a Randori is best with 10 people or less. We will be more than 10 at JDojo@gbg. He suggests a prepared kata in that case. That might be possible. His favourite starting kata is KataMinesweeper.

Danilo Sato wrote about how to find suitable Katas, and suggests several for beginning dojos, including KataMinesweeper.

Gary Pollice wrote an article about what a coding dojo is, which is quite well explained, but doesn't give any specific advice for new dojos.

The guys running the Finnish dojo have a similar article about what a coding dojo is, and some rules. They put a maximum of 15 participants on their randori. They also introduce "iterations" of 30 minutes, and spend 5 minutes planning in between.

Lots of ideas to think about, anyway.

Monday 3 August 2009

a new dojo - JDojo@Gbg

I'm planning to start a new dojo this autumn, called JDojo@Gbg. I was inspired by the guys at Responsive in Linköping, who I met at XP2009. They have been running a dojo for some time now, and find it is an excellent way to introduce programmers from their clients to the ideas of Test Driven Development. I think we could do with more test infected programmers about the place in Göteborg, too.

I already run a dojo as part of GothPy, and Got.rb also runs regular Kata/dojo evenings, but because those programming languages are not mainstream, many developers wouldn't consider coming along. That is why the new dojo is explicitly going to use Java, or at least, the JVM platform.

I'm thinking about what Katas we are going to tackle at the new dojo, and last night I had a go at KataFizzBuzz in Java. It is an extremely simple problem to solve, and initially I thought it was too easy to be a Kata actually. Then at agile2008 I was looking around for a Kata that Michael Feathers and I could perform in 4 minutes for the "Programming with the Stars" competition, and it seemed to fit the bill. I was quite pleased we got done in that short amount of time (in python of course :-)

A couple of people have commented that this Kata is actually quite good for teaching TDD, just because it is so simple to solve. People are forced to think about TDD instead of the problem. It can easily be made more interesting by adding new requirements too. So I think I might try it out at JDojo@Gbg.

Thursday 2 July 2009

Programming, History and Bletchley Park

Today at Europython we listened to a keynote about Bletchley Park. This was the centre of British and allied codebreaking activities during the second world war, and where Colossus, the first digital, programmable computer, was built. We heard about the current financial plight of the museum there, and the need for investment to renovate the huts that Alan Turing, amongst others, worked in. Dr Sue Black told us about her experiences trying to help lobby the government for more money for Bletchley Park, using social networking, blogs and twitter. She recounted that she had recently met an elderly gentleman, one of the surviving codebreakers. She told us how close she felt to history when he related a story about when he was decoding a Nazi message during the war, and his shock when he got to the end and discovered the message was signed “Adolf Hitler, Fuhrer”.

As a professional programmer, I think the site where the first digitally programmable computer was built has to be a place worth preserving. I hope that people will be able to visit there and see the reconstructed Colossus computer and be inspired by the stories of innovation and codebreaking that it enabled.

It was particularly poignant for me to think about this when in the next session I checked my email and found a message from my mother saying that my grandmother died this morning. She was a living link to the history of the second world war for me. During the war she was a wireless operator, transmitting and receiving messages in morse code. And now she is not there any more. I am kind of in shock. But it just confirms for me that we need museums like the one at Bletchley Park to retain contact with our history.

(I wrote this post yesterday)

Wednesday 3 June 2009

XP2009

I had a fantastic time at XP2009 in Sardinia, but I didn't find time to blog much about it until now. I just wrote a report of my TDD workshop on jsolutions.se.

Monday 25 May 2009

Lean with lego

I'm at XP2009 this week, the sun is shining, the sea is blue and the Italians are disorganized. I'm having a great time so far. This morning I was at a workshop led by Danilo Sato and Francisco Trindade - 'The lego lean game'. The idea was that we should learn about lean principles by playing with lego. I thought it was good fun, and we did learn quite a bit about pull, kanban and continuous improvement. I hope I will be able to run this simulation for some of my colleagues, I think it is a good practical way of learning using all the senses, not just reading books or listening to presentations. Much more kinesthetic.

Wednesday 13 May 2009

caffeine, the science and the propaganda

I usually follow the news in the agile world on infoq, and I like the feature whereby you can listen to selected conference presentations online. This week I was making some biscuits one evening, so I took the chance to listen to Linda Rising talking about "agility: possibilities at a personal level". I have to say I was rather disappointed. I think her material was not that relevant to agile in the first place, many of her claims lacked credibility, and although the talk was superficially entertaining, it did not supply useful insights or conclusions. This provoked me into leaving a comment on the infoq site. I wonder if anyone will notice.

Monday 11 May 2009

Europython

I'm looking forward to Europython in Birmingham at the end of June. Geoff and I are going to be rather busy at it. They've just published the programme, and between us we are holding 5 sessions. I'm running a "coder's dojo", a "clean code challenge", and Geoff and I are doing a tutorial on texttest together. Geoff is also giving talks about texttest and PyUseCase.

The coder's dojo session is a copy of the original XP2005 workshop by Laurent Bossavit and Emmanuel Gaillot, only in python with different Katas. The structure is the same though - introduction, prepared Kata, randori, retrospective. I thought it worked really well in 2005, so why change a winning format?

The clean code challenge is an idea I came up with. I've written on this blog before about KataArgs, and my dodgy python translation of Bob Martin's code. I'm interested to know what the python community will make of it. I'm basically planning to throw this code out to anyone who turns up, and ask them to refactor it into better python. I'm of course hoping they will produce some innovative, beautifully pythonic solutions, and show me what clean code looks like in python.

The tutorial on texttest is essentially similar to the one Geoff and I did at Europython in 2005. It's longer though, a half day, and builds on all the experience we have had since, doing tutorials at XP2006 and agile2008 for example.

All in all, I hope we'll have some time and energy left to go to the other items in the programme. It looks like being a busy conference.

Wednesday 6 May 2009

TDD performer artists announced

As I wrote in a previous post, I am organising a workshop at XP2009 called "Test Driven Development: Performing Art". I am very pleased to announce that I have four pairs of expert programmers willing to perform prepared Katas at it! You can read about it here. I am especially pleased that all the performers are experienced coders with previous involvement in coding dojos in places such as Stockholm, Linköping, London, São Paulo and Helsinki. I think we're going to have a great afternoon on 27th May.

Tuesday 21 April 2009

They learn so fast and they are so cheap

Recently I've had the privilege of working with a team of developers where I sit in the same room as half of them, and the other half are in China. My role is to help them to develop a suite of automated system tests alongside the production code. After a few months' work, we now have quite a substantial product, with quite a substantial test suite.

When we started, very few of the developers had written much in the way of system tests, and even fewer knew how to write good, maintainable ones. Over the weeks, I have been promoting practices to enhance test readability, reviewing test code, and pointing out areas that need better coverage.

I've noticed that with the local developers, reviews and feedback are usually conducted face to face, informally, whereas with the offshore developers, it all goes via email, with a substantial time delay. This has meant that the Swedish developers have learnt faster, since they benefit from shorter feedback cycles, and a richer form of communication. Having said that, the Chinese developers are doing nearly as well. They seem really motivated to deliver what I ask for, and keep requesting and responding to feedback until they have written what I consider to be some pretty good tests.

It's not all sweetness and light, however. As much as learning the technical skills of writing tests, the team needs to learn the culture of maintaining them. The CI server complains the build is broken far too often, and it is because the developers generally are not running the tests before they check in. My perception is that the offshore developers are worse at this, and my interpretation is not that they are somehow less good developers, far from it. I think that they just don't have the same management support to spend time on maintaining the tests as the onshore ones.

Management in Sweden has really bought into the idea that investing in automated tests pays off over the long term, and vigorously supports me in discussions with recalcitrant developers. Management in China has not. My impression is that they see only the costs associated with writing, running and maintaining automated tests, and would rather hire some (ridiculously cheap) Chinese students to run manual tests instead.

I would like to believe that this automated test suite is a really good investment for the future of this product. My experience tells me it should enable regression bugs to be found very soon after insertion, and enable much more frequent product releases. (You don't have to wait for a 6 week manual test cycle before each release). Over the many year lifetime of the product, this should significantly outweigh the initial investment we have made creating it, and the ongoing costs of keeping it running.

The reality may be quite different. Future versions of the product will likely be developed entirely in China, and I suspect that without their Swedish colleagues' enthusiasm, the Chinese management might decide the test suite should be quietly dismantled and left to rot. That may be the right economic decision, although it makes me weep to think of it. All I can do is console myself with the thought that at least the tests are so readable they will be easy to convert into manual test cases detailed enough for dirt cheap unskilled Chinese students to perform.

Thursday 2 April 2009

Call for participation in workshop at XP2009

I'm looking for some really good coders. People who can write outstanding code, and yet know that however good they are they can always get a little bit better with a little practice, feedback and reflection. The kind of coders that attend a coding dojo and practice on code katas.

Interested? Could be persuaded to attend XP2009? Take a look at this call for participation.

Tuesday 31 March 2009

Scandinavian Developers Conference

At the speakers dinner the night before the conference:

Ola Bini: "Do you have any actual code examples in your talk about clean code tomorrow?"
Me: "No"
Ola Bini: "Well, I'm sorry but that means I can't come and listen to it"

Not such an auspicious start perhaps, but fortunately about 125 other conference participants didn't seem to mind the lack of actual code, and did turn up for my talk. Some of them even blogged favourably about it. To my surprise, some guy came up to me afterwards and said he helped organize the JFokus conference, and did I have a Java talk I could give at it?

It was a lot of work preparing my presentation, and I got some really useful feedback from the two practice runs I did, at GothPy and for my colleagues at IBS. It was this feedback that prompted me to take out all the code examples I originally had in the presentation, actually.

Overall the conference seemed to go really well. There were about 450 participants, about 40 speakers, and 6 parallel tracks. I attended some great sessions too, but I'll leave a summary of them to another post.

Just in case you were wondering, I didn't go to Ola Bini's talk either ;-)

Wednesday 25 February 2009

Java-ish python

I am really interested to find out more about this concept of "clean code", and in particular how it relates to programming language. To this end, I'm still chewing on KataArgs.

My latest idea is to start from Bob Martin's Java implementation, and translate this as directly as possible into python. My idea is to then refactor it to be more pythonic, and see if it turns out looking anything like his Ruby implementation.

I have put up some code on launchpad, which is my attempt at a direct translation of the Java. It was really interesting to do, actually. Of course I had read the Java before, and followed along in the book all the steps to create it, but actually translating it made me understand the code on another level. When I tackled this Kata from scratch, I also got a much better understanding of it, but this was different.

One thing that jumped out at me was the error handling. It's much more comprehensive than anything I've produced in my own solutions, and also compared with his Ruby solution. So I think it's a bit misleading of him to say "I recently rewrote this in Ruby and it was 1/7 the size". Of course it is smaller. It does less. Although to be fair, in a way it does more too...

One thing I found awkward to translate was the use of the enum for the list of error codes. Python doesn't have a direct equivalent, being dynamic as it is. The other awkwardness was the Java iterator. In python iterators are built into the language, but don't let you backtrack, or get the current index, unlike Java ones. I was surprised to find how extensively the tests rely on this functionality. To my mind, they probe the internals of the args parser too much.
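To show what is missing, here is a minimal sketch of a Python iterator that can step backwards and report its position. It is loosely modelled on Java's ListIterator, which offers previous() and nextIndex(); the class name ListCursor and its method names are my own invention, and this is not the code I put on launchpad.

```python
class ListCursor:
    """Iterator over a list that also exposes its position and can step back."""

    def __init__(self, items):
        self._items = list(items)
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._items):
            raise StopIteration
        item = self._items[self._index]
        self._index += 1
        return item

    def next_index(self):
        """Index of the element the next call to next() would return."""
        return self._index

    def previous(self):
        """Step back one element and return it."""
        if self._index == 0:
            raise IndexError("already at the start")
        self._index -= 1
        return self._items[self._index]


cursor = ListCursor(["-l", "-p", "8080"])
first = next(cursor)            # consumes "-l"
second = next(cursor)           # consumes "-p"
position = cursor.next_index()  # two elements consumed so far
again = cursor.previous()       # steps back and hands out "-p" again
```

A plain Python iterator gives you only the __next__ part; the index and the backtracking have to be bolted on like this.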

By far my most interesting observation, though, is the one I want to explore more. This code is well written Java, but directly translated, it makes pretty poor python. Why is that? What, specifically, are the smells, and what are the antidote refactorings?

I will no doubt post more on what I find (with the help of my friends at GothPy, of course).

fun with KataArgs

I'm having fun with this KataArgs. In my last post, I took a closer look at Bob Martin's Java and Ruby solutions to it. Since then, we have tackled this Kata at a couple of GothPy meetings. (My coding dojo; the code is here.)

Several of us did some more work on the kata individually after the meetings, and a lively discussion on our mailing list ensued. I also challenged the local Ruby user group Got.rb to have a go at it, and one person posted his solution there too.

It's all good fun, anyway, and hopefully we're all learning something about what we mean by "clean code" along the way.

Wednesday 21 January 2009

KataArgs and clean code

Over Christmas I finished reading the book "Clean Code" by Robert C. Martin. I thoroughly recommend the book, which is highly practical, technical and well written. In it, Bob seeks to present the "Object Mentor school of clean code", as he puts it, "in hideous detail".

The book is full of code examples, clean and less clean, and detailed advice about how to transform the latter into the former. All the examples are written in Java, though, which leaves me wondering a little if "clean code" in the Object Mentor meaning of the word, looks the same in other languages.

In Chapter 14 of the book, there is a fully worked example of a little coding problem that I would call a code Kata. It's a little program for parsing command line arguments. I know, there are loads of libraries that do this already. But never mind. It's a non trivial problem yet small enough to code up fairly quickly. One thing that caught my attention was the footnote on page 200, just after he has presented his best Java solution to the Kata. "I recently rewrote this module in Ruby. It was 1/7th the size and had a subtly better structure." So where is the code, Bob? What is this subtly better structure?

I had nothing better to do on Boxing Day than sit around and digest leftover-turkey-curry, so I sent a little mail to Bob asking him for the code. To my delight, I got a mail back only a few hours later, with a message that I was welcome to it, and the url to where he'd put it on github. Evidently Bob also had time on his hands on Boxing Day.

I have had a look at the Ruby code, and although my Ruby is fairly ropey, I think I can follow what it does (surely a sign of clean code?). The design is very similar to the Java version presented in the book, with a couple of finesses. (The next part of the post will make most sense if you first look at Bob's Java version and Ruby version).

The first finesse I spotted, is that the Ruby version defines the argument "schema" in a much more readable fashion. Rather than "l,p#,d*" as in the Java version, it reads


parser = Args.expect do
  boolean "l"
  number "p"
  string "d"
end

i.e. the program expects three flags, l, p and d, indicating a boolean, number and string respectively. You can do this in Ruby but not Java because the language allows you to pass a code block to a method invocation. (The stuff between "do" and "end" is a code block, and the class "Args" has a method "expect".) I think the Ruby version is rather more readable, don't you?
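Python does not pass blocks to methods the way Ruby does, but a declaration that reads almost as well can be approximated with chained method calls. This is only a sketch of one possible approach, not a translation of Bob's code: the class Schema, its methods, and the choice to store Python types as the flag kinds are all my assumptions.

```python
class Schema:
    """Collects flag declarations; each method returns self so calls can be chained."""

    def __init__(self):
        self.flags = {}  # flag name -> expected Python type

    def boolean(self, name):
        self.flags[name] = bool
        return self

    def number(self, name):
        self.flags[name] = int
        return self

    def string(self, name):
        self.flags[name] = str
        return self


# Same three flags as the Ruby example: l is a boolean, p a number, d a string.
schema = Schema().boolean("l").number("p").string("d")
```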

The second finesse I can see is that the argument marshallers dynamically add themselves to the parser, rather than being statically declared as in the Java version. This means that if you discover a new argument type that you want to support, in the Java version you have to crack open the code for Args.java and add a new branch to the if/else statement in the "parseSchemaElement" method, as well as adding the new argument marshaller class. In the Ruby version, you just add the new class, no need to modify an existing one. Bob did a lot to popularise the Open-Closed principle (which Bertrand Meyer first formulated), so I guess it's not so surprising to see him following it :-)

So in Args.java:


private void parseSchemaElement(String element)
    throws ArgsException {
  char elementId = element.charAt(0);
  String elementTail = element.substring(1);

  // long if/else statement to construct all the marshallers
  // cut for brevity
  [...]
  else if (elementTail.equals("#"))
    marshallers.put(elementId, new IntegerArgumentMarshaller());
  else if (elementTail.equals("*"))
  [...]

or in the Ruby code, each marshaller just tells the parser to add itself:


class NumberMarshaler
  Parser.add_declarator("number", self.name)
  [...]

in the Parser class:

  def self.add_declarator(name, marshaler)
    method_text = "def #{name}(args) declare_arguments(args, #{marshaler}) end"
    Parser.module_eval(method_text)
  end

  def declare_arguments(args, marshaler)
    args.split(",").each {|name| @args[name] = marshaler.new}
  end


You can do this in Ruby but not Java, since in Ruby you can dynamically construct and execute arbitrary strings as code, and add methods to classes at runtime. (The string declared as "method_text" is constructed with the details of the new marshaler, then executed as Ruby code in the next line, by Parser.module_eval.) This is an example of metaprogramming.
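A similar self-registration can be sketched in Python. Note that this sketch swaps the technique: instead of building method text and evaluating it, it uses a class decorator that stores each marshaller in a registry dictionary. The names declarator, declare and the parse method are hypothetical, chosen for illustration, and are not taken from Bob's code.

```python
class Parser:
    """Holds the declared arguments; marshaller classes register themselves here."""

    declarators = {}  # declarator name -> marshaller class

    def __init__(self):
        self.args = {}

    @classmethod
    def add_declarator(cls, name, marshaller_class):
        cls.declarators[name] = marshaller_class

    def declare(self, kind, names):
        """Declare comma-separated flag names of one kind, e.g. declare("number", "p,q")."""
        marshaller_class = self.declarators[kind]
        for name in names.split(","):
            self.args[name] = marshaller_class()


def declarator(name):
    """Class decorator: registers the decorated marshaller with the Parser."""
    def register(marshaller_class):
        Parser.add_declarator(name, marshaller_class)
        return marshaller_class
    return register


@declarator("number")
class NumberMarshaller:
    def parse(self, text):
        return int(text)


parser = Parser()
parser.declare("number", "p,q")
```

As in the Ruby version, supporting a new argument type means adding one new decorated class; the Parser class itself stays closed to modification.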

So it seems to me that the "subtly better structure" that Bob refers to in his footnote, is made possible by powerful language features of Ruby, such as metaprogramming and closures.

Of course my favourite programming language is Python, which also has these powerful language features. I am rather interested to see if I can come up with an equally clean solution in Python. I am also interested if any hotshot Java or Ruby programmers out there can improve on Bob's solutions. To this end, I have added a description of this Kata to the catalogue on codingdojo.org. We had a go at it at our last GothPy meeting, without any great success, although I hope we might do better at a future meeting.

So please have a go at KataArgs and see if you can write some really really clean code. Do let me and the community on codingdojo.org know how you get on!

Wednesday 14 January 2009

Behaviour Driven Development at agile2008

At agile2008 I attended a session with Dan North about Behaviour Driven Development. Someone on the agile sweden mailing list was asking about it, so I decided to write up my notes here.

Most cellphone and computer software is delivered late and over budget. The biggest contributing factor to cost bloat is building the wrong thing. So what software and business people need is "a shared understanding of what done looks like".

Test Driven Development is about design, conversations, and writing examples for a system that doesn't yet exist. It's not really about testing. However, once the system exists, your examples turn into tests, as a rather useful side effect.

A User Story is a promise of a conversation, and it is in that conversation that things go wrong. The customer and developer rarely agree what "enough" and "done" look like, which leads to over- or under-engineering.

Dan suggests a format for User Story cards which aims to prevent this communication gap.

On the front of the User Story index card, you have the title and narrative. The narrative consists of a sentence in this format:

As a stakeholder
I want feature
so that benefit

where benefit is something of value to stakeholder.

On the back of the card, you have a table with three columns

Given this context | When I do this | then this happens

Then you have 4 or 5 rows in the table, each detailing a scenario. (If you need more than that then the story is too big and should be split)

Dan finds that in his work, this leads to conversations about User Stories where "done" and "enough" are discussed, and defined.

User Stories should be about activities, not features. In order to check that your User Story is an activity, you should be able to do a thought experiment where you implement the story as a task to be performed by people on rollerblades with paper. You must think about it as a business process, not a piece of software.

When creating the story cards, the whole team should be involved, but it is primarily the business/end user stakeholders and business analysts who write the title and narrative on the cards. They then take in a tester to help them to write the scenarios.

Are people familiar with the V model of software testing? When this was conceived, they thought that the whole process would take 2 years, and span the whole project. Dan usually does it in 2 days. Many times for each project.

Then Dan offered to show us how to do BDD using plain JUnit. He requested a pair from the audience, so I volunteered. At this point my notes dry up, and I am working from memory, but I think the general idea is like this.

You talk about "behaviour specs" not tests. The words you use influence the way you think, and "behaviour specification" gives much better associations than "tests".

Each behaviour specification should be named to indicate the behaviour it is specifying. Not "testCustomerAccountEmpty" rather "customerAccountShouldBeEmpty".

In the body of the spec, you can start out by typing in the prose of one of the scenarios you have on the user story, as a comment.

// given we have a flimble containing a schmooz
// when we request the next available frooble
// then we are given a half baked frooble and the schmooz.

Then you can fill in code after the "given" comment. When you have code that does what the comment says, delete the comment. Repeat with the "when" and "then" comments.
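Since my notes dry up here, the following is only a sketch of how the result might look, written in Python's unittest rather than JUnit. The Flimble class is invented purely to make the spec runnable, and I have left the three comments in place so you can see which code replaced which line; in the approach described above you would delete each one once the code says the same thing.

```python
import unittest


class Flimble:
    """Hypothetical domain object, invented only to make the spec runnable."""

    def __init__(self, contents):
        self.contents = contents

    def next_available_frooble(self):
        # Hands back a half baked frooble together with whatever the flimble contains.
        return ("half baked frooble", self.contents)


class FlimbleBehaviour(unittest.TestCase):
    # unittest only collects methods starting with "test", so the "should" name follows it.
    def test_should_hand_over_half_baked_frooble_and_schmooz(self):
        # given we have a flimble containing a schmooz
        flimble = Flimble(contents="schmooz")
        # when we request the next available frooble
        frooble, extra = flimble.next_available_frooble()
        # then we are given a half baked frooble and the schmooz
        self.assertEqual(frooble, "half baked frooble")
        self.assertEqual(extra, "schmooz")
```

Saved in a file, this runs with python -m unittest.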

In this way, you build up a behaviour specification that drives your development of the system. A few minutes later (hopefully) you have a system which implements the specification, and at that point your spec helpfully turns magically into a regression test which you can run. At that point you can start calling it a test if you like. But actually it is more helpful to your brain to continue to think of it as a behaviour specification. It leads to much more constructive conversations about the system.

Saturday 10 January 2009

'Agile'? or just 'more agile than ...'?

The project manager was obviously a little irritated. "By this stage of the pre-study, we really should have the requirements a little more nailed down than this. That's two new requirements since last week. Or what does everyone think?" he looks around the table, expecting support. The business representative looks disappointed and hangs his head.

Without really thinking, I follow my instinct and jump in. "It seems wrong to me to say to the customer that he can't have any more features at this stage. Perhaps he should be prepared to drop something else instead, but surely we should be prepared to respond to the needs of the business? We are still in the pre-study, after all"

The project manager was not impressed. "This pre-study has already been going on for nearly 2 months, we really need to put together a budget and project plan so we can pass the next tollgate and proceed to the implementation phase." He looked around the table sternly, then his frown softened a little. "Well ok, you can have these two new requirements. Let's take a look at them"

A few days later, the project manager called me into his office.
- "You can't go telling the customer he's allowed to change his mind. After that meeting he came to me with more requirements. Your comments confused him as to how our software development process works. Please don't do it again".
- "But I thought we were trying to be agile in this project? You know, short release cycles, end of iteration demos, like in Scrum".
- "Yes, we are following a process which is more agile than the standard company mandated process. We have got permission to skip two of the tollgates, and 5% extra in the budget for 'adjustments' so we can respond to the customer feedback we get at each demo. We are not doing Scrum though. Who told you that?"

Sigh. I heard "agile". What I should have heard was "more agile than the standard company process".