September 16, 2012

Test Driven Traps, part 1

Have you ever been in a situation where a simple code change broke a few hundred tests? Have you ever had the feeling that tests slow you down, inhibit your creativity, and make you afraid to change the code? If you have, it means you've entered the Dungeon-of-very-bad-tests, the world of things that should not be.

I've been there. I've built one myself. And it collapsed killing me in the process. I've learned my lesson. So here is the story of a dead man. Learn from my faults or be doomed to repeat them.

The story


Test Driven Development, like all good games in the world, is simple to learn, hard to master. I started in 2005, when a brilliant guy named Piotr Szarwas gave me the book “Test Driven Development: By Example” (Kent Beck) and one task: creating a framework.

These were the old times, when the technology we were using had no frameworks at all, and we wanted a cool one, like Spring, with Inversion-of-Control, Object-Relational Mapping, Model-View-Controller and all the good things we knew about. And so we created a framework. Then we built a Content Management System on top of it. Then we created a bunch of dedicated applications for different clients, Internet shops and what-not, on top of those two. We were doing good. We had 3000+ tests for the framework, 3000+ tests for the CMS, and another few thousand for every dedicated application. We were looking at our work, and we were happy, safe, secure. These were good times.

And then, as our code base grew, we came to the point where the simple anemic model we had was not good enough anymore. I had not read the other important book of that time, “Domain Driven Design”, you see. I didn't know yet that you can only get so far with an anemic model.

But we were safe. We had tons of tests. We could change anything.

Or so I thought.

I spent a week trying to introduce some changes in the architecture. Simple things really: moving methods around, switching collaborators, such things. Only to be overwhelmed by the number of tests I had to fix. This was TDD: I started each change by writing a test, and when I was finally done with the code under test, I'd find another few hundred tests completely broken by my change. And when I got them fixed, introducing some more changes in the process, I'd find another few thousand broken. It was a butterfly effect, a chain reaction caused by a very small change.

It took me a week to figure out that I was not even half done. The refactoring had no visible end. And at no point was my code base stable and deployment-ready. I had my branch in the repository, one I renamed "Lasciate ogne speranza, voi ch'intrate".

We had tons and tons of tests. Of very bad tests. Tests that would pour concrete over our code, so that we could do nothing.



The only real options were either to leave it be, or to delete all the tests and write everything from scratch again. I didn't want to work with the code if we were to go for the first option, and the management would not find a financial rationale for the second. So I quit.

That was the Dungeon I built, only to find myself defeated by its monsters.

I went back to the book, and found everything I did wrong in there. Outlined. Marked out. How could I skip that? How could I not notice? Turns out, sometimes, you need to be of age and experience, to truly understand the things you learn.

Even the best of tools, when used poorly, can turn against you. And the easier the tool, the easier it seems to use it, the easier it is to fall into the trap of I-know-how-it-works thinking. And then BAM! You're gone.


The truth


Test Driven Development and tests are two completely different things. Tests are only a byproduct of TDD, nothing more. What is the point of TDD? What does TDD bring? Why do we do TDD?

For three reasons, and only those three.

1. To find the best design, by putting ourselves into the user's shoes.

By starting with “how do I want to use it” thinking, we discover the most useful and friendly design. Always good, quite often that's the best design out there. Otherwise, what we get is this:



And you don't want that.

2. To manage our fear.

It takes balls to make a fundamental change in a large code base without tests and say “it's done” without introducing bugs in the process, doesn't it? Well, the truth is, if you say “it's done”, most of the time you are either ignorant, reckless, or just plain stupid. It's like with concurrency: everybody knows it, nobody can do it well.

Smart people are scared of such changes. Unless they have good tests, with high code coverage.

TDD allows us to manage our fears by giving us proof that things work as they should. TDD gives us safety.

3. To have fast feedback.

How long can you code, without running the app? How long can you code without knowing whether your code works as you think it should?

Feedback in tests is important. Less so for frontend programming, where you can just run the shit up, and see for yourselves. More for coding in the backend. Even more, if your technology stack requires compilation, deployment, and starting up.

Time is money, and I'd rather earn it, than wait for the deployment and click through my changes each time I make them.

And that's it. There are no more reasons for TDD whatsoever. We want Good Design, Safety, and Feedback. Good tests are those that give us that.

Bad tests?

All the other tests are bad.

The bad practice


So what does a typical bad test look like? The one I see over and over, in close to every project, created by somebody who has yet to learn how NOT to build an ugly dungeon, how not to pour concrete over the code base. The one I'd have written myself in 2005.

This will be a Spock sample, written in Groovy, testing a Grails controller. But don't worry if you don't know those technologies. I bet you'll understand what's going on in there without problems. Yes, it's that simple. I'll explain all the not-so-obvious parts.

def "should show outlet"() {
  given:
    def outlet = OutletFactory.createAndSaveOutlet(merchant: merchant)
    injectParamsToController(id: outlet.id)
  when:
    controller.show()
  then:
    response.redirectUrl == null
}

So we have a controller. It's an outlet controller. And we have a test. What's wrong with this test?

The name of the test is “should show outlet”. What should a test with such a name check? Whether we show the outlet, right? And what does it check? Whether we are redirected. Brilliant? Useless.

It's simple, but I see it all around. People forget that we need to:

VERIFY THE RIGHT THING

I bet that test was written after the code. Not in test-first fashion.
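For contrast, here is a minimal sketch of a test whose assertion matches its name. It is plain Groovy, not Spock or Grails, and everything in it is a hypothetical stand-in: the Outlet and OutletController classes, the in-memory outlets map, and the shape of the returned model are assumptions made for illustration, not the project's real code.

```groovy
// Hypothetical stand-ins for illustration only; not the real Grails classes.
class Outlet {
    Long id
    String name
}

class OutletController {
    Map<Long, Outlet> outlets = [:]   // in-memory substitute for the database

    // Returns the model to render, or a redirect when the outlet is missing.
    Map show(Long id) {
        def outlet = outlets[id]
        outlet ? [model: [outlet: outlet]] : [redirect: '/outlet/list']
    }
}

// given
def outlet = new Outlet(id: 1L, name: 'Corner shop')
def controller = new OutletController(outlets: [(1L): outlet])

// when
def result = controller.show(1L)

// then: "should show outlet" asserts on the outlet that is shown, not on redirects
assert result.model.outlet.is(outlet)
```

The point is only that the “then” part answers the question asked by the test name.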

But verifying the right thing is not enough. Let's have another example. Same controller, different expectation. The name is: "should create outlet insert command with valid params with new account"

Quite complex, isn't it? If you need an explanation, the name is wrong. But you don't know the domain, so let me shed some light on it: when we give the controller good parameters, we want it to create a new OutletInsertCommand, and the account of that command should be new.

The name doesn't say what 'new' is, but we should be able to see it in the code.

Have a look at the test:

def "should create outlet insert command with valid params with new account"() {
  given:
    def defaultParams = OutletFactory.validOutletParams
    defaultParams.remove('mobileMoneyAccountNumber')
    defaultParams.remove('accountType')
    defaultParams.put('merchant.id', merchant.id)
    controller.params.putAll(defaultParams)
  when:
    controller.save()
  then:
    1 * securityServiceMock.getCurrentlyLoggedUser() >> user
    1 * commandNotificationServiceMock.notifyAccepters(_)
    0 * _._
    Outlet.count() == 0
    OutletInsertCommand.count() == 1
    def savedCommand = OutletInsertCommand.get(1)
    savedCommand.mobileMoneyAccountNumber == '1000000000000'
    savedCommand.accountType == CyclosAccountType.NOT_AGENT
    controller.flash.message != null
    response.redirectedUrl == '/outlet/list'
}

If you are new to Spock: n * mock.whatever() means that the method “whatever” of the mock object should be called exactly n times. No more, no less. The underscore “_” means “everything” or “anything”. And the >> sign instructs the test framework to return the right-hand side argument when the method is called.

So what's wrong with this test? Pretty much everything. Let's go from the start of the “then” part, mercifully skipping the overly verbose set-up in the “given”.

1 * securityServiceMock.getCurrentlyLoggedUser() >> user

The first line verifies whether some security service was asked for a logged user, and returns the user. And it was asked EXACTLY one time. No more, no less.

Wait, what? How come we have a security service in here? The name of the test doesn't say anything about security or users, why do we check it?

Well, it's the first mistake. This part is not what we want to verify. It is probably required by the controller, but that only means it should be in the “given”. And it should not verify that it's called “exactly once”. It's a stub, for God's sake. The user is either logged in or not. There is no sense in making him “logged in, but you can ask only once”.
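A rough sketch of what “it's a stub” means, in plain Groovy with a hand-made stub instead of a Spock mock. All names here are hypothetical (SecurityService as an interface, OutletService, createCommand, a String standing in for the user). The stub is part of the set-up and answers the same way however often it is asked; nothing counts the calls.

```groovy
// Hypothetical collaborator interface, for illustration only.
interface SecurityService {
    String getCurrentlyLoggedUser()
}

// Hypothetical code under test: it asks for the user twice, which is harmless.
class OutletService {
    SecurityService securityService

    Map createCommand(String outletName) {
        if (securityService.getCurrentlyLoggedUser() == null) {            // first call
            throw new IllegalStateException('nobody is logged in')
        }
        [outlet: outletName, createdBy: securityService.getCurrentlyLoggedUser()]   // second call
    }
}

// given: a stub built by coercing a map of closures to the interface
def loggedInAs = { String user -> [getCurrentlyLoggedUser: { user }] as SecurityService }
def service = new OutletService(securityService: loggedInAs('alice'))

// when
def command = service.createCommand('Corner shop')

// then: only the outcome is verified, never the number of calls to the stub
assert command.createdBy == 'alice'
```

If the method later asks for the user once, twice or five times, this test does not care, and it should not.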

Then, there is the second line.

1 * commandNotificationServiceMock.notifyAccepters(_)

It verifies that some notification service is called exactly once. And it may be OK, the business logic may require that, but then... why is it not stated clearly in the name of the test? Ah, I know, the name would be too long. Well, that's also a hint. You need to make another test, something like “should notify about newly created outlet insert command”.

And then, it's the third line.

0 * _._

My favorite one. If the code is Han Solo, this line is Jabba the Hutt. It wants Han Solo frozen in solid concrete. Or dead. Or both.

This line, if you haven't deduced it yet, means “You shall not make any other interactions with any mocks, or stubs, or anything, Amen!”.

That's the most stupid thing I've seen in a while. Why would a sane programmer ever put it here? That's beyond my imagination.

No it isn't. Been there, done that. The reason why a programmer would use such a thing is to make sure, that he covered all the interactions. That he didn't forget about anything. Tests are good, what's wrong in having more good?

He forgot about sanity. That line is stupid, and it will have its vengeance. It will bite you in the ass some day. And while each bite may be small, because there are hundreds of lines like this, some day you're gonna get bitten pretty badly. You may well not survive.
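To see the bite in miniature, here is a sketch in plain Groovy. RecordingMock is a hypothetical hand-rolled double that remembers every method called on it, and saveV1 and saveV2 are two hypothetical versions of the same method, the second adding one harmless call (audit is an invented name). The strict check imitates the spirit of “no other interactions allowed”; it is not how Spock implements it.

```groovy
// Hypothetical recording double: remembers the name of every method called on it.
class RecordingMock {
    List<String> calls = []

    def methodMissing(String name, args) {
        calls << name
        null
    }
}

// Two versions of the same behaviour. The second adds one harmless call.
def saveV1 = { mock -> mock.notifyAccepters('cmd'); 'saved' }
def saveV2 = { mock -> mock.audit('cmd'); mock.notifyAccepters('cmd'); 'saved' }

// Strict check: nothing but the listed interactions is allowed.
def strictCheck = { List<String> calls -> calls == ['notifyAccepters'] }
// Outcome check: the notification went out and the result is right.
def outcomeCheck = { List<String> calls, result -> 'notifyAccepters' in calls && result == 'saved' }

def mock1 = new RecordingMock()
def result1 = saveV1(mock1)
def mock2 = new RecordingMock()
def result2 = saveV2(mock2)

assert strictCheck(mock1.calls) && outcomeCheck(mock1.calls, result1)
assert outcomeCheck(mock2.calls, result2)   // the behaviour is unchanged...
assert !strictCheck(mock2.calls)            // ...yet the strict test would now fail
```

Multiply that by a few hundred tests and a one-line change becomes a week of fixing.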

And then, another line.

Outlet.count() == 0

This verifies that we don't have any outlets in the database. Do you know why? You don't. I do. I do, because I know the business logic of this domain. You don't, because this test sucks at telling you what it should.

Then there is the part that actually makes sense.

    OutletInsertCommand.count() == 1
    def savedCommand = OutletInsertCommand.get(1)
    savedCommand.mobileMoneyAccountNumber == '1000000000000'
    savedCommand.accountType == CyclosAccountType.NOT_AGENT

We expect the object we've created to be in the database, and then we verify whether its account is “new”. And we know that “new” means a specific account number and type. Though it screams to be extracted into another method.
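The extraction could look roughly like this. It is a sketch in plain Groovy: the two classes are hypothetical stand-ins for the real domain classes, the AGENT value is invented to make the enum non-trivial, and the helper name hasNewAccount is my assumption. The two expected values are the ones from the assertions in the test.

```groovy
// Hypothetical stand-ins for the domain classes used in the test.
enum CyclosAccountType { AGENT, NOT_AGENT }

class OutletInsertCommand {
    String mobileMoneyAccountNumber
    CyclosAccountType accountType
}

// The meaning of "new account" gets a name and lives in one place.
boolean hasNewAccount(OutletInsertCommand command) {
    command.mobileMoneyAccountNumber == '1000000000000' &&
        command.accountType == CyclosAccountType.NOT_AGENT
}

def savedCommand = new OutletInsertCommand(
    mobileMoneyAccountNumber: '1000000000000',
    accountType: CyclosAccountType.NOT_AGENT)

// The assertion now reads like the name of the test.
assert hasNewAccount(savedCommand)
```

With the helper in place, the reader no longer has to guess what 'new' stands for.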

And then...

    controller.flash.message != null
    response.redirectedUrl == '/outlet/list'

Then we check that some flash message is set. And a redirection. And I ask God, why the hell are we testing this? Not because the name of the test says so, that's for sure. The truth is that, looking at the test, I can recreate the method under test, line by line.

Isn't it brilliant? This test mirrors every single line of a not so simple method. But try to change the method, try to change a single line, and you have a good chance of blowing this thing up. And when tests of this kind number in the hundreds, you have concrete all over your code. You'll be able to refactor nothing.

So here's another lesson. It's not enough to verify the right thing. You need to

VERIFY ONLY THE RIGHT THING.

Never ever verify the algorithm of the method step by step. Verify the outcomes of the algorithm. You should be free to change the method, as long as the outcome, the real thing you expect, is not changed.

Imagine a sorting problem. Would you verify its internal algorithm? What for? It's got to work and it's got to work well. Remember, you want good design and safety. Apart from this, it should be free to change. Your tests should not stand in the way.
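The sorting case can be sketched in a few lines of plain Groovy. The check below looks only at the outcome (the output is ordered and holds the same elements as the input), so any implementation can be swapped in. The names assertSorts, builtIn and insertionSort are mine, chosen for illustration.

```groovy
// Checks what a sort must deliver, not how it gets there.
def assertSorts = { Closure sortFn ->
    def input = [5, 3, 8, 1, 3]
    def output = sortFn(input)
    // the output is ordered...
    assert (1..<output.size()).every { output[it - 1] <= output[it] }
    // ...and contains exactly the same elements as the input
    assert output.countBy { it } == input.countBy { it }
}

// Two interchangeable implementations.
def builtIn = { List<Integer> xs -> xs.sort(false) }   // sort(false) returns a sorted copy
def insertionSort = { List<Integer> xs ->
    def result = []
    xs.each { x ->
        // insert x before the first element greater than it, or at the end
        int i = result.findIndexOf { it > x }
        result.add(i < 0 ? result.size() : i, x)
    }
    result
}

assertSorts(builtIn)
assertSorts(insertionSort)
```

Replace one algorithm with the other and the test neither notices nor cares, which is exactly the freedom you want.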

Now for another horrible example.

@Unroll("test merchant constraints field #field for #error")
def "test merchant all constraints"() {
  when:
    def obj = new Merchant((field): val)

  then:
    validateConstraints(obj, field, error)

  where:
    field                     | val                                    | error
    'name'                    | null                                   | 'nullable'
    'name'                    | ''                                     | 'blank'
    'name'                    | 'ABC'                                  | 'valid'
    'contactInfo'             | null                                   | 'nullable'
    'contactInfo'             | new ContactInfo()                      | 'validator'
    'contactInfo'             | ContactInfoFactory.createContactInfo() | 'valid'
    'businessSegment'         | null                                   | 'nullable'
    'businessSegment'         | new MerchantBusinessSegment()          | 'valid'
    'finacleAccountNumber'    | null                                   | 'nullable'
    'finacleAccountNumber'    | ''                                     | 'blank'
    'finacleAccountNumber'    | 'ABC'                                  | 'valid'
    'principalContactPerson'  | null                                   | 'nullable'
    'principalContactPerson'  | ''                                     | 'blank'
    'principalContactPerson'  | 'ABC'                                  | 'valid'
    'principalContactInfo'    | null                                   | 'nullable'
    'principalContactInfo'    | new ContactInfo()                      | 'validator'
    'principalContactInfo'    | ContactInfoFactory.createContactInfo() | 'valid'
    'feeCalculator'           | null                                   | 'nullable'
    'feeCalculator'           | new FixedFeeCalculator(value: 0)       | 'valid'
    'chain'                   | null                                   | 'nullable'
    'chain'                   | new Chain()                            | 'valid'
    'customerWhiteListEnable' | null                                   | 'nullable'
    'customerWhiteListEnable' | true                                   | 'valid'
    'enabled'                 | null                                   | 'nullable'
    'enabled'                 | true                                   | 'valid'
}

Do you understand what's going on? If you haven't seen it before, you may very well not. The “where” part is a beautiful Spock solution for parametrized tests. The headers of those columns are the names of variables, used BEFORE, in the first line. It's sort of a declaration after the usage. The test is going to be fired many times, once for each line in the “where” part. And it's all possible thanks to Groovy's Abstract Syntax Tree Transformations. We are talking about interpreting and changing the code during compilation. Cool stuff.
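If the “where” table still looks like magic, here is roughly the idea written out by hand in plain Groovy, without Spock and without any AST transformation. It is a sketch of the concept, not of how Spock actually implements it, and validateName is a hypothetical, much simplified stand-in for the real constraint check.

```groovy
// Hypothetical, much simplified check: a name must be neither null nor blank.
def validateName = { String name ->
    name == null ? 'nullable' : (name.trim().isEmpty() ? 'blank' : 'valid')
}

// Each map plays the role of one line of the "where" table: val | error
def rows = [
    [val: null,  error: 'nullable'],
    [val: '',    error: 'blank'],
    [val: 'ABC', error: 'valid'],
]

// The body of the test runs once per row, with the columns available by name.
rows.each { row ->
    assert validateName(row.val) == row.error
}
```

Spock gives you the same loop with a nicer table and a separate report entry for every row.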

So what is this test doing?

Nothing.

Let me show you the code under test.

static constraints = {
  name(blank: false)
  contactInfo(nullable: false, validator: { it?.validate() })
  businessSegment(nullable: false)
  finacleAccountNumber(blank: false)
  principalContactPerson(blank: false)
  principalContactInfo(nullable: false, validator: { it?.validate() })
  feeCalculator(nullable: false)
  customerWhiteListEnable(nullable: false)
}

This static closure tells Grails what kind of validation we expect on the object and database level. In Java, these would most probably be annotations.

And you do not test annotations. You also do not test static fields. Or closures without any sensible code, without any behavior. And you don't test whether the framework below (Grails/GORM in here) works the way it works.

Oh, you may test that the first time you use it. Just because you want to know how and if it works. You want to be safe, after all. But then, you should probably delete this test, and for sure, not repeat it for every single domain class out there.

This test doesn't even verify that, by the way. Because it's a unit test, working on a mock of a database. It's not testing the real GORM (Groovy Object-Relational Mapping, an adapter on top of Hibernate). It's testing the mock of the real GORM.

Yeah, it's that stupid.

So if TDD gives us safety, design and feedback, what does this test provide? Absolutely nothing. So why did the programmer put it here? Because his brain says: tests are good. More tests are better.

Well, I've got news for you. Every single test which does not provide us safety and good design is bad. Period. Those which provide only feedback, should be thrown away the moment you stop refactoring your code under the test.

So here's my lesson number three:

PROVIDE SAFETY AND GOOD DESIGN, OR BE GONE.

That was the example of things gone wrong. What should we do about it?

The answer: delete it.

But I have yet to see a programmer who removes his tests. Even ones as shitty as this one. We take our code very personally, I guess. So in case you are hesitating, let me remind you what Kent Beck wrote in his book about TDD:

The first criterion for your tests is confidence. Never delete a test if it reduces your confidence in the behavior of the system.

The second criterion is communication. If you have two tests that exercise the same path through the code, but they speak to different scenarios for a reader, leave them alone.

[Kent Beck, Test Driven Development: by Example]

Now you know, it's safe to delete it.

So much for today. I have some good examples to show, some more stories to tell, so stay tuned for part 2.

11 comments:

  1. Thanks for a really good post (the Jabba the Hut line made my day) about crappy tests.

    I think there's no escape and everyone is making such mistakes when starting doing TDD. You need to suffer from your own tests and have a few moments of reflection to find the right (or: better) way. The sad thing I notice is that many people go through all this pain & suffering without trying to find/understand/fix their problem. The "best" thing they do is to add -DskipTests=true when they are running their Maven builds.

    Personally I had a toy project I worked on with a bunch of friends where we spent far more time on polishing our tests than on writing the actual implementation. Such deliberate practice is necessary to get better at writing tests, but most people are not able to do that on their daily job (and many are too lazy to do that outside of working hours).

    And yes, I delete tests. Sometimes I will write some better tests to replace one I just threw into the abyss. Sometimes it's not really worth the trouble.

    Looking forward for the 2nd part of the post!

  2. I dont like it, sorry.
    When you want to explain something, then you make it more harder for who doesnt know Soke syntax and semantics.
    Maybe you havent taken other's shoes yet.

  3. correction: Spock instead of Soke

  4. There are "test smells" like "code smells" -- so now you know. TDD tests are black-box tests (testing behavior via public interfaces) and it looks like your 'bad' tests are testing the implementation (white-box tests), so changing implementation causes tests to fail.

  5. I've just read it and it was worth reading. I am the author of many of these examples, so let me explain.

    First test "should show outlet". Of course I test the wrong thing. Why? Because Grails has many bugs when it comes to testing. Grails doesn't let me test if response.model.outlet == outlet. response.model is null, but it certainly is not! So I must test if there was no negative consequence of not finding an outlet - redirection. Grails has to improve its tests A LOT.

    Second test "should create outlet insert command..." - securityServiceMock should be in given. But it is a Spock convention to put it in then. You're right that strict condition "1 time exactly" can be removed. As for notifyAccepters - test name doesn't say that. And you know it'd be too long. Same goes to flash.message and redirection.

    So Jakub I have a task for you. Please refactor "should create outlet insert command..." test and show us how it should look. It's easy to criticize, but much harder to refactor. Show me final result so discussion can come along.

  6. I forgot about last test - constraint test. I need more arguments about that.

    Let's say I've created a domain Person that must have unique email across all instances. Should I write a test for that? Why should I or why should I not? You say these tests lock your domain and you're right. But should I test if I have unique emails or not? And why?

  7. @Tomasz
    Here is my take on "should create outlet insert command"
    https://gist.github.com/3810821

    The method userIsLoggedIn() should be in setup, but I wanted it to be visible in gist.

    That's all I can do on my phone, while on holidays :) Hope it's enough to discuss the matter further. Let's do some pair programming when I'm back.

    As for the second comment: TDD gives us safety, design, and feedback. What does the current test give? IMHO nothing. I test the DB constraint only if:
    - I don't trust the DB (therefore gaining security)
    - I don't have the DB yet, just the business requirement (therefore gaining design)
    - I want to test/design how the application behaves when I enter two identical emails (therefore gaining both), but that would be on a different level (controller perhaps)

  8. Wow, I must say that is very unfair (and unprofessional) to publish and criticize some one elses code like this. I mean with words like "stupid", "ignorant", "reckless".

  9. @Paweł:
    Looks like you have not noticed, that I wrote all that myself as well. I was being stupid, and I've learned my lessons.
    You've also not noticed, that I didn't call the author stupid, I called those mistakes stupid. We all make something stupid from time to time. Would you rather have somebody point out your mistakes, so you can learn, or would you rather always take the long route of learning only when it hurts you badly? That I would call unprofessional.

    If it's the second case, I guess you don't find pair programming amusing, do you?

    Or are you perfect, and never make anything stupid?

    BTW I didn't know whose code it was (could as well be mine, see above), until after I presented those errors on our workshop, so we could all learn and have some progress.

  10. Jakub, I don't think critique is a bad thing. I am saying that the way you are doing it is wrong. Code can't be "stupid", "ignorant" or "reckless". These adjectives refer to persons. You are not criticizing but stigmatizing.

    I would prefer to hear from my pair programming partner, that code "will be hard to maintain" or "has too many responsibilities" or "introduces duplication" or "contains bugs", rather than "this code is stupid".

  11. I see your point, and you are right. But then, It wouldn't be funny anymore.
