November 2, 2015

P2P salary review

How and why I have implemented a Peer-2-Peer salary review in a corporation: a case study.


Goal: optimize for fairness


A CEO raised his company's minimum wage to £45,000 a year, and some employees quit because of it [link]. Do you know why? Because the hard-working, highly educated staff were pissed off that someone who slacked off all the time and hadn't invested much in their own development earned the same money.

An interesting thing about salaries and keeping people happy is that the absolute amounts are much less important than the relative fairness of the system. And that’s a bit counter-intuitive.

Imagine this: when hard times come, like a war or an economic crisis, most people are fine with having to cope with those difficulties, as long as they know it’s a shared burden. On the other hand, they get angry very fast when they see stupid, incompetent, corrupt people driving fancy cars and living in luxury villas.

Fairness is the most important aspect of a salary system. Fairness is what we should optimize for.

Bear in mind that your company is part of a bigger system (the industry). Which means that no matter how fair you are, you cannot pay shit to your people. You have to be fair and competitive. Otherwise, the fairness of the larger system, also known as “getting a better job”, will bite your ass.

Embrace your ignorance


It’s quite easy to evaluate the people you work with when you have a team of 5 +/- 2. However, beyond the number of seven, you can only have a dim idea of how people are doing. One level deeper (49 people) and you have no chance at all.

When I became the Head of a 12-person branch, which I’ve since grown to around 50, it was clear right from the start that I didn’t want to be the person to evaluate everyone.
It would be unfair, logically invalid, and would still take way too much of my time.

What other options did I have?

People are split into small, self-organized, interdisciplinary teams. You could give the power to evaluate people to one of the special roles, like a Team Leader, a Product Owner, or a Scrum Master. However, neither the Scrum Master nor the Product Owner has enough technical knowledge to evaluate developers properly; they are biased towards soft skills, which is just half of the picture. The Team Leader is biased in the opposite direction. And going with either would still leave me with having to evaluate those “special” roles myself.

But the biggest problem is that once you put the power of deciding about your money in the hands of another person, you will not treat that person the same way anymore. So my self-organized, coherent teams would gravitate towards a “Project Manager with his minions”.

That’s the old way of centralized, report & control, zero-trust, competition over cooperation, UK management style.
 “I’ve also noticed that different countries and cultures place different values on control. Some (e.g. the UK) value control and the restraint that it brings whereas others (e.g. Scandinavia) value empowerment and motivation.” [Software Architecture for Developers; Simon Brown; Leanpub 2014]
I don’t like that. It’s very ineffective.


Client, market, and stakeholders


If not a Product Owner, not a Scrum Master, and definitely not a manager, who can do the evaluation? Who can verify whether people work hard and efficiently, whether they are competent and valuable?

Theoretically, our clients or the market could tell us that. We could analyze whether people bring us money, or bring us closer to our target. This, however, can only work if the team can decide what they work on and which business opportunities they are after. And even then, it would probably be unfair. You don’t want to punish people for taking risks and failing. In this line of work, you want them to feel safe to innovate, fail, and learn from it. At least as long as they do not put you out of business.

If they wanted to risk the evaluation of the market, they would start their own companies.

I work in a corporation. Decisions about the products we work on are made at the Executive Committee level. Evaluating my developers based on whether ExCo ideas are profitable would be very unfair.


P2P


That leaves us with only one option really: peer-to-peer evaluation. We have a system where everybody has some data and a bias, and nobody has the full picture. Let’s get all the data together and do some data mining on it!

Our constraints:

  • For an employee, the whole process can take up to 30 minutes, but should stay below 15 most of the time. And that’s for evaluating ALL the employees. Otherwise, the cost is huge, since every single employee has to do it (and it’s boring).
  • On the manager side, gathering and analyzing all the data should not take longer than a day.
  • The system should not be hackable (and we do have some pretty smartass hackers in here), or should only be hackable in a positive way.
  • It would be cool if the system brought us more than just the money levels. It would be cool if the system told each of our guys what they should do in order to get more money next time. And that needs to be as specific as possible.


Impact


Cool, we have constraints. Now what do we actually evaluate to be fair?

Hard work? This is not a treadmill. I don’t want my people to work hard. This is a software house. It’s creative work. I want them to work smart!

Results? What results? Results of the company, i.e. money, bring us back to market evaluation, which is wrong. Results of the tasks taken by an individual bring us back to hard-work evaluation; I don’t want that. Perhaps dedication? That would make all the dedicated idiots very rich, but not help us much, so maybe that’s not the way?

How about evaluating the IMPACT a given individual has on the company? Would that make sense? If we review the impact and connect it with the money, the system should be fair.

You work hard every day to move us further? You learn at home as well, because you are passionate? And your code works in production? Perfect. Money for you, good sir!

You jerk off all day, but have brilliant ideas in the evening, that you implement in the morning, that solve a lot of problems for everyone and bring us closer to where we want to be? Why not, jerk off again, it’s fine with me. Just use the bathroom, please. And here, wipe your dick with those euros.

You are a genius, ego the size of a planet, you’ve read all the books, you could move mountains with your infinite knowledge and expertise, you have a PhD in Agile Node.js Extreme Reactive microESB, but somehow you cannot really work with people, and thus none of your solutions get into production? And you antagonize teams? People don’t want to work with you? Sorry, that’s no impact, or even negative impact. No monies here.

Right, but measuring real impact is hard. Even more, it could be unfair. This is a team sport, so your best people could get screwed by show-stoppers, weaker colleagues, business changes... you know, shit happens. Real impact, like the market evaluation, will feel unfair when your best people get trashed by things they have no control over.

The problem here is also the solution. You should measure the potential impact (without show-stoppers), and verify it against the real history (because your perception of someone’s potential may be wrong), to find out, and work out, why the potential is not used in practice.


Skillsets


How does one evaluate the potential impact of others? In software development, potential impact is based on someone’s skills. The better the skills, the more potential impact, though the two are not equal (more on that later). So how do you measure skills in a timely and meaningful manner?

Fortunately, I’ve already solved that problem once. While hiring people at TouK, by doing a full day of pair programming (a very good, though costly, method), I came to the conclusion that to be able to make a salary recommendation, I want to present new people by comparing them with existing ones, on 6 skill groups which I find important in software development.

These groups are:

  • Hard Programming Skills (including: programming skills in relevant language, OOP/FP, frameworks, methods like XP, TDD, DDD etc.)
  • Hard DevOps Skills (including: Linux administration, deployment/build pipelines, JVM/GC tuning, database tuning, etc.)
  • Communication (How easy is it to communicate with them? Do you like cooperating with that person? Are you irritated by the guy? Does he/she read your mind?)
  • Understanding (How fast can one understand new, complex stuff? How fast does one learn?)
  • Self-sufficiency (Can one pretty much do anything around software development, or does one need a lot of help and attention? Can I throw any problem at them and it will get resolved? This includes domain knowledge, a broad set of skills, goal orientation, self-retrospection and adaptability.)
  • Focus & Motivation (Some people can move mountains with their sheer will. They are self-driven and have intrinsic motivation. You just need to remove blockers and let them shine. Others are easily distracted, lack self-motivation or a sense of purpose, constantly need to be pointed in the right direction, waste time on Facebook, or, in the worst case, need a lot of control & reporting.)

It may surprise you that hard skills take only ⅓ of the evaluation. But my experience suggests that most developers can handle rocket science mentally if you give them enough time. The problem is that a large number of them will drop out due to a lack of focus, some due to digging themselves into just one specialization and not being able to handle the rest (self-sufficiency), for some I cannot afford the amount of “enough time” (understanding), and a large portion of the rest will never be able to communicate their knowledge properly.

So yeah, sorry, but software development is not only about programming. At least when you have to work in a team. Programming is crucial, but it’s not enough alone.

But this is cool, because if you skip the two hard-skill groups, you can also use the same groups to evaluate Product Owners and Scrum Masters. That makes the whole process easier.


Round 1: Comparison


Human beings are bad at absolute values. We are good at comparison, though. So my first iteration was to ask people to draw a line for each of the skillsets, and then start putting small stickers with people’s names on it, relative to each other (bottom: worse, top: better). No absolute numbers whatsoever.

Since you do not work with everyone, you should only evaluate the people you feel you can. And only in the skillsets that you feel you can.

The outcome looks something like this.



Great, we have some data. Time to clean the noise.

Round 2: Explanation and refinement


After that, I would take every single person to a one-on-one and ask them to explain their picture to me. We would talk about the bottom and top 20%. About all the issues they have with other people (usually communication). I would ask them how the person being evaluated could improve their situation. I would gather feedback.

Then I would do normalization and clustering, to [-2..+2]. Like this:



Manual normalization and clustering of data took me a lot of time even at 20 people, so this part doesn’t scale.

For the second iteration, I decided to use absolute values, 0 to 4, and Google Forms to gather feedback. I was afraid that without a continuous scale, people would have a lot of trouble with comparison. Turned out, I didn’t need to worry. This is the IT crowd; they understand clusterization perfectly. Discrete numbers are totally fine.

I have also added a “Note” textbox under each skillset, so that people could write down their thoughts on the top/bottom 20% themselves, the moment they make the decision.



The field “Who has similar skill...” is there to help people self-verify their feelings.

Still, I had a one-on-one meeting with everyone. A completely automated system could be easily hacked. A system where you have to explain yourself to a biological creature is much harder to break. And much more humane. So while my people did the clusterization of data automatically, I did the normalization and verification myself.
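The mechanical part of the normalization can be automated. Below is a minimal sketch in Python of one way it could work (all names, ratings, and the exact z-score approach are my illustrative assumptions, not the system described above): each reviewer's raw 0-4 ratings are turned into z-scores, which removes that reviewer's personal bias (harsh vs generous raters), then averaged per person and bucketed into the [-2..+2] clusters.

```python
from statistics import mean, stdev

# Hypothetical sample data: reviewer -> {person: rating on the 0-4 scale}
ratings = {
    "alice": {"bob": 4, "carol": 2, "dave": 0},
    "bob":   {"alice": 3, "carol": 3, "dave": 1},
    "carol": {"alice": 4, "bob": 4, "dave": 2},
}

def normalize(reviewer_scores):
    """Turn one reviewer's raw ratings into z-scores, removing that
    reviewer's personal bias (harsh vs generous raters)."""
    values = list(reviewer_scores.values())
    m, s = mean(values), stdev(values) or 1.0  # guard against zero spread
    return {person: (score - m) / s for person, score in reviewer_scores.items()}

def cluster(z):
    """Bucket a z-score into one of the [-2..+2] clusters."""
    return max(-2, min(2, round(z)))

# Collect each person's normalized scores across all reviewers...
collected = {}
for reviewer, scores in ratings.items():
    for person, z in normalize(scores).items():
        collected.setdefault(person, []).append(z)

# ...then average and cluster them into a single level per person.
clusters = {person: cluster(mean(zs)) for person, zs in collected.items()}
print(clusters)
```

A script like this only replaces the mechanical step; the one-on-one verification stays, for the reasons above.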


Round 3: Analyzing data


So what do we end up with? Lots of numbers.



Each column is a person. Each row describes a level. Each cell represents the number of votes for a given skill at a given level. The color is based on the value, so you can easily spot high numbers.

A simple spreadsheet formula, and you have a number for each skillset for each person. You can even sum these up into a single number. A single number describing someone is very tempting. It represents how good the person is, right?
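For illustration, that “simple spreadsheet formula” could be something like a vote-weighted average of the levels. A sketch in Python, with made-up numbers (the exact formula used is not specified above, so this is just one plausible choice):

```python
# Hypothetical vote counts for one person in one skillset:
# cluster level -> number of peers who voted for that level
votes = {-2: 0, -1: 1, 0: 3, 1: 7, 2: 2}

def skill_score(votes):
    """One number per skillset per person: the vote-weighted average level."""
    total = sum(votes.values())
    return sum(level * count for level, count in votes.items()) / total

print(round(skill_score(votes), 2))  # a single number summarizing 13 votes
```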

Not so fast.

While numbers are great, it’s the distribution that you should analyze. Why is this guy hated by everyone but 3 people (communication)? Who are those people? What is their work relation? Why are those 3 people also evaluated badly in this skillset? Why does the other guy have no estimates in Focus & Motivation? Is he acting? Is he passive-aggressive? How about this girl, so good at pretty much everything, except DevOps? Is she still running Windows at work?

I don’t know if it’s possible to automate this, but I wouldn’t even try. The distribution tells you stories. When you add additional verbal and written feedback, you get to understand the dynamics of your teams a bit better.

And you can always ask. If you want to dig deeper somewhere, you can. The beauty of this system is that you know where to dig. Understanding the situation plays a big role in normalization of the feedback.

Impact != P2P skill review


Skills are the basis for potential impact. But how about that girl over there, who never bitches about the work, never panics, even in a dire situation? Give her a job that nobody wants, but that needs to be done, and you will just hear a soft sigh, and she’ll get it done. It’s helpful. It’s calming. Her stoic attitude, her gentle voice and behaviour get all the hectic boys through the hard times. There is a lot of extra value in it.

But it doesn’t get evaluated in a P2P skills review.

That’s because Impact and P2P skills review are not equal.

To make it simple, my formula for Impact is

Impact = normalized peer-to-peer skill review + personal traits 


Traits


Traits are simple. Traits can be both negative and positive. Let me give you some examples; you will get it in a second:

  • +Performance/Security expertise we badly need once a year
  • +Public Relations (blogs/open source libs lure more good devs in)
  • +Assertiveness (helps keep quality while negotiating with the client)
  • -Submissiveness (can be dominated by the client, and fail due to that)
  • +Iron Willpower (not breaking mentally in any circumstances)
  • -Low Willpower (panic!)

You should be able to identify those yourself. Like in a good role-playing game, traits define a character. I know a lot of great developers; what makes them stand out are their interests and traits.

Traits modify the outcome of skill clustering. They usually give a cumulative +1 or -1 to the final cluster the person is in.
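As a toy illustration of the formula (all names and traits invented), the trait modifiers simply shift the cluster coming out of the review:

```python
# Hypothetical people: base cluster from the P2P review,
# plus traits expressed as +1 / -1 modifiers.
people = {
    "anna":  {"cluster": 2, "traits": {"+Public Relations": +1}},
    "bart":  {"cluster": 1, "traits": {"+Iron Willpower": +1, "-Submissiveness": -1}},
    "celia": {"cluster": 0, "traits": {}},
}

def impact(person):
    """Impact = normalized peer-to-peer skill review + personal traits."""
    return person["cluster"] + sum(person["traits"].values())

for name in sorted(people):
    print(name, impact(people[name]))  # anna 3, bart 1, celia 0
```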

Applying salaries


Now back to the goal of optimizing salaries for fairness. We have clustered skills, modified by traits, that describe the potential impact a person has on the company. On the way, we have found a lot of interesting information about team dynamics, and we have dug up all the not-so-obvious stuff. After everything is said and done, we have reduced the data to a single number (a cluster). People in the same cluster have about the same impact. Time to apply money to this.

The problem in many corporations is that salaries are not public. At least not officially. This is a very stupid leftover from a previous era, when an employer wanted to pay his employees as little as possible. But it makes no sense. Why would I, as an employer, want to pay as little as possible? The goal of the employer is to make as much money as possible; if I have to pay more to get more, it makes perfect sense. So paying people less for its own sake makes no sense. Finding the sweet spot, where the money that people make gives the highest benefit, makes sense.

Some will argue that money is not a motivator above a certain amount. True: Notch (the author of Minecraft), with all his billions, doesn’t seem to care about another million. But Notch is not a standard employee, and people who say money is not a motivator usually have a lot already. Looking at the market of developers, though, most of them are also motivated by money. It’s not the only factor, but it’s important enough to make them change jobs, and the amount that would make them not care about money anymore is never reached in their lifetimes. Anybody with kids can tell you that whatever you think is enough will probably be wrong.

So what can we do in a corporation, where we are not allowed to publish salaries?

While we cannot publish salaries, we can publish anchor points. Lots of job comparison sites (nofluffjobs.com, for example) require a salary range. If your clustering ended up with 5 different clusters, you can probably come up with some names for those levels (junior/mid/senior developer, architect, senior architect, principal senior executive architect, etc.). Now all you need to do is publish those job offers, with the money (either as a salary range or fixed), and people can compare their own situation to the whole picture.

Make a strong commitment to always keep salaries linked with job titles, and job titles linked with the potential impact. Promise to keep it fair for the people. And now, because titles are public, everybody knows what the salary of everyone else is.

More or less. More if you have only one number for a cluster, less if it’s a range. Still, people will be able to see the whole picture, and tell you if they think it’s fair.

To make the commitment strong, I had to fire someone who had much more money than their position warranted (he was a much better negotiator than an employee), and raise the salaries of all the bad negotiators. If negotiation skills are valuable to you, make them another skill group, or add them to Traits.


Benefits


There are good things and bad things about this system. First the good things.

I don’t have to be the judge. I don’t have to play an omnipotent being. It changes our cooperation a lot the moment people realize they don’t need to make a good impression on me. It makes them more open.

People don’t need to convince me when they need more money. They have to convince their peers. This means we can have an honest discussion about what to improve, how to turn their potential impact into real impact, or how to increase their skills. Or the perception of their skills by other people. When someone thinks he has much higher skills than other people think, it’s usually a problem with communication. Or ego. It’s something you can work on. Though egos are hard to fix. But even then, you have hard data.

If it's a problem with communication, it's no longer "my boss doesn't like me". Now it's "have a look, all your peers say you are bad at communication". You cannot ignore all the people. You cannot even get angry at all the people. You finally have to understand that the problem is on your side.

It takes 15 minutes per developer, and 1 day per manager. On average, because it varies with how much digging you have to do.

After investigation, it gives you tons of feedback, which you can give back to people. And people want that feedback. I’ve been doing it every half year, because we had a salary review in the corporation in summer, and then a bonus review in the winter. People asked me to do it more often, even knowing that I wouldn’t change anything about the money, because they wanted the feedback. More feedback is better.

And it can be used for Devs, POs, SMs, QAs, System Engineers… Basically everyone who has peers.


Caveats


Beware, there are some obvious downsides.

Peer-to-peer review doesn’t work for people who work alone. It also doesn’t work well if your teams never mix and stay in the same setup for years. You will soon find out that people can only review their own team members. But if you have a medium-sized company, and people do not cooperate much between teams, do not share their knowledge, perhaps you have a completely different problem to solve?

It’s tempting to dehumanize (turn people into numbers) fast. Don’t do that. Let the distribution tell you stories, and dig into it. Otherwise, your system will be unfair, and unfair systems get hacked much faster. Social justice, I guess.

You cannot compare salaries between different roles (QA/dev/PO) due to different market levels. This doesn’t sound bad, but since you have promised to be fair, what happens if someone changes position from a better-paid role to a less-paid one? Scrum Master to PO, for example? It’s a small problem if the person can still do their old duties (potential impact is still high), but what if they can’t?

Or how about someone who learns nothing for a few years? In the fast-changing world of software development, skills deteriorate fast. Since the salary can never get lower, and the majority of people who still work on their skills will raise the new base (average), the same salary kept for years may become more and more unfair.

Sometimes, due to the fairness of the system, you will have to fire someone who otherwise does his/her job.

It’s also hard to explain to non-IT people (CEO, HR) that someone can improve by +40% in a year. They may not like to give such a raise. But it happens in IT. Well, at least you now have some hard data on it.

Another mistake I made was not giving enough anchor points. I published two salary ranges (mid and senior), because it was very difficult to get my corporation to agree on anything officially. At that time, I had 5 clusters from the P2P review. Some people who had just been promoted or hired at the senior level expected to get the top of the range for this position, while the truth was that 70% of seniors were at the beginning of this salary range. My clusters and published anchor points were not one-to-one, and that was confusing to people. Fortunately, people trusted me enough for this not to be a problem. I just had to do some explaining.

But all of that is nothing compared to the biggest benefit of P2P salary reviews.

And the biggest benefit is: I had this system running for two years and people liked it.

At least that was the overwhelming feedback that I got. And that makes me happy.

July 12, 2013

Confitura 2013 afterthoughts

Confitura, the biggest free-of-charge Java conference in Europe, took place on the 6th of July in Warsaw. TouK's presence was heavy, with 5 separate talks, all chosen in the call for papers, no sponsored bullshit. We were sponsoring deck chairs during the conference, and beer at the after-party, though. Both were quite popular. Oh, and we had Bartek Zdanowski with TouK Lab, doing funny things with lights, Jenkins, and a Raspberry Pi.
Last year, together with Tomasz Przybysz, I had a talk about Groovy and Grails. This year I had the pleasure of presenting revised Test Driven Traps.

Now, this wasn't the same talk I had in Tallinn. In fact, I rewrote most of it, with three major changes.

While at first I was focused on three goals of TDD (security, design, feedback), I've noticed that I underestimated the value of communication. The funny thing is, I do use TDD in that manner frequently; it just didn't appear to me as very important. Which, as an afterthought, was a big mistake.

Perhaps the reason for it is how people distinguish between TDD, BDD (Behaviour Driven Development) and ATDD (Acceptance Test-Driven Development). I use Spock most of the time, and Spock mixes all three approaches very well. To be quite honest, I've been mixing all three of them in every testing tool I've ever had. Whether my tests look more like BDD or ATDD (that is, have more emphasis on communication and specification) depends on the nature of the project and the nature of my client. Technology is not a problem here. It just used to be more awkward to write down user stories as tests in languages which were not as flexible as Groovy.

I had a lot of success using tests as a communication tool. Both internally, with developers, including myself while returning to an old code base, and with external clients. But I also had some failures.

A few months ago, while doing a mobile banking project, I had a session with a domain expert. We were trying to get to the bottom of fees and pricing in a specific payment scenario. While we were sitting and talking, I wrote down a simple specification:
static Closure fixed = {BigDecimal value -> new FixedFeeAddedCalculator(value: value)}
static Closure percentage = {BigDecimal value -> new PercentageFeeCalculator(value: value)}

def "should calculate fees"() {
    given:
        biller.feeCalculator = billerFeeType.call(billerFee)
        biller.feesForChannels = [new FeePerChannel(channel: Channel.USSD, feeCalculator: channelFeeType.call(feeForChannel))]
        biller.feesForPrefixes = [new FeePerPrefix(msisdnPrefix: "123", feeCalculator: prefixFeeType.call(feeForPrefix))]
    when:
        BillerFees billerFees = biller.calculateFees(productPrice, "1234", Channel.USSD)
    then:
        billerFees.total + productPrice == totalMoneyPaid
        billerFees.feeForChannel == feeForChannelPaid
        billerFees.feeForPrefix == feeForPrefixPaid
    where:
        productPrice | feeForChannel | channelFeeType | feeForPrefix | prefixFeeType | billerFee | billerFeeType || totalMoneyPaid | feeForChannelPaid | feeForPrefixPaid | feeForBillerPaid
        100          | 7             | fixed          | 3            | fixed         | 10        | fixed         || 120            | 7                 | 3                | 10
        100          | 7             | percentage     | 3            | percentage    | 10        | percentage    || 120            | 7                 | 3                | 10
        123.4        | 10            | percentage     | 10           | percentage    | 10        | percentage    || 160.42         | 12.34             | 12.34            | 12.34
        123.45       | 10            | percentage     | 10           | percentage    | 10        | percentage    || 160.47         | 12.34             | 12.34            | 12.34
        123.45       | 10.05         | percentage     | 10.05        | percentage    | 10.05     | percentage    || 160.68         | 12.41             | 12.41            | 12.41
        100.99       | 0.99          | percentage     | 9.99         | percentage    | 0.09      | percentage    || 112.17         | 1                 | 10.09            | 1
        100.99       | 0             | percentage     | 0            | fixed         | 0.01      | percentage    || 101.00         | 0                 | 0                | 0.01
        10           | 0             | fixed          | 0            | fixed         | 100       | fixed         || 110.00         | 0                 | 0                | 100
}
Those static fields/closures are a little hack, so that I'm able to use something more readable in the 'where' clause than a 'new' with the name of the class. After we confirmed that it looks OK, but we don't really know whether it is accurate, I just copied the 'where' part with a little description, and sent it to the bank for confirmation. It looked pretty much like what the bank likes to send with their requirements. The expert was happy with this, I was happy with this, and we've had that working for us before.

No word about fees and pricing has been heard from them since. And that's about six months now.
Which shows that no matter how good your tools are, no matter how smart the domain experts are, if you have a fundamental problem communicating with your client, you are doomed anyway. And working through a proxy with clients from another continent, all I have are problems in communication. The technology is simple and easy, compared to that.

But I digress, I do encourage you to think of tests as a communication tool. It's a blast when it works. And it does help a lot with keeping our domain knowledge up to date. Something requirement documents fail miserably to do.
The thing I do not know, is whether it is better to keep specifications and user stories separate from the rest of your tests or not. I've tried both approaches, but I don't have any conclusion yet.
I also added a few simple thoughts on domain setup to my presentation. Most systems need some domain objects before even unit tests will run. You can create that step by step, calling each constructor in every test, but this soon becomes a DRY violation and a pain in the ass as the domain gets more complex. Not to mention refactoring problems. A simple factory helps, but doesn't change that much, so in my experience, the best way is to have your aggregate roots built in a single class, preferably using the same builders you use in production code (if you use any). This kind of setup definitely simplifies writing new tests, and is trivial to refactor.
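The idea of a single test-data class with sensible defaults can be sketched like this (Python for brevity, every name invented; the real projects here were Groovy/Grails):

```python
from dataclasses import dataclass, field

# A tiny hypothetical aggregate root.
@dataclass
class LineItem:
    name: str
    price: float

@dataclass
class Order:
    customer: str
    items: list = field(default_factory=list)

    def total(self):
        return sum(item.price for item in self.items)

class TestData:
    """The single place where tests get their aggregate roots.
    Defaults are sensible; a test overrides only what it cares about."""

    @staticmethod
    def order(customer="test-customer", items=None):
        if items is None:
            items = [LineItem("widget", 9.99)]
        return Order(customer=customer, items=items)

# A test states only what's relevant to it:
order = TestData.order(items=[LineItem("a", 1.0), LineItem("b", 2.0)])
assert order.total() == 3.0
```

When the domain changes, only the one test-data class has to follow, instead of every test.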

But that's most useful for unit tests. For integration tests, there are three approaches I'm aware of. The first one is 'go crazy, copy data from production'. Fortunately, this is possible only if you join an existing project that had no tests before, so I'll just skip that, as I am inherently incompatible with such situations. The other two are more practical.

You can have an empty database (you can even create it on the fly), where every test has to provide the data (domain objects) it requires, or you can have a prepopulated database with a precise set of data representing some sample domain objects, and have that reflected in your test code in the form of static finals. In both approaches, a rollback (for transactional DBs) or a manual clean-up (for everything else) keeps the database in its initial state.

Everybody seems to prefer an empty database. But a prepopulated database has two great advantages: it usually simplifies test setups, and may have a big impact on test performance. Just a few weeks ago I switched one of our projects to use a prepopulated database, and saved 15 seconds of test time.
15 seconds may not look like much, but our integration tests now run in 1m30s. I've saved another 2 seconds by removing an ugly sleep in one test, and I can save a few more seconds by just not being wasteful.

And that brings me to the next change in my presentation. I've pointed out the importance of performance in TDD. You have 30 seconds for unit tests before you lose the patience of a developer. Make your tests run for more than 3 minutes, and devs will only run them before pushing to the repo. Make them even longer, and nobody's gonna care about tests anymore.

This is the single reason why I think GORM (Grails object-relational mapping) is not good enough for bigger projects. Grails uses the Active Record pattern, with a nifty mocking of database access methods via the @Mock annotation for unit tests. But with a complex domain and large aggregate roots, you get a heavy performance hit because of those annotations, and you end up with unit tests running at half the speed of integration tests.

And that's pretty bad. I have ~900 unit tests running in about 2.5 minutes, on an i7. And most of that is due to @Mock annotations. I could bring it down to less than 30 seconds if I could mock those bastard database calls the normal way, but with Active Record, I would either have to do meta-programming, or go into mixins and categories. The first option is error-prone, the second very cumbersome. Both are probably inefficient. Comparing the amount of work required to make those unit tests fast with mocking a simple repository interface puts all the other benefits of Grails Active Record into question. My conclusion is to not use GORM if I'm going for anything longer than 300 man-days. Or not to use Grails altogether.

Those conclusions are kind of obvious, I suppose. The reason I don't see them applied too often is that people do not treat tests as normal object-oriented (or functional) code. That, at least, was my excuse for years and years, till I finally found out that this approach ends up in unmaintainable test code, which keeps your production code from ever being refactored as well.

And that's pretty much all of my presentation changes.


Confitura 2013 was a fantastic experience. I am usually scared shitless when giving a talk at a conference. Somehow, maybe because I know a lot of people in the audience personally, I'm less scared here. During the first half of this year, I gave 6 talks, led two workshops, helped organize the Warsaw Java User Group, and wrote a book. Confitura was a great finish for that marathon. Now I need a summer break.

December 16, 2012

Devoxx 2012 review


I'm sitting in a train to Charleroi, looking through a window at the Belgian landscape, street lights flashing by, people coming home from work, getting out for a Friday night party, or having a family dinner. To my left, guys from SoftwareMill are playing cards.
I don't really see them. My mind is busy elsewhere, sorting out and processing the last two days in Antwerp, where 3400 developers from 41 different countries listened to 200 different sessions at Devoxx, AFAIK the biggest Java conference this year.
Some talks were so popular that people were sitting on the stairs.
 
It was an interesting experience, watching the show after the six conferences in Poland I've been to this year. Everything about Devoxx is bigger. The venue is called Kinepolis, and as you'd expect, it's a cinema complex. At the ground level you can find many of the most important vendors in the Java world. SpringSource, Typesafe, JetBrains, JBoss, RedHat, IBM, ZeroTurnaround, 10gen, Atlassian, and many others have their people presenting the latest products, giving away t-shirts and books, or just asking for feedback. Google stands on the opposite side to Oracle, as expected, but you won't see them fighting each other. There are no lawyers and their bullshit in here; there are devs and devs only.
The main hall, where most of the vendors are
For those who needed an extra charge, or a hackathon, long tables with plenty of power outlets were provided.
Grzegorz Duda (33rd) with Stephan Janssen (Devoxx)

A conference day starts very early by my standards (8 am), and so do the keynotes, at 9:30. There are seven large cinema rooms at our disposal, and keynotes are streamed from the room where the whole show takes place to big-ass cinema screens with movie-level sound systems, so you feel like you're right in front of the stage, no matter where you sit.

After the keynotes, normal sessions start, and go on till 7 pm, with one half-hour break for coffee and another hour for lunch. Lunch is rather bland, and the queue is long. My friends, who organize Confitura, Geecon and 33rd, say that you'd be overwhelmed with bad press if you tried something like this in Poland. Perhaps it's the nature of our nation to be always bitching, but fortunately people here seem not to mind, and we certainly don't. With so many things going on, you hardly find any time to even take a piss, so you tend to just swallow the food as fast as possible and get back to talks, vendors and socializing.

There are plenty of ways vendors will try to get your attention for a while. All of them are fun.
Speaking of the last one, I traveled to Antwerp with Grzegorz Duda, but we met half of SoftwareMill at the Modlin airport, and we were meeting more and more Poles on the way. During the conference, there were whiteboards where someone would post a question, and people would mark answers. One of them was “Where're you from?”, and as you can see from this picture, the Polish representation was rather strong.
While you could have all the small talk with interesting people that you would like, socializing is not really something the conference days of Devoxx shine at. First, with so many interesting talks packed in, there is little time for chatter, and second, such a huge crowd makes interaction harder. Just a few weeks before, I had been to TopConf in Tallinn, a conference of perhaps 200 people, but it somehow makes everyone important, brings people together to talk, get to know each other, and leave the secure area of their own groups. Devoxx is more like a rock concert, where you go with your friends, have fun with everybody, but don't get to know too many people really well. And the party on the second day doesn't help much. It's in a club called Noxx, just next to the cinema the conference is in, and I've heard it's the second largest club in Europe. It's a typical 'club' club, with club music, several dance floors, drink bars, and little else. The problem is, with 3400 devs, mostly male, you kind of end up with a large gay party, and the music is way too loud to talk with anyone. You can throw yourself into trance and have really good fun, but don't count on getting to know any of the people.
So if you want interaction, choose BOFs, Labs, or the University day in general, instead of the regular conference days. Or just try the very long tables, where people could get their laptops and phones charged, do some work, or show off some cool stuff to their friends.
RFID bracelet. Surprisingly durable, survived three showers with ease.

Back to the talks. After every one, on your way out, you could vote thumbs-up or thumbs-down with the RFID bracelet you were ringed with at sign-up. The votes would get pushed to a node.js server running on a Raspberry Pi, and you'd be able to see in real time which talks were the best. At least in theory, because the system didn't survive the real-life experiment: people were either confusing up and down, not leaving the room while waiting for the next talk (and thus not voting), or voting for the wrong lecture.

The Raspberry Pi was everywhere at the conference. Guys from Oracle showed us their new JVM built specifically for this device, to make use of its floating-point hardware, and ran the latest JavaFX 2.0 on it (which finally doesn't suck anymore, though it doesn't make your jaw drop either), but you'll have to wait a bit for it, since it's not yet public. Inspired by the show, I've ordered one, together with nine other guys here at TouK. Let's see what a cluster of Raspberries can do for you.

I cannot really describe all the talks. I've seen just 1/7 of the conference, and I have a notebook full of notes, but most of them are just 'a-ha' kinds of remarks, from when I heard something really useful: like how to tell the difference between your HTTP 403 and some new badly-configured firewall on the way, when you are building fully REST-to-the-bones services with all the HTTP statuses and so on; or why there will be multiple inheritance of behavior in Java 8, and why it's not dangerous compared to multiple inheritance of state; or how interfaces may have a default implementation in Java 8+, and why it's such a smart solution to the problem of backward compatibility and the need for a completely fresh collections API. All the Devoxx talks should be available soon on Parleys.com (which was rewritten in HTML5, by the way), and though they charge for viewing, I think it's worth your bucks.

And since we're down to money, here is a question I'd like to answer:

Assuming you have limited resources and wonderful, local, much cheaper conferences, like we do in Poland, is it better to go to, say, three local conferences like GeeCon, 33rd, and JDD, or to go to Devoxx?

The answer depends on what you expect from a conference.

If you want knowledge, GeeCon + 33rd + JDD will probably be better, just because while Devoxx has 7 tracks, you can only be in one at a time.

If you want inspiration and to feed on the passion of others, GeeCon + 33rd + JDD will be better, just because those are quite nicely distributed over the whole year (especially if you include free Confitura and Warsjava in the line-up).

If, however, time is your problem, and you have only enough to go to one conference that year, then Devoxx has a slight advantage in my opinion. Not a big one, but still.

On our way home.

The plane is circling over the Modlin airport, as I'm talking with Paweł Wrzeszcz about how scared we both are of giving public talks. Outside the window a thick fog surrounds us so closely that I am unable to see the tip of the aircraft's wing. The captain explains over the intercom that Modlin lacks the necessary radio hardware to guide us safely down, so we are heading to Chopin Airport in Warsaw. People applaud loudly, as most, if not all of them, are heading to or through Warsaw anyway, and fly from Modlin only because it's cheaper.

We are slowly moving over the city.

The plane is losing altitude, and I confess to Paweł that I'm just as afraid of flying as I am of talking at conferences. This isn't a fear he shares with me, but I can see he's a bit uneasy too. The fog outside the window is milky white, and I can't even see the middle of the wing through it now. The plane starts vibrating; the last corrections are being made. We are flying by the radio signal, which comes from some hardware on the airfield, for which some programmer wrote software some time ago.

And then we hit the ground.

And we stop.

People clap their hands with relief. Here's the standing ovation, here's the big thank you for the captain. And the crew.

And, perhaps, even though most people don't know it, perhaps also for the programmer who wrote that marvelous piece of software, which allowed us to land safely.

That's what I choose to think.

October 15, 2012

Friends with benefits

Back when I was at my first university, working on my B.Sc., it was obvious for us students to always try to get into any possible open IT event in town. We were hungry for knowledge, for experience, anxiously looking at the future.

One of the really bad professors out there told us in the first lecture that whatever we learned would be outdated the moment we learned it. He said that we wouldn't be able to keep up the pace, that we would burn out, that we would be replaced by newcomers before we even learned to do the job. And that there was no hope for us.

It wasn't a good university; it was a young one. A young university in a small town, with very old professors. Some of those guys were there because no one else would take them. So yeah, we were anxious. Anxious that we were not getting a good education, anxious that we would not find a good job afterwards, anxious that we would burn out before we knew it.

But we were also very young, young enough to believe we wouldn't give up without a fight. And so we were learning on our own, reading books, coding, getting together and sharing what little knowledge we had. And we applied for every interesting position around, to get as much experience as we could.

Not that we had many options. As I said, it was a small town.

After my B.Sc., I moved to Warsaw. I started my M.Sc. at another university, which turned out to be no better, but it didn't matter anymore. I had a good job as a programmer, with fantastic, smart people. In half a year, I learned more than in five years at all the universities.

Then I moved to another company. Changed the technology stack completely. Started anew, and had great time learning from all those smart people around me, and teaching them whatever I could.

And there was not a single event I would attend, nor did I hear of anybody else doing so. It seemed like a thing of the past. We had so much knowledge to learn inside the company; why move, why go anywhere at all?

And then I switched my technology stack again. I joined TouK. This time, however, it seemed like I could spread a lot more knowledge about TDD, OOP and good practices than before. But to do that, I had to fill the holes in my knowledge really fast. So I got myself some books, some RSS feeds, some tutorials. One day, my boss sent us an invitation to a Warsaw Java User Group meeting. Some guys were going. I signed up to the mailing list, and off I went.

There I was, back at the university, getting to know other hairy guys at WJUG meetings, learning about their craft, their interests, and their passions. Encouraged by someone, I went to a conference. Then another. I started attending conferences on a regular basis. Actually, I started attending all the Java conferences in Poland. And there are quite a few, I tell you.

Something strange happened. Apart from the technologies, personal experiences, tricks and traps, I learned something completely different. I knew I was getting tired by everyday work, that my energy was much higher right after holidays, and that I'd be much less efficient just before one. I knew I could find myself burned out one day. I heard a story here and there, in the kitchen, over the coffee table. Someone with a sad look would mention that he had no fun anymore. The job had become tiresome. Someone would talk about buying a farm. Or a workshop. Getting out of this line of work.

But at those WJUG meetings, at those conferences, my energy was replenished. The enthusiasm and passion for cool technology, hanging right there in the air, was contagious. The people, with eyes burning bright for great things they could learn and bring back home... I could feel their hunger for knowledge. And I was back in my student years again, feeling everything was possible. Not anxious about my future this time.

That was great.

I wanted to help those who make it happen. To somehow thank them. And I could do that in two ways: either by helping organize, or by sharing what I knew. I prepared a small presentation about craftsmanship for WJUG, and got positive feedback. Then I had another, about Spring Security, and another, and so on. Then I answered a call for papers, and got myself speaking at conferences. And that was great too. I joined another great group, Agile Warsaw. I even organized a weekly workshop, here at TouK, to get all the shy people to share, and learn how good it feels.

I have to confess, though: speaking at conferences is a terrible strain for an anti-social, hairy guy like me. There is so much stress involved. If I wanted to be in the spotlight, I wouldn't have become a programmer in the first place. No one digs inside computers because of love for humanity, I suppose.

Local group meetings, like WJUG, are a completely different story, however. Those are semi-formal, with 50-150 people in an old university assembly hall, half of them hairy sociopaths in Amiga Forever T-shirts or alike, with beards that would make Richard Stallman proud, swearing like Linus Torvalds when he was thanking Nvidia, and making jokes that would fit right into bash.org, xkcd or the Jargon File. It feels good to be around them. It feels like home.

This is as close to a demoscene-kind-of-feeling as my old ass gets. Preparing a talk for them, apart from motivating me to dig into the subject thoroughly, is even better.

One of the unforeseen consequences of getting out to conferences and meetings is that you get a lot of cool gadgets. A hat from Oracle, a ninja coder from Amazon, an energy drink from Microsoft with Linux/PHP/Ruby all over it, a nerd pistol, tons of T-shirts. I haven't had to buy a T-shirt for years. Sometimes you even get a licence or a ticket.


Thanks to WJUG, I have IntelliJ IDEA and JProfiler, both of which are extremely good and handy pieces of software.

And yesterday... well yesterday, I got a ticket for Devoxx in Belgium, saving me a few hundred Euros.

So in case you haven't already, get your lazy ass out and join a local technology group, go to conferences, write a blog, share with people. You'll be surprised by the unforeseen benefits.

Even if you are a hairy sociopath, like me.

September 17, 2012

Test Driven Traps, part 2

The Story of a Unit in Unit Tests

In the previous part of this article, you saw some bad, though popular, test samples. But I'm not a professional critic (also known as a troll, or a hater), here to grumble without having anything constructive to say. Years of TDD have taught me more than just how badly things can go. There are many simple but effective tricks that can make your test life much easier.

Imagine this: you have a booking system for a small conference room in a small company. For some strange reason, it has to deal with off-line booking. People post their booking requests to some frontend, and once a week you get a text file with the working hours of the company and all the bookings (for what day, for how long, by whom, submitted at what point in time) in random order. Your system should produce a calendar for the room, according to some business rules (first come, first served, only in office business hours, that sort of thing).

As part of the analysis, we have clearly defined input data and expected outcomes, with examples. A beautiful case for TDD, really. Something that sadly never happens in real life.

Our sample test data looks like this:

class TestData {
    static final String INPUT_FIRST_LINE = "0900 1730\n";
    static final String FIRST_BOOKING    = "2011-03-17 10:17:06 EMP001\n" +
                                           "2011-03-21 09:00 2\n";
    static final String SECOND_BOOKING   = "2011-03-16 12:34:56 EMP002\n" +
                                           "2011-03-21 09:00 2\n";
    static final String THIRD_BOOKING    = "2011-03-16 09:28:23 EMP003\n" +
                                           "2011-03-22 14:00 2\n";
    static final String FOURTH_BOOKING   = "2011-03-17 10:17:06 EMP004\n" +
                                           "2011-03-22 16:00 1\n";
    static final String FIFTH_BOOKING    = "2011-03-15 17:29:12 EMP005\n" +
                                           "2011-03-21 16:00 3";

    static final String INPUT_BOOKING_LINES =
                                            FIRST_BOOKING +
                                            SECOND_BOOKING +
                                            THIRD_BOOKING +
                                            FOURTH_BOOKING +
                                            FIFTH_BOOKING;

    static final String CORRECT_INPUT = INPUT_FIRST_LINE + INPUT_BOOKING_LINES;

    static final String CORRECT_OUTPUT = "2011-03-21\n" +
                                         "09:00 11:00 EMP002\n" +
                                         "2011-03-22\n" +
                                         "14:00 16:00 EMP003\n" +
                                         "16:00 17:00 EMP004\n" +
                                         "";
}

So now we start with a positive test:

BookingCalendarGenerator bookingCalendarGenerator =  new BookingCalendarGenerator();

@Test
public void shouldPrepareBookingCalendar() {
    //when
    String calendar = bookingCalendarGenerator.generate(TestData.CORRECT_INPUT);

    //then
    assertEquals(TestData.CORRECT_OUTPUT, calendar);
}

It looks like we have designed a BookingCalendarGenerator with a “generate” method. Fair enough. Let's add some more tests. Tests for the business rules. We get something like this:

    @Test
    public void noPartOfMeetingMayFallOutsideOfficeHours() {
        //given
        String tooEarlyBooking = "2011-03-16 12:34:56 EMP002\n" +
                                 "2011-03-21 06:00 2\n";

        String tooLateBooking = "2011-03-16 12:34:56 EMP002\n" +
                                "2011-03-21 20:00 2\n";

        //when
        String calendar = bookingCalendarGenerator.generate(TestData.INPUT_FIRST_LINE + tooEarlyBooking + tooLateBooking);

        //then
        assertTrue(calendar.isEmpty());
    }

    @Test
    public void meetingsMayNotOverlap() {
        //given
        String firstMeeting = "2011-03-10 12:34:56 EMP002\n" +
                              "2011-03-21 16:00 1\n";

        String secondMeeting = "2011-03-16 12:34:56 EMP002\n" +
                               "2011-03-21 15:00 2\n";

        //when
        String calendar = bookingCalendarGenerator.generate(TestData.INPUT_FIRST_LINE + firstMeeting + secondMeeting);

        //then
        assertEquals("2011-03-21\n" +
                     "16:00 17:00 EMP002\n", calendar);
    }

    @Test
    public void bookingsMustBeProcessedInSubmitOrder() {
        //given
        String firstMeeting = "2011-03-17 12:34:56 EMP002\n" +
                              "2011-03-21 16:00 1\n";

        String secondMeeting = "2011-03-16 12:34:56 EMP002\n" +
                               "2011-03-21 15:00 2\n";

        //when
        String calendar = bookingCalendarGenerator.generate(TestData.INPUT_FIRST_LINE + firstMeeting + secondMeeting);

        //then
        assertEquals("2011-03-21\n15:00 17:00 EMP002\n", calendar);
    }

    @Test
    public void orderingOfBookingSubmissionShouldNotAffectOutcome() {
        //given
        List<String> shuffledBookings = newArrayList(TestData.FIRST_BOOKING, TestData.SECOND_BOOKING,
                TestData.THIRD_BOOKING, TestData.FOURTH_BOOKING, TestData.FIFTH_BOOKING);
        shuffle(shuffledBookings);
        String inputBookingLines = Joiner.on("\n").join(shuffledBookings);

        //when
        String calendar = bookingCalendarGenerator.generate(TestData.INPUT_FIRST_LINE + inputBookingLines);

        //then
        assertEquals(TestData.CORRECT_OUTPUT, calendar);
    }   

That's pretty much all. But what if we get some rubbish as the input? Or an empty string? Let's design for that:

    @Test(expected = IllegalArgumentException.class)
    public void rubbishInputDataShouldEndWithException() {
        //when
        String calendar = bookingCalendarGenerator.generate("rubbish");

        //then exception is thrown
    }

    @Test(expected = IllegalArgumentException.class)
    public void emptyInputDataShouldEndWithException() {
        //when
        String calendar = bookingCalendarGenerator.generate("");

        //then exception is thrown
    }

IllegalArgumentException is fair enough. We don't need to handle it in any fancier way. We are done for now. Let's finally write the class under test: BookingCalendarGenerator.

And so we do. And it turns out that the whole thing is a little too big for a single method. So we use the power of the Extract Method pattern. We group code fragments into different methods. We group methods and the data they operate on into classes. We use the power of object-oriented programming, we use the Single Responsibility Principle, we use composition (or decomposition, to be precise), and we end up with a package like this:



We have one public class, and several package-scope classes. Those package scope classes clearly belong to the public one. Here's a class diagram for clarity:



Those aren't stupid data objects. Those are full-fledged classes. With behavior, responsibility, encapsulation. And here's a thought that may come to our Test Driven minds: we have no tests for those classes. We have tests only for the public class. That's bad, right? Having no tests must be bad. Very bad. Right?

Wrong.

We do have tests. We fire up our code coverage tool and we see: 100% of methods and classes, 95% of lines. Not bad (I'll get to that 5% of uncertainty in the next post).



But we have only a single unit test class. Is that good?

Well, let me put some emphasis on this, to point the answer out:

It's a UNIT test. It's called a UNIT test for a reason!


The unit does not have to be a single class. The unit does not have to be a single package. The unit is up to you to decide. It's a general name, because your sanity, your common sense, should tell you where to stop.

So we have six classes as a unit; what's the big deal? What if somebody wants to use one of those classes apart from the rest? He would have no tests for it, right?

Wrong. Those classes are package-scope, apart from the one that's actually called in the test. This package-scope thing tells you: “Back off. Don't touch me, I belong to this package. Don't try to use me separately, I was designed to be here!”.

So yeah, if a programmer takes one of those out, or makes it public, he would probably know that all the guarantees are voided. Write your own tests, man.
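In code, the shape described above could look like this sketch. The collaborator names are mine (the article doesn't list the real ones); the point is that only the generator is public, and the unit test goes through it alone:

```java
import java.util.Arrays;
import java.util.List;

// The only public class -- the unit's front door, and the only thing
// the unit test talks to.
public class BookingCalendarGenerator {
    public String generate(String input) {
        // hypothetical decomposition: package-scope collaborators do the work
        List<String> lines = new InputParser().parse(input);
        return new CalendarPrinter().print(lines);
    }

    public static void main(String[] args) {
        String out = new BookingCalendarGenerator().generate("a\nb");
        if (!"a\nb".equals(out)) throw new AssertionError(out);
    }
}

// Package-scope: "Back off, I belong to this package."
class InputParser {
    List<String> parse(String input) {
        return Arrays.asList(input.split("\n"));
    }
}

// Also package-scope; covered only through the public class's tests.
class CalendarPrinter {
    String print(List<String> lines) {
        return String.join("\n", lines);
    }
}
```

The coverage tool still reports all three classes as covered, because the single test exercises them through the public entry point.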

What if somebody wants to add some behavior to one of those classes, I've been asked. How would he know he's not breaking something?

Well, he would start with a test, right? It's TDD, right? If you have a change of requirements, you code this change as a test, and then, and only then, you start messing with the code. So you are safe and secure.

I see people writing a test per class blindly, without giving it any thought, and it makes me cry. I've been doing a lot of pair programming lately, and you know what I've found? Java programmers in general do not use package-scope. Java programmers in general do not know that protected means: for me, all my descendants, and EVERYONE in the same package. That's right: protected is more than package-scope, not one bit less. So if Java programmers do not know what package-scope really is, and that, contrary to Groovy, is the default, how could they understand what a unit is?
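The modifier rules are worth a concrete reminder. This sketch just restates them in code; the interesting violations are, by definition, the ones that wouldn't compile, so they live in the comments:

```java
public class AccessSketch {
    public static int run() { return new SamePackageNeighbor().peek(new Base()); }

    public static void main(String[] args) {
        if (run() != 3) throw new AssertionError();
    }
}

class Base {
    int packageScoped() { return 1; }      // default access: this package, and nowhere else
    protected int guarded() { return 2; }  // this package PLUS all subclasses,
                                           // even subclasses in other packages
}

class SamePackageNeighbor {
    int peek(Base base) {
        // both calls compile, because we are in the same package
        return base.packageScoped() + base.guarded();
    }
}

// A subclass of Base in ANOTHER package could still call guarded(),
// but a call to packageScoped() would not compile there. So protected
// grants strictly more access than the package-scope default.
```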

How high can I get?

Now here's an interesting thought: if we can have a single test for a package, we could have a single test for a package tree. You know, something like this:



We all know that packages in Java are not really tree-like, that the only thing they have in common with the directory structure is a very old convention, and we know that the directory structure is there only to solve the collision-of-names problem. But nevertheless, we tend to use packages as if the name.after.the.dot had some meaning. As if we could hide one package inside another. Or build layers of lasagne with them.

So is it O.K. to have a single test class for a tree of packages?

Yes it is.

But if so, where is the end of that? Can we go all the way up the package tree, to the entry point of our application? Those... those would be integration tests, or functional tests, perhaps. Could we do that? Would that be good?

The answer is: it would. In a perfect world, it would be just fine. In our shitty, hanging-on-the-edge-of-a-knife world, it would be insane. Why? Because functional, end-to-end tests are slow. So slow. So horribly slow that it makes you want to throw them away and go some place where you would not always have to be waiting for something. A place of total creativity, constant feedback, and lightning-fast safety.

And you're back to unit testing.

There are even more reasons. One being that it's hard to test all the flows of the application end-to-end. You should probably do that for all the major flows, but what about errors, bad connections, all those tricky logic parts that may throw up at one point or another? No, sometimes it would just be too hard to set up the environment for an integration test like that, so you end up testing it with unit tests anyway.

The second reason is that though functional tests do not pour concrete over your code, and do not inhibit your creativity by repeating your algorithm in the test case, they also give no safety for refactoring. When you had a package with a single public class, it was quite obvious what someone could safely do, and what he could not. When you have something enclosed in a library, or a plugin, it's still obvious. But if you have thousands of public classes, and you are implementing a new feature, you are probably going to use some of them, and you would like to know that they are fine.

So no, in our world, it doesn't make sense to go with functional tests only. Sorry. But it also doesn't make sense to create a test per class. It's called a UNIT test for a reason. Use that.

September 16, 2012

Test Driven Traps, part 1

Have you ever been in a situation where a simple change of code broke a few hundred tests? Have you ever had the feeling that tests slow you down, inhibit your creativity, make you afraid to change the code? If you have, it means you've entered the Dungeon-of-very-bad-tests, the world of things that should not be.

I've been there. I've built one myself. And it collapsed, killing me in the process. I've learned my lesson. So here is the story of a dead man. Learn from my mistakes or be doomed to repeat them.

The story


Test Driven Development, like all good games in the world, is simple to learn, hard to master. I started in 2005, when a brilliant guy named Piotr Szarwas gave me the book “Test Driven Development: By Example” (Kent Beck), and one task: creating a framework.

These were the old times, when the technology we were using had no frameworks at all, and we wanted a cool one, like Spring, with Inversion of Control, Object-Relational Mapping, Model-View-Controller and all the good things we knew about. And so we created a framework. Then we built a Content Management System on top of it. Then we created a bunch of dedicated applications for different clients, Internet shops and what-not, on top of those two. We were doing well. We had 3000+ tests for the framework, 3000+ tests for the CMS, and another few thousand for every dedicated application. We were looking at our work, and we were happy, safe, secure. These were good times.

And then, as our code base grew, we came to the point where the simple anemic model we had was not good enough anymore. I had not yet read the other important book of that time, “Domain Driven Design”, you see. I didn't know yet that you can only get so far with an anemic model.

But we were safe. We had tons of tests. We could change anything.

Or so I thought.

I spent a week trying to introduce some changes into the architecture. Simple things, really: moving methods around, switching collaborators, such things. Only to be overwhelmed by the number of tests I had to fix. That was TDD: I started my change by writing a test, and when I was finally done with the code under test, I'd find another few hundred tests completely broken by my change. And when I got those fixed, introducing some more changes in the process, I'd find another few thousand broken. It was a butterfly effect, a chain reaction caused by a very small change.

It took me a week to figure out that I was not even half done here. The refactoring had no visible end. And at no point was my code base stable, deployment-ready. I had my branch in the repository, one I renamed "Lasciate ogne speranza, voi ch'intrate".

We had tons and tons of tests. Of very bad tests. Tests that would pour concrete over our code, so that we could do nothing.



The only real options were either to leave it be, or to delete all the tests and write everything from scratch again. I didn't want to work with the code if we went for the first option, and the management could not find a financial rationale for the second. So I quit.

That was the Dungeon I built, only to find myself defeated by its monsters.

I went back to the book, and found everything I did wrong in there. Outlined. Marked out. How could I have skipped that? How could I not have noticed? It turns out that sometimes you need to be of age and experience to truly understand the things you learn.

Even the best of tools, when used poorly, can turn against you. And the easier the tool seems to use, the easier it is to fall into the trap of I-know-how-it-works thinking. And then BAM! You're gone.


The truth


Test Driven Development and tests are two completely different things. Tests are only a byproduct of TDD, nothing more. So what is the point of TDD? What does TDD bring? Why do we do TDD?

Because of three, and only three, reasons.

1. To find the best design, by putting ourselves into the user's shoes.

By starting with “how do I want to use it” thinking, we discover the most useful and friendly design. Always good, quite often that's the best design out there. Otherwise, what we get is this:



And you don't want that.

2. To manage our fear.

It takes balls to make a sweeping change in a large code base without tests, and to say "it's done" without having introduced bugs in the process, doesn't it? Well, the truth is, if you say "it's done", most of the time you are either ignorant, reckless, or just plain stupid. It's like with concurrency: everybody knows it, nobody can do it well.

Smart people are scared of such changes. Unless they have good tests, with high code coverage.

TDD lets us manage our fear by giving us proof that things work as they should. TDD gives us safety.

3. To have fast feedback.

How long can you code, without running the app? How long can you code without knowing whether your code works as you think it should?

Feedback from tests is important. Less so for frontend programming, where you can just run the shit up and see for yourself. More so for coding in the backend. Even more so if your technology stack requires compilation, deployment, and starting up.

Time is money, and I'd rather earn it than wait for the deployment and click through my changes every time I make them.

And that's it. There are no other reasons for TDD whatsoever. We want Good Design, Safety, and Feedback. Good tests are those which give us exactly that.

Bad tests?

All the other tests are bad.

The bad practice


So what does a typical bad test look like? The one I see over and over, in almost every project, written by somebody who has yet to learn how NOT to build an ugly dungeon, how not to pour concrete over the code base. The one I'd have written myself in 2005.

This will be a Spock sample, written in Groovy, testing a Grails controller. But don't worry if you don't know those technologies. I bet you'll understand what's going on in there without problems. Yes, it's that simple. I'll explain all the not-so-obvious parts.

def "should show outlet"() {
  given:
    def outlet = OutletFactory.createAndSaveOutlet(merchant: merchant)
    injectParamsToController(id: outlet.id)
  when:
    controller.show()
  then:
    response.redirectedUrl == null
}

So we have a controller. It's an outlet controller. And we have a test. What's wrong with this test?

The name of the test is “should show outlet”. What should a test with such a name check? Whether we show the outlet, right? And what does it check? Whether we are redirected. Brilliant? Useless.

It's simple, but I see it all around. People forget that we need to:

VERIFY THE RIGHT THING

I bet that test was written after the code. Not in test-first fashion.

But verifying the right thing is not enough. Let's have another example. Same controller, different expectation. The name is: "should create outlet insert command with valid params with new account"

Quite complex, isn't it? If a name needs an explanation, the name is wrong. But you don't know the domain, so let me shed some light on it: when we give the controller good parameters, we want it to create a new OutletInsertCommand, and the account of that command should be new.

The name doesn't say what 'new' is, but we should be able to see it in the code.

Have a look at the test:

def "should create outlet insert command with valid params with new account"() {
  given:
    def defaultParams = OutletFactory.validOutletParams
    defaultParams.remove('mobileMoneyAccountNumber')
    defaultParams.remove('accountType')
    defaultParams.put('merchant.id', merchant.id)
    controller.params.putAll(defaultParams)
  when:
    controller.save()
  then:
    1 * securityServiceMock.getCurrentlyLoggedUser() >> user
    1 * commandNotificationServiceMock.notifyAccepters(_)
    0 * _._
    Outlet.count() == 0
    OutletInsertCommand.count() == 1
    def savedCommand = OutletInsertCommand.get(1)
    savedCommand.mobileMoneyAccountNumber == '1000000000000'
    savedCommand.accountType == CyclosAccountType.NOT_AGENT
    controller.flash.message != null
    response.redirectedUrl == '/outlet/list'
}

If you are new to Spock: n * mock.whatever() means that the method "whatever" of the mock object should be called exactly n times. No more, no less. The underscore "_" means "everything" or "anything". And the >> sign instructs the test framework to return the right-hand side value when the method is called.

So what's wrong with this test? Pretty much everything. Let's go from the start of the "then" part, mercifully skipping the overly verbose set-up in the "given".

1 * securityServiceMock.getCurrentlyLoggedUser() >> user

The first line verifies whether some security service was asked for a logged-in user, and returns the user. And it was asked EXACTLY once. No more, no less.

Wait, what? How come we have a security service in here? The name of the test doesn't say anything about security or users, why do we check it?

Well, that's the first mistake. This part is not what we want to verify. It is probably required by the controller, but that only means it belongs in the "given". And it should not verify that it's called "exactly once". It's a stub, for God's sake. The user is either logged in or not. There is no sense in making him "logged in, but you can ask only once".
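A minimal sketch of the fix, reusing the mock and factory names from the test above: the stubbing moves to the "given" block with no cardinality at all, and the "then" keeps only what the name of the test promises.

```groovy
def "should create outlet insert command with valid params"() {
  given:
    // a stub, not a verification: the user is simply logged in,
    // no matter how many times the controller asks
    securityServiceMock.getCurrentlyLoggedUser() >> user
    controller.params.putAll(OutletFactory.validOutletParams)
  when:
    controller.save()
  then:
    // only the outcome the test name talks about
    OutletInsertCommand.count() == 1
}
```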

Then, there is the second line.

1 * commandNotificationServiceMock.notifyAccepters(_)

It verifies that some notification service is called exactly once. And that may be OK, the business logic may require it, but then... why is it not stated clearly in the name of the test? Ah, I know, the name would be too long. Well, that's also a hint. You need another test, something like "should notify about newly created outlet insert command".

And then, it's the third line.

0 * _._

My favorite one. If the code is Han Solo, this line is Jabba the Hutt. It wants Han Solo frozen in solid concrete. Or dead. Or both.

This line, if you haven't deduced it yet, is "You shall not make any other interactions with any mock, or stubs, or anything, Amen!".

That's the most stupid thing I've seen in a while. Why would a sane programmer ever put it here? That's beyond my imagination.

No, it isn't. Been there, done that. The reason a programmer would use such a thing is to make sure that he has covered all the interactions. That he hasn't forgotten about anything. Tests are good, so what's wrong with having more of a good thing?

He forgot about sanity. That line is stupid, and it will have its vengeance. It will bite you in the ass some day. And while each bite may be small, because there are hundreds of lines like this, some day you are going to get bitten pretty badly. You may well not survive.

And then, another line.

Outlet.count() == 0

This verifies that we don't have any outlets in the database. Do you know why? You don't. I do. I do, because I know the business logic of this domain. You don't, because this test sucks at informing you of what it should.

Then there is the part that actually makes sense.

    OutletInsertCommand.count() == 1
    def savedCommand = OutletInsertCommand.get(1)
    savedCommand.mobileMoneyAccountNumber == '1000000000000'
    savedCommand.accountType == CyclosAccountType.NOT_AGENT

We expect the object we've created to be in the database, and then we verify whether its account is "new". And we know that "new" means a specific account number and type. Though it screams to be extracted into a separate method.
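The extraction might look like this; `hasNewAccount` is a hypothetical helper name, and the magic values come straight from the test above, now confined to one well-named place.

```groovy
// Hypothetical helper, extracted so the test reads as the business rule
// ("the command has a new account") instead of a pile of magic values.
private boolean hasNewAccount(OutletInsertCommand command) {
    command.mobileMoneyAccountNumber == '1000000000000' &&
        command.accountType == CyclosAccountType.NOT_AGENT
}
```

In the "then" block, the four lines collapse into a single, readable expectation: `hasNewAccount(OutletInsertCommand.get(1))`.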

And then...

    controller.flash.message != null
    response.redirectedUrl == '/outlet/list'

Then we check that some flash message is set. And a redirection. And I ask God: why the hell are we testing this? Not because the name of the test says so, that's for sure. The truth is that, looking at the test, I can recreate the method under test, line by line.

Isn't it brilliant? This test represents every single line of a not-so-simple method. But try to change the method, try to change a single line, and you have a big chance of blowing this thing up. And when those kinds of tests come in the hundreds, you have concrete all over your code. You'll be able to refactor nothing.

So here's another lesson. It's not enough to verify the right thing. You need to

VERIFY ONLY THE RIGHT THING.

Never ever verify the algorithm of a method step by step. Verify the outcomes of the algorithm. You should be free to change the method, as long as the outcome, the real thing you expect, does not change.

Imagine a sorting problem. Would you verify its internal algorithm? What for? It's got to work, and it's got to work well. Remember, you want good design and safety. Apart from that, it should be free to change. Your tests should not stand in the way.
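A sketch in plain Groovy of what outcome-based verification means here: these assertions hold for any correct sort, so you can swap the implementation underneath (quicksort, mergesort, whatever) without breaking a single test.

```groovy
def input = [5, 3, 8, 1, 3]

// sort(false) returns a sorted copy without mutating the input list
def result = input.sort(false)

// verify the outcomes, not the steps of the algorithm:
assert result == [1, 3, 3, 5, 8]        // elements come out in order
assert result.size() == input.size()    // nothing lost, nothing invented
assert input == [5, 3, 8, 1, 3]         // the original is untouched
```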

Now for another horrible example.

@Unroll("test merchant constraints field #field for #error")
def "test merchant all constraints"() {
  when:
    def obj = new Merchant((field): val)

  then:
    validateConstraints(obj, field, error)

  where:
    field                     | val                                    | error
    'name'                    | null                                   | 'nullable'
    'name'                    | ''                                     | 'blank'
    'name'                    | 'ABC'                                  | 'valid'
    'contactInfo'             | null                                   | 'nullable'
    'contactInfo'             | new ContactInfo()                      | 'validator'
    'contactInfo'             | ContactInfoFactory.createContactInfo() | 'valid'
    'businessSegment'         | null                                   | 'nullable'
    'businessSegment'         | new MerchantBusinessSegment()          | 'valid'
    'finacleAccountNumber'    | null                                   | 'nullable'
    'finacleAccountNumber'    | ''                                     | 'blank'
    'finacleAccountNumber'    | 'ABC'                                  | 'valid'
    'principalContactPerson'  | null                                   | 'nullable'
    'principalContactPerson'  | ''                                     | 'blank'
    'principalContactPerson'  | 'ABC'                                  | 'valid'
    'principalContactInfo'    | null                                   | 'nullable'
    'principalContactInfo'    | new ContactInfo()                      | 'validator'
    'principalContactInfo'    | ContactInfoFactory.createContactInfo() | 'valid'
    'feeCalculator'           | null                                   | 'nullable'
    'feeCalculator'           | new FixedFeeCalculator(value: 0)       | 'valid'
    'chain'                   | null                                   | 'nullable'
    'chain'                   | new Chain()                            | 'valid'
    'customerWhiteListEnable' | null                                   | 'nullable'
    'customerWhiteListEnable' | true                                   | 'valid'
    'enabled'                 | null                                   | 'nullable'
    'enabled'                 | true                                   | 'valid'
}

Do you understand what's going on? If you haven't seen it before, you may very well not. The "where" part is a beautiful Spock solution for parametrized tests. The headers of those columns are the names of variables used BEFORE, in the first line. It's a sort of declaration after the usage. The test is going to be fired many times, once for each row in the "where" part. And it's all possible thanks to Groovy's Abstract Syntax Tree transformations. We are talking about interpreting and changing the code during compilation. Cool stuff.

So what is this test doing?

Nothing.

Let me show you the code under test.

static constraints = {
  name(blank: false)
  contactInfo(nullable: false, validator: { it?.validate() })
  businessSegment(nullable: false)
  finacleAccountNumber(blank: false)
  principalContactPerson(blank: false)
  principalContactInfo(nullable: false, validator: { it?.validate() })
  feeCalculator(nullable: false)
  customerWhiteListEnable(nullable: false)
}

This static closure tells Grails what kind of validation we expect at the object and database level. In Java, these would most probably be annotations.

And you do not test annotations. You also do not test static fields. Or closures without any sensible code, without any behavior. And you don't test whether the framework below (Grails/GORM in here) works the way it works.

Oh, you may test it the first time you use it. Just because you want to know how it works, and whether it works at all. You want to be safe, after all. But then you should probably delete the test, and certainly not repeat it for every single domain class out there.

This test doesn't even verify that, by the way. Because it's a unit test, working on a mock of a database. It's not testing the real GORM (Grails Object Relational Mapping, an adapter on top of Hibernate). It's testing a mock of the real GORM.

Yeah, it's that stupid.

So if TDD gives us safety, design and feedback, what does this test provide? Absolutely nothing. So why did the programmer put it here? Because his brain says: tests are good. More tests are better.

Well, I've got news for you. Every single test which does not give us safety or good design is bad. Period. Those which provide only feedback should be thrown away the moment you stop refactoring the code under test.

So here's my lesson number three:

PROVIDE SAFETY AND GOOD DESIGN, OR BE GONE.

That was the example of things gone wrong. What should we do about it?

The answer: delete it.

But I have yet to see a programmer who removes his tests. Even ones as shitty as this. We feel very personal about our code, I guess. So in case you are hesitating, let me remind you what Kent Beck wrote in his book about TDD:

The first criterion for your tests is confidence. Never delete a test if it reduces your confidence in the behavior of the system.

The second criterion is communication. If you have two tests that exercise the same path through the code, but they speak to different scenarios for a reader, leave them alone.

[Kent Beck, Test Driven Development: by Example]

Now you know, it's safe to delete it.

So much for today. I have some good examples to show, some more stories to tell, so stay tuned for part 2.