Cassandra Teams

In Greek mythology, Cassandra was cursed: her prophecies would be right, but nobody would believe her. Unfortunately, software teams are sometimes in the same position: they identify an upcoming problem, but can’t convince others that it needs to be addressed.

The following examples are from three different teams.

Team A: Hitting the Disk Space Wall

Before scalable solutions were common, this team supported a database that required a fair bit of space. The disk was big enough at first, but the database kept growing and was headed towards filling it.

The team reported this through normal channels: “Room for 8 months growth, but the ops team needs a 3-month lead time to acquire, harden, and install the needed space.” This company used “stoplight” reports – so it started green, and over time went yellow, then red.
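The arithmetic behind that report is simple enough to sketch. The Python below is a hypothetical illustration, not the team's actual tooling: the function names and the yellow/red thresholds are assumptions, and only the 8-month runway and 3-month lead time come from the report itself.

```python
def months_until_decision_deadline(runway_months, lead_time_months):
    """Months left before the upgrade must be approved so that it
    lands before the disk fills (runway minus ops lead time)."""
    return runway_months - lead_time_months


def stoplight(runway_months, lead_time_months, warning_margin_months=2):
    """Map remaining slack to a stoplight color.

    The thresholds are assumed for illustration: red once there is no
    slack left, yellow within `warning_margin_months` of the deadline.
    """
    slack = months_until_decision_deadline(runway_months, lead_time_months)
    if slack <= 0:
        return "red"
    if slack <= warning_margin_months:
        return "yellow"
    return "green"


if __name__ == "__main__":
    # Figures from the team's first report: 8 months of room, 3-month lead time.
    print(months_until_decision_deadline(8, 3))  # 5
    print(stoplight(8, 3))  # green
    print(stoplight(5, 3))  # yellow
    print(stoplight(3, 3))  # red
```

Under these assumptions, the real deadline was about five months out, not eight: once the runway shrinks to the lead time, even an immediate sign-off arrives too late.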

A manager two or three levels up would have to authorize the upgrade, but the manager ignored the reports and would not sign off.

The team’s reports became more and more urgent: “We’re running out of time to upgrade” then “The disk is almost full and we still haven’t upgraded – we expect to run out this month.”

The disk got full. The manager finally signed off, begged for emergency support from the ops group, and got the system reactivated. The team did a bit of extra work and recovered missing transactions.

Why?

Why did this have to happen? I don’t know. It would have taken at most half an hour for the manager to address the problem early on, but he waited until it was a crisis. A $100 problem became a $20K problem.

I can speculate:

  • The manager [at least felt they] had more important things to deal with
  • The manager didn’t believe the growth rate
  • The team didn’t communicate clearly (though I find this one hard to credit – red lights were usually escalated)
  • The manager wanted a crisis
  • …?

But speculation is just that. I can’t judge the decision without knowing. But I can’t help thinking that a 10-second message could have prevented a lot of angst: “I understand – thanks for reporting that, and keep reporting it until it’s fixed, but we can’t address it now for reasons I can’t go into.”

Team B: Internal Quality Goes External

This team was working on an application that had been valuable for years, but it was becoming uglier and uglier behind the scenes.

The software relied on old frameworks and old tools. Two key frameworks were no longer supported. The compiler was outdated, and the debugger no longer worked on newer systems. The code relied on many deprecated calls to the operating system.

The product manager repeatedly faced a choice: new features, or address internal quality. They consistently chose new features, forcing the development team to “make do.”

The deprecated calls were the final trigger. For several releases, the operating system vendor had been promising to eliminate support for them, removing the calls from beta releases but restoring them in the final release, until one release when they didn’t.

This put the team in an unfortunate place – the software could only run on the previous operating system release. New machines could only run on the new OS, so the software couldn’t run on new machines at all. The team was forced to ship that way for months while they addressed the issue.

Why?

I think it came down to the product manager’s preference for the concreteness of features over the abstractness and invisibility of maintenance. The developers had pushed (hard) for the needs of the system itself, and the costs of not upgrading things, but they couldn’t break through to the decision-maker on this.

Limited sales for a noticeable part of the year did get the product manager’s attention :) But by then, the cost and urgency were much higher.

Team C: Omniscient QA

The final team’s working approach was that developers implemented every feature in a separate branch, then left it to QA to integrate the right branches and test the end result. (I should put “team” in quotes – developers worked as individuals.)

The QA team had no development background, and no training or background in configuration management either.

QA consistently struggled to figure out which branches were current, how to handle conflicts, or even how to figure out which developers to talk to about the problems. The QA people complained, but their problem didn’t get addressed. Plus, the company was having trouble retaining QA people, and that made things worse.

Why?

As best I can tell, this was a small shop that had grown to medium-sized, and had evolved a team-less way of working where developers were only expected to spit out code, and never had to integrate or test anything. Management had never tried to make a real team with shared goals.

Experience and tools were contributing factors. The managers, developers, and QA people all had limited experience and not much familiarity with alternatives. Their configuration management tool was configured in a way that made integration painful, and nobody had experience with alternate models that could make it better.

In addition, I don’t think management could “hear” QA’s concerns well enough to take them seriously. Management was more concerned about the more expensive, more senior developers.

(I know the company finally moved to a different model, but I never heard whether things got better.)

Avoiding Cassandra

These teams all felt like Cassandra: predicting something true, consequential, and avoidable, but seeing it happen anyway.

It’s hard to learn to report things in a way that can be heard. It’s a challenge to face what’s true; tell it many times, in many ways; check what was heard; etc.

But talking isn’t enough. Sometimes the listener doesn’t want to hear, or doesn’t know what to do, or doesn’t want to do it.

Management is tricky. It’s hard to see the warning signs of problems, but it certainly doesn’t pay to ignore what the team members are saying. Poor communication back to the team makes it worse: teams need to know decisions and the reasons for them.

It’s not just individuals either: some companies have cultures that discourage reporting problems, where “Don’t shoot the messenger” is ignored. But ignoring problems isn’t safe in the long run.

It’s hard to be Cassandra. If it happens, look deeply at yourself, at your team, your managers, and your organization – what caused it, and can you prevent it happening again?