How TechCrunch is like the Iliad

The drive for social status created the worst, most important part of the Iliad. Now it’s filling up investment announcements.

Picture by Mikuláš Prokop

My fancy liberal arts school hazed me, like it does all students: I had to read The Iliad and The Odyssey.

We did more than read. We wrote. We talked. We dissected, for meaning and history. Me, and a dozen other kids I’d just met. It was school, after all.

The Odyssey is great. A proper story. Easy to read, and easy to see why it stuck around.

The Iliad is… not. It’s hard to read. Everyone in it is kind of a jerk. The biggest jerks are the biggest stars. The entire story rotates around a woman — Helen — without giving her agency. Maybe she didn’t want to go home?

For all its difficulty, it’s the more important book. Studying it taught me a lot.

Founders could learn from it even today.

In a hard book to read, one section is by far the hardest, weirdest, and seemingly most pointless. We called it the Parade of Ships, but Wikipedia uses the less glamorous “Catalogue of Ships.” It is exactly what it sounds like: A description of a lot of ships. More than a thousand. You know. Because Helen’s face was so beautiful it launched a thousand ships.

This gives us the millihelen: Enough beauty to launch one ship.

The Catalogue is scintillating:

First the Boeotians, led by Peneleos, Leitus, Arcesilaus, Prothoenor and Clonius; they came from Hyrie and stony Aulis, from Schoenus, Scolus and high-ridged Eteonus; from Thespeia and Graea, and spacious Mycalessus; from the villages of Harma, Eilesium and Erythrae; from Eleon, Hyle, Peteon, Ocalea and Medeon’s stronghold; from Copae, Eutresis, and dove-haunted Thisbe; from Coroneia and grassy Haliartus, Plataea and Glisas, and the great citadel of Thebes; from sacred Onchestus, Poseidon’s bright grove; from vine-rich Arne, Mideia, holy Nisa and coastal Anthedon. They captained fifty ships, each with a hundred and twenty young men.

That’s just the first paragraph! Every time I read this I delight in its nothingness. Now that I don’t have an essay due.

This litany, 2,500 years later, wakes our deepest fears about dusty old books. You’re probably feeling pretty good about skipping it. Yet it drove people to tell this story again and again. Being in it mattered. To your family. To your village. To everyone in Greece. Without the Catalogue of Ships, The Iliad might not have survived.

Retelling a great story would always draw a crowd. (Remember: Both of these books were told in oral form long before they were ever written down.) But giving every listener a chance to brag or shrink because of the behavior of one of their ancestors… jackpot!

I was reading a funding announcement recently, and was struck by this:

Investors in the $10.1 million round for the company were led by ArcTern Ventures and joined by new backers Capricorn Investment Group, Incite Ventures. Previous financiers in the company included Wireframe Ventures, Congruent Ventures, Ulu Ventures, Energy Foundry, Hardware Club, 1/0 Capital, and Wells Fargo Strategic Capital […].

That’s a long list. Especially so for a company likely raising only its second round of funding (based on the amount).

Then it hit me:

These investors are listed for the exact same reason the ships are catalogued in The Iliad!

The Greek warriors were fighting for timé, a kind of honor and fame. The stories helped them pass it on to their descendants.

Investors are fighting for the modern equivalent (named, ironically, after a different, also unpleasant Greek story). Now it’s earned in investor announcements on sites like TechCrunch, not ship descriptions in stories told in the town square.

This is more funny than bad. There’s value in being able to track down which investors work with what kinds of companies. More openness is a great trade-off for a little exposure for the investors.

Still. Seeing the parallel was a delightful lift to the morning. I have a science degree but a liberal arts education. I love what the combination has done for my career. It’s nice to have it be a source of humor, too.

The parallel provides a lesson for founders:

The Catalogue of Ships describes a thousand vessels, and far more people. But most of them are never mentioned again in the story.

Don’t look for those involved in the investment. Look for who helped the company succeed. Who wrote the first check.

P-Hacking in Startups

Science has a problem.

It’s kind of broken.

Well. Not all of it. Mostly the social sciences and medicine. And I don’t just mean the fact that they consider Freud canon.

It started with a trickle. A retracted paper here. A study that couldn’t be repeated, there.

Then someone decided to get systematic. It opened the floodgates. A 2016 survey found that more than 70% of scientists had tried and failed to reproduce another scientist’s work, and more than half had failed to reproduce their own.

Reproducibility is fundamental to the scientific method — it’s supposed to be a study of the natural world, which doesn’t change all that often — so what does its absence mean? Are we incompetent? Can we trust anything? Do we know anything?

The high failure rate of venture-backed startups is its own kind of replication crisis: “How could my company fail? I followed the growth-hacking, blitz-scaling advice from the founders who made it big!” I don’t mean to give blogs and podcasts the weight of peer-reviewed science. But our industry seems to trust them as if they deserve it.

What does it mean if a founder can’t get similar results when following the practices of another?

Science has begun to heal itself. It’s time for startups to go through their own reckoning. Their methods are failing most people. It’s time to learn why and how to get better.

What’s wrong with science?

The crisis in science has multiple, interconnected causes. A lot of them come down to taking techniques from simpler systems and applying them to the far more complex study of humans. The practices useful for studying minerals also worked great on metals, but with people? Not so much.

One of the most famous examples of these studies that fizzle under scrutiny is the marshmallow experiment, conducted at Stanford University in 1972 on the children of students enrolled there. It produced original, important conclusions on the ability of children to endure delayed gratification, and later studies showed that ability was strongly correlated with success later in life. Suddenly we’ve got a new tool for understanding how successful you’ll be at a very young age.

Or… maybe not. Further studies showed the original work was actually just exposing the socioeconomic background of the kids. If your family is well off, you are comfortable with delayed gratification and, just coincidentally, are also likely to be well off when you’re older. If you’re from a poor family, delayed gratification is harder to accept and, huh, you’re also more likely to be poor than those kids of rich parents.

Once someone reran the study with a larger group of kids (900 instead of 90) and controlled for socioeconomic background… the effect largely disappeared. It’s not all that surprising that kids with no food insecurity are better at delaying gratification and also will be more successful in life. It certainly doesn’t grab the headlines like announcing that kids who can wait five minutes to eat a marshmallow will earn more money than those who can’t. No HBR article for that one.
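Here is a toy simulation of that confounding. It is a minimal sketch, not a reconstruction of either study. Every number in it is invented: the 50/50 split on `wealthy`, the 80% and 30% waiting rates, and the 70% and 30% success rates are assumptions picked only to make the pattern visible. In this made-up world waiting has no effect on success at all, yet the kids who wait still look more successful until you compare kids from the same background.

```python
import random


def simulate_kids(n_kids=40_000, seed=0):
    """Invented world: family wealth drives both waiting and later success.
    Waiting itself has no effect on success."""
    rng = random.Random(seed)
    kids = []
    for _ in range(n_kids):
        wealthy = rng.random() < 0.5
        waits = rng.random() < (0.8 if wealthy else 0.3)     # assumed rates
        succeeds = rng.random() < (0.7 if wealthy else 0.3)  # assumed rates
        kids.append((wealthy, waits, succeeds))
    return kids


def success_rate(kids):
    return sum(succeeds for _, _, succeeds in kids) / len(kids)


def waiting_gap(kids):
    """Success rate of kids who waited minus that of kids who did not."""
    waited = [k for k in kids if k[1]]
    grabbed = [k for k in kids if not k[1]]
    return success_rate(waited) - success_rate(grabbed)


kids = simulate_kids()
raw_gap = waiting_gap(kids)
gap_among_wealthy = waiting_gap([k for k in kids if k[0]])
gap_among_poor = waiting_gap([k for k in kids if not k[0]])

print(f"all kids together: {raw_gap:+.2f}")
print(f"wealthy kids only: {gap_among_wealthy:+.2f}")
print(f"poor kids only:    {gap_among_poor:+.2f}")
```

With these assumed rates, the gap across all kids works out to about 0.2, while the gap inside each group hovers near zero. The marshmallow was measuring the family, not the kid.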

It’s been almost fifty years since this study was published. That’s five decades of science based on flawed work, five decades of science that has to be unwound and retried. The longer these mistakes last, the more expensive they are to fix. And like that HBR article above, many conclusions never get retracted.

One particular “technique” has helped trigger the crisis in science. Many a growth-hacking product manager has fallen into the same trap. They can only be rescued through discipline and rigor.

The how and why of P-hacking

Abusing data is a sure way to get bad results. Unlike startups, scientists rarely just make up their data. They make more subtle mistakes, like p-hacking. This probably sounds pretty cool, but it’s actually a common form of data misuse. Wikipedia describes it this way:

…performing many statistical tests on the data and only reporting those that come back with significant results.

It works like this:

A researcher comes up with an idea for a study. He collects a bunch of data, runs the experiment and… no dice. The idea didn’t pan out.

Hmm. “I have all this data. I can’t just throw it away.”

So he starts slicing the data looking for something that stands out. After a while, sure enough, he finds some correlation that looks strong enough to publish. Usually that means its p-value is under 0.05, the conventional threshold for statistical significance. He publishes this in a paper and looks like a genius. It gets big exposure in the press. Journalists love weird and surprising science. They can report on it without understanding it.

But no one can reproduce the work. The paper gets retracted. He gets uninvited from the big conferences. (Don’t worry. The newspapers never follow up and report the retraction.)
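The arithmetic behind "sure enough, he finds some correlation" is worth a rough sketch. It assumes every slice is an independent test on data with no real effect, in which case each p-value behaves like a uniform random number between 0 and 1. Real slices overlap and are not independent, so treat this as an illustration, not a model of any actual study. The function names are made up for this example.

```python
import random

ALPHA = 0.05  # the conventional significance threshold


def chance_of_false_positive(num_tests, alpha=ALPHA):
    """Probability that at least one of num_tests independent tests on
    pure noise comes back under the threshold."""
    return 1 - (1 - alpha) ** num_tests


def simulate_slicing(num_tests, trials=10_000, seed=0):
    """Fraction of simulated studies where at least one slice looks significant.

    Assumption: with no real effect, each p-value is one uniform draw on [0, 1].
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < ALPHA for _ in range(num_tests)):
            hits += 1
    return hits / trials


for slices in (1, 5, 20, 60):
    print(f"{slices:>2} slices: {chance_of_false_positive(slices):.0%} chance of a spurious hit")
```

One test gives the advertised 5% risk. Twenty slices push the chance of at least one spurious "discovery" to about 64%. A couple of motivated graduate students can run twenty slices before lunch.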

What went wrong?

He left out one key piece: How he got the data.

Let’s say he thinks breastfed kids are healthier than bottle-fed kids. He sets up a study that tries to isolate just these variables, which means he wants his population to be reasonably homogeneous (similar quality of life, similar locations, etc.). Put simply, the difference being researched should be the only material one in the population (unlike in the marshmallow experiment).

But then he looks at the data and, like most of these studies, finds there’s no significant difference in health outcomes between breastfed and bottle-fed kids.

He could just toss the data. But, well, he’s already paid to collect it. He’s got all these graduate students who are working nearly for free. He might as well try something. So he puts a student or two on trying to find useful results.

They nearly always do, but… that success kills his work. All the controls that made the data suitable for his original experiment fatally bias it for any other study.

Let’s say he discovers that the study participants who were bottle-fed tended to move around a lot more than people who were breastfed. He concludes, oh, wow, getting bottle-fed causes you to hate your parents and move away. (Yes, this is exactly the kind of headline that would get picked for a result like this.)

He has not proven that. All he has shown is that it happens to be the case in this particular data set, which is probably small and certainly narrow.

He should throw away all existing data. Start from scratch, controlling for everything except this new variable under test. Only then can he look for correlations between how a baby was fed and mobility.
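A small simulation shows one reason the fresh data matters. This is a sketch under loud assumptions: the people, the 20 yes/no traits, and the yes/no outcome are all random and invented, so no trait has any real effect. It only illustrates the selection problem (the most striking pattern in old data is mostly luck), not the sampling bias from the original study’s controls.

```python
import random


def make_sample(rng, n_people, n_traits):
    """Each person gets random yes/no traits and a random yes/no outcome.
    By construction, no trait has any real effect on the outcome."""
    return [
        ([rng.random() < 0.5 for _ in range(n_traits)], rng.random() < 0.5)
        for _ in range(n_people)
    ]


def outcome_gap(sample, trait):
    """Absolute difference in outcome rate between people with and without a trait."""
    with_trait = [outcome for traits, outcome in sample if traits[trait]]
    without = [outcome for traits, outcome in sample if not traits[trait]]
    if not with_trait or not without:
        return 0.0
    return abs(sum(with_trait) / len(with_trait) - sum(without) / len(without))


def dredge_then_retest(rng, n_people=200, n_traits=20):
    """Pick the most striking trait in old data, then re-measure it on fresh data."""
    old = make_sample(rng, n_people, n_traits)
    best = max(range(n_traits), key=lambda t: outcome_gap(old, t))
    fresh = make_sample(rng, n_people, n_traits)
    return outcome_gap(old, best), outcome_gap(fresh, best)


def average_gaps(repeats=100, seed=1):
    """Average the dredged gap and the retested gap over many pretend studies."""
    rng = random.Random(seed)
    results = [dredge_then_retest(rng) for _ in range(repeats)]
    dredged = sum(r[0] for r in results) / repeats
    retested = sum(r[1] for r in results) / repeats
    return dredged, retested


dredged, retested = average_gaps()
print(f"gap found by dredging old data: {dredged:.3f}")
print(f"same trait on fresh data:       {retested:.3f}")
```

The dredged gap comes out far larger than the gap for the same trait on new data, even though nothing real is going on in either. The fresh sample is what turns a hunch into a test.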

But he was too lazy or scared to do that. He found a match in that smaller, biased data set, and then published the results without admitting the problems in either his data or his methods. A few decades ago he would have gotten away with it: A big splashy result on publication, and then everyone just assuming this was true, with no attempt to reproduce and no real questioning of the result.

Today, no chance. Science has developed defenses against this kind of malpractice.

Preregistration of experiments is a key tool.

Researchers register with a central database that they are going to study the health of breastfed vs. bottle-fed babies. When they get results, they point to that registration and say, see, this is what led to my data collection.

If they then want to publish some other study, people will say, no, you didn’t preregister this, which makes us suspect you’re p-hacking, so we’re going to do a deep dive on how you got your data. On second thought, we’re just going to reject your paper. Come back when the results hold on a clean data set.

From social science to startups

This might not initially seem to have anything to do with startups. Product managers and marketers aren’t commissioning studies — and they certainly aren’t controlling for variables!

Hmm. If you look at it a bit funny… Every data-backed marketing campaign and feature launch is an experiment.

Let’s build an analogous example.

A product manager builds a new feature, and because he’s growth hacking, he has lots of telemetry to tell him exactly how people are using it.

His theory is that people will use this new feature in some specific way. But he builds it, ships it, and observes, well, hmm, no, almost no one is using it. It’s a bust. I’m sure you’ve never worked on a project like this, but trust me, it happens.

Except… hey, there’s this small group that is using it, and heavily. He looks into it more closely, and realizes they’re using it at 10x the rate people use the rest of the product. So he changes plans, and he rebuilds the feature around the specific thing those few people were doing with it.

Wait, what? No one uses that feature, either, and even worse, the people who originally used it don’t anymore, now that it’s focused on their actual usage!

What went wrong?

You got caught p-hacking

The data set from his failed feature is bad data. He got the most important result: This feature did not work well for his users. He wasn’t willing to let go of failed work. Just like the scientists, he went looking for some other way to reuse it. And instead of developing new hypotheses and running new experiments, he took his biased data and tried to find new correlations cheaply.

Unfortunately for him, he did.

But when he ships the new feature, he is faced with a harsh truth: Those few people who were using the feature in unexpected ways don’t look like the rest of his users. A new feature built for that purpose doesn’t help everyone else. And because he relied on data to make his decisions instead of talking to actual users, he learns too late that those unrepresentative users were doing something even weirder than the data showed. His simplified feature removed that weirdness in the name of a simplicity everyone could use.

So now he’s two features in with nothing to show for it. So much for growth-hacking.

How do I fix it?

The solution is very similar to what science has done.

Connect your data to experiments. With discipline. You must get new, clean data for each new test. I know this is anathema to modern data-oriented product management. But it’s the only real way to trust your results.

That word discipline is key. You don’t need to build some international central registry. Whatever your mission statement says, you’re not really saving the world, and you’re not actually doing science. You’re just trying to build a product people love. What you need is rigorous internal practices, and to hold each other accountable so you can’t cheat at statistics.
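What might that internal practice look like? One possible shape, sketched here, is a tiny experiment log. Everything in it is hypothetical: `ExperimentLog`, `preregister`, `can_report`, and the example metric names are invented for illustration and don’t come from any real tool. The only rule it enforces is the one that matters: a result counts only if the hypothesis and metric were written down before the data was collected.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Registration:
    hypothesis: str
    metric: str
    registered_at: datetime


@dataclass
class ExperimentLog:
    """Hypothetical in-house registry: write the bet down before collecting data."""
    registrations: dict = field(default_factory=dict)

    def preregister(self, name, hypothesis, metric, now=None):
        if name in self.registrations:
            raise ValueError(f"{name!r} is already registered; start a new experiment")
        when = now or datetime.now(timezone.utc)
        self.registrations[name] = Registration(hypothesis, metric, when)

    def can_report(self, name, metric, data_collected_from):
        """True only if this metric was registered for this experiment
        before its data collection began."""
        reg = self.registrations.get(name)
        return (
            reg is not None
            and reg.metric == metric
            and reg.registered_at <= data_collected_from
        )


log = ExperimentLog()
registered = datetime(2024, 1, 1, tzinfo=timezone.utc)
launched = datetime(2024, 2, 1, tzinfo=timezone.utc)
log.preregister(
    "new-feature",
    "Most active users will try the feature within a week",
    "weekly_feature_use",
    now=registered,
)

print(log.can_report("new-feature", "weekly_feature_use", launched))   # the registered bet
print(log.can_report("new-feature", "power_user_sessions", launched))  # found by slicing afterwards
```

The first check passes and the second does not. A shared spreadsheet would do the same job. The point is that the interesting pattern found afterwards becomes the hypothesis for the next experiment, not a result from this one.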

Unfortunately, this requires you to let go of one of Silicon Valley’s most cherished and wrong beliefs.

No, you don’t learn more from failure than success.

Experiments fail. This might be an important part of the process, but it’s not very valuable. Congratulations. Of all the possible ways you could fail, you’ve discovered one of them. Don’t let it go to your head.

Don’t work too hard to salvage that failure. You’re p-hacking, and just making it worse. Yes, obviously, you get personal lessons. You might be lucky enough to learn something that triggers your next experiment. But you have to go run that separately.

You can’t build on the detritus of failure.

So my data is now worthless?!

Of course not. I still rely on data for all kinds of problems. One of the great things about building a company today is how easily you can get information at scale.

But never let yourself forget that your data is heavily biased, especially by how it was collected. One of my favorite examples is from when YouTube dramatically reduced how long its pages took to load. Their average response times went up! Suddenly people with much worse connectivity found the site worth using, and their slower sessions made the average worse. The developers thought they were helping existing users, but the biggest impact was in creating new ones.
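The arithmetic is easy to check with made-up numbers. The load times and user counts below are invented for illustration and are not YouTube’s real figures.

```python
def average(times):
    return sum(times) / len(times)


# Hypothetical page-load times in seconds, invented for illustration.
existing_before = [2.0] * 900    # existing users, before the speedup
existing_after = [1.0] * 900     # the same users, now twice as fast
newcomers_after = [20.0] * 100   # slow-connection users who can finally use the site

before = average(existing_before)
after = average(existing_after + newcomers_after)

print(f"average before:       {before:.1f}s")
print(f"average after:        {after:.1f}s")
print(f"existing users after: {average(existing_after):.1f}s")
```

Every existing user got twice as fast, and the overall average still rose from 2.0 to 2.9 seconds, because the population being measured changed. The dashboard alone would tell you the project failed.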

You have to recognize your job isn’t to find some way to make the data valuable. Your job is to make high-quality decisions. Use data when you can. If you don’t have data, go get it.

But the job of the data is to inform you, not give you answers. Use it to hone your instinct, to improve your decision-making. When something doesn’t add up, go talk to the actual humans who are the source of the data. And spend some time with people not represented in it, too.

If you’re working at a software startup, you’re not doing science (even if, like me, you have a science degree). But you should still take advantage of its discipline and practices.

Don’t stop at protecting yourself from p-hacking. One founder’s success might be hard to replicate for many reasons. Gain what lessons you can. But don’t blindly trust others’ stories about their work.

Because failure on your part won’t come with the retraction of a Nature paper. It’ll come with an announcement of layoffs in TechCrunch.