Why You Aren’t Learning As Much As You Could From Your Experiments

I love that A/B testing has become so popular. It’s an important tool in our toolbox. But it frustrates me that we use it so poorly.

Most teams make one of the following two mistakes: They either gate releases with an A/B test trying to understand whether or not the feature had the intended impact or they test variables blindly hoping to stumble upon a better variation.

The first isn’t necessarily a mistake. An A/B test is an effective way to tell whether or not a feature had the intended impact.

However, it’s not the most effective way to learn—all the learning happens after you’ve built the feature. I prefer to do my learning before any code has been written.

The second mistake is much more insidious. It’s equivalent to throwing spaghetti at the wall and hoping something sticks.

Even The Best of Us Get It Wrong

The problem is evident in the language that we use. We talk about “validating our ideas.”

We start by taking our favorite idea and we run experiments to prove that the idea will work.

We look for confirming evidence. We set ourselves up to see what we want to see and to never learn anything new.

We are running experiments like 19th-century scientists. We should know better.

Few of us start with good hypotheses, our experiment design skills are underwhelming, and we rarely decide up front how we’ll act on the data we do collect.

But this post isn’t about the shortcomings of our experimental methods. I’ve already written volumes on that.

I want to focus instead on an even bigger problem, one that trips up even those of us who have invested in our experimentation skills.

Turning to a 100-Year-Old Book for Advice

To illustrate this problem, I want to talk about the work of a landmark educational philosopher, John Dewey.

Some of you might remember that just last month, Dewey’s How We Think made my recommended reading list.

There is no other book that has had such a big impact on the way that I think about product management.

Dewey was profoundly insightful on the topic of critical thinking. I love this quote:

“To maintain the state of doubt and to carry on systematic and protracted inquiry—these are the essentials of thinking.”

It captures so effortlessly the challenge that so many of us struggle with when working on hard problems: the urge to jump hastily to a solution.

Instead, Dewey advises us to doubt our first solution, to keep searching, and to search for much longer than expected, as this is what is required for good thinking.

And as we saw with Jonassen last week, Dewey wasn’t just interested in defining critical thinking; he was interested in developing critical thinkers.

Dewey’s Double Movement of Reflection

Induction <-> Deduction

Dewey argues it’s not enough to simply have beliefs, but that we must also do the work required to examine them, to understand why we hold them, and to assess the consequences of holding such beliefs.

He argues thinking consists of the organization of facts and conditions into theories:

The facts as they stand are the data, the raw material of reflection; their lack of coherence perplexes and stimulates to reflection. There follows the suggestion of some meaning which, if it can be substantiated, will give a whole in which various fragmentary and seemingly incompatible data find their proper place. The meaning suggested supplies a mental platform, an intellectual point of view, from which to note and define the data more carefully, to seek for additional observations, and to institute, experimentally, changed conditions.

Dewey is arguing that we start with a collection of facts. As we start to make sense of those facts, we assign meaning, and that meaning in turn helps us to understand additional facts.

For example, as a product manager, if I’m trying to assess what makes for a good email subject line, I might reflect on a set of subject lines and the open rates each garnered.

This reflection on the data will suggest meaning, perhaps something like subject lines that invoke curiosity generate better open rates.

This meaning can now help me understand the performance of other subject lines.

Dewey continues:

There is thus a double movement in all reflection: a movement from the given partial and confused data to a suggested comprehensive (or inclusive) entire situation; and back from this suggested whole—which as suggested is a meaning, an idea—to the particular facts, so as to connect these with one another and with additional facts to which the suggestion has directed attention. Roughly speaking, the first of these movements is inductive; the second deductive. A complete act of thought involves both—it involves, that is, a fruitful interaction of observed (or recollected) particular considerations and of inclusive and far-reaching (general) meanings.

Dewey argues that thinking requires back and forth movement between first an observation or assessment of facts and second the suggestion of general meanings from those facts.

This double movement between the assessment of facts and the generalizing of meaning is something we are doing all the time. The key to critical thinking is to be diligent about this process:

We may, in short, accept readily any suggestion that seems plausible; or we may hunt out additional factors, new difficulties, to see whether the suggested conclusion really ends the matter.

In other words, we can add meanings to a collection of facts in any which way we like.

If I’m analyzing email subject lines, I can act on my first idea—that invoking curiosity leads to higher open rates. Or I can slow down and consider other possibilities.

If I slow down, I might also consider who sent the email, at what time it was sent, the length of the subject line, and a variety of other factors.

I might also test my first meaning. Do all subject lines that invoke curiosity garner higher open rates? Are there variations that I need to investigate?
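A minimal sketch of that slower look, assuming a handful of made-up campaign records (the subject lines, send hours, and counts below are invented for illustration, and `invokes_curiosity` is a hypothetical stand-in for a human judgment call). It pools open rates by the first meaning, curiosity, and then by a rival factor, send time, so that the first idea is not the only one examined:

```python
from collections import defaultdict

# Hypothetical, made-up records for illustration only:
# (subject line, hour sent, emails sent, emails opened)
CAMPAIGNS = [
    ("You won't believe what we found", 9, 1000, 320),
    ("Guess what changed this week?", 15, 1000, 250),
    ("March product update", 9, 1000, 210),
    ("Release notes for version 2", 15, 1000, 150),
]


def invokes_curiosity(subject):
    """Crude assumed heuristic: a question mark or a teaser phrase."""
    lowered = subject.lower()
    return "?" in subject or "won't believe" in lowered or "guess" in lowered


def open_rate_by(records, key):
    """Pool sends and opens per group, then return opens / sends per group."""
    sent = defaultdict(int)
    opened = defaultdict(int)
    for subject, hour, n_sent, n_opened in records:
        group = key(subject, hour)
        sent[group] += n_sent
        opened[group] += n_opened
    return {group: opened[group] / sent[group] for group in sent}


# First meaning: curiosity drives opens.
by_curiosity = open_rate_by(CAMPAIGNS, lambda subject, hour: invokes_curiosity(subject))
# Rival factor: morning sends drive opens.
by_morning = open_rate_by(CAMPAIGNS, lambda subject, hour: hour < 12)

print("curiosity:", by_curiosity)
print("morning send:", by_morning)
```

In this toy data both groupings show a gap, which is exactly the situation where acting on the first plausible meaning would be premature.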

Dewey defines the generalization of a theory from a collection of facts as inductive discovery, or induction. He defines the movement toward applying or testing a theory, which generates new facts, as deductive proof, or deduction.

When pains are taken to make each aspect of the movement as accurate as possible, the movement toward building up the idea is known as inductive discovery (induction, for short); the movement toward developing, applying, and testing, as deductive proof (deduction, for short).

He further elucidates:

The inductive movement is toward discovery of a binding principle; the deductive toward its testing: confirming, refuting, modifying it on the basis of its capacity to interpret isolated details into a unified experience. So far as we conduct each of these processes in the light of the other, we get valid discovery or verified critical thinking.

Bringing It Back to Product Experiments

While experimentation is becoming more pervasive, we aren’t being smart about it.

Dewey would advise us to diligently assess a collection of facts. When generalizing from those facts to a coherent theory, he would advise us to consider other factors, to prolong our search. And when experimenting we should deductively test our theory generated through inductive discovery.

Most of us aren’t doing any of these things.

We aren’t conducting “systematic and protracted inquiry.” We are testing our first idea without first doing the work of inductive discovery. We are rejecting particulars without understanding the implications of rejecting those particulars.

When we do have theories generated through inductive discovery (say through design thinking or customer development interviews), we aren’t explicit about our inductive theory. We don’t use it to guide our subsequent experiments.

We have lost the connection between the back and forth movements of reflection.

Applying Reflective Thinking to Product Discovery

Suppose you are developing an app to help people set and track progress toward their goals. Someone on your team suggests that New Year’s would be a good time to acquire new customers.

This seems like a reasonable idea. New Year’s resolutions are popular and go hand-in-hand with setting goals.

Most teams would start on the following activities:

  • Launch an email campaign on New Year’s day encouraging existing users to set new goals.
  • Buy ads to run on New Year’s day to acquire new users.
  • Add an in-app tutorial that encourages people to set goals for the new year.

What would you learn by doing this?

You would learn whether your specific email, the ads you ran, and the tutorial you built worked.

If they were a wild success, you might conclude New Year’s is a good time to encourage people to set goals.

If one or more fell short of expectations, you might conclude people aren’t more likely to set and track goals around New Year’s.

That seems reasonable. But how could you learn more?

Don’t Skip Inductive Discovery

If you were to follow Dewey’s advice, you would start with inductive discovery. You might start by asking, why are people more likely to set goals around New Year’s?

Katy Milkman, an assistant professor at the Wharton School who studies behavioral economics, asked this very question.

Based on the popularity of New Year’s resolutions she posited what she called her “Fresh Start” theory. From a collection of facts—the behavior of many people—she posited a theory—that people are more likely to set goals at the New Year because they consider it a fresh start.

Note how this differs from what our product team did. They jumped right into testing the particular (people are more likely to set goals around New Year’s) and skipped the step of inductive discovery that would have suggested a generalized theory: the “Fresh Start” theory.

Use Deduction to Test Your Theory

Returning to Milkman, she then asked, if this were true what else (what other specific instances of this theory) might I expect to see?

Asking this question helps her identify appropriate deductive tests. She would expect people to experience the “Fresh Start” effect at other times of the year: on their birthdays, at the start of each quarter, month, week, and maybe even each day.

Again, note what she did. She started with a collection of facts that indicated that people are more likely to set goals around New Year’s, she used inductive discovery to posit her “Fresh Start” theory, and then she used that theory to move back to particulars—she started to explore how to test it deductively.

First, she mined Google search trends to see if people searched for the word “diet” (diets are the most common New Year’s resolutions) more often around the new year and also around other “Fresh Start” milestones like birthdays and the start of months. They did.

This provided confirming information that people search for resolution-related information around other “Fresh Starts,” but Milkman wanted to know if they also took action.

So she mined gym attendance data and found that people don’t just go to the gym more often in January; they also go more often at the beginning of the month and around their birthdays.
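To make the shape of such a deductive test concrete, here is a rough sketch. It is not Milkman's method or data: the daily visit counts are invented, and the three-day `window` that defines "beginning of the month" is an assumption. The theory predicts a positive lift at the start of the month; a lift near zero or below would count against it.

```python
from datetime import date
from statistics import mean

# Made-up daily visit counts for illustration only; not Milkman's data.
VISITS = {
    date(2024, 3, 1): 130,
    date(2024, 3, 2): 120,
    date(2024, 3, 15): 95,
    date(2024, 3, 16): 105,
    date(2024, 4, 1): 140,
    date(2024, 4, 20): 100,
}


def is_month_start(day, window=3):
    """True when the day falls within the first `window` days of its month."""
    return day.day <= window


def mean_visits_by_period(visits, window=3):
    """Return (mean visits on month-start days, mean visits on all other days)."""
    start = [n for d, n in visits.items() if is_month_start(d, window)]
    rest = [n for d, n in visits.items() if not is_month_start(d, window)]
    return mean(start), mean(rest)


start_avg, rest_avg = mean_visits_by_period(VISITS)
# Relative lift of month-start days over other days; the theory predicts lift > 0.
lift = start_avg / rest_avg - 1
print(f"month start: {start_avg}, other days: {rest_avg}, lift: {lift:.0%}")
```

Deciding the window and the direction of the expected lift before looking at the data is what keeps this a test of the theory rather than a search for confirmation.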

Milkman is exhibiting the double movement of reflection. She started with a specific—New Year’s resolutions—and generalized to a theory—her “Fresh Start” theory—and then designed experiments to deductively test that theory.

Milkman didn’t just learn that people are more likely to set goals around the New Year; she learned something far more valuable. She posited and supported a much more general case: her “Fresh Start” theory.

You can learn more about Milkman’s research in this Freakonomics podcast episode.

For a product manager, this learning is much more actionable. You didn’t just learn whether a specific particular worked; you generated support for a theory that you can now use to guide future experimentation.

You can start to experiment with encouraging people to set goals around different “Fresh Start” milestones. You might find that some people tend to favor New Year’s while others favor their birthdays.
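One way to set up such experiments is to tag each day with the milestones it falls on, so that a goal-setting prompt can be scheduled and later compared per milestone. The sketch below is only an illustration: the function name, the label names, and the choice of Monday as the start of the week are all assumptions, not anything from Milkman's research.

```python
from datetime import date


def fresh_start_labels(day, birthday=None):
    """Return the assumed "Fresh Start" milestones that `day` falls on."""
    labels = []
    if day.month == 1 and day.day == 1:
        labels.append("new_year")
    if day.month in (1, 4, 7, 10) and day.day == 1:
        labels.append("quarter_start")
    if day.day == 1:
        labels.append("month_start")
    if day.weekday() == 0:  # Monday, assumed here to be the start of the week
        labels.append("week_start")
    if birthday is not None and (day.month, day.day) == (birthday.month, birthday.day):
        labels.append("birthday")
    return labels


# A user's birthday (year is ignored); hypothetical example value.
user_birthday = date(1990, 3, 6)

print(fresh_start_labels(date(2024, 1, 1)))                 # a Monday and New Year's Day
print(fresh_start_labels(date(2024, 3, 1)))                 # a Friday
print(fresh_start_labels(date(2024, 3, 6), user_birthday))  # a Wednesday
```

Recording which label triggered each prompt lets you compare milestones against each other per user, rather than only asking whether one campaign worked.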

You are no longer throwing spaghetti at the wall. Now you are carrying out a systematic search through experimentation.

To learn more, start with inductive discovery, follow with deductive tests. Repeat.

Make the Implicit Explicit

The primary reason we don’t learn more from our experiments is that we skip over inductive discovery. We don’t take the time to formulate a coherent theory from the inputs we’ve gathered.

When we don’t do this, our theories remain implicit. We can’t see them. We can’t examine them. And most importantly, when we get experimental data we have no idea how to use it to revise our theory.

This leads to a lot of waste.

Instead, we want to work through inductive discovery making our implicit theories explicit.

P.S. The excerpts in this article came from Ch. 7 Systematic Inference: Induction and Deduction of Dewey's How We Think. You can find the text of this chapter online here. However, it's also available for free with better formatting in Kindle format here.