Update: I’ve since revised this hypothesis format. You can find the most current version in this article:
“My hypothesis is …”
These words are becoming more common every day. Product teams are starting to talk like scientists. Are you?
The internet industry is going through a mindset shift. Instead of assuming we have all the right answers, we are starting to acknowledge that building products is hard. We are accepting the reality that our ideas are going to fail more often than they are going to succeed.
Rather than waiting to find out which ideas are which after engineers build them, smart product teams are starting to integrate experimentation into their product discovery process. They are asking themselves, how can we test this idea before we invest in it?
This process starts with formulating a good hypothesis.
These Are Not the Hypotheses You Are Looking For
When we are new to hypothesis testing, we tend to start with hypotheses like these:
- Fixing the hard-to-use comment form will increase user engagement.
- A redesign will improve site usability.
- Reducing prices will make customers happy.
There’s only one problem. These aren’t testable hypotheses. They aren’t specific enough.
A good hypothesis can be clearly refuted or supported by an experiment.
The 5 Components of a Good Hypothesis
To make sure that your hypotheses can be supported or refuted by an experiment, you will want to include each of these elements:
- the change that you are testing
- what impact you expect the change to have
- who you expect it to impact
- by how much
- after how long
The Change: This is the change that you are introducing to your product. You are testing a new design, you are adding new copy to a landing page, or you are rolling out a new feature.
Be sure to get specific. Fixing a hard-to-use comment form is not specific enough. How will you fix it? Some solutions might work. Others might not. Each is a hypothesis in its own right.
Design changes can be particularly challenging. Your hypothesis should cover a specific design, not the idea of a redesign.
In other words, use the first of these statements, not the second:
- This specific design will increase conversions.
- Redesigning the landing page will increase conversions.
The former can be supported or refuted by an experiment. The latter can encompass dozens of design solutions, where some might work and others might not.
The Expected Impact: The expected impact should clearly define what you expect to see as a result of making the change.
How will you know if your change is successful? Will it reduce response times, increase conversions, or grow your audience?
The expected impact needs to be specific and measurable.
You might hypothesize that your new design will increase usability. This isn’t specific enough.
You need to define how you will measure an increase in usability. Will it reduce the time to complete some action? Will it increase customer satisfaction? Will it reduce bounce rates?
There are dozens of ways that you might measure an increase in usability. In order for this to be a testable hypothesis, you need to define which metric you expect to be affected by this change.
Who Will Be Impacted: The third component of a good hypothesis is who will be impacted by this change. Too often, we assume everyone. But this is rarely the case.
I was recently working with a product manager who was testing a sign up form popup upon exiting a page.
I’m sure you’ve seen these before. You are reading a blog post and just as you are about to navigate away, you get a popup that asks, “Would you like to subscribe to our newsletter?”
She A/B tested this change by showing it to half of her population, leaving the rest as her control group. But there was a problem.
Some of her visitors were already subscribers. They don’t need to subscribe again. For this population, the answer to this popup will always be no.
Rather than testing with her whole population, she should be testing with just the people who are not currently subscribers.
This isn’t easy to do. And it might not sound like it’s worth the effort, but it’s the only way to get good results.
Suppose she has 100 visitors. Fifty see the popup and fifty don’t. If 45 of the people who see the popup are already subscribers and as a result they all say no, and of the five remaining visitors only 1 says yes, it’s going to look like her conversion rate is 1 out of 50, or 2%. However, if she limits her test to just the people who haven’t subscribed, her conversion rate is 1 out of 5, or 20%. This is a huge difference.
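The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration using the numbers above; the function and variable names are made up for the sketch and do not come from any particular analytics tool.

```python
def conversion_rate(conversions: int, exposed: int) -> float:
    """Share of exposed visitors who converted."""
    return conversions / exposed


# Numbers from the example above (the variant that sees the popup).
shown_popup = 50         # visitors who saw the popup
already_subscribed = 45  # of those, visitors who can never convert
new_signups = 1          # visitors who said yes

# Measured against everyone who saw the popup.
diluted = conversion_rate(new_signups, shown_popup)

# Measured against only the visitors who could actually subscribe.
eligible = shown_popup - already_subscribed
clean = conversion_rate(new_signups, eligible)

print(f"Whole population: {diluted:.0%}")   # 2%
print(f"Non-subscribers only: {clean:.0%}")  # 20%
```

The result (one signup) is the same in both cases. Only the denominator changes, and that alone accounts for the difference between 2% and 20%.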
Who you test with is often the most important factor for getting clean results.
By how much: The fourth component builds on the expected impact. You need to define how much of an impact you expect your change to have.
For example, if you hypothesize that your change will increase conversion rates, then you need to estimate by how much. State it as: the change will increase the conversion rate from x% to y%, where x is your current conversion rate and y is your expected conversion rate after making the change.
This can be hard to do and is often a guess. However, you still want to do it. It serves two purposes.
First, it helps you draw a line in the sand. This number should determine in black and white terms whether your hypothesis passes or fails and should dictate how you act on the results.
Suppose you hypothesize that the change will improve conversion rates by 10%. If your change results in a 9% increase, your hypothesis fails.
This might seem extreme, but it’s a critical step in making sure that you don’t succumb to your own biases down the road.
It’s very easy after the fact to determine that 9% is good enough. Or that 2% is good enough. Or that -2% is okay, because you like the change. Without a line in the sand, you are setting yourself up to ignore your data.
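One way to keep yourself honest is to write the pass/fail rule down before you look at the data. The sketch below assumes that "by 10%" means a relative lift over the baseline conversion rate, which is an interpretation made for this example. The function name and the sample rates are hypothetical, and the check deliberately ignores statistical significance to keep the idea visible.

```python
def hypothesis_passes(baseline_rate: float,
                      observed_rate: float,
                      expected_lift: float) -> bool:
    """Return True only if the observed relative lift meets the line in the sand."""
    observed_lift = (observed_rate - baseline_rate) / baseline_rate
    return observed_lift >= expected_lift


baseline = 0.20       # hypothetical current conversion rate
expected_lift = 0.10  # the line in the sand: a 10% relative improvement

# A 9% lift (20.0% -> 21.8%) falls short, so the hypothesis fails.
print(hypothesis_passes(baseline, 0.218, expected_lift))  # False

# A 12% lift (20.0% -> 22.4%) clears the line, so the hypothesis passes.
print(hypothesis_passes(baseline, 0.224, expected_lift))  # True
```

Because the threshold is fixed up front, there is no room to decide afterwards that a smaller number was good enough.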
The second reason why you need to define by how much is so that you can calculate for how long to run your test.
After how long: Too many teams run their tests for an arbitrary amount of time or stop the test as soon as one version is winning.
This is a problem. It opens you up to false positives and releasing changes that don’t actually have an impact.
If you hypothesize the expected impact ahead of time, then you can use a duration calculator to determine how long to run the test.
Finally, you want to add the duration of the test to your hypothesis. This will help to ensure that everyone knows that your results aren’t valid until the duration has passed.
If your traffic is sporadic, “how long” doesn’t have to be defined in time. It can also be defined in page views or sign ups or after a specific number of any event.
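To give a feel for what a duration calculator does under the hood, here is a rough sketch. It uses a common textbook approximation for the sample size needed to compare two conversion rates. It is not the formula of any specific calculator, and the 5% significance level, 80% power, and traffic numbers are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variant(baseline: float, expected: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed in each variant (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = baseline * (1 - baseline) + expected * (1 - expected)
    return ceil((z_alpha + z_beta) ** 2 * variance / (expected - baseline) ** 2)


def days_to_run(baseline: float, expected: float,
                daily_visitors_per_variant: int) -> int:
    """Translate the required sample size into a number of days."""
    needed = visitors_per_variant(baseline, expected)
    return ceil(needed / daily_visitors_per_variant)


# Hypothetical inputs: conversion rate moves from 10% to 11%,
# with 1,000 eligible visitors per variant per day.
needed = visitors_per_variant(0.10, 0.11)
days = days_to_run(0.10, 0.11, daily_visitors_per_variant=1000)
print(f"{needed} visitors per variant, about {days} days")
```

Notice that the expected impact is an input. The smaller the change you expect, the more visitors you need, which is why the "by how much" has to be settled before the "after how long".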
Putting It All Together
Use the following examples as templates for your own hypotheses:
- Design x [the change] will increase conversions [the impact] for search campaign traffic [the who] by 10% [the how much] after 7 days [the how long].
- Reducing the sign up steps from 3 to 1 will increase sign ups by 25% for new visitors after 1,000 visits to the sign up page.
- This subject line will increase open rates for daily digest subscribers by 15% after 3 days.
After you write a hypothesis, break it down into its five components to make sure that you haven’t forgotten anything.
- Change: this subject line
- Impact: will increase open rates
- Who: for daily digest subscribers
- By how much: by 15%
- After how long: after 3 days
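If your team tracks experiments in code or a spreadsheet export, the same breakdown can be captured as a small record. The sketch below is one possible shape; the `Hypothesis` class and its field names are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    change: str    # the change you are testing
    impact: str    # the impact you expect
    who: str       # who you expect it to impact
    how_much: str  # by how much
    how_long: str  # after how long

    def statement(self) -> str:
        """Join the five components into a single sentence."""
        return (f"{self.change} {self.impact} {self.who} "
                f"{self.how_much} {self.how_long}.")

    def missing_components(self) -> list[str]:
        """Names of any components that were left blank."""
        return [name for name, value in vars(self).items() if not value.strip()]


subject_line_test = Hypothesis(
    change="This subject line",
    impact="will increase open rates",
    who="for daily digest subscribers",
    how_much="by 15%",
    how_long="after 3 days",
)

print(subject_line_test.statement())
print(subject_line_test.missing_components())  # [] when nothing is missing
```

Making every field required means a hypothesis with a forgotten component cannot even be written down, and `missing_components` catches any that were filled in with an empty string.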
And then ask yourself:
- Is your expected impact specific and measurable?
- Can you clearly explain why the change will drive the expected impact?
- Are you testing with the right population?
- Did you estimate your how much based on a baseline and / or comparable changes? (more on this in a future post)
- Did you calculate the duration using a duration calculator?
It’s easy to give lip service to experimentation and hypothesis testing. But if you want to get the most out of your efforts, make sure you are starting with a good hypothesis.