I’m disappointed to see the rise of generative AI tools that are designed to replace discovery with real humans.
Don’t get me wrong. I’m a big fan of generative AI. I use it daily in both my personal life and at work.
But when we use generative AI to replace customer interviews, to generate opportunity solution trees, or to do our thinking for us, we fundamentally misunderstand the purpose of discovery.
When we use generative AI to replace customer interviews, to generate opportunity solution trees, or to do our thinking for us, we fundamentally misunderstand the purpose of discovery. – Tweet This
So I want to take some time to review why we do discovery. I’ll then share how and where I think generative AI can help, and clearly identify what we should avoid.
I also want to note that the world of generative AI is moving quickly. My advice in this article may not stand the test of time. I’ll do my best to update it as the technology evolves. But this is where I stand today.
Understanding Why We Do Discovery: It’s Not About Checking Boxes
We do discovery so that we can increase our hit rate for creating customer value while creating business value.
I break the underlying structure of discovery into three parts:
- Starting with a clear desired outcome—in other words, starting with a clear definition of success.
- Discovering unmet customer needs, pain points, and desires—AKA opportunities.
- Discovering solutions that address those opportunities in a way that drives business value.
The outcomes represent business value. The opportunities represent customer value. Our job is to make sure that our solutions satisfy both.
I recommend teams conduct story-based customer interviews to discover opportunities and run assumption testing to discover the right solutions. And I like to use opportunity solution trees to visualize our work to help us stay aligned on what we think we know.
Interviewing helps us build empathy for our customers. When we interview continuously, we are continuously investing in our understanding of the humans who use our products. This understanding, over time, becomes a competitive advantage. The better we understand our customers, the better we can meet their needs, and the more we can differentiate our product from our competitors’.
Assumption testing helps us ensure that we build the right solutions—solutions that fully satisfy an opportunity (creating customer value) in a way that drives our outcome (creating business value). We test desirability, viability, usability, feasibility, and ethical assumptions. With each test, we remove risk from our solutions and increase the likelihood they will work for both our customers and our business.
The goal is to ship value to customers in a way that creates value for our business. Everything we do in discovery is in service of that.
Along the way, we might create interview snapshots, experience maps, opportunity solution trees, story maps, KPI trees, or other visual artifacts. But these artifacts are not the end goal. They are tools that help us think through what we know. They help us uncover customer value and business value. They help us explore the diverse perspectives on our cross-functional team and then align around a shared understanding. These visuals are not the discovery work; they are the by-product of discovery work.
Too many generative AI tools miss this point. They treat the artifact as the end goal. It’s why we now have tools that generate interview transcripts from synthetic customers, fictional interview snapshots and opportunity solution trees, and detailed story maps.
Again, I’m not a generative AI Luddite. I use these tools every day. I’m even using them in my own discovery work. So I want to share four guidelines to keep in mind when choosing generative AI tools to support your discovery efforts.
1. Don’t Let Generative AI Tools Replace Talking to Real Humans
When we use generative AI to generate interview snapshots from interviews with synthetic customers, we get a nice summary of pretend people. We learn nothing about the real humans who might use our product some day.
We don’t interview to learn about what people generally want. We interview customers to learn about their unique, specific stories about their real past behavior. These stories help us understand their actual unmet needs, pain points, and desires.
As we previously reviewed, interviewing helps us build empathy for the people who might use our products or services. We interview so that we can learn how we can positively intervene in their lives. The more we interview, the more we identify gaps between their expectations and the products they currently use. These gaps allow us to differentiate our products and services from our competitors’.
When we interview a synthetic user or create an AI-generated interview snapshot, we get generalizations. It’s very challenging to get specifics. Even when we do, they are made up. They don’t represent the needs, pain points, or desires of real humans.
When we interview a synthetic user or create an AI-generated interview snapshot, we get generalizations. It’s very challenging to get specifics. Even when we do, they are made up. – Tweet This
As long as you are building products for real humans, generative AI can’t replace interviewing real humans.
Still not convinced? Let’s look at some examples.
To start, I asked ChatGPT to tell me the common needs, pain points, and desires of Netflix customers. Here’s what it told me:
Needs:
- Diverse Content: Viewers seek a wide variety of genres and types of content, including movies, TV shows, documentaries, and original programming.
- Quality Streaming: Users need a reliable streaming service with high-quality video and audio, and minimal buffering or interruptions.
- Personalized Recommendations: Customized suggestions based on viewing history to easily find new content that matches their preferences.
Pain Points:
- Content Availability: Some users are frustrated by regional restrictions or the removal of favorite shows and movies from the platform.
- Subscription Costs: As prices increase, some customers feel that the cost is too high, especially when compared to other streaming services.
- User Interface Issues: Navigating the app can sometimes be cumbersome, with users wanting more intuitive ways to find and organize content.
- Account Sharing Limitations: Restrictions on the number of devices or profiles that can be used simultaneously can be a pain point for families or households with multiple users.
Desires:
- Exclusive Content: Access to exclusive or early-release content that isn’t available on other platforms.
- Ad-Free Experience: A completely ad-free viewing experience, ensuring uninterrupted entertainment.
- Enhanced Features: Features such as offline viewing, customizable profiles, and better parental controls.
- Social Integration: Options to share recommendations and watch content with friends and family, even remotely.
- Continuous Improvement: Regular updates and enhancements to the platform to keep it fresh, user-friendly, and ahead of competitors.
Okay, this might be helpful if I know absolutely nothing about streaming entertainment. But it’s not specific enough to be actionable.
I gave ChatGPT the benefit of the doubt and I asked for more specifics: What are some specific user interface issues that someone might encounter?
Here’s an excerpt from ChatGPT’s reply:
Difficulty Finding New Content:
- Overwhelming Choices: The vast library can be overwhelming, making it hard for users to discover new shows and movies.
- Poor Search Functionality: The search feature sometimes doesn’t return relevant results or can be slow to suggest new content based on partial titles or vague keywords.
Take a minute and suppose you were on the search team. Is this helpful feedback?
Of course not. We need to know exactly what the customer was searching for and why the results weren’t relevant in order for this type of feedback to be actionable.
In other words, we need to know what a real human did that led to unsatisfactory search results and why those results were unsatisfactory.
I tried again. I asked: What are some reasons or use cases why the search results might be irrelevant?
Ambiguous Keywords:
- Common Words: Searching for titles with common words or phrases (e.g., “Love,” “House”) might return a wide range of unrelated results.
- Multiple Meanings: Keywords with multiple meanings or uses (e.g., “Rock” for both music and geology) can produce irrelevant search results.
Partial Titles:
- Incomplete Titles: Entering only part of a title can lead to results that include any content with matching keywords, not necessarily the intended show or movie.
- Typographical Errors: Spelling mistakes or typos in the search query can lead to irrelevant results.
If I were brand-new to search, this information might be helpful, as it can help me get up to speed on common challenges with search interfaces. But it’s not nearly specific enough to replace interviews.
Okay, let’s try another tactic. I asked ChatGPT: You are an engaged Netflix customer who enjoys watching TV at the end of your work day. Tell me about the last time you watched Netflix.
Me again: Set the scene for me. Where were you?
Does this sound like a real human to you? It sounds like a scene from a Crate & Barrel showroom. “A glass of my favorite drink?” Come on!
But I know there are doubters out there, so I continued: Okay, so you sat in your cozy living room. You turned on your TV. How did you get to Stranger Things?
And of course, ChatGPT obliged:
After settling into the couch, I reached for the remote and turned on the TV. The Netflix app was already set up on my smart TV, so I navigated to it with a few clicks. As soon as the familiar Netflix logo appeared, I felt a sense of anticipation.
Once inside the app, I scrolled through the “Continue Watching” section, where “Stranger Things” was prominently displayed. I had been eagerly following the new season, so it was right at the top of my list. I selected the latest episode, and within seconds, the screen filled with the opening credits and the show’s distinctive, eerie music.
With the episode queued up, I grabbed my popcorn and drink, adjusted the pillows for maximum comfort, and hit play. The adventure was about to begin.
Me again: Tell me about your experience watching the episode.
As the episode began, I felt myself instantly drawn into the world of Hawkins, Indiana. The familiar characters, who had grown so much over the seasons, were dealing with new challenges and mysteries. The show’s signature blend of supernatural suspense and heartfelt moments was immediately engaging.
The episode started with a tense scene in the Upside Down, the ominous parallel dimension. The special effects were impressive, creating a sense of dread and excitement. I found myself gripping the edge of my seat as the characters navigated through eerie environments and faced off against terrifying creatures.
The pacing was perfect, with moments of high tension balanced by quieter, emotional scenes. One particular scene, where the group shared a heartfelt conversation about their struggles and hopes, was particularly touching. It added depth to the characters and made me care even more about their fates.
The soundtrack, a mix of 80s hits and atmospheric music, enhanced the experience. It brought back memories of classic 80s movies and added a nostalgic layer to the show. The episode’s cliffhanger ending left me eager to see what would happen next, perfectly setting up the anticipation for the following episode.
Overall, watching “Stranger Things” was an immersive experience. It was the perfect escape, allowing me to forget about the stresses of the day and lose myself in a thrilling and emotional story.
Me still playing along: Did you encounter any challenges?
The experience was mostly smooth and enjoyable, but there were a couple of minor challenges.
Firstly, there was a brief moment when my internet connection stuttered, causing the video to buffer. This disrupted the flow of the episode for a minute or so, but it quickly resolved itself, and I was able to continue watching without further interruptions.
Secondly, one of my neighbors decided to do some late-night vacuuming, which was a bit distracting. I had to increase the volume a bit to fully immerse myself in the show and drown out the background noise.
Other than these small issues, everything went well. The episode was captivating, and despite the minor challenges, it was a great way to unwind and enjoy the evening.
Okay, I have to say, ChatGPT is pretty good at writing fiction. But I’m not interested in fiction. I’m interested in real human stories.
ChatGPT is pretty good at writing fiction. But I’m not interested in fiction. I’m interested in real human stories. – Tweet This
I’ll spare you the details, but I asked ChatGPT to change its persona and tell me another story. In total, I generated five stories, all representing different “characters.”
In every story, the customer:
- Was happy with their Netflix experience.
- Loved the show they were watching.
- Had minor internet hiccups and was interrupted by something (e.g. a neighbor, their dog).
I pushed ChatGPT for more details, for more variation, and it struggled. Here’s why: These aren’t real people. They are fictional characters based on what ChatGPT can glean from its training data about what people like and dislike about Netflix. It’s not specific enough. It’s not real.
Even if it were, it wouldn’t build empathy, and it wouldn’t help us differentiate from our competitors, because our competitors can get access to the exact same information.
Just in case you still need convincing, let’s look at a cleaned-up version of a real story I collected about a real human watching TV:
“My husband and kids were away so I had some time to myself. I decided to watch the Olympics live on NBC. There are so many commercials and qualifying rounds that I don’t care about that I started working on my computer at the same time. I just want to see the big moments—the final races. I got distracted by Facebook and then got pulled into YouTube to watch a Jimmy Fallon video. Before I knew it, I had spent 45 minutes browsing funny videos. I got pulled back into the Olympics when gymnastics came on. But it’s not as fun when you aren’t watching with other people. I got distracted again, started reading my phone, and accidentally saw the results of the event I was watching. I hate it when that happens. But I watched the event anyway.”
This is a human story. Humans are messy. We get distracted. We make mistakes. We aren’t overly positive about everything.
I’m going to say it again: If you are building products for humans, don’t replace interviewing real humans with interviewing generative AIs.
2. Don’t Use AI to Replace Thinking
I have now seen generative AI tools that will generate an interview snapshot, an opportunity solution tree, an experience map, and a story map at the click of a button. In case my opinion isn’t clear: I do not recommend using these tools.
We create these artifacts to examine our thinking, to align as a team, and to facilitate the synthesis process. The goal of these documents is not to create the document. The goal is to have the conversation, to do the examination, to do the thinking and the aligning. The goal is the process.
We create artifacts like interview snapshots to examine our thinking, to align as a team, and to facilitate the synthesis process. The goal of these documents is not to create the document. – Tweet This
If you create an opportunity solution tree at the click of a button, what value does it create? Does anyone on your team have any idea where the opportunities came from? Do they represent real customer needs? Does everyone agree with the outcome, the structure, the framing of each opportunity?
I know from arguing with people on social media that some of you are wondering, “But can’t I use a tool like this to solve the blank page problem—to get to a first draft—and then we can refine it as a team from there?”
My short answer is no. When we let AI do the first draft, we let AI dominate the conversation.
We’ve all been there. You’re sitting in a team meeting. Maybe you’re tired, or just not that interested in the conversation, and you don’t really want to put in the work, so you let the most vocal person dictate the conversation. We all do it. There’s even an academic term for it—social loafing. Someone on your team is running with it, they have energy around it, so what’s the harm in letting them take the lead?
The harm is that we don’t take the time to explore the diverse perspectives on our team. We don’t leverage everyone’s unique knowledge and expertise. We defer to the extrovert, the person who is willing to take the lead, the person who has a fresh coffee. Whatever the reason, it rarely leads to the best outcome.
Most of the work—most of the thinking—happens when we go from a blank page to a first draft. When we outsource this to generative AI, we let generative AI do all the thinking for us. And we’ve already seen that generative AI isn’t specific enough, doesn’t know about real humans, and can’t (yet) replace high-quality discovery work. When we rely on generative AI to generate the first draft, everyone on the team falls prey to social loafing. Don’t do this.
Most of the work—most of the thinking—happens when we go from a blank page to a first draft. When we outsource this to generative AI, we let generative AI do all the thinking for us. – Tweet This
3. Use AI to Analyze Large Data Sets (But Check Its Work)
You can use generative AI to help you make sense of large data sets. It can help you find insights and patterns in your behavioral analytics. It can do a sentiment analysis on your NPS verbatims. It can help you summarize and categorize your support tickets.
But it will make mistakes. So check its work.
I recently used ChatGPT-4o to help me analyze responses to a qualitative survey question. I had over 300 open-ended answers and I wanted to group and sort related responses together to get a better understanding of the most common themes.
First, I fed ChatGPT-4o the full list and asked it to find common themes. It did a mediocre job at this. The themes were ambiguous, they overlapped, and many didn’t make sense to me at all.
But it did solve the blank page problem for me. Seeing the half-baked themes helped me start identifying my own themes. I identified six to eight themes myself and asked ChatGPT to sort all of the responses according to those themes. I also asked it to list any responses that didn’t match those themes. It did a pretty good job of this. I was then able to look at the ones that didn’t fit and identify additional themes.
Once I had a complete list of themes, I fed ChatGPT the complete list of themes and the individual responses and asked it to group them again. This worked great. I ended up with a list of responses grouped by theme.
To check its work, I made sure that the number of responses matched my original list and that there were no duplicates across the themes. This was fairly easy to do in Google Sheets: I used its remove-duplicates feature and then checked the counts.
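If you prefer a script to a spreadsheet, the same checks take only a few lines of Python. This is a minimal sketch with hypothetical responses and theme names (not my actual survey data): it verifies that nothing was dropped, duplicated across themes, or invented by the model.

```python
# Hypothetical survey responses (stand-ins for the real open-ended answers).
original_responses = [
    "I can't find the export button",
    "Search results feel irrelevant",
    "Love the new dashboard",
    "Exporting to CSV is confusing",
]

# What the LLM returned: theme -> list of responses (also hypothetical).
grouped = {
    "Export friction": [
        "I can't find the export button",
        "Exporting to CSV is confusing",
    ],
    "Search quality": ["Search results feel irrelevant"],
    "Positive feedback": ["Love the new dashboard"],
}

# Flatten the grouped responses back into one list.
flattened = [r for responses in grouped.values() for r in responses]

# Check 1: the total count matches the original list.
assert len(flattened) == len(original_responses), "count mismatch"

# Check 2: no response appears in more than one theme.
assert len(set(flattened)) == len(flattened), "duplicate across themes"

# Check 3: every grouped response came verbatim from the original list,
# i.e. the model didn't paraphrase or invent responses.
assert set(flattened) == set(original_responses), "invented or missing response"

print("All checks passed")
```

The verbatim check (Check 3) is worth keeping even if the counts line up, because LLMs often lightly reword items while sorting them, which makes spreadsheet-based duplicate detection miss mismatches.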
In total, this took about one hour. Had I tried to group all of the responses myself, I easily would have spent three to four hours on this task.
But notice that I didn’t outsource the work to ChatGPT-4o. I collaborated with it. I iterated through several attempts until I was happy with the quality. I was engaged. I didn’t outsource my thinking. I used ChatGPT-4o to help with the tedious elements, but I still put in the work.
Let’s look at a second example. I recently used ChatGPT to analyze the engagement I get on my LinkedIn posts. I exported my top 50 posts on LinkedIn and added engagement data to them (comments, reactions, reposts, new followers). I categorized each post by topic and by type. I then fed this data to ChatGPT and asked it to create a heat map showing which posts were most successful at driving impressions, engagement, and new followers.
ChatGPT gave me a beautiful heat map, but I quickly realized that my most successful posts were simply the types that I share most frequently: ChatGPT was using raw counts. I then asked it to redo the heat maps using averages for each category and type.
This is an example where I still have to know what to ask for to get reliable analysis. Even with averages, the heat map was hard to read because the numbers were all over the place. So then I asked for normalized averages and I got heat maps that clearly showed my most successful categories and post types.
To check its work, I reviewed the Python code it used to generate the heat maps to make sure there were no mistakes in the math. And I manually calculated a couple of the values to make sure it matched my expectations. With the help of ChatGPT-4o, this took about two hours of focused effort. Had I done this on my own, it would have taken a day or two at least.
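The raw-counts pitfall is easy to see in miniature. Here’s a sketch with hypothetical data (post types and numbers are made up, not my actual LinkedIn export): raw totals favor whatever you publish most often, while per-post averages show which type actually performs best, and min-max normalization puts the averages on a 0–1 scale so a heat map stays readable across metrics with different magnitudes.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical engagement export: (post type, reactions), one tuple per post.
posts = [
    ("how-to", 100), ("how-to", 120), ("how-to", 110),
    ("hot-take", 300),
]

by_type = defaultdict(list)
for post_type, reactions in posts:
    by_type[post_type].append(reactions)

# Raw totals: "how-to" wins only because there are three times as many posts.
totals = {t: sum(v) for t, v in by_type.items()}

# Per-post averages tell the real story: the single hot take outperforms.
averages = {t: mean(v) for t, v in by_type.items()}

# Min-max normalization maps the averages onto a 0-1 scale for a heat map.
lo, hi = min(averages.values()), max(averages.values())
normalized = {t: (a - lo) / (hi - lo) for t, a in averages.items()}

print(max(totals, key=totals.get))      # prints "how-to" (misleading)
print(max(averages, key=averages.get))  # prints "hot-take" (accurate)
```

The same shape of bug applies to any per-group metric (impressions, new followers): always check whether the tool aggregated by sum or by mean before trusting the picture.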
You can even use generative AI to help you get up to speed on a new market or a new customer segment. As we saw in the Netflix example above, if I were brand-new to streaming entertainment or to search, those initial responses would have helped me build my foundation. But don’t let it replace customer interviews with real humans—that’s where you’ll learn how you can differentiate from everyone else in the market.
Remember, good continuous discovery is a competitive advantage that compounds over time. Don’t outsource it to a tool that everyone has access to.
Remember, good continuous discovery is a competitive advantage that compounds over time. Don’t outsource it to a tool that everyone has access to. – Tweet This
4. Engage with AI Like It’s Another Member of Your Team
Discovery is a team sport. It works best when we work as a cross-functional team from the very beginning when deciding what to build. This is why I advocate for the product trio to do discovery together.
However, we only get the benefits of cross-functional collaboration when we leverage the different knowledge, expertise, and experience on the team. If we let one person do all of the work, dominate the conversation, or make all of the decisions, we don’t get the benefits of cross-functional collaboration.
In all of our courses, we teach teams how to collaborate cross-functionally by following a simple pattern. We have everyone do the work individually before they meet as a team to create a shared version. This requires that everyone put in the effort and develop an individual perspective.
We then teach teams how to explore the diverse perspectives on their team. We encourage them to pay particular attention to the differences across the unique perspectives. Rather than advocating for one point of view, we encourage them to share how they developed their unique perspectives. Then we ask them to co-create a new team perspective.
This allows the team to leverage each of the unique perspectives, do the work to build a shared understanding, and then develop a stronger team perspective to work from moving forward.
For teams that follow this pattern, it can be extremely helpful to treat generative AI as if it’s another member of the team. Generative AI tools can help us brainstorm and they are reasonably good at generating story maps and identifying critical assumptions. They can even help us identify opportunities in interview transcripts.
But for each of these tasks, I don’t want to outsource the team work, but rather I want to treat the generative AI as if it’s another member of the team. So rather than blindly moving forward with its answer, we treat it like one of the many perspectives on our team and integrate it with the other perspectives on our team.
I use generative AI for these types of tasks almost every day. It helps me get out of my own head and see that there are other options. The same can be true for cross-functional teams.
It’s easy to get fixated on a single direction, have blind spots to our own assumptions, or miss a salient point in an interview. Generative AI can help us catch some of these errors. But again, I have to reiterate this: Don’t outsource your thinking to generative AI. Embrace it as a helpful member of your team, but do the work to figure out what you think first, and then use generative AI to augment your and your team’s perspective.
It’s easy to get fixated on a single direction, have blind spots to our own assumptions, or miss a salient point in an interview. Generative AI can help us catch some of these errors. – Tweet This
Final Thoughts
Don’t let this article discourage you from using generative AI in your work. I will reiterate that I use these tools every day. But do be thoughtful about how and when you use them. We don’t want to forget that as long as we are building for humans, we need to be engaging with other humans in our discovery efforts. Technology needs more humanity, not less.
As we are building for humans, we need to be engaging with other humans in our discovery efforts. Technology needs more humanity, not less. – Tweet This