More and more start-ups are pitching ChatGPT/LLM wrappers, some more obvious than others. One emerging “AI start-up” trend that may not look like a wrapper, but most likely is one, is start-ups offering “synthetic users”. Instead of paying to survey or interview real people who may be prospective or current customers, you essentially ask ChatGPT what it thinks (!).
I would argue that “synthetic users” are a type of synthetic data, since they’re computer-generated/simulated, not customer-generated.
Please note: I will be using the terms “AI”, LLM(s), and ChatGPT interchangeably in this blog post, as the terms are currently synonymous in pop culture (whether I agree with that or not).
Why I’m Skeptical of this Trend
Reason 1: Catch-22 of Learning from Simulations
While I was pursuing my graduate degree in Statistics, we had a number of courses on simulations. They went something like this:
Start with an idea of what we’re trying to simulate, i.e., the end goal.
For example, the purchasing habits of 10 first-year college students.
Can you draw samples from the population of what we’re trying to simulate, or do you need to approximate this?
For example, can you access a dataset of all purchases from <your favorite grocery store> made by first-year students over the last 2 weeks, and randomly select 10? Or do you need to assume that all 10 will buy some combination of “ramen, bread, peanut butter, and … lots and lots of beer” [Reddit]?
You can see where I’m going with this. Suppose the start-up doesn’t have access to customer data, especially if the start-up is pre-MVP or pre-Sales. In that case, they’ll need to make a lot of assumptions about who the potential customers are and how they behave in order to simulate a synthetic focus group, from which they’ll then try to learn who the potential customers are and how they may interact with the new product… As you can see, this catch-22 is easy to implement but not very informative: the simulation can only hand back the assumptions that went into it.
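To make the catch-22 concrete, here is a minimal Python sketch of the assumption-driven route described above. Everything in it is hypothetical: the purchase probabilities are invented for illustration (not measured from any dataset), and the function names are mine. It simulates a small synthetic focus group from those assumptions and then “learns” purchase rates from the simulated baskets.

```python
import random

# Hypothetical assumptions about what a first-year student buys in a week.
# These probabilities are invented for illustration, not measured.
ASSUMED_PURCHASE_PROBS = {
    "ramen": 0.8,
    "bread": 0.6,
    "peanut butter": 0.5,
    "beer": 0.9,
}


def simulate_student(rng, probs):
    """Return the set of items one synthetic student 'buys'."""
    return {item for item, p in probs.items() if rng.random() < p}


def simulate_focus_group(n_students, probs, seed=0):
    """Simulate n_students synthetic shoppers under the assumed probabilities."""
    rng = random.Random(seed)
    return [simulate_student(rng, probs) for _ in range(n_students)]


def estimate_purchase_rates(baskets, items):
    """'Learn' from the simulation: share of baskets containing each item."""
    return {item: sum(item in b for b in baskets) / len(baskets) for item in items}


if __name__ == "__main__":
    baskets = simulate_focus_group(10, ASSUMED_PURCHASE_PROBS)
    learned = estimate_purchase_rates(baskets, ASSUMED_PURCHASE_PROBS)
    for item, assumed in ASSUMED_PURCHASE_PROBS.items():
        print(f"{item}: assumed {assumed:.2f}, 'learned' {learned[item]:.2f}")
```

With 10 synthetic students, the “learned” rates are just noisy copies of the assumed ones; with more students, they converge to the assumptions. Either way, nothing comes out that wasn’t put in, which is the point of Reason 1.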
TIP: If this were a start-up I was mentoring, I’d suggest they skip the simulation and talk to people or, better yet, capture how they interact with the product (demo) to better understand what’s resonating, how, and with whom, and use that to inform the next product direction/focus as a product-led growth (PLG) motion.
Reason 2: LLMs don’t do Research
If you’re not yet convinced, remember that LLMs don’t actually do research. They learn how words are associated with each other in their training text, and generate a response from those associations!
Reason 3: Product and Customer Uniqueness
I would also argue that the product a start-up is developing is unique and has a unique set of customers, since it’s developed to fill a gap in the market in the first place! Until the start-up can understand how their customers are and aren’t using the product, they won’t be able to iterate/improve it to win more market share. Hypothetical/simulated users won’t help them get there!
Ways Early-Stage Start-ups Can Use Simulated Users
Having said that, I see two potential use cases for simulated users generated by your favorite LLM/ChatGPT.
Scenario 1: Very, Very, Very Preliminary Customer Discovery
When founders are in the idea validation phase of their product and are doing customer discovery to learn about potential customers’ needs and pain points – ahead of a “fake door test” [BLOG], and way ahead of building an MVP – they can ask ChatGPT or another LLM to “impersonate” different customer types and give “feedback” as if it were the customer.
Please note: the word “feedback” is in quotes per Reason 2 above. You should expect this “feedback” to be fairly generic and to apply across products in the same category; for those familiar with the product’s industry, the insights should not be surprising.
Those founders unfamiliar with the product’s industry may uncover an interesting focus area to explore in customer interviews (during customer discovery) from the “feedback”.
TIP: I would argue that if a start-up doesn’t understand the industry it’s trying to disrupt, it’s missing a “Killer Wedge,” as NFX puts it, to power its growth.
Scenario 2: Augment ChatGPT with Customer Data
While a start-up in Scenario 1 (above) has no moat, start-ups with customer purchase and engagement data may be able to get more tailored “feedback,” depending on how well that data distinguishes between the different customer personas that feed into the LLM.
As an aside, this blog post by Weavely shows you how to augment ChatGPT with customer data.
TIP: I would argue that if a start-up has customer data, a product-led growth motion is lower-hanging fruit with more (revenue) impact than debugging a black-box LLM’s personas (at 3 AM) when its suggestions no longer make sense and cost years of runway in implementation costs.
Topics and Questions to Consider in Diligence
In evaluating “synthetic user” products, consider confirming whether the product is a ChatGPT wrapper, how it is differentiated from the competition, and how much it costs to maintain (as costs will typically go up as an LLM company grows). Here are a few questions on these topics.
Product
Is there a clear ICP and GTM strategy for the product (focusing on solving one vertical’s specific pain point)?
What is the moat? If it’s data, how was it acquired?
Follow-up: How often is the LLM fed new customer data?
Follow-up: Why offer “feedback” as a service (over PLG or something else)?
How does the start-up make sure that different products in the same vertical get tailored “advice”?
Team
Does the team have AI and technical expertise to guide execution and iteration?
Business
What are their cloud compute costs?
Does the revenue model align incentives with the product offering?
Summary
“Synthetic Users” are, by definition, not real, and won’t provide real insights into customers the start-up has not seen before! They may still be useful during idea validation for founders unfamiliar with the industry they’re trying to disrupt, but should those founders be doing that at all while missing a “Killer Wedge” to power their company’s growth? Is there a moat that doesn’t cost five figures (or more) per month to run in the cloud, even at the earliest stages? I imagine most start-ups in this category won’t meet these criteria. What are you seeing?
References
Cover image per subiz