Evaluating "Synthetic User" Start-ups

What’s Real, What’s Fake, and How to Tell the Difference

May 01, 2025

More and more start-ups are pitching ChatGPT/LLM wrappers, some more obviously than others. One emerging “AI start-up” trend that may not look like a wrapper, but most likely is one, is start-ups offering “synthetic users”: instead of paying to survey or interview real people who may be prospective or current customers, you essentially ask ChatGPT what it thinks (!).

I would argue that “synthetic users” are a type of synthetic data, since they’re computer-generated/simulated, not customer-generated.

Please note: I will be using the terms “AI,” “LLM(s),” and “ChatGPT” interchangeably in this blog post, as they are currently synonymous in pop culture (whether I agree with that or not).

Why I’m Skeptical of this Trend

Reason 1: Catch-22 of Learning from Simulations

While I was pursuing my graduate degree in Statistics, I took a number of courses on simulation. They went something like this:

1. Start with an idea of what we’re trying to simulate, i.e., the end goal.
   - For example, the purchasing habits of 10 first-year college students.
2. Can we draw samples from the population we’re trying to simulate, or do we need to approximate it?
   - For example, can we access a dataset of all purchases made by first-year students at <your favorite grocery store> over the last 2 weeks, and randomly select 10? Or do we need to assume that all 10 will buy some combination of “ramen, bread, peanut butter, and … lots and lots of beer” [Reddit]?

You can see where I’m going with this. Suppose the start-up doesn’t have access to customer data, which is likely if it is pre-MVP or pre-sales. In that case, it will need to make a lot of assumptions about who the potential customers are and how they behave in order to simulate how a synthetic focus group may behave. From that simulation, we then try to learn who the potential customers are and how they may interact with the new product. In other words, the output can only reflect the assumptions that went into it. This catch-22 is easy to implement, but not very informative!
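To make the catch-22 concrete, here is a minimal Python sketch of the grocery example above. The purchase probabilities in `ASSUMED_BASKET_PROBS` are made-up assumptions (not real data), and the helper names are hypothetical. Whatever we “learn” from the simulated focus group simply reproduces those assumptions, and an item we never assumed (say, vegetables) can never show up.

```python
import random
from collections import Counter

# Hypothetical assumptions baked into the simulation: what we *believe*
# first-year students buy. Nothing here comes from real purchase data.
ASSUMED_BASKET_PROBS = {
    "ramen": 0.9,
    "bread": 0.7,
    "peanut butter": 0.6,
    "beer": 0.95,
}


def simulate_students(n_students: int, seed: int = 0) -> list[set[str]]:
    """Draw n simulated shoppers; each item is bought with its assumed probability."""
    rng = random.Random(seed)
    return [
        {item for item, p in ASSUMED_BASKET_PROBS.items() if rng.random() < p}
        for _ in range(n_students)
    ]


def purchase_rates(baskets: list[set[str]]) -> dict[str, float]:
    """'Learn' how often each item is bought across the simulated baskets."""
    counts = Counter(item for basket in baskets for item in basket)
    return {item: counts[item] / len(baskets) for item in ASSUMED_BASKET_PROBS}


if __name__ == "__main__":
    baskets = simulate_students(10_000)
    for item, rate in purchase_rates(baskets).items():
        print(f"{item:>14}: learned {rate:.2f} vs assumed {ASSUMED_BASKET_PROBS[item]:.2f}")
```

Running it prints “learned” rates close to the assumed ones. The only way to learn something new is to replace the assumed probabilities with real customer data, which is exactly what the start-up doesn’t have yet.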

TIP: If this were a start-up I were mentoring, I’d suggest they skip the simulation and talk to people or, better yet, capture how people interact with the product (demo) to understand what’s resonating, how, and with whom. That understanding can then inform the next product direction and focus as part of a product-led growth (PLG) motion.

Reason 2: LLMs Don’t Do Research

If you’re not yet convinced, remember that LLMs don’t actually do research: they learn how words are associated with each other and generate a response from those associations!

Reason 3: Product and Customer Uniqueness

I would also argue that the product a start-up is developing is unique and has a unique set of customers, since it’s built to fill a gap in the market in the first place! Until the start-up understands how its customers are and aren’t using the product, it won’t be able to iterate on and improve the product to win more market share. Hypothetical, simulated users won’t help it get there!

Ways Early-Stage Start-ups Can Use Simulated Users

Having said that, I see two potential use cases for simulated users generated by your favorite LLM/ChatGPT.

Scenario 1: Very, Very, Very Preliminary Customer Discovery

When founders are in the idea-validation phase of their product and are doing customer discovery to learn about potential customers’ needs and pain points (ahead of a “fake door test” [BLOG], and way ahead of building an MVP), they can use ChatGPT or another LLM to see whether it can “impersonate” different customer types and give “feedback” as if it were the customer.

Please note: the word “feedback” is in quotes because of Reason 2 above. You should expect this “feedback” to be fairly generic and to apply across products in the same category; for founders familiar with the product’s industry, the insights should not be surprising.
Founders unfamiliar with the product’s industry, however, may uncover an interesting focus area in the “feedback” to explore in customer interviews during customer discovery.

TIP: I would argue that if a start-up doesn’t understand the industry it’s trying to disrupt, it’s missing what NFX calls a “Killer Wedge” to power its growth.

Scenario 2: Augment ChatGPT with Customer Data

While a start-up in Scenario 1 (above) has no moat, start-ups with customer purchase and engagement data may be able to get more tailored “feedback” by feeding that data into the LLM. How tailored depends on the quality of the data, specifically on its ability to distinguish between the different customer personas.

As an aside, this blog post by Weavely shows you how to augment ChatGPT with customer data.
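For Scenario 2, here is a minimal Python sketch of one way customer data could be turned into grounded persona prompts before anything is sent to an LLM. The `CustomerRecord` fields, the `build_persona_prompts` helper, and the `min_records` cutoff are hypothetical assumptions for illustration. This is not necessarily the approach in the Weavely post, and it deliberately stops short of calling any specific LLM API.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class CustomerRecord:
    """One customer's purchase/engagement summary (hypothetical schema)."""
    segment: str              # persona label from existing segmentation
    monthly_spend: float      # average spend per month, in dollars
    sessions_per_week: float  # product engagement
    top_feature: str          # feature this customer uses most


def build_persona_prompts(records: list[CustomerRecord], min_records: int = 30) -> dict[str, str]:
    """Turn customer data into one persona prompt per segment.

    Segments with fewer than `min_records` customers are skipped: with too
    little data, the persona is closer to a guess than to a real customer type.
    """
    by_segment: dict[str, list[CustomerRecord]] = defaultdict(list)
    for record in records:
        by_segment[record.segment].append(record)

    prompts: dict[str, str] = {}
    for segment, rows in sorted(by_segment.items()):
        if len(rows) < min_records:
            continue
        avg_spend = mean(r.monthly_spend for r in rows)
        avg_sessions = mean(r.sessions_per_week for r in rows)
        top_feature = Counter(r.top_feature for r in rows).most_common(1)[0][0]
        prompts[segment] = (
            f"You are a '{segment}' customer, based on {len(rows)} real customers. "
            f"You spend about ${avg_spend:.0f}/month, use the product "
            f"{avg_sessions:.1f} times per week, and rely mostly on {top_feature}. "
            "Give candid feedback on the product as this customer would."
        )
    return prompts
```

The resulting strings would then be passed as system prompts to whichever LLM the start-up uses. The personas can only be as distinct as the underlying segments, which is why data quality matters so much here.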

TIP: I would argue that if a start-up has customer data, a product-led growth motion is lower-hanging fruit with more (revenue) impact than debugging a black-box LLM’s personas (at 3 AM) when their suggestions no longer make sense and the implementation has already cost years of runway.

Topics and Questions to Consider in Diligence

Continue reading this post for free, courtesy of Irina Kukuyeva PhD.

Advice for AI Startup Diligence
