Advice for AI Startup Diligence

Spotting Misleading Metrics in Early-Stage (GenAI) Start-ups

Advice for Diligence

Irina Kukuyeva PhD
May 28, 2025

I’ve previously shared advice on why you may want to be skeptical of high accuracy and retention metrics; you’ve probably read about the start-up inflating its customer metrics. Here’s one more to dig into with diligence. 😀

It’s becoming increasingly difficult for early-stage start-ups to secure funding. With technology making it possible to develop prototypes in days, more start-ups are competing for fewer investors, who now expect the kind of traction that, a few years ago, was typical of later rounds.

One way a start-up can demonstrate traction is by showing that customer usage increased after it introduced a feature. If the feature solves a critical customer pain point, its metrics should look good, even if the solution is overengineered. Here are two ways that the customer engagement metric can be inflated.

Scenario 1: High Traction becomes the One Metric that Matters – Above All Else

One way to achieve high traction is to prioritize it at the expense of everything else. This is Goodhart’s Law in action: when a measure becomes a target, it ceases to be a good measure. Often, there are catastrophic side effects.

We’ve all heard the news story about Wells Fargo opening over 2 million fake accounts for its customers, largely due to excessive sales quotas. These Redditors share other examples of this law in their employers’ businesses.

Start-ups are not immune to this either. To attract more businesses to its platform ahead of its Series F funding, DoorDash scraped restaurant websites and, in one documented instance, sold pizzas to customers for $16 while paying the restaurant $24 for each one.

When traction seems “too good to be true,” consider digging into any incentives around the metric. Ideally, during diligence you’d also be able to dive into the LTV/CAC ratio by cohort, along with how it’s calculated, to understand market demand for the product. As you know, though, many early-stage start-ups won’t have enough sales for it to be a stable metric, and will mention retention instead. (I dive into ways that 30-day retention by cohort can be (inadvertently) inflated in this blog post.)
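As a rough illustration of what “LTV/CAC by cohort, along with how it’s calculated” can look like, here is a minimal Python sketch. It assumes one common simplified definition (LTV = monthly revenue per customer × gross margin ÷ monthly churn; CAC = acquisition spend ÷ new customers). A start-up may define either term differently, which is exactly why the calculation is worth asking about. The `Cohort` class, the function names, and all numbers are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """One acquisition cohort. All fields are hypothetical inputs."""
    name: str
    new_customers: int
    acquisition_spend: float             # sales + marketing spend attributed to this cohort
    monthly_revenue_per_customer: float
    gross_margin: float                  # fraction between 0 and 1
    monthly_churn: float                 # fraction of customers lost per month


def ltv(c: Cohort) -> float:
    # Simplified lifetime value: margin-adjusted monthly revenue over expected lifetime (1 / churn).
    return c.monthly_revenue_per_customer * c.gross_margin / c.monthly_churn


def cac(c: Cohort) -> float:
    # Customer acquisition cost: spend divided by customers acquired in the cohort.
    return c.acquisition_spend / c.new_customers


def ltv_to_cac(c: Cohort) -> float:
    return ltv(c) / cac(c)


# Made-up cohorts: the second one acquires more customers, but at a higher cost and with worse churn.
cohorts = [
    Cohort("2025-01", 20, 10_000.0, 100.0, 0.8, 0.05),
    Cohort("2025-02", 40, 40_000.0, 100.0, 0.8, 0.10),
]

for c in cohorts:
    print(f"{c.name}: LTV={ltv(c):.0f}  CAC={cac(c):.0f}  LTV/CAC={ltv_to_cac(c):.2f}")
```

In this made-up data, customer count doubles from one cohort to the next while the ratio falls from about 3.2 to about 0.8, which a headline “customers acquired” number would hide. With only a few dozen customers per cohort, a handful of churned accounts can swing the ratio considerably, which is the instability mentioned above.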

Scenario 2: High Traction when Missing Baseline

Another way to show product traction is to start from a baseline where our customers, patients, or prospective clients cannot schedule appointments on demand, for example because there is no calendar widget on the product’s app or website. Then, once our product finally allows them to do so, even if it’s using AI agents, its performance should look good!

Scheduling rates after the AI agent implementation should be higher than before, since no calendar widget was available previously! Had we instead compared the AI agent’s scheduling performance to scheduling rates with a calendar widget, I would expect the agent to do worse (!): the calendar widget should not only take a fraction of the cost and time to go live, but it also does not need to handle the nuances and diversity of natural language.
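To make the missing-baseline effect concrete, here is a minimal Python sketch that computes the same AI agent’s relative lift against two different baselines. The function names and every number are hypothetical. In particular, the calendar-widget rate is an assumption chosen to mirror my expectation above, not a measurement.

```python
def scheduling_rate(booked: int, visitors: int) -> float:
    """Share of visitors who end up with a scheduled appointment."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return booked / visitors


def relative_lift(rate: float, baseline_rate: float) -> float:
    """Relative change of `rate` versus `baseline_rate` (0.5 means +50%)."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (rate - baseline_rate) / baseline_rate


# Hypothetical numbers, for illustration only.
no_self_serve = scheduling_rate(booked=50, visitors=1000)     # before: phone or email only
calendar_widget = scheduling_rate(booked=200, visitors=1000)  # assumed cheap alternative
ai_agent = scheduling_rate(booked=150, visitors=1000)         # what the pitch deck reports

lift_vs_nothing = relative_lift(ai_agent, no_self_serve)
lift_vs_widget = relative_lift(ai_agent, calendar_widget)

print(f"AI agent vs. no self-serve scheduling: {lift_vs_nothing:+.0%}")
print(f"AI agent vs. calendar widget:          {lift_vs_widget:+.0%}")
```

With these made-up inputs, the agent shows roughly a 200% lift against “nothing” and a 25% drop against the widget. The product is identical in both rows; only the baseline changes, so it is worth asking which baseline sits behind any before-and-after chart.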

You may think this is a contrived example, but I’ve done diligence on multiple start-ups that use this as their GTM strategy: start with an AI agent for scheduling, then expand from there. Whatever the start-up builds, it will need to fix and maintain as things break for customers. Products that are overengineered from the beginning will therefore be harder to scale, whether or not an LLM wrote the code.

Advice

When every start-up seems to be pitching costly “AI” solutions, consider evaluating the following in diligence:
