Foundational Models are not SaaS Products
Why Teams Focusing on Foundational Models are Yellow Flags: Advice for AI Diligence
I don’t have to tell you that AI investments are hot right now. You may be considering investing in teams that develop foundational models; however, don’t believe the hype -- especially if, in diligence, the team’s answers about the future include foundational models as a hiring, GTM, or scaling strategy. Here’s why, which yellow flags to watch for, and what to dig deeper into during diligence.
What is a Foundational Model?
There are many definitions out there, even if the job title is the same: “AI Engineer!”
When I hear of companies focused on developing “foundational models”, I think they’re changing the underlying architecture of (Generative AI) models, even if those changes are very small, such as whether there should be 3 or 4 layers.
In a deep learning example, the architecture essentially outlines a process for how the algorithm repeatedly combines, squeezes, and/or rotates its input (say, ignoring parts of an image) -- as many times as there are layers.
In a regression example, this would be the question of deciding between whether the data points can best be modeled with one line (by using “linear regression”, shown in panel 1 of the cartoon) or by multiple, shorter lines stuck together (by using “linear splines”, shown in panel 2 of the cartoon).
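To make the regression example concrete, here is a toy sketch (hypothetical, made-up data points; plain Python, no libraries) of the architectural choice between one line and two shorter lines stuck together:

```python
# Toy illustration: one straight line (panel 1) vs. two shorter lines
# joined together (panel 2), fit to the same hypothetical data points.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

def sse(xs, ys, a, b):
    """Sum of squared errors of the line y = a*x + b on (xs, ys)."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))

# Made-up data that bends at x = 5: a single line can't capture the kink.
xs = list(range(10))
ys = [x for x in xs[:5]] + [4 + 3 * (x - 4) for x in xs[5:]]

# Panel 1: one line over all the points ("linear regression").
a, b = fit_line(xs, ys)
one_line_error = sse(xs, ys, a, b)

# Panel 2: two shorter lines, one per segment ("linear splines").
a1, b1 = fit_line(xs[:5], ys[:5])
a2, b2 = fit_line(xs[5:], ys[5:])
two_line_error = sse(xs[:5], ys[:5], a1, b1) + sse(xs[5:], ys[5:], a2, b2)

print(one_line_error > two_line_error)  # the piecewise fit is closer
```

Same data, two different “architectures” -- and which one wins depends entirely on the shape of the data, which is the point of the sections that follow.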
This is different from what I think of as fine-tuning existing models, where companies are tuning a (relatively small) subset of the billions of parameters to get the LLM (or a Machine Learning model) to perform better on their problem, for example, by finding that a parameter should be equal to 2, not 3.
Foundational models are just math! Fine-tuning involves coding up the math and solving for (i.e., identifying) all the unknown parameters defined by the mathematical equations in the foundational model when it’s applied to a dataset. In doing so, one trains, tests, and validates an algorithm.
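A minimal sketch of what “solving for the unknown parameters” means in practice (a hypothetical one-feature model and made-up data; plain Python). The “foundational model” here is just the equation y = w1·x + w2; training searches for the parameter values, and fine-tuning re-solves for only a subset of them:

```python
# The "foundational model" is the equation y = w1 * x + w2.
# Training solves for the unknown parameters (w1, w2) on a dataset;
# fine-tuning re-solves for only a subset of them (here, just w2).

def train(data, w1, w2, tune_w1=True, steps=2000, lr=0.01):
    """Gradient descent on mean squared error. If tune_w1 is False,
    w1 stays frozen -- a toy analogue of fine-tuning a small subset
    of a model's parameters."""
    for _ in range(steps):
        grad_w1 = grad_w2 = 0.0
        for x, y in data:
            err = (w1 * x + w2) - y
            grad_w1 += 2 * err * x / len(data)
            grad_w2 += 2 * err / len(data)
        if tune_w1:
            w1 -= lr * grad_w1
        w2 -= lr * grad_w2
    return w1, w2

# Hypothetical dataset generated by y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(-5, 6)]

# "Training": solve for both unknown parameters from scratch.
w1, w2 = train(data, w1=0.0, w2=0.0)
print(round(w1, 2), round(w2, 2))  # close to 2 and 1

# "Fine-tuning": keep w1 frozen, re-solve only w2 on shifted data.
shifted = [(x, 2 * x + 3) for x in range(-5, 6)]
_, w2_new = train(shifted, w1=w1, w2=w2, tune_w1=False)
print(round(w2_new, 2))  # close to 3
```

With billions of parameters instead of two, and no guarantee the search converges, this is the gap between writing down the math and getting a working vertical AI solution.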
As you can imagine, there are no guarantees that it will actually work/converge on the dataset, or yield reasonable results, especially on the first pass. But if it does, the result essentially becomes a vertical AI solution.
Why There Is No One Foundational Model to Rule Them All
I’m a baking experimenter who takes a baking recipe as a suggestion, not a rule. (Many attempts failed, but they still tasted great with ice cream!) Adding a tablespoon of cornmeal (I wanted to use up) to this Instant Pot banana bread recipe makes the texture less pancake-like! And following advice from commenters on King Arthur’s deep-dish crust, I now use a combination of all-purpose and semolina flour (and was pleasantly surprised when the crust came out better than at a restaurant selling deep-dish pizzas!).
But even though there is such a thing as all-purpose flour -- it’s not actually a silver bullet for all bakes. (And it’s worse if you’re making airy cakes, prefer Italian pizza, or have a gluten sensitivity/allergy!) I’ve learned that how well my baking experiments turn out mainly depends on the protein composition of the flour and the moisture content of the batter/dough.
There is no single foundational model that rules them all! Just like regression is not always the answer! While regression has a billion fewer parameters, makes it easier to figure out what went wrong when something breaks at 3 in the morning, and takes much less time to update and validate, it’s just not always the right ingredient for the recipe! (Especially if you have customers tracked over time.)
PhD students (in Statistics, Mathematics, Physics, and other disciplines) across the world are inventing new algorithms (e.g., foundational models) as you’re reading this. There will always be a better foundational model!
How do I know this?
For my PhD thesis, I invented a foundational model, which is a special case of the many LLMs available today. (Fun fact: it would have been called “Independent Tensor Analysis”, had the acronym not been my Dutch advisor’s ex-wife’s name.) To earn a PhD in Statistics, I needed to demonstrate that my approach was superior to existing models.
Over 10 years ago, it took over a week of runtime, without much optimization, to run on UCLA’s Hoffman2 High-Performance Compute Cluster. It took months to get it to run without breaking. I only knew it worked when the atmospheric signatures it uncovered from two storms on Jupiter turned out to be similar! NASA’s Juno mission, years later, confirmed my findings.
Then, I did it again! As a Director of an internal consulting center at a global Market Research company, I developed a foundational (Bayesian Networks) model, which I then implemented and fine-tuned to uncover what drives someone to purchase a CPG product, powering a global SaaS product that helped client service teams in 64 countries make more robust and reliable recommendations. It was used in over 1,500 market research studies, generating (many) millions of dollars in revenue for the company.
And again (in AdTech)! And again (in healthcare)! Every foundational model I’ve invented, I then implemented and fine-tuned/validated, to ensure it worked for the specific use case at hand.
Challenges with Foundational Models



