📢 SaaS Bootcamp increases $100 tomorrow
Simplify cybersecurity & learn to model SaaS companies
Join students from Goldman Sachs, Microsoft, Bank of America, Palo Alto Networks, Cowen, SVB, Scotiabank, VCs, and ETF Managers
40% off ends tonight! (Code: EarlySaaS40)
Databricks’ $1.3B Bet on Enterprise AI
Databricks made a bold bet last month (June 26th), acquiring MosaicML for a headline $1.3B.
Here’s what I’m thinking about 👇
What is MosaicML and how are they different?
The importance and price of efficiency (especially with GPUs)
LLMs are becoming table stakes
Open source models and the importance of privacy
What is Microsoft doing?
Overall, I see this acquisition and the early bets by large data companies as foundational to the enterprise AI space moving forward.
What is MosaicML?
MosaicML is known for its state-of-the-art MPT large language models (LLMs). With over 3.3 million downloads of MPT-7B and the recent release of MPT-30B, MosaicML has showcased how organizations can quickly build and train their own state-of-the-art models using their data in a cost-effective way. Customers such as AI2 (Allen Institute for AI), Generally Intelligent, Hippocratic AI, Replit and Scatter Labs leverage MosaicML for a wide variety of generative AI use cases. - Databricks
What does MosaicML do in English?
While the AI models Mosaic’s customers use are less sophisticated than OpenAI’s, they’re typically less expensive to run and more tailored to companies’ internal use cases, such as retrieving internal information for employees, according to Mosaic customers like Replit, a software development tool provider, and Glean AI, which makes software that tracks company spending and suggests ways to cut costs. - The Information
(Replit did a full analysis on the benefits of training your own LLM Model)
Unpopular Opinion: Your LLM doesn’t need to philosophize about how Rome fell:
The startup released its MPT-7B model in May, which cost $200,000 to build.
“It’s not $100 million,” Rao emphasized of the price tag. “Everyone needs to get that out of their mind.”
As he put it, models don’t need to have the capability to philosophize about topics such as how Rome fell. Organizations just need to ensure general capabilities and correctness for their particular use cases. “That’s not necessarily what OpenAI has built,” he said. - Venture Beat
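For scale, the quoted figures imply roughly a 500x gap between MPT-7B's training cost and the "$100 million" frontier-model price tag Rao pushes back on (a back-of-the-envelope sketch; the $100M is his rhetorical figure, not a measured cost):

```python
# Rough cost gap between a focused enterprise LLM and the
# "$100 million" figure Rao references (illustrative only).

mpt_7b_cost = 200_000         # reported cost to build MPT-7B
frontier_cost = 100_000_000   # the "$100 million" Rao says to forget

cost_gap = frontier_cost / mpt_7b_cost
print(f"~{cost_gap:.0f}x cheaper")  # ~500x cheaper
```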
At $20M ARR, the acquisition was quoted at 65x… but closer to 33x at today’s pricing
In six months, MosaicML's ARR went from $1M to $20M. Growth rates are hard to interpret meaningfully at this scale, but suffice it to say it's fast and the demand is clearly there. Further, you get Naveen Rao and 62 AI/ML experts in one of the hottest fields around.
In reality, the value of the deal is much less. Databricks is paying for Mosaic in stock at the same share price as Databricks’ last equity financing round, in 2021, which valued it at $38 billion, Ghodsi said. That period was the peak of startup valuations, so Databricks’ current valuation may be closer to half that. If Databricks’ share price were to be cut in half, the Mosaic deal would be worth closer to $650 million, or 32.5 times revenue. That’s well above the value put on most enterprise software deals outside AI, which lately have been done at less than 10 times next year’s revenue. - The Information
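The multiple math above is simple enough to sketch out (figures from the article; the halved valuation is The Information's scenario, not a reported number):

```python
# Back-of-the-envelope revenue multiples for the MosaicML deal,
# using the figures quoted above (illustrative only).

headline_price = 1_300_000_000   # $1.3B all-stock price at Databricks' 2021 valuation
mosaic_arr = 20_000_000          # MosaicML ARR at acquisition

headline_multiple = headline_price / mosaic_arr
print(f"Headline multiple: {headline_multiple:.1f}x ARR")    # 65.0x ARR

# If Databricks' share price were cut in half (The Information's
# scenario), the effective deal value falls to ~$650M.
effective_price = headline_price / 2
effective_multiple = effective_price / mosaic_arr
print(f"Effective multiple: {effective_multiple:.1f}x ARR")  # 32.5x ARR
```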
Efficiency as a competitive advantage

Efficiency has real consequences once code hits real hardware (silicon)
Mosaic's models are cheaper and also use GPU capacity more effectively, which is highly valuable given limited industry supply:
He also said that MosaicML was one of the few labs that have access to Nvidia H100 GPUs, which increased the throughput-per-GPU by over 2.4 times and resulted in a faster finish time. - Venture Beat
Nvidia has historically been the leader in LLM GPUs, but Mosaic released a study showing AMD's per-GPU throughput reached 73-80% of Nvidia's:
Performance was competitive with our existing A100 systems. We profiled training throughput of MPT models from 1B to 13B parameters and found that the per-GPU throughput of MI250 was within 80% of the A100-40GB and within 73% of the A100-80GB. We expect this gap will close as AMD software improves. - MosaicML
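To make the quoted gap concrete, here's a toy calculation (throughput normalized to the A100 baseline, per MosaicML's reported ratios) showing how per-GPU throughput translates into wall-clock training time:

```python
# Toy comparison: per-GPU throughput vs. relative training time.
# MosaicML reported MI250 throughput within 80% of the A100-40GB
# and within 73% of the A100-80GB (normalized ratios below).

mi250_vs_a100_40gb = 0.80   # MI250 at 80% of A100-40GB throughput
mi250_vs_a100_80gb = 0.73   # MI250 at 73% of A100-80GB throughput

# Training time scales inversely with throughput: at 80% throughput,
# the same job on the same GPU count takes 1 / 0.80 = 1.25x as long.
extra_time_vs_40gb = 1 / mi250_vs_a100_40gb
extra_time_vs_80gb = 1 / mi250_vs_a100_80gb
print(f"~{extra_time_vs_40gb:.2f}x wall-clock time vs A100-40GB")  # ~1.25x
print(f"~{extra_time_vs_80gb:.2f}x wall-clock time vs A100-80GB")  # ~1.37x
```

In other words, a 73-80% throughput ratio means roughly 25-37% longer training runs, a gap that matters far less when the alternative is not being able to source Nvidia GPUs at all.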
SemiAnalysis pointed out that you really only need Nvidia at these speeds if you're running tens of thousands of GPUs, giving Mosaic/AMD a competitive advantage while Nvidia GPUs remain hard to source:
Mosaic’s stack, much of which is open source, is an obvious choice unless every last drop needs to be squeezed out with many dedicated scaling engineers for clusters of 10,000s of GPUs.
Now, MosaicML is going to be able to offer the same with AMD hardware. They have only just gotten their Instinct MI250 GPUs this quarter, but they are already close to matching Nvidia. - SemiAnalysis
Side note: SemiAnalysis is hella deep - would highly recommend if you’re heavy on the semi/hardware side
This is not Rao's first rodeo building an ML/AI company that accelerates hardware. He was also the founder and CEO of Nervana, which built open-source software and customized computer chips for machine learning applications. He sold the company to Intel in 2016.
Nervana brings expertise on the software side that can be used right away as well as designs for specialized chips that can be accelerated with Intel’s chipmaking know-how. - Fortune
LLM as Table Stakes + Importance of Privacy
Which all brings us back to the eternal question (at least in SaaS): what is a moat?
Ben Thompson made a great point that with the simplification of models (plus lower cost and easier access to GPUs), LLMs are now just table stakes for data companies. Data remains the only moat:
It seems apparent — and frankly, makes intuitive sense — that the data storage providers see providing LLM capabilities as table stakes. At the same time, it also reinforces the idea that it’s not apparent that LLM capabilities will in-and-of-themselves be a long-term differentiator, or moat; what is a moat, though, is holding a company’s data — storage is sticky, which is to say that LLMs may attract customers, but inertia and egress costs will keep them. - Stratechery
While MosaicML focuses on open source, Snowflake has gone the opposite direction:
Neeva/Snowflake and MosaicML/Databricks also seem like the right pairing: Snowflake abstracts everything away from customers in Snowflake managed storage with a Snowflake proprietary schema, and it makes sense to add on a natural language interface on top of that (even if that interface is powered by an open source LLM, or OpenAI). Databricks, meanwhile, has an open source heritage (Apache Spark), and more of a picks-and-shovels approach; this very much fits with MosaicML which offers both open source models and tools for training data (Databricks is also much stronger in terms of storing data for machine learning). - Stratechery
Ghodsi and Rao were drawn together by the same belief that private and open-source models would be the future:
Both Ghodsi and Rao wanted to democratize AI and enable lots of people to have access to these models, both knew security was a big issue for their customers, and neither was interested in building a consumer app. This is counter to all the publicity generated by ChatGPT, the AI chatbot from Microsoft or Google’s Bard. “We’re not interested in that. We are focused on enterprises and big organizations, B2B, which is usually something very boring that no one cares about,” Ghodsi said. - Fortune
On the security front, Microsoft is planning to release a private version of ChatGPT that could cost up to 10x more but keeps customer data separate from other customers:
Later this quarter Microsoft’s Azure cloud server unit plans to sell a version of ChatGPT that runs on dedicated cloud servers where the data will be kept separate from those of other customers, according to two people with knowledge of the upcoming announcement. The idea is to give customers peace of mind that their secrets won’t leak to the main ChatGPT system, the people said. But it will come at a price: The product could cost as much as 10 times what customers currently pay to use the regular version of ChatGPT, one of these people said. - The Information
The fun part of any strategic analysis is figuring out where Microsoft stands and, even more importantly, what its next move is. Microsoft seems to have partnerships with EVERYONE, which in the database world includes Snowflake AND Databricks. I snagged the below chart from @SouthernValue95, which came from a presentation leaked in an FTC inquiry (highlighting is theirs).
I don’t have context beyond the slide, but what seems clear (at least as of FY22) is that Microsoft wants to differentiate against Snowflake and appears more closely aligned with Databricks:
"Deliver the “Data Cloud”: Create a new data platform category by integrating databases, analytics, AI, an data governance into a comprehensive offering that further differentiates against point players such as Snowflake.
With Blessings of Strong NRR,
Thomas
📢🚨 Cybersecurity x SaaS Bootcamp (price increases tomorrow)
Simplify the complex world of cybersecurity
Learn to analyze and model SaaS companies
Live Q&A + interactive models/worksheets
Slack networking group
Amazing guest speakers (last cohort we had the CISO of Datadog)
Hosted with FinTwit legend Francis Odum, a cybersecurity guru with amazing pieces on Palo Alto’s Path to $100bn (quoted by Nikesh Arora, CEO of Palo Alto Networks), a SASE technology deep dive, and Microsoft Security reaching $20bn.
40% off ends tonight! (Code: EarlySaaS40)