Elon Musk admits xAI trained Grok on OpenAI models—so much for the moral high ground

Elon Musk sat down for a deposition this week and dropped a bombshell that shouldn’t surprise anyone who’s been paying attention: xAI trained Grok using outputs from OpenAI’s models. Specifically, he confirmed the use of “distillation”—a technique where a smaller model learns from a larger one’s outputs.

This is the same Elon Musk who co-founded OpenAI, sued them for abandoning their nonprofit mission, and built xAI on a platform of “open” AI. The irony is thick enough to cut with a GPU.

Distillation isn’t new. It’s been a staple in ML for years. You take a big, expensive model like GPT-4, run a bunch of queries, and use those outputs to train a smaller, cheaper model that approximates the larger one’s behavior. Frontier labs hate it because it lets smaller players piggyback on their billions in R&D without paying for it. OpenAI, Google, and Anthropic have all tried to block or detect it.
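To make the idea concrete, here’s a toy sketch in Python. It’s a deliberately tiny analogy, not a description of how xAI or anyone else actually did it. The hypothetical `teacher` function stands in for a big model we can only query. The “student” is a two-parameter logistic regression fit to the teacher’s soft probability outputs with plain gradient descent. Real LLM distillation usually means fine-tuning a student on text the teacher generated, at vastly larger scale.

```python
import math
import random


def teacher(x: float) -> float:
    """Hypothetical stand-in for a large model we can only query (like a hosted API).

    Returns the teacher's probability that input x belongs to class 1.
    Its internal parameters (3.0 and -1.0) are hidden from the student.
    """
    return 1.0 / (1.0 + math.exp(-(3.0 * x - 1.0)))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def distill(num_queries: int = 200, epochs: int = 500, lr: float = 2.0, seed: int = 0):
    """Train a tiny logistic-regression student purely on the teacher's outputs."""
    rng = random.Random(seed)

    # Step 1: run a batch of queries through the teacher and record its answers.
    queries = [rng.uniform(-2.0, 2.0) for _ in range(num_queries)]
    dataset = [(x, teacher(x)) for x in queries]

    # Step 2: fit the student to the teacher's soft outputs by minimizing
    # cross-entropy with full-batch gradient descent.
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, target in dataset:
            err = sigmoid(w * x + b) - target  # gradient of cross-entropy w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(dataset)
        b -= lr * grad_b / len(dataset)
    return w, b


if __name__ == "__main__":
    w, b = distill()
    # The student never saw the teacher's parameters, only its answers,
    # yet it should land close to the hidden values w=3, b=-1.
    print(f"student: w={w:.2f}, b={b:.2f}")
```

Because this student has exactly the same form as the teacher, it can match it almost perfectly. A real student model only approximates its teacher. The economics are the same either way: the queries are cheap, and someone else paid for the expensive part.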

But here’s the thing: Musk’s testimony essentially confirms what many in the community suspected. When Grok launched, it was competent but not groundbreaking, and its early versions had much the same flavor as GPT-3.5 and GPT-4. The question was never whether xAI used distillation; it was whether they’d ever admit it.

Now they have.

Musk’s defense, according to reports, is that using model outputs for training is standard practice and not the same as copying the model itself. Technically true. Practically, it’s a gray area that every major lab dances around. OpenAI themselves have been accused of training on outputs from other models. The difference is they don’t brag about being the open, ethical alternative.

What makes this particularly awkward is Musk’s ongoing lawsuit against OpenAI, where he claims they breached their original nonprofit charter by becoming a for-profit entity and licensing technology to Microsoft. The deposition testimony undermines his moral positioning. You can’t simultaneously accuse someone of being a closed, greedy monopoly while using their work to build your own product.

I’ve seen this pattern before in AI. Everyone complains about distillation until they need it to catch up. Startups do it. Researchers do it. Even the big labs do it to each other. It’s the dirty secret of the industry that nobody wants to talk about in public.

The real question is whether this changes anything. OpenAI could theoretically pursue legal action, but proving damages from distillation is notoriously difficult. And they’d be opening themselves up to counterclaims about their own training data practices.

For now, the practical impact is minimal. Grok is already a decent model, and xAI has enough funding to train their own from scratch if they wanted to. This testimony just confirms that the shortcut was taken—and that the AI industry’s ethical lines are drawn in sand, not stone.

Musk’s deposition also revealed that xAI used a cluster of 10,000 H100 GPUs for training, which is a lot but not extreme by current standards. For context, Meta’s Llama 3 was trained on 16,000+ H100s. So xAI isn’t exactly operating at the scale of the biggest labs.

What bothers me is the hypocrisy more than the act. Distillation is a legitimate technique. The problem is claiming the moral high ground while using it. If Musk had just said “we used distillation to bootstrap Grok, so what?” this would be a non-story. Instead, he built a narrative around openness and transparency while taking the same closed-door shortcut everyone else uses.

At the end of the day, this is a reminder that the AI race is messy, full of compromises, and driven by competitive pressure more than principles. The companies that look the most principled are often just the ones who haven’t been caught yet.
