AI's Underpants Gnomes Problem: We've Got the Tech, Now What?

I picked up a flyer at an anti-AI march in London back in February. The people from Pause AI had clearly been watching South Park reruns. “Step 1: Grow a digital super mind,” it read. “Step 2: ? Step 3: ?”

If you remember the underpants gnomes episode from 1998, you know the joke. The gnomes steal underpants (Phase 1), skip straight to Phase 3 (Profit), and leave Phase 2 as a complete mystery. Elon Musk once used it to explain his Mars funding plan. Now it perfectly describes the entire AI industry.

Companies have built the models. They’ve promised transformation. But that middle bit? The part where you actually make money from this stuff? Still a question mark.

Pause AI thinks Step 2 should be regulation. Fair enough, but nobody can agree on what that regulation looks like or who enforces it. The AI boosters think Step 3 is salvation and just skip over the messy middle entirely. OpenAI’s chief scientist Jakub Pachocki told me we’re racing toward “economically transformative technology” on the back of AI. He knows where he wants to go. The road there is just hazy, and nobody’s taking the same route.

Here’s where things get interesting. Two recent studies paint completely different pictures. Anthropic published research predicting which jobs LLMs would affect most. Managers, architects, media folks should worry. Groundskeepers, construction workers, hospitality people? Not so much. But those predictions are basically educated guesses based on what LLMs seem good at in tests, not how they actually perform in messy real-world workplaces.

Then Mercor, an AI hiring startup, ran a different study. They tested AI agents from OpenAI, Anthropic, and Google DeepMind on 480 workplace tasks that human bankers, consultants, and lawyers actually do. Every single agent failed most of its duties. Every single one.

Why such disagreement? Depends who’s making the claim and why. Anthropic has skin in the game. Most of the people telling us something big is about to happen are basing that on how quickly AI coding tools are improving. But not every job involves coding. Other studies show LLMs are terrible at strategic judgment calls.

And here’s the thing nobody wants to talk about: these tools don’t get dropped into clean rooms. They land in workplaces contaminated with actual humans and existing workflows. Sometimes adding AI makes things worse. Sure, maybe you need to tear up those workflows and rebuild them around the technology. But that takes time and guts, and most companies have neither.

That big hole where Step 2 should be creates an information vacuum. Every week some wild claim fills it, evidence be damned. A single social media post can shake markets because we’re so unmoored from any real understanding of what’s coming and how it’ll actually get deployed.

We need fewer guesses and more evidence. That means transparency from model makers, coordination between researchers and businesses, and new ways to evaluate this technology that tell us what really happens when it hits the real world.

The entire tech industry rests on the promise that AI will be transformative. But that’s not a sure bet yet. Next time someone makes bold claims about the future, remember: most businesses are still figuring out what to do with their underpants.
