Vantage: Google's AI Experiment for Scoring Future-Ready Skills

Google Research has been quietly working on something that might actually matter more than the next chatbot feature: measuring the human skills that AI can’t replace.

Today they’re opening up Vantage, a research experiment that uses generative AI to put students into simulated conversations and score their “future-ready” skills like critical thinking, collaboration, and creative problem-solving. It’s built with NYU pedagogy experts and is now available for sign-up on Google Labs.

The problem with measuring squishy skills

We’ve all seen the lists from the OECD and World Economic Forum — critical thinking, collaboration, creative thinking — the durable competencies that are supposed to survive whatever automation throws at us. These aren’t new. Teachers have been trying to develop them for decades. The problem is assessment.

Standardized tests are terrible at capturing how someone actually thinks through a problem or works with others. You can’t grade conflict resolution with a multiple-choice question. And real human evaluation? Too expensive, too inconsistent, too slow. How do you fairly assess a group’s collaboration skills when one team never disagrees, or another settles on the first mediocre idea that comes up?

This is the gap Vantage is trying to fill. The team built a system where students interact with AI avatars in dynamic, multi-party conversations. Think preparing for a debate or pitching a creative vision, but the avatars are scripted to push back, introduce conflicts, and create situations that force the student to demonstrate those hard-to-measure skills.

How the AI scoring actually works

The architecture is interesting. There’s an “Executive LLM” running in the background that analyzes the conversation in real time. It uses a predefined rubric to steer the avatars: deciding when to challenge an idea, when to introduce disagreement, when to ask for clarification. By the end of the conversation, the system is supposed to have gathered enough evidence to assess the student’s skills.

This is essentially an adaptive test engine, but for soft skills. The avatars aren’t just chatbots; they’re assessment instruments. The system controls the environment, yet the exchange still feels more authentic than a written exam.
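
To make the "adaptive test engine" idea concrete, here is a rough sketch of what a rubric-driven steering loop could look like. This is not Vantage's implementation, which I haven't seen; the skill names come from the article, but the `Executive` class, the move names, and the evidence threshold are all hypothetical, and a real system would use an LLM rather than a counter to decide whether a student turn counts as evidence.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical rubric: each skill maps to an avatar move meant to elicit it.
RUBRIC_MOVES = {
    "critical_thinking": "challenge_idea",
    "collaboration": "introduce_disagreement",
    "creative_thinking": "ask_for_alternatives",
}


@dataclass
class Executive:
    """Toy stand-in for an 'Executive LLM' that steers avatars by rubric coverage."""

    target_evidence: int = 2  # assumed: observations wanted per skill
    evidence: dict = field(default_factory=lambda: {s: 0 for s in RUBRIC_MOVES})

    def record(self, skill: str) -> None:
        # Called when a student turn is judged to show evidence of `skill`.
        self.evidence[skill] += 1

    def next_move(self) -> Optional[str]:
        # Target the skill with the least evidence so far (ties: rubric order).
        skill = min(self.evidence, key=self.evidence.get)
        if self.evidence[skill] >= self.target_evidence:
            return None  # every skill is covered; the conversation can wrap up
        return RUBRIC_MOVES[skill]
```

The point of the design is that the conversation adapts to what is still unmeasured: once a student has shown critical thinking, the next avatar move shifts to whichever skill has the thinnest evidence.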

Does it actually work?

The early results are promising but not surprising. In their study with NYU, the AI scoring was “on par with human experts.” That’s the standard line for these things. I’d be more interested in seeing the edge cases — where did the AI miss something a human caught? What kind of students did it misjudge?

The tech report is available, and I’d encourage anyone seriously interested in assessment to dig into the methodology. The key question isn’t whether the AI can match human raters on average, but whether it can handle the outliers: the neurodivergent student who communicates differently, the non-native speaker, the kid who freezes under simulated pressure.
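
If you wanted to run that check yourself, one way is to compute rater agreement per subgroup instead of only in aggregate. The sketch below uses Cohen's kappa, a common chance-corrected agreement statistic; I don't know which metric the NYU study actually used, and the `(group, human_score, ai_score)` row format is my own assumption.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two raters' categorical labels on the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance, from each rater's label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def kappa_by_group(rows):
    """rows: iterable of (group, human_score, ai_score); returns {group: kappa}."""
    groups = {}
    for group, human, ai in rows:
        human_scores, ai_scores = groups.setdefault(group, ([], []))
        human_scores.append(human)
        ai_scores.append(ai)
    return {g: cohen_kappa(h, m) for g, (h, m) in groups.items()}
```

A strong overall kappa paired with a much weaker one for, say, non-native speakers would be exactly the kind of outlier problem an "on par on average" headline can hide.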

The classroom reality check

Vantage is currently positioned as a sandbox for practice and validated assessment. That’s smart. They’re not claiming to replace teachers or standardized tests entirely. The idea is to give educators a tool that makes these skills measurable enough to actually teach them.

Because here’s the thing: what gets measured gets taught. If we can’t assess critical thinking, schools won’t prioritize it. Math and science have clear tests. Future-ready skills have been the educational equivalent of “we’ll get to it next semester.”

I’m cautiously optimistic about Vantage. The approach is thoughtful — using AI to create controlled but authentic scenarios is genuinely novel. But I’ve seen too many AI-in-education projects that work beautifully in demos and fall apart in actual classrooms with spotty internet and overworked teachers.

The fact that it’s a research experiment, not a product launch, is a good sign. Google’s being honest about what they don’t know yet. If you’re in education or assessment, it’s worth signing up and putting it through its paces. The rest of us can watch and see if this actually moves the needle on how we teach the skills that matter.
