Testing LLMs on Superconductivity Research Questions

Google Research just published a paper in the Proceedings of the National Academy of Sciences that I find genuinely interesting. They tested whether LLMs can act as expert-level research partners in condensed matter physics—specifically, high-temperature superconductivity.

This is a smart choice of domain. High-Tc superconductivity in cuprates has been an open problem since its discovery in 1986, which earned the 1987 Nobel Prize in Physics. Thousands of papers exist, multiple theories compete, and the literature is dense enough that even experienced researchers struggle to keep a neutral, comprehensive view. A new grad student entering this field could really use a knowledgeable, unbiased tutor.

So the researchers took six LLMs and asked them high-level questions about the underlying mechanisms. Then a panel of physics experts graded the responses on accuracy, completeness, and how well they handled competing theories.
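
To make that grading setup concrete, here is a minimal sketch of how rubric scores from an expert panel could be rolled up into a ranking. This is my own illustration, not the paper's method: the 0-5 scale, the function names, the model names, and every number in the example are made up, and the real study may have weighted or aggregated things quite differently.

```python
from statistics import mean

# Hypothetical rubric: the three criteria mentioned above, each scored 0-5.
# The scale and the criterion keys are assumptions for illustration only.
CRITERIA = ("accuracy", "completeness", "competing_theories")


def average_scores(grades):
    """Average each criterion over all grades given to one model.

    `grades` is a list of dicts, one per (expert, question) pair,
    mapping each criterion name to a numeric score.
    """
    return {c: mean(g[c] for g in grades) for c in CRITERIA}


def rank_models(all_grades):
    """Return model names sorted by overall mean score, best first."""
    overall = {
        model: mean(average_scores(grades).values())
        for model, grades in all_grades.items()
    }
    return sorted(overall, key=overall.get, reverse=True)


# Placeholder data, invented purely to exercise the functions.
example = {
    "model_a": [
        {"accuracy": 4, "completeness": 3, "competing_theories": 2},
        {"accuracy": 5, "completeness": 4, "competing_theories": 3},
    ],
    "model_b": [
        {"accuracy": 3, "completeness": 3, "competing_theories": 3},
        {"accuracy": 2, "completeness": 4, "competing_theories": 1},
    ],
}

if __name__ == "__main__":
    for name in rank_models(example):
        print(name, average_scores(example[name]))
```

Keeping the per-criterion averages separate, rather than collapsing everything into one number right away, makes it easier to spot a model that is accurate but lopsided in how it treats competing theories.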

The top performers? NotebookLM and a custom-built system. Both pulled from a closed ecosystem of certified, quality-controlled sources. That’s not surprising—when you need to avoid hallucinated references or outdated claims, curating the input makes a huge difference.

But here’s the kicker: even the best systems had clear areas for improvement. The paper doesn’t sugarcoat it. These models aren’t ready to replace a human expert, but they can serve as a thought partner—especially for getting up to speed on a complex field or exploring research directions without bias.

I’ve seen similar approaches before. Google’s earlier CURIE benchmark tested LLMs on analytical tasks across six scientific disciplines, and other groups have used AI to generate hypotheses or write scientific software. This new work feels like a natural progression: instead of just fact-checking or equation-solving, it’s about navigating open scientific questions where there’s no single right answer.

The practical takeaway? If you’re a researcher in a specialized field, tools like NotebookLM with curated references could save you hours of literature review. But don’t trust them blindly—especially on topics where consensus hasn’t formed yet. The models still struggle with nuance and can oversimplify competing theories.

I’d love to see this extended to other fields—maybe protein folding or climate modeling—where the literature is equally vast and fragmented. For now, this is a solid step toward making AI a genuine research partner, not just a glorified search engine.
