Claude Opus 4.7 Is Here: Better at Hard Code, Better Vision, and a Nod to Safety

Anthropic quietly (well, not that quietly) released Claude Opus 4.7 today, and honestly, this one feels like more than a point release. It’s available now across all Claude products, the API, Bedrock, Vertex AI, and Microsoft Foundry—same pricing as Opus 4.6, so no surprise there: $5 per million input tokens, $25 per million output.
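To put those rates in concrete terms, here is a minimal sketch of the cost arithmetic. Only the two per-million-token rates come from the announcement; the helper name and the example token counts are my own illustrations, and the sketch ignores any discounts or special pricing tiers that might apply.

```python
# Rates quoted in the announcement (USD per million tokens).
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts.

    Hypothetical helper for illustration only: it applies the flat
    per-million-token rates and nothing else.
    """
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


# Made-up example: 200k input tokens and 20k output tokens.
print(f"${estimate_cost_usd(200_000, 20_000):.2f}")  # prints $1.50
```

Output tokens cost five times as much as input tokens at these rates, so long generations tend to dominate the bill even when the prompt is much larger.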

What’s Actually Better

The headline improvement is in advanced software engineering. Anthropic says users are now handing off their hardest coding work—the kind that used to require close hand-holding—to Opus 4.7 with real confidence. I’ve seen this pattern before with model releases, but the early tester quotes are unusually specific. One tester noted that Opus 4.7 “catches its own logical faults during the planning phase,” which is exactly the kind of self-correction that separates a useful coding assistant from a frustrating one.

Vision is also substantially better. Higher resolution support means it can actually read chemical structures and complex technical diagrams now. For anyone working in life sciences or engineering documentation, that’s a genuine quality-of-life improvement.

The Safety Angle Matters

Here’s the part that caught my attention: Opus 4.7 is the first model to ship with new cybersecurity safeguards that automatically detect and block prohibited or high-risk cybersecurity requests. This is directly tied to Anthropic’s Project Glasswing announcement from last week, which flagged both the risks and benefits of AI in cybersecurity.

Anthropic is being pretty transparent here—they deliberately reduced Opus 4.7’s cyber capabilities compared to their most powerful model, Claude Mythos Preview. The idea is to test these safeguards on a less capable model first before eventually releasing something Mythos-class more broadly. I respect this approach, even if it means Opus 4.7 isn’t the absolute top-tier model for every use case.

If you’re a security professional who needs Opus 4.7 for legitimate work like vulnerability research or penetration testing, Anthropic has set up a Cyber Verification Program. Worth checking out if that’s your domain.

Benchmark Performance (The Numbers)

The benchmarks look solid, though not earth-shattering. One team reported a 13% lift in task resolution on their 93-task coding benchmark, including four tasks that neither Opus 4.6 nor Sonnet 4.6 could solve. Another team noted that on their internal research-agent benchmark, Opus 4.7 scored 0.715 overall and showed the most consistent long-context performance of any model they tested.

A particularly interesting data point: on a General Finance module, it scored 0.813 versus Opus 4.6’s 0.767, and testers emphasized that it’s better at admitting when data is missing rather than fabricating plausible-sounding but incorrect answers. That’s a subtle but critical improvement for anyone doing real analysis work.
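For a sense of scale, the gap between those two finance scores can be worked out directly. This is just arithmetic on the two numbers quoted above; the helper name is mine, and nothing here says how the benchmark itself is scored.

```python
def score_delta(new: float, old: float) -> tuple:
    """Return (absolute difference, relative improvement in percent)."""
    absolute = new - old
    relative_pct = absolute / old * 100
    return absolute, relative_pct


# Scores quoted for the General Finance module: Opus 4.7 vs. Opus 4.6.
absolute, relative_pct = score_delta(0.813, 0.767)
print(f"absolute: {absolute:.3f}, relative: {relative_pct:.1f}%")
# prints: absolute: 0.046, relative: 6.0%
```

So the improvement is about 0.046 points in absolute terms, or roughly 6% relative to the Opus 4.6 score.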

What Early Testers Are Saying

The feedback from early-access users is genuinely positive, and I don’t think that’s just PR spin. Hex called it “the strongest model we’ve evaluated” and noted that low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6. Cognition’s team said it “takes long-horizon autonomy to a new level in Devin” and handles hours-long coherent work without giving up.

The common thread across all the tester quotes is consistency and thoroughness over long-running tasks. That’s not flashy, but it’s exactly what matters when you’re using these models for real work, not just demos.

The Bottom Line

Opus 4.7 isn’t a revolution—it’s a solid, meaningful evolution. Better at hard coding, better vision, better at catching its own mistakes, and shipped with a thoughtful safety strategy. If you’re already on Opus 4.6, the upgrade is a no-brainer at the same price. If you were waiting for something that could handle your most difficult multi-step tasks with less supervision, this might be the one.

And honestly? The fact that they’re being upfront about the cyber capability reductions and the phased safety rollout gives me more confidence in the model than any benchmark number could.