DeepInfra Joins Hugging Face Inference Providers – What That Actually Means

Hugging Face just added DeepInfra to its Inference Providers lineup, and honestly, this is one of those integrations that makes you wonder why it took this long.

DeepInfra has been flying under the radar for a while, quietly offering some of the most competitive per-token pricing in the serverless AI inference space. With over 100 models in their catalog – covering LLMs, text-to-image, embeddings, and even text-to-video – they’ve built a solid reputation among developers who care about cost and latency.

Now that they’re officially part of the Hugging Face ecosystem, you can call DeepInfra-hosted models directly from model pages, the Python SDK (huggingface_hub >= 1.11.2), or the JavaScript SDK (@huggingface/inference). No extra setup, no juggling multiple accounts unless you want to.

What’s Supported Right Now

For this initial rollout, DeepInfra is focusing on conversational and text-generation tasks. That means you get access to popular open-weight LLMs like DeepSeek V4, Kimi-K2.6, GLM-5.1, and more. Text-to-image, text-to-video, and embeddings are coming soon – which is fine, because the LLM lineup alone is pretty solid.

Two Ways to Route Requests

This is where it gets flexible. You can either:

  1. Use your own DeepInfra API key – requests go directly to DeepInfra, and you’re billed on your DeepInfra account. This is great if you already have an account or want to manage costs separately.
  2. Let Hugging Face route the request – you authenticate with your HF token, and the charges hit your Hugging Face account instead. No markup from Hugging Face, just the standard provider rates passed through. (They mention revenue-sharing might come later, but for now, it’s clean pass-through.)

Both options are configurable in your account settings, where you can also set provider order preferences for the widget and code snippets on model pages.
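To make the two routing modes concrete, here is a minimal sketch of how a script might choose between them. The environment variable name DEEPINFRA_API_KEY and the helper routing_kwargs are hypothetical, made up for illustration; HF_TOKEN is the variable used in the examples later in this post. The returned keyword arguments are shaped for huggingface_hub's InferenceClient, which accepts provider and api_key, but treat the exact behavior with a provider key as something to confirm in the official docs.

```python
import os
from typing import Mapping, Tuple


def routing_kwargs(env: Mapping[str, str] = os.environ) -> Tuple[dict, str]:
    """Pick credentials for DeepInfra and report whose account gets billed.

    Prefers a DeepInfra key (option 1) and falls back to an HF token (option 2).
    """
    # Hypothetical variable name; use whatever your deployment defines.
    provider_key = env.get("DEEPINFRA_API_KEY")
    if provider_key:
        # Option 1: your own provider key, billed on the DeepInfra account.
        return {"provider": "deepinfra", "api_key": provider_key}, "DeepInfra"

    hf_token = env.get("HF_TOKEN")
    if hf_token:
        # Option 2: routed by Hugging Face, billed on the Hugging Face account.
        return {"provider": "deepinfra", "api_key": hf_token}, "Hugging Face"

    raise RuntimeError("Set DEEPINFRA_API_KEY or HF_TOKEN before calling.")


# Example with an explicit mapping, so nothing depends on the real environment.
kwargs, billed_to = routing_kwargs({"HF_TOKEN": "hf_example"})
print(billed_to, kwargs["provider"])
```

The resulting dictionary could then be passed along as InferenceClient(**kwargs). Preferring the provider key when both are set is just one possible policy; flip the order if you would rather keep everything on your Hugging Face bill.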

Billing and the PRO Perk

Hugging Face PRO subscribers get $2 worth of Inference credits every month, and those credits can be used across providers – including DeepInfra. That’s a nice bonus if you’re already paying for PRO for Spaces or ZeroGPU. Free-tier users get a small quota too, but honestly, if you’re doing anything serious, just upgrade.
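For a rough sense of how far a credit balance stretches, the arithmetic is just credits divided by the per-token price. The sketch below assumes a single flat price per million tokens; the prices passed in are placeholders chosen for easy math, not DeepInfra's actual rates, and real pricing usually differs between input and output tokens.

```python
def tokens_for_credits(credits_usd: float, usd_per_million_tokens: float) -> int:
    """Estimate how many tokens a credit balance covers at a flat price."""
    if usd_per_million_tokens <= 0:
        raise ValueError("Price per million tokens must be positive.")
    return int(credits_usd / usd_per_million_tokens * 1_000_000)


# Placeholder prices only; look up the real rate for the model you use.
print(tokens_for_credits(2.0, 1.0))  # 2000000
print(tokens_for_credits(2.0, 0.5))  # 4000000
```

Plug in the actual listed rate for your model to see whether the monthly credits cover prototyping or only a handful of long conversations.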

The Developer Experience

The integration is seamless. Here’s a Python example that points the standard OpenAI client at Hugging Face’s OpenAI-compatible router endpoint and authenticates with an HF token:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro:deepinfra",
    messages=[
        {"role": "user", "content": "Write a Python function that returns the nth Fibonacci number using memoization."}
    ],
)

print(completion.choices[0].message)

And the JavaScript version:

import { OpenAI } from "openai";

const client = new OpenAI({
    baseURL: "https://router.huggingface.co/v1",
    apiKey: process.env.HF_TOKEN,
});

const chatCompletion = await client.chat.completions.create({
    model: "deepseek-ai/DeepSeek-V4-Pro:deepinfra",
    messages: [
        { role: "user", content: "Write a Python function that returns the nth Fibonacci number using memoization." }
    ],
});

console.log(chatCompletion.choices[0].message);

Notice the model identifier format: org/model-name:provider. That :deepinfra suffix tells the router which provider to use. Clean and unambiguous.
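If you pass these identifiers around in your own tooling, it can help to split them into their parts. Here is a small sketch that does that; ModelRef and parse_model_ref are hypothetical helper names, not part of any SDK, and the sketch treats the provider suffix as optional without saying anything about what the router does when it is missing.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    org: str
    name: str
    provider: Optional[str]  # None when no ":provider" suffix is present


def parse_model_ref(ref: str) -> ModelRef:
    """Split 'org/model-name:provider' into its three parts."""
    repo, colon, provider = ref.partition(":")
    org, slash, name = repo.partition("/")
    if not (org and slash and name):
        raise ValueError(f"Expected 'org/model-name', got {repo!r}")
    if colon and not provider:
        raise ValueError("Provider suffix is empty after ':'")
    return ModelRef(org, name, provider or None)


ref = parse_model_ref("deepseek-ai/DeepSeek-V4-Pro:deepinfra")
print(ref.org, ref.name, ref.provider)  # deepseek-ai DeepSeek-V4-Pro deepinfra
```

Validating the string up front like this catches typos such as a missing slash before the request ever leaves your machine.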

Agent Harness Integration

This is where it gets interesting for people building agents. Hugging Face Inference Providers are already integrated into most popular agent frameworks – Pi, OpenCode, Hermes Agents, OpenClaw, and others. So you can plug DeepInfra-hosted models into your agent workflows without writing glue code. That’s a time-saver.

What I Think

DeepInfra’s pricing is genuinely competitive, and the 100+ model catalog gives you plenty of options. The fact that you can use it through Hugging Face’s existing infrastructure without signing up for yet another platform is a win.

That said, the initial support is limited to conversational and text-generation tasks. If you need embeddings or image generation right now, you’ll have to wait for the next rollout. Also, the free tier quota is small – but that’s expected for a service that’s trying to make money.

The PRO credit perk is nice, but $2/month won’t get you far if you’re running serious workloads. Still, for experimentation and prototyping, it’s a decent starting point.

One thing I appreciate: Hugging Face isn’t adding a markup on routed requests. That’s transparent and developer-friendly. The future revenue-sharing model might change that, but for now, it’s a clean pass-through.

Bottom Line

DeepInfra on Hugging Face is a solid addition. If you’re already in the Hugging Face ecosystem and want cheap, serverless inference for open-weight LLMs, this is worth trying. Just don’t expect the full model catalog on day one – the rollout is phased, and text generation is just the beginning.
