The AI Inference platform

Workers AI lets you run AI inference globally with one API call. No GPUs to manage, no capacity planning. Just intelligent machine learning models running where they're needed, on Cloudflare's global network.

Serverless pricing

Pay-per-inference pricing with no idle costs. No guessing what.

Rich model catalog

50+ models running close to users in 200+ cities

Widely compatible

One API call, works with any OpenAI SDK or task type

Scale up, and down

Inference is hard to predict and spiky in nature, unlike training. GPU utilization is, on average, only 20-40% — with one-third of organizations utilizing less than 15%. Workers AI allows customers to save by only paying for usage. No guessing or committing to hardware that goes unused.

What you pay for
on a hyperscaler
What you pay for
on Cloudflare

AI models easily accessible via code, OpenAI SDK or API

Test, prototype, and evaluate the latest LLMs with the speed and reliability of a production environment, accessible in seconds.

Kimi K2.6

Powerful vision and agentic tool calling model

GLM 4.7 Flash

Rapid multilingual agent with expert tool calling

GPT-OSS-120B

Specialized for coding and debugging

Llama 4 Scout

Balanced generalist for everyday tasks

Run any AI model with one API call

Call any model directly from your code using a single endpoint. Workers AI handles provisioning, scaling, and latency optimization automatically.

Background Pattern