The AI Inference platform
Workers AI lets you run AI inference globally with one API call. No GPUs to manage, no capacity planning. Just intelligent machine learning models running where they're needed, on Cloudflare's global network.
Serverless pricing
Rich model catalog
Widely compatible
Scale up, and down
Inference is hard to predict and spiky in nature, unlike training. GPU utilization is, on average, only 20-40% — with one-third of organizations utilizing less than 15%. Workers AI allows customers to save by only paying for usage. No guessing or committing to hardware that goes unused.
on a hyperscaler
on Cloudflare
AI models easily accessible via code, OpenAI SDK or API
Test, prototype, and evaluate the latest LLMs with the speed and reliability of a production environment, accessible in seconds.
Kimi K2.6
Powerful vision and agentic tool calling model
GLM 4.7 Flash
Rapid multilingual agent with expert tool calling
GPT-OSS-120B
Specialized for coding and debugging
Llama 4 Scout
Balanced generalist for everyday tasks
Run any AI model with one API call
Call any model directly from your code using a single endpoint. Workers AI handles provisioning, scaling, and latency optimization automatically.