
Model Pricing & Rate Limits

A single source of truth for current model families, token costs (USD), RPM rate limits, release timing, and best-fit uses.

 

Introduction

This reference consolidates all actively supported (non-deprecated) models from OpenAI, Anthropic, Google, and xAI into separate provider tables. For each model we list:

  • RPM (Requests per Minute). If the vendor does not publish a per-model limit, we use an industry-standard placeholder of 60 RPM (est.); adjust to your contract/tenant tier.

  • Costs per 1M tokens (USD) for input and output. If a model is billed per-call or per-image, we show N/A in the token columns and note the billing method.

  • Release date (when known).

  • Best used for guidance based on model positioning and typical enterprise usage.

Currency: USD.
Estimates: Where a vendor does not publish pricing/limits, we mark the field (est.) so your Spherium admins can budget conservatively and update once your account terms are confirmed.
Last reviewed: Aug 7, 2025.
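The token-based fields above combine into a simple per-request cost. The sketch below shows the arithmetic; the rates used are examples pulled from the tables, not authoritative billing figures.

```python
# Sketch: estimating per-request cost from $/1M token rates.
# Rates are illustrative examples; confirm against your vendor contract.

def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_per_1m: float, output_per_1m: float) -> float:
    """Cost of one call given per-1M-token input and output rates."""
    return (input_tokens / 1_000_000) * input_per_1m \
         + (output_tokens / 1_000_000) * output_per_1m

# Example: GPT-5 at $1.25 in / $10.00 out per 1M tokens,
# for a 2,000-token prompt and an 800-token completion.
cost = request_cost_usd(2_000, 800, 1.25, 10.00)
print(round(cost, 4))  # 0.0105
```

Multiply by expected monthly call volume to get a conservative budget figure per model.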


OpenAI — GPT-5, o-series, GPT-4.x, GPT-4o, Image/Audio, Embeddings (non-deprecated)

Note: Some 4.x/4o endpoints are billed per call rather than per token. For those, set token costs to 0 in Spherium and use the “per-call” override field in your cost settings.
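Mixing token-billed and per-call/per-image/per-minute SKUs in one report means dispatching on billing mode. A minimal sketch, assuming hypothetical field names (Spherium's actual override fields may differ):

```python
# Sketch of handling mixed billing modes; the mode names and the $0.04
# per-image rate below are illustrative placeholders, not vendor pricing.

def line_item_cost(mode: str, units: float, *, per_1m: float = 0.0,
                   per_unit: float = 0.0) -> float:
    """Cost of one usage line item under the given billing mode."""
    if mode == "tokens":                 # units = token count
        return units / 1_000_000 * per_1m
    if mode in ("per_call", "per_image", "per_minute"):
        return units * per_unit          # units = calls / images / minutes
    raise ValueError(f"unknown billing mode: {mode}")

# Token-billed gpt-4o-mini input vs. a hypothetical per-image rate:
print(round(line_item_cost("tokens", 500_000, per_1m=0.15), 3))  # 0.075
print(round(line_item_cost("per_image", 10, per_unit=0.04), 2))  # 0.4
```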

| Model (API name or family) | RPM | Input $/1M | Output $/1M | Release (YYYY-MM-DD) | Best used for |
| --- | --- | --- | --- | --- | --- |
| GPT-5 (gpt-5) | 60 (est.) | 1.25 | 10.00 | 2025-08-07 | Top-tier coding, agentic workflows, long context, "think hard" tasks. |
| GPT-5 Mini (gpt-5-mini) | 60 (est.) | 0.25 | 2.00 | 2025-08-07 | High-throughput assistants, app backends, fast iteration. |
| GPT-5 Nano (gpt-5-nano) | 120 (est.) | 0.05 | 0.40 | 2025-08-07 | Ultra-low-cost classification, routing, summarization at scale. |
| o3-pro (o3-pro) | 60 (est.) | 20.00 | 80.00 | 2025-06-10 | Highest-reliability reasoning, deep tool use, eval-heavy tasks. |
| o3 (o3) | 60 (est.) | 2.00 | 8.00 | 2025-04-16 | Strong general reasoning at moderate cost; coding & analysis. |
| o3-deep-research (o3-deep-research) | 60 (est.) | 10.00 (est.) | 40.00 (est.) | 2025 | Automated long-form research, multi-step synthesis. |
| o4-mini (o4-mini) | 60 (est.) | 1.10 | 4.40 | 2025-04-16 | "Turbo" tier: fast mixed workloads with solid reasoning. |
| o1-pro (o1-pro) | 60 (est.) | 150.00 | 600.00 | 2025-03-19 | Premium "o1" reasoning; complex, correctness-critical work. |
| o1 (o1) | 60 (est.) | 15.00 | 60.00 | 2024-12-17 | Reasoning-first model for tough prompts, planning, tutoring. |
| o1-mini (o1-mini) | 120 (est.) | 3.00 (est.) | 12.00 (est.) | 2024-12 | Lightweight "o1" for scaled reasoning tasks. |
| GPT-4.1 (gpt-4.1) | 60 (est.) | Per-call | Per-call | 2025-04-14 | High-quality general 4.x; great for broad assistants. |
| GPT-4.1 Mini / Nano | 60 (est.) | Per-call | Per-call | 2025-04 | Lighter 4.1 variants for scale. |
| GPT-4o (gpt-4o) | 60 (est.) | Per-call | Per-call | 2024-05-13 | Multimodal (vision/voice); production assistants. |
| GPT-4o Mini (gpt-4o-mini) | 120 (est.) | 0.15 | 0.60 | 2024-07-18 | Ultra-low-cost text/vision; background jobs. |
| GPT-4o Realtime (voice) | 60 (est.) | Per-minute | Per-minute | 2024–2025 | Live voice agents; streaming ASR + TTS. |
| Image: gpt-image-1 | 60 (est.) | N/A (per-image) | N/A | 2024 | Image generation (per-image pricing by resolution). |
| Audio TTS: gpt-4o-mini-tts | 60 (est.) | N/A (per-minute) | N/A | 2024–2025 | Natural speech synthesis for apps/IVR/agents. |
| Audio STT: whisper-1 | 60 (est.) | N/A (per-minute) | N/A | Ongoing | High-quality speech-to-text transcription. |
| Embeddings: text-embedding-3-large | 300 (est.) | 0.13 | N/A | 2024-01 | Search, RAG, clustering with best accuracy. |
| Embeddings: text-embedding-3-small | 300 (est.) | 0.02 | N/A | 2024-01 | Cost-efficient embeddings at scale. |

Anthropic — Claude 4.x & 3.x (non-deprecated)

| Model | RPM | Input $/1M | Output $/1M | Release | Best used for |
| --- | --- | --- | --- | --- | --- |
| Claude Opus 4.1 (claude-opus-4-1) | 60 (est.) | 15.00 | 75.00 | 2025-08-05 | Flagship coding & complex agentic workflows; precise edits. |
| Claude Opus 4 (claude-opus-4) | 60 (est.) | 15.00 | 75.00 | 2025-05-22 | High-stakes reasoning; long tasks; migration to 4.1 recommended. |
| Claude Sonnet 4 (claude-sonnet-4) | 60 (est.) | 3.00 | 15.00 | 2025-05-22 | Enterprise default: speed, cost, and quality balance. |
| Claude Sonnet 3.7 | 60 (est.) | 3.00 | 15.00 | 2025-02 | Mature "workhorse" Sonnet; broad enterprise use. |
| Claude Sonnet 3.5 | 60 (est.) | 3.00 | 15.00 | 2024-06 | Cost-effective general tasks & summarization. |
| Claude Haiku 3.5 | 120 (est.) | 0.80 | 4.00 | 2024-10 | High-throughput Q&A, extraction, light analysis. |

Note: Older Claude 3 models (Opus/Sonnet/Haiku) may still be callable on some platforms. Include only if your vendor/region lists them as active.


Google — Gemini 2.5/2.0 & Imagen (non-deprecated)

Billing note: Google commonly prices text models by tokens and image/video models per output. Where token pricing is not published, we show (est.) and recommend confirming in your Google Cloud (Vertex AI / Gemini API) console.

| Model | RPM | Input $/1M | Output $/1M | Release | Best used for |
| --- | --- | --- | --- | --- | --- |
| Gemini 2.5 Pro | 60 (est.) | 1.25 (est.) | 10.00 (est.) | 2025-03 | Advanced reasoning/coding; 1M+ token context. |
| Gemini 2.5 Flash | 120 (est.) | 0.30 (est.) | 2.50 (est.) | 2025 | Fast, low-cost generation; assistants & tools. |
| Gemini 2.5 Flash-Lite | 180 (est.) | 0.10 (est.) | 0.40 (est.) | 2025 | Ultra-cheap, latency-sensitive tasks. |
| Gemini 2.0 Flash | 120 (est.) | 0.075–0.15 (est.) | 0.30–0.60 (est.) | 2024–2025 | "Flash" class: scale workloads with large prompts. |
| Gemini 1.5 Pro | 60 (est.) | 3.50 (est.) | 10.50 (est.) | 2024-02 | Large-context multimodal; RAG & analysis. |
| Gemini 1.5 Flash | 120 (est.) | 0.35 (est.) | 1.05 (est.) | 2024-02 | Throughput-oriented, cost-optimized tasks. |
| Imagen 3 (image gen) | 60 (est.) | N/A (per-image) | N/A | 2024 | High-quality image generation in Vertex AI/Gemini API. |
| Embeddings (Text) | 300 (est.) | 0.10 (est.) | N/A | 2023+ | Search, recommendations, clustering. |

xAI — Grok family & Image (non-deprecated)

| Model | RPM | Input $/1M | Output $/1M | Release | Best used for |
| --- | --- | --- | --- | --- | --- |
| Grok-4 | 480 (published) | 5.00 (est.) | 20.00 (est.) | 2025-07 | High-end reasoning/coding; real-time X data apps. |
| Grok-3 | 60 (est.) | 1.25 (est.) | 5.00 (est.) | 2025-02 | General text generation with solid accuracy. |
| Grok-3-mini | 120 (est.) | 0.20 (est.) | 0.60 (est.) | 2025-02 | Lightweight, high-volume tasks & automations. |
| Grok Image (latest) | 60 (est.) | N/A (per-image) | N/A | 2024–2025 | Image generation inside the Grok/xAI stack. |

Note: xAI’s public API pricing is evolving. Treat token costs as placeholders until confirmed by your account team. RPM for Grok-4 (480 RPM / 2M TPM) is commonly published; others vary by plan.


How to configure this in Spherium.ai

  1. Go to Settings → Organization → Model Integration Settings.

  2. Add each model with its canonical ID (e.g., gpt-5, o3-pro, claude-opus-4-1, gemini-2.5-pro, grok-4, gpt-image-1).

  3. Enter pricing using the tables above. For per-call or per-image SKUs, leave token fields at 0 and set per-call/per-image cost in the override field.

  4. RPM: If your contract doesn’t specify a rate, begin with 60 RPM (est.) and adjust after vendor confirmation.

  5. Verify usage and spend under Reports → Model Usage & Cost after rollout.
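The catalog produced by steps 1–3 can be kept alongside your infrastructure config as plain data. The schema below is an illustrative sketch, not Spherium's actual import format, and the $0.04 per-image figure is a placeholder:

```python
# Sketch of a model catalog as data; field names are hypothetical.
catalog = [
    {"id": "gpt-5",           "rpm": 60, "input_per_1m": 1.25,  "output_per_1m": 10.00},
    {"id": "claude-opus-4-1", "rpm": 60, "input_per_1m": 15.00, "output_per_1m": 75.00},
    # Per-image SKU: token fields stay 0; cost lives in an override field.
    {"id": "gpt-image-1",     "rpm": 60, "input_per_1m": 0.0,   "output_per_1m": 0.0,
     "override": {"unit": "per_image", "usd": 0.04}},  # placeholder rate
]

# Sanity checks before entering values into the UI:
assert all(m["rpm"] > 0 for m in catalog)
assert all("override" in m or m["input_per_1m"] > 0 for m in catalog)
```

Keeping this as reviewable data makes step 5's reconciliation against Reports → Model Usage & Cost much easier.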


Tips & Troubleshooting

  • 💡 Route smartly: Use Spherium’s Routing Rules to send low-stakes, high-volume tasks to smaller models (e.g., GPT-5 Nano, Gemini 2.5 Flash-Lite), reserving GPT-5 / Opus 4.1 / o3-pro for critical reasoning.

  • 💡 Per-call SKUs: For GPT-4.1 / 4o families billed per call, set token prices to 0 and define per-call cost so reports stay accurate.

  • ⚠️ Image & audio models: These often use per-image or per-minute billing. Mark token columns N/A and use Spherium’s override fields to capture the right unit price.

  • ⚙️ Batch & caching: If your provider supports batch pricing or prompt caching (e.g., Anthropic), model your effective $/1M in Spherium to reflect typical cache hit rates.
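The caching tip above reduces to a weighted average. A sketch of the arithmetic, using an assumed cached-read rate and hit rate (both illustrative; confirm your vendor's actual cache pricing):

```python
# Sketch: blending cached vs. uncached input rates into an effective $/1M.
# The $0.30 cached-read rate and 70% hit rate below are assumptions.

def effective_input_per_1m(base: float, cached: float, hit_rate: float) -> float:
    """Expected input cost per 1M tokens given a prompt-cache hit rate."""
    return base * (1.0 - hit_rate) + cached * hit_rate

# Claude Sonnet 4 at $3.00 base with a hypothetical $0.30 cached rate
# and a 70% cache hit rate:
print(round(effective_input_per_1m(3.00, 0.30, 0.70), 3))  # 1.11
```

Enter the blended figure as the model's input price in Spherium if your workloads have stable hit rates; otherwise keep the base rate and treat cache savings as headroom.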



Conclusion

Use this page to keep your model catalog clean and your costs predictable. If your vendor adds or retires a model, update the relevant table and Spherium’s Model Integration Settings so routing and reporting stay accurate. Questions or corrections? Email support@spherium.ai.


Copyright & Use

This content is proprietary to Spherium.ai and subject to our license agreement. Redistribution without permission is prohibited.

If you have any questions about this policy or need assistance, please contact us at support@spherium.ai.