
Model Pricing & Rate Limits

A single source of truth for current model families, token costs (USD), RPM rate limits, release timing, and best-fit uses.

 

Introduction

This reference consolidates all actively supported (non-deprecated) models from OpenAI, Anthropic, Google, and xAI into separate provider tables. For each model we list:

  • RPM (Requests per Minute). If the vendor does not publish a per-model limit, we use an industry-standard placeholder of 60 RPM (est.); adjust to your contract/tenant tier.

  • Costs per 1M tokens (USD) for input and output. If a model is billed per-call or per-image, we show N/A in the token columns and note the billing method.

  • Release date (when known).

  • Best used for guidance based on model positioning and typical enterprise usage.

Currency: USD.
Estimates: Where a vendor does not publish pricing/limits, we mark the field (est.) so your Spherium admins can budget conservatively and update once your account terms are confirmed.
Last reviewed: Aug 7, 2025.
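The token-based fields above combine into a simple per-request cost. The sketch below shows the arithmetic; the rates used are examples pulled from the tables, not authoritative billing figures.

```python
# Sketch: estimating per-request cost from $/1M token rates.
# Rates are illustrative examples; confirm against your vendor contract.

def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_per_1m: float, output_per_1m: float) -> float:
    """Cost of one call given per-1M-token input and output rates."""
    return (input_tokens / 1_000_000) * input_per_1m \
         + (output_tokens / 1_000_000) * output_per_1m

# Example: GPT-5 at $1.25 in / $10.00 out per 1M tokens,
# for a 2,000-token prompt and an 800-token completion.
cost = request_cost_usd(2_000, 800, 1.25, 10.00)
print(round(cost, 4))  # 0.0105
```

Multiply by expected monthly call volume to get a conservative budget figure per model.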


OpenAI — GPT-5, o-series, GPT-4.x, GPT-4o, Image/Audio, Embeddings (non-deprecated)

Note: Some 4.x/4o endpoints are billed per call rather than per token. For those, set token costs to 0 in Spherium and use the “per-call” override field in your cost settings.
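Mixing token-billed and per-call/per-image/per-minute SKUs in one report means dispatching on billing mode. A minimal sketch, assuming hypothetical field names (Spherium's actual override fields may differ):

```python
# Sketch of handling mixed billing modes; the mode names and the $0.04
# per-image rate below are illustrative placeholders, not vendor pricing.

def line_item_cost(mode: str, units: float, *, per_1m: float = 0.0,
                   per_unit: float = 0.0) -> float:
    """Cost of one usage line item under the given billing mode."""
    if mode == "tokens":                 # units = token count
        return units / 1_000_000 * per_1m
    if mode in ("per_call", "per_image", "per_minute"):
        return units * per_unit          # units = calls / images / minutes
    raise ValueError(f"unknown billing mode: {mode}")

# Token-billed gpt-4o-mini input vs. a hypothetical per-image rate:
print(round(line_item_cost("tokens", 500_000, per_1m=0.15), 3))  # 0.075
print(round(line_item_cost("per_image", 10, per_unit=0.04), 2))  # 0.4
```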

| Model (API name or family) | RPM | Input $/1M | Output $/1M | Release (YYYY-MM-DD) | Best used for |
| --- | --- | --- | --- | --- | --- |
| GPT-5 (gpt-5) | 60 (est.) | 1.25 | 10.00 | 2025-08-07 | Top-tier coding, agentic workflows, long context, "think hard" tasks. |
| GPT-5 Mini (gpt-5-mini) | 60 (est.) | 0.25 | 2.00 | 2025-08-07 | High-throughput assistants, app backends, fast iteration. |
| GPT-5 Nano (gpt-5-nano) | 120 (est.) | 0.05 | 0.40 | 2025-08-07 | Ultra-low-cost classification, routing, summarization at scale. |
| o3-pro (o3-pro) | 60 (est.) | 20.00 | 80.00 | 2025-06-10 | Highest-reliability reasoning, deep tool use, eval-heavy tasks. |
| o3 (o3) | 60 (est.) | 2.00 | 8.00 | 2025-04-16 | Strong general reasoning at moderate cost; coding & analysis. |
| o3-deep-research (o3-deep-research) | 60 (est.) | 10.00 (est.) | 40.00 (est.) | 2025 | Automated long-form research, multi-step synthesis. |
| o4-mini (o4-mini) | 60 (est.) | 1.10 | 4.40 | 2025-04-16 | "Turbo" tier: fast mixed workloads with solid reasoning. |
| o1-pro (o1-pro) | 60 (est.) | 150.00 | 600.00 | 2025-03-19 | Premium "o1" reasoning; complex, correctness-critical work. |
| o1 (o1) | 60 (est.) | 15.00 | 60.00 | 2024-12-17 | Reasoning-first model for tough prompts, planning, tutoring. |
| o1-mini (o1-mini) | 120 (est.) | 3.00 (est.) | 12.00 (est.) | 2024-12 | Lightweight "o1" for scaled reasoning tasks. |
| GPT-4.1 (gpt-4.1) | 60 (est.) | Per-call | Per-call | 2025-04-14 | High-quality general 4.x; great for broad assistants. |
| GPT-4.1 Mini / Nano | 60 (est.) | Per-call | Per-call | 2025-04 | Lighter 4.1 variants for scale. |
| GPT-4o (gpt-4o) | 60 (est.) | Per-call | Per-call | 2024-05-13 | Multimodal (vision/voice); production assistants. |
| GPT-4o Mini (gpt-4o-mini) | 120 (est.) | 0.15 | 0.60 | 2024-07-18 | Ultra-low-cost text/vision; background jobs. |
| GPT-4o Realtime (voice) | 60 (est.) | Per-minute | Per-minute | 2024–2025 | Live voice agents; streaming ASR + TTS. |
| Image: gpt-image-1 | 60 (est.) | N/A (per-image) | N/A | 2024 | Image generation (per-image pricing by resolution). |
| Audio TTS: gpt-4o-mini-tts | 60 (est.) | N/A (per-minute) | N/A | 2024–2025 | Natural speech synthesis for apps/IVR/agents. |
| Audio STT: whisper-1 | 60 (est.) | N/A (per-minute) | N/A | Ongoing | High-quality speech-to-text transcription. |
| Embeddings: text-embedding-3-large | 300 (est.) | 0.13 | N/A | 2024-01 | Search, RAG, clustering with best accuracy. |
| Embeddings: text-embedding-3-small | 300 (est.) | 0.02 | N/A | 2024-01 | Cost-efficient embeddings at scale. |

Anthropic — Claude 4.x & 3.x (non-deprecated)

| Model | RPM | Input $/1M | Output $/1M | Release | Best used for |
| --- | --- | --- | --- | --- | --- |
| Claude Opus 4.1 (claude-opus-4-1) | 60 (est.) | 15.00 | 75.00 | 2025-08-05 | Flagship coding & complex agentic workflows; precise edits. |
| Claude Opus 4 (claude-opus-4) | 60 (est.) | 15.00 | 75.00 | 2025-05-22 | High-stakes reasoning; long tasks; migration to 4.1 recommended. |
| Claude Sonnet 4 (claude-sonnet-4) | 60 (est.) | 3.00 | 15.00 | 2025-05-22 | Enterprise default: speed, cost, and quality balance. |
| Claude Sonnet 3.7 | 60 (est.) | 3.00 | 15.00 | 2025-02 | Mature "workhorse" Sonnet; broad enterprise use. |
| Claude Sonnet 3.5 | 60 (est.) | 3.00 | 15.00 | 2024-06 | Cost-effective general tasks & summarization. |
| Claude Haiku 3.5 | 120 (est.) | 0.80 | 4.00 | 2024-10 | High-throughput Q&A, extraction, light analysis. |

Note: Older Claude 3 models (Opus/Sonnet/Haiku) may still be callable on some platforms. Include only if your vendor/region lists them as active.


Google — Gemini 2.5/2.0 & Imagen (non-deprecated)

Billing note: Google commonly prices text models by tokens and image/video models per output. Where token pricing is not published, we show (est.) and recommend confirming in your Google Cloud (Vertex AI / Gemini API) console.

| Model | RPM | Input $/1M | Output $/1M | Release | Best used for |
| --- | --- | --- | --- | --- | --- |
| Gemini 2.5 Pro | 60 (est.) | 1.25 (est.) | 10.00 (est.) | 2025-03 | Advanced reasoning/coding; 1M+ token context. |
| Gemini 2.5 Flash | 120 (est.) | 0.30 (est.) | 2.50 (est.) | 2025 | Fast, low-cost generation; assistants & tools. |
| Gemini 2.5 Flash-Lite | 180 (est.) | 0.10 (est.) | 0.40 (est.) | 2025 | Ultra-cheap, latency-sensitive tasks. |
| Gemini 2.0 Flash | 120 (est.) | 0.075–0.15 (est.) | 0.30–0.60 (est.) | 2024–2025 | "Flash" class: scale workloads with large prompts. |
| Gemini 1.5 Pro | 60 (est.) | 3.50 (est.) | 10.50 (est.) | 2024-02 | Large-context multimodal; RAG & analysis. |
| Gemini 1.5 Flash | 120 (est.) | 0.35 (est.) | 1.05 (est.) | 2024-02 | Throughput-oriented, cost-optimized tasks. |
| Imagen 3 (image gen) | 60 (est.) | N/A (per-image) | N/A | 2024 | High-quality image generation in Vertex AI/Gemini API. |
| Embeddings (Text) | 300 (est.) | 0.10 (est.) | N/A | 2023+ | Search, recommendations, clustering. |

xAI — Grok family & Image (non-deprecated)

| Model | RPM | Input $/1M | Output $/1M | Release | Best used for |
| --- | --- | --- | --- | --- | --- |
| Grok-4 | 480 (published) | 5.00 (est.) | 20.00 (est.) | 2025-07 | High-end reasoning/coding; real-time X data apps. |
| Grok-3 | 60 (est.) | 1.25 (est.) | 5.00 (est.) | 2025-02 | General text generation with solid accuracy. |
| Grok-3-mini | 120 (est.) | 0.20 (est.) | 0.60 (est.) | 2025-02 | Lightweight, high-volume tasks & automations. |
| Grok Image (latest) | 60 (est.) | N/A (per-image) | N/A | 2024–2025 | Image generation inside the Grok/xAI stack. |

Note: xAI’s public API pricing is evolving. Treat token costs as placeholders until confirmed by your account team. RPM for Grok-4 (480 RPM / 2M TPM) is commonly published; others vary by plan.


How to configure this in Spherium.ai

  1. Go to Settings → Organization → Model Integration Settings.

  2. Add each model with its canonical ID (e.g., gpt-5, o3-pro, claude-opus-4-1, gemini-2.5-pro, grok-4, gpt-image-1).

  3. Enter pricing using the tables above. For per-call or per-image SKUs, leave token fields at 0 and set per-call/per-image cost in the override field.

  4. RPM: If your contract doesn’t specify a rate, begin with 60 RPM (est.) and adjust after vendor confirmation.

  5. Verify usage and spend under Reports → Model Usage & Cost after rollout.
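The catalog produced by steps 1–3 can be kept alongside your infrastructure config as plain data. The schema below is an illustrative sketch, not Spherium's actual import format, and the $0.04 per-image figure is a placeholder:

```python
# Sketch of a model catalog as data; field names are hypothetical.
catalog = [
    {"id": "gpt-5",           "rpm": 60, "input_per_1m": 1.25,  "output_per_1m": 10.00},
    {"id": "claude-opus-4-1", "rpm": 60, "input_per_1m": 15.00, "output_per_1m": 75.00},
    # Per-image SKU: token fields stay 0; cost lives in an override field.
    {"id": "gpt-image-1",     "rpm": 60, "input_per_1m": 0.0,   "output_per_1m": 0.0,
     "override": {"unit": "per_image", "usd": 0.04}},  # placeholder rate
]

# Sanity checks before entering values into the UI:
assert all(m["rpm"] > 0 for m in catalog)
assert all("override" in m or m["input_per_1m"] > 0 for m in catalog)
```

Keeping this as reviewable data makes step 5's reconciliation against Reports → Model Usage & Cost much easier.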


Tips & Troubleshooting

  • 💡 Route smartly: Use Spherium’s Routing Rules to send low-stakes, high-volume tasks to smaller models (e.g., GPT-5 Nano, Gemini 2.5 Flash-Lite), reserving GPT-5 / Opus 4.1 / o3-pro for critical reasoning.

  • 💡 Per-call SKUs: For GPT-4.1 / 4o families billed per call, set token prices to 0 and define per-call cost so reports stay accurate.

  • ⚠️ Image & audio models: These often use per-image or per-minute billing. Mark token columns N/A and use Spherium’s override fields to capture the right unit price.

  • ⚙️ Batch & caching: If your provider supports batch pricing or prompt caching (e.g., Anthropic), model your effective $/1M in Spherium to reflect typical cache hit rates.
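The caching tip above reduces to a weighted average. A sketch of the arithmetic, using an assumed cached-read rate and hit rate (both illustrative; confirm your vendor's actual cache pricing):

```python
# Sketch: blending cached vs. uncached input rates into an effective $/1M.
# The $0.30 cached-read rate and 70% hit rate below are assumptions.

def effective_input_per_1m(base: float, cached: float, hit_rate: float) -> float:
    """Expected input cost per 1M tokens given a prompt-cache hit rate."""
    return base * (1.0 - hit_rate) + cached * hit_rate

# Claude Sonnet 4 at $3.00 base with a hypothetical $0.30 cached rate
# and a 70% cache hit rate:
print(round(effective_input_per_1m(3.00, 0.30, 0.70), 3))  # 1.11
```

Enter the blended figure as the model's input price in Spherium if your workloads have stable hit rates; otherwise keep the base rate and treat cache savings as headroom.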



Conclusion

Use this page to keep your model catalog clean and your costs predictable. If your vendor adds or retires a model, update the relevant table and Spherium’s Model Integration Settings so routing and reporting stay accurate. Questions or corrections? Email support@spherium.ai.


Copyright & Use

This content is proprietary to Spherium.ai and subject to our license agreement. Redistribution without permission is prohibited.

If you have any questions about this policy or need assistance, please contact us at support@spherium.ai.