Rate limits
Cosavu applies per-key rate limits to protect platform stability. Your current limit and remaining quota are returned as headers on every successful and rate-limited response, so you can adjust your client behaviour in real time.
Limits by plan
Default limits per API key. Concurrency limits apply per endpoint, not globally:
| Plan | Requests / min | Concurrent | Tokens / day |
|---|---|---|---|
| Test | 60 | 5 | 1M |
| Free | 300 | 10 | 10M |
| Pro | 3,000 | 50 | 1B |
| Scale | 30,000 | 500 | Unlimited |
| Enterprise | Custom | Custom | Custom |
Need higher limits? Get in touch — Enterprise contracts can raise any limit.
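Because concurrency is counted per endpoint, a client can cap in-flight requests separately for each endpoint. The following is a minimal sketch of that idea, not part of the Cosavu SDK: `EndpointLimiter`, `run`, and `inFlight` are hypothetical names, and the cap you pass in should match the Concurrent column for your plan.

```typescript
// Hypothetical helper: caps in-flight requests separately for each endpoint.
class EndpointLimiter {
  private readonly maxConcurrent: number
  private readonly active = new Map<string, number>()
  private readonly waiting = new Map<string, Array<() => void>>()

  constructor(maxConcurrent: number) {
    this.maxConcurrent = maxConcurrent
  }

  // Number of tasks currently holding a slot for this endpoint.
  inFlight(endpoint: string): number {
    return this.active.get(endpoint) ?? 0
  }

  // Runs the task once a slot for the endpoint is free, and releases the slot afterwards.
  async run<T>(endpoint: string, task: () => Promise<T>): Promise<T> {
    await this.acquire(endpoint)
    try {
      return await task()
    } finally {
      this.release(endpoint)
    }
  }

  private acquire(endpoint: string): Promise<void> {
    const current = this.inFlight(endpoint)
    if (current < this.maxConcurrent) {
      this.active.set(endpoint, current + 1)
      return Promise.resolve()
    }
    // No free slot: queue until a finishing task hands its slot over.
    return new Promise<void>(resolve => {
      const queue = this.waiting.get(endpoint) ?? []
      queue.push(resolve)
      this.waiting.set(endpoint, queue)
    })
  }

  private release(endpoint: string): void {
    const next = this.waiting.get(endpoint)?.shift()
    if (next) {
      next() // the slot passes straight to the next waiter, so the count is unchanged
      return
    }
    this.active.set(endpoint, this.inFlight(endpoint) - 1)
  }
}
```

On the Free plan, for example, you might create `new EndpointLimiter(10)` and wrap each call as `limiter.run(path, () => fetch(url))`, where `path` identifies the endpoint being called.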
Rate-limit headers
Every successful and rate-limited (429) response includes the X-RateLimit-* headers below. Retry-After is added on 429 responses only:

    X-RateLimit-Limit: 3000
    X-RateLimit-Remaining: 2487
    X-RateLimit-Reset: 1714843200
    X-RateLimit-Window: 60
    Retry-After: 12

| Header | Description |
|---|---|
| X-RateLimit-Limit | Total requests allowed in the current window. |
| X-RateLimit-Remaining | Requests remaining before you hit the limit. |
| X-RateLimit-Reset | Unix timestamp when the window resets. |
| X-RateLimit-Window | Length of the rate-limit window in seconds. |
| Retry-After | Sent on 429 responses. Seconds to wait before retrying. |
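If you call the API directly, it can help to read these headers into one typed value. The sketch below assumes the header names and units from the table above. `HeaderSource`, `RateLimitInfo`, and `parseRateLimit` are hypothetical names chosen for illustration and are not part of the SDK.

```typescript
// Minimal shape satisfied by the Fetch API's Headers object and by simple test doubles.
interface HeaderSource {
  get(name: string): string | null
}

// Hypothetical container for the values described in the table above.
interface RateLimitInfo {
  limit: number
  remaining: number
  resetAt: Date
  windowSeconds: number
  retryAfterSeconds: number | null // null when Retry-After is absent (non-429 responses)
}

// Returns the header as a finite number, or null if it is missing or not numeric.
function readNumber(headers: HeaderSource, name: string): number | null {
  const raw = headers.get(name)
  if (raw === null || raw.trim() === "") return null
  const value = Number(raw)
  return Number.isFinite(value) ? value : null
}

// Returns null if any of the four X-RateLimit-* headers is missing or malformed.
function parseRateLimit(headers: HeaderSource): RateLimitInfo | null {
  const limit = readNumber(headers, "X-RateLimit-Limit")
  const remaining = readNumber(headers, "X-RateLimit-Remaining")
  const reset = readNumber(headers, "X-RateLimit-Reset")
  const windowSeconds = readNumber(headers, "X-RateLimit-Window")
  if (limit === null || remaining === null || reset === null || windowSeconds === null) {
    return null
  }
  return {
    limit,
    remaining,
    resetAt: new Date(reset * 1000), // the header is a Unix timestamp in seconds
    windowSeconds,
    retryAfterSeconds: readNumber(headers, "Retry-After"),
  }
}
```

With a fetch response, this would be called as `parseRateLimit(res.headers)`.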
Handling 429 responses
The SDK automatically respects Retry-After and retries with exponential backoff. If you're calling the API directly, follow the same pattern:
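
    // Retries on 429, waiting for Retry-After (plus jitter) between attempts.
    // Falls back to exponential backoff when the header is missing or not numeric.
    async function callWithBackoff(fn: () => Promise<Response>, max = 5): Promise<Response> {
      for (let attempt = 0; attempt < max; attempt++) {
        const res = await fn()
        if (res.status !== 429) return res

        // Number(null) is 0, so a missing header also takes the fallback path.
        const header = Number(res.headers.get("Retry-After"))
        const waitSeconds = Number.isFinite(header) && header > 0 ? header : 2 ** attempt
        // Up to 30% extra delay so that many clients do not retry at the same instant.
        const jitter = Math.random() * 0.3 * waitSeconds
        await new Promise(r => setTimeout(r, (waitSeconds + jitter) * 1000))
      }
      throw new Error("Exceeded max retries")
    }
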
Best practice
Watch X-RateLimit-Remaining on successful responses. Start slowing down once it falls to about 20% of X-RateLimit-Limit instead of waiting for 429s.
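One way to slow down preemptively is to spread the remaining requests evenly over the time left in the window. This is a sketch under that assumption, not a documented algorithm. `preemptiveDelayMs` and its parameters are hypothetical, and the 0.2 default mirrors the 20% guideline above.

```typescript
// Hypothetical helper: how long to pause before the next request, in milliseconds.
// Returns 0 while more than `threshold` of the window's quota is still available.
function preemptiveDelayMs(
  limit: number,             // X-RateLimit-Limit
  remaining: number,         // X-RateLimit-Remaining
  resetAtUnixSeconds: number, // X-RateLimit-Reset
  nowMs: number = Date.now(),
  threshold: number = 0.2,
): number {
  if (limit <= 0) return 0
  if (remaining > limit * threshold) return 0

  const msUntilReset = Math.max(0, resetAtUnixSeconds * 1000 - nowMs)
  // Spread what is left evenly; with nothing left, wait for the full reset.
  return msUntilReset / Math.max(1, remaining)
}
```

For instance, with a limit of 3,000, 500 requests remaining, and 30 seconds until reset, the function suggests pausing 60 ms between requests.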
Burst windows
Rate limits use a sliding window with burst tolerance. You can briefly exceed the per-minute rate as long as your trailing average stays under the limit. Typical burst headroom is 2× the steady-state limit for up to 10 seconds.
This means short spikes won't hit 429s, but sustained traffic above the limit will. Plan for the steady-state number, not the burst ceiling.
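To see why the steady-state number is the one to plan for, it can help to work through the arithmetic. The sketch below uses a deliberately simplified model: it treats the window as a fixed 60 seconds and takes the typical 2× headroom and 10-second duration at face value. The server's actual sliding-window accounting may differ, and `burstBudget` is a hypothetical helper.

```typescript
interface BurstBudget {
  steadyPerSecond: number     // per-minute limit expressed per second
  burstPerSecond: number      // peak rate allowed during a burst
  maxBurstRequests: number    // requests a full-length burst can carry
  perSecondAfterBurst: number // rate that keeps the window total under the limit
}

// Simplified model: a burst spends quota early, so the rest of the window must run slower.
function burstBudget(
  limitPerWindow: number,
  windowSeconds: number = 60,
  burstFactor: number = 2,
  burstSeconds: number = 10,
): BurstBudget {
  const steadyPerSecond = limitPerWindow / windowSeconds
  const burstPerSecond = steadyPerSecond * burstFactor
  const maxBurstRequests = burstPerSecond * burstSeconds
  const perSecondAfterBurst =
    Math.max(0, limitPerWindow - maxBurstRequests) / (windowSeconds - burstSeconds)
  return { steadyPerSecond, burstPerSecond, maxBurstRequests, perSecondAfterBurst }
}

const pro = burstBudget(3000)
console.log(pro)
```

Under this model, the Pro plan's 3,000 requests per minute is 50 requests per second steady state. A full 10-second burst at 100 requests per second uses 1,000 requests, leaving 2,000 for the other 50 seconds, or 40 requests per second.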