API v1.2

Documentation

The Cosavu API provides programmatic access to our state-of-the-art prompt optimization engine. Build cost-efficient LLM agents by reducing noise and enforcing structural integrity at the source.

Authentication

The Cosavu API uses Bearer Authentication. All requests must include an API key in the request header. You can manage your API keys in the developer dashboard.

Example Header
Authorization: Bearer cosavu_live_key****************

Base URL

All API requests should be made to the following base endpoint.

https://api.cosavu.com/v1

Rate Limits

Rate limits vary by plan tier. Limits are applied per API key on a per-minute basis.

Plan         Limit (RPM)   Window
Free         10 req        60 seconds
Pro          100 req       60 seconds
Enterprise   Custom        Continuous
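
To stay under the per-minute limits in the table above, a client can throttle itself before sending. The following is a minimal sketch of a sliding-window limiter, assuming the 60-second window and request counts listed for each plan. The class name MinuteWindowLimiter and its methods are hypothetical helpers and are not part of the Cosavu API; the server may count requests differently.

```python
import time
from collections import deque


class MinuteWindowLimiter:
    """Client-side sliding-window throttle (hypothetical helper, not part of the API)."""

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._sent = deque()  # timestamps of requests inside the current window

    def seconds_until_allowed(self):
        """Return 0.0 if a request may be sent now, else the wait time in seconds."""
        now = self.clock()
        # Drop timestamps that have fallen out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def record(self):
        """Call once for each request actually sent."""
        self._sent.append(self.clock())


# Free tier from the table above: 10 requests per 60 seconds.
limiter = MinuteWindowLimiter(max_requests=10)
```

Before each call, sleep for limiter.seconds_until_allowed() seconds, send the request, then call limiter.record(). A 429 response should still be handled, since this only approximates the server's accounting.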

POST /optimize

Submits raw prompt text to the optimization engine. The engine decomposes the input into structural blocks, refines instructions, and strips redundant tokens.

Request Body

prompt         string (required)
target_model   string (optional)

cURL Example

curl -X POST https://api.cosavu.com/v1/optimize \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Write a summary of the quarterly report focus on finance.",
    "target_model": "gpt-4"
  }'
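
The same request can be sketched in Python using only the standard library. This mirrors the cURL example: the URL, headers, and body fields come from it, while the function names build_optimize_request and optimize and the 10-second timeout are assumptions made for illustration.

```python
import json
import urllib.request

BASE_URL = "https://api.cosavu.com/v1"  # scheme taken from the cURL example


def build_optimize_request(api_key, prompt, target_model=None):
    """Build (but do not send) a POST /optimize request."""
    body = {"prompt": prompt}
    if target_model is not None:
        body["target_model"] = target_model  # optional field
    return urllib.request.Request(
        url=f"{BASE_URL}/optimize",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def optimize(api_key, prompt, target_model=None, timeout=10.0):
    """Send the request and return the decoded Prompt IR as a dict."""
    request = build_optimize_request(api_key, prompt, target_model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without touching the network.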

Response: Prompt IR Object

The engine returns a Prompt Intermediate Representation (IR) consisting of identified blocks and token metadata.

{
  "original_text": "...",
  "blocks": [
    {
      "block_type": "INSTRUCTION",
      "content": "Summarize Q3 financial report...",
      "original_tokens": 42,
      "optimized_tokens": 12,
      "is_compressed": true
    }
  ],
  "total_original_tokens": 42,
  "total_optimized_tokens": 12,
  "latency_ms": 284.5
}
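
As an illustration of how the token metadata might be consumed, the sketch below computes the savings reported in a Prompt IR. It relies only on the field names shown in the sample response; the helper name summarize_savings and the shape of its return value are assumptions.

```python
import json


def summarize_savings(ir):
    """Return overall and per-block token savings from a Prompt IR dict."""
    original = ir["total_original_tokens"]
    optimized = ir["total_optimized_tokens"]
    saved = original - optimized
    return {
        "tokens_saved": saved,
        # Guard against division by zero for an empty prompt.
        "reduction_pct": round(100.0 * saved / original, 1) if original else 0.0,
        "per_block": [
            (block["block_type"], block["original_tokens"] - block["optimized_tokens"])
            for block in ir["blocks"]
        ],
    }


# Same values as the sample response above.
sample = json.loads("""
{
  "original_text": "...",
  "blocks": [
    {
      "block_type": "INSTRUCTION",
      "content": "Summarize Q3 financial report...",
      "original_tokens": 42,
      "optimized_tokens": 12,
      "is_compressed": true
    }
  ],
  "total_original_tokens": 42,
  "total_optimized_tokens": 12,
  "latency_ms": 284.5
}
""")

print(summarize_savings(sample))
# {'tokens_saved': 30, 'reduction_pct': 71.4, 'per_block': [('INSTRUCTION', 30)]}
```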

Block Types

IDENTITY      System persona definitions
DATA          Background data/knowledge
INSTRUCTION   Specific action requests
CONSTRAINT    Formatting or safety rules
EXAMPLE       Few-shot demonstrations
OUTPUT        Format specifications

Error Handling

400   Bad Request         Often due to empty prompt or invalid JSON.
401   Unauthorized        Invalid or missing API key.
429   Too Many Requests   Rate limit exceeded for your tier.
500   Internal Error      Optimization cluster timeout or malfunction.
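
One way a client might act on the status codes listed above is to retry only the transient ones. The sketch below treats 429 and 500 as retryable and re-raises 400 and 401 immediately, since those need a corrected request or API key. The retry policy (four attempts, exponential backoff starting at one second) and the helper name call_with_retries are assumptions, not documented Cosavu behavior.

```python
import time
import urllib.error

RETRYABLE = {429, 500}  # rate limit exceeded, or cluster timeout/malfunction


def call_with_retries(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry retryable HTTP errors with exponential backoff.

    `send` is any zero-argument callable that raises urllib.error.HTTPError
    on an error response, for example a wrapper around urllib.request.urlopen.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as err:
            last_attempt = attempt == max_attempts - 1
            # 400 and 401 are not retryable: re-raise so the caller can fix the input.
            if err.code not in RETRYABLE or last_attempt:
                raise
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
```

The sleep parameter is injectable so the backoff schedule can be tested without real delays.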

GET /health

Returns the current operational status of the optimization clusters.

{
  "status": "ok",
  "engine": "Cosavu-Cluster-Alpha-7",
  "version": "1.2.0"
}
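
A client could poll this endpoint before submitting a batch of prompts. The sketch below is a minimal example under two assumptions: that /health expects the same Bearer header as other requests, and that any status value other than "ok" should be treated as unhealthy (other values are not documented here). The function names are hypothetical.

```python
import json
import urllib.request

HEALTH_URL = "https://api.cosavu.com/v1/health"


def is_healthy(payload):
    """Treat only status == "ok" as healthy; other values are undocumented."""
    return payload.get("status") == "ok"


def fetch_health(api_key, timeout=5.0):
    """GET /health and return the decoded JSON payload."""
    request = urllib.request.Request(
        HEALTH_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```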

Best Practices

Explicit Structure Separation

The optimizer works best when background data is clearly distinct from instructions. Use clear headers like DATA: or INFO: in your prompts.
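
As a sketch of this separation, the helper below assembles a prompt with background data and instructions under their own headers. Only the DATA: header is taken from the recommendation above; using INSTRUCTION: as a header mirrors the block type name and is an assumption, as is the helper name compose_prompt.

```python
def compose_prompt(instruction, data=None):
    """Join background data and instructions under separate, explicit headers."""
    sections = []
    if data:
        sections.append("DATA:\n" + data.strip())
    # Header name borrowed from the INSTRUCTION block type (an assumption).
    sections.append("INSTRUCTION:\n" + instruction.strip())
    return "\n\n".join(sections)


prompt = compose_prompt(
    instruction="Summarize the report, focusing on finance.",
    data="Q3 revenue rose while operating costs stayed flat.",
)
print(prompt)
```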

Target Specific Models

Setting the target_model parameter allows the engine to strip tokens known to be redundant for that specific architecture (e.g., removing excessive formatting for GPT-4).

Latency Management

Optimization takes 200-500 ms on average. For real-time chat applications, we recommend optimizing system prompts asynchronously or during agent initialization rather than on every user turn.
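
The pattern above might look like the following sketch, which starts optimization once in a background thread when the agent is created and falls back to the raw system prompt until the result is ready. The class PromptCachingAgent and the optimize_fn callable (anything that takes raw prompt text and returns optimized text) are hypothetical and not part of the Cosavu API.

```python
from concurrent.futures import ThreadPoolExecutor


class PromptCachingAgent:
    """Optimizes the system prompt once at initialization, off the request path."""

    def __init__(self, system_prompt, optimize_fn):
        self._raw = system_prompt
        self._executor = ThreadPoolExecutor(max_workers=1)
        # Kick off optimization in the background; user turns never wait on it.
        self._future = self._executor.submit(optimize_fn, system_prompt)

    def system_prompt(self):
        """Return the optimized prompt if ready, otherwise the raw prompt."""
        if self._future.done() and self._future.exception() is None:
            return self._future.result()
        return self._raw

    def close(self):
        """Release the background worker thread."""
        self._executor.shutdown(wait=False)
```

Each user turn calls system_prompt(), which never blocks, so the optimization latency is paid once per agent rather than once per message. If optimization fails, the agent keeps using the raw prompt.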

Integrate Enterprise Optimization

Enterprise clusters support custom model tiers and a maximum input capacity of 128k words.