Documentation
The Cosavu API provides programmatic access to our state-of-the-art prompt optimization engine. Build cost-efficient LLM agents by reducing noise and enforcing structural integrity at the source.
Authentication
The Cosavu API uses Bearer Authentication. All requests must include an API key in the request header. You can manage your API keys in the developer dashboard.
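In a client, the header can be assembled like this (a minimal Python sketch; the key value is a placeholder, not a real credential):

```python
def auth_headers(api_key: str) -> dict:
    """Build the headers Cosavu expects: Bearer auth plus JSON content type."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

headers = auth_headers("cosavu_live_key_PLACEHOLDER")
```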
Authorization: Bearer cosavu_live_key****************

Base URL
All API requests should be made to the following base endpoint:

https://api.cosavu.com/v1
Rate Limits
Rate limits vary by plan tier. Limits are applied per API key on a per-minute basis.
| Plan | Limit (RPM) | Window |
|---|---|---|
| Free | 10 | 60 seconds |
| Pro | 100 | 60 seconds |
| Enterprise | Custom | Continuous |
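Clients should pace requests to their plan's RPM and back off when throttled. The sketch below assumes throttling surfaces as HTTP 429 (this page does not specify the error status, so treat that as an assumption) and shows a generic exponential backoff:

```python
import time

def call_with_backoff(send, max_retries=4, base_delay=1.0):
    """Retry `send()` with exponential backoff when it reports throttling.

    `send` is any zero-argument callable returning an object with a
    `status_code` attribute (e.g. a requests.Response). Assumes the API
    signals rate limiting with HTTP 429, which this page does not confirm.
    """
    for attempt in range(max_retries):
        resp = send()
        if resp.status_code != 429:
            return resp
        time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return send()  # final attempt, surfaced to the caller
```

On the Free tier (10 RPM), spacing requests at least six seconds apart avoids the limiter entirely.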
POST /optimize
Submits raw prompt text to the optimization engine. The engine decomposes the input into structural blocks, refines instructions, and strips redundant tokens.
Request Body

The request body is a JSON object containing the prompt text to optimize and a target_model identifier, as shown in the example below.
cURL Example
curl -X POST https://api.cosavu.com/v1/optimize \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Write a summary of the quarterly report focus on finance.",
"target_model": "gpt-4"
}'

Response: Prompt IR Object
The engine returns a Prompt Intermediate Representation (IR) consisting of identified blocks and token metadata.
{
"original_text": "...",
"blocks": [
{
"block_type": "INSTRUCTION",
"content": "Summarize Q3 financial report...",
"original_tokens": 42,
"optimized_tokens": 12,
"is_compressed": true
}
],
"total_original_tokens": 42,
"total_optimized_tokens": 12,
"latency_ms": 284.5
}

Block Types

Each block in the IR carries a block_type label; the example above shows an INSTRUCTION block.
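Client code can aggregate the IR by block type, for example to report where compression happened. A minimal sketch, assuming only the JSON shape shown above:

```python
from collections import defaultdict

def savings_by_type(ir: dict) -> dict:
    """Sum token savings (original - optimized) per block_type in a Prompt IR."""
    savings = defaultdict(int)
    for block in ir["blocks"]:
        savings[block["block_type"]] += (
            block["original_tokens"] - block["optimized_tokens"]
        )
    return dict(savings)

# The IR example from above, abbreviated:
ir = {
    "blocks": [
        {"block_type": "INSTRUCTION",
         "content": "Summarize Q3 financial report...",
         "original_tokens": 42, "optimized_tokens": 12,
         "is_compressed": True}
    ],
    "total_original_tokens": 42,
    "total_optimized_tokens": 12,
}
```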
Error Handling
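This page does not document Cosavu's error payloads, so treat the following as a defensive client pattern rather than the API's contract: check the HTTP status, and fall back to the raw body if the error response is not JSON (the "error" key below is hypothetical):

```python
import json

class CosavuError(Exception):
    """Raised for non-2xx responses; the error body format is an assumption."""

def parse_response(status_code: int, body: str) -> dict:
    """Return decoded JSON for 2xx responses, raise CosavuError otherwise."""
    if 200 <= status_code < 300:
        return json.loads(body)
    try:
        detail = json.loads(body).get("error", body)  # "error" key is hypothetical
    except (json.JSONDecodeError, AttributeError):
        detail = body  # body was not a JSON object; surface it raw
    raise CosavuError(f"HTTP {status_code}: {detail}")
```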
/health
Returns the current operational status of the optimization clusters.
{
"status": "ok",
"engine": "Cosavu-Cluster-Alpha-7",
"version": "1.2.0"
}

Best Practices
Explicit Structure Separation
The optimizer works best when background data is clearly distinct from instructions. Use clear headers like DATA: or INFO: in your prompts.
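For example, a prompt can be assembled with an explicit DATA: section so the optimizer can tell background context from instruction (the helper below is illustrative, not part of the API):

```python
def build_prompt(instruction: str, data: str) -> str:
    """Join an instruction and background data with an explicit DATA: header."""
    return f"{instruction}\n\nDATA:\n{data}"

prompt = build_prompt(
    "Summarize the quarterly report, focusing on finance.",
    "Q3 revenue grew 12% quarter-over-quarter...",
)
```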
Target Specific Models
Setting the target_model parameter allows the engine to strip tokens known to be redundant for that specific architecture (e.g., removing excessive formatting for GPT-4).
Latency Management
Optimization takes 200-500ms on average. For real-time chat applications, we recommend optimizing system prompts asynchronously or during agent initialization, rather than on every user turn.
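One way to follow this advice is to optimize the system prompt once when the agent starts and cache the result, so per-turn latency is unaffected. A sketch, where the `optimize` callable stands in for a real call to /optimize and is hypothetical:

```python
class Agent:
    """Caches an optimized system prompt at construction time.

    `optimize` is any callable mapping prompt text to optimized text,
    e.g. a wrapper around POST /optimize; no network call is made here.
    """
    def __init__(self, system_prompt: str, optimize):
        # Pay the 200-500 ms optimization cost once, at initialization.
        self.system_prompt = optimize(system_prompt)

    def turn(self, user_message: str) -> list:
        # The per-turn path reuses the cached prompt; no optimizer round-trip.
        return [
            {"role": "system", "content": self.system_prompt},
            {"role": "user", "content": user_message},
        ]
```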