pricing

we take a cut of what we save you.

fgy charges 15% of the provider inference cost it avoids on your behalf. if nothing is saved, nothing is charged. there is no platform fee, no minimum spend, and no charge for passing traffic through on a miss.

billing formula
avoided_cost = tokens_saved * provider_rate
fgy_charge   = avoided_cost * 0.15
# concrete example with gpt-4o-mini output tokens
tokens_saved  = 1000
provider_rate = 0.000060 # $0.060 per 1k output tokens
avoided_cost  = 0.060   # $0.060 saved
fgy_charge    = 0.009   # $0.009 charged
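the same arithmetic as a small helper (a sketch; the function name is ours, rates are per-token as in the example above):

```python
def fgy_charge(tokens_saved: int, provider_rate: float, cut: float = 0.15) -> float:
    """fgy's charge is 15% of the provider cost the cache avoided."""
    avoided_cost = tokens_saved * provider_rate
    return avoided_cost * cut

# gpt-4o-mini output tokens: $0.060 per 1k -> $0.000060 per token
print(round(fgy_charge(1000, 0.000060), 6))  # 0.009
```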

per-request cost

costs map directly to outcomes.

every request through fgy takes one of three paths. two of them save you money; the third passes through upstream at no fgy charge.

cache outcome           your provider cost      fgy charge
x-fgy-cache: exact      $0                      15% of saved cost
  prompt hash matched an ets entry. returned from memory, no upstream call.
x-fgy-cache: semantic   $0                      15% of saved cost
  prompt embedding matched a prior result above the 0.92 cosine threshold. no upstream call.
x-fgy-cache: miss       normal provider rate    $0
  no match. request went upstream. response stored for future hits.
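the x-fgy-cache header makes outcomes easy to track locally. a minimal sketch of tallying hits against misses (the header values are the ones documented above; the tallying helper is ours):

```python
from collections import Counter

# x-fgy-cache values that indicate an avoided upstream call
HIT_VALUES = {"exact", "semantic"}

def tally(outcomes):
    """Count hits vs misses from a stream of x-fgy-cache header values."""
    counts = Counter(outcomes)
    hits = sum(counts[v] for v in HIT_VALUES)
    misses = counts["miss"]
    return hits, misses

hits, misses = tally(["exact", "miss", "semantic", "exact"])
print(hits, misses)  # 3 1
```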

how fgy fits in

a cache layer, not a key vault.

fgy does not store or manage your provider api keys. you send your own bearer token on every request and fgy passes it through if the cache misses. the only credential fgy issues is a cache key that identifies your tenant.

what you send
POST https://api.fgy.ai/v1/chat/completions
Authorization: Bearer fgy_tenant_key
X-Provider-Auth: Bearer sk-your-own-key
Content-Type: application/json
{"model": "gpt-4o-mini", "messages": [...]}

your provider key travels with the request. on a miss, fgy forwards it upstream and returns the response. on a hit, the provider key is never used.
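the same request assembled in python (a sketch; the header names and endpoint are the ones shown above, the key values are placeholders):

```python
FGY_URL = "https://api.fgy.ai/v1/chat/completions"

def fgy_headers(tenant_key: str, provider_key: str) -> dict:
    # tenant key authenticates you to fgy; provider key is only
    # forwarded upstream when the cache misses
    return {
        "Authorization": f"Bearer {tenant_key}",
        "X-Provider-Auth": f"Bearer {provider_key}",
        "Content-Type": "application/json",
    }

# POST FGY_URL with any http client, e.g.:
#   headers=fgy_headers("fgy_tenant_key", "sk-your-own-key")
#   json={"model": "gpt-4o-mini", "messages": [...]}
```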

what fgy manages
your fgy tenant key, which identifies your billing and cache namespace
the ets and pgvector cache stores scoped to your tenant and model
hit counts and token savings for dashboard and billing
not your upstream provider key, which is never stored

deployment modes

choose how much you depend on the cache.

because fgy does not hold your provider keys, you can decide whether the cache is a hard dependency or an optional layer. these are the two common integration postures.

mode a

cache as a guaranteed layer

all traffic routes through fgy. fgy handles cache hits and forwards misses upstream using your bearer token. your application treats fgy as the provider. if fgy is unavailable, requests fail.

suitable when you want the simplest possible integration and fgy's uptime sla is acceptable as a dependency. maximises cache coverage.

from openai import OpenAI

client = OpenAI(
  base_url="https://api.fgy.ai/v1",
  api_key="fgy_...",  # fgy tenant key
  default_headers={"X-Provider-Auth": "Bearer sk-..."}
)
# fgy routes misses to provider automatically
# your sk-... travels in the x-provider-auth header
higher cache hit potential. simpler code. fgy on the critical path.
mode b

cache as a soft layer

your application tries fgy first. if fgy is unreachable or times out, it falls back to calling the provider directly. fgy is never on your critical path. the tradeoff is that fallback requests are not cached.

suitable when your application cannot tolerate any dependency on a third party and you want to add caching opportunistically.

from openai import APIConnectionError, APITimeoutError

try:
  r = fgy_client.chat.completions.create(
    ..., timeout=3.0
  )
except (APIConnectionError, APITimeoutError):
  # fgy unreachable, go direct
  r = direct_client.chat.completions.create(...)
higher uptime ceiling. fgy off critical path. fallbacks not cached.

plans

one model, two scales.

pay as you save
15% of avoided provider cost

no monthly fee. no minimum. no charge on misses. billed only when the cache successfully avoids a provider call on your behalf.

unlimited tenant api keys
exact and semantic caching
request coalescing included
dashboard with per-key savings breakdown
response headers for local observability
enterprise
custom

for high-throughput deployments that need dedicated infrastructure, flat-rate billing, or custom cache policy controls.

dedicated fly.io cluster
configurable similarity thresholds
flat-rate billing options
sla and support agreement
tenant-level policy controls