pricing

we take a cut of what we save you.

fgy charges 15% of the provider inference cost it avoids on your behalf. if nothing is saved, nothing is charged. there is no platform fee, no minimum spend, and no charge for passing traffic through on a miss.

billing formula
avoided_cost = tokens_saved * provider_rate
fgy_charge   = avoided_cost * 0.15
# concrete example with gpt-4o-mini output tokens
tokens_saved  = 1000
provider_rate = 0.000060 # $0.060 per 1k output tokens
avoided_cost  = 0.060   # $0.060 saved
fgy_charge    = 0.009   # $0.009 charged
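the same arithmetic as a small helper (a sketch; the function name is ours, rates are per-token as in the example above):

```python
def fgy_charge(tokens_saved: int, provider_rate: float, cut: float = 0.15) -> float:
    """fgy's charge is 15% of the provider cost the cache avoided."""
    avoided_cost = tokens_saved * provider_rate
    return avoided_cost * cut

# gpt-4o-mini output tokens: $0.060 per 1k -> $0.000060 per token
print(round(fgy_charge(1000, 0.000060), 6))  # 0.009
```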

per-request cost

costs map directly to outcomes.

every request through fgy takes one of three paths. two of them save you money; the third passes through upstream at no fgy charge.

cache outcome           your provider cost      fgy charge
x-fgy-cache: exact      $0                      15% of saved cost
  prompt hash matched an ets entry. returned from memory, no upstream call.
x-fgy-cache: semantic   $0                      15% of saved cost
  prompt embedding matched a prior result above the 0.92 cosine threshold. no upstream call.
x-fgy-cache: miss       normal provider rate    $0
  no match. request went upstream. response stored for future hits.
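the x-fgy-cache header makes outcomes easy to track locally. a minimal sketch of tallying hits against misses (the header values are the ones documented above; the tallying helper is ours):

```python
from collections import Counter

# x-fgy-cache values that indicate an avoided upstream call
HIT_VALUES = {"exact", "semantic"}

def tally(outcomes):
    """Count hits vs misses from a stream of x-fgy-cache header values."""
    counts = Counter(outcomes)
    hits = sum(counts[v] for v in HIT_VALUES)
    misses = counts["miss"]
    return hits, misses

hits, misses = tally(["exact", "miss", "semantic", "exact"])
print(hits, misses)  # 3 1
```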

how fgy fits in

a cache layer, not a key vault.

fgy does not store or manage your provider api keys. you send your own bearer token on every request and fgy passes it through if the cache misses. the only credential fgy issues is a cache key that identifies your tenant.

what you send
POST https://api.fgy.ai/v1/chat/completions
Authorization: Bearer fgy_tenant_key
X-Provider-Auth: Bearer sk-your-own-key
Content-Type: application/json
{"model": "gpt-4o-mini", "messages": [...]}

your provider key travels with the request. on a miss, fgy forwards it upstream and returns the response. on a hit, the provider key is never used.
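the same request assembled in python (a sketch; the header names and endpoint are the ones shown above, the key values are placeholders):

```python
FGY_URL = "https://api.fgy.ai/v1/chat/completions"

def fgy_headers(tenant_key: str, provider_key: str) -> dict:
    # tenant key authenticates you to fgy; provider key is only
    # forwarded upstream when the cache misses
    return {
        "Authorization": f"Bearer {tenant_key}",
        "X-Provider-Auth": f"Bearer {provider_key}",
        "Content-Type": "application/json",
    }

# POST FGY_URL with any http client, e.g.:
#   headers=fgy_headers("fgy_tenant_key", "sk-your-own-key")
#   json={"model": "gpt-4o-mini", "messages": [...]}
```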

what fgy manages
your fgy tenant key, which identifies your billing and cache namespace
the ets and pgvector cache stores scoped to your tenant and model
hit counts and token savings for dashboard and billing
not your upstream provider key, which is never stored

deployment modes

choose how much you depend on the cache.

because fgy does not hold your provider keys, you can decide whether the cache is a hard dependency or an optional layer. these are the two common integration postures.

mode a

cache as a guaranteed layer

all traffic routes through fgy. fgy handles cache hits and forwards misses upstream using your bearer token. your application treats fgy as the provider. if fgy is unavailable, requests fail.

suitable when you want the simplest possible integration and fgy's uptime sla is acceptable as a dependency. maximises cache coverage.

from openai import OpenAI

client = OpenAI(
  base_url="https://api.fgy.ai/v1",
  api_key="fgy_...",  # fgy tenant key
  default_headers={"X-Provider-Auth": "Bearer sk-..."}
)
# fgy routes misses to provider automatically
# your sk-... travels in the x-provider-auth header
higher cache hit potential. simpler code. fgy on the critical path.
mode b

cache as a soft layer

your application tries fgy first. if fgy is unreachable or times out, it falls back to calling the provider directly. fgy is never on your critical path. the tradeoff is that fallback requests are not cached.

suitable when your application cannot tolerate any dependency on a third party and you want to add caching opportunistically.

from openai import APIConnectionError, APITimeoutError

try:
  r = fgy_client.chat.completions.create(
    ..., timeout=3.0
  )
except (APIConnectionError, APITimeoutError):
  # fgy unreachable, go direct
  r = direct_client.chat.completions.create(...)
higher uptime ceiling. fgy off critical path. fallbacks not cached.

plans

one model, two scales.

pay as you save
15% of avoided provider cost

no monthly fee. no minimum. no charge on misses. billed only when the cache successfully avoids a provider call on your behalf.

unlimited tenant api keys
exact and semantic caching
request coalescing included
dashboard with per-key savings breakdown
response headers for local observability
enterprise
custom

for high-throughput deployments that need dedicated infrastructure, flat-rate billing, or custom cache policy controls.

dedicated fly.io cluster
configurable similarity thresholds
flat-rate billing options
sla and support agreement
tenant-level policy controls