#openai #rate-limit #buffers #429

“I'm batch-calling the OpenAI API and keep getting 429 rate-limit errors”

Rate-limit OpenAI calls with a buffer (stop hitting 429s)

Create a buffer pointed at the OpenAI endpoint with a requests-per-second cap, then push your prompts in. Fliq releases them at your rate, in order, retrying 429s for free.

You loop over a few thousand rows and fire an OpenAI completion for each. Halfway through you start eating 429 Too Many Requests, and your Promise.all turns into a mess of ad-hoc sleeps and retries. A Fliq buffer is a token bucket in front of one endpoint: set a per-second limit once, push every request in, and Fliq drains them at no more than that rate, in submission order, one at a time.
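If the token-bucket idea is new, here is a minimal sketch of one in plain JavaScript. It is an illustration of the general technique, not Fliq's implementation: the class name TokenBucket, the helper countReleased, the injected clock, and the capacity of 1 (chosen so there are no bursts) are all assumptions made for this example.

```javascript
// Illustrative token bucket. NOT Fliq's implementation.
// `now` is injected so the example can run against a fake clock.
class TokenBucket {
  constructor(ratePerSec, now = () => Date.now()) {
    this.ratePerSec = ratePerSec;
    this.capacity = 1; // assumption: capacity 1, so idle time never builds up a burst
    this.tokens = 1;   // start full: the first request may go out immediately
    this.now = now;
    this.last = now(); // time of the last successful take
  }

  // Tokens available right now: what was left after the last take,
  // plus what has accrued since, capped at capacity.
  available() {
    const elapsedMs = this.now() - this.last;
    const accrued = (elapsedMs * this.ratePerSec) / 1000;
    return Math.min(this.capacity, this.tokens + accrued);
  }

  // Spend one token if possible. Returns true when a request may be sent.
  tryTake() {
    const tokens = this.available();
    if (tokens < 1) return false;
    this.tokens = tokens - 1;
    this.last = this.now();
    return true;
  }
}

// Simulate `durationMs` of wall time in 1 ms steps and count how many
// requests the bucket lets through when a request is always waiting.
function countReleased(ratePerSec, durationMs) {
  let clock = 0;
  const bucket = new TokenBucket(ratePerSec, () => clock);
  let released = 0;
  for (clock = 0; clock < durationMs; clock += 1) {
    if (bucket.tryTake()) released += 1;
  }
  return released;
}
```

With a rate of 8, countReleased(8, 1000) lets through 8 requests in the first second, one every 125 ms. The same arithmetic gives a lower bound on drain time: 3,000 rows at 8 requests/sec need at least 3000 / 8 = 375 seconds, and if delivery really is one at a time, slow responses may stretch that further.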

The request

Create the buffer with the OpenAI URL, your auth header, and a rate_limit. Then push one item per prompt.

# 1. Create the buffer (do this once)
curl -X POST https://api.fliq.sh/buffers \
  -H "Authorization: Bearer fliq_sk_your_token" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "openai-completions",
    "url": "https://api.openai.com/v1/chat/completions",
    "method": "POST",
    "headers": {
      "Authorization": "Bearer sk-openai-...",
      "Content-Type": "application/json"
    },
    "rate_limit": 8,
    "max_retries": 3,
    "backoff": "exponential"
  }'

# 2. Push an item per prompt (BUFFER_ID from the create response)
curl -X POST https://api.fliq.sh/buffers/BUFFER_ID/items \
  -H "Authorization: Bearer fliq_sk_your_token" \
  -H "Content-Type: application/json" \
  -d '{
    "body": "{\"model\":\"gpt-4o-mini\",\"messages\":[{\"role\":\"user\",\"content\":\"Summarise row 1\"}]}"
  }'

// The same two steps in JavaScript (fetch).
const FLIQ = { Authorization: "Bearer fliq_sk_your_token", "Content-Type": "application/json" };

// 1. Create the buffer once.
const buf = await (await fetch("https://api.fliq.sh/buffers", {
  method: "POST",
  headers: FLIQ,
  body: JSON.stringify({
    name: "openai-completions",
    url: "https://api.openai.com/v1/chat/completions",
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    rate_limit: 8,          // <= 8 requests/sec to OpenAI
    max_retries: 3,
    backoff: "exponential",
  }),
})).json();

// 2. Push every prompt — Fliq paces the delivery.
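for (const row of rows) {
  const res = await fetch(`https://api.fliq.sh/buffers/${buf.id}/items`, {
    method: "POST",
    headers: FLIQ,
    body: JSON.stringify({
      // The OpenAI request is itself a JSON string inside the item payload.
      body: JSON.stringify({
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: `Summarise ${row.text}` }],
      }),
    }),
  });
  // Fail loudly if an item was not accepted, rather than silently dropping the row.
  if (!res.ok) throw new Error(`Fliq rejected item: HTTP ${res.status}`);
}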

Pushing an item is cheap and returns immediately — you can enqueue thousands without blocking.
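One detail in the examples above is easy to get wrong: the item's body field is a string holding the OpenAI request as JSON, so the payload is encoded twice. The sketch below shows one way to build and inspect it. The helper names buildItem and readItem are hypothetical, and the { body: "<string>" } shape is taken from the examples in this article rather than from a full API reference.

```javascript
// Build the JSON payload for POST /buffers/BUFFER_ID/items.
// buildItem is a hypothetical helper; the payload shape follows the
// examples in this article.
function buildItem(prompt, model = "gpt-4o-mini") {
  const openaiRequest = {
    model,
    messages: [{ role: "user", content: prompt }],
  };
  // Inner stringify: the OpenAI request becomes a plain string.
  // Outer stringify: that string is escaped and wrapped as the item payload.
  return JSON.stringify({ body: JSON.stringify(openaiRequest) });
}

// Undo both layers, e.g. to check what would be delivered to OpenAI.
function readItem(payload) {
  return JSON.parse(JSON.parse(payload).body);
}
```

Letting JSON.stringify do both layers means quotes and newlines in a prompt are escaped for you, which is the part that tends to break when the inner JSON is written by hand as in the curl example.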

What Fliq handles for you

  • The rate cap. A per-second token bucket means OpenAI never sees more than rate_limit requests/sec. No bursts, no Promise.all stampede.
  • 429s for free. If the endpoint returns 429, Fliq reschedules the item using the Retry-After header — and it does not count against the item’s retry budget.
  • In-order, one at a time. At most one request per buffer is in flight; a failing item holds its place and retries with backoff rather than letting later items jump ahead.
  • Status at a glance. GET /buffers/BUFFER_ID/stats returns the pending/running/completed/failed breakdown across the whole buffer.
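
To make the 429 rule concrete, here is a rough sketch of the kind of decision described above: a 429 waits for Retry-After and leaves the retry budget alone, while any other failure spends one retry and backs off exponentially. This is not Fliq's code. The function names retryAfterMs and planRetry, the 1-second base delay, the doubling schedule, and the fallback used when Retry-After is missing are all assumptions for illustration.

```javascript
// Parse a Retry-After header value into milliseconds.
// The header carries either a number of seconds or an HTTP date.
function retryAfterMs(value, nowMs = Date.now()) {
  if (value == null || value === "") return null;
  const trimmed = String(value).trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const at = Date.parse(trimmed);
  return Number.isNaN(at) ? null : Math.max(0, at - nowMs);
}

// Decide what happens after a failed delivery attempt.
// `attempts` counts only the retries that consume the budget.
function planRetry({ status, retryAfter, attempts, maxRetries, baseMs = 1000, nowMs }) {
  if (status === 429) {
    // Rate-limited: wait as instructed and do not spend a retry.
    // Falling back to baseMs when the header is absent is an assumption.
    const delayMs = retryAfterMs(retryAfter, nowMs) ?? baseMs;
    return { action: "retry", delayMs, attempts };
  }
  // Budget exhausted: give up on this item.
  if (attempts >= maxRetries) return { action: "fail", delayMs: 0, attempts };
  // Exponential backoff: baseMs, 2 * baseMs, 4 * baseMs, ...
  return { action: "retry", delayMs: baseMs * 2 ** attempts, attempts: attempts + 1 };
}
```

With max_retries set to 3 as in the buffer above, an item under this scheme could be rate-limited any number of times and still have all three retries left for genuine failures such as a 500.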