Streaming

Pass "stream": true and the response arrives as Server-Sent Events.

Format

Each frame is a separate chat.completion.chunk object:

```
data: {"id":"gen-123","choices":[{"delta":{"role":"assistant"}}]}

data: {"id":"gen-123","choices":[{"delta":{"content":"Hel"}}]}

data: {"id":"gen-123","choices":[{"delta":{"content":"lo!"}}]}

data: {"id":"gen-123","choices":[{"delta":{},"finish_reason":"stop"}]}

data: {"id":"gen-123","choices":[],"usage":{"prompt_tokens":12,"completion_tokens":8,"total_tokens":20,"cost":0.000018}}

data: [DONE]
```

The final frame with usage is always present: we automatically set stream_options.include_usage = true on your requests. So the correct way to account for the cost of a streamed response is to take usage.cost from the last non-empty frame before [DONE].
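For clients that read the raw event stream instead of going through an SDK, the accounting rule above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `read_sse_stream` and the `sample` frames are hypothetical, and the parser assumes every `data:` line carries one complete JSON object, as in the frames shown earlier.

```python
import json
from typing import Iterable, Optional, Tuple


def read_sse_stream(lines: Iterable[str]) -> Tuple[str, Optional[float]]:
    """Collect the streamed text and the cost reported in the usage frame.

    `lines` is any iterable of decoded SSE lines (hypothetical input; in
    practice it would come from your HTTP client's line iterator).
    """
    text_parts = []
    cost = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel, not JSON
        frame = json.loads(payload)
        for choice in frame.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                text_parts.append(content)
        usage = frame.get("usage")
        if usage:
            # The usage frame has an empty choices list and carries the cost.
            cost = usage.get("cost")
    return "".join(text_parts), cost


# Hand-written frames in the same shape as the format example.
sample = [
    'data: {"id":"gen-123","choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"id":"gen-123","choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"id":"gen-123","choices":[{"delta":{"content":"lo!"}}]}',
    "",
    'data: {"id":"gen-123","choices":[{"delta":{},"finish_reason":"stop"}]}',
    "",
    'data: {"id":"gen-123","choices":[],"usage":{"total_tokens":20,"cost":0.000018}}',
    "",
    "data: [DONE]",
]

text, cost = read_sse_stream(sample)
print(text, cost)
```

Keeping the last usage value seen, rather than reading a fixed frame position, means the helper still returns `None` for the cost if the stream stops before the usage frame arrives.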

Examples


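```python
from openai import OpenAI

# Assumption: the official OpenAI Python SDK pointed at this API's base URL.
client = OpenAI(base_url="https://api.ml-router.su/v1", api_key="orb_live_…")

stream = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me about space"}],
    stream=True,
)
for chunk in stream:
    # The usage frame has an empty choices list, so check before indexing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if chunk.usage:
        print(f"\nstream cost: ${chunk.usage.cost:.6f}")
```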

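```javascript
import OpenAI from "openai";

// Assumption: the official OpenAI Node.js SDK pointed at this API's base URL.
const client = new OpenAI({
  baseURL: "https://api.ml-router.su/v1",
  apiKey: "orb_live_…",
});

const stream = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Tell me about space" }],
  stream: true,
});

for await (const chunk of stream) {
  // Optional chaining covers the usage frame, whose choices list is empty.
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
  if (chunk.usage) console.log(`\ncost: $${chunk.usage.cost}`);
}
```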

```bash
curl -N https://api.ml-router.su/v1/chat/completions \
  -H "Authorization: Bearer orb_live_…" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
```

The -N flag disables curl's output buffering, so you see tokens as they arrive.

What happens if the connection drops

If the client closes the SSE connection before the final usage frame, we still wait for the model's response and charge the actual cost. Dropping the connection does not cancel payment for a generation that has already started: the request to the provider has been made and costs money.
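Because a dropped stream is still billed, a client may want to record that it never saw the usage frame instead of treating the cost as zero. The sketch below is one hypothetical way to do that: `consume_stream` and `interrupted` are illustrative names, the frames are assumed to be already parsed into dicts, and the built-in `ConnectionError` and `TimeoutError` stand in for whatever exception your HTTP client actually raises.

```python
from typing import Iterable, Iterator, Optional, Tuple


def consume_stream(chunks: Iterable[dict]) -> Tuple[str, Optional[float], bool]:
    """Return (text, cost, complete) for a stream of already-parsed frames.

    `complete` is False when the stream ended or failed before a usage frame
    arrived. The generation is still billed in that case, but the cost is
    not known on the client side.
    """
    parts = []
    cost = None
    try:
        for chunk in chunks:
            for choice in chunk.get("choices", []):
                content = choice.get("delta", {}).get("content")
                if content:
                    parts.append(content)
            usage = chunk.get("usage")
            if usage:
                cost = usage.get("cost")
    except (ConnectionError, TimeoutError):
        # Assumption: stand-ins for the client library's own error types.
        pass  # keep whatever text was received before the drop
    return "".join(parts), cost, cost is not None


def interrupted() -> Iterator[dict]:
    """Simulate a stream that breaks after the first content frame."""
    yield {"choices": [{"delta": {"content": "Hel"}}]}
    raise ConnectionError("simulated drop")


print(consume_stream(interrupted()))  # ('Hel', None, False)
```

The `complete` flag lets calling code tell "no cost reported" apart from "cost was zero", which matters when reconciling spend later.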

Headers

A streamed response always has:

```http
HTTP/2 200
Content-Type: text/event-stream
Cache-Control: no-cache
X-Accel-Buffering: no
```