AI streaming

Streaming lets you display tokens as they are generated, instead of waiting for the complete response. This improves perceived speed in chats, copilots, internal assistants, and long-form interfaces.

When to use streaming

Use streaming when:

the response can be long
the user is looking at an interactive interface
you want to show progress before the final answer
the application needs to reduce perceived waiting time

For backend tasks, asynchronous reports, or automations without an interface, a non-streaming call is usually simpler.

Node.js example

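// Requires the `openai` package and an ES module context (top-level await),
// e.g. a .mjs file or "type": "module" in package.json.
import OpenAI from 'openai';

// Point the OpenAI SDK at the Zenifra endpoint.
const client = new OpenAI({
  baseURL: 'https://ai.zenifra.com/v1',
  apiKey: process.env.ZENIFRA_AI_API_KEY,
});

// stream: true returns an async iterable of chunks instead of one full response.
const stream = await client.chat.completions.create({
  model: 'zenifra/qwen3.6-35b-a3b',
  stream: true,
  messages: [
    {
      role: 'user',
      content: 'Explain continuous deployment in five sentences.',
    },
  ],
});

// Each chunk carries a small piece of text in choices[0].delta.content.
for await (const chunk of stream) {
  const token = chunk.choices[0]?.delta?.content;
  if (token) process.stdout.write(token);
}
process.stdout.write('\n'); // end the output with a newline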
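If you also need the complete answer when the stream ends, for example to save it to a conversation history, accumulate the tokens as they arrive. A minimal sketch, assuming the OpenAI-compatible chunk shape used above (`choices[0].delta.content` and `choices[0].finish_reason`). `collectStream` and `mockStream` are hypothetical helpers, and the mock only imitates the chunk shape so the code runs without an API key:

```javascript
// Hypothetical helper: consumes an OpenAI-compatible chat completion stream,
// forwards each token to a callback, and returns the full text plus the finish reason.
async function collectStream(stream, onToken = () => {}) {
  let text = '';
  let finishReason = null;
  for await (const chunk of stream) {
    const choice = chunk.choices?.[0];
    const token = choice?.delta?.content;
    if (token) {
      text += token;
      onToken(token); // e.g. process.stdout.write or a UI update
    }
    // finish_reason stays null until the last chunk (e.g. 'stop' or 'length').
    if (choice?.finish_reason) finishReason = choice.finish_reason;
  }
  return { text, finishReason };
}

// Mock stream for local runs; it mimics the chunk shape, not the network stream.
async function* mockStream() {
  yield { choices: [{ delta: { role: 'assistant' }, finish_reason: null }] };
  yield { choices: [{ delta: { content: 'Hello' }, finish_reason: null }] };
  yield { choices: [{ delta: { content: ', world' }, finish_reason: null }] };
  yield { choices: [{ delta: {}, finish_reason: 'stop' }] };
}

collectStream(mockStream(), (t) => process.stdout.write(t)).then(({ text, finishReason }) => {
  console.log(`\n[${finishReason}] ${text.length} characters`); // prints "[stop] 12 characters"
});
```

With the real client, replace the `for await` loop with `const { text, finishReason } = await collectStream(stream, (t) => process.stdout.write(t));`. In OpenAI-compatible APIs, a finish reason of `length` usually means the answer was cut off by the token limit, which is worth flagging in the interface.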

curl example

# -N (--no-buffer) makes curl print each chunk as soon as it arrives.
curl -N https://ai.zenifra.com/v1/chat/completions \
  -H "Authorization: Bearer $ZENIFRA_AI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zenifra/qwen3.6-35b-a3b",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Create a short deployment checklist." }
    ]
  }'
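Because `curl` prints the raw response body, it helps to know its shape. OpenAI-compatible endpoints typically stream server-sent events: one `data: {json}` line per chunk, blank lines between events, and a final `data: [DONE]`. The sketch below assumes the Zenifra endpoint follows that format; `createSSEParser` is a hypothetical helper, not part of any SDK. It buffers incomplete lines because a single network read can end in the middle of an event:

```javascript
// Hypothetical incremental parser for an OpenAI-compatible SSE stream.
// Network reads can split a line in two, so incomplete lines are buffered
// until the rest arrives.
function createSSEParser(onToken) {
  let buffer = '';
  let done = false;

  function handleLine(line) {
    const trimmed = line.trim(); // also strips a trailing '\r'
    if (!trimmed.startsWith('data:')) return; // blank separators, comments, other fields
    const payload = trimmed.slice('data:'.length).trim();
    if (payload === '[DONE]') {
      done = true; // end-of-stream sentinel
      return;
    }
    const token = JSON.parse(payload).choices?.[0]?.delta?.content;
    if (token) onToken(token);
  }

  return {
    feed(text) {
      if (done) return;
      buffer += text;
      const lines = buffer.split('\n');
      buffer = lines.pop(); // the last element may be an incomplete line
      for (const line of lines) {
        if (done) break;
        handleLine(line);
      }
    },
    isDone: () => done,
  };
}

// Usage with two network reads that split a JSON line in the middle.
let answer = '';
const parser = createSSEParser((token) => { answer += token; });
parser.feed('data: {"choices":[{"delta":{"content":"1. Run"}}]}\n\ndata: {"choi');
parser.feed('ces":[{"delta":{"content":" tests"}}]}\n\ndata: [DONE]\n\n');
console.log(answer, parser.isDone()); // prints "1. Run tests true"
```

The official SDKs already do this parsing for you; a hand-written parser is only needed when you read the HTTP response directly, for example with `fetch`.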

Production considerations

handle connection drops on the client
show a clear error if the API key is invalid or has run out of budget
do not log sensitive prompts unnecessarily
limit concurrency to avoid cost spikes
track token usage, cost, and the models used in the console
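The concurrency limit can be enforced with a small in-process limiter placed in front of the client. A minimal sketch; `createLimiter` and `fakeRequest` are hypothetical names, and `fakeRequest` stands in for a streaming call so the example runs offline:

```javascript
// Hypothetical limiter: runs at most `maxConcurrent` async tasks at once;
// extra tasks wait in a first-in, first-out queue until a slot frees up.
function createLimiter(maxConcurrent) {
  let active = 0;
  const queue = [];

  function next() {
    if (active >= maxConcurrent || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    Promise.resolve()
      .then(task) // start the task (synchronous throws become rejections)
      .then(resolve, reject) // settle the caller's promise
      .finally(() => {
        active--; // free the slot and start the next queued task
        next();
      });
  }

  return function limit(task) {
    return new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
  };
}

// Offline demo: `fakeRequest` stands in for a streaming API call.
const limit = createLimiter(2);
let running = 0;
let peak = 0;

async function fakeRequest(id) {
  running++;
  peak = Math.max(peak, running);
  await new Promise((resolve) => setTimeout(resolve, 10));
  running--;
  return id;
}

Promise.all([1, 2, 3, 4, 5].map((id) => limit(() => fakeRequest(id)))).then((ids) => {
  console.log(ids.join(','), 'peak concurrency:', peak); // prints "1,2,3,4,5 peak concurrency: 2"
});
```

This only limits requests inside one Node.js process. If several servers call the API, the cap has to live in shared infrastructure, such as a queue or a gateway, to bound total concurrency.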

FAQ

Does streaming change billing?

No. Usage is still billed by tokens; streaming only changes how the response is delivered to the client.

Can I use streaming in a backend?

Yes, but it is most useful when an interface displays the answer in real time.
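A common backend pattern is to relay the model stream to the browser as server-sent events, so the interface still renders tokens in real time. A minimal sketch, assuming the chunk shape from the Node.js example; `toSSEFrames`, `mockChunks`, and the `{ token }` payload are hypothetical choices for illustration:

```javascript
// Hypothetical backend helper: turns an OpenAI-compatible chunk stream into
// server-sent event frames that can be written to an HTTP response.
async function* toSSEFrames(stream) {
  for await (const chunk of stream) {
    const token = chunk.choices?.[0]?.delta?.content;
    // JSON-encode the token so embedded newlines cannot break the SSE framing.
    if (token) yield `data: ${JSON.stringify({ token })}\n\n`;
  }
  yield 'data: [DONE]\n\n'; // tell the browser the answer is complete
}

// Inside a Node.js `http` request handler, with `stream` from
// client.chat.completions.create({ ..., stream: true }):
//   res.writeHead(200, { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' });
//   for await (const frame of toSSEFrames(stream)) res.write(frame);
//   res.end();

// Offline demo with a mock chunk stream.
async function* mockChunks() {
  yield { choices: [{ delta: { content: 'Line 1\nLine 2' } }] };
  yield { choices: [{ delta: {}, finish_reason: 'stop' }] };
}

(async () => {
  for await (const frame of toSSEFrames(mockChunks())) process.stdout.write(frame);
})();
```

The `{ token }` payload is a format chosen for this sketch, so the browser code must parse the same shape.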

What should I do if the stream fails midway?

Mark the partial response as incomplete, show a clear error message, and let the user retry with backoff.
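Retries with backoff can be wrapped around the whole streaming call, so each retry restarts the request from the beginning. A minimal sketch of exponential backoff with full jitter; `withRetry` and its options are hypothetical, and the rule for which errors are retryable (network drops, HTTP 429, and 5xx, but not 401 for an invalid key) is an assumption to adapt to the errors your client actually reports:

```javascript
// Hypothetical retry helper: exponential backoff with "full jitter".
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Assumption: errors without an HTTP status (e.g. dropped connections) and
// 429/5xx responses are retryable; others, such as 401 (invalid key), are not.
const defaultShouldRetry = (err) =>
  err?.status === undefined || err.status === 429 || err.status >= 500;

async function withRetry(fn, options = {}) {
  const {
    retries = 3, // retries after the first attempt
    baseMs = 500,
    maxMs = 8000,
    shouldRetry = defaultShouldRetry,
    sleep = defaultSleep, // injectable so tests can skip real delays
    random = Math.random,
  } = options;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      if (attempt >= retries || !shouldRetry(err)) throw err; // give up: show the error
      const cap = Math.min(maxMs, baseMs * 2 ** attempt); // 500, 1000, 2000, ... ms
      await sleep(Math.floor(random() * cap)); // wait a random time in [0, cap)
    }
  }
}

// Usage: put the whole streaming call inside fn, so a retry restarts the request.
// withRetry(async () => {
//   const stream = await client.chat.completions.create({ /* ... */ stream: true });
//   let text = '';
//   for await (const chunk of stream) text += chunk.choices[0]?.delta?.content ?? '';
//   return text;
// });
```

With the defaults, the random wait before the second, third, and fourth attempts is capped at 0.5 s, 1 s, and 2 s respectively.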