AI streaming
Streaming lets you display tokens as the response is generated. This improves perceived speed in chats, copilots, internal assistants, and long-form interfaces.
When to use streaming
Use streaming when:
- the response can be long
- the user is looking at an interactive interface
- you want to show progress before the final answer
- the application needs to reduce perceived waiting time
For backend tasks, asynchronous reports, or automations without an interface, a non-streaming call is usually simpler.
Node.js example
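// Run as an ES module (for example a .mjs file) because of the top-level await.
import OpenAI from 'openai';
// Point the OpenAI SDK at the Zenifra endpoint.
const client = new OpenAI({
  baseURL: 'https://ai.zenifra.com/v1',
  apiKey: process.env.ZENIFRA_AI_API_KEY,
});
// stream: true returns an async iterable of chunks instead of a single response.
const stream = await client.chat.completions.create({
  model: 'zenifra/qwen3.6-35b-a3b',
  stream: true,
  messages: [
    {
      role: 'user',
      content: 'Explain continuous deployment in five sentences.',
    },
  ],
});
// Each chunk carries a small delta; print the text as soon as it arrives.
for await (const chunk of stream) {
  const token = chunk.choices[0]?.delta?.content;
  if (token) process.stdout.write(token);
}
curl example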
# -N (--no-buffer) makes curl print each event as soon as it arrives.
curl -N https://ai.zenifra.com/v1/chat/completions \
  -H "Authorization: Bearer $ZENIFRA_AI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zenifra/qwen3.6-35b-a3b",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Create a short deployment checklist." }
    ]
  }'
Production considerations
- handle connection drops on the client
- show a clear error if the key is invalid or out of budget
- do not log sensitive prompts unnecessarily
- limit concurrency to avoid cost spikes
- track tokens, cost, and the models used in the console
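Handling connection drops starts with telling a finished stream from a truncated one. The SDK parses the stream for you; with a raw HTTP call like the curl example, you parse the events yourself. The sketch below assumes the endpoint follows the OpenAI-compatible server-sent events format, with one `data: {json}` line per chunk and a final `data: [DONE]` line. That wire format is an assumption, not something this page documents, and `ChatStreamParser` is a hypothetical name.

```javascript
// Minimal sketch: incremental parser for an OpenAI-compatible SSE stream.
// ASSUMPTION: events are "data: <json>" lines and the stream ends with
// "data: [DONE]". ChatStreamParser is a hypothetical helper, not an SDK class.
class ChatStreamParser {
  constructor() {
    this.buffer = '';  // incomplete trailing line carried between reads
    this.text = '';    // content accumulated so far
    this.done = false; // true once "data: [DONE]" has been seen
  }

  // Feed one decoded text chunk; returns the content tokens it contained.
  push(chunkText) {
    this.buffer += chunkText;
    const lines = this.buffer.split('\n');
    this.buffer = lines.pop(); // the last piece may be a partial line
    const tokens = [];
    for (const rawLine of lines) {
      const line = rawLine.trim(); // also strips a trailing "\r"
      if (!line.startsWith('data:')) continue; // skip blank lines and other fields
      const payload = line.slice('data:'.length).trim();
      if (payload === '[DONE]') {
        this.done = true;
        continue;
      }
      const event = JSON.parse(payload);
      const token = event.choices?.[0]?.delta?.content;
      if (token) {
        tokens.push(token);
        this.text += token;
      }
    }
    return tokens;
  }

  // Call when the connection closes: without [DONE], the answer was cut off.
  finish() {
    return { text: this.text, complete: this.done };
  }
}
```

To feed it from Node.js 18+ `fetch`, decode each chunk of `response.body` with a single `TextDecoder` instance (`decoder.decode(bytes, { stream: true })`) and pass the resulting string to `push()`. If `finish()` reports `complete: false` when the body ends, treat the answer as truncated and show an error instead of presenting it as final.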
FAQ
Does streaming change billing?
No. Usage is still billed by tokens; streaming only changes how the response is delivered to the client.
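Because billing is token-based, it helps to record token counts for each streamed call. In the OpenAI Chat Completions API, setting `stream_options: { include_usage: true }` adds a final chunk with an empty `choices` array and a `usage` object; whether the Zenifra endpoint honors this option is an assumption to verify. The hypothetical `collectStream` helper below accumulates the text and keeps any usage it sees.

```javascript
// Sketch: consume an OpenAI-style chunk stream, forwarding tokens as they
// arrive and returning the full text plus any usage the server reported.
// ASSUMPTION: the request set stream_options: { include_usage: true };
// if the server ignores it, usage stays null.
async function collectStream(stream, onToken = () => {}) {
  let text = '';
  let usage = null;
  for await (const chunk of stream) {
    const token = chunk.choices?.[0]?.delta?.content;
    if (token) {
      text += token;
      onToken(token); // e.g. process.stdout.write or a UI update
    }
    // With include_usage, the last chunk carries usage and an empty choices array.
    if (chunk.usage) usage = chunk.usage;
  }
  return { text, usage };
}

// Usage with the client from the Node.js example:
// const stream = await client.chat.completions.create({
//   model: 'zenifra/qwen3.6-35b-a3b', stream: true,
//   stream_options: { include_usage: true }, messages: [...],
// });
// const { text, usage } = await collectStream(stream, (t) => process.stdout.write(t));
```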
Can I use streaming in a backend?
Yes, but it is most useful when an interface displays the answer in real time.
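A backend that opens streams on behalf of many users is also where limiting concurrency matters most. A minimal, dependency-free sketch of a concurrency cap follows; `createLimiter` is a hypothetical helper, not part of the OpenAI SDK, and the right limit depends on your budget.

```javascript
// Sketch: cap how many tasks (e.g. streamed requests) run at once.
// createLimiter is a hypothetical helper, not part of any SDK.
function createLimiter(maxConcurrent) {
  let active = 0;
  const queue = []; // resolvers of tasks waiting for a free slot

  async function acquire() {
    if (active < maxConcurrent) {
      active++;
      return;
    }
    // Wait until release() hands this task a slot (active stays unchanged).
    await new Promise((resolve) => queue.push(resolve));
  }

  function release() {
    const next = queue.shift();
    if (next) next(); // pass the slot directly to the next waiting task
    else active--;
  }

  // Run task() once a slot is free; the slot is released even if it throws.
  return async function limit(task) {
    await acquire();
    try {
      return await task();
    } finally {
      release();
    }
  };
}
```

For example, create one limiter with `const limit = createLimiter(4);` and wrap each `client.chat.completions.create(...)` call together with its `for await` loop in `limit(async () => { ... })`, so that the slot stays held until the stream finishes.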
What should I do if the stream fails midway?
Keep the partial response visible, show a clear message that it is incomplete, and let the user retry with backoff.
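That pattern can be sketched as a retry loop with exponential backoff that reports the partial text of each failed attempt. It restarts the request from the beginning rather than resuming, since nothing on this page indicates the API can resume a stream, so the interface should replace the partial text when a retry starts. `streamWithRetry`, the `startStream` callback, and the default delays are hypothetical choices.

```javascript
// Sketch: retry a streamed request with exponential backoff, reporting the
// partial text from a failed attempt so the UI can show it.
// startStream is a hypothetical callback you supply that returns (a promise of)
// an async iterable of OpenAI-style chunks, e.g. a wrapper around
// client.chat.completions.create({ ..., stream: true }).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function streamWithRetry(
  startStream,
  { maxAttempts = 3, baseDelayMs = 500, onToken = () => {}, onError = () => {} } = {},
) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let partial = '';
    try {
      for await (const chunk of await startStream()) {
        const token = chunk.choices?.[0]?.delta?.content;
        if (token) {
          partial += token;
          onToken(token, attempt);
        }
      }
      return { text: partial, attempts: attempt };
    } catch (err) {
      lastError = err;
      // Let the caller show a clear message alongside the partial text.
      onError(err, partial, attempt);
      if (attempt < maxAttempts) {
        // Exponential backoff: base, 2x base, 4x base, ... plus up to 100 ms of jitter.
        await sleep(baseDelayMs * 2 ** (attempt - 1) + Math.random() * 100);
      }
    }
  }
  throw lastError; // every attempt failed
}
```

Adding jitter keeps many clients that lost their connections at the same moment from retrying in lockstep.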