API Endpoint

Laozhang API is fully compatible with the official OpenAI interface format. To use it, simply replace the base URL

https://api.openai.com/v1

with

https://api.laozhang.ai/v1

Request Parameters
Required Parameters

`model`: Name of the model to use. Supported models:
- OpenAI Series: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `gpt-3.5-turbo`, etc.
- Claude Series: `claude-3-5-sonnet`, `claude-3-opus`, `claude-3-haiku`, etc.
- Gemini Series: `gemini-1.5-pro`, `gemini-1.5-flash`, `gemini-2.0-flash-exp`, etc.
- Chinese Models: `deepseek-chat`, `qwen-max`, `glm-4-flash`, `yi-lightning`, etc.
`messages`: Conversation message array; each message contains `role` and `content`. Role descriptions:
- `system`: System prompt, defines the AI assistant's behavior
- `user`: User message
- `assistant`: The AI assistant's previous response
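A minimal sketch of a request body using only the two required parameters; the model name and message texts are illustrative, and any supported model works the same way:

```python
# Request body with the two required parameters: model and messages.
# The messages array carries the conversation history, one dict per turn.
payload = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is an API?"},
        {"role": "assistant", "content": "An interface programs use to talk to each other."},
        {"role": "user", "content": "Give one example."},
    ],
}

roles = [m["role"] for m in payload["messages"]]
print(roles)  # → ['system', 'user', 'assistant', 'user']
```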
Optional Parameters

`temperature`: Randomness of generated results, range 0-2.
- 0: Deterministic, minimal randomness (recommended for translation, summarization, etc.)
- 0.7: Balanced, suitable for most scenarios
- 1.5-2: High creativity (recommended for creative writing, brainstorming, etc.)
`max_tokens`: Maximum number of tokens to generate. If not set, the model uses its default limit; if the response is truncated, try increasing this value. Recommended values:
- Short responses: 500-1000
- Medium responses: 2000-4000
- Long responses: 8000+
`stream`: Whether to use streaming output.
- `false`: Wait for the complete response
- `true`: Receive the response in chunks (better user experience)
`top_p`: Nucleus sampling parameter, range 0-1. Controls diversity of output. Generally use either `temperature` or `top_p`, not both simultaneously.

`frequency_penalty`: Frequency penalty, range -2.0 to 2.0. Positive values reduce repetition of content that has already appeared.

`presence_penalty`: Presence penalty, range -2.0 to 2.0. Positive values encourage discussion of new topics.

`stop`: Stop sequences; generation stops when one of these strings is encountered. Can be a single string or an array of up to 4 strings.

`user`: Unique end-user identifier for abuse detection. Recommended for multi-user scenarios.
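A hedged sketch of the optional parameters added to a request body; the values follow the recommendations above, and `user-12345` is a made-up end-user identifier:

```python
# Common optional parameters alongside the required model and messages.
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Write a haiku about the sea."}],
    "temperature": 0.7,        # balanced randomness
    "max_tokens": 1000,        # short-response budget
    "stream": False,           # wait for the complete response
    "frequency_penalty": 0.0,  # no extra repetition penalty
    "presence_penalty": 0.0,   # no extra new-topic pressure
    "stop": ["\n\n"],          # stop at the first blank line
    "user": "user-12345",      # made-up end-user identifier
}
assert 0 <= payload["temperature"] <= 2
```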
Message Format
Basic Text Message
Multimodal Message (Image Understanding)
- Image URL
- Base64 Image
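A hedged sketch of the two multimodal message variants in the OpenAI-compatible format; the URL and image bytes are placeholders:

```python
import base64

# Variant 1: image referenced by URL. content becomes an array of parts
# instead of a plain string.
msg_url = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
    ],
}

# Variant 2: image embedded as a base64 data URL.
image_b64 = base64.b64encode(b"<binary image data>").decode("ascii")
msg_b64 = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url",
         "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
    ],
}
```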
Multimodal Models: the following models support image understanding:
- OpenAI: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`
- Claude: `claude-3-5-sonnet`, `claude-3-opus`, `claude-3-sonnet`, `claude-3-haiku`
- Gemini: `gemini-1.5-pro`, `gemini-1.5-flash`, `gemini-2.0-flash-exp`
Request Examples
The same request can be made from cURL, Node.js, Python, or Go; any OpenAI-compatible client works by pointing its base URL at https://api.laozhang.ai/v1.
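A minimal Python sketch using only the standard library; the API key shown is a placeholder, and the actual network call is left commented out:

```python
import json
import urllib.request

API_KEY = "sk-your-key-here"  # placeholder; substitute your real key

body = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}],
}

# Build the POST request against the OpenAI-compatible endpoint.
req = urllib.request.Request(
    "https://api.laozhang.ai/v1/chat/completions",
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# resp = urllib.request.urlopen(req)  # uncomment to actually send
# print(json.loads(resp.read())["choices"][0]["message"]["content"])
print(req.get_full_url(), req.get_method())
```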
Response Format
Standard Response
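An illustrative (abbreviated) standard response body, following the `chat.completion` shape described in the field descriptions below; the `id`, content, and token counts are made up:

```python
import json

# Parse an example response body and pull out the generated text.
raw = """{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gpt-4o-mini",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello! How can I help?"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16}
}"""

resp = json.loads(raw)
print(resp["choices"][0]["message"]["content"])  # → Hello! How can I help?
```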
Stream Response
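Each streamed chunk arrives as a server-sent `data:` line whose JSON payload mirrors the standard response, with a `delta` field in place of `message`. A minimal sketch of parsing one chunk (the payload shown is illustrative, following the standard `chat.completion.chunk` shape):

```python
import json

# One SSE line as received from a stream=true response. The stream ends
# with a literal "data: [DONE]" line, which must not be JSON-parsed.
line = ('data: {"id":"chatcmpl-abc","object":"chat.completion.chunk",'
        '"choices":[{"index":0,"delta":{"content":"Hel"},"finish_reason":null}]}')

piece = ""
if line.startswith("data: ") and line != "data: [DONE]":
    chunk = json.loads(line[len("data: "):])
    # delta carries only the increment; concatenate pieces to rebuild the text.
    piece = chunk["choices"][0]["delta"].get("content", "")
print(piece)  # → Hel
```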
Response Field Descriptions
`id`: Unique request identifier

`object`: Object type:
- `chat.completion`: Standard response
- `chat.completion.chunk`: Stream response chunk

`created`: Creation timestamp (Unix timestamp)

`model`: Model name used

`choices`: Array of generated results, typically containing one result

`choices[].index`: Result index

`choices[].delta`: Incremental content (stream responses)

`choices[].delta.content`: This chunk's content

`choices[].finish_reason`: Completion reason:
- `stop`: Natural completion
- `length`: Reached the `max_tokens` limit
- `content_filter`: Content filtered by policy
- `null`: Not yet finished (stream output)
Special Usage
GPT-4o Vision
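A hedged sketch of a vision request with more than one image; the URLs are placeholders:

```python
# One user message carrying several image_url parts, which GPT-4o can
# analyze together in a single request.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Compare these two images."},
        {"type": "image_url", "image_url": {"url": "https://example.com/a.jpg"}},
        {"type": "image_url", "image_url": {"url": "https://example.com/b.jpg"}},
    ],
}

image_parts = [p for p in message["content"] if p["type"] == "image_url"]
print(len(image_parts))  # → 2
```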
Multiple Images: GPT-4o supports analyzing multiple images simultaneously; just add multiple `image_url` objects to the content array.

Claude Native Format
Claude models also support the native Anthropic request format.

O1 Series Special Parameters

O1 series models (o1-preview, o1-mini) have the following parameter limitations:
- Do not support `system` role messages
- Do not support stream output (`stream` must be `false`)
- Do not support the `temperature`, `top_p`, `presence_penalty`, or `frequency_penalty` parameters
- `max_tokens` defaults to the model's maximum value
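A hedged sketch of a request body that respects these limitations: the system instructions are folded into the user message, sampling parameters are omitted, and streaming is left off:

```python
# o1-mini request: no system message, no temperature/top_p/penalties,
# stream disabled; instructions go in the user message instead.
payload = {
    "model": "o1-mini",
    "messages": [
        {"role": "user",
         "content": "You are a careful math tutor. Show the steps for 12 * 13."},
    ],
    "stream": False,
}
assert all(m["role"] != "system" for m in payload["messages"])
```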
Usage Tips
Multi-turn Dialogue
Implement multi-turn dialogue by passing the previous turns of the conversation back in the `messages` array with each request.

JSON Output

Get structured JSON output by setting `response_format` to `{"type": "json_object"}`. JSON mode is currently supported by the following models:
- GPT-4o series
- GPT-4-turbo series
- GPT-3.5-turbo-1106 and later versions
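A hedged sketch of a JSON-mode request using the standard OpenAI `response_format` parameter; note that the prompt itself should also mention JSON:

```python
# Request structured JSON output. JSON mode guarantees syntactically
# valid JSON, but the prompt must still state the desired schema.
payload = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "Reply only with a JSON object."},
        {"role": "user",
         "content": 'List three primary colors under the key "colors".'},
    ],
    "response_format": {"type": "json_object"},
}
```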
Billing

Billing is based on actual token usage:

Total Cost = Input Tokens × Input Price + Output Tokens × Output Price

Save Costs
- Choose appropriate models: most scenarios don't require gpt-4o; gpt-4o-mini or gpt-3.5-turbo is usually sufficient
- Control context length: only pass the necessary historical messages
- Set `max_tokens`: avoid unnecessarily long output
- Use mini series models: for simple tasks, mini models are much cheaper
Model Price Reference

| Model | Input Price | Output Price | Features |
|---|---|---|---|
| gpt-4o-mini | $0.15/1M tokens | $0.60/1M tokens | Cost-effective, supports image understanding |
| gpt-4o | $2.5/1M tokens | $10/1M tokens | Strongest capabilities, supports multimodal |
| claude-3-5-sonnet | $3/1M tokens | $15/1M tokens | Excellent reasoning, supports image understanding |
| gemini-1.5-flash | $0.075/1M tokens | $0.3/1M tokens | Fastest speed, long context |
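The billing formula can be worked through with the gpt-4o-mini prices from the table ($0.15 per 1M input tokens, $0.60 per 1M output tokens):

```python
# Cost in USD for a request, given per-million-token prices.
def cost_usd(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# 10,000 input tokens and 2,000 output tokens on gpt-4o-mini:
# (10000 * 0.15 + 2000 * 0.60) / 1e6 = (1500 + 1200) / 1e6 = 0.0027
c = cost_usd(10_000, 2_000, 0.15, 0.60)
print(f"${c:.4f}")  # → $0.0027
```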
Error Handling
Common error codes:

| Error Code | Meaning | Solution |
|---|---|---|
| 401 | API Key invalid or missing | Check that the API Key is correct |
| 429 | Request rate limit exceeded | Slow down request frequency or upgrade your plan |
| 500 | Internal server error | Retry the request or contact support |
| 400 | Request parameter error | Check that request parameters conform to the API documentation |
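For the retryable errors (429 and 500), an exponential backoff retry can be sketched as follows; `call` is a stand-in for whatever function actually sends the request:

```python
import random
import time

RETRYABLE = {429, 500}  # rate limit and server error

def with_backoff(call, max_retries=5, base_delay=1.0):
    """call() returns (status_code, body); retried on 429/500 with
    exponentially growing, jittered delays."""
    for attempt in range(max_retries):
        status, body = call()
        if status not in RETRYABLE:
            return status, body
        # 1x, 2x, 4x, ... the base delay, plus random jitter.
        time.sleep(base_delay * (2 ** attempt)
                   + random.uniform(0, base_delay))
    return status, body

# Toy demonstration: fails twice with 429, then succeeds.
attempts = iter([(429, ""), (429, ""), (200, "ok")])
status, body = with_backoff(lambda: next(attempts), base_delay=0.001)
print(status, body)  # → 200 ok
```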
Best Practices

- Use an appropriate temperature
  - Translation, summarization, Q&A: temperature=0
  - General dialogue: temperature=0.7
  - Creative writing: temperature=1.0-1.5
- Control context length
  - Only pass necessary historical messages
  - Regularly clean up irrelevant context
  - Process long documents in segments
- Choose the right model
  - Simple tasks: gpt-4o-mini, gpt-3.5-turbo
  - Reasoning tasks: claude-3-5-sonnet, gpt-4o
  - Cost-sensitive: gemini-1.5-flash
- Retry on errors
  - Implement an exponential backoff retry mechanism
  - Catch and handle different error types
  - Set reasonable timeouts
- Use stream output
  - Better user experience for long responses
  - Reduces perceived latency
  - Enables a typewriter effect
Related Resources
- Models API - Get complete available model list
- Images API - Image generation and editing
- OpenAI Models Guide - Detailed GPT-4o usage
- Claude Models Guide - Detailed Claude usage
- Gemini Models Guide - Detailed Gemini usage