Documentation Index
Fetch the complete documentation index at: https://docs.laozhang.ai/llms.txt
Use this file to discover all available pages before exploring further.
API Endpoint
Full Compatibility with OpenAI Official Format
The Laozhang API is fully compatible with the OpenAI official interface format: you can directly replace https://api.openai.com/v1 with https://api.laozhang.ai/v1.

Request Parameters
Required Parameters
model: Model name to use. Supported models:
- OpenAI Series: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo, etc.
- Claude Series: claude-3-5-sonnet, claude-3-opus, claude-3-haiku, etc.
- Gemini Series: gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash-exp, etc.
- Chinese Models: deepseek-chat, qwen-max, glm-4-flash, yi-lightning, etc.
messages: Conversation message array; each message contains role and content.
Role descriptions:
- system: System prompt, defines the AI assistant's behavior
- user: User message
- assistant: The AI assistant's previous response
Optional Parameters
temperature: Randomness of generated results, range 0-2:
- 0: Deterministic, minimal randomness (recommended for translation, summarization, etc.)
- 0.7: Balanced, suitable for most scenarios
- 1.5-2: High creativity (recommended for creative writing, brainstorming, etc.)
max_tokens: Maximum number of tokens to generate. Recommended values:
- Short responses: 500-1000
- Medium responses: 2000-4000
- Long responses: 8000+
stream: Whether to use stream output.
- false: Wait for the complete response
- true: Receive the response in chunks (better user experience)
top_p: Nucleus sampling parameter, range 0-1. Controls diversity of output. Generally use either temperature or top_p, not both simultaneously.
frequency_penalty: Frequency penalty, range -2.0 to 2.0. Positive values reduce repetition of content that has already appeared.
presence_penalty: Presence penalty, range -2.0 to 2.0. Positive values encourage discussion of new topics.
stop: Stop sequences; generation stops when these strings are encountered. Can be a single string or an array of up to 4 strings.
user: End-user unique identifier for abuse detection. Recommended for multi-user scenarios.
Message Format
Basic Text Message
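As a sketch, a basic text message array (field names follow the OpenAI format the API is compatible with):

```python
# A minimal messages array: one system prompt plus one user turn.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
```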
Multimodal Message (Image Understanding)
- Image URL
- Base64 Image
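A sketch of both variants, assuming the OpenAI-style content-parts format (the image URL is a placeholder):

```python
import base64

# Variant 1: reference the image by URL.
url_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url",
         "image_url": {"url": "https://example.com/photo.jpg"}},  # placeholder URL
    ],
}

# Variant 2: embed the image bytes as a base64 data URL.
def base64_image_message(image_bytes: bytes, prompt: str) -> dict:
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}},
        ],
    }
```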
Multimodal Models
The following models support image understanding:
- OpenAI: gpt-4o, gpt-4o-mini, gpt-4-turbo
- Claude: claude-3-5-sonnet, claude-3-opus, claude-3-sonnet, claude-3-haiku
- Gemini: gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash-exp
Request Examples
cURL
Node.js
Python
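A minimal Python request sketch using only the standard library; YOUR_API_KEY is a placeholder, and the payload mirrors the parameters described above:

```python
import json
import urllib.request

API_URL = "https://api.laozhang.ai/v1/chat/completions"
API_KEY = "YOUR_API_KEY"  # placeholder: substitute your real key

def build_payload() -> dict:
    """Assemble the request body from the parameters described above."""
    return {
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Summarize Hamlet in one sentence."},
        ],
        "temperature": 0,   # deterministic output suits summarization
        "max_tokens": 500,  # short response
    }

def chat_completion() -> dict:
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload()).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Calling chat_completion() sends the request; the reply text is in result["choices"][0]["message"]["content"].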
Go
Response Format
Standard Response
Stream Response
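With stream=true the response arrives as Server-Sent Events: each data: line carries one JSON chunk, and data: [DONE] marks the end. A parsing sketch, assuming the OpenAI chunk format:

```python
import json

def collect_stream_text(sse_lines) -> str:
    """Concatenate the incremental content carried by stream chunks."""
    parts = []
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank/keep-alive lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])  # typewriter-style increment
    return "".join(parts)
```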
Each chunk follows the format described below.

Response Field Descriptions
id: Unique request identifier
object: Object type:
- chat.completion: Standard response
- chat.completion.chunk: Stream response chunk
created: Creation timestamp (Unix timestamp)
model: Model name used
choices: Array of generated results, typically containing one result
index: Result index
delta: Incremental content (stream response)
delta.content: This chunk's content
finish_reason: Completion reason:
- stop: Natural completion
- length: Reached the max_tokens limit
- content_filter: Content filtered by policy
- null: Not yet finished (stream output)
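Pulling these fields out of a parsed standard response (the response dict below is an illustrative shape, not real output):

```python
# Illustrative response following the fields described above.
response = {
    "id": "chatcmpl-123",             # unique request identifier
    "object": "chat.completion",      # standard (non-stream) response
    "created": 1700000000,            # Unix timestamp
    "model": "gpt-4o-mini",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",  # natural completion
        }
    ],
}

text = response["choices"][0]["message"]["content"]
finished_naturally = response["choices"][0]["finish_reason"] == "stop"
```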
Special Usage
GPT-4o Vision
Claude Native Format
Claude models also support the native format.

O1 Series Special Parameters
O1 series models (o1-preview, o1-mini) have parameter limitations, so requests must stick to the parameters those models support.

Usage Tips
Multi-turn Dialogue
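A sketch of carrying context across turns; call_api is a placeholder for any request helper that returns the assistant's text:

```python
def ask(history: list, question: str, call_api) -> str:
    """Append the user turn, get a reply, and record it in the history."""
    history.append({"role": "user", "content": question})
    reply = call_api(history)  # call_api: placeholder for your request function
    history.append({"role": "assistant", "content": reply})
    return reply

# Usage: seed with a system prompt, then ask turn by turn.
history = [{"role": "system", "content": "You are a helpful assistant."}]
```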
Implement multi-turn dialogue by passing the conversation context in messages.

JSON Output
Get structured JSON output via JSON mode.

JSON Mode Support
The following models currently support JSON mode:
- GPT-4o series
- GPT-4-turbo series
- GPT-3.5-turbo-1106 and later versions
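A request sketch assuming the standard OpenAI response_format parameter; note that in JSON mode the prompt itself should also ask for JSON:

```python
# JSON mode: response_format asks the model to emit a valid JSON object.
payload = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system",
         "content": "Reply with a JSON object with keys 'city' and 'country'."},
        {"role": "user", "content": "Where is the Eiffel Tower?"},
    ],
    "response_format": {"type": "json_object"},
}
```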
Billing
Billing is based on actual token usage:

Total Cost = Input Tokens × Input Price + Output Tokens × Output Price

Model Price Reference
| Model | Input Price | Output Price | Features |
|---|---|---|---|
| gpt-4o-mini | $0.15/1M tokens | $0.60/1M tokens | Cost-effective, supports image understanding |
| gpt-4o | $2.5/1M tokens | $10/1M tokens | Strongest capabilities, supports multimodal |
| claude-3-5-sonnet | $3/1M tokens | $15/1M tokens | Excellent reasoning, supports image understanding |
| gemini-1.5-flash | $0.075/1M tokens | $0.3/1M tokens | Fastest speed, long context |
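A worked example of the billing formula using the gpt-4o-mini prices from the table:

```python
# gpt-4o-mini prices from the table above, in USD per 1M tokens.
INPUT_PRICE = 0.15
OUTPUT_PRICE = 0.60

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Total cost = input tokens x input price + output tokens x output price."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000

# 10,000 input tokens + 2,000 output tokens:
# (10,000 x 0.15 + 2,000 x 0.60) / 1,000,000 = 0.0027 USD
```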
Error Handling
Common error codes:

| Error Code | Meaning | Solution |
|---|---|---|
| 401 | API Key invalid or missing | Check if API Key is correct |
| 429 | Request rate limit exceeded | Slow down request frequency or upgrade plan |
| 500 | Server internal error | Retry request or contact support |
| 400 | Request parameter error | Check if request parameters conform to API documentation |
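A retry sketch for the transient errors above (429, 500): exponential backoff with jitter. RetryableError is a placeholder for whatever exception your HTTP client raises on those status codes:

```python
import random
import time

class RetryableError(Exception):
    """Placeholder for transient failures such as HTTP 429 or 500."""

def with_retry(fn, max_attempts: int = 5, base_delay: float = 1.0):
    """Call fn, retrying transient errors with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # 1s, 2s, 4s, ... scaled by up to 2x random jitter
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```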
Best Practices
- Use Appropriate Temperature
  - Translation, summarization, Q&A: temperature=0
  - General dialogue: temperature=0.7
  - Creative writing: temperature=1.0-1.5
- Control Context Length
  - Only pass necessary historical messages
  - Regularly clean up irrelevant context
  - Process long documents in segments
- Choose the Right Model
  - Simple tasks: gpt-4o-mini, gpt-3.5-turbo
  - Reasoning tasks: claude-3-5-sonnet, gpt-4o
  - Cost-sensitive: gemini-1.5-flash
- Retry on Errors
  - Implement an exponential backoff retry mechanism
  - Catch and handle different error types
  - Set reasonable timeouts
- Use Stream Output
  - Better user experience for long responses
  - Reduces perceived latency
  - Enables a typewriter effect
Related Resources
- Models API - Get complete available model list
- Images API - Image generation and editing
- OpenAI Models Guide - Detailed GPT-4o usage
- Claude Models Guide - Detailed Claude usage
- Gemini Models Guide - Detailed Gemini usage