
API Endpoint

POST https://api.laozhang.ai/v1/chat/completions
Full compatibility with the OpenAI official format: the Laozhang API is fully compatible with the OpenAI interface format, so you can simply replace https://api.openai.com/v1 with https://api.laozhang.ai/v1.

Request Parameters

Required Parameters

model
string
required
Model name to use. Supported models:
  • OpenAI Series: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo, etc.
  • Claude Series: claude-3-5-sonnet, claude-3-opus, claude-3-haiku, etc.
  • Gemini Series: gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash-exp, etc.
  • Chinese Models: deepseek-chat, qwen-max, glm-4-flash, yi-lightning, etc.
For the complete model list, see API Reference - Models
messages
array
required
Conversation message array; each message contains a role and content
[
  {
    "role": "system",
    "content": "You are a helpful assistant"
  },
  {
    "role": "user", 
    "content": "Hello!"
  }
]
Role Descriptions:
  • system: System prompt that defines the AI assistant's behavior
  • user: User message
  • assistant: AI assistant’s previous response

Optional Parameters

temperature
number
default:"1"
Randomness of generated results, range 0-2
  • 0: Deterministic, minimal randomness (recommended for translation, summarization, etc.)
  • 0.7: Balanced, suitable for most scenarios
  • 1.5-2: High creativity (recommended for creative writing, brainstorming, etc.)
max_tokens
integer
Maximum number of tokens to generate
If not set, the model uses its default limit. If the response is truncated, try increasing this value.
Recommended Values:
  • Short responses: 500-1000
  • Medium responses: 2000-4000
  • Long responses: 8000+
stream
boolean
default:"false"
Whether to use stream output
  • false: Wait for complete response
  • true: Receive response in chunks (better user experience)
top_p
number
default:"1"
Nucleus sampling parameter, range 0-1. Controls diversity of output. Generally adjust either temperature or top_p, not both simultaneously.
frequency_penalty
number
default:"0"
Frequency penalty, range -2.0 to 2.0. Positive values reduce repetition of content that has already appeared.
presence_penalty
number
default:"0"
Presence penalty, range -2.0 to 2.0. Positive values encourage discussion of new topics.
stop
string | array
Stop sequences; generation stops when these strings are encountered. Can be a single string or an array of up to 4 strings.
user
string
Unique end-user identifier for abuse detection. Recommended for multi-user scenarios.
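
As a quick illustration, here is a minimal sketch combining several of the optional parameters (using the openai Python SDK exactly as in the request examples below; all values are illustrative, and user-1234 is a hypothetical identifier):

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.laozhang.ai/v1"
)

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize this article: ..."}],
    temperature=0,        # deterministic output, suited to summarization
    max_tokens=500,       # cap at a short response
    stop=["\n\n"],        # stop at the first blank line (illustrative)
    user="user-1234"      # hypothetical end-user identifier
)

print(completion.choices[0].message.content)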

Message Format

Basic Text Message

{
  "role": "user",
  "content": "Please introduce yourself"
}

Multimodal Message (Image Understanding)

Images can be passed either as a remote URL or as a Base64-encoded data URL. The URL form:
{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "What's in this image?"
    },
    {
      "type": "image_url",
      "image_url": {
        "url": "https://example.com/image.jpg"
      }
    }
  ]
}
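
The Base64 form embeds the image bytes directly as a data URL (the standard OpenAI convention; <BASE64_ENCODED_IMAGE> is a placeholder for the encoded bytes):

{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "What's in this image?"
    },
    {
      "type": "image_url",
      "image_url": {
        "url": "data:image/jpeg;base64,<BASE64_ENCODED_IMAGE>"
      }
    }
  ]
}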
Multimodal Models: the following models support image understanding:
  • OpenAI: gpt-4o, gpt-4o-mini, gpt-4-turbo
  • Claude: claude-3-5-sonnet, claude-3-opus, claude-3-sonnet, claude-3-haiku
  • Gemini: gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash-exp

Request Examples

cURL

curl https://api.laozhang.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant"
      },
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'

Node.js

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://api.laozhang.ai/v1'
});

// Basic dialogue
const completion = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'system', content: 'You are a helpful assistant' },
    { role: 'user', content: 'What is the capital of France?' }
  ]
});

console.log(completion.choices[0].message.content);

// Stream output
const stream = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'user', content: 'Tell me a story' }
  ],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.laozhang.ai/v1"
)

# Basic dialogue
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(completion.choices[0].message.content)

# Stream output
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Tell me a story"}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

# Image understanding
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/image.jpg"
                    }
                }
            ]
        }
    ]
)

print(completion.choices[0].message.content)

Go

package main

import (
    "context"
    "fmt"
    "github.com/sashabaranov/go-openai"
)

func main() {
    config := openai.DefaultConfig("YOUR_API_KEY")
    config.BaseURL = "https://api.laozhang.ai/v1"
    client := openai.NewClientWithConfig(config)

    resp, err := client.CreateChatCompletion(
        context.Background(),
        openai.ChatCompletionRequest{
            Model: "gpt-4o-mini",
            Messages: []openai.ChatCompletionMessage{
                {
                    Role:    openai.ChatMessageRoleSystem,
                    Content: "You are a helpful assistant",
                },
                {
                    Role:    openai.ChatMessageRoleUser,
                    Content: "What is the capital of France?",
                },
            },
        },
    )

    if err != nil {
        fmt.Printf("Error: %v\n", err)
        return
    }

    fmt.Println(resp.Choices[0].Message.Content)
}

Response Format

Standard Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699999999,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 10,
    "total_tokens": 30
  }
}

Stream Response

Format of each chunk:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1699999999,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "Paris"
      },
      "finish_reason": null
    }
  ]
}
Last chunk (finish_reason not null):
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1699999999,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 10,
    "total_tokens": 30
  }
}
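
A minimal sketch for consuming these chunks with the openai Python SDK (as in the stream examples above): it accumulates the incremental content and, assuming the final chunk carries usage as shown, records the token statistics.

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.laozhang.ai/v1"
)

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

parts = []
usage = None
for chunk in stream:
    # Intermediate chunks carry incremental text in delta.content
    if chunk.choices and chunk.choices[0].delta.content:
        parts.append(chunk.choices[0].delta.content)
    # Per the final-chunk format above, usage arrives with the last chunk
    if chunk.usage:
        usage = chunk.usage

text = "".join(parts)
print(text)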

Response Field Descriptions

id
string
Unique request identifier
object
string
Object type:
  • chat.completion: Standard response
  • chat.completion.chunk: Stream response chunk
created
integer
Creation timestamp (Unix timestamp)
model
string
Model name used
choices
array
Generated results array, typically containing one result
choices[].index
integer
Result index
choices[].message
object
Message object (standard response)
choices[].message.role
string
Role, always assistant
choices[].message.content
string
Generated content
choices[].delta
object
Incremental content (stream response)
choices[].delta.content
string
This chunk’s content
choices[].finish_reason
string
Completion reason:
  • stop: Natural completion
  • length: Reached max_tokens limit
  • content_filter: Content filtered by policy
  • null: Not yet finished (stream output)
usage
object
Token usage statistics
usage.prompt_tokens
integer
Input tokens
usage.completion_tokens
integer
Output tokens
usage.total_tokens
integer
Total tokens
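
As a quick illustration of reading these fields (a Python sketch; the truncation check is one possible pattern, not a required one):

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.laozhang.ai/v1"
)

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=100
)

choice = completion.choices[0]
print(choice.message.content)

# finish_reason explains why generation ended (see the field list above)
if choice.finish_reason == "length":
    print("Truncated at max_tokens; consider increasing the limit")

# usage carries the token statistics used for billing
print(completion.usage.prompt_tokens, completion.usage.completion_tokens)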

Special Usage

GPT-4o Vision

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.laozhang.ai/v1"
)

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please describe this image in detail"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/image.jpg",
                        # Optional: control image quality
                        # "detail": "high"  # or "low"
                    }
                }
            ]
        }
    ],
    max_tokens=1000
)

print(completion.choices[0].message.content)
Multiple Images: GPT-4o supports analyzing multiple images simultaneously; just add multiple image_url objects to the content array, as in the sketch below.
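
A minimal sketch of a multi-image request (the URLs are placeholders; client is configured as in the example above):

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two images"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image1.jpg"}
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image2.jpg"}
                }
            ]
        }
    ]
)

print(completion.choices[0].message.content)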

Claude Native Format

Claude models can also be called with the native Anthropic format:
import anthropic

client = anthropic.Anthropic(
    api_key="YOUR_API_KEY",
    base_url="https://api.laozhang.ai/v1"
)

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

print(message.content[0].text)

O1 Series Special Parameters

O1 series models (o1-preview, o1-mini) have parameter limitations:
O1 Series Limitations
  • Do not support system role messages
  • Do not support stream output (stream must be false)
  • Do not support temperature, top_p, presence_penalty, frequency_penalty parameters
  • max_tokens defaults to model’s maximum value
Correct usage:
completion = client.chat.completions.create(
    model="o1-mini",
    messages=[
        {
            "role": "user",
            "content": "Please solve this math problem: ..."
        }
    ]
)

Usage Tips

Multi-turn Dialogue

Implement multi-turn dialogue by passing context:
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
]

# First round
messages.append({"role": "user", "content": "What's the weather in Beijing?"})
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages
)
messages.append({"role": "assistant", "content": response.choices[0].message.content})

# Second round
messages.append({"role": "user", "content": "What about Shanghai?"})
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages
)

JSON Output

Get structured JSON output:
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant. Always output in JSON format."
        },
        {
            "role": "user",
            "content": "Extract personal information from the following text: Zhang San, 28 years old, software engineer"
        }
    ],
    response_format={"type": "json_object"}  # Force JSON output
)

import json
result = json.loads(completion.choices[0].message.content)
print(result)
JSON Mode Support: the following models currently support JSON mode:
  • GPT-4o series
  • GPT-4-turbo series
  • GPT-3.5-turbo-1106 and later versions

Billing

Billing is based on actual token usage: Total Cost = Input Tokens × Input Price + Output Tokens × Output Price
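For example, at the gpt-4o-mini rates listed below, a request that uses 20,000 input tokens and 5,000 output tokens costs 20,000 / 1M × $0.15 + 5,000 / 1M × $0.60 = $0.003 + $0.003 = $0.006.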
Save Costs
  1. Choose appropriate models: most scenarios don’t require GPT-4o; gpt-4o-mini or gpt-3.5-turbo is sufficient
  2. Control context length: Only pass necessary historical messages
  3. Set max_tokens: Avoid unnecessarily long output
  4. Use mini series models: For simple tasks, mini models are much cheaper

Model Price Reference

Model             | Input Price        | Output Price      | Features
gpt-4o-mini       | $0.15 / 1M tokens  | $0.60 / 1M tokens | Cost-effective, supports image understanding
gpt-4o            | $2.50 / 1M tokens  | $10 / 1M tokens   | Strongest capabilities, supports multimodal
claude-3-5-sonnet | $3 / 1M tokens     | $15 / 1M tokens   | Excellent reasoning, supports image understanding
gemini-1.5-flash  | $0.075 / 1M tokens | $0.30 / 1M tokens | Fastest speed, long context
For complete pricing, see Pricing

Error Handling

Common error codes:
Error Code | Meaning                     | Solution
401        | API key invalid or missing  | Check that the API key is correct
429        | Request rate limit exceeded | Slow down request frequency or upgrade your plan
500        | Internal server error       | Retry the request or contact support
400        | Request parameter error     | Check that request parameters conform to the API documentation
Error response example:
{
  "error": {
    "message": "Invalid API key provided",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
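
A sketch of one possible retry pattern with exponential backoff (the exception classes come from the openai Python SDK; the backoff policy itself is an assumption to tune for your workload):

import time

from openai import OpenAI, APIError, RateLimitError

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.laozhang.ai/v1"
)

def create_with_retry(max_retries=3, **kwargs):
    """Retry on rate limits (429) and API-side errors, backing off 1s, 2s, 4s, ..."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except (RateLimitError, APIError):
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            time.sleep(2 ** attempt)

response = create_with_retry(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)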

Best Practices

  1. Use Appropriate Temperature
    • Translation, summarization, Q&A: temperature=0
    • General dialogue: temperature=0.7
    • Creative writing: temperature=1.0-1.5
  2. Control Context Length
    • Only pass necessary historical messages
    • Regularly clean up irrelevant context
    • Long documents can be processed in segments
  3. Choose Right Model
    • Simple tasks: gpt-4o-mini, gpt-3.5-turbo
    • Reasoning tasks: claude-3-5-sonnet, gpt-4o
    • Cost-sensitive: gemini-1.5-flash
  4. Error Retry
    • Implement exponential backoff retry mechanism
    • Catch and handle different error types
    • Set reasonable timeout
  5. Stream Output
    • Better user experience for long responses
    • Reduce perceived latency
    • Can implement typewriter effect