Overview

The Moderation API detects harmful or inappropriate content in text, helping you with:
  • Content Filtering: Automatically filter inappropriate user submissions
  • Safety Review: Detect potential violations before publishing
  • Compliance Check: Ensure content meets platform guidelines
  • Risk Warning: Identify potentially harmful content types
The Moderation API uses OpenAI’s moderation model and is free to use; requests do not consume your token quota.

Quick Start

Basic Example

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-laozhang-api-key",
    base_url="https://api.laozhang.ai/v1"
)

response = client.moderations.create(
    input="Text content to be reviewed"
)

result = response.results[0]
print(f"Contains harmful content: {result.flagged}")
print(f"Category scores: {result.category_scores}")

Batch Moderation

texts = ["Text 1", "Text 2", "Text 3"]

response = client.moderations.create(input=texts)

for i, result in enumerate(response.results):
    status = '⚠️ Flagged' if result.flagged else '✅ Safe'
    print(f"Text {i+1}: {status}")

Detection Categories

Category                    Description
hate                        Hate speech targeting specific groups
hate/threatening            Threatening hate speech
harassment                  Harassing content
harassment/threatening      Threatening harassment
self-harm                   Self-harm related content
self-harm/intent            Intent to self-harm
self-harm/instructions      Self-harm instructions
sexual                      Sexual content
sexual/minors               Sexual content involving minors
violence                    Violent content
violence/graphic            Graphic violence
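
Each category appears as a boolean on result.categories and a probability-like score on result.category_scores. A minimal sketch for listing only the flagged categories (note that model_dump() may return underscored keys such as hate_threatening rather than the slash form shown above):

response = client.moderations.create(input="Text content to be reviewed")
result = response.results[0]

categories = result.categories.model_dump()        # bool per category
scores = result.category_scores.model_dump()       # score per category

# Print only the categories that were flagged, together with their scores
for name, flagged in categories.items():
    if flagged:
        print(f"{name}: {scores[name]:.4f}")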

Practical Examples

1. User Input Filter

def check_content(text):
    """Check if content is safe"""
    response = client.moderations.create(input=text)
    result = response.results[0]
    
    if result.flagged:
        violations = [
            cat for cat, flagged in result.categories.model_dump().items()
            if flagged
        ]
        return False, violations
    
    return True, []

# Usage
is_safe, violations = check_content(user_input)
if not is_safe:
    print(f"⚠️ Violation categories: {violations}")

2. Chatbot Safety Layer

def safe_chat(user_message):
    """Chat with safety checks"""
    # Check user input
    if client.moderations.create(input=user_message).results[0].flagged:
        return "Sorry, your message contains inappropriate content."
    
    # Process after safety check
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": user_message}]
    )
    
    ai_reply = response.choices[0].message.content
    
    # Check AI response
    if client.moderations.create(input=ai_reply).results[0].flagged:
        return "Sorry, unable to generate appropriate response."
    
    return ai_reply
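
Example call (the message is only a placeholder):

reply = safe_chat("Hello, can you summarize this article for me?")
print(reply)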

3. Custom Threshold

def check_with_threshold(text, threshold=0.5):
    """Check with custom threshold"""
    response = client.moderations.create(input=text)
    result = response.results[0]
    
    scores = result.category_scores.model_dump()
    high_risk = {
        cat: score for cat, score in scores.items()
        if score > threshold
    }
    
    return len(high_risk) == 0, high_risk
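
A lower threshold is stricter. Example call with an illustrative threshold of 0.3:

is_ok, high_risk = check_with_threshold("Text content to be reviewed", threshold=0.3)
if not is_ok:
    print(f"⚠️ High-risk categories: {high_risk}")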

Best Practices

Multi-layer Protection

def is_safe(text):
    """Return True if the moderation endpoint does not flag the text."""
    return not client.moderations.create(input=text).results[0].flagged

def process_with_safety(user_input):
    # 1. Check user input
    if not is_safe(user_input):
        return "Input content violates guidelines"
    
    # 2. Process (placeholder for your application logic, e.g. a chat completion)
    result = process(user_input)
    
    # 3. Check output
    if not is_safe(result):
        return "Unable to generate appropriate content"
    
    return result

Pricing

The Moderation API is currently free to use and does not count towards token consumption.

FAQ

How accurate is the moderation?
It is based on OpenAI’s model and is highly accurate, but not 100% reliable. Recommendations:
  • Combine with human review for critical scenarios
  • Set appropriate thresholds
  • Provide appeal channels

Which languages are supported?
Multiple languages are supported, including Chinese; English works best.

Are there rate limits?
Yes. Control request frequency when processing in batches; see the sketch below.
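
One simple way to control request frequency for batch processing is to chunk the inputs and pause between requests. The batch size and delay below are illustrative assumptions, not documented limits.

import time

def moderate_in_batches(texts, batch_size=20, delay_seconds=1.0):
    """Moderate texts in chunks, pausing between requests to respect rate limits."""
    results = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        response = client.moderations.create(input=chunk)
        results.extend(response.results)
        time.sleep(delay_seconds)  # pacing between requests; tune to your limits
    return results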