Overview
The Moderation API detects harmful or inappropriate content in text, helping you with:
Content Filtering: Automatically filter inappropriate user submissions
Safety Review: Detect potential violations before publishing
Compliance Check: Ensure content meets platform guidelines
Risk Warning: Identify potentially harmful content types
The Moderation API uses OpenAI’s moderation model and is free to use; requests do not consume your token quota.
Quick Start
Basic Example
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-laozhang-api-key",
    base_url="https://api.laozhang.ai/v1"
)

response = client.moderations.create(
    input="Text content to be reviewed"
)

result = response.results[0]
print(f"Contains harmful content: {result.flagged}")
print(f"Category scores: {result.category_scores}")
Batch Moderation
texts = ["Text 1", "Text 2", "Text 3"]

response = client.moderations.create(input=texts)

for i, result in enumerate(response.results):
    status = '⚠️ Flagged' if result.flagged else '✅ Safe'
    print(f"Text {i + 1}: {status}")
Detection Categories
hate: Hate speech targeting specific groups
hate/threatening: Threatening hate speech
harassment: Harassing content
harassment/threatening: Threatening harassment
self-harm: Self-harm related content
self-harm/intent: Intent to self-harm
self-harm/instructions: Self-harm instructions
sexual: Sexual content
sexual/minors: Sexual content involving minors
violence: Violent content
violence/graphic: Graphic violence
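To see exactly which of these categories a piece of text triggers, you can read result.categories (boolean flags) and result.category_scores (numeric scores) from the same moderations.create call. Below is a minimal sketch reusing the client from the Quick Start; note that model_dump() returns Python-style field names (e.g. harassment_threatening) rather than the slash form shown above.

response = client.moderations.create(input="Text to inspect")
result = response.results[0]

# Boolean flag and numeric score for every detection category
flags = result.categories.model_dump()
scores = result.category_scores.model_dump()

for category, flagged in flags.items():
    print(f"{category}: flagged={flagged}, score={scores[category]:.4f}")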
Practical Examples
1. Content Safety Check
def check_content(text):
    """Check if content is safe"""
    response = client.moderations.create(input=text)
    result = response.results[0]

    if result.flagged:
        violations = [
            cat for cat, flagged in result.categories.model_dump().items()
            if flagged
        ]
        return False, violations

    return True, []
# Usage
is_safe, violations = check_content(user_input)
if not is_safe:
    print(f"⚠️ Violation categories: {violations}")
2. Chatbot Safety Layer
def safe_chat(user_message):
    """Chat with safety checks"""
    # Check user input
    if client.moderations.create(input=user_message).results[0].flagged:
        return "Sorry, your message contains inappropriate content."

    # Process after safety check
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": user_message}]
    )
    ai_reply = response.choices[0].message.content

    # Check AI response
    if client.moderations.create(input=ai_reply).results[0].flagged:
        return "Sorry, unable to generate appropriate response."

    return ai_reply
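A quick usage sketch (the message below is purely illustrative):

reply = safe_chat("How do I improve my resume?")
print(reply)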
3. Custom Threshold
def check_with_threshold(text, threshold=0.5):
    """Check with custom threshold"""
    response = client.moderations.create(input=text)
    result = response.results[0]

    scores = result.category_scores.model_dump()
    high_risk = {
        cat: score for cat, score in scores.items()
        if score > threshold
    }

    return len(high_risk) == 0, high_risk
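Usage might look like this; the 0.3 threshold is an arbitrary example value, not a recommended setting:

# Stricter check with a lower threshold (0.3 is an arbitrary example value)
is_safe, high_risk = check_with_threshold("Text to review", threshold=0.3)
if not is_safe:
    print(f"High-risk categories: {high_risk}")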
Best Practices
Multi-layer Protection
# is_safe() and process() are placeholders for your own moderation check
# (e.g. check_content above) and your business logic, respectively.
def process_with_safety(user_input):
    # 1. Check user input
    if not is_safe(user_input):
        return "Input content violates guidelines"

    # 2. Process
    result = process(user_input)

    # 3. Check output
    if not is_safe(result):
        return "Unable to generate appropriate content"

    return result
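As a minimal sketch, is_safe could simply wrap the Moderation API (process remains whatever your application does with the input):

def is_safe(text):
    """Return True if the Moderation API does not flag the text."""
    return not client.moderations.create(input=text).results[0].flagged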
Pricing
The Moderation API is currently free to use and does not count toward your token quota.
FAQ
How accurate is moderation?
Moderation is based on OpenAI’s model and is highly accurate, but it is not 100% reliable. We recommend the following (see the sketch after this list):
Combine with human review for critical scenarios
Set appropriate thresholds
Provide appeal channels
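For example, one way to combine thresholds with human review is to auto-block only high-confidence violations and queue borderline cases for manual review. The triage helper and the 0.8 / 0.4 bands below are illustrative choices, not values prescribed by the API:

def triage(text, block_threshold=0.8, review_threshold=0.4):
    """Route content to block / human review / allow based on the highest category score."""
    result = client.moderations.create(input=text).results[0]
    top_score = max(result.category_scores.model_dump().values())

    # Thresholds are illustrative; tune them for your platform
    if result.flagged or top_score >= block_threshold:
        return "block"
    if top_score >= review_threshold:
        return "human_review"
    return "allow"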
Does it support multiple languages?
Yes, it supports multiple languages, including Chinese; English gives the best results.
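Non-English text is passed in exactly the same way, for example:

# Chinese input is handled the same way as English
response = client.moderations.create(input="这是一段需要审核的文本")  # "This is a piece of text to be reviewed"
print(response.results[0].flagged)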
Are there rate limits?
Yes. There are rate limits, so control request frequency when processing content in batches.
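One simple way to control request frequency is to send texts in chunks and pause between calls; the batch size and delay below are illustrative values, not documented limits:

import time

def moderate_in_batches(texts, batch_size=20, delay=1.0):
    """Moderate texts in chunks with a short pause between requests."""
    results = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        response = client.moderations.create(input=batch)
        results.extend(response.results)
        time.sleep(delay)  # simple pacing; batch_size and delay are illustrative
    return results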