Skip to content

Response Parameters

The Regolo API follows the OpenAI Chat Completions API specification and supports standard parameters that control the behavior and quality of model responses.

Overview

These parameters allow you to fine-tune how the model generates responses, controlling aspects like creativity, length, diversity, and repetition.

Standard Parameters

temperature

Controls the randomness of the model's output.

  • Type: float
  • Range: 0.0 to 2.0
  • Default: 1.0

Lower values (e.g., 0.2) make the output more deterministic and focused, while higher values (e.g., 0.8) increase creativity and variability.

data = {
    "model": "gpt-oss-120b",
    "messages": [
        {"role": "user", "content": "Tell me a story"}
    ],
    "temperature": 0.7  # Balanced creativity
}

max_tokens

Sets the maximum number of tokens that can be generated in the response.

  • Type: integer
  • Default: Varies by model

Higher values allow for longer responses but may increase latency and costs. Lower values provide more concise outputs.

data = {
    "model": "gpt-oss-120b",
    "messages": [
        {"role": "user", "content": "Explain quantum computing"}
    ],
    "max_tokens": 1000  # Limit response length
}

top_p

Nucleus sampling parameter that controls the diversity of tokens considered.

  • Type: float
  • Range: 0.0 to 1.0
  • Default: 1.0

Lower values make the model more focused on likely tokens, while higher values allow more diverse choices.

data = {
    "model": "gpt-oss-120b",
    "messages": [
        {"role": "user", "content": "Write a creative poem"}
    ],
    "top_p": 0.9  # Allow more diverse word choices
}

frequency_penalty

Reduces the likelihood of the model repeating frequent tokens.

  • Type: float
  • Range: -2.0 to 2.0
  • Default: 0

Positive values reduce repetition, while negative values increase it. Useful for avoiding repetitive phrases.

data = {
    "model": "gpt-oss-120b",
    "messages": [
        {"role": "user", "content": "List 10 different ideas"}
    ],
    "frequency_penalty": 0.5  # Reduce repetition
}

presence_penalty

Reduces the likelihood of the model reusing tokens that are already present in the text.

  • Type: float
  • Range: -2.0 to 2.0
  • Default: 0

Positive values encourage the model to use new tokens and explore different approaches.

data = {
    "model": "gpt-oss-120b",
    "messages": [
        {"role": "user", "content": "Generate diverse solutions"}
    ],
    "presence_penalty": 0.6  # Encourage new approaches
}

stop

Array of strings that cause the model to stop generating when encountered.

  • Type: array[string] or string
  • Default: null

Useful for controlling where the response ends or preventing certain phrases.

data = {
    "model": "gpt-oss-120b",
    "messages": [
        {"role": "user", "content": "Count to 10"}
    ],
    "stop": ["11", "twelve"]  # Stop at these sequences
}

n

Number of completions to generate for each prompt.

  • Type: integer
  • Default: 1

Generates multiple response variations.

data = {
    "model": "gpt-oss-120b",
    "messages": [
        {"role": "user", "content": "Suggest a name"}
    ],
    "n": 3  # Generate 3 different suggestions
}

seed

Seed for reproducible outputs.

  • Type: integer
  • Default: null

When set, the model will produce more deterministic outputs, useful for testing and reproducibility.

data = {
    "model": "gpt-oss-120b",
    "messages": [
        {"role": "user", "content": "Generate a random number"}
    ],
    "seed": 42  # Reproducible output
}

Complete Example

import requests

api_url = "https://api.regolo.ai/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_REGOLO_KEY"
}
data = {
    "model": "gpt-oss-120b",
    "messages": [
        {"role": "user", "content": "Write a creative story"}
    ],
    "temperature": 0.8,
    "max_tokens": 500,
    "top_p": 0.9,
    "frequency_penalty": 0.3,
    "presence_penalty": 0.4
}

response = requests.post(api_url, headers=headers, json=data)
result = response.json()
print(result["choices"][0]["message"]["content"])

Best Practices

  • For factual tasks: Use lower temperature (0.2-0.4) for more deterministic outputs
  • For creative tasks: Use higher temperature (0.7-0.9) for more varied responses
  • For long responses: Set appropriate max_tokens to avoid truncation
  • To reduce repetition: Use frequency_penalty (0.3-0.7) and presence_penalty (0.4-0.6)
  • For consistency: Use seed when you need reproducible outputs