Speech To Text

The Speech to Text (STT) API transcribes audio files into text using models such as faster-whisper-large-v3.
We recommend using audio chunks of less than 2 minutes to prevent hallucinations and duplicate transcriptions.
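
As a minimal sketch of the recommendation above, the following plans segment boundaries that stay under 2 minutes. The function name `plan_chunks` and the 110-second default are illustrative assumptions, not part of the Regolo API; the actual cutting would be done with an audio tool of your choice.

```python
def plan_chunks(total_seconds: float, max_chunk_seconds: float = 110.0) -> list:
    """Return (start, end) offsets in seconds that cover the whole recording.

    Hypothetical helper: it only computes boundaries, it does not touch audio.
    """
    if total_seconds < 0:
        raise ValueError("total_seconds must not be negative")
    if max_chunk_seconds <= 0:
        raise ValueError("max_chunk_seconds must be positive")

    chunks = []
    start = 0.0
    while start < total_seconds:
        # The last chunk is shortened so it ends exactly at the recording's end.
        end = min(start + max_chunk_seconds, float(total_seconds))
        chunks.append((start, end))
        start = end
    return chunks


if __name__ == "__main__":
    # A 5-minute recording split into segments of at most 110 seconds.
    for start, end in plan_chunks(300):
        print(f"{start:.0f}s -> {end:.0f}s")
```

Cutting at fixed offsets can split a word in half; cutting at a nearby pause instead is a common refinement, but it is outside the scope of this sketch.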

API Call Parameters

file: A binary audio file in OGG format.
model: The identifier for the model used for transcription, e.g., faster-whisper-large-v3.
language: A two-letter ISO 639-1 code for the language of the audio, such as en (English) or it (Italian).
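
The parameters above map onto the form fields of a multipart request, with the file sent as a separate part. As a rough sketch, the hypothetical helper below (`build_form_fields` is not part of any Regolo library) assembles the `model` and `language` fields. It treats `language` as optional, since the cURL example on this page omits it, and it checks only the shape of the code, not whether the API supports that language.

```python
from typing import Optional


def build_form_fields(model: str, language: Optional[str] = None) -> dict:
    """Build the non-file form fields for a transcription request.

    Hypothetical helper for illustration only.
    """
    if not model:
        raise ValueError("model must be a non-empty string")

    fields = {"model": model}
    if language is not None:
        code = language.strip().lower()
        # Shape check only: exactly two ASCII letters, e.g. "en" or "it".
        if len(code) != 2 or not (code.isascii() and code.isalpha()):
            raise ValueError(f"expected a two-letter language code, got {language!r}")
        fields["language"] = code
    return fields


if __name__ == "__main__":
    print(build_form_fields("faster-whisper-large-v3", "it"))
```

The returned dictionary can be passed as the `data` argument of `requests.post`, alongside the `files` argument that carries the audio.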

Important Note

The models have a timeout limit. Split long audio files into smaller segments before sending them: keep each clip to five minutes at most, and preferably under 2 minutes as recommended above.
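
One way to work within the timeout is to transcribe each segment separately and join the results. The sketch below assumes the audio has already been split into files. `transcribe_segments` and its `transcribe_one` callback are hypothetical names; the callback stands in for whichever client call you use to transcribe a single file.

```python
from typing import Callable, Iterable


def transcribe_segments(
    segment_paths: Iterable[str],
    transcribe_one: Callable[[str], str],
) -> str:
    """Transcribe segments in order and join the non-empty results.

    Hypothetical helper: transcribe_one takes a file path and returns text.
    """
    parts = []
    for path in segment_paths:
        text = transcribe_one(path).strip()
        if text:  # skip segments that produced no text, e.g. silence
            parts.append(text)
    return " ".join(parts)


if __name__ == "__main__":
    # Stand-in for a real API call, so the sketch runs without network access.
    fake_results = {
        "part_000.ogg": "Hello there. ",
        "part_001.ogg": "",
        "part_002.ogg": " How are you?",
    }
    print(transcribe_segments(fake_results, fake_results.__getitem__))
```

Passing the transcription call in as a function keeps the joining logic independent of the client library, so it can be exercised with a stub before sending real requests.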

Example Requests

The four examples below use, in order: the Regolo client, the OpenAI client, Python (requests), and cURL.

import regolo
from pathlib import Path

# Regolo configuration
regolo.default_key = "YOUR_REGOLO_KEY"
regolo.default_audio_transcription_model = "faster-whisper-large-v3"

# Audio file to transcribe
AUDIO_FILE = "/path/to/your/audio"
OUTPUT_FILE = "/path/to/output/transcription.txt"

# Transcribe the file
transcript = regolo.static_audio_transcription(file=AUDIO_FILE)

# Save the transcription
output_path = Path(OUTPUT_FILE)
output_path.parent.mkdir(parents=True, exist_ok=True)

with open(output_path, "w", encoding="utf-8") as f:
    f.write(transcript)

print(f"Transcription saved to: {OUTPUT_FILE}")

import openai
from pathlib import Path

# OpenAI client configuration
openai.api_key = "YOUR_REGOLO_KEY"
openai.base_url = "https://api.regolo.ai/v1/"

# Audio file to transcribe
AUDIO_FILE = "/path/to/your/audio"
OUTPUT_FILE = "/path/to/output/transcription.txt"

# Transcribe the file
with open(AUDIO_FILE, "rb") as audio_file:
    transcript = openai.audio.transcriptions.create(
        model="faster-whisper-large-v3",
        file=audio_file,
        language="en",
        response_format="text"
    )

# Save the transcription
output_path = Path(OUTPUT_FILE)
output_path.parent.mkdir(parents=True, exist_ok=True)

with open(output_path, "w", encoding="utf-8") as f:
    f.write(transcript)

print(f"Transcription saved to: {OUTPUT_FILE}")

import requests
from pathlib import Path

def main():
    api_url = "https://api.regolo.ai/v1/audio/transcriptions"
    api_key = "YOUR_REGOLO_KEY"

    AUDIO_FILE = "/path/to/your/audio.ogg"

    audio_path = Path(AUDIO_FILE)
    if not audio_path.is_file():
        print(f"Audio file does not exist: {audio_path}")
        return

    headers = {
        # Don't set Content-Type here; requests will set correct multipart boundary
        "Authorization": f"Bearer {api_key}",
    }

    with audio_path.open("rb") as audio_file:
        files = {
            "file": (audio_path.name, audio_file, "application/octet-stream")
        }
        data = {
            "model": "faster-whisper-large-v3",
            "language": "en",
            "response_format": "text",
        }

        response = requests.post(api_url, headers=headers, data=data, files=files)

    if response.status_code == 200:
        transcript_text = response.text
        print("=== Transcription ===")
        print(transcript_text)
        print("=====================")
    else:
        print("Failed transcription request:")
        print("Status code:", response.status_code)
        print("Response body:", response.text)

if __name__ == "__main__":
    main()

curl --request POST \
  --url 'https://api.regolo.ai/v1/audio/transcriptions' \
  --header 'Authorization: Bearer YOUR_REGOLO_KEY' \
  -F "file=@/path/to/your/audio" \
  -F "model=faster-whisper-large-v3"

Example Implementation

For a practical example of how to use this API, you can refer to the Telegram Transcriber GitHub Repository. This repository provides a complete implementation for transcribing audio messages from Telegram using the Speech to Text API.

For the complete API endpoint documentation, visit docs.api.regolo.ai.