Audio
SwiftyAI exposes two dedicated audio paths: transcribe for audio-to-text and generateSpeech for text-to-audio.
| API | Direction | Output |
|---|---|---|
| transcribe | Audio to text | TranscriptionResponse with text and optional metadata |
| generateSpeech | Text to audio | SpeechResponse with audio bytes, format, and media type |
| Multimodal audio input | Audio inside a model prompt | Text or object response from the language model |
Transcription
```swift
let transcriptionModel = OpenAICompatibleProvider(
    baseURL: "https://api.openai.com/v1",
    apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"]!,
    model: "gpt-4o-transcribe"
)

// audioData is a Data value loaded elsewhere, for example from a recording on disk.
let audio = AIAudioInput(
    data: audioData,
    filename: "meeting.wav",
    mediaType: .wav
)

let transcript = try await transcribe(
    model: transcriptionModel,
    audio: audio,
    options: TranscriptionOptions(
        language: "en",
        prompt: "This is a product planning meeting.",
        responseFormat: .json
    )
)

print(transcript.text)
```

TranscriptionResponse includes text plus optional language, duration, and model fields.
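Because the metadata fields are optional, code that reads them should handle the case where a provider returns only text. The sketch below shows one way to do that. `TranscriptInfo` and `summarize` are hypothetical names used for illustration only; `TranscriptInfo` is a local stand-in that mirrors the fields described above, and the real TranscriptionResponse property names and types may differ.

```swift
import Foundation

// Hypothetical stand-in for the fields the text attributes to TranscriptionResponse.
// The real type's property names and types may differ.
struct TranscriptInfo {
    let text: String
    let language: String?
    let duration: TimeInterval?
    let model: String?
}

/// Builds a one-line summary, skipping any metadata the provider did not return.
func summarize(_ info: TranscriptInfo) -> String {
    var parts: [String] = []
    if let language = info.language {
        parts.append("language: \(language)")
    }
    if let duration = info.duration {
        parts.append(String(format: "duration: %.1fs", duration))
    }
    if let model = info.model {
        parts.append("model: \(model)")
    }
    let metadata = parts.isEmpty ? "no metadata" : parts.joined(separator: ", ")
    return "\(info.text) [\(metadata)]"
}

let full = TranscriptInfo(text: "Hello", language: "en", duration: 2.5, model: "gpt-4o-transcribe")
let bare = TranscriptInfo(text: "Hello", language: nil, duration: nil, model: nil)
print(summarize(full))
print(summarize(bare))
```

Unwrapping each field separately keeps the summary useful even when only some of the metadata is present.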
Speech Generation
```swift
let speechModel = OpenAICompatibleProvider(
    baseURL: "https://api.openai.com/v1",
    apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"]!,
    model: "gpt-4o-mini-tts"
)

let speech = try await generateSpeech(
    model: speechModel,
    text: "Your export is ready.",
    options: SpeechOptions(
        voice: "alloy",
        format: .mp3,
        speed: 1.0,
        instructions: "Calm, clear, and brief."
    )
)

// outputURL is a file URL defined elsewhere that points to where the audio should be saved.
try speech.data.write(to: outputURL)
```

SpeechResponse returns the audio bytes, format, media type, and model.
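The snippet above writes the audio bytes to an `outputURL` that it does not define. As a minimal sketch, assuming the bytes are a Foundation `Data` value, the helper below derives a file name whose extension matches the requested format and writes the bytes there. `OutputFormat` and `saveAudio` are hypothetical names introduced for this example and are not part of SwiftyAI; the library's own format type may use different cases.

```swift
import Foundation

// Hypothetical mirror of two audio formats mentioned in this section.
// SwiftyAI's own format type may use different case names.
enum OutputFormat: String {
    case mp3, wav
}

/// Writes audio bytes to `directory` as `<name>.<format extension>` and returns the file URL.
func saveAudio(_ data: Data, named name: String, format: OutputFormat, in directory: URL) throws -> URL {
    let url = directory
        .appendingPathComponent(name)
        .appendingPathExtension(format.rawValue)
    // .atomic writes to a temporary file first, so a failed write does not leave a partial file.
    try data.write(to: url, options: .atomic)
    return url
}

// Placeholder bytes standing in for the audio data returned by speech generation.
let bytes = Data([0x49, 0x44, 0x33])
let savedURL = try saveAudio(
    bytes,
    named: "export-ready",
    format: .mp3,
    in: FileManager.default.temporaryDirectory
)
print(savedURL.lastPathComponent)
```

Keeping the extension tied to the format value avoids saving, for example, WAV bytes under an `.mp3` name when the requested format changes.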
| Option | Use |
|---|---|
| voice | Pick the provider voice or speaker style |
| format | Choose MP3, WAV, or another supported audio container |
| speed | Slow down or speed up narration where supported |
| instructions | Guide tone, pacing, or style |
Gemini Audio
Gemini can also be used through GeminiProvider for supported audio operations:
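```swift
let gemini = GeminiProvider(
    apiKey: ProcessInfo.processInfo.environment["GEMINI_API_KEY"]!,
    // Placeholder model name; replace it with a model that supports the operation you need.
    model: "gemini-audio-model"
)

// audio is an AIAudioInput built as shown in the Transcription section.
let transcript = try await transcribe(model: gemini, audio: audio)
```

Use provider documentation to choose the exact model that supports your desired operation.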
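The provider examples in this section force-unwrap the API key from the environment, which crashes the process when the variable is missing. Outside of short demos, one option is to throw a descriptive error instead. The sketch below uses only Foundation; `MissingEnvironmentVariable` and `requiredEnvironmentValue` are hypothetical helpers written for this example, not SwiftyAI APIs.

```swift
import Foundation

/// Error raised when a required environment variable is absent or empty.
struct MissingEnvironmentVariable: Error, CustomStringConvertible {
    let name: String
    var description: String { "Environment variable \(name) is not set" }
}

/// Returns the value of `name`, or throws instead of crashing like a force unwrap would.
/// The `environment` parameter defaults to the process environment and can be
/// replaced with a plain dictionary in tests.
func requiredEnvironmentValue(
    _ name: String,
    in environment: [String: String] = ProcessInfo.processInfo.environment
) throws -> String {
    guard let value = environment[name], !value.isEmpty else {
        throw MissingEnvironmentVariable(name: name)
    }
    return value
}

do {
    let key = try requiredEnvironmentValue("GEMINI_API_KEY")
    // Avoid printing the key itself; report only that it was found.
    print("Loaded a key with \(key.count) characters")
} catch {
    print(error)
}
```

The returned string can then be passed as the `apiKey` argument when constructing a provider.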