SwiftyAI

Multimodal Input

Multimodal prompts use [AIMessageContent] instead of a single String. The same prompt parts can be sent to generateText, streamText, and generateObject when the provider supports the media type.

Part type | Constructors | Common use
Text | .text | Instructions, questions, surrounding context
Images | .imageURL, .imageData, .imageBase64 | Screenshot review, chart description, visual QA
PDFs | .pdfURL, .pdfData, .pdfBase64 | Report summarization, policy extraction, document Q&A
Files | .fileURL, .fileData, .fileBase64 | Provider-supported file prompts with explicit media type
Audio | .audioData, .audioBase64 | Audio understanding inside a prompt
Video | .videoURL, .videoData, .videoBase64 | Video understanding inside a prompt

Images

let response = try await generateText(
    model: model,
    prompt: [
        .text("List the visible accessibility issues in this screen."),
        .imageURL(URL(string: "https://example.com/screen.png")!, detail: .high)
    ]
)

For local images, pass data:

let data = try Data(contentsOf: screenshotURL)
 
let response = try await generateText(
    model: model,
    prompt: [
        .text("Describe this chart in one paragraph."),
        .imageData(data, mediaType: .png, detail: .auto)
    ]
)

PDFs And Files

let report = try Data(contentsOf: reportURL)
 
let summary = try await generateText(
    model: model,
    prompt: [
        .text("Summarize the risks in this report."),
        .pdfData(report, filename: "q4-risk-report.pdf")
    ]
)

Generic files can be sent with fileURL, fileData, or fileBase64 and an explicit AIMediaType.

Audio And Video Parts

let response = try await generateText(
    model: model,
    prompt: [
        .text("Extract action items from this audio clip."),
        .audioData(audioData, mediaType: .wav, filename: "standup.wav")
    ]
)

Video parts follow the same pattern:

let response = try await generateText(
    model: model,
    prompt: [
        .text("Describe what happens in this product demo."),
        .videoURL(videoURL)
    ]
)

There is no dedicated .audioURL prompt constructor in the current package. For audio prompts, load the bytes into Data or pass an existing base64 string. For dedicated transcription and speech APIs, see the audio docs. Multimodal input is for model understanding; media APIs are for producing or transcribing media.
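
Because audio parts take bytes or a base64 string rather than a URL, the loading step can be wrapped in a small helper. The sketch below uses only Foundation. AudioPayload and loadAudioPayload are hypothetical names for illustration, not part of SwiftyAI, and the usage comment assumes the .audioData call shape shown earlier in this section.

```swift
import Foundation

/// Hypothetical container holding audio in both forms an audio part accepts.
struct AudioPayload {
    let data: Data
    let base64: String
}

/// Reads a local audio file into memory and base64-encodes its bytes.
/// Hypothetical helper, not part of SwiftyAI.
func loadAudioPayload(from fileURL: URL) throws -> AudioPayload {
    let data = try Data(contentsOf: fileURL)
    return AudioPayload(data: data, base64: data.base64EncodedString())
}

// Usage sketch (assumes the constructor shown in this section):
// let payload = try loadAudioPayload(from: standupURL)
// .audioData(payload.data, mediaType: .wav, filename: "standup.wav")
```

Data(contentsOf:) reads the whole file into memory, so this approach suits short clips better than long recordings.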

Provider Support

AIMessageContent can represent many media types, but the provider decides which ones are accepted. If a model does not support a media part, the provider call can fail with an API error or AIError.unsupportedFeature.
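
One way to handle such failures is to retry with a text-only prompt. The sketch below is a generic wrapper in plain Swift. withTextOnlyFallback and MediaPromptError are hypothetical names, not SwiftyAI API, and the exact shape of AIError.unsupportedFeature is not shown in this document, so the caller supplies the predicate that decides which errors trigger the fallback.

```swift
/// Stand-in error for illustration; a real call would throw the library's own error.
enum MediaPromptError: Error {
    case unsupportedMedia
}

/// Runs `primary`; if it throws an error accepted by `shouldFallBack`,
/// runs `fallback` instead. Any other error propagates to the caller.
func withTextOnlyFallback<T>(
    shouldFallBack: (Error) -> Bool,
    primary: () async throws -> T,
    fallback: () async throws -> T
) async throws -> T {
    do {
        return try await primary()
    } catch let error where shouldFallBack(error) {
        return try await fallback()
    }
}

// Usage sketch, assuming the generateText calls shown in this document:
// let response = try await withTextOnlyFallback(
//     shouldFallBack: { _ in true },  // narrow this to unsupported-media errors
//     primary: { try await generateText(model: model, prompt: [.text(question), .imageData(data, mediaType: .png, detail: .auto)]) },
//     fallback: { try await generateText(model: model, prompt: [.text(question)]) }
// )
```

Keeping the predicate separate avoids masking unrelated failures, such as network or authentication errors, behind a silent text-only retry.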

Provider behavior | How to handle it
Accepts only text | Use generateText with a String prompt
Accepts images but not files | Split document workflows into extraction and generation
Accepts files with upload ids | Use the file URL/base64/data constructor that matches the provider implementation
Rejects a media type | Catch provider errors and offer a text-only fallback

Related docs

See the generate text docs for the text API and the generate object docs for structured extraction from multimodal prompts.