Multimodal Input

Multimodal prompts take an array of parts, [AIMessageContent], instead of a single String. The same parts can be sent to generateText, streamText, and generateObject, provided the provider supports each media type.

Part type	Constructors	Common use
Text	`.text`	Instructions, questions, surrounding context
Images	`.imageURL`, `.imageData`, `.imageBase64`	Screenshot review, chart description, visual QA
PDFs	`.pdfURL`, `.pdfData`, `.pdfBase64`	Report summarization, policy extraction, document Q&A
Files	`.fileURL`, `.fileData`, `.fileBase64`	Provider-supported file prompts with explicit media type
Audio	`.audioData`, `.audioBase64`	Audio understanding inside a prompt
Video	`.videoURL`, `.videoData`, `.videoBase64`	Video understanding inside a prompt

Images

let response = try await generateText(
    model: model,
    prompt: [
        .text("List the visible accessibility issues in this screen."),
        .imageURL(URL(string: "https://example.com/screen.png")!, detail: .high)
    ]
)

For local images, pass data:

// Load the screenshot bytes from a local file URL.
let data = try Data(contentsOf: screenshotURL)

let response = try await generateText(
    model: model,
    prompt: [
        .text("Describe this chart in one paragraph."),
        .imageData(data, mediaType: .png, detail: .auto)
    ]
)

PDFs And Files

// Load the PDF bytes from a local file URL.
let report = try Data(contentsOf: reportURL)

let summary = try await generateText(
    model: model,
    prompt: [
        .text("Summarize the risks in this report."),
        .pdfData(report, filename: "q4-risk-report.pdf")
    ]
)

Generic files can be sent with .fileURL, .fileData, or .fileBase64, each of which takes an explicit AIMediaType.

Audio And Video Parts

// audioData is assumed to already hold the clip's bytes, e.g. loaded with Data(contentsOf:).
let response = try await generateText(
    model: model,
    prompt: [
        .text("Extract action items from this audio clip."),
        .audioData(audioData, mediaType: .wav, filename: "standup.wav")
    ]
)

// videoURL is assumed to point at a video the provider can fetch.
let videoResponse = try await generateText(
    model: model,
    prompt: [
        .text("Describe what happens in this product demo."),
        .videoURL(videoURL)
    ]
)

There is no dedicated .audioURL prompt constructor in the current package. For audio prompts, load the bytes into Data or pass an existing base64 string. For dedicated transcription and speech, use the audio APIs instead: multimodal input is for model understanding, while the media APIs are for producing or transcribing media.
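As a minimal sketch of the "load the bytes into Data or pass a base64 string" route, the following uses only Foundation. The helper name loadAudioPayload and the temporary sample file are hypothetical, introduced here for illustration; neither is part of the package.

```swift
import Foundation

/// Hypothetical helper (not part of the package): reads a local audio file
/// and returns both the raw bytes and their base64 encoding.
func loadAudioPayload(from fileURL: URL) throws -> (data: Data, base64: String) {
    // Reads the whole file into memory. A remote file would have to be
    // downloaded first (for example with URLSession) before this step.
    let data = try Data(contentsOf: fileURL)
    return (data, data.base64EncodedString())
}

// Demo: a tiny temporary file stands in for a real recording.
let sampleURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("standup-sample.wav")
try Data("RIFF".utf8).write(to: sampleURL)

let payload = try loadAudioPayload(from: sampleURL)
print(payload.data.count)   // 4
print(payload.base64)       // UklGRg==
```

The returned data could then be passed to .audioData as in the audio example above. The base64 string is the form an .audioBase64 part would presumably take, but its exact parameters are not shown here and should be checked against the package.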

Provider Support

AIMessageContent can represent many media types, but the provider decides which ones are accepted. If a model does not support a media part, the provider call can fail with an API error or AIError.unsupportedFeature.

Provider behavior	How to handle it
Accepts only text	Use `generateText` with a `String` prompt
Accepts images but not files	Split document workflows into extraction and generation
Accepts files with upload ids	Use the file URL/base64/data constructor that matches the provider implementation
Rejects a media type	Catch provider errors and offer a text-only fallback
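The last row, catching a provider error and falling back to text only, might be sketched as follows. This is an illustration under stated assumptions rather than the package's own mechanism: withFallback, UnsupportedMedia, and the two closures are hypothetical stand-ins, and a real implementation would call generateText inside each closure and would probably narrow the catch to AIError.unsupportedFeature or the provider's API error.

```swift
/// Hypothetical helper: runs `primary` and, if it throws, runs `fallback`
/// with the thrown error instead.
func withFallback<T>(
    primary: () async throws -> T,
    fallback: (Error) async throws -> T
) async throws -> T {
    do {
        return try await primary()
    } catch {
        // This sketch falls back on any error; narrow the catch as needed.
        return try await fallback(error)
    }
}

/// Stand-in for the error a provider raises when it rejects a media part.
struct UnsupportedMedia: Error {}

let answer = try await withFallback(
    primary: { () async throws -> String in
        // A multimodal generateText call would go here.
        throw UnsupportedMedia()
    },
    fallback: { _ in
        // A text-only generateText call with a String prompt would go here.
        "text-only answer"
    }
)
print(answer)   // text-only answer
```

Passing the error to the fallback closure lets the caller log it or decide whether a text-only retry is appropriate at all.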

Related docs

See the generateText documentation for the text API and the generateObject documentation for structured extraction from multimodal prompts.