Multimodal
Baleybots provides builder functions for multimodal inputs. Import them from @baleybots/core and pass the value they return as the input to bot.process().
Text
```typescript
import { Baleybot, text } from '@baleybots/core';

const bot = Baleybot.create({ name: 'bot', goal: 'Answer questions' });
const result = await bot.process(text('What is the capital of France?'));
```
You can also pass a plain string to bot.process() directly; text() is mainly useful when combining text with other modalities.
Image
```typescript
import { Baleybot, image, combine, text } from '@baleybots/core';

const bot = Baleybot.create({ name: 'vision', goal: 'Describe images' });

// From a URL, combined with a text prompt
const result = await bot.process(
  combine(text('Describe this image'), image('https://example.com/photo.jpg'))
);

// From binary data (an ArrayBuffer here; a Node.js Buffer is also accepted)
const buffer = await fetch('https://example.com/photo.jpg').then((r) => r.arrayBuffer());
const result2 = await bot.process(image(buffer));
```
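Nothing in the binary example above checks that the downloaded bytes are really an image: a failed download can return an HTML error page instead. The sketch below is a hypothetical helper, sniffImageType(), that is not part of @baleybots/core. It inspects the leading "magic bytes" of the data and assumes you only need to recognize four common formats (PNG, JPEG, GIF, WebP).

```typescript
// Hypothetical helper (not part of @baleybots/core): guess the image format
// from the first bytes of the data, before passing it to image().
function sniffImageType(data: ArrayBuffer | Uint8Array): string | undefined {
  // Node.js Buffers are Uint8Arrays, so they take the first branch as-is.
  const bytes = data instanceof Uint8Array ? data : new Uint8Array(data);

  // True if `sig` appears at position `offset` in the data.
  const matchesAt = (sig: number[], offset = 0): boolean =>
    bytes.length >= offset + sig.length &&
    sig.every((b, i) => bytes[offset + i] === b);

  if (matchesAt([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return 'image/png';
  if (matchesAt([0xff, 0xd8, 0xff])) return 'image/jpeg';
  if (matchesAt([0x47, 0x49, 0x46, 0x38])) return 'image/gif'; // "GIF8"
  // WebP files start with "RIFF", a 4-byte size, then "WEBP".
  if (matchesAt([0x52, 0x49, 0x46, 0x46]) && matchesAt([0x57, 0x45, 0x42, 0x50], 8)) {
    return 'image/webp';
  }
  return undefined; // Unknown format, or not an image at all.
}
```

For example, call sniffImageType(buffer) and throw if it returns undefined before passing buffer to image().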
Audio
```typescript
import { Baleybot, audio, combine, text } from '@baleybots/core';

const bot = Baleybot.create({ name: 'audio-bot', goal: 'Analyze audio content' });

const audioBuffer = await fetch('https://example.com/speech.mp3').then((r) => r.arrayBuffer());
const result = await bot.process(
  combine(text('Transcribe this audio'), audio(audioBuffer))
);
```
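fetch() resolves normally on HTTP errors such as 404, so the fetch(...).then((r) => r.arrayBuffer()) pattern used above would forward an error page's bytes as audio. A minimal sketch of a hypothetical fetchBytes() helper (not part of @baleybots/core) that rejects on non-2xx responses instead; it assumes a runtime with a global fetch, such as a browser or Node.js 18+.

```typescript
// Hypothetical helper (not part of @baleybots/core): fetch a resource as an
// ArrayBuffer, rejecting on HTTP errors instead of returning the error body.
async function fetchBytes(url: string): Promise<ArrayBuffer> {
  const response = await fetch(url);
  // response.ok is true only for 2xx status codes.
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: ${response.status} ${response.statusText}`);
  }
  return response.arrayBuffer();
}
```

Usage: const audioBuffer = await fetchBytes('https://example.com/speech.mp3'); then pass audioBuffer to audio() as before.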
Video / Frames
The frames() function accepts an array of extracted frames or a MediaStream for real-time video. The video() function is an alias for frames().
```typescript
import { Baleybot, video, combine, text } from '@baleybots/core';

const bot = Baleybot.create({ name: 'video-bot', goal: 'Analyze video content' });

// From a live MediaStream (here a screen capture via getDisplayMedia, which is browser-only)
const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
const result = await bot.process(
  combine(text('What is happening in this video?'), video(stream))
);
```
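When you pass an array of extracted frames instead of a live stream, you have to decide which moments of the clip to extract. A minimal sketch, assuming you want a fixed number of frames spread evenly across the clip: the hypothetical sampleFrameTimes() helper (not part of @baleybots/core) returns the timestamps to extract. Turning those timestamps into MediaFrame objects depends on the MediaFrame shape, which this page does not define, so that step is left out.

```typescript
// Hypothetical helper (not part of @baleybots/core): pick `count` evenly
// spaced timestamps, in seconds, from a clip of length `durationSec`.
// Each timestamp is the midpoint of an equal-length segment, so the very
// first and last instants of the clip are never chosen.
function sampleFrameTimes(durationSec: number, count: number): number[] {
  if (!(durationSec > 0) || !Number.isInteger(count) || count < 1) {
    throw new RangeError('durationSec must be > 0 and count a positive integer');
  }
  const segment = durationSec / count;
  return Array.from({ length: count }, (_, i) => (i + 0.5) * segment);
}
```

For example, sampleFrameTimes(10, 5) returns [1, 3, 5, 7, 9]: the midpoint of each 2-second segment of a 10-second clip.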
File
Use file() for PDFs, CSVs, and other document or file types. It takes the raw data and its MIME type:
```typescript
import { Baleybot, file, combine, text } from '@baleybots/core';

const bot = Baleybot.create({ name: 'doc-bot', goal: 'Analyze documents' });

// A relative URL like 'report.pdf' resolves against the current page, so this line runs in a browser
const pdfBlob = await fetch('report.pdf').then((r) => r.blob());
const result = await bot.process(
  combine(text('Summarize this document'), file(pdfBlob, 'application/pdf'))
);
```
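file() needs an explicit MIME type. When the data is a Blob from fetch(), its type property carries the response's Content-Type if the server sent one; otherwise you can derive a type from the filename. A minimal sketch with a hypothetical mimeTypeFromFilename() helper (not part of @baleybots/core) that covers a few common document formats and falls back to application/octet-stream.

```typescript
// Hypothetical helper (not part of @baleybots/core): map a filename's
// extension to a MIME type string for file(data, mimeType).
// This table is a small illustrative subset; extend it as needed.
const MIME_BY_EXTENSION: Record<string, string> = {
  pdf: 'application/pdf',
  csv: 'text/csv',
  txt: 'text/plain',
  md: 'text/markdown',
  json: 'application/json',
  html: 'text/html',
  docx: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  xlsx: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
};

function mimeTypeFromFilename(filename: string, fallback = 'application/octet-stream'): string {
  const dot = filename.lastIndexOf('.');
  // No extension, or the name ends with a dot.
  if (dot === -1 || dot === filename.length - 1) return fallback;
  const ext = filename.slice(dot + 1).toLowerCase();
  return MIME_BY_EXTENSION[ext] ?? fallback;
}
```

For example, file(pdfBlob, mimeTypeFromFilename('report.pdf')) passes 'application/pdf'.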
Combining inputs
Use combine() to pass multiple input types together:
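```typescript
import { Baleybot, combine, text, image, audio } from '@baleybots/core';

const bot = Baleybot.create({
  name: 'multimodal-bot',
  goal: 'Analyze multimedia content',
});

// Load the audio as binary data; the image is passed by URL
const audioBuffer = await fetch('https://example.com/speech.mp3').then((r) => r.arrayBuffer());

const result = await bot.process(
  combine(
    text('Compare the audio description with what you see in the image'),
    image('https://example.com/photo.jpg'),
    audio(audioBuffer)
  )
);
```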
Summary
| Function | Input types | Description |
|---|---|---|
| text(string) | Plain text | Text input |
| image(url \| buffer) | URL string, Buffer, ArrayBuffer | Single image or array of images |
| audio(url \| buffer) | URL string, Buffer, ArrayBuffer | Audio content |
| frames(frames \| stream) | MediaFrame[], MediaStream | Video frames or live stream |
| video(frames \| stream) | MediaFrame[], MediaStream | Alias for frames() |
| file(data, mimeType) | Blob, Uint8Array, ArrayBuffer, string | Generic file with MIME type |
| combine(...inputs) | Any of the above | Merge multiple inputs together |