Native Apple Silicon app
Local AI creation for Mac, beyond chat.
Run chat, vision, image, speech, and music models on your Mac with a packaged MLX runtime, curated model downloads, and local output management.
One native workspace
MLXtra keeps the workflow direct: choose a model, prompt from the composer, and get generated text, images, speech, or music back inside the same conversation. Runtime setup, downloads, and generated files are handled by the app.
Workflows
Attach images, stream responses, and keep conversations local with models selected for practical Mac performance.
Use local image models, save generated files automatically, and open or export outputs when they are ready.
Generate spoken audio with local TTS models and keep outputs organized as app-generated media.
Prompt for instrumental or vocal music and receive local audio output inside the conversation.
Download runtime components and models from the app, with size and hardware fit surfaced before you install.
Mac-first runtime
The app ships and updates a checked Apple Silicon runtime instead of relying on manual terminal setup.
Generated images, speech, and music are saved on your Mac and surfaced directly in the conversation.
Models are curated for realistic local use, with larger options shown only when the hardware fit makes sense.
Demo
Model catalog
Vision language model with chat and image understanding support. 5.6 GB download.
Smaller chat and vision model for lighter Apple Silicon Macs. 1.5 GB download.
Local image generation and editing from the MLXtra composer. 15.0 GB download.
Compact local speech generation for fast TTS workflows. 0.67 GB download.
Prompt-based local music generation through an isolated runtime. 4.8 GB download.
Higher-capacity vision model shown when the Mac has enough memory. 20.4 GB download.
Download
First launch sets up the local runtime in the background. Pick a starter model, let downloads finish, then use the same app for chat, image, speech, and music workflows.